THE UNIVERSITY OF OKLAHOMA HEALTH SCIENCES CENTER GRADUATE COLLEGE THE USE OF GENERALIZED METHOD OF MOMENTS WITH BINARY OUTCOMES IN THE PRESENCE


THE UNIVERSITY OF OKLAHOMA
HEALTH SCIENCES CENTER
GRADUATE COLLEGE

THE USE OF GENERALIZED METHOD OF MOMENTS WITH BINARY OUTCOMES IN THE PRESENCE OF TIME VARYING COVARIATES

A DISSERTATION
SUBMITTED TO THE GRADUATE FACULTY
in partial fulfillment of the requirements for the
degree of
Doctor of Philosophy

BY
SUMMER GALE FRANK
Oklahoma City, Oklahoma

THE USE OF GENERALIZED METHOD OF MOMENTS WITH BINARY OUTCOMES IN THE PRESENCE OF TIME VARYING COVARIATES

APPROVED BY:

Sara K. Vesely, Ph.D., Chair
Michael P. Anderson, Ph.D.
Laura A. Beebe, Ph.D.
David M. Thompson, Ph.D.
Eleni L. Tolma, Ph.D.

DISSERTATION COMMITTEE

COPYRIGHT by
Summer Gale Frank
August 1,

ACKNOWLEDGEMENTS

The path I took to reach the momentous achievement of completing my dissertation has been long, winding, and often steep. I have many people to thank for this achievement, because I know that I would not have reached the end of this journey without their encouragement, support, and guidance.

First and foremost, I would like to thank my advisor, Dr. Sara Vesely, who invested countless hours to ensure my success throughout the program and my dissertation. She always provided support and encouragement as well as an extra push when I needed it. It was an honor to be her first doctoral student and to share this journey with her.

I also want to thank my dissertation committee. Dr. David Thompson and Dr. Michael Anderson helped me to refine my analysis/simulation code and research plans and provided assistance when simulation issues seemed insurmountable. Dr. Laura Beebe and Dr. Eleni Tolma kept me grounded, reminding me to keep in mind the real-world applications of my theoretical explorations.

I am thankful for the many professors and staff in the Department of Biostatistics and Epidemiology and in the College of Public Health who provided me guidance or encouragement. I am also grateful to the students with whom I shared this rollercoaster ride. I appreciate every presentation practice session, impromptu coffee break, bit of encouragement (hug, smile, or kind word), and attentive ear during stressed ramblings.

I am grateful to Dr. Trent Lalonde for sharing his simulation/analysis code and for providing advice as I adapted it for use in my dissertation. I also appreciate Brett Zimmerman and the staff at the OU Supercomputing Center for Education & Research for helping me to adapt my code for use on the supercomputer and for helping me with error messages and incessant questions throughout my dissertation.

Finally, I would not have been able to finish this dissertation without the support of my family. To my parents, Ilene and Brantley, who have always provided support and encouraged me to follow my dreams. They listened to me work through issues in my simulations, reminded me that they were confident in my abilities, and made sure I was taking care of myself. I would not be the person I am today without your love and guidance. To Matt, who listened to my daily dissertation ramblings, fed my heart and my empty belly when I got home after long nights of dissertation work, and shined a light when I couldn't see the one at the end of the tunnel. To my sister, Lela, and her family, I'm thankful for the reminder of the important things in life and for the chance to focus on being Aunt Summer when my dissertation became overwhelming.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABSTRACT
Chapter
I. INTRODUCTION
II. BACKGROUND AND LITERATURE REVIEW
III. METHODS AND MATERIALS
IV. RESULTS
V. DISCUSSION
BIBLIOGRAPHY
APPENDICES
Appendix A

LIST OF TABLES

Table 1: Interpretation of strength of correlation (Mukaka, 2012)
Table 2: Summary of comparison methods and criteria for estimating equation exclusion
Table 3: Values to explore in single variable scenarios
Table 4: Summary of components in simulation
Table 5: Performance measures for evaluating methods
Table 6: Crude analysis results for Youth Asset Study females (n=481) modeling effect of the Positive Peer Role Model asset on the outcome of tobacco use in the last 30 days (Y/N)
Table 7: Crude analysis results for Youth Asset Study males (n=429) modeling effect of the Positive Peer Role Model asset on the outcome of tobacco use in the last 30 days (Y/N)
Table 8: Part 1 of adjusted analysis results for Youth Asset Study females (n=481) modeling effect of the Positive Peer Role Model asset on the outcome of tobacco use in the last 30 days (Y/N), accounting for effect coded versions of age, race/ethnicity, parental income, and time/wave
Table 9: Part 2 of adjusted analysis results for Youth Asset Study females (n=481) modeling effect of the Positive Peer Role Model asset on the outcome of tobacco use in the last 30 days (Y/N), accounting for effect coded versions of age, race/ethnicity, parental
Table 10: Part 3 of adjusted analysis results for Youth Asset Study females (n=481) modeling effect of the Positive Peer Role Model asset on the outcome of tobacco use in the last 30 days (Y/N), accounting for effect coded versions of age, race/ethnicity, parent
Table 11: Part 1 of analysis results for Youth Asset Study males (n=429) modeling effect of the Positive Peer Role Model asset on the outcome of tobacco use in the last 30 days (Y/N), accounting for effect coded versions of age, race/ethnicity, parental income, and time/wave
Table 12: Part 2 of analysis results for Youth Asset Study males (n=429) modeling effect of the Positive Peer Role Model asset on the outcome of tobacco use in the last 30 days (Y/N), accounting for effect coded versions of age, race/ethnicity, parental income, and time
Table 13: Part 3 of analysis results for Youth Asset Study males (n=429) modeling effect of the Positive Peer Role Model asset on the outcome of tobacco use in the last 30 days (Y/N), accounting for effect coded versions of age, race/ethnicity, parental income, and time
Table 14: Crude analysis results for rehospitalization data (n=1,625) modeling effect of number of diseases (NDX) on the outcome of rehospitalization within 30 days (Y/N)
Table 15: Part 1 of adjusted analysis results for rehospitalization data (n=1,625) modeling effect of number of diseases (NDX) on the outcome of rehospitalization within 30 days (Y/N), accounting for number of procedures (NPR), length of stay (LOS), and effect coded versions of presence of atherosclerosis (DX101) and time (T2 and T3)
Table 16: Part 2 of adjusted analysis results for rehospitalization data (n=1,625) modeling effect of number of diseases (NDX) on the outcome of rehospitalization within 30 days (Y/N), accounting for number of procedures (NPR), length of stay (LOS), and effect coded versions of presence of atherosclerosis (DX101) and time (T2 and T3)
Table 17: Comparison of performance measures in single Type I covariate with 500 subjects/clusters
Table 18: Comparison of performance measures in single Type I covariate with 1000 subjects/clusters
Table 19: Comparison of performance measures in single Type II covariate with 500 subjects/clusters and 3 time points
Table 20: Comparison of performance measures in single Type II covariate with 500 subjects/clusters and 4 time points
Table 21: Comparison of performance measures in single Type II covariate with 500 subjects/clusters and 5 time points
Table 22: Comparison of performance measures in single Type II covariate with 1000 subjects/clusters and 3 time points
Table 23: Comparison of performance measures in single Type II covariate with 1000 subjects/clusters and 4 time points
Table 24: Comparison of performance measures in single Type II covariate with 1000 subjects/clusters and 5 time points
Table 25: Comparison of performance measures in single Type III covariate with 500 subjects/clusters, 3 time points and a time-dependent covariate weight of
Table 26: Comparison of performance measures in single Type III covariate with 500 subjects/clusters, 3 time points and a time-dependent covariate weight of
Table 27: Comparison of performance measures in single Type III covariate with 500 subjects/clusters, 3 time points and a time-dependent covariate weight of
Table 28: Comparison of performance measures in single Type III covariate with 500 subjects/clusters, 4 time points and a time-dependent covariate weight of
Table 29: Comparison of performance measures in single Type III covariate with 500 subjects/clusters, 4 time points and a time-dependent covariate weight of
Table 30: Comparison of performance measures in single Type III covariate with 500 subjects/clusters, 4 time points and a time-dependent covariate weight of
Table 31: Comparison of performance measures in single Type III covariate with 500 subjects/clusters, 5 time points and a time-dependent covariate weight of
Table 32: Comparison of performance measures in single Type III covariate with 500 subjects/clusters, 5 time points and a time-dependent covariate weight of
Table 33: Comparison of performance measures in single Type III covariate with 500 subjects/clusters, 5 time points and a time-dependent covariate weight of
Table 34: Comparison of performance measures in single Type III covariate with 1000 subjects/clusters, 3 time points and a time-dependent covariate weight of
Table 35: Comparison of performance measures in single Type III covariate with 1000 subjects/clusters, 3 time points and a time-dependent covariate weight of
Table 36: Comparison of performance measures in single Type III covariate with 1000 subjects/clusters, 3 time points and a time-dependent covariate weight of
Table 37: Comparison of performance measures in single Type III covariate with 1000 subjects/clusters, 4 time points and a time-dependent covariate weight of
Table 38: Comparison of performance measures in single Type III covariate with 1000 subjects/clusters, 4 time points and a time-dependent covariate weight of
Table 39: Comparison of performance measures in single Type III covariate with 1000 subjects/clusters, 4 time points and a time-dependent covariate weight of
Table 40: Comparison of performance measures in single Type III covariate with 1000 subjects/clusters, 5 time points and a time-dependent covariate weight of
Table 41: Comparison of performance measures in single Type III covariate with 1000 subjects/clusters, 5 time points and a time-dependent covariate weight of
Table 42: Comparison of performance measures in single Type III covariate with 1000 subjects/clusters, 5 time points and a time-dependent covariate weight of

LIST OF FIGURES

Figure 1: Simulation process for Type I covariates in an example with three time points
Figure 2: Simulation process for Type II covariates in an example with three time points
Figure 3: Simulation process for Type III covariates in an example with three time points
Figure 4: Mean number of estimating equations excluded in each method by time point in Type I covariates
Figure 5: Mean number of estimating equations excluded in each method by TDC weight and time point in Type II covariates
Figure 6: Mean number of estimating equations excluded in each method by TDC weight, feedback weight and time point in Type III covariates
Figure 7: Absolute value of the bias in each method by time point in Type I covariates
Figure 8: Absolute value of the bias in each method by TDC weight and time point in Type II covariates
Figure 9: Absolute value of the bias in each method by TDC weight, feedback weight and time point in Type III covariates
Figure 10: Empirical SE in each method by time point in Type I covariates
Figure 11: Empirical SE in each method by TDC weight and time point in Type II covariates
Figure 12: Empirical SE in each method by TDC weight, feedback weight and time point in Type III covariates
Figure 13: Coverage versus relative efficiency for all methods in number of subject/cluster values of 500 and 1000 by time-varying covariate type
Figure 14: Coverage versus relative efficiency for all methods in number of subject/cluster values of 500 and 1000 by time-varying covariate type and number of time points plotted in all methods using full X-axis width
Figure 15: Coverage versus relative efficiency for GMM-Rho2 and comparison methods in number of subject/cluster values of 500 and 1000 by time-varying covariate type and number of time points plotted using a smaller X-axis width
Figure 16: Coverage versus relative efficiency for GMM-Rho2 and comparison methods in number of subject/cluster values of 500 and 1000 by time point for time-varying covariate Type I
Figure 17: Coverage versus relative efficiency for GMM-Rho2 and comparison methods in number of subject/cluster values of 500 and 1000 by number of time points and time-dependent covariate weight for time-varying covariate Type II
Figure 18: Coverage versus relative efficiency for GMM-Rho2 and comparison methods in number of subject/cluster values of 500 and 1000 by number of time points and time-dependent covariate weight for time-varying covariate Type III and feedback weight
Figure 19: Coverage versus relative efficiency for GMM-Rho2 and comparison methods in number of subject/cluster values of 500 and 1000 by number of time points and time-dependent covariate weight for time-varying covariate Type III and feedback weight
Figure 20: Coverage versus relative efficiency for GMM-Rho2 and comparison methods in number of subject/cluster values of 500 and 1000 by number of time points and time-dependent covariate weight for time-varying covariate Type III and feedback weight
Figure 21: Absolute value of the bias versus relative efficiency for all methods in number of subject/cluster values of 500 and 1000 by time-varying covariate type
Figure 22: Absolute value of the bias versus relative efficiency for GMM-Rho2 and comparison methods in number of subject/cluster values of 500 and 1000 by time-varying covariate type and number of time points
Figure 23: Absolute value of the bias versus relative efficiency for GMM-Rho2 and comparison methods in number of subject/cluster values of 500 and 1000 by time point for time-varying covariate Type I
Figure 24: Absolute value of the bias versus relative efficiency for GMM-Rho2 and comparison methods in number of subject/cluster values of 500 and 1000 by number of time points and time-dependent covariate weight for time-varying covariate Type II
Figure 25: Absolute value of the bias versus relative efficiency for GMM-Rho2 and comparison methods in number of subject/cluster values of 500 and 1000 by number of time points and time-dependent covariate weight for time-varying covariate Type III and feedback weight
Figure 26: Absolute value of the bias versus relative efficiency for GMM-Rho2 and comparison methods in number of subject/cluster values of 500 and 1000 by number of time points and time-dependent covariate weight for time-varying covariate Type III and feedback weight
Figure 27: Absolute value of the bias versus relative efficiency for GMM-Rho2 and comparison methods in number of subject/cluster values of 500 and 1000 by number of time points and time-dependent covariate weight for time-varying covariate Type III and feedback weight
Figure 28: Prevalence versus time point in time-dependent covariate weight scenarios by time-dependent covariate type and number of clusters/subjects
Figure 29: Mean versus time point in time-dependent covariate weight scenarios by time-dependent covariate type and number of clusters/subjects
Figure 30: Prevalence versus time point in feedback weight scenarios by time-dependent covariate type and number of clusters/subjects
Figure 31: Mean covariate value versus time point in feedback weight scenarios by time-dependent covariate type and number of clusters/subjects

ABSTRACT

Introduction: Statistical models for longitudinal data that include time-varying covariates can produce biased parameter estimates. The generalized estimating equations (GEE) method with an independence working covariance structure is considered the safe choice for reducing bias, but it can produce inefficient parameter estimates with inflated standard errors. Alternatively, the generalized method of moments (GMM) can be used to combine estimating equations (EE). Some researchers propose excluding invalid EE from GMM. EE are deemed invalid either on the basis of the assumed covariate type or by directly testing, in a logistic regression model, the association between standardized residuals (at time t) and weighted covariates (at time s) for each combination of s and t (α=0.05).

Methods: This dissertation proposed identifying invalid EE through the correlation between the studentized residuals at time t and the weighted covariate at time s, where s ≠ t (ρ ≥ 0.2 and α=0.05); EE on the diagonal (s = t) were never excluded. Higher correlation thresholds (ρ=0.3 and 0.4) were also explored. I compared the performance of the GEE independence method and two previously developed GMM methods to my proposed GMM method in three real-world analyses and various simulated data scenarios (number of clusters = 500 and 1000; time points = 3, 4, and 5; covariate type = I, II, III). Simulations explored various strengths of weights representing the feedback from the previous outcome to the current covariate (feedback weight) and the effect of the previous covariate value on the current outcome (time-dependent covariate weight).

Results: The simulation studies demonstrated that higher correlation thresholds exclude too few EE, producing parameter estimates with larger bias than other methods. When the smallest correlation threshold (ρ=0.2) was employed, my method had smaller bias and standard error in many simulation scenarios, tending to perform best in extreme combinations (i.e., high feedback weight strength combined with low time-dependent covariate weight strength, or vice versa). My method had the smallest standard deviation in two real-world analyses and, in the third, shared the second-smallest value with another method.

Discussion: The comparative performance of my method depends upon the strengths of the feedback and time-dependent covariate weights and on the correlation threshold used to identify invalid EE to exclude from the GMM calculation.
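The screening rule summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's code: it handles a single covariate with balanced data, takes fitted probabilities from some initial fit as given, uses standardized (Pearson) residuals rather than studentized residuals (which would also adjust for leverage), assumes the weighted covariate is the logit-link derivative μ_is(1 − μ_is)x_is, applies the threshold to |r|, and substitutes a Fisher z approximation for whatever significance test the dissertation actually uses. The function names `pearson_r`, `fisher_p_value`, and `flag_invalid_ee` are hypothetical.

```python
import math


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation undefined for a constant sequence; treat as 0
    return cov / math.sqrt(var_a * var_b)


def fisher_p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0 (Fisher z, needs n > 3)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite when |r| is ~1
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def flag_invalid_ee(y, mu, x, rho_threshold=0.2, alpha=0.05):
    """Return the (s, t) pairs whose estimating equation would be excluded.

    y[i][t]  -- binary outcome for subject i at time t (balanced data assumed)
    mu[i][t] -- fitted probability from an initial fit (strictly in (0, 1))
    x[i][s]  -- one time-varying covariate for subject i at time s
    """
    n_subj, n_time = len(y), len(y[0])
    excluded = set()
    for t in range(n_time):
        # Standardized (Pearson) residual at time t; no leverage correction.
        resid = [(y[i][t] - mu[i][t]) / math.sqrt(mu[i][t] * (1 - mu[i][t]))
                 for i in range(n_subj)]
        for s in range(n_time):
            if s == t:
                continue  # diagonal (s = t) equations are never excluded
            # Weighted covariate under a logit link: mu_is * (1 - mu_is) * x_is.
            wcov = [mu[i][s] * (1 - mu[i][s]) * x[i][s] for i in range(n_subj)]
            r = pearson_r(resid, wcov)
            if abs(r) >= rho_threshold and fisher_p_value(r, n_subj) < alpha:
                excluded.add((s, t))
    return excluded
```

Raising `rho_threshold` to 0.3 or 0.4 can only shrink the returned set, which matches the abstract's observation that higher thresholds exclude fewer estimating equations.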

CHAPTER I

INTRODUCTION

The ability to obtain unbiased estimates of the effect of an exposure on an outcome of interest is an important aspect of public health research. In most cases, researchers will need to account for multiple explanatory variables, or covariates, that also affect the outcome, and it is likely that one or more of these variables will vary over time. The presence of a covariate that varies over time can introduce bias into the parameter estimates produced by most statistical techniques. Therefore, it is important to address the bias that could be introduced by these variables. However, the methods that ensure an unbiased estimate can hinder the researcher's efforts to obtain an efficient estimate and result in an overestimation of the standard error. This overestimation produces wider confidence intervals around parameter estimates, which can lead to incorrect conclusions regarding the effect of an exposure on the outcome.

In public health, research efforts are often directed toward health and behavior outcomes in the general population or in subsets of it, such as youth. If we conclude that an exposure does not have an effect on an outcome, or that our confidence in its magnitude is not sufficient to warrant pursuit, it is likely to be excluded from future intervention efforts. It is, therefore, essential that research studies use analysis methods that balance this trade-off between bias and efficiency.

Many public health studies are longitudinal in nature and aim to estimate the effect of multiple explanatory variables on a dichotomous, or binary, outcome. For example, a cohort of adolescent participants may be enrolled and then followed annually to explore the effect of having a positive peer role model on reported tobacco use (yes or no). The effect of additional explanatory variables, such as the sex, family structure, and race/ethnicity of the youth, may also be of interest. To determine whether there is a relationship between these variables and the binary outcome of reported tobacco use, the most appropriate analysis method must be chosen.

Methods for independent data can be used when one can be fairly certain that the measurement for one individual has no influence on the measurement for another individual. However, in longitudinal studies we generally observe that repeated measurements for a particular individual are correlated; therefore, one can no longer assume independence. This aspect of the data guides the analysis methods that a researcher uses to explore and interpret those data.

Generalized Estimating Equations (GEE) can be used to account for this correlation when one is interested in the marginal, or population-averaged, effect of an explanatory variable on the outcome of interest. GEE allows one to model the covariance among measurements based upon an assumed pattern of correlation among outcomes. Liang and Zeger (1986) introduced the GEE method in their seminal paper, which postulated a working covariance structure that could be used to obtain consistent coefficient estimates without having to specify the correct correlation structure. Generally, this method produces consistent coefficient estimates and valid standard errors (via the robust, or sandwich, variance estimator) even when the covariance structure is not correctly specified (Diggle, Heagerty, Liang, & Zeger, 2002; Liang & Zeger, 1986). However, GEE may produce biased coefficient estimates in the presence of explanatory variables whose values vary over time, termed time-varying covariates.

Time-varying covariates are variables whose values can change during a study. They are in direct contrast to time-invariant covariates, whose values are fixed or stationary throughout a study.
Examples of time-invariant covariates include biological sex and race/ethnicity. Time-varying covariates can change systematically over time, such as the measurement time or the age of a participant in a study in which measurements are taken once per year. They can also change over time by study design, such as when a researcher assigns a treatment to a participant in a randomized crossover study. When time-varying covariates change systematically or by study design, they are referred to as exogenous because the values of the covariate are externally controlled. For exogenous variables, the covariate at a particular time is independent of the preceding outcome measurements because the values of the covariates at the time of measurement are determined in a manner that is unrelated to the outcome. However, when a covariate changes randomly over time in a manner that is not systematic or fixed, it is referred to as endogenous. Examples of endogenous variables include body mass index (BMI) and having a positive peer role model. For endogenous variables, the covariate at a particular time can depend upon the preceding covariate values and outcome measurements.

Lai and Small refined the categorization of time-varying covariates into Types I, II, and III (Lai & Small, 2007). They redefine exogenous covariates as Type I and endogenous covariates as either Type II or Type III. The most complex covariate type is the Type III variable. For a Type III covariate, the current outcome value can be affected by the previous covariate value, and the current covariate value can be affected by the previous outcome value, such that there is feedback from the previous outcome to the current covariate value. For a Type II covariate, the current outcome value can be affected by the previous covariate value, but there is no feedback loop from the previous outcome value to the current covariate value. Consider the example of a study that examines the effect of having a positive peer role model on reported tobacco use.
If having a positive peer role model were a Type II covariate, then having a positive peer role model at the previous time point could affect current reported tobacco use, but reported tobacco use at the previous measurement would not affect whether a person had a positive peer role model at the current measurement. Conversely, if having a positive peer role model were a Type III variable, then having a positive peer role model at the previous time point could affect current reported tobacco use, and having a positive peer role model at the current measurement time could be affected by previous reported tobacco use. This might be observed if a potential positive peer role model chose not to spend time with a person who uses tobacco, but would reconsider after the person quit using tobacco.

Pepe and colleagues have shown that using a non-diagonal working covariance matrix produces biased estimates of the parameters when GEE methodology is used in the presence of time-varying covariates (Diggle et al., 2002; Pepe & Anderson, 1994). Pepe and Anderson (1994) concluded that to obtain unbiased estimates, the safest choice for the working covariance structure is the diagonal, independence structure. However, the diagonal structure specifies that outcomes are uncorrelated and ignores the correlation between an individual's repeated measures. This results in an overestimation of the standard error and, in turn, a loss of efficiency (G. M. Fitzmaurice, 1995).

This dissertation addresses this balance between bias and efficiency when estimating the population-averaged effect of explanatory variables on a binary outcome. Discussions of this balancing act have continued over the past 20 years, and various methods have been proposed to deal with the issues introduced by time-varying covariates. Some methods address the issues less effectively than others.

One method proposed to account for explanatory variables whose values change over time is the process of lagging data. A lagged covariate can be used to assess the effect of a time-varying exposure at a prior time on an outcome. However, the length of the lag is at the discretion of the researcher and depends on the situation.
Consider the example where a researcher takes measurements of an exposure and an outcome at the same time annually for five years. If the researcher deems the most recent measurement to be important in assessing the effect of the exposure on the outcome (e.g., the effect of a quickly metabolized drug on patient symptoms), then they may choose to lag the data by one year. Conversely, the researcher may be interested in the effect of a drug that must build up in the system before affecting patient symptoms, such that there is a longer interval between the exposure and its anticipated effect on the outcome. In that case, the researcher might consider the exposure two or three measurement times before the outcome measurement to be important. The appropriate lag is commonly unknown, so a researcher often must explore a number of possible lags. Although one can use two or more lagged versions of a single covariate when a single lagged variable is insufficient, this can introduce problems with highly correlated predictors, uncertainty regarding the number of lagged covariates to include, and specification of the functional form of the coefficients (Diggle et al., 2002). Even once a researcher decides which lag to apply, use of a non-diagonal working covariance structure can still introduce bias, and care must be taken when exploring alternative structures (Schildcrout & Heagerty, 2005). Moreover, most applications use only the exposure that directly preceded the outcome measurement to model the entire outcome process, which is unhelpful when the entire covariate history is of interest (Diggle et al., 2002).

Transition models also make use of multiple lagged variables, but the marginal methods developed for these models have similar issues of bias in the presence of time-varying covariates (Azzalini, 1997; Heagerty, 2002; Heagerty & Zeger, 2000). Marginal structural models, using inverse probability weighting, have been proposed to model time-varying covariates in a longitudinal setting. However, the use of weights in this method also introduces efficiency issues, which worsen as the weights increase (Robins, Greenland, & Hu, 1999; Robins, Hernán, & Brumback, 2000).
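The lagging step discussed above is mechanical once the lag is chosen. A minimal Python sketch, assuming long-format data sorted by subject and then by measurement time, might look like this; the function name `lag_within_subject` is hypothetical, and the choice of `k` remains the researcher's decision, as the text notes.

```python
def lag_within_subject(values, subject_ids, k=1):
    """Return `values` lagged by k measurement occasions within each subject.

    Rows are assumed sorted by subject and then by measurement time, so a
    subject's rows are contiguous. The first k occasions of each subject have
    no earlier value and are set to None.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    lagged = []
    for idx, subj in enumerate(subject_ids):
        prev = idx - k
        # Because rows are contiguous by subject, the row k positions back
        # belongs to the same subject exactly when its subject id matches.
        if prev >= 0 and subject_ids[prev] == subj:
            lagged.append(values[prev])
        else:
            lagged.append(None)
    return lagged
```

For example, with `values = [1, 0, 1, 0, 0, 1]` and subjects `["a", "a", "a", "b", "b", "b"]`, `k=1` returns `[None, 1, 0, None, 0, 0]`. Several lags (for instance `k=1` and `k=2`) can be built side by side when a single lag is insufficient, subject to the collinearity caveats noted above.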

Other methods, which do not involve decisions about lagged versions of time-varying covariates, have been shown to be more effective in striking a balance between bias and efficiency when estimating the population-averaged effect of explanatory variables that vary over time. The Generalized Method of Moments (GMM) technique derives estimators from moment conditions and combines the moments in the final calculation of the parameter estimates through a weight matrix that gives more weight to the moment conditions with the smallest variance (Hansen, 1982). Lai and Small (2007) and Qu, Lindsay, and Li (2000) adapted the GMM for use in marginal regression analysis of longitudinal data in the presence of time-varying covariates; however, these studies were directed at specific types of time-varying covariates. Additionally, Lai and Small required the categorization of variables into the specific types of time-varying covariates, such that the type of variable dictated which moments would be included in the final calculation of the parameter estimates (Lai & Small, 2007). Lalonde, Wilson, and Yin (2014) extended the GMM beyond these specific types of time-varying covariates without dictating the inclusion of sample moments by covariate type. However, the work published by Lalonde and colleagues did not explore how the method performed in various simulated data conditions. Furthermore, their published work explored only one metric for identifying sample moments for inclusion. No study to date has provided a thorough examination of the methods proposed to analyze data with time-varying covariates in a variety of simulated data settings. Thus, in addition to examining other metrics for identifying appropriate sample moments, this dissertation fills an important gap in the literature by exploring the performance of multiple methods in various simulated data conditions.
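For reference, a standard way to write Hansen's (1982) estimator, specialized to a marginal logistic model, is given below. This is a sketch under assumptions rather than the dissertation's own derivation: each retained pair of times (s, t) is assumed to contribute the moment condition in the first line, where the weighted covariate ∂μ_is/∂β_k = μ_is(1 − μ_is)x_isk follows from a logit link; K (the number of subjects), g_i (the stacked retained moments for subject i), and β̃ (an initial consistent estimate, such as the GEE-independence fit) are symbols introduced here for illustration.

```latex
\begin{aligned}
& \mathrm{E}\left[\frac{\partial \mu_{is}(\boldsymbol{\beta})}{\partial \beta_k}\,\{Y_{it}-\mu_{it}(\boldsymbol{\beta})\}\right] = 0,
  \qquad \frac{\partial \mu_{is}}{\partial \beta_k} = \mu_{is}(1-\mu_{is})\,x_{isk},
  \qquad k = 1,\ldots,p \\
& \bar{\mathbf{g}}_K(\boldsymbol{\beta}) = \frac{1}{K}\sum_{i=1}^{K}\mathbf{g}_i(\boldsymbol{\beta}),
  \qquad \mathbf{W}_K = \Big[\frac{1}{K}\sum_{i=1}^{K}\mathbf{g}_i(\tilde{\boldsymbol{\beta}})\,\mathbf{g}_i(\tilde{\boldsymbol{\beta}})'\Big]^{-1} \\
& \hat{\boldsymbol{\beta}}_{\mathrm{GMM}} = \arg\min_{\boldsymbol{\beta}}\; \bar{\mathbf{g}}_K(\boldsymbol{\beta})'\,\mathbf{W}_K\,\bar{\mathbf{g}}_K(\boldsymbol{\beta})
\end{aligned}
```

Under this formulation, excluding an invalid estimating equation amounts to dropping the corresponding (s, t) components from g_i, and the inverse-covariance choice of W_K is what gives moments with smaller variance more weight.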

22 Research Goals This study investigates a GMM technique for combining estimating equations in a manner that achieves an estimate with an improved balance between bias and efficiency. The primary aim is to improve the GMM calculation of the standard error of the parameter estimate introduced by Lalonde et al. and explore alternative metrics to identify which valid estimating equations to include in the sample moment estimation (T. L. Lalonde et al., 2014). I compare performance in simulated data conditions using the same mechanisms as those used by Lalonde et al. and in two real-world datasets. I compare my method to 1) the GEE method using the independence matrix as proposed by Pepe and Anderson, 2) the GMM method introduced by Lai and Small which categorizes time-varying covariate by type, and 3) the GMM method proposed by Lalonde et al. (Lai & Small, 2007; T. L. Lalonde et al., 2014; Pepe & Anderson, 1994). Additionally, I will provide a more thorough comparison of these methods using simulated data that explores a wider array of scenarios which are intended to emulate the variability of features that might be seen in real-world datasets. Specific Aims By using data from real-world data sets and simulated data scenarios, this study will address the following specific aims. I. Improve the GMM method proposed by Lalonde et al. by using additional metrics to identify which sample moments to include in parameter estimation a. Studentized residuals instead of standardized residuals b. Correlation (R) value in addition to p-value cut-off i. Values of R to explore include 0.2, 0.3, and 0.4 c. Require inclusion of estimating equations along the diagonal (where s=t) 22

II. Compare the improved GMM method in a variety of simulated data scenarios to three methods (independence GEE, the GMM method proposed by Lai and Small, and the GMM method proposed by Lalonde et al.)
   a. Specific aspects of simulated data scenarios to explore
      i. Number of clusters
         1. Values to explore include 500, and 1000
      ii. Number of repeated observations per cluster
         1. Values to explore include 3, 4, and 5
      iii. Time-dependent covariate and feedback weights
         1. Values to explore for the time-dependent covariate weight include 0.7, 0.8, and
         2. Values to explore for the feedback weight include 1.3, 1.5, and 1.7
   b. Real-world datasets
      i. Rehospitalization
      ii. Youth Asset Study
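Aim I changes how moments are screened, and the sketch below shows one way the three listed changes could fit together. Several details are assumptions made for illustration rather than the procedure of Lalonde et al.: a logistic marginal model; "standardized" taken to mean the Pearson residual and "studentized" the Pearson residual divided by sqrt(1 - h), with h the GLM leverage; and a rule that drops an off-diagonal moment only when its correlation test is significant at `alpha` and |r| is at least the R cut-off `r_min`. Diagonal moments (s = t) are always kept. The function names and the synthetic inputs are hypothetical.

```python
import numpy as np
from scipy import stats

def pearson_and_studentized_residuals(X, y, mu):
    """Residuals for a logistic model with design X (n x p), outcome y, fitted mu.

    standardized (Pearson): (y - mu) / sqrt(mu (1 - mu))
    studentized: standardized / sqrt(1 - h), where h is the diagonal of the GLM hat
    matrix H = W^{1/2} X (X' W X)^{-1} X' W^{1/2} with W = diag(mu (1 - mu)).
    """
    w = mu * (1.0 - mu)
    standardized = (y - mu) / np.sqrt(w)
    Xw = X * np.sqrt(w)[:, None]                      # W^{1/2} X
    XtWX_inv = np.linalg.inv(Xw.T @ Xw)
    h = np.einsum("ij,jk,ik->i", Xw, XtWX_inv, Xw)    # leverages (diagonal of H)
    studentized = standardized / np.sqrt(1.0 - h)
    return standardized, studentized, h

def select_moments(cov, resid, alpha=0.05, r_min=0.2):
    """Return a T x T boolean mask; entry [s, t] keeps the moment pairing the
    covariate at time s with the residual at time t.

    Hypothetical rule: an off-diagonal moment is dropped only if the Pearson
    correlation of cov[:, s] and resid[:, t] has p < alpha AND |r| >= r_min.
    Diagonal moments (s == t) are always kept.
    """
    T = cov.shape[1]
    keep = np.ones((T, T), dtype=bool)
    for s in range(T):
        for t in range(T):
            if s == t:
                continue
            r, p = stats.pearsonr(cov[:, s], resid[:, t])
            if p < alpha and abs(r) >= r_min:
                keep[s, t] = False
    return keep

# Synthetic check: the covariate at time 2 (index 1) tracks the residual at time 1 (index 0).
rng = np.random.default_rng(1)
N, T = 500, 3
resid = rng.normal(size=(N, T))
cov = rng.normal(size=(N, T))
cov[:, 1] = resid[:, 0] + 0.1 * rng.normal(size=N)
mask = select_moments(cov, resid, alpha=0.05, r_min=0.2)

# Residuals for a simple logistic design evaluated at a chosen coefficient vector.
X = np.column_stack([np.ones(N), rng.normal(size=N)])
mu = 1.0 / (1.0 + np.exp(-(X @ np.array([-0.5, 1.0]))))
y = rng.binomial(1, mu).astype(float)
std_res, stud_res, lev = pearson_and_studentized_residuals(X, y, mu)
```

Requiring both a small p-value and a minimum |r| means that, in large samples, trivially small but "significant" correlations no longer remove a moment; varying `r_min` over 0.2, 0.3, and 0.4 corresponds to the values listed in Aim I.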

CHAPTER II
BACKGROUND AND LITERATURE REVIEW

Notation
The mathematics in this research will be described with the following notation. Vectors and matrices are bolded. A transpose is denoted by the prime symbol (i.e., the transpose of the vector $\mathbf{A}$ is written $\mathbf{A}'$). The total number of observations in a dataset is indicated by the letter $N$, while the repeated measures within a cluster are represented by the letter $t$. The number of parameters to estimate is denoted by the letter $p$. Because we have repeated measurements, subjects are indexed by the letter $i$, and the observations within each subject are indexed by $j$. The number of repeated measurements on subject $i$ is denoted $n_i$. This means that $n_{ij}$ would be the observation at the $j$th measurement time for the $i$th subject. Each subject has the outcome/response vector $\mathbf{Y}_i = (Y_{i1}, \ldots, Y_{in_i})'$ and the time-varying covariate matrix $\mathbf{X}_i = (\mathbf{x}_{i1}, \ldots, \mathbf{x}_{in_i})'$. Although the components of $\mathbf{Y}_i$ are generally correlated, we assume that $\mathbf{Y}_i$ and $\mathbf{Y}_j$ are independent when $i \neq j$. $\mathbf{Z}_i$ is a matrix of time-invariant covariates whose number of columns equals the number of time-invariant covariates.

Marginal Regression Models
The purpose of this research is to investigate the performance of an improved form of the generalized method of moments (GMM) approach, introduced by Lalonde and colleagues, to combining generalized estimating equations (GEE) in marginal regression models (T. L. Lalonde et al., 2014). Marginal regression models account for the correlation present in longitudinal or clustered data, for which independence can no longer be assumed. Because marginal methods model the outcome as a function of the explanatory variables separately from the association among each individual's repeated measures of the outcome, they are appropriate when one intends to make inferences about population-averaged parameters.

The generalized linear model (GLM) is a general class of regression models that share a few salient features. It is assumed that the mean of the outcome, $\mu_i = E(Y_i)$, is related to a vector of covariates, $\mathbf{x}_i$, through a link function such that $h(\mu_i) = \mathbf{x}_i'\boldsymbol{\beta}$, and that the variance of $Y_i$ is a known function of the mean (Garrett M Fitzmaurice, Laird, & Ware, 2012). Regression coefficients in a GLM can be estimated by maximum likelihood using the iteratively reweighted least squares algorithm introduced by Nelder and Wedderburn (Nelder & Wedderburn, 1972). In a GLM, the outcome distribution is a member of the exponential family of distributions. Wedderburn later extended this approach beyond the exponential family with the quasi-likelihood (QL) method (Wedderburn, 1974). The QL method makes assumptions about the link and variance functions without specifying the entire distribution of the outcome, $Y_i$; however, the data are still assumed to be independent. The generalized estimating equations (GEE) method extends the QL method to correlated data.

GEE Overview
First introduced by Liang and Zeger, GEE accounts for the correlation, or non-independence, of repeated measurements taken on an individual or of observations that have natural groupings, or clusters, such as families, classrooms, or clinics (Liang & Zeger, 1986). The parameter estimates from GEE describe changes in the population mean of an outcome given changes in the covariates, while accounting for the correlation among the repeated measurements within clusters. Termed a marginal or population-averaged model, this method requires one to specify the marginal distribution of the outcome at each time point, $y_{ij}$, and a working correlation matrix describing the associations among the elements of each subject's vector of repeated observations. However, one does not need to specify the full joint distribution of a subject's observations. The generalized estimating equations are

$$\sum_{i=1}^{N} \mathbf{D}_i' \mathbf{V}_i^{-1} (\mathbf{y}_i - \boldsymbol{\mu}_i) = \mathbf{0}.$$

The vector of predicted mean responses for the $i$th individual is $\boldsymbol{\mu}_i$, while $\mathbf{y}_i$ is the vector of observed outcomes for that individual. $\mathbf{D}_i = \partial \boldsymbol{\mu}_i / \partial \boldsymbol{\beta}'$ is the matrix of derivatives of $\boldsymbol{\mu}_i$ with respect to the regression parameters, $\boldsymbol{\beta}$. The $\mathbf{D}_i$ matrices transform each individual's predicted and observed outcomes ($\boldsymbol{\mu}_i$ and $\mathbf{y}_i$) from their original units to the scale of the link function used to model the expected value of the outcome. $\mathbf{V}_i$ is an approximation of the true covariance matrix of the outcome, $\mathbf{Y}_i$, with the variances on the diagonal and the working covariances off the diagonal (Garrett M Fitzmaurice et al., 2012):

$$\mathbf{V}_i = \mathbf{A}_i^{1/2}\, \mathrm{Corr}(\mathbf{Y}_i)\, \mathbf{A}_i^{1/2}.$$

$\mathbf{A}_i$ is a diagonal matrix whose non-zero elements are the variances of the outcomes $\mathbf{Y}_i$ (so $\mathbf{A}_i^{1/2}$ holds the standard deviations), and these depend on the parameters, $\boldsymbol{\beta}$. $\mathrm{Corr}(\mathbf{Y}_i)$ is the assumed (working) correlation matrix, made up of pairwise association parameters, $\boldsymbol{\alpha}$, which are assumed to represent the pairwise correlations among the outcomes. Because the working covariance matrix depends on the assumed correlation matrix as well as on the regression parameters, the GEE are functions of both $\boldsymbol{\beta}$ and $\boldsymbol{\alpha}$ (Diggle et al., 2002).
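The estimating equations and the decomposition of $\mathbf{V}_i$ above can be written out directly for a logistic marginal model. The sketch below assumes the logit link (so $\mathbf{A}_i = \mathrm{diag}\{\mu_{ij}(1-\mu_{ij})\}$ and $\mathbf{D}_i = \mathbf{A}_i \mathbf{X}_i$) and, as a simplification, holds the working correlation fixed instead of re-estimating $\boldsymbol{\alpha}$ from residuals as full GEE software does. The function names and simulated data are hypothetical. With an identity working correlation, $\mathbf{D}_i'\mathbf{V}_i^{-1}$ reduces to $\mathbf{X}_i'$, which gives a convenient check.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def exchangeable_corr(T, alpha):
    """Working correlation with 1 on the diagonal and alpha elsewhere."""
    R = np.full((T, T), alpha)
    np.fill_diagonal(R, 1.0)
    return R

def gee_score_and_info(beta, X_list, y_list, R):
    """S(beta) = sum_i D_i' V_i^{-1} (y_i - mu_i) for a logistic marginal model,
    with V_i = A_i^{1/2} R A_i^{1/2} and a fixed working correlation R.
    Also returns sum_i D_i' V_i^{-1} D_i, used for Fisher scoring."""
    p = len(beta)
    score = np.zeros(p)
    info = np.zeros((p, p))
    for X, y in zip(X_list, y_list):
        mu = expit(X @ beta)
        a = mu * (1.0 - mu)                      # diagonal of A_i (Bernoulli variances)
        D = X * a[:, None]                       # D_i = d mu_i / d beta' under the logit link
        sd = np.sqrt(a)
        V = sd[:, None] * R * sd[None, :]        # A_i^{1/2} R A_i^{1/2}
        score += D.T @ np.linalg.solve(V, y - mu)
        info += D.T @ np.linalg.solve(V, D)
    return score, info

def solve_gee(X_list, y_list, R, n_iter=100, tol=1e-10):
    """Fisher scoring for beta with R held fixed."""
    beta = np.zeros(X_list[0].shape[1])
    for _ in range(n_iter):
        score, info = gee_score_and_info(beta, X_list, y_list, R)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Illustrative clustered data: N clusters, T time points, intercept plus one covariate.
rng = np.random.default_rng(2)
N, T = 300, 3
X_list, y_list = [], []
for _ in range(N):
    X = np.column_stack([np.ones(T), rng.normal(size=T)])
    X_list.append(X)
    y_list.append(rng.binomial(1, expit(X @ np.array([-0.5, 1.0]))).astype(float))

beta_ind = solve_gee(X_list, y_list, np.eye(T))
beta_exch = solve_gee(X_list, y_list, exchangeable_corr(T, 0.3))
```

Because $\mathbf{V}_i$ depends on $\boldsymbol{\beta}$ through $\mathbf{A}_i$ and on $\boldsymbol{\alpha}$ through $\mathrm{Corr}(\mathbf{Y}_i)$, changing the working correlation changes how residuals at one time are combined with derivatives at other times, which is the point taken up in the discussion of bias that follows.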

GEE Performance
This study assesses the performance of the new method by comparing estimates of the bias and efficiency of its estimator to those of other methods. GEE is among many longitudinal methods that implicitly assume that the exposure of interest does not affect the other covariates used to model the outcome of interest, and that the outcome does not affect the exposure of interest or the other covariates. These methods therefore do not account for the effect of the outcome on the exposure or covariates of interest, or for the effect of previous outcomes on current or future outcomes. In other words, these methods do not properly account for time-varying covariates, and failure to do so can introduce bias into the parameter estimates (Rothman, Greenland, & Lash, 2008).

Pepe and Anderson recognized that the parameter estimates might be biased when time-varying covariates are modeled using GEE with a non-diagonal working covariance structure (Pepe & Anderson, 1994). The potential for bias arises from the violation of an important assumption of GEE models: the expected value of the outcome given the entire exposure history $\mathbf{X}_i$ and the baseline covariates is assumed to depend only on the past exposure history and the baseline covariates,

$$E(Y_{it} \mid \mathbf{X}_{i1}, \mathbf{X}_{i2}, \ldots, \mathbf{X}_{it}, \ldots, \mathbf{X}_{iT}, \mathbf{Z}_i) = E(Y_{it} \mid \mathbf{X}_{i1}, \mathbf{X}_{i2}, \ldots, \mathbf{X}_{i,t-1}, \mathbf{Z}_i).$$

In other words, the expected value of the outcome at a particular time, $t$, depends only on the covariates prior to that time. If the current value of the outcome, $Y_{it}$, given the current value of the covariate, $\mathbf{X}_{it}$, predicts the subsequent value of the covariate, $\mathbf{X}_{i,t+1}$, then this assumption is violated (Garrett M Fitzmaurice et al., 2012). Diggle refers to this as the full covariate conditional mean (FCCM) assumption (Diggle et al., 2002). When a covariate changes over time, this assumption might not hold, because the expected outcome at time $t$ would need to take into account the entire covariate process, not just the covariate values prior to time $t$. In that case the FCCM assumption is not met:

$$E(Y_{it} \mid \mathbf{X}_{i1}, \mathbf{X}_{i2}, \ldots, \mathbf{X}_{it}, \ldots, \mathbf{X}_{iT}, \mathbf{Z}_i) \neq E(Y_{it} \mid \mathbf{X}_{i1}, \mathbf{X}_{i2}, \ldots, \mathbf{X}_{i,t-1}, \mathbf{Z}_i).$$

In general, one can quantify the bias of an estimator, $\hat{\theta}$, by subtracting the true value of the parameter, $\theta$, from the expected value of the estimator, $E(\hat{\theta})$. An estimator is unbiased when this difference equals zero:

$$\mathrm{bias}(\hat{\theta}) = E(\hat{\theta}) - \theta = 0.$$

The GEE estimating function for the parameter of interest, $\mathbf{S}_\beta(\boldsymbol{\beta}, \boldsymbol{\alpha})$, is defined by the following estimating equations:

$$\mathbf{S}_\beta(\boldsymbol{\beta}, \boldsymbol{\alpha}) = \sum_{i=1}^{N} \mathbf{D}_i' \mathbf{V}_i^{-1} (\mathbf{y}_i - \boldsymbol{\mu}_i) = \mathbf{0}.$$

Because the estimating equations set $\mathbf{S}_\beta(\boldsymbol{\beta}, \boldsymbol{\alpha})$ equal to zero, the target value in the bias definition is zero, and the estimating function is unbiased when

$$\mathrm{bias}[\mathbf{S}_\beta(\boldsymbol{\beta}, \boldsymbol{\alpha})] = E[\mathbf{S}_\beta(\boldsymbol{\beta}, \boldsymbol{\alpha})] - \mathbf{0} = \mathbf{0}.$$

Thus, the estimating function $\mathbf{S}_\beta(\boldsymbol{\beta}, \boldsymbol{\alpha})$ is unbiased when its expectation equals zero. Pepe and Anderson showed that when the FCCM assumption holds, the expected value of $\mathbf{S}_\beta(\boldsymbol{\beta}, \boldsymbol{\alpha})$ is zero, so it is unbiased (Pepe & Anderson, 1994):

$$E[\mathbf{S}_\beta(\boldsymbol{\beta}, \boldsymbol{\alpha})] = \mathbf{0}.$$

However, when the FCCM assumption does not hold, we cannot be assured that this expectation equals zero, and therefore cannot be assured of an unbiased estimating function.
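The bias definition above can be estimated by simulation: average the estimates over many simulated datasets and subtract the true value. The sketch below assumes a hypothetical logistic data-generating process in which $Y_t$ depends only on the current $X_t$, while the next covariate value depends on the previous covariate value (`rho`) and on the previous outcome (`gamma`), so the full covariate history carries extra information about $Y_t$. It uses pooled logistic regression, which for the logit link gives the same point estimate as GEE with an independence working correlation. Under these assumptions each residual is paired only with the concurrent covariate, so the Monte Carlo bias should be close to zero; the number of clusters, replications, and parameter values are arbitrary choices for illustration.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def simulate_feedback(rng, N, T, beta, gamma, rho=0.5):
    """Hypothetical process: Y_t | X_t ~ Bernoulli(expit(beta0 + beta1 X_t)) and
    X_{t+1} = rho X_t + gamma Y_t + N(0, 1) noise (feedback from outcome to covariate)."""
    x = np.empty((N, T))
    y = np.empty((N, T))
    x[:, 0] = rng.normal(size=N)
    for t in range(T):
        y[:, t] = rng.binomial(1, expit(beta[0] + beta[1] * x[:, t]))
        if t + 1 < T:
            x[:, t + 1] = rho * x[:, t] + gamma * y[:, t] + rng.normal(size=N)
    return x, y

def pooled_logistic(x, y, n_iter=25):
    """Independence-GEE point estimate, i.e. pooled logistic regression by Newton-Raphson."""
    X = np.column_stack([np.ones(x.size), x.ravel()])
    yv = y.ravel()
    b = np.zeros(2)
    for _ in range(n_iter):
        mu = expit(X @ b)
        grad = X.T @ (yv - mu)
        hess = (X * (mu * (1.0 - mu))[:, None]).T @ X
        b = b + np.linalg.solve(hess, grad)
    return b

rng = np.random.default_rng(3)
beta_true = np.array([-0.5, 1.0])
estimates = np.array([
    pooled_logistic(*simulate_feedback(rng, N=1000, T=3, beta=beta_true, gamma=0.8))
    for _ in range(200)
])
bias = estimates.mean(axis=0) - beta_true                          # Monte Carlo E(theta_hat) - theta
mc_se = estimates.std(axis=0, ddof=1) / np.sqrt(len(estimates))    # simulation error of that estimate
```

Reporting `mc_se` alongside `bias` helps distinguish a real bias from simulation noise; the same recipe applies to any of the estimators compared in this study.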

In the GEE framework, when an independence working correlation matrix is used, no pairwise correlation coefficients are estimated (i.e., the off-diagonal elements of the working correlation matrix are zero), because the multiple measurements on the same subject or cluster are assumed to be uncorrelated. Because each outcome is then paired only with the covariates measured at the same time, this simplification protects the estimator from violations of the FCCM assumption, but it results in an inefficient estimator. Conversely, a non-diagonal working correlation matrix introduces pairwise correlation coefficients that help us estimate the effect of the covariate process on the outcome efficiently. Yet these pairwise correlations mean that our estimation of the outcome process would need to account for the entire exposure history, $\mathbf{X}_i$, not just the covariates prior to that time. Therefore, the FCCM assumption might no longer hold.

Time-varying Covariates
Not all time-varying variables violate this assumption of GEE; therefore, it is important to understand which types of time-varying variables introduce bias in the GEE framework. The process of a time-varying covariate can be considered either exogenous or endogenous with respect to the outcome process. When a time-varying covariate is exogenous, its value at a particular time is independent of all previous outcome measurements, and it is controlled externally rather than determined by factors within the covariate process. One example of an exogenous variable is the measurement time, which is not stochastic and is externally controlled. Another example is the treatment assignment in a randomized cross-over study; the assignment is determined by the study design, not by the covariate or outcome process. For exogenous covariates, the assumption that the expected value of the outcome at a particular time, $t$, depends only on the covariates prior to that time holds, so coefficients estimated by modeling these covariates with GEE will be unbiased.

Alternatively, a time-varying covariate is considered endogenous when its value is determined by factors within the covariate process, or by factors within the outcome process that introduce a feedback loop. One example of an endogenous time-varying covariate comes from a study of the association between body mass index (BMI) and child morbidity discussed by Lai and Small (Lai & Small, 2007). A child's BMI or morbidity at a particular time could be determined by the child's BMI at the previous time, i.e., by factors within the covariate process. Another possibility is that the child's BMI is determined by the child's BMI at the previous time and also by the outcome (whether or not the child was ill) at the previous measurement. For instance, a child who is sick may not eat as much as usual, resulting in a lower weight at the next measurement. A feedback loop from the outcome process to the covariate process exists in the second example but not in the first. In both examples, however, previous values of the covariate could affect the covariate's current value.

The GMM estimation method introduced by Lai and Small distinguishes between exogenous and endogenous time-varying covariates according to the effect of the variable on the covariate and outcome processes (Lai & Small, 2007). Lai and Small use the term Type I covariate for exogenous variables and divide endogenous covariates more narrowly into two types, Type II and Type III. They classify the variable in the former example (where BMI and its effect on the outcome are determined only by previous values of the covariate) as a Type II covariate, and the variable in the latter example (where BMI and its effect on the outcome are also altered by a feedback loop from the outcome to the covariate) as a Type III covariate. These variable types (Type I, Type II, and Type III) dictate which estimating equations are included in the GMM estimation.
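A small simulation makes the feedback loop concrete. The sketch below assumes a hypothetical logistic process like the sick-child example: the next covariate value depends on its previous value (`rho`) and on the previous outcome (`gamma`), while $Y_t$ depends only on the current $X_t$, with no lagged effect. It computes the matrix of cross moments $M[s, t] = \frac{1}{N}\sum_i x_{is}(y_{it} - \mu_{it})$ at the true coefficients, pairing the covariate at time $s$ with the raw residual at time $t$; this is a simplified stand-in for the derivative-weighted moments used in the GMM literature. All names and parameter values are illustrative.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_moments(N=20000, T=3, beta=(-0.5, 1.0), gamma=0.8, rho=0.5, seed=4):
    """Simulate a covariate with outcome feedback and return the T x T matrix
    M[s, t] = mean_i x_is * (y_it - mu_it), evaluated at the true beta.
    Rows index the covariate time s; columns index the residual time t."""
    rng = np.random.default_rng(seed)
    x = np.empty((N, T))
    y = np.empty((N, T))
    x[:, 0] = rng.normal(size=N)
    for t in range(T):
        y[:, t] = rng.binomial(1, expit(beta[0] + beta[1] * x[:, t]))
        if t + 1 < T:
            # rho: dependence on the previous covariate value; gamma: feedback from Y_t
            x[:, t + 1] = rho * x[:, t] + gamma * y[:, t] + rng.normal(size=N)
    resid = y - expit(beta[0] + beta[1] * x)
    return x.T @ resid / N

M = cross_moments()
```

In this setting the diagonal moments and the moments pairing earlier covariates with later residuals (s < t) stay near zero, while the moments pairing later covariates with earlier residuals (s > t) are clearly positive, because feedback makes $X_{t+1}$ depend on $Y_t$. Setting `gamma=0.0` removes the feedback, and those entries then also shrink toward zero.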

Time-varying Covariates and Model Residuals
Lai and Small further describe how these covariate types are correlated with the residuals of the model in which they are included (Lai & Small, 2007). For Type I covariates, the effect of the current covariate value on the outcome does not depend on past covariate values, and there is no feedback from the outcome process to the covariate process. This means that both future and past values of the covariate are uncorrelated with the current residuals of the marginal regression model. For Type II covariates, the effect of the current covariate value on the outcome depends on past covariate values, but there is no feedback from the outcome process to the covariate process. Therefore, future values of the covariate are uncorrelated with current residuals, but past values of the covariate are correlated with current residuals. For Type III covariates, the effect of the current covariate value on the outcome depends on past covariate values, and there is feedback from the outcome process to the covariate process. Therefore, both future and past values of the covariate are correlated with current residuals.

Lai and Small postulate that, when the independence working correlation structure is used, valid estimating equations are excluded from the estimation. This can be illustrated through the expected value of the GEE estimating function, shown below for a cluster with three correlated measurements. In this expression, the working correlation, $\mathrm{Corr}(\mathbf{Y}_i)$, can be represented as a matrix that indicates which estimating equations are included in the calculation. If a particular estimating equation should be included (i.e., that estimating equation is considered unbiased), a correlation value (represented below as $m$) is present in the corresponding row/column location. If the estimating equation is considered biased and should not be included, the value is zero. Recall that $\mathbf{D}_i$ is the matrix of partial derivatives of $\boldsymbol{\mu}_i$ with respect to the parameters, $\boldsymbol{\beta}$.

$$E\left[\mathbf{D}_i' \mathbf{A}_i^{-1/2}\, \mathrm{Corr}(\mathbf{Y}_i)\, \mathbf{A}_i^{-1/2} (\mathbf{y}_i - \boldsymbol{\mu}_i)\right] = E\left[\mathbf{D}_i' \begin{pmatrix} 1/\sigma & 0 & 0 \\ 0 & 1/\sigma & 0 \\ 0 & 0 & 1/\sigma \end{pmatrix} \begin{pmatrix} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \\ m_{31} & m_{32} & m_{33} \end{pmatrix} \begin{pmatrix} 1/\sigma & 0 & 0 \\ 0 & 1/\sigma & 0 \\ 0 & 0 & 1/\sigma \end{pmatrix} \begin{pmatrix} y_{i1} - \mu_{i1} \\ y_{i2} - \mu_{i2} \\ y_{i3} - \mu_{i3} \end{pmatrix}\right]$$

When a non-independence structure is employed, all pairwise correlation values are used to calculate the estimator, i.e., the $\mathrm{Corr}(\mathbf{Y}_i)$ matrix has a value in each row/column combination ($m_{11}$, $m_{12}$, etc.):

$$\mathrm{Corr}(\mathbf{Y}_i) = \begin{pmatrix} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \\ m_{31} & m_{32} & m_{33} \end{pmatrix}$$

For a Type I variable, for which there is no reason to believe that any of the estimating equations are invalid, Lai and Small advocate using all values in the correlation matrix, as shown above, rather than the independence structure. In this case, when the matrix multiplication is performed, all possible estimating equations contribute to estimation. If $T$ denotes the number of functions to estimate, equal to $t$, the number of repeated measures per subject, all $T^2$ functions are used (here $3 \times 3 = 9$). In the presence of a Type II covariate, however, Lai and Small postulate that only $T(T+1)/2$ estimating equations are valid (here $3(3+1)/2 = 6$). They recommend the following correlation structure, because it treats future values of the covariate as uncorrelated with current residuals:

$$\mathrm{Corr}(\mathbf{Y}_i) = \begin{pmatrix} m_{11} & 0 & 0 \\ m_{21} & m_{22} & 0 \\ m_{31} & m_{32} & m_{33} \end{pmatrix}$$
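The counting argument above can be reproduced with a small mask builder. In the product $\mathbf{D}_i' \mathbf{A}_i^{-1/2} \mathrm{Corr}(\mathbf{Y}_i) \mathbf{A}_i^{-1/2}(\mathbf{y}_i - \boldsymbol{\mu}_i)$, entry $(s, t)$ of the matrix pairs the derivative at time $s$ with the residual at time $t$, so keeping only $s \geq t$ gives the lower-triangular Type II pattern shown above. The Type III case, keeping only $s = t$, is an assumption drawn from the statement that both past and future covariate values are correlated with current residuals. The function name is hypothetical, and the loop uses the cluster sizes $T = 3, 4, 5$ from Aim II.

```python
import numpy as np

def valid_moment_mask(T, cov_type):
    """T x T boolean mask; rows are the covariate time s, columns the residual time t.

    Type I:   all T^2 moments are kept.
    Type II:  s >= t only (future covariates uncorrelated with current residuals),
              the lower triangle including the diagonal, T(T+1)/2 moments.
    Type III: s == t only, T moments (past and future covariates both correlated).
    """
    s, t = np.indices((T, T))
    if cov_type == "I":
        return np.ones((T, T), dtype=bool)
    if cov_type == "II":
        return s >= t
    if cov_type == "III":
        return s == t
    raise ValueError(f"unknown covariate type: {cov_type!r}")

for T in (3, 4, 5):
    counts = {k: int(valid_moment_mask(T, k).sum()) for k in ("I", "II", "III")}
    print(T, counts)
# 3 {'I': 9, 'II': 6, 'III': 3}
# 4 {'I': 16, 'II': 10, 'III': 4}
# 5 {'I': 25, 'II': 15, 'III': 5}
```

Multiplying a full working correlation matrix elementwise by such a mask reproduces the zero patterns used to exclude estimating equations by covariate type.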


More information

STOCKHOLM UNIVERSITY Department of Economics Course name: Empirical Methods Course code: EC40 Examiner: Lena Nekby Number of credits: 7,5 credits Date of exam: Saturday, May 9, 008 Examination time: 3

More information

Lecture 1 Introduction to Multi-level Models

Lecture 1 Introduction to Multi-level Models Lecture 1 Introduction to Multi-level Models Course Website: http://www.biostat.jhsph.edu/~ejohnson/multilevel.htm All lecture materials extracted and further developed from the Multilevel Model course

More information

An Introduction to Multilevel Models. PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 25: December 7, 2012

An Introduction to Multilevel Models. PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 25: December 7, 2012 An Introduction to Multilevel Models PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 25: December 7, 2012 Today s Class Concepts in Longitudinal Modeling Between-Person vs. +Within-Person

More information

Causal Effect Estimation Under Linear and Log- Linear Structural Nested Mean Models in the Presence of Unmeasured Confounding

Causal Effect Estimation Under Linear and Log- Linear Structural Nested Mean Models in the Presence of Unmeasured Confounding University of Pennsylvania ScholarlyCommons Publicly Accessible Penn Dissertations Summer 8-13-2010 Causal Effect Estimation Under Linear and Log- Linear Structural Nested Mean Models in the Presence of

More information

STA441: Spring Multiple Regression. More than one explanatory variable at the same time

STA441: Spring Multiple Regression. More than one explanatory variable at the same time STA441: Spring 2016 Multiple Regression More than one explanatory variable at the same time This slide show is a free open source document. See the last slide for copyright information. One Explanatory

More information

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation Biost 58 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture 5: Review Purpose of Statistics Statistics is about science (Science in the broadest

More information

PQL Estimation Biases in Generalized Linear Mixed Models

PQL Estimation Biases in Generalized Linear Mixed Models PQL Estimation Biases in Generalized Linear Mixed Models Woncheol Jang Johan Lim March 18, 2006 Abstract The penalized quasi-likelihood (PQL) approach is the most common estimation procedure for the generalized

More information

Working correlation selection in generalized estimating equations

Working correlation selection in generalized estimating equations University of Iowa Iowa Research Online Theses and Dissertations Fall 2011 Working correlation selection in generalized estimating equations Mi Jin Jang University of Iowa Copyright 2011 Mijin Jang This

More information

Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions

Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions Joe Schafer Office of the Associate Director for Research and Methodology U.S. Census

More information

Econometrics with Observational Data. Introduction and Identification Todd Wagner February 1, 2017

Econometrics with Observational Data. Introduction and Identification Todd Wagner February 1, 2017 Econometrics with Observational Data Introduction and Identification Todd Wagner February 1, 2017 Goals for Course To enable researchers to conduct careful quantitative analyses with existing VA (and non-va)

More information

1 Motivation for Instrumental Variable (IV) Regression

1 Motivation for Instrumental Variable (IV) Regression ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data

More information

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS Page 1 MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level

More information

Effect Modification and Interaction

Effect Modification and Interaction By Sander Greenland Keywords: antagonism, causal coaction, effect-measure modification, effect modification, heterogeneity of effect, interaction, synergism Abstract: This article discusses definitions

More information

Ninth ARTNeT Capacity Building Workshop for Trade Research "Trade Flows and Trade Policy Analysis"

Ninth ARTNeT Capacity Building Workshop for Trade Research Trade Flows and Trade Policy Analysis Ninth ARTNeT Capacity Building Workshop for Trade Research "Trade Flows and Trade Policy Analysis" June 2013 Bangkok, Thailand Cosimo Beverelli and Rainer Lanz (World Trade Organization) 1 Selected econometric

More information

Introduction to Matrix Algebra and the Multivariate Normal Distribution

Introduction to Matrix Algebra and the Multivariate Normal Distribution Introduction to Matrix Algebra and the Multivariate Normal Distribution Introduction to Structural Equation Modeling Lecture #2 January 18, 2012 ERSH 8750: Lecture 2 Motivation for Learning the Multivariate

More information

Topic -2. Probability. Larson & Farber, Elementary Statistics: Picturing the World, 3e 1

Topic -2. Probability. Larson & Farber, Elementary Statistics: Picturing the World, 3e 1 Topic -2 Probability Larson & Farber, Elementary Statistics: Picturing the World, 3e 1 Probability Experiments Experiment : An experiment is an act that can be repeated under given condition. Rolling a

More information

Introduction to Structural Equation Modeling

Introduction to Structural Equation Modeling Introduction to Structural Equation Modeling Notes Prepared by: Lisa Lix, PhD Manitoba Centre for Health Policy Topics Section I: Introduction Section II: Review of Statistical Concepts and Regression

More information

Variance component models part I

Variance component models part I Faculty of Health Sciences Variance component models part I Analysis of repeated measurements, 30th November 2012 Julie Lyng Forman & Lene Theil Skovgaard Department of Biostatistics, University of Copenhagen

More information

Longitudinal Modeling with Logistic Regression

Longitudinal Modeling with Logistic Regression Newsom 1 Longitudinal Modeling with Logistic Regression Longitudinal designs involve repeated measurements of the same individuals over time There are two general classes of analyses that correspond to

More information

Figure 36: Respiratory infection versus time for the first 49 children.

Figure 36: Respiratory infection versus time for the first 49 children. y BINARY DATA MODELS We devote an entire chapter to binary data since such data are challenging, both in terms of modeling the dependence, and parameter interpretation. We again consider mixed effects

More information

On the errors introduced by the naive Bayes independence assumption

On the errors introduced by the naive Bayes independence assumption On the errors introduced by the naive Bayes independence assumption Author Matthijs de Wachter 3671100 Utrecht University Master Thesis Artificial Intelligence Supervisor Dr. Silja Renooij Department of

More information

CRP 272 Introduction To Regression Analysis

CRP 272 Introduction To Regression Analysis CRP 272 Introduction To Regression Analysis 30 Relationships Among Two Variables: Interpretations One variable is used to explain another variable X Variable Independent Variable Explaining Variable Exogenous

More information

LOGISTIC REGRESSION Joseph M. Hilbe

LOGISTIC REGRESSION Joseph M. Hilbe LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of

More information

1. How can you tell if there is serial correlation? 2. AR to model serial correlation. 3. Ignoring serial correlation. 4. GLS. 5. Projects.

1. How can you tell if there is serial correlation? 2. AR to model serial correlation. 3. Ignoring serial correlation. 4. GLS. 5. Projects. 1. How can you tell if there is serial correlation? 2. AR to model serial correlation. 3. Ignoring serial correlation. 4. GLS. 5. Projects. 1) Identifying serial correlation. Plot Y t versus Y t 1. See

More information

Comparison of methods for repeated measures binary data with missing values. Farhood Mohammadi. A thesis submitted in partial fulfillment of the

Comparison of methods for repeated measures binary data with missing values. Farhood Mohammadi. A thesis submitted in partial fulfillment of the Comparison of methods for repeated measures binary data with missing values by Farhood Mohammadi A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Biostatistics

More information

Week 2: Review of probability and statistics

Week 2: Review of probability and statistics Week 2: Review of probability and statistics Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ALL RIGHTS RESERVED

More information

H-LIKELIHOOD ESTIMATION METHOOD FOR VARYING CLUSTERED BINARY MIXED EFFECTS MODEL

H-LIKELIHOOD ESTIMATION METHOOD FOR VARYING CLUSTERED BINARY MIXED EFFECTS MODEL H-LIKELIHOOD ESTIMATION METHOOD FOR VARYING CLUSTERED BINARY MIXED EFFECTS MODEL Intesar N. El-Saeiti Department of Statistics, Faculty of Science, University of Bengahzi-Libya. entesar.el-saeiti@uob.edu.ly

More information

Impact Evaluation of Mindspark Centres

Impact Evaluation of Mindspark Centres Impact Evaluation of Mindspark Centres March 27th, 2014 Executive Summary About Educational Initiatives and Mindspark Educational Initiatives (EI) is a prominent education organization in India with the

More information

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key Statistical Methods III Statistics 212 Problem Set 2 - Answer Key 1. (Analysis to be turned in and discussed on Tuesday, April 24th) The data for this problem are taken from long-term followup of 1423

More information

EMERGING MARKETS - Lecture 2: Methodology refresher

EMERGING MARKETS - Lecture 2: Methodology refresher EMERGING MARKETS - Lecture 2: Methodology refresher Maria Perrotta April 4, 2013 SITE http://www.hhs.se/site/pages/default.aspx My contact: maria.perrotta@hhs.se Aim of this class There are many different

More information

ASSESSING AND EVALUATING RECREATION RESOURCE IMPACTS: SPATIAL ANALYTICAL APPROACHES. Yu-Fai Leung

ASSESSING AND EVALUATING RECREATION RESOURCE IMPACTS: SPATIAL ANALYTICAL APPROACHES. Yu-Fai Leung ASSESSING AND EVALUATING RECREATION RESOURCE IMPACTS: SPATIAL ANALYTICAL APPROACHES by Yu-Fai Leung Dissertation submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu

More information

Comparing Change Scores with Lagged Dependent Variables in Models of the Effects of Parents Actions to Modify Children's Problem Behavior

Comparing Change Scores with Lagged Dependent Variables in Models of the Effects of Parents Actions to Modify Children's Problem Behavior Comparing Change Scores with Lagged Dependent Variables in Models of the Effects of Parents Actions to Modify Children's Problem Behavior David R. Johnson Department of Sociology and Haskell Sie Department

More information

Econometric Modelling Prof. Rudra P. Pradhan Department of Management Indian Institute of Technology, Kharagpur

Econometric Modelling Prof. Rudra P. Pradhan Department of Management Indian Institute of Technology, Kharagpur Econometric Modelling Prof. Rudra P. Pradhan Department of Management Indian Institute of Technology, Kharagpur Module No. # 01 Lecture No. # 28 LOGIT and PROBIT Model Good afternoon, this is doctor Pradhan

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Correlation and Linear Regression

Correlation and Linear Regression Correlation and Linear Regression Correlation: Relationships between Variables So far, nearly all of our discussion of inferential statistics has focused on testing for differences between group means

More information

Instrumental Variables

Instrumental Variables Instrumental Variables Yona Rubinstein July 2016 Yona Rubinstein (LSE) Instrumental Variables 07/16 1 / 31 The Limitation of Panel Data So far we learned how to account for selection on time invariant

More information

Estimating and contextualizing the attenuation of odds ratios due to non-collapsibility

Estimating and contextualizing the attenuation of odds ratios due to non-collapsibility Estimating and contextualizing the attenuation of odds ratios due to non-collapsibility Stephen Burgess Department of Public Health & Primary Care, University of Cambridge September 6, 014 Short title:

More information

Reliability of Acceptance Criteria in Nonlinear Response History Analysis of Tall Buildings

Reliability of Acceptance Criteria in Nonlinear Response History Analysis of Tall Buildings Reliability of Acceptance Criteria in Nonlinear Response History Analysis of Tall Buildings M.M. Talaat, PhD, PE Senior Staff - Simpson Gumpertz & Heger Inc Adjunct Assistant Professor - Cairo University

More information

Lecture Outline. Biost 518 Applied Biostatistics II. Choice of Model for Analysis. Choice of Model. Choice of Model. Lecture 10: Multiple Regression:

Lecture Outline. Biost 518 Applied Biostatistics II. Choice of Model for Analysis. Choice of Model. Choice of Model. Lecture 10: Multiple Regression: Biost 518 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture utline Choice of Model Alternative Models Effect of data driven selection of

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 8 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 25 Recommended Reading For the today Instrumental Variables Estimation and Two Stage

More information

CHOOSING THE RIGHT SAMPLING TECHNIQUE FOR YOUR RESEARCH. Awanis Ku Ishak, PhD SBM

CHOOSING THE RIGHT SAMPLING TECHNIQUE FOR YOUR RESEARCH. Awanis Ku Ishak, PhD SBM CHOOSING THE RIGHT SAMPLING TECHNIQUE FOR YOUR RESEARCH Awanis Ku Ishak, PhD SBM Sampling The process of selecting a number of individuals for a study in such a way that the individuals represent the larger

More information

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall 1 Structural Nested Mean Models for Assessing Time-Varying Effect Moderation Daniel Almirall Center for Health Services Research, Durham VAMC & Dept. of Biostatistics, Duke University Medical Joint work

More information

Describing Change over Time: Adding Linear Trends

Describing Change over Time: Adding Linear Trends Describing Change over Time: Adding Linear Trends Longitudinal Data Analysis Workshop Section 7 University of Georgia: Institute for Interdisciplinary Research in Education and Human Development Section

More information

Job Training Partnership Act (JTPA)

Job Training Partnership Act (JTPA) Causal inference Part I.b: randomized experiments, matching and regression (this lecture starts with other slides on randomized experiments) Frank Venmans Example of a randomized experiment: Job Training

More information

The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters

The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters Sensitivity Analysis for Linear Structural Equation Models, Longitudinal Mediation With Latent Growth Models and Blended Learning in Biostatistics Education The Harvard community has made this article

More information

Treatment Effects. Christopher Taber. September 6, Department of Economics University of Wisconsin-Madison

Treatment Effects. Christopher Taber. September 6, Department of Economics University of Wisconsin-Madison Treatment Effects Christopher Taber Department of Economics University of Wisconsin-Madison September 6, 2017 Notation First a word on notation I like to use i subscripts on random variables to be clear

More information

Experimental designs for multiple responses with different models

Experimental designs for multiple responses with different models Graduate Theses and Dissertations Graduate College 2015 Experimental designs for multiple responses with different models Wilmina Mary Marget Iowa State University Follow this and additional works at:

More information

e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls

e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls e author and the promoter give permission to consult this master dissertation and to copy it or parts of it for personal use. Each other use falls under the restrictions of the copyright, in particular

More information

Calculating Effect-Sizes. David B. Wilson, PhD George Mason University

Calculating Effect-Sizes. David B. Wilson, PhD George Mason University Calculating Effect-Sizes David B. Wilson, PhD George Mason University The Heart and Soul of Meta-analysis: The Effect Size Meta-analysis shifts focus from statistical significance to the direction and

More information

Notes 3: Statistical Inference: Sampling, Sampling Distributions Confidence Intervals, and Hypothesis Testing

Notes 3: Statistical Inference: Sampling, Sampling Distributions Confidence Intervals, and Hypothesis Testing Notes 3: Statistical Inference: Sampling, Sampling Distributions Confidence Intervals, and Hypothesis Testing 1. Purpose of statistical inference Statistical inference provides a means of generalizing

More information

Causal Inference with Big Data Sets

Causal Inference with Big Data Sets Causal Inference with Big Data Sets Marcelo Coca Perraillon University of Colorado AMC November 2016 1 / 1 Outlone Outline Big data Causal inference in economics and statistics Regression discontinuity

More information

Introduction to Econometrics

Introduction to Econometrics Introduction to Econometrics STAT-S-301 Experiments and Quasi-Experiments (2016/2017) Lecturer: Yves Dominicy Teaching Assistant: Elise Petit 1 Why study experiments? Ideal randomized controlled experiments

More information

Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models

Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models Optimum Design for Mixed Effects Non-Linear and generalized Linear Models Cambridge, August 9-12, 2011 Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models

More information