Class Notes. Examining Repeated Measures Data on Individuals
Ronald Heck, Week 12: Class Notes

Generalized linear mixed models (GLMM) also provide a means of incorporating longitudinal designs with categorical outcomes into situations where there are clustered data structures. One of the attractive properties of the GLMM is that it allows for linear as well as non-linear models under a single framework that addresses issues of clustering. It is possible to fit models with outcomes resulting from various probability distributions, including normal (or Gaussian), inverse Gaussian, gamma, Poisson, multinomial, binomial, and negative binomial, through an appropriate link function $g(\cdot)$. At level 1, repeated observations (e.g., students' proficiency status in math, students' enrollment over successive semesters in college, changes in clinical or health status) are nested within individuals, perhaps with additional time-varying covariates. At level 2, we can define variables describing differences between individuals (e.g., treatment groups, participation status, subject background variables and attitudes).

Generalized Estimating Equations

Alternatively, by using the Generalized Estimating Equations (GEE) approach, we can examine a number of categorical measurements nested within individuals (i.e., individuals represent the clusters), but where individuals themselves are considered to be independent and randomly sampled from a population of interest. More specifically, in this latter type of model, the pairs of dependent and independent variables $(Y_i, X_i)$ for individuals are assumed to be independent and identically distributed (Ziegler, Kastner, & Blettner, 1998) rather than clustered within organizations. GEE is used to characterize the marginal expectation of a set of repeated measures (i.e., the average response for observations sharing the same covariates) as a function of a set of study variables.
As a result, the important point is that the growth parameters are not assumed to vary randomly across individuals (or higher groups) as in a typical random-coefficients (or mixed) model. This is an important distinction between the two types of models to keep in mind; that is, while random-coefficient models explicitly address variation across individuals as well as clustering among subjects in higher-order groups, GEE models assume simple random sampling of subjects representing a population as opposed to a set of higher-order groups. Hence, GEE models provide what are called population-average results; that is, they model the marginal expectation as a function of the explanatory variables. In contrast, typical multilevel models provide unit-specific results. Regression coefficients based on population averages (GEE) will generally be similar to unit-specific (random-effect model) coefficients but smaller in size (Raudenbush & Bryk, 2002). This distinction does not arise in models with continuous outcomes and identity link functions. For example, for a GEE model, the odds ratio is the average estimate in the population; that is, the expected change in the odds for a unit change in X in the population. In contrast, in random-effect (unit-specific) models, the odds ratio will be the subject-specific effect for a particular level of clustering (i.e., the person or unit of clustering) given a unit change in X.
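To make the population-average versus unit-specific distinction concrete, the following is a minimal sketch, assuming a random-intercept logistic model with hypothetical values (b0, b1, and sigma are illustrative and do not come from these notes). It averages the subject-specific probabilities over a normally distributed random intercept and then compares the resulting marginal log odds ratio with the subject-specific slope.

```python
import math


def inv_logit(eta):
    """Convert a log odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def logit(p):
    """Convert a probability to log odds."""
    return math.log(p / (1.0 - p))


def marginal_prob(eta_fixed, sigma, n_grid=2001, width=8.0):
    """Average the subject-specific probability over u ~ N(0, sigma^2).

    Uses a simple normalized grid over the standard normal density
    (an illustrative numerical integration, not a GEE fit).
    """
    if sigma == 0.0:
        return inv_logit(eta_fixed)
    step = 2.0 * width / (n_grid - 1)
    total = 0.0
    weight_sum = 0.0
    for k in range(n_grid):
        z = -width + k * step            # grid point on the standard normal scale
        w = math.exp(-0.5 * z * z)       # unnormalized normal density weight
        total += w * inv_logit(eta_fixed + sigma * z)
        weight_sum += w
    return total / weight_sum


# Hypothetical unit-specific model: logit P(Y=1 | u) = b0 + b1*x + u
b0, b1, sigma = 0.8, -0.5, 1.5

p_x0 = marginal_prob(b0, sigma)          # population-average P(Y=1) at x = 0
p_x1 = marginal_prob(b0 + b1, sigma)     # population-average P(Y=1) at x = 1
pa_slope = logit(p_x1) - logit(p_x0)     # population-average log odds ratio

print(f"unit-specific slope:      {b1:.3f}")
print(f"population-average slope: {pa_slope:.3f}")
```

Under these assumptions the population-average slope has the same sign as the unit-specific slope but is closer to zero, and the two coincide when sigma is 0 (no between-subject variation).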
We first begin with a within- and between-subjects model estimated using the GEE (or fixed-effect) approach. GEE was developed to extend the GLM further by accommodating repeated categorical measures, logistic regression, and various other models for time series or other correlated data where relationships between successive measurements on the same individual are assumed to influence the estimation of model parameters (Horton & Lipsitz, 1999; Liang & Zeger, 1986; Zeger, Liang, & Albert, 1988). The GEE analytic approach handles a number of different types of categorical outcomes, their associated sampling distributions, and corresponding link functions. It is suitable where the repeated observations are nested within individuals over time, but the individuals are considered to be a random sample of a population. One scenario is where individuals are randomly assigned to treatment conditions that unfold over time.

If the outcome is a count, we can make use of an additional exposure parameter (referred to as an offset term), which, as you will recall, is a "structural" predictor that can be added to the model. Its coefficient is not estimated by the model but is assumed to have the value 1.0; thus, the values of the offset are simply added to the linear predictor of the dependent variable. This extra parameter can be especially useful in Poisson regression models, where each case may have different levels of exposure to the event of interest. At present in IBM SPSS, the GEE approach only accommodates a two-level data hierarchy (measurements nested in individuals). If we intend to add a group-level variable, we would need to use GENLINMIXED to specify the group structure.

Students' Proficiency in Reading Over Time

Consider a study to examine students' likelihood of being proficient in reading over time and to assess whether their background might affect their varying patterns of meeting proficiency or not.
We may first be interested in whether a change takes place over time in students' likelihood of being proficient. This concern addresses whether the probability of a student being proficient is the same or different over the occasions of measurement. The assumption is that if we can reject the hypothesis that the likelihood of being proficient is the same over time, it implies that a change in individuals has taken place. In this situation, occasions of measurement are assumed to be nested within subjects but independent between subjects. We may have a number of research questions we are interested in examining, such as the following: What is the probability of students being proficient in reading over time? Do probabilities of being proficient change over time? What do students' trends look like over time? Are there between-individual variables that explain students' likelihood of being proficient over time?

Vertical Alignment of Data Within Individuals

The data in this study consist of 2,228 individuals who were measured on four occasions regarding their proficiency in reading. To examine growth within and between individuals using GEE (or GENLINMIXED), the data must first be organized differently (see Chapter 2 in the text). The time-related observations must be organized vertically, which will require four lines for each subject, since there are four repeated observations regarding proficiency. You will recall that an intercept is defined as the level of Y when X (Time) is 0. For categorical outcomes, the time variable functions to separate contrasts between time points, for example, between a baseline measurement and the end of a treatment intervention, or to examine change over a particular time period. This coding pattern for Time (0, 1, 2, 3) identifies the intercept in the model as students' initial (time 1) proficiency status (i.e., since it is coded 0, and the intercept represents the individual's status when the other predictors are 0). This is the most common type of coding for models involving individual change.

There are several important steps that must be specified in conducting the analysis. Users identify the type of outcome and appropriate link function, define the regression model, select the correlation structure between repeated measures, and select either model-based or robust standard errors.

There are a number of different ways to notate the models. We will let $Y_{it}$ be the dichotomous response at time t (t = 1, 2, ..., T) for individual i (i = 1, 2, ..., N), where we assume the observations of different individuals are independent, but we allow for an association between the repeated measures for the same subject. This will allow us later in the chapter to add the subscript j to define random effects of individuals nested within groups such as classrooms or schools. We assume the following marginal regression model for the expected value of $Y_{it}$:

$$g(E[Y_{it}]) = x_{it}'\beta$$

where $x_{it}$ is a (p + 1) x 1 vector (the prime designates its transpose) of covariates for the i-th subject on the t-th measurement occasion (t = 1, 2, ..., T), $\beta$ represents the corresponding regression parameters, and $g(\cdot)$ refers to one of several corresponding link functions, depending on the measurement of Y. This suggests that the data can be summarized by the vector $y_i$ and the matrix $X_i$. The slope can be interpreted as the rate of change in the population-averaged $Y_i$ with $X_i$ (Zeger et al., 1988). Typically, the parameters are constant for all t (Ziegler et al., 1998).
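The vertical (long) layout and the 0 to 3 coding of Time described above can be sketched as follows. This is only an illustration in plain Python: the wide-format column names (read1 to read4) and the two example students are hypothetical, while id, time, and readprof mirror the variable names used in these notes.

```python
def wide_to_long(rows, outcome_cols):
    """Stack repeated measures vertically: one output row per subject per occasion.

    Time is coded 0, 1, 2, ... so that the intercept refers to the first occasion.
    """
    long_rows = []
    for row in rows:
        for t, col in enumerate(outcome_cols):
            long_rows.append({"id": row["id"], "time": t, "readprof": row[col]})
    return long_rows


# Hypothetical wide-format records: one line per student, four proficiency indicators
wide = [
    {"id": 1, "read1": 1, "read2": 1, "read3": 0, "read4": 0},
    {"id": 2, "read1": 0, "read2": 1, "read3": 1, "read4": 1},
]

long_data = wide_to_long(wide, ["read1", "read2", "read3", "read4"])

for record in long_data:
    print(record)
```

Each student contributes four lines to the long file, so the 2,228 students in the study would yield 8,912 lines.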
Where the data are dichotomous, the marginal mean (a probability) is most commonly modeled via the logit link (i.e., whether a child is proficient or not at time t). The coefficients are then interpreted as log odds. For the Bernoulli case (i.e., where the number of trials is 1), Y has a binomial distribution with probability of success $\pi$ and variance $\pi(1-\pi)$. For binary data with the logit link function, we have the familiar

$$\eta = \log\left(\frac{\pi}{1-\pi}\right) = x'\beta$$

where $\eta$ is the underlying transformed predictor of Y, in this case the log of the odds $\pi/(1-\pi)$. It should again be noted that the model represents a ratio of the probability of the event coded 1 occurring versus the probability of the event coded 0 occurring at a particular time point. There is no residual variance parameter ($\varepsilon_i$), as the variance is related to the expected value $\pi$ and therefore cannot be uniquely defined.
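As a small illustration of the logit link and the Bernoulli variance just described, the sketch below converts between probabilities and log odds. The function names are illustrative, and the example probability of 0.68 is the overall proportion proficient reported later in these notes.

```python
import math


def logit(p):
    """Link function: probability -> log odds, eta = log(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    """Inverse link: log odds -> probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def bernoulli_variance(p):
    """Variance of a single 0/1 trial; it is fully determined by the mean p."""
    return p * (1.0 - p)


p = 0.68
eta = logit(p)
print(f"odds      = {p / (1 - p):.3f}")
print(f"log odds  = {eta:.3f}")
print(f"back to p = {inv_logit(eta):.3f}")
print(f"variance  = {bernoulli_variance(p):.4f}")
```

Because the variance is a function of the mean alone, there is nothing left over to estimate as a separate residual variance.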
In the first model, we specify the repeated measures outcomes in two parameters that describe the intercept and time-related slope as follows:

$$\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1(\text{time})$$

where time is coded to indicate the interval between successive measurements, $\beta_0$ is an intercept, and $\beta_1$ describes the rate of change on a logit scale in the fraction of positive responses in the population of subjects per unit of time, rather than the typical change for an individual subject. As the above equation suggests, $\beta_0$ is the log odds of response when time is 0 (i.e., initial status). In this case, $\beta_1$ is the log odds associated with a one-year interval. The model assumes there are no between-subject random effects; therefore, there are two parameters to estimate. Since this is a single-level model, for convenience we'll drop the subscripts referring to the predictors.

Correlation Structures Between Repeated Measures

It is possible to specify several different types of correlation structures to describe the within-subject dependencies over time. However, because one does not often know what the correct structure is ahead of time, different choices can make some difference in the model's parameter estimates; therefore, the structure is chosen to improve efficiency. It often does take a bit of preliminary work to determine the optimal working correlation matrix for a particular data structure. Examples of GEE correlation/covariance structure specifications include independence, exchangeable, autoregressive, stationary m-dependent, and unstructured. The independence matrix assumes that the repeated measurements are uncorrelated; however, this will not be the case in most instances. Generally, in longitudinal models the successive measurements are correlated at least to some extent.
An exchangeable (or compound symmetry) covariance (or correlation) matrix assumes homogeneous correlations between elements; that is, the correlations are assumed to be the same over time. This can sometimes be difficult to support in a longitudinal study, however. The autoregressive, or AR(1), matrix assumes the repeated measures have a first-order autoregressive structure. This implies that the correlation between any two adjacent elements is equal to $\rho$ (rho), to $\rho^2$ for elements separated by a third, and so on, with $\rho$ constrained such that $-1 < \rho < 1$. An m-dependent matrix assumes consecutive measurements have a common correlation coefficient, pairs of measurements separated by a third have a common correlation coefficient, and so on, through pairs of measurements separated by m - 1 other measurements. Measurements with greater separation are assumed to be uncorrelated. When choosing this structure, specify a value of m less than the order of the working correlation matrix. Where measurements are not evenly spaced, it may be reasonable to consider a model where the correlation is a function of the time between observations (i.e., m-dependent or autoregressive).
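The working correlation structures described above can be written out explicitly for four measurement occasions. The sketch below is illustrative only: the function names and the correlation values passed to them are hypothetical, not estimates from the reading data.

```python
def independence(n_times):
    """Repeated measures assumed uncorrelated: identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n_times)] for i in range(n_times)]


def exchangeable(n_times, rho):
    """Compound symmetry: the same correlation rho for every pair of occasions."""
    return [[1.0 if i == j else rho for j in range(n_times)] for i in range(n_times)]


def ar1(n_times, rho):
    """First-order autoregressive: correlation rho ** lag, so it decays with separation."""
    return [[rho ** abs(i - j) for j in range(n_times)] for i in range(n_times)]


def m_dependent(n_times, rhos):
    """m-dependent: one correlation per lag up to m = len(rhos), zero beyond lag m."""
    m = len(rhos)
    matrix = []
    for i in range(n_times):
        row = []
        for j in range(n_times):
            lag = abs(i - j)
            if lag == 0:
                row.append(1.0)
            elif lag <= m:
                row.append(rhos[lag - 1])
            else:
                row.append(0.0)
        matrix.append(row)
    return matrix


def show(name, matrix):
    print(name)
    for row in matrix:
        print("  " + "  ".join(f"{value:5.3f}" for value in row))


show("Independence", independence(4))
show("Exchangeable (rho = 0.3)", exchangeable(4, 0.3))
show("AR(1) (rho = 0.5)", ar1(4, 0.5))
show("2-dependent (0.4, 0.2)", m_dependent(4, [0.4, 0.2]))
```

An unstructured matrix is not shown because it has no pattern to generate: each pair of occasions simply receives its own separately estimated coefficient.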
Finally, an unstructured correlation (or covariance) matrix provides a separate coefficient for each covariance. As with cross-sectional models, we have found that model estimates can vary slightly according to the matrix structure specified.

Standard Errors and Estimation

Model-based standard errors are based on the correlational structure chosen. Hence, they may be inconsistent if the correlation structure is incorrectly specified. They are usually a little smaller than the robust standard errors (SEs). For smaller numbers of clusters, model-based SEs are generally preferred over robust SEs. In contrast, robust standard errors vary only slightly depending on the choice of hypothesized correlational structure among the repeated measures; that is, the estimates are consistent even if the correlational structure is specified incorrectly. The robust SE approach uses a sandwich estimator based on an approximation to maximum likelihood. Because of this, there can be occasions when one approach will converge and the other may not. Robust standard errors are often preferred when the number of clustered observations is large. We will estimate our models in this example using robust standard errors since we have a considerable amount of data. Once again, we note that users should keep in mind that GEE uses a type of quasi-likelihood estimation (as opposed to full-information ML), which can make direct model comparison based on fit statistics that depend on the real likelihood (e.g., deviance, AIC, BIC) not very accurate (Hox, 2010).

Table 1. Model Information
Dependent Variable                      readprof (a)
Probability Distribution                Binomial
Link Function                           Logit
Subject Effect 1                        Id
Within-Subject Effect 1                 Time
Working Correlation Matrix Structure    Exchangeable
a. The procedure models 1 as the response, treating 0 as the reference category.
Table 1 provides information about how the model is defined (e.g., probability distribution and link function, number of effects in the model, type of correlation matrix used to describe the within-subject structure). As the output shows, the distribution is binomial and a logit link function is used to transform Y. The working correlation structure is exchangeable, which is the same as compound symmetry. This implies that the correlations are the same over each time interval. We can subsequently investigate whether this is a viable assumption for these data. Next, we can observe how many of the total cases for the dependent variable (reading proficiency) are coded 1 (proficient) versus 0 (not proficient). As the table suggests, across the four time periods, an average of 68% of the individuals were proficient and 32% were not.
Table 2. Reading Proficiency Information (columns: N, Percent; rows: categories of readprof and Total; cell values not preserved in this transcription)

If we did not include the time variable, the log odds intercept (not tabled) would be the grand-mean log odds coefficient across the four time periods, about 0.755, which corresponds to odds of 2.128. We can translate the odds back to the predicted population probability of Y = 1 [odds/(1 + odds)], which would be 2.128/3.128, or 0.680, which fits with the Table 2 estimate.

Next, in Table 3, are the fixed-effect results for the intercept and the time-related predictor. The estimated intercept log odds coefficient is 0.838, which, because of the coding of the time variable (i.e., 0, 1, 2, 3), can be interpreted as the log odds of being proficient at the start of the study. The intercept represents the predicted log odds when any variables in the model are 0. If we exponentiate the log odds, we obtain the corresponding odds, reported as Exp(B), of about 2.31. This suggests individuals are almost 2.3 times more likely to be proficient than non-proficient at the beginning of the study (.70/.30 ≈ 2.3).

Table 3. Parameter Estimates (columns: B, Std. Error, 95% Wald Confidence Interval [Lower, Upper], Hypothesis Test [Wald Chi-Square, df, Sig.], Exp(B), 95% Wald Confidence Interval for Exp(B) [Lower, Upper]; rows: (Intercept), Time, (Scale) fixed at 1; cell values not preserved in this transcription)
Dependent Variable: readprof. Model: (Intercept), time.

Regarding the time variable, the coefficient suggests that over each interval students' likelihood of being proficient decreases significantly (log odds = -0.055, p < .001). We can translate this into a predicted probability by adding it to the intercept. Initially (i.e., at time = 0), the log odds of being proficient is 0.838. For the second interval (time = 1), the estimated log odds will then be 0.838 + (-0.055) = 0.783.
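The conversion from log odds to predicted probabilities under the linear time trend can be checked with a short calculation. The sketch below uses the two coefficients reported in these notes (0.838 and -0.055); the variable and function names are illustrative.

```python
import math

B0 = 0.838    # intercept: log odds of being proficient at time = 0 (from the notes)
B1 = -0.055   # change in log odds per one-unit increase in time (from the notes)


def predicted_probability(time):
    """Predicted P(readprof = 1) under the linear-in-time logit model."""
    eta = B0 + B1 * time
    return 1.0 / (1.0 + math.exp(-eta))


baseline_odds = math.exp(B0)     # odds of proficiency at time = 0
odds_multiplier = math.exp(B1)   # factor applied to the odds for each interval

probs = [predicted_probability(t) for t in range(4)]

print(f"baseline odds   = {baseline_odds:.3f}")
print(f"odds multiplier = {odds_multiplier:.3f}")
for t, p in enumerate(probs):
    print(f"time = {t}: log odds = {B0 + B1 * t:.3f}, probability = {p:.3f}")
```

Because the coefficients are rounded to three decimals here, the exponentiated slope differs from the tabled Exp(B) in the third decimal place.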
We could then estimate the new probability as 0.69, which is calculated as $1/[1 + e^{-0.783}]$, which reduces to 1/1.457. Note this estimate is slightly different from the actual observed probability in the table below, since there was no actual change that took place between time 0 and time 1. The odds ratio suggests the odds of being proficient are multiplied by .947 (or reduced by 5.3%) over the first interval. We can see in this situation that an assumed negative linear time trend in the probability of being proficient does not quite fit the data optimally.

Table 4. Proportion of Proficient Students (readprof by Time; columns: Time, Mean, N, Std. Deviation; one row per time point plus Total; cell values not preserved in this transcription)

In this case, we might decide to code the data somewhat differently to obtain results that model the trend a bit better. We might wish to treat the time-related variable as ordinal (1, 2, ..., C) rather than scale. If we make this change, we will have C - 1 estimates, since one category will serve as the reference group. In this case, we will specify descending for the factor category order so that the first category (Time = 0) will serve as the reference group. This is the same as creating a series of C - 1 dummy variables for a categorical factor and specifying them in the model.

Table 5. Model 1.2 Parameter Estimates (columns: B, Std. Error, Hypothesis Test [Wald Chi-Square, df, Sig.], Exp(B), 95% Wald Confidence Interval for Exp(B) [Lower, Upper]; rows: (Intercept), [time=3], [time=2], [time=1], [time=0] set to 0 (a), (Scale) fixed at 1; cell values not preserved in this transcription)
Dependent Variable: readprof. Model: (Intercept), time (ordinal).
a. Set to zero because this parameter is redundant.

The intercept log odds is now 0.840. This is only slightly different from the last table. If we calculate the predicted probability of being proficient initially (Time = 0), we see it will be $1/(1 + e^{-0.840})$, or 1/1.432 = 0.698. Note we can also use the odds to estimate the probability (2.315/3.315). This probability is consistent with the observed proportion in Table 4. We can see further that at Time = 1, there was little change in log odds units regarding students' probability of being proficient (log odds = 0.002, p = .904). At Time 2 (log odds ≈ -0.229, p < .001) and Time 3 (log odds ≈ -0.106, p < .001), however, students were significantly lower in probability of being proficient relative to their proficiency status at Time 0.
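Treating time as a categorical factor with Time = 0 as the reference group amounts to building C - 1 indicator (dummy) variables. The sketch below illustrates that coding in plain Python. The intercept (0.840) and the Time 1 effect (0.002) are the values reported above; the Time 2 and Time 3 effects are assumed here to be the natural logs of the odds ratios 0.795 and 0.899 reported in these notes, and all function names are hypothetical.

```python
import math


def dummy_code(time_value, levels=(0, 1, 2, 3), reference=0):
    """Return C - 1 indicator variables, omitting the reference category."""
    return {f"time={lvl}": int(time_value == lvl) for lvl in levels if lvl != reference}


def predicted_probability(time_value, intercept, effects):
    """Linear predictor = intercept + sum of (effect * dummy), then inverse logit."""
    dummies = dummy_code(time_value)
    eta = intercept + sum(effects[name] * flag for name, flag in dummies.items())
    return 1.0 / (1.0 + math.exp(-eta))


INTERCEPT = 0.840                    # log odds at Time 0 (reference category)
EFFECTS = {
    "time=1": 0.002,                 # reported log odds contrast, Time 1 vs. Time 0
    "time=2": math.log(0.795),       # assumed: log of the reported odds ratio
    "time=3": math.log(0.899),       # assumed: log of the reported odds ratio
}

for t in range(4):
    p = predicted_probability(t, INTERCEPT, EFFECTS)
    print(f"time = {t}: dummies = {dummy_code(t)}, probability = {p:.3f}")
```

With all three dummies equal to 0, the prediction reduces to the intercept alone, which is why the intercept describes the reference group (Time = 0).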
Regarding the odds ratios (OR), we can interpret the nonsignificant relationship at Time = 1 as indicating there was no significant change in the odds of being proficient at Time 1 (OR = 1.002, p = .904). In contrast, the odds of being proficient at Time = 2 versus Time 0 are multiplied by 0.795 (or reduced by 20.5%) compared to the initial level. At Time = 3, the odds of being proficient versus Time 0 (i.e., the initial status intercept) are multiplied by 0.899 (or reduced by about 10%).

We can estimate the probability of being proficient at Time 3 versus Time 0 in several ways. We can add the two log odds coefficients (0.840 + (-0.106) = 0.734). This will provide the log odds of being proficient at Time 3. The exponentiated slope can be interpreted as the change in the odds that Y = 1 relative to the reference category (i.e., Time 0). If we exponentiate the log odds ($e^{0.734}$), we obtain odds of 2.08. We can then calculate the probability of being proficient at Time 3 as 2.08/3.08 = 0.675, which is consistent with the observed proportion in Table 4. Alternatively, we can also represent the new odds as the product of the baseline odds and the odds ratio (2.315 × 0.899 = 2.08); that is, we multiply the odds for Time = 0 by the odds ratio comparing Time 3 with Time 0 (0.899), which provides the new odds (2.08) and leads to the same probability. Applying this approach for Time = 2, we have 2.315 × 0.795 = 1.840, which gives a probability of 1.84/2.84 = 0.648. This estimate of the probability that Y = 1 is consistent with the observed proportion in Table 4. We can see that defining the time trend as categorical in this instance provides some benefits in representing more accurately the change in the probability of being proficient that takes place between each measurement.

Adding a Predictor

We can next add one or more between-subjects predictors, but the outcome parameters are treated as fixed; that is, the slopes cannot vary across individuals in the sample. We provide an example where we add gender (female coded 1; male coded 0) to the model. We can define this model as follows:

$$\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1\,\text{time} + \beta_2\,\text{female}$$

We will do this one in class and compare time defined as interval and ordinal.
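Returning to the odds arithmetic for the categorical time model, the multiplication route can be checked with a few lines of code. This sketch uses only the baseline odds (2.315) and the odds ratios (1.002, 0.795, 0.899) reported in these notes; the names are illustrative.

```python
BASELINE_ODDS = 2.315   # odds of being proficient at Time 0 (from the notes)
ODDS_RATIOS = {         # odds ratios relative to Time 0 (from the notes)
    1: 1.002,
    2: 0.795,
    3: 0.899,
}


def odds_to_probability(odds):
    """Convert odds to a probability: odds / (1 + odds)."""
    return odds / (1.0 + odds)


# Time 0 uses the baseline odds directly; later times multiply by the odds ratio
probabilities = {0: odds_to_probability(BASELINE_ODDS)}
for time, ratio in ODDS_RATIOS.items():
    probabilities[time] = odds_to_probability(BASELINE_ODDS * ratio)

for time in sorted(probabilities):
    print(f"time = {time}: probability = {probabilities[time]:.3f}")
```

Multiplying odds in this way is equivalent to adding the corresponding log odds coefficients and then exponentiating, so both routes lead to the same predicted probability up to rounding.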
References

Horton, N. J., & Lipsitz, S. R. (1999). Review of software to fit Generalized Estimating Equation (GEE) regression models. The American Statistician, 53.
Hox, J. J. (2010). Multilevel analysis: Techniques and applications (2nd ed.). New York: Routledge.
Liang, K.-Y., & Zeger, S. L. (1986). Longitudinal analysis using generalized linear models. Biometrika, 73(1).
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage Publications.
Zeger, S. L., & Liang, K.-Y. (1986). Longitudinal data analysis for discrete and continuous outcomes. Biometrics, 42(1).
Ziegler, A., Kastner, C., & Blettner, M. (1998). The Generalised Estimating Equations: An annotated bibliography. Biometrical Journal (2).
More informationContrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models:
Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models: Marginal models: based on the consequences of dependence on estimating model parameters.
More informationGeneralized Linear Models for Non-Normal Data
Generalized Linear Models for Non-Normal Data Today s Class: 3 parts of a generalized model Models for binary outcomes Complications for generalized multivariate or multilevel models SPLH 861: Lecture
More informationLatent Dirichlet Alloca/on
Latent Dirichlet Alloca/on Blei, Ng and Jordan ( 2002 ) Presented by Deepak Santhanam What is Latent Dirichlet Alloca/on? Genera/ve Model for collec/ons of discrete data Data generated by parameters which
More informationAssessing the relation between language comprehension and performance in general chemistry. Appendices
Assessing the relation between language comprehension and performance in general chemistry Daniel T. Pyburn a, Samuel Pazicni* a, Victor A. Benassi b, and Elizabeth E. Tappin c a Department of Chemistry,
More informationHypothesis Testing for Var-Cov Components
Hypothesis Testing for Var-Cov Components When the specification of coefficients as fixed, random or non-randomly varying is considered, a null hypothesis of the form is considered, where Additional output
More informationDART Tutorial Sec'on 1: Filtering For a One Variable System
DART Tutorial Sec'on 1: Filtering For a One Variable System UCAR The Na'onal Center for Atmospheric Research is sponsored by the Na'onal Science Founda'on. Any opinions, findings and conclusions or recommenda'ons
More informationRegression Part II. One- factor ANOVA Another dummy variable coding scheme Contrasts Mul?ple comparisons Interac?ons
Regression Part II One- factor ANOVA Another dummy variable coding scheme Contrasts Mul?ple comparisons Interac?ons One- factor Analysis of variance Categorical Explanatory variable Quan?ta?ve Response
More informationLecture 12: Effect modification, and confounding in logistic regression
Lecture 12: Effect modification, and confounding in logistic regression Ani Manichaikul amanicha@jhsph.edu 4 May 2007 Today Categorical predictor create dummy variables just like for linear regression
More informationSTAT 526 Advanced Statistical Methodology
STAT 526 Advanced Statistical Methodology Fall 2017 Lecture Note 10 Analyzing Clustered/Repeated Categorical Data 0-0 Outline Clustered/Repeated Categorical Data Generalized Linear Mixed Models Generalized
More informationOverdispersion Workshop in generalized linear models Uppsala, June 11-12, Outline. Overdispersion
Biostokastikum Overdispersion is not uncommon in practice. In fact, some would maintain that overdispersion is the norm in practice and nominal dispersion the exception McCullagh and Nelder (1989) Overdispersion
More information1. Introduc9on 2. Bivariate Data 3. Linear Analysis of Data
Lecture 3: Bivariate Data & Linear Regression 1. Introduc9on 2. Bivariate Data 3. Linear Analysis of Data a) Freehand Linear Fit b) Least Squares Fit c) Interpola9on/Extrapola9on 4. Correla9on 1. Introduc9on
More informationData Processing Techniques
Universitas Gadjah Mada Department of Civil and Environmental Engineering Master in Engineering in Natural Disaster Management Data Processing Techniques Hypothesis Tes,ng 1 Hypothesis Testing Mathema,cal
More informationOne- factor ANOVA. F Ra5o. If H 0 is true. F Distribu5on. If H 1 is true 5/25/12. One- way ANOVA: A supersized independent- samples t- test
F Ra5o F = variability between groups variability within groups One- factor ANOVA If H 0 is true random error F = random error " µ F =1 If H 1 is true random error +(treatment effect)2 F = " µ F >1 random
More informationShort introduc,on to the
OXFORD NEUROIMAGING PRIMERS Short introduc,on to the An General Introduction Linear Model to Neuroimaging for Neuroimaging Analysis Mark Jenkinson Mark Jenkinson Janine Michael Bijsterbosch Chappell Michael
More informationGeneralized linear models
Generalized linear models Outline for today What is a generalized linear model Linear predictors and link functions Example: estimate a proportion Analysis of deviance Example: fit dose- response data
More informationSpring RMC Professional Development Series January 14, Generalized Linear Mixed Models (GLMMs): Concepts and some Demonstrations
Spring RMC Professional Development Series January 14, 2016 Generalized Linear Mixed Models (GLMMs): Concepts and some Demonstrations Ann A. O Connell, Ed.D. Professor, Educational Studies (QREM) Director,
More informationz-scores z-scores z-scores and the Normal Distribu4on PSYC 300A - Lecture 3 Dr. J. Nicol
z-scores and the Normal Distribu4on PSYC 300A - Lecture 3 Dr. J. Nicol z-scores Knowing a raw score does not inform us about the rela4ve loca4on of that score in the distribu4on The rela4ve loca4on of
More information22s:152 Applied Linear Regression
22s:152 Applied Linear Regression Chapter 7: Dummy Variable Regression So far, we ve only considered quantitative variables in our models. We can integrate categorical predictors by constructing artificial
More informationTime-Invariant Predictors in Longitudinal Models
Time-Invariant Predictors in Longitudinal Models Today s Class (or 3): Summary of steps in building unconditional models for time What happens to missing predictors Effects of time-invariant predictors
More informationRandom Intercept Models
Random Intercept Models Edps/Psych/Soc 589 Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Spring 2019 Outline A very simple case of a random intercept
More informationIntroduction to Within-Person Analysis and RM ANOVA
Introduction to Within-Person Analysis and RM ANOVA Today s Class: From between-person to within-person ANOVAs for longitudinal data Variance model comparisons using 2 LL CLP 944: Lecture 3 1 The Two Sides
More informationLogistic Regression: Regression with a Binary Dependent Variable
Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression
More informationSemiparametric Generalized Linear Models
Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student
More informationSimple logistic regression
Simple logistic regression Biometry 755 Spring 2009 Simple logistic regression p. 1/47 Model assumptions 1. The observed data are independent realizations of a binary response variable Y that follows a
More informationCourse Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model
Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model EPSY 905: Multivariate Analysis Lecture 1 20 January 2016 EPSY 905: Lecture 1 -
More informationCourse Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model
Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 1: August 22, 2012
More informationLongitudinal Data Analysis Using Stata Paul D. Allison, Ph.D. Upcoming Seminar: May 18-19, 2017, Chicago, Illinois
Longitudinal Data Analysis Using Stata Paul D. Allison, Ph.D. Upcoming Seminar: May 18-19, 217, Chicago, Illinois Outline 1. Opportunities and challenges of panel data. a. Data requirements b. Control
More informationGeneralized Models: Part 1
Generalized Models: Part 1 Topics: Introduction to generalized models Introduction to maximum likelihood estimation Models for binary outcomes Models for proportion outcomes Models for categorical outcomes
More informationGeneralized linear models
Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models
More informationDesigning Multilevel Models Using SPSS 11.5 Mixed Model. John Painter, Ph.D.
Designing Multilevel Models Using SPSS 11.5 Mixed Model John Painter, Ph.D. Jordan Institute for Families School of Social Work University of North Carolina at Chapel Hill 1 Creating Multilevel Models
More informationIntroduc)on to Ar)ficial Intelligence
Introduc)on to Ar)ficial Intelligence Lecture 10 Probability CS/CNS/EE 154 Andreas Krause Announcements! Milestone due Nov 3. Please submit code to TAs! Grading: PacMan! Compiles?! Correct? (Will clear
More information8 Nominal and Ordinal Logistic Regression
8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on
More informationPractical Biostatistics
Practical Biostatistics Clinical Epidemiology, Biostatistics and Bioinformatics AMC Multivariable regression Day 5 Recap Describing association: Correlation Parametric technique: Pearson (PMCC) Non-parametric:
More informationIntroduc)on to RNA- Seq Data Analysis. Dr. Benilton S Carvalho Department of Medical Gene)cs Faculty of Medical Sciences State University of Campinas
Introduc)on to RNA- Seq Data Analysis Dr. Benilton S Carvalho Department of Medical Gene)cs Faculty of Medical Sciences State University of Campinas Material: hep://)ny.cc/rnaseq Slides: hep://)ny.cc/slidesrnaseq
More information13.1 Categorical Data and the Multinomial Experiment
Chapter 13 Categorical Data Analysis 13.1 Categorical Data and the Multinomial Experiment Recall Variable: (numerical) variable (i.e. # of students, temperature, height,). (non-numerical, categorical)
More informationNELS 88. Latent Response Variable Formulation Versus Probability Curve Formulation
NELS 88 Table 2.3 Adjusted odds ratios of eighth-grade students in 988 performing below basic levels of reading and mathematics in 988 and dropping out of school, 988 to 990, by basic demographics Variable
More informationAnnouncements. Topics: Work On: - sec0ons 1.2 and 1.3 * Read these sec0ons and study solved examples in your textbook!
Announcements Topics: - sec0ons 1.2 and 1.3 * Read these sec0ons and study solved examples in your textbook! Work On: - Prac0ce problems from the textbook and assignments from the coursepack as assigned
More informationAn Introduction to Mplus and Path Analysis
An Introduction to Mplus and Path Analysis PSYC 943: Fundamentals of Multivariate Modeling Lecture 10: October 30, 2013 PSYC 943: Lecture 10 Today s Lecture Path analysis starting with multivariate regression
More informationAn Introduction to Multilevel Models. PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 25: December 7, 2012
An Introduction to Multilevel Models PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 25: December 7, 2012 Today s Class Concepts in Longitudinal Modeling Between-Person vs. +Within-Person
More informationAn R # Statistic for Fixed Effects in the Linear Mixed Model and Extension to the GLMM
An R Statistic for Fixed Effects in the Linear Mixed Model and Extension to the GLMM Lloyd J. Edwards, Ph.D. UNC-CH Department of Biostatistics email: Lloyd_Edwards@unc.edu Presented to the Department
More informationChapter 1 Statistical Inference
Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations
More informationMixed models in R using the lme4 package Part 5: Generalized linear mixed models
Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Douglas Bates Madison January 11, 2011 Contents 1 Definition 1 2 Links 2 3 Example 7 4 Model building 9 5 Conclusions 14
More informationGraphical Models. Lecture 3: Local Condi6onal Probability Distribu6ons. Andrew McCallum
Graphical Models Lecture 3: Local Condi6onal Probability Distribu6ons Andrew McCallum mccallum@cs.umass.edu Thanks to Noah Smith and Carlos Guestrin for some slide materials. 1 Condi6onal Probability Distribu6ons
More informationExperimental Designs for Planning Efficient Accelerated Life Tests
Experimental Designs for Planning Efficient Accelerated Life Tests Kangwon Seo and Rong Pan School of Compu@ng, Informa@cs, and Decision Systems Engineering Arizona State University ASTR 2015, Sep 9-11,
More informationMixed models in R using the lme4 package Part 5: Generalized linear mixed models
Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Douglas Bates 2011-03-16 Contents 1 Generalized Linear Mixed Models Generalized Linear Mixed Models When using linear mixed
More informationLogistic Regression. Continued Psy 524 Ainsworth
Logistic Regression Continued Psy 524 Ainsworth Equations Regression Equation Y e = 1 + A+ B X + B X + B X 1 1 2 2 3 3 i A+ B X + B X + B X e 1 1 2 2 3 3 Equations The linear part of the logistic regression
More informationST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses
ST3241 Categorical Data Analysis I Multicategory Logit Models Logit Models For Nominal Responses 1 Models For Nominal Responses Y is nominal with J categories. Let {π 1,, π J } denote the response probabilities
More informationExample. Multiple Regression. Review of ANOVA & Simple Regression /749 Experimental Design for Behavioral and Social Sciences
36-309/749 Experimental Design for Behavioral and Social Sciences Sep. 29, 2015 Lecture 5: Multiple Regression Review of ANOVA & Simple Regression Both Quantitative outcome Independent, Gaussian errors
More informationWU Weiterbildung. Linear Mixed Models
Linear Mixed Effects Models WU Weiterbildung SLIDE 1 Outline 1 Estimation: ML vs. REML 2 Special Models On Two Levels Mixed ANOVA Or Random ANOVA Random Intercept Model Random Coefficients Model Intercept-and-Slopes-as-Outcomes
More informationOverview: In addi:on to considering various summary sta:s:cs, it is also common to consider some visual display of the data Outline:
Lecture 2: Visual Display of Data Overview: In addi:on to considering various summary sta:s:cs, it is also common to consider some visual display of the data Outline: 1. Histograms 2. ScaCer Plots 3. Assignment
More informationLISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014
LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R Liang (Sally) Shan Nov. 4, 2014 L Laboratory for Interdisciplinary Statistical Analysis LISA helps VT researchers
More informationMixed models in R using the lme4 package Part 7: Generalized linear mixed models
Mixed models in R using the lme4 package Part 7: Generalized linear mixed models Douglas Bates University of Wisconsin - Madison and R Development Core Team University of
More informationmultilevel modeling: concepts, applications and interpretations
multilevel modeling: concepts, applications and interpretations lynne c. messer 27 october 2010 warning social and reproductive / perinatal epidemiologist concepts why context matters multilevel models
More informationRepeated ordinal measurements: a generalised estimating equation approach
Repeated ordinal measurements: a generalised estimating equation approach David Clayton MRC Biostatistics Unit 5, Shaftesbury Road Cambridge CB2 2BW April 7, 1992 Abstract Cumulative logit and related
More informationOne-Way ANOVA. Some examples of when ANOVA would be appropriate include:
One-Way ANOVA 1. Purpose Analysis of variance (ANOVA) is used when one wishes to determine whether two or more groups (e.g., classes A, B, and C) differ on some outcome of interest (e.g., an achievement
More informationChapter 22: Log-linear regression for Poisson counts
Chapter 22: Log-linear regression for Poisson counts Exposure to ionizing radiation is recognized as a cancer risk. In the United States, EPA sets guidelines specifying upper limits on the amount of exposure
More information12 Generalized linear models
12 Generalized linear models In this chapter, we combine regression models with other parametric probability models like the binomial and Poisson distributions. Binary responses In many situations, we
More informationModel Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response)
Model Based Statistics in Biology. Part V. The Generalized Linear Model. Logistic Regression ( - Response) ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6, 7) ReCap Part III (Ch 9, 10, 11), Part IV
More informationLecture 1 Introduction to Multi-level Models
Lecture 1 Introduction to Multi-level Models Course Website: http://www.biostat.jhsph.edu/~ejohnson/multilevel.htm All lecture materials extracted and further developed from the Multilevel Model course
More information