Comparing Change Scores with Lagged Dependent Variables in Models of the Effects of Parents' Actions to Modify Children's Problem Behavior
David R. Johnson, Department of Sociology
Haskell Sie, Department of Educational Psychology & Department of Statistics
The Pennsylvania State University
May 22, 2013
The Problem
- Non-experimental two-wave panel (prospective) data are used to estimate the effects of parental behavior control strategies on the trajectory of child problem behavior.
- Two approaches have been used:
  - The Lagged Dependent Variable (LDV) method, also referred to as the residualized gain score method.
  - The Change Score (CS) method, also referred to as the simple gain score method.
- In empirical studies, the two approaches often give different and sometimes contradictory findings (Larzelere et al., 2010).
- The purpose of this study is to examine some of the assumptions of these models and to test them in simulated data where the true effect is known.
- This is an extension of a comparison of two-wave LDV and CS approaches in family research (Johnson, 2005).
- The focus here is on the two-wave case, although extension of both approaches to multiple waves is straightforward and will be briefly discussed.
The Substantive Model
- How do parents' practices for controlling rule violations and problem behavior by their child affect change in the child's observed problem behavior over time?
- Assume an observational or non-randomized experimental study rather than a randomized experiment.
- For example, a measure of child problem behavior is assessed at two points in time, and parental reports of their child control practices are assessed at time 1, between times 1 and 2, or at time 2.
Two-wave panel data: Assumptions
- Measurements are obtained at two time points.
  - Y1: the child's problem behavior score at time 1
  - Y2: the child's problem behavior score at time 2
- Both Y1 and Y2 are influenced by the same stable (non-time-varying) variable Y, the child's disposition for problem behaviors.
- X is a parental intervention or control method: zero at time 1 and one at time 2 if an intervention occurs somewhere between the two time points.
  - X can also be specified as a stable measure of parental control behavior, consistent across time periods and measured as a continuous or a binary variable.
  - X can also be specified as a time-varying measure of parental control behavior that is measured at time 1 and time 2.
- S is a time-invariant variable measuring stable traits and background influences of both the parent and the child.
LDV and CS models
- Lagged Dependent Variable (LDV) method, also known as the residualized gain score or analysis of covariance (ANCOVA) method. Without any time-varying independent variables, the model is given by
  $Y_2 = \alpha + \beta X + \gamma S + \delta Y_1 + \varepsilon$
- Change Score (CS) method, also commonly referred to as the simple gain score method. From
  $Y_2 = \alpha_2 + \beta X + \gamma S + \varepsilon_2$ and $Y_1 = \alpha_1 + \gamma S + \varepsilon_1$
  we have
  $Y_2 - Y_1 = \alpha + \beta X + \varepsilon$, where $\alpha = \alpha_2 - \alpha_1$ and $\varepsilon = \varepsilon_2 - \varepsilon_1$.
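One way to see how the two specifications relate is to subtract Y1 from both sides of the LDV equation. The short derivation below is a standard algebraic observation offered as a reading aid, not a result taken from the slides. The intercept and error symbols are reused across the two models for brevity, although they are different quantities in each.

```latex
\begin{align*}
\text{LDV:}\quad Y_2 &= \alpha + \beta X + \gamma S + \delta Y_1 + \varepsilon \\
\text{subtract } Y_1:\quad Y_2 - Y_1 &= \alpha + \beta X + \gamma S + (\delta - 1)\,Y_1 + \varepsilon \\
\text{CS:}\quad Y_2 - Y_1 &= \alpha + \beta X + \varepsilon
\end{align*}
```

Read this way, the CS equation has the form of the LDV equation with the coefficient on Y1 fixed at 1 and no separate S term, whereas the LDV model estimates that coefficient freely. In the CS derivation, S drops out because it enters both waves with the same coefficient.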
Alternative Specifications of the CS model
- The CS model can also be parameterized as a fixed-effects pooled time-series model (Allison, 2009).
- Each wave is a separate record in the dataset, linked by an ID (long rather than wide format).
  $Y_{it} = \beta X_{it} + \lambda T_t + \alpha_i + \varepsilon_{it}$
  where t is the wave, X is a time-varying treatment, T is 0 for wave 1 and 1 for wave 2, $\alpha_i$ is a fixed effect for each individual in the dataset, and $\varepsilon_{it}$ is a within-individual error term.
- Only time-varying variables can be included in the model.
- All variables are expressed as deviations from their means across waves.
Alternative Specifications of the CS model (continued)
- When X is an intervention between wave 1 and wave 2, X is time-varying.
- If X is a measure of parenting practice assessed at both time periods, it is also time-varying.
- If X is a stable measure of parental practice measured only at time 1 (or earlier), it is not time-varying. However, it can still be included in the fixed-effects model when interacted with time T.
- In two-wave models, the fixed-effects estimates are the same as the CS estimates.
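The equivalence of the fixed-effects and CS estimates in the two-wave case can be checked numerically. The sketch below uses made-up data (the sample size, the 0.3 time effect, and the -0.5 treatment effect are illustrative assumptions) and plain Python rather than a statistics package. It computes the CS estimate as a difference in mean change, then the fixed-effects estimate from person-demeaned long-format records.

```python
import random

rng = random.Random(7)

# Hypothetical wide-format data: one tuple (y1, y2, x) per child.
wide = []
for _ in range(500):
    person_effect = rng.gauss(0, 1)            # alpha_i, constant across waves
    x = 1.0 if rng.random() < 0.3 else 0.0     # intervention between the waves
    y1 = person_effect + rng.gauss(0, 1)
    y2 = person_effect + 0.3 - 0.5 * x + rng.gauss(0, 1)
    wide.append((y1, y2, x))

# CS estimate: with a binary X, the slope of (Y2 - Y1) on X is the
# difference in mean change between the two groups.
change_1 = [y2 - y1 for y1, y2, x in wide if x == 1.0]
change_0 = [y2 - y1 for y1, y2, x in wide if x == 0.0]
b_cs = sum(change_1) / len(change_1) - sum(change_0) / len(change_0)

# Fixed-effects estimate: two records per child (long format), each variable
# expressed as a deviation from that child's mean across the two waves.
long_rows = []
for y1, y2, x in wide:
    y_mean, x_mean, t_mean = (y1 + y2) / 2, x / 2, 0.5
    long_rows.append((y1 - y_mean, 0.0 - x_mean, 0.0 - t_mean))  # wave 1: X=0, T=0
    long_rows.append((y2 - y_mean, x - x_mean, 1.0 - t_mean))    # wave 2: X=x, T=1

# Least squares of demeaned Y on demeaned X and T (no intercept), solved
# directly from the 2x2 normal equations.
sxx = sum(x * x for _, x, _ in long_rows)
stt = sum(t * t for _, _, t in long_rows)
sxt = sum(x * t for _, x, t in long_rows)
sxy = sum(x * y for y, x, _ in long_rows)
sty = sum(t * y for y, _, t in long_rows)
b_fe = (sxy * stt - sty * sxt) / (sxx * stt - sxt ** 2)

print(f"CS estimate: {b_cs:.6f}")
print(f"FE estimate: {b_fe:.6f}")
```

The two printed values should agree up to floating-point rounding, because with two waves the demeaned records are simply plus and minus half of each child's change.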
LDV versus CS
- Both control for Y1 while examining the relationship between X and Y2 in an attempt to estimate the causal effect of X on Y (Allison, 1990). This is important if X is not randomly assigned.
- The LDV method treats Y2 as the dependent variable, regressed on X, S, and Y1 as independent variables.
- The CS method treats Y2 - Y1 as the dependent variable, regressed only on X.
- The LDV method has traditionally been considered superior to the CS method because:
  - Y2 - Y1 tends to have lower reliability than Y1 or Y2 itself, and
  - a correlation between X and Y1 will lead to a spurious negative relationship between X and Y2 - Y1 due to regression to the mean (Allison, 1990).
- The CS method has the advantage of:
  - controlling for all time-invariant variables, whether measured or unmeasured, as long as their effects are invariant over time, and
  - allowing Y1 to have measurement error.
LDV versus CS (continued)
- Analyses of the same data using the two methods often lead to different conclusions.
- The two methods test different null hypotheses.
- The LDV model is preferable to the CS model if Y1 has a causal effect on Y2 or if X is influenced by period-specific components of Y1 (Allison, 1990).
- The LDV method is inappropriate when X (e.g., parental control) is not randomized and large differences in Y1 scores exist between groups with different values of X.
- The LDV method will also be biased when measurement error is present in Y1.
Simulation Study: Method
- Measurement error: no measurement error, or measurement error in all four variables (Y1, Y2, X, S).
- Time effect on the child's problem behavior at time 2: no effect, or a moderate effect of 0.3.
- Parental intervention effect on the child's problem behavior: a positive effect of 0.5, a negative effect of -0.5, or no effect.
- Causal effect of the child's behavior at time 1 on his/her behavior at time 2: no effect, or a positive effect of 0.4.
- Factors affecting the parental intervention: no factor affecting X, only S, only Y, only Y1, both S and Y, or both S and Y1.
- Models compared: LDV model without control of S, LDV model with control of S, and CS model.
Simulation Study: Method (continued)
- To generate the variable X that represents parental intervention, a continuous variable Xc was first generated. Then X = 1 if Xc > 0.67 and X = 0 otherwise.
- In models with measurement error, the variables Y1, Y2, Xc, and S are replaced by y1 = Y1 + 0.5 r1, y2 = Y2 + 0.5 r2, xc = Xc + 0.5 r3, and s = S + 0.7 r4, respectively, where r1, r2, r3, and r4 are standard normal random variates.
- 100 replications were generated, with 2,000 subjects within each replication.
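The two generation steps described above (dichotomizing Xc at 0.67 and adding scaled standard normal error) can be sketched as follows. This is not the authors' simulation code. The true scores here are placeholder standard normal draws, since the slides do not give the full structural equations, and the helper names `dichotomize` and `add_measurement_error` are hypothetical. The sketch also assumes that the error-laden x is obtained by applying the same 0.67 cutoff to xc.

```python
import random

CUTOFF = 0.67  # threshold stated in the slides


def dichotomize(xc_values, cutoff=CUTOFF):
    """X = 1 if Xc > cutoff, else 0."""
    return [1 if v > cutoff else 0 for v in xc_values]


def add_measurement_error(Y1, Y2, Xc, S, rng):
    """Return observed scores: true score plus scaled standard normal error."""
    def noisy(values, scale):
        return [v + scale * rng.gauss(0, 1) for v in values]

    xc_obs = noisy(Xc, 0.5)
    return {
        "y1": noisy(Y1, 0.5),
        "y2": noisy(Y2, 0.5),
        "xc": xc_obs,
        "x": dichotomize(xc_obs),  # assumption: same cutoff applied to noisy xc
        "s": noisy(S, 0.7),
    }


rng = random.Random(2013)
n = 2000  # subjects per replication, as in the slides

# Placeholder true scores (assumption: independent standard normal draws).
Y1 = [rng.gauss(0, 1) for _ in range(n)]
Y2 = [rng.gauss(0, 1) for _ in range(n)]
Xc = [rng.gauss(0, 1) for _ in range(n)]
S = [rng.gauss(0, 1) for _ in range(n)]

X = dichotomize(Xc)
obs = add_measurement_error(Y1, Y2, Xc, S, rng)

print(f"share with X = 1 (true):     {sum(X) / n:.3f}")
print(f"share with x = 1 (observed): {sum(obs['x']) / n:.3f}")
```

If Xc is standard normal, a cutoff of 0.67 assigns roughly a quarter of the subjects to X = 1. The share differs if Xc has another variance, for example when it depends on S or Y.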
Models 1-12: Path Diagrams
Legend (all diagrams): S = stable personality background factors; Y = stable propensity for problem behavior; X = parental control; T = developmental (time) effect; Y1, Y2 = problem behavior in waves 1 and 2. Diagram nodes: S, Y, T, s, Y1, Y2, X, y1, y2, x.
- Model 1: X is random
- Model 2: Causal effect from S to X
- Model 3: Causal effect from Y to X
- Model 4: Causal effect from Y1 to X
- Model 5: Causal effects from both S and Y to X
- Model 6: Causal effects from S and Y1 to X
- Models 7-12: Causal effect from Y1 to Y2 added to Models 1-6
[Path diagrams not reproduced.]
Simulation Study: Assessment Criteria
- In each model, the regression coefficient estimate representing the effect of parental intervention X was evaluated against its true effect $\beta$ on Y2 using
  $\mathrm{Bias}(\hat{\beta}) = \frac{1}{100} \sum_{r=1}^{100} (\hat{\beta}_r - \beta)$ and $\mathrm{MSE}(\hat{\beta}) = \frac{1}{100} \sum_{r=1}^{100} (\hat{\beta}_r - \beta)^2$
  where $\hat{\beta}_r$ is the estimate from replication r.
- Also compared was the proportion of models where the regression coefficient estimate was significant in the direction of the true effect.
- The focus here is only on the bias. MSE and proportion significant are available on the handout for all models.
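As a minimal sketch of the two criteria, the functions below average the error and the squared error of the replication estimates around the true effect. The function names and the three example estimates are illustrative only and are not values from the study.

```python
def bias(estimates, true_beta):
    """Mean of (estimate - true effect) across replications."""
    return sum(b - true_beta for b in estimates) / len(estimates)


def mse(estimates, true_beta):
    """Mean of squared (estimate - true effect) across replications."""
    return sum((b - true_beta) ** 2 for b in estimates) / len(estimates)


# Made-up estimates from three hypothetical replications, true effect -0.5.
example_estimates = [-0.4, -0.6, -0.5]
print("bias:", bias(example_estimates, -0.5))
print("MSE: ", mse(example_estimates, -0.5))
```

A bias near zero with a large MSE would indicate an estimator that is right on average but highly variable across replications, which is why the two criteria are reported separately.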
Descriptive Information
- The only pieces of information available to the estimation models without measurement error are Y1, Y2, X, and S. When measurement error was assumed, y1, y2, x, and s were available instead.
- The means of the Y variables at each wave when X = 0 and X = 1 serve as an indicator of the pattern generated by the simulation.
- Only situations where the effect of T is 0 are presented, because the estimate of the effect of X does not vary with the value of T.
[Figure: Means of Y at Waves 1 and 2 for Model 1 (X is random). Series: Model 1 X=0, Model 1 X=1. Horizontal axis: W1, W2.]
[Figure: Comparison of Means of Y at Waves 1 and 2 for Model 3 (Y -> X) and Model 4 (Y1 -> X). Series: Model 3 X=0, Model 3 X=1, Model 4 X=0, Model 4 X=1. Horizontal axis: W1, W2.]
[Figure: Models 3 (Y -> X) and 4 (Y1 -> X) with Measurement Error. Series: Model 3 X=0, Model 3 X=1, Model 4 X=0, Model 4 X=1. Horizontal axis: W1, W2.]
Summary of the Simulation Results
- Full simulation results are available on the handout.
- Results are presented for an effect of X of -.5, which assumes that parental control reduces problem behavior.
- Models are presented with no measurement error and with measurement error in all variables.
- Some models where X is assumed to have no effect (effect of X = 0) are also examined, to assess whether either method is likely to yield an effect when none is present.
[Figure: Bias in the Estimated Effect of X (-.5) when No Measurement Error Is Present. Series: CS, LDV. Categories: M1: X random; M2: S -> X; M3: Y -> X; M4: Y1 -> X; M5: Y, S -> X; M6: Y1, S -> X; M7: X random, Y1 -> Y2; M8: S -> X, Y1 -> Y2; M9: Y -> X, Y1 -> Y2; M10: Y1 -> X, Y1 -> Y2; M11: Y, S -> X, Y1 -> Y2; M12: Y1, S -> X, Y1 -> Y2.]
[Figure: Bias in Effect of X with Measurement Error when the True Effect of X Is Negative (-.5). Series: CS, LDV. Categories: M1-M12 as in the previous figure.]
[Figure: Bias in Effect of X when the True Effect of X Is 0, with Measurement Error in Y1, Y2, S, and X. Series: CS, LDV. Categories: M1-M12 as in the previous figures.]
Major findings
- Whether Y or Y1 has a causal effect on X makes a substantial difference in the estimates of the CS and LDV models.
  - When Y1 has a causal effect on X, CS overestimates the negative effect and LDV is unbiased.
  - When Y has a causal effect on X, LDV underestimates the effect and CS is unbiased.
- When measurement error is present, the CS method outperforms the LDV method overall.
- When Y1 has a causal effect on Y2, the LDV method is less biased, except when measurement error is present.
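The contrast between the Y -> X and Y1 -> X cases can be reproduced qualitatively with a small simulation. The sketch below is not the authors' design. It assumes unit-variance normal terms, a path of 0.5 from Y (or Y1) to the continuous Xc, no S, no time effect, no measurement error, and only 20 replications. The LDV model here controls for Y1 only, and the helper names are hypothetical.

```python
import random


def coef_on_x(y, x, control=None):
    """OLS coefficient on x (with intercept), optionally adjusting for one control."""
    n = len(y)
    my, mx = sum(y) / n, sum(x) / n
    yc = [v - my for v in y]
    xc = [v - mx for v in x]
    sxx = sum(a * a for a in xc)
    sxy = sum(a * b for a, b in zip(xc, yc))
    if control is None:
        return sxy / sxx
    mz = sum(control) / n
    zc = [v - mz for v in control]
    szz = sum(a * a for a in zc)
    sxz = sum(a * b for a, b in zip(xc, zc))
    szy = sum(a * b for a, b in zip(zc, yc))
    # Solution of the 2x2 normal equations for the coefficient on x.
    return (sxy * szz - szy * sxz) / (sxx * szz - sxz ** 2)


def one_replication(rng, x_depends_on, beta, n=2000):
    """Return (CS estimate, LDV estimate) for one simulated sample."""
    Y1, Y2, X = [], [], []
    for _ in range(n):
        y = rng.gauss(0, 1)                       # stable propensity Y
        y1 = y + rng.gauss(0, 1)                  # wave 1 behavior
        source = y if x_depends_on == "Y" else y1
        x = 1.0 if 0.5 * source + rng.gauss(0, 1) > 0.67 else 0.0
        y2 = y + beta * x + rng.gauss(0, 1)       # wave 2 behavior
        Y1.append(y1)
        Y2.append(y2)
        X.append(x)
    cs = coef_on_x([b - a for a, b in zip(Y1, Y2)], X)
    ldv = coef_on_x(Y2, X, control=Y1)
    return cs, ldv


def mean_bias(x_depends_on, reps=20, beta=-0.5, seed=2013):
    """Average (estimate - true effect) over replications for CS and LDV."""
    rng = random.Random(seed)
    results = [one_replication(rng, x_depends_on, beta) for _ in range(reps)]
    cs_bias = sum(cs - beta for cs, _ in results) / reps
    ldv_bias = sum(ldv - beta for _, ldv in results) / reps
    return cs_bias, ldv_bias


cs_m3, ldv_m3 = mean_bias("Y")    # analogue of Model 3: Y -> X
cs_m4, ldv_m4 = mean_bias("Y1")   # analogue of Model 4: Y1 -> X
print(f"Y -> X : CS bias {cs_m3:+.3f}, LDV bias {ldv_m3:+.3f}")
print(f"Y1 -> X: CS bias {cs_m4:+.3f}, LDV bias {ldv_m4:+.3f}")
```

Under these assumptions the CS bias should be near zero and the LDV bias positive when Y drives X, while the CS bias should be clearly negative and the LDV bias near zero when Y1 drives X. The magnitudes depend on the assumed path strengths and are not comparable to the handout values.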
Implications for choice of an analysis model
- How and when the parental control method is measured is critical for the choice of method.
  - If it is measured shortly after an episode of child problem behavior, then there is likely to be a causal effect of that episode on the parental report, favoring the LDV method.
  - If parental control methods tend to be stable, reflecting in general the propensity of the child for problem behavior, then the CS method is likely to be preferred.
- If measurement error is expected to be present, then CS would be preferred, or a method such as multiple indicators in SEM needs to be used to adjust for measurement error.
- Computing both models is not of much help in deciding which is more appropriate.
- Even if there is no effect, both models would under certain conditions show significant effects of parental control.
Alternative Approaches?
- The economists Heckman and Robb (1985) examined models for a similar situation: the effect of training programs on earnings.
  - They concluded that panel models involve many assumptions that are likely to be violated, yielding biased estimates.
  - Methods based on repeated cross-sections also involve assumptions, but these were more reasonable and had a less serious impact on the estimates.
- Two waves are likely not enough to deal with the complexities.
  - LDV models can readily extend to multiple waves.
  - CS within the fixed-effects framework also permits multiple waves.
  - Additional waves and multiple indicators of the main variables allow more degrees of freedom to identify more of the model assumptions.
- Recent work has integrated the fixed-effects approach with SEM models, including latent growth curve models (Bollen & Brand, 2010; Teachman, 2012).
Thank You Contact Information: David R. Johnson drj10@psu.edu Haskell Sie hxs265@psu.edu
References
Allison, P. D. (1990). Change scores as dependent variables in regression analysis. Sociological Methodology, 20, 93-114.
Allison, P. D. (2009). Fixed effects regression models. Los Angeles: Sage.
Bollen, K. A., & Brand, J. E. (2010). A general panel model with random and fixed effects: A structural equations approach. Social Forces, 89, 1-34.
Cribbie, R. A., & Jamieson, J. (2004). Decreases in posttest variance and the measurement of change. Methods of Psychological Research Online, 9, 37-55.
Heckman, J. J., & Robb, R. (1985). Alternative methods for evaluating the impact of interventions: An overview. Journal of Econometrics, 30, 239-267.
Johnson, D. R. (1995). Alternative methods for the quantitative analysis of panel data in family research: Pooled time-series models. Journal of Marriage and the Family, 57, 1065-1077.
Johnson, D. R. (2005). Two-wave panel analysis: Comparing statistical methods for studying the effects of transitions. Journal of Marriage and Family, 67, 1061-1075.
Larzelere, R. E., Ferrer, E., Kuhn, B. R., & Danelia, K. (2010). Differences in causal estimates from longitudinal analyses of residualized versus simple gain scores: Contrasting controls for selection and regression artifacts. International Journal of Behavioral Development, 34, 180-189.
Teachman, J. (2012). Latent growth curve models with random and fixed effects. Paper presented at the 20th Annual Symposium on Family Issues, Pennsylvania State University.