Psychological Methods

A Cautionary Note on Modeling Growth Trends in Longitudinal Data

Goran Kuljanin, Michael T. Braun, and Richard P. DeShon

Online First Publication, April 5, 2011.

CITATION: Kuljanin, G., Braun, M. T., & DeShon, R. P. (2011, April 5). A Cautionary Note on Modeling Growth Trends in Longitudinal Data. Psychological Methods. Advance online publication. doi: /a003348

American Psychological Association, 2011

Random coefficient and latent growth curve modeling are currently the dominant approaches to the analysis of longitudinal data in psychology. The application of these models to longitudinal data assumes that the data-generating mechanism behind the psychological process under investigation contains only a deterministic trend. However, if a process contains a stochastic trend, at least in part, then random coefficient regression results are likely to be spurious. This problem is demonstrated via a data example, previous research on simple regression models, and Monte Carlo simulations. A data analytic strategy is proposed to help researchers avoid making inaccurate inferences when observed trends may be due to stochastic processes.

Keywords: random coefficient model, latent growth curve model, stochastic trend, unit root tests, spurious regression

Longitudinal data structures enable the evaluation and modeling of psychological processes that evolve over time. These data can often reveal interesting, dynamic relationships not readily apparent in cross-sectional data (e.g., Block, 1995; Mitchell & James, 2001; Molenaar, 2004; Vancouver, Thompson, & Williams, 2001). It is not surprising, then, that longitudinal data are collected more frequently in psychological research than in the past (Collins, 2006). At the same time, technological advances in experience sampling methods make it easier to collect more observations over a longer time span, yielding intensive longitudinal data structures (e.g., Walls & Schafer, 2006). The unique problems encountered in longitudinal data, compared with cross-sectional data, necessitate more complex methods to support accurate psychological inference.
Advances in the development of longitudinal data analysis provide researchers with multiple analytical tools. Repeated measures analysis of variance and regression provide researchers with average trajectories across a group of individuals. However, interest in interindividual differences in intraindividual change led to the development of more complex analytic methods, such as latent growth and random coefficient models (a.k.a. multilevel models, hierarchical linear models, mixed effects models). These methods provide the average trajectory across a group of individuals. They also capture heterogeneity in individual trajectories if it exists. Whether trajectory heterogeneity is substantively interesting or treated as nuisance variance, it must be represented in the model. Thus, random coefficient modeling is currently the most widely used longitudinal data analysis technique in psychology. Random coefficient modeling has substantially advanced longitudinal data analysis in psychology. Even so, the validity of the parameter estimates, significance tests, and resulting scientific inferences depends on whether the model assumptions apply to the longitudinal processes under investigation.

Goran Kuljanin, Michael T. Braun, and Richard P. DeShon, Department of Psychology, Michigan State University. Correspondence concerning this article should be addressed to Goran Kuljanin, Department of Psychology, Michigan State University, East Lansing, MI. E-mail: kuljanin@msu.edu

A model fit to a longitudinal process results in a trajectory or set of trajectories. When modeling multiple longitudinal processes (e.g., individuals, teams), the observed trajectories are frequently heterogeneous (Collins & Horn, 1991; Collins & Sayer, 2001). Across individuals, trajectory slopes may differentially increase or decrease.
Slope heterogeneity further implies that the variance of the modeled variable changes over time. Commonly, the model-implied variance increases, especially when there is a positive covariance between intercepts and slopes (e.g., McArdle & Nesselroade, 2003). Figure 1 depicts heterogeneous trajectories consistent with those commonly observed in psychological research (e.g., Bollen & Curran, 2006; DeLucia & Pitts, 2006; Grimm, 2007; McArdle & Nesselroade, 2003; Willett, 1989). Such trajectories are uniformly modeled as a noisy deterministic process. This is done by entering a polynomial function of time (e.g., linear, quadratic) as a predictor of the response. Although this is not widely recognized in psychology, trajectories can also result, at least partially, from a nondeterministic or stochastic process. Suppose the trends in the trajectories in Figure 1 reflect the functioning of a stochastic process. Then using a random coefficient model to represent the dependencies in the longitudinal data will almost certainly yield spurious results and inferences. The purpose of this presentation is threefold:

- to highlight the existence of this statistical and inferential problem;
- to evaluate its magnitude; and
- to provide recommendations for avoiding spurious inference when using random coefficient models to analyze longitudinal data.

The Problem Presented

The heterogeneous trajectories depicted in Figure 1 result from regressing a dependent variable, Y, onto an index variable, Time. This can be done either separately for each individual or jointly using random coefficient modeling. Random coefficient modeling is used here to represent the variances and covariances in such longitudinal data. The recommended analytic approach is to fit a sequence of increasingly complex models (e.g., Singer & Willett, 2003):

1. The initial model, termed the unconditional means model, evaluates the proportion of variance due to clustering of observations within individuals. It also serves as a baseline for evaluating improvements in model fit achieved by more complex models.
2. The unconditional growth model, which is more complex, is then used to evaluate the variability of intercepts and slopes.
3. Next, conditional growth models may be fit that incorporate predictors of the observed heterogeneity.

Figure 1. Regression lines fit to 50 simulated longitudinal trajectories representative of psychological data (axes: Time, Y).

We wanted to understand the extent to which these modeling recommendations are implemented in empirical research. To do so, we reviewed the modeling practices reported in five leading American Psychological Association journals from January 2006 to August 2010:

- Developmental Psychology
- Journal of Applied Psychology
- Journal of Consulting and Clinical Psychology
- Journal of Educational Psychology
- Journal of Personality and Social Psychology

In this time span, researchers in 73 journal articles reported using either random coefficient or latent growth curve models to analyze longitudinal data. Interestingly, researchers in only 12 (i.e., less than 17%) of these articles reported examining the unconditional means model. This is surprising because knowledge of the relative magnitudes of between- and within-person variance can productively inform inference in these models. In contrast, researchers in 55 (i.e., approximately 75%) of the articles reported and interpreted the unconditional growth model. These unconditional models are the recommended analytic procedure, and unconditional growth models are analyzed very frequently. We therefore first examine the impact of stochastic processes on the unconditional models. We then turn to the impact of stochastic processes on conditional growth models.
Unconditional Means Model

Using the notation presented in Singer and Willett (2003), the unconditional means model is specified as

$$Y_{ij} = \pi_{0i} + \varepsilon_{ij}, \qquad \pi_{0i} = \gamma_{00} + \zeta_{0i}, \tag{1}$$

where $\varepsilon_{ij} \sim N(0, \sigma^2_\varepsilon)$ and $\zeta_{0i} \sim N(0, \sigma^2_0)$. The terms are defined as follows:

- $Y_{ij}$ is the dependent variable measured for person $i$ on occasion $j$.
- $\pi_{0i}$ is the mean of Y for individual $i$.
- $\gamma_{00}$ is the mean of Y across everyone in the population.
- $\varepsilon_{ij}$ is the residual for individual $i$ on occasion $j$.
- $\sigma^2_\varepsilon$ is the pooled within-person variance of each individual's data around his or her mean.
- $\zeta_{0i}$ is the random effect for individual $i$ (i.e., the deviation of the person-specific mean from the grand mean).
- $\sigma^2_0$ is the random effect variance.

The unconditional means model splits the total variance into within- and between-person variance. For the data presented in Figure 1, the within-person variance, $\hat\sigma^2_\varepsilon$, is 1.08, and the between-person variance, $\hat\sigma^2_0$, is 2.09. An intraclass correlation coefficient, ICC(1), may be computed from these variance component estimates. It indicates the proportion of total variance due to individual differences. For these data,

$$\text{ICC}(1) = \frac{\hat\sigma^2_0}{\hat\sigma^2_0 + \hat\sigma^2_\varepsilon} = \frac{2.09}{2.09 + 1.08} \approx .66.$$

That is, 66% of the total variance resides between individuals. A $\chi^2$ test may be used to evaluate whether the estimated between-person variance, $\hat\sigma^2_0$, differs from zero (Raudenbush & Bryk, 2002). In this example, the test, with 49 degrees of freedom (N = 50), is significant, p < .05. Thus, there is statistically significant between-person variation. This result suggests that it is reasonable to examine predictors that may explain the observed between-person variability. Another useful statistic for examining variability is the reliability of the sample means. It indicates how much of the between-person variation in observed means is due to variability in true means. This index is computed like ICC(1), except that the within-person variance is first divided by the number of time periods observed. If the number of measurement points is the same for each individual, this estimate is referred to as ICC(2) (Bliese, 2000).
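A minimal sketch in Python, which we chose for illustration (the article itself reports no code), computes ICC(1) and the reliability of the person means directly from the two variance components. The helper names `icc1` and `reliability_of_means` are hypothetical. The inputs 2.09 and 1.08 are the estimates reported above for the Figure 1 data. Five occasions per person is an assumption made only for illustration, because the number of occasions in the Figure 1 data is not stated here.

```python
def icc1(between_var: float, within_var: float) -> float:
    """Proportion of total variance lying between persons."""
    return between_var / (between_var + within_var)


def reliability_of_means(between_var: float, within_var: float, n_occasions: int) -> float:
    """Like ICC(1), but the within-person variance is first divided by the
    number of occasions (the ICC(2) form for balanced designs)."""
    return between_var / (between_var + within_var / n_occasions)


# Variance components reported in the text for the Figure 1 data.
BETWEEN = 2.09
WITHIN = 1.08
N_OCCASIONS = 5  # hypothetical: number of occasions per person is assumed

print(f"ICC(1):      {icc1(BETWEEN, WITHIN):.2f}")
print(f"Reliability: {reliability_of_means(BETWEEN, WITHIN, N_OCCASIONS):.2f}")
```

ICC(1) prints as 0.66, matching the 66% reported above. The reliability value depends on the assumed number of occasions: dividing the within-person variance by a larger number pushes the ratio toward 1.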
In this example, the reliability of the sample means, $\hat\lambda_0$, indicates that about 90% of the between-person variation in observed means is variability in population means. In other words, a large proportion of the observed mean differences between individuals reflects differences in the underlying population means. Thus, the unconditional means model indicates three things. The majority of variance is between-person. The between-person variation is significant. The variability in sample means largely reflects variability in population means.

Unconditional Growth Model

The unconditional growth model, as specified by Singer and Willett (2003), is

$$Y_{ij} = \pi_{0i} + \pi_{1i}\,\text{Time}_{ij} + \varepsilon_{ij}, \qquad \pi_{0i} = \gamma_{00} + \zeta_{0i}, \qquad \pi_{1i} = \gamma_{10} + \zeta_{1i}, \tag{2}$$

where

$$\varepsilon_{ij} \sim N(0, \sigma^2_\varepsilon), \qquad \begin{bmatrix}\zeta_{0i}\\ \zeta_{1i}\end{bmatrix} \sim N\!\left(\begin{bmatrix}0\\ 0\end{bmatrix}, \begin{bmatrix}\sigma^2_0 & \sigma_{01}\\ \sigma_{10} & \sigma^2_1\end{bmatrix}\right).$$

The terms are defined as follows:

- $\pi_{0i}$ is now the initial status (i.e., intercept) of Y for individual $i$.
- $\gamma_{00}$ is the average initial status of Y across everyone in the population.
- $\pi_{1i}$ is the rate of change (i.e., slope) of Y for individual $i$.
- $\gamma_{10}$ is the average rate of change of Y across everyone in the population.
- $\sigma^2_\varepsilon$ is the pooled variance of each individual's data around his or her linear change trajectory.
- $\zeta_{0i}$ is the intercept random effect for individual $i$.
- $\sigma^2_0$ is the variance of the intercept random effects.
- $\zeta_{1i}$ is the slope random effect for individual $i$.
- $\sigma^2_1$ is the variance of the slope random effects.
- $\sigma_{10}$ is the population covariance between intercepts and slopes.

All other terms are as defined above. We focus on the unconditional growth model most commonly applied in psychological research. Other unconditional growth models are possible, such as those that model nonlinear growth or make alternative assumptions about the random effects and error structures. A likelihood ratio test allows the researcher to evaluate whether the unconditional growth model fits the data better than the unconditional means model. For the data presented in Figure 1, the likelihood ratio test is significant, $\chi^2(3, N = 50) = 85.06$, p < .05. This indicates that the unconditional growth model represents the data better than the unconditional means model does. Following this omnibus evaluation, individual variance components are typically examined.
In the example data, both estimated variance components differ from zero:

- intercept variance: $\hat\sigma^2_0 = 1.3$, $\chi^2(49, N = 50) = 67.66$, p < .05;
- slope variance: $\hat\sigma^2_1 = 0.3$, $\chi^2(49, N = 50) = 74.48$, p < .05.

In other words, there is significant between-person heterogeneity in both intercepts and slopes. The estimated average intercept ($\hat\gamma_{00}$, p > .05) and slope ($\hat\gamma_{10}$, p > .05) do not differ from zero. The reliabilities of the observed intercepts ($\hat\lambda_0$) and slopes ($\hat\lambda_1 = 0.8$) indicate that a large proportion of the differences in sample intercepts and slopes are differences in population intercepts and slopes. Given this pattern of results, the analyst would likely add predictors to the model (i.e., fit conditional growth models). These predictors would aim to explain the observed variance in initial status (i.e., intercepts) and rate of change (i.e., slopes) across individuals. The random coefficient modeling process just described is representative of the hundreds of publications that now use this methodology. However, an important assumption concerning the source of the modeled heterogeneity has received virtually no attention. In reality, the data presented in Figure 1 were generated from a completely random process with no individual differences in slope trajectories. Yet the unconditional growth model indicated substantial slope heterogeneity across individuals. This result is entirely spurious. Given the data-generating mechanism, any observed variance in slopes is due solely to sampling error. It does not reflect actual variance in population slopes, as the slope reliability statistic suggests. The recommended next step in this modeling effort would be to search for predictors of the observed heterogeneity.
However, given the data-generating mechanism responsible for the observed heterogeneity, this search would be erroneous. Any predictors found to significantly reduce the observed heterogeneity must be Type I errors. Thus, random coefficient models may yield spurious results supporting incorrect inferences when applied to completely random longitudinal processes of a certain type. The specific nature of this stochastic process is described in the following section.

Random Walks and Spurious Regression

Random walks are among the most commonly encountered and studied stochastic processes. They are prevalent in virtually every scientific discipline, including:

- computer science models of information search (Tang, Jin, & Zhang, 2008);
- physics models of Brownian motion (Uhlenbeck & Ornstein, 1930);
- genetics models of genetic drift (Wright, 1931);
- ecological models of biodiffusion (Skellam, 1951) and population dynamics (Wang & Getz, 2007); and
- economic models of real gross national product and employment (Nelson & Plosser, 1982).

In psychology, random walks are fundamental to the study of:

- neuronal firing (Gerstein & Mandelbrot, 1964);
- speeded categorization (Nosofsky & Palmeri, 1997);
- diffusion models of decision processes (Busemeyer & Townsend, 1993); and
- consumer behavior such as new product adoption (Eliashberg & Chatterjee, 1986).

The trajectories in Figure 1 are regression lines fitted to realizations of an underlying random walk (with drift) described by

$$Y_t = \mu + Y_{t-1} + \varepsilon_t, \tag{3}$$

where $Y_t$ is the current status on a variable and $Y_{t-1}$ is the status at time $t - 1$. The constant $\mu$ is known as the drift; it was set to 0 for the data in Figure 1. The term $\varepsilon_t$ is a series with mean zero and constant variance $\sigma^2_\varepsilon$.¹ Note that this innovation variance is distinct from the residual variance that results from applying regression to random walks.
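Equation 3 can be simulated directly with a loop. The following is a minimal Python sketch, assuming normally distributed innovations; `simulate_random_walk` is a hypothetical helper name, not something from the article. The sketch also checks empirically that, with $Y_0$ fixed at 0, the cross-sectional variance at step $t$ grows like $t\sigma^2_\varepsilon$.

```python
import random
import statistics


def simulate_random_walk(n_steps, drift=0.0, sigma=1.0, y0=0.0, rng=random):
    """Return [Y_1, ..., Y_T] from Y_t = drift + Y_{t-1} + e_t (Equation 3),
    with independent innovations e_t ~ N(0, sigma^2)."""
    path = []
    y = y0
    for _ in range(n_steps):
        y = drift + y + rng.gauss(0.0, sigma)
        path.append(y)
    return path


rng = random.Random(1)
walks = [simulate_random_walk(20, rng=rng) for _ in range(4000)]

# Across many independent walks, the variance at step t should be close to t * sigma^2.
for t in (1, 5, 10, 20):
    values = [w[t - 1] for w in walks]
    print(t, round(statistics.pvariance(values), 2))
```

The printed variances should grow roughly in proportion to $t$ (about 1, 5, 10, and 20), up to simulation noise. Setting `sigma=0.0` isolates the deterministic drift component: for example, `simulate_random_walk(3, drift=0.5, sigma=0.0, y0=1.0)` returns `[1.5, 2.0, 2.5]`.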
At any time point, $t$, such a random walk has the following moments:

- its expected value is $Y_0 + \mu t$, where $Y_0$ is the initial value;
- its variance is $t\sigma^2_\varepsilon$;
- the elements of its covariance matrix, $\Sigma$, are $\Sigma_{j,k} = \min(j, k)\,\sigma^2_\varepsilon$.

Additional insight into the functioning of random walks is obtained from an equivalent representation of Equation 3:

$$Y_t = Y_0 + \mu t + \sum_{i=1}^{t} \varepsilon_i. \tag{4}$$

Starting with an initial value, $Y_0$, a random walk process accumulates two components: a deterministic trend component (i.e., $\mu t$) and error (i.e., $\sum_{i=1}^{t} \varepsilon_i$). The accumulation of random errors is what produces a stochastic trend. If $\mu = 0$, then the trends in the data are purely stochastic.

¹ The random walk model as presented here assumes no measurement error in Y. However, the substantive conclusions drawn in this article do not change when measurement error is included in Y.

Regression methods assume, explicitly or implicitly, that the observed trajectory trends result solely from a deterministic process (e.g., $Y_t = \alpha + \beta t + \varepsilon_t$). Unfortunately, it is difficult to visually distinguish trends resulting from deterministic and stochastic processes. As an example, Figure 2 plots a regression line on data generated with either of two mechanisms. One is a purely deterministic trend (i.e., $Y_t = \alpha + \beta t + \varepsilon_t$). The other is a purely stochastic trend (i.e., $Y_t = Y_0 + \sum_{i=1}^{t} \varepsilon_i$). These two very different mechanisms produce similar trajectories, which highlights the nature of the problem. It is difficult to determine whether observed trends reflect a deterministic or a stochastic process. The distinction is critical. It has long been recognized that applying regression models to data-generating processes that contain stochastic trends produces spurious results and inferences. Nelson and Kang (1984) demonstrated the pitfalls of applying the simple regression model (i.e., $Y_t = \alpha + \beta t + \varepsilon_t$) to data containing only stochastic trends (i.e., random walks with no drift, $Y_t = Y_0 + \sum_{i=1}^{t} \varepsilon_i$). They used both mathematical analysis and Monte Carlo simulations. They found that the deterministic time trend (Time) explained 44% of the variation in the random walk, even though the dependent variable, Y, did not in fact depend on Time. With 100 time periods and a nominal 5% significance level, the true null hypotheses $\alpha = 0$ and $\beta = 0$ were rejected in 80% and 87% of 1,000 replications, respectively.
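The spurious-regression experiment just described can be sketched as a small Monte Carlo. This is a minimal Python sketch and an approximation of the design, not Nelson and Kang's code. It makes the following assumptions:

- The walks are driftless, with standard normal innovations and $Y_0 = 0$.
- Only the naive OLS t test of the slope ($\beta = 0$) is examined.
- That test uses an approximate two-sided 5% critical value of 1.984 for 98 degrees of freedom.

The function names `ols_trend` and `spurious_trend_experiment` are hypothetical.

```python
import math
import random


def ols_trend(y):
    """Fit y_t = a + b*t by OLS for t = 1..T; return (a, b, se_b, r_squared)."""
    T = len(y)
    t = list(range(1, T + 1))
    t_bar = sum(t) / T
    y_bar = sum(y) / T
    s_tt = sum((ti - t_bar) ** 2 for ti in t)
    s_ty = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    b = s_ty / s_tt
    a = y_bar - b * t_bar
    sse = sum((yi - (a + b * ti)) ** 2 for ti, yi in zip(t, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    se_b = math.sqrt(sse / (T - 2) / s_tt)  # conventional OLS standard error
    return a, b, se_b, 1.0 - sse / sst


def spurious_trend_experiment(n_reps=1000, T=100, seed=1984):
    """Regress driftless random walks on time; return (rejection rate of the
    true null b = 0 using the naive t test, average R^2)."""
    rng = random.Random(seed)
    rejections = 0
    r2_total = 0.0
    for _ in range(n_reps):
        y, level = [], 0.0
        for _ in range(T):
            level += rng.gauss(0.0, 1.0)  # driftless random walk, Y_0 = 0
            y.append(level)
        _, b, se_b, r2 = ols_trend(y)
        rejections += abs(b / se_b) > 1.984  # approx. 5% two-sided critical t, df = 98
        r2_total += r2
    return rejections / n_reps, r2_total / n_reps


rate, mean_r2 = spurious_trend_experiment()
print(f"rejection rate of b = 0: {rate:.2f}; mean R^2: {mean_r2:.2f}")
```

With these settings the rejection rate should land far above the nominal 5%, in the general neighborhood of the figures Nelson and Kang reported. The exact value depends on the seed. The naive standard error ignores the strong serial dependence in the residuals, which is why it is far too small.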
Nelson and Kang also fit a regression model that accounted for the autocorrelation present in the data. That model still incorrectly indicated that Time explained 1% of the variation in the random walk. The true null hypotheses $\alpha = 0$ and $\beta = 0$ were rejected with 45% and 58% frequency, respectively. In both cases, then, the regression model performs poorly. It provides inaccurate fixed effects tests and leads to mistaken inferences more often than not. The spurious regression results come largely from the economics literature. They focus on a single longitudinal series, analogous to single-subject research in psychology. This literature does not address the situation most interesting to psychologists. There, a large set of individuals is observed over time, and inference is often directed at trajectory heterogeneity. The implications of the spurious regression results for random coefficient models are developed in the following section.

Spurious Random Coefficient Model Results

The standard fixed-effects regression models considered in the spurious regression literature possess a single probability distribution, associated with the errors, $\varepsilon_t$. Random coefficient models generalize the fixed-effects model by associating additional probability distributions with the model coefficients. In this case, a single regression represents a sampled realization of an infinite set of possible regressions. These regressions are consistent with the underlying coefficient probability distribution(s). A number of mathematical results are needed to understand what happens when a random coefficient model is applied to trajectories generated by a random walk process. Nelson and Kang (1984) provided some useful mathematics, but the bulk of their results are based on simulation. Durlauf and Phillips (1988) provided a rigorous mathematical foundation (i.e., the coefficient sampling distributions) for the spurious regression simulation results described in Nelson and Kang (1984).
Using this work, we discuss the expected random coefficient regression results when the unconditional means model and the unconditional growth model are fit to random walks (i.e., stochastic trends). For the unconditional growth model, the fixed effects estimates and tests are discussed first, followed by the variance component estimates and tests. To keep the presentation as simple as possible, the focus is on random walks with no drift. Unless otherwise noted, the results generalize straightforwardly to random walks with drift, using the mathematical results discussed in the next two sections.

Unconditional Means Model

When applied to longitudinal data, the primary purpose of the unconditional means model is to estimate the variance between and within individual trajectories. The resulting variance components are typically interpreted using the ICC(1). A surprising result occurs when the unconditional means model is used to summarize data generated by an underlying random walk process with no drift. As noted above, the variance of $Y_t$ for a set of random walk trajectories at any point in time is $t\sigma^2_\varepsilon$. Assume that all trajectories are of equal length. Then the total variance in a longitudinal sample of trajectories is simply the average of these values across time points:

$$\sigma^2_{Y_{tot}} = \frac{1}{T}\left(\sigma^2_\varepsilon + 2\sigma^2_\varepsilon + \cdots + T\sigma^2_\varepsilon\right) = \frac{T+1}{2}\,\sigma^2_\varepsilon, \tag{5}$$

where $T$ is the length of all trajectories. Nelson and Kang (1984) showed that the within-trajectory (i.e., within-individual) variance in the sample is $\frac{T+1}{6}\sigma^2_\varepsilon$. Subtracting the within-trajectory variance from the total variance gives the between-trajectory variance, approximately $\frac{T+1}{3}\sigma^2_\varepsilon$. As a result, the ICC(1) is closely approximated by

$$\text{ICC}(1) \approx \frac{\frac{T+1}{3}\sigma^2_\varepsilon}{\frac{T+1}{3}\sigma^2_\varepsilon + \frac{T+1}{6}\sigma^2_\varepsilon} = \frac{2}{3}. \tag{6}$$

² Terms such as random, disturbance, and stochastic process may need some clarification. A disturbance is a random realization from a distribution of possible values. A stochastic process is the evolution of a trajectory subject to disturbances at each point in time (Basu, 2003).
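The decomposition behind Equations 5 and 6 can be checked by simulation. The following is a minimal Python sketch, assuming driftless walks with $Y_0 = 0$ and $\sigma^2_\varepsilon = 1$. It uses ANOVA-type moment estimators for a balanced design in place of the likelihood-based estimation a random coefficient program would perform. The function name `within_between` is hypothetical.

```python
import random


def within_between(walks):
    """ANOVA-type moment estimates for a balanced design.

    within  = pooled within-person variance (divisor T - 1 per person)
    between = variance of person means (divisor n - 1) minus within / T
    """
    n, T = len(walks), len(walks[0])
    means = [sum(w) / T for w in walks]
    grand = sum(means) / n
    within = sum(sum((y - m) ** 2 for y in w) for w, m in zip(walks, means)) / (n * (T - 1))
    var_means = sum((m - grand) ** 2 for m in means) / (n - 1)
    return within, var_means - within / T


T, N = 20, 3000
rng = random.Random(7)
walks = []
for _ in range(N):
    level, path = 0.0, []
    for _ in range(T):
        level += rng.gauss(0.0, 1.0)  # driftless walk, Y_0 = 0, sigma^2 = 1
        path.append(level)
    walks.append(path)

within, between = within_between(walks)
print(f"within  {within:.2f}  vs (T+1)/6 = {(T + 1) / 6:.2f}")
print(f"between {between:.2f}  vs (T+1)/3 = {(T + 1) / 3:.2f}")
print(f"ICC(1)  {between / (between + within):.3f}  vs 2/3")
```

With the $T - 1$ divisor, the expected pooled within-person variance is exactly $(T+1)\sigma^2_\varepsilon/6$. The moment estimate of the between-person variance then has expectation $(T+1)\sigma^2_\varepsilon/3$. That is why the simulated ICC(1) should sit near 2/3 regardless of $T$.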

Figure 2. Modeling deterministic and stochastic trends with a regression line (panels: Deterministic Trend, Stochastic Trend; axes: Time, Response).

Equation 6 implies that, for trajectories generated by a random walk with no drift, the ICC(1) of the unconditional means model will always be approximately 2/3 (≈ .67). This value could serve as a diagnostic for the inappropriate application of a random coefficient model to random walks. Unfortunately, more complex random walk processes, such as random walks with drift, produce different values of ICC(1). Using the same variance components, the reliability of the sample means is closely approximated by

$$\lambda_0 \approx \frac{\frac{T+1}{3}\sigma^2_\varepsilon}{\frac{T+1}{3}\sigma^2_\varepsilon + \frac{T+1}{6T}\sigma^2_\varepsilon} = \frac{2T}{2T+1}. \tag{7}$$

Therefore, the reliability of the sample means approaches 1 as the length of the trajectories increases.

Unconditional Growth Model

Fixed effects: Estimates. Durlauf and Phillips (1988) derived the asymptotic theory for applying simple regression (i.e., $Y_t = \alpha + \beta t + \varepsilon_t$) to random walks with no drift (i.e., $Y_t = Y_0 + \sum_{i=1}^{t}\varepsilon_i$). Here, the initial condition, $Y_0$, and the errors, $\varepsilon_t$, of the random walks are assumed to be realizations from a normal distribution. That distribution has a mean of zero and a constant variance, $\sigma^2_\varepsilon$. In this case, Durlauf and Phillips showed that, for simple regression applied to random walks, the expected value of both the intercept and the slope is 0. Therefore, the estimates of the intercept (i.e., $\hat\alpha$) and slope (i.e., $\hat\beta$) are unbiased. On average, they will equal the true intercept (i.e., 0) and slope (i.e., 0) of the data-generating mechanism, which are the initial condition and drift of the random walk. The simulation results in Nelson and Kang (1984) support this mathematical result. There, the average intercept ($\hat\alpha$) and slope ($\hat\beta$) estimates across 1,000 replications were effectively 0.
Thus, we expect that, on average, the fixed effects in the unconditional growth model, $\hat\gamma_{00}$ and $\hat\gamma_{10}$, will equal the true fixed effects, $\gamma_{00}$ and $\gamma_{10}$, which are both 0.

Fixed effects: Tests. Nelson and Kang (1984) found that the statistical tests on the fixed effects in the simple regression model did not perform well. At a 5% significance level, the true null hypotheses $\alpha = 0$ and $\beta = 0$ were rejected 80% and 87% of the time, respectively. Durlauf and Phillips (1988) found that the test statistics for both the intercept and slope diverge as the number of time periods increases. These results occur because the standard errors for both the intercept and slope greatly underestimate the actual standard deviations of their sampling distributions. However, as we show in Appendix A, the unconditional growth model behaves differently. Its fixed-effect standard errors closely approximate the standard deviations of the sampling distributions of $\gamma_{00}$ and $\gamma_{10}$. Therefore, we expect that the tests on the fixed effects in the unconditional growth model will perform well and not exceed the nominal Type I error rate.

Variance components: Estimates. The variance of the single time series regression parameters is equal to the variance of intercepts and slopes in the unconditional growth model. Durlauf and Phillips (1988) provided the needed distributional theory. They showed that the variances of the intercepts and slopes are, respectively,

$$\sigma^2_0 \approx \frac{2T}{15}\,\sigma^2_\varepsilon, \tag{8}$$

$$\sigma^2_1 \approx \frac{6}{5T}\,\sigma^2_\varepsilon. \tag{9}$$

The simulation results in Nelson and Kang (1984) are consistent with these values. Therefore, as the number of time periods increases, the variance of the intercepts grows without bound, whereas the variance of the slopes approaches zero. Suppose the initial conditions of the random walks vary randomly and the variance of the drift is set to zero. Then the intercept variance is either under- or overestimated, and the slope variance is correct only when many time periods are observed.

Variance components: Tests. Raudenbush and Bryk (2002) discuss single-parameter tests for the variance components $\sigma^2_0$ and $\sigma^2_1$. Appendix B discusses how these tests behave, given two inputs: the variance estimates from Durlauf and Phillips (1988) and the residual variance estimate from Nelson and Kang (1984). When the unconditional growth model is fit to random walks, the variance component test statistics are effectively increasing functions of the number of time periods and individuals sampled. This is not surprising for the variance of intercepts, because the intercept variance increases without bound as the number of time periods observed increases. The slope result is more surprising. The variance of slopes decreases as the number of time periods increases, yet the test on the slope variance diverges. The test would thus lead researchers to reject the null hypothesis of no slope variance even when the estimated slope variance is very small. This is exactly the situation when the number of time periods is large.

Model fit statistics. A number of additional statistics are often reported for the unconditional growth model to evaluate model fit. This section presents the implications for the reliability of intercepts and slopes, the likelihood ratio test, and the pseudo-$R^2$. The problem examined here is fundamentally one of model misspecification. As a result, virtually all known model fit indices yield similarly inaccurate results.
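Equations 8 and 9 can be checked by simulation. The following is a minimal Python sketch that fits a separate OLS line to each simulated walk and compares the spread of the intercepts and slopes with $2T/15$ and $6/(5T)$. It makes these assumptions:

- The walks are driftless, with $Y_0 = 0$ and $\sigma^2_\varepsilon = 1$.
- Time is coded $1, \ldots, T$.
- The sample variance of per-walk OLS estimates stands in for the model-based variance components.

The helper names `ols_intercept_slope` and `coefficient_variances` are hypothetical.

```python
import random
import statistics


def ols_intercept_slope(y):
    """OLS of y_t on (1, t) for t = 1..T; returns (intercept, slope)."""
    T = len(y)
    t_bar = (T + 1) / 2
    y_bar = sum(y) / T
    s_tt = sum((t - t_bar) ** 2 for t in range(1, T + 1))
    s_ty = sum((t - t_bar) * (yt - y_bar) for t, yt in enumerate(y, start=1))
    slope = s_ty / s_tt
    return y_bar - slope * t_bar, slope


def coefficient_variances(T, n_walks=2000, seed=88):
    """Variance across walks of the per-walk OLS intercepts and slopes."""
    rng = random.Random(seed)
    intercepts, slopes = [], []
    for _ in range(n_walks):
        level, y = 0.0, []
        for _ in range(T):
            level += rng.gauss(0.0, 1.0)  # driftless random walk step
            y.append(level)
        a, b = ols_intercept_slope(y)
        intercepts.append(a)
        slopes.append(b)
    return statistics.variance(intercepts), statistics.variance(slopes)


for T in (20, 100):
    var_a, var_b = coefficient_variances(T)
    print(f"T={T:3d}  var(intercept)={var_a:6.2f} (2T/15={2 * T / 15:6.2f})"
          f"  var(slope)={var_b:.4f} (6/(5T)={6 / (5 * T):.4f})")
```

Equations 8 and 9 are asymptotic approximations. The slope variance agrees closely even at T = 20. The intercept variance runs noticeably above $2T/15$ at T = 20 and moves closer to it at T = 100. Either way, the intercept variance grows with $T$ while the slope variance shrinks, as the text states.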
For the unconditional growth model, the reliability of intercepts is approximately

$$\lambda_0 \approx \frac{\frac{2T}{15}\sigma^2_\varepsilon}{\frac{2T}{15}\sigma^2_\varepsilon + \frac{4T-2}{15T}\sigma^2_\varepsilon}. \tag{10}$$

The first term in the denominator rapidly dominates the second as T increases. As a result, the reliability of intercepts approaches 1.0. The reliability of slopes is approximately

$$\lambda_1 \approx \frac{\frac{6}{5T}\sigma^2_\varepsilon}{\frac{6}{5T}\sigma^2_\varepsilon + \frac{4}{5T(T-1)}\sigma^2_\varepsilon}. \tag{11}$$

As with the reliability of intercepts, the first term in the denominator rapidly dominates the second as T increases. Thus the reliability of slopes also approaches 1.0. The likelihood ratio chi-square test ($\chi^2$) is used to evaluate whether the unconditional growth model improves model fit relative to the unconditional means model. If the trajectory trends are purely stochastic, then a deterministic Time trend has no impact on the response variable. In this case, the null hypothesis associated with the test is true: the unconditional growth model should not improve model fit relative to the unconditional means model. The test should therefore maintain the stated alpha rate. As discussed above, however, the test statistic for the slope variance increases without bound as the number of time periods increases. So the likelihood ratio test will fail to maintain the selected Type I error rate. It is increasingly common to report a pseudo-$R^2$ as an effect size index for random coefficient models, despite many ambiguities and inconsistencies. One approach to this index (e.g., Snijders & Bosker, 1999) compares two error variances: that of the unconditional means model ($\sigma^2_{\varepsilon 1}$) and that of the unconditional growth model ($\sigma^2_{\varepsilon 2}$). For random walks with no drift, the expected value is

$$\text{pseudo-}R^2_{\text{Within}} = \frac{\sigma^2_{\varepsilon 1} - \sigma^2_{\varepsilon 2}}{\sigma^2_{\varepsilon 1}} \approx \frac{\frac{T+1}{6}\sigma^2_\varepsilon - \frac{T+1}{15}\sigma^2_\varepsilon}{\frac{T+1}{6}\sigma^2_\varepsilon} = \frac{3}{5}, \tag{12}$$

which estimates the amount of within-person variance in the response variable explained by Time.
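The closed-form approximations in Equations 7, 11, and 12 can be tabulated for any trajectory length. The following is a minimal Python sketch, using exact rational arithmetic so that the limiting values are visible. Everything is in units of $\sigma^2_\varepsilon$, which cancels in each ratio. The function names are hypothetical, and Equation 10 is deliberately left out of the sketch.

```python
from fractions import Fraction


def reliability_means(T):
    """Equation 7: lambda_0 ~= 2T / (2T + 1) for the unconditional means model."""
    return Fraction(2 * T, 2 * T + 1)


def reliability_slopes(T):
    """Equation 11: (6/(5T)) / (6/(5T) + 4/(5T(T-1))), which simplifies to 3(T-1)/(3T-1)."""
    tau1 = Fraction(6, 5 * T)
    sampling_var = Fraction(4, 5 * T * (T - 1))
    return tau1 / (tau1 + sampling_var)


def pseudo_r2_within(T):
    """Equation 12: residual variance falls from (T+1)/6 to (T+1)/15."""
    s1 = Fraction(T + 1, 6)
    s2 = Fraction(T + 1, 15)
    return (s1 - s2) / s1


for T in (5, 20, 100):
    print(T, float(reliability_means(T)), float(reliability_slopes(T)), pseudo_r2_within(T))
```

Both reliabilities climb toward 1 as T grows (for example, 10/11 and 6/7 at T = 5). The pseudo-$R^2$ is exactly 3/5 for every T, which is the 60% figure discussed next.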
Although Y is not a function of Time in the data-generating mechanism, the pseudo-$R^2$ statistic indicates that Time is responsible for 60% of the within-trajectory variance.

Summary of Mathematical Results

To summarize, consider a random coefficient model used to represent trends that are due, at least in part, to a stochastic process. The fixed effects estimates in the unconditional growth model are unbiased, and the tests on the fixed effects maintain nominal alpha levels. In contrast, the estimates of the intercept and slope variances will largely be inaccurate. Their tests will lead researchers to conclude that there is significant variance in intercepts and slopes. In addition, the model fit indices will indicate that the unconditional growth model explains the observed trajectories better than the unconditional means model. These results would likely prompt an inappropriate search for predictors of individual differences in trajectories. Because random walks produce random trends, any predictors found to be significant must be Type I errors.

Monte Carlo Evidence for Unconditional Growth Models

The mathematical results presented above are clear and general. We present a small set of Monte Carlo simulations for two purposes: to verify these results, and to estimate the magnitude of the problem under realistic conditions. Random walks were generated with the statistical software R (R Development Core Team, 2008) using Equation 3, as follows:

- The drift parameter was set to zero.
- Each $\varepsilon_t$, including the initial value (i.e., $Y_0$), was sampled from a standard normal distribution (mean 0, variance 1.0).

Consequently, the initial conditions of the trajectories (i.e., intercepts in regression terminology) have an expected value of zero and a variance of one. The drift (i.e., slope in regression terminology) has an expected value and variance of zero. Four simulated conditions were examined for both the unconditional means and growth models. These conditions crossed the number of time periods (five or 20) with the number of individuals (50 or 100). These values are consistent with the sample sizes and lengths of longitudinal research designs in psychology. For each combination of simulation parameters, 1,000 data sets were generated, and both the unconditional means and growth models were fit to each data set. Table 1 presents the simulation results for the unconditional means model, and Table 2 presents the results for the unconditional growth model.

Unconditional Means Model

Consider 50 random walks (e.g., individuals) observed over five time points, with the unconditional means model fit to them. The rejection rate of the true null hypothesis for the average fixed effect, $\gamma_{00} = 0$, is 6.7% at the 5% nominal significance level. Thus, the observed rejection rate approximates the nominal significance level. However, the average variance of the means, $\hat\sigma^2_0$, is significant in every replication. The average reliability of the observed means implies that 90% of the interindividual differences in observed means are interindividual differences in true (population) means. Thus, a researcher may conclude not only that there is significant variation in observed means but also that this variation captures a large proportion of the variance in true means. As expected, the average ICC(1) is 0.66. Researchers would therefore conclude that 66% of the variance in the dependent variable is attributable to between-person variance.
Examining the other simulations in Table 1, in which the number of time periods or the sample size increases, the average grand mean is effectively zero, the rejection rate maintains the nominal 5% significance level, the variance of the means is always significant, the reliability of the observed means increases as the number of time periods increases, and the average ICC(1) for each condition is reported in Table 1. These simulations of fitting the unconditional means model to random walks suggest that researchers would conclude that there is heterogeneity in observed means and that much of this heterogeneity is variability in true means.

Unconditional Growth Model

Now consider fitting the unconditional growth model to 50 random walks observed over five time points. The rejection rates of the true null hypotheses, $H_0\colon \gamma_{00} = 0$ and $H_0\colon \gamma_{10} = 0$, are 6.8% and 5.4%, respectively, at a 5% Type I error rate ($\alpha = .05$); the average fixed effect estimates $\hat{\gamma}_{00}$ and $\hat{\gamma}_{10}$ are given in Table 2. Thus, the observed rejection rates stay close to the nominal rate because the standard errors of the fixed effects closely approximate their empirical standard deviations. The average variance of the intercepts is $\hat{\tau}_0^2 = 0.95$, the average variance of the slopes, $\hat{\tau}_1^2$, is given in Table 2, and both variances are significant in every replication. Both variance component estimates are incorrect, because the variance of the intercepts was set to one and the variance of the slopes was set to zero. That the intercept variance is close to one in this condition is clearly due to the particular number of time periods used, and, as can be seen in Table 2, more observations produce clearly inaccurate results. The average reliabilities of the observed intercepts, $\hat{\lambda}_0$, and slopes, $\hat{\lambda}_1$, imply that about 80% of the interindividual differences in observed intercepts and slopes reflect interindividual differences in true (population) intercepts and slopes. Thus, researchers would conclude that much of the variation in observed intercepts and slopes is variability in true intercepts and slopes. However, as previously mentioned, there is no variability in slopes.
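Why the slope variance appears nonzero can be seen by fitting an ordinary least squares line to each simulated walk separately and looking at the spread of the person-specific slopes. The sketch below does this with the Python standard library; `ols_line` is a hypothetical helper, and per-person OLS is a simplified stand-in for the random coefficient model, whose slope-variance estimate should be somewhat smaller because it removes sampling error. By our own calculation, for five equally spaced occasions and zero drift, the variance of these OLS slopes is 0.26 in expectation, even though every true slope is zero.

```python
import random
import statistics


def random_walk(T, rng):
    """Zero-drift random walk with Y_0 ~ N(0, 1) and N(0, 1) increments."""
    y = [rng.gauss(0.0, 1.0)]
    for _ in range(T - 1):
        y.append(y[-1] + rng.gauss(0.0, 1.0))
    return y


def ols_line(y):
    """OLS intercept and slope of y regressed on Time = 0, 1, ..., T-1."""
    T = len(y)
    t_bar = (T - 1) / 2
    y_bar = statistics.fmean(y)
    sxx = sum((t - t_bar) ** 2 for t in range(T))
    sxy = sum((t - t_bar) * (v - y_bar) for t, v in enumerate(y))
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope


if __name__ == "__main__":
    rng = random.Random(7)
    slopes = [ols_line(random_walk(5, rng))[1] for _ in range(2000)]
    print("mean slope:", round(statistics.fmean(slopes), 3))         # close to 0
    print("slope variance:", round(statistics.variance(slopes), 3))  # close to 0.26
```

The mean slope is near zero, matching the accurate fixed-effect tests, while the slopes themselves spread out substantially: exactly the pattern that makes the slope variance look real.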
The likelihood-ratio test comparing the fit of the unconditional growth model with that of the unconditional means model (N = 50) was significant in every replication. Based on this result, the natural but inaccurate conclusion would be that the unconditional growth model represents the data better than the unconditional means model. This result is largely due to the significance of the slope variance, which, again, does not exist in the data-generating mechanism. Together, these results indicate that random coefficient models will lead researchers to conclude that there is between-person variation in intercepts and slopes, that this observed variation largely represents true variation, and, ultimately, that a search for predictors of the observed heterogeneity is warranted. Given the data-generating mechanism, any predictors identified as significant must be Type I errors.

Examining the other simulations presented in Table 2, as the number of time periods or the sample size increases, the average intercepts and slopes are all close to the true value of zero, and the observed rejection rates are close to the nominal 5% significance level. The variance in intercepts increases, whereas the variance in slopes decreases, as the number of observed time periods increases, and both approach the expected values from Equations 8 and 9. The variances of intercepts and slopes are significant in every replication, the reliabilities of the observed intercepts and slopes increase as the number of time points increases, and the likelihood-ratio test is significant in every replication. All of these results follow expectations: regardless of the size of the longitudinal data set, statistics from random coefficient models would mislead researchers into believing that much of the significant variation in intercepts and slopes is attributable to variation in true intercepts and slopes. However, the true slope for each person in every simulation and replication is zero, so there is no variability in true slopes.

Table 1
Random Coefficient Model Parameter Estimates and Tests for the Unconditional Means Model Across 1,000 Replications
Note. N = number of individuals; T = length of each time series; $\hat{\gamma}_{00}$ = average estimate of the grand mean; $S_{\gamma_{00}}$ = standard deviation of the grand-mean estimates across replications; $SE_{\gamma_{00}}$ = standard error of the grand mean; $p_{\gamma_{00}}$ = rejection rate of the hypothesis test on the grand mean; $\hat{\tau}_0^2$ = average variance of individual means; $p_{\tau_0^2}$ = rejection rate of the hypothesis test on the variance of individual means; $\hat{\sigma}_\varepsilon^2$ = average within-person variance; ICC(1) = intraclass correlation coefficient; $\hat{\lambda}_0$ = average reliability of sample means.

Table 2
Random Coefficient Model Parameter Estimates and Tests for the Unconditional Growth Model Across 1,000 Replications
Column groups: fixed effects, variance components, and model statistics.
Note. N = number of individuals; T = length of each time series; $\hat{\gamma}_{00}$ = average estimate of the population intercept; $S_{\gamma_{00}}$ = standard deviation of the intercept estimates across replications; $SE_{\gamma_{00}}$ = standard error of the intercept estimate; $p_{\gamma_{00}}$ = rejection rate of the hypothesis test on the population intercept; $\hat{\gamma}_{10}$ = average estimate of the population slope; $S_{\gamma_{10}}$ = standard deviation of the slope estimates across replications; $SE_{\gamma_{10}}$ = standard error of the slope estimate; $p_{\gamma_{10}}$ = rejection rate of the hypothesis test on the population slope; $\hat{\tau}_0^2$ = average variance of intercepts; $p_{\tau_0^2}$ = rejection rate of the hypothesis test on the variance of intercepts; $\hat{\tau}_1^2$ = average variance of slopes; $p_{\tau_1^2}$ = rejection rate of the hypothesis test on the variance of slopes; $\hat{\sigma}_\varepsilon^2$ = average within-person variance around the linear trajectories; $\hat{\rho}_{01}$ = average correlation between intercepts and slopes; $p_{\rho_{01}}$ = rejection rate of the hypothesis test on the correlation between intercepts and slopes; $\hat{\lambda}_0$ = average reliability of sample intercepts; $\hat{\lambda}_1$ = average reliability of sample slopes; $p_{\chi^2}$ = rejection rate of the hypothesis test on the difference in model fit between the unconditional means and unconditional growth models; pseudo-$R^2_w$ = proportion of within-person variance explained by Time.
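The pseudo-$R^2_w$ column of Table 2 can be approximated in the same spirit. The sketch below, again assuming per-person OLS as a simplified stand-in for the fitted random coefficient models, pools the within-person residual variance around each person's mean (unconditional means model) and around each person's own line (unconditional growth model) and reports the proportional reduction. The function name `pseudo_r2_within` is hypothetical. For five occasions, our own calculation gives an expected value of about 0.53, so Time appears to explain roughly half of the within-person variance even though $Y$ is not a function of Time.

```python
import random
import statistics


def random_walk(T, rng):
    """Zero-drift random walk with Y_0 ~ N(0, 1) and N(0, 1) increments."""
    y = [rng.gauss(0.0, 1.0)]
    for _ in range(T - 1):
        y.append(y[-1] + rng.gauss(0.0, 1.0))
    return y


def pseudo_r2_within(series):
    """Proportional reduction in pooled within-person residual variance when each
    person's mean (T - 1 df) is replaced by a person-specific line (T - 2 df)."""
    ss_mean = ss_line = 0.0
    df_mean = df_line = 0
    for y in series:
        T = len(y)
        t_bar = (T - 1) / 2
        y_bar = statistics.fmean(y)
        sxx = sum((t - t_bar) ** 2 for t in range(T))
        slope = sum((t - t_bar) * (v - y_bar) for t, v in enumerate(y)) / sxx
        ss_mean += sum((v - y_bar) ** 2 for v in y)
        ss_line += sum((v - y_bar - slope * (t - t_bar)) ** 2 for t, v in enumerate(y))
        df_mean += T - 1
        df_line += T - 2
    var_mean = ss_mean / df_mean
    var_line = ss_line / df_line
    return (var_mean - var_line) / var_mean


if __name__ == "__main__":
    rng = random.Random(42)
    walks = [random_walk(5, rng) for _ in range(2000)]
    print(round(pseudo_r2_within(walks), 3))  # roughly one half for T = 5
```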
The observed variance in intercepts and slopes would lead researchers to search for explanatory variables that account for this variance. Consistent with this conclusion, researchers in 46 of 73 (i.e., approximately 63%) journal articles in our literature review used the tests on variance components in their models to justify the search for predictors of intercept or slope heterogeneity. If the underlying data-generating mechanism is random, as is the case here, this search can only identify predictors of heterogeneity that are entirely spurious.

Monte Carlo Evidence for Conditional Growth Models

The simulation results presented above highlight the consequences of fitting unconditional models to a stochastic process: the variance component estimates and tests are inaccurate, whereas the fixed effect estimates and tests are accurate. Investigations of growth typically focus on predictors of heterogeneity in the random effects, and these potential predictors are modeled as fixed effects in random coefficient and latent growth curve models. That the fixed effects estimates and tests are well behaved in the results presented above may foster the mistaken belief that random coefficient and latent growth curve models protect against finding spurious predictors of random effect heterogeneity. The following simulations evaluate this belief. The focus here is on what happens to the tests on fixed effects and variance components when a Level 2 predictor is incorporated into the unconditional growth model. In our literature review of 73 journal articles using random coefficient or latent growth curve models, we identified two distinct approaches for incorporating predictors into the unconditional growth model. In 45 (i.e., approximately 62%) journal articles, researchers added a Level 2 predictor as a fixed effect to the unconditional growth model.
However, in the other 28 journal articles, researchers examined predictors of heterogeneity by including a Level 2 predictor as a fixed effect while excluding the random slope effects. To reflect this practice, we examined four new simulated conditions in which stochastic processes were created as described in the previous section, with the same time (five or 20) and sample size (50 or 100) conditions. Two conditional growth models were then fit to the resulting data by including a Level 2 predictor in Equation 2 as a predictor of both intercepts and slopes. The random effect of slopes was either included (i.e., $\beta_{1i} = \gamma_{10} + \gamma_{11} P_i + u_{1i}$, where $P$ denotes the predictor) or excluded (i.e., $\beta_{1i} = \gamma_{10} + \gamma_{11} P_i$) in the conditional growth model. Without loss of generality with respect to continuous predictors, values of 0 or 1 were randomly assigned with equal probability to each random walk; in substantive research, such values may represent dichotomous predictors such as gender, race, or experimental condition.

Conditional Growth Model Without a Random Effect for Slopes

In this model, there are four fixed effects and a random effect for intercepts. Focusing on the accuracy of the tests on fixed effects, the results in Table 3 indicate that when a conditional model without a random effect for slopes is fit to random walks, the rejection rates for all four fixed effects depart markedly from the nominal 5% significance level. Bradley (1978) proposed a liberal and a stringent criterion for the robustness of a statistical test. According to his liberal criterion, a test is robust if the probability of a Type I error lies between $0.5\alpha$ (i.e., 0.025) and $1.5\alpha$ (i.e., 0.075). The rejection rates for the average slope ($\gamma_{10}$) and the moderating effect ($\gamma_{11}$) of the Level 2 predictor are well above the upper limit of the liberal criterion, and these two tests become less accurate with longer time series.
On the other hand, the rejection rates for the average intercept ($\gamma_{00}$) and the main effect ($\gamma_{01}$) of the Level 2 predictor are well below the lower limit of the liberal criterion and similarly become less accurate with longer time series. These results are particularly interesting when one considers that, in the actual data-generating mechanism, there are no differences in slopes because every random walk is generated without drift. Although this conditional growth model correctly specifies that there is no variance in slopes, it suffers from highly inflated Type I error rates for the fixed effects tests related to slopes.

Table 3
Random Coefficient Model Parameter Estimates and Tests for the Conditional Growth Model Without a Random Effect for Slopes Across 1,000 Replications
Column groups: fixed effects and variance components.
Note. N = number of individuals; T = length of each time series; $\hat{\gamma}_{00}$ = average estimate of the population intercept; $SE_{\gamma_{00}}$ = standard error of the intercept estimate; $p_{\gamma_{00}}$ = rejection rate of the hypothesis test on the population intercept; $\hat{\gamma}_{01}$ = estimate of the main effect of the Level 2 predictor; $SE_{\gamma_{01}}$ = standard error of the main effect of the Level 2 predictor; $p_{\gamma_{01}}$ = rejection rate of the hypothesis test on the main effect of the Level 2 predictor; $\hat{\gamma}_{10}$ = average estimate of the population slope; $SE_{\gamma_{10}}$ = standard error of the slope estimate; $p_{\gamma_{10}}$ = rejection rate of the hypothesis test on the population slope; $\hat{\gamma}_{11}$ = estimate of the moderating effect of the Level 2 predictor; $SE_{\gamma_{11}}$ = standard error of the moderating effect of the Level 2 predictor; $p_{\gamma_{11}}$ = rejection rate of the hypothesis test on the moderating effect of the Level 2 predictor; $\hat{\tau}_0^2$ = average variance of intercepts; $p_{\tau_0^2}$ = rejection rate of the hypothesis test on the variance of intercepts; $\hat{\sigma}_\varepsilon^2$ = average within-person variance around the linear trajectories.

Conditional Growth Model With a Random Effect for Slopes

When the conditional growth model includes the random effect for slopes, the results in Table 4 indicate that the previously problematic fixed effects tests on the average slope ($\gamma_{10}$) and the moderating effect ($\gamma_{11}$) of the Level 2 predictor are accurate. The previously conservative tests on the average intercept ($\gamma_{00}$) and the main effect ($\gamma_{01}$) of the Level 2 predictor are now at the 5% nominal significance level as well.
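The contrast between Tables 3 and 4 can be imitated without a mixed-model library. In the sketch below, which is an approximation rather than the authors' procedure, the model without a random slope is stood in for by a pooled within-person (fixed-intercept) regression of Y on centered Time and Time × P, whose standard error treats all occasions as independent once person means are removed; for balanced data with centered Time we expect this to behave much like a random-intercept-only model. The model with a random slope is stood in for by a two-stage comparison of person-specific OLS slopes between the two groups of $P$. Both tests use the normal critical value 1.96, $P$ is assigned in a balanced random order, and all helper names are hypothetical. The rejection rates are then checked against Bradley's (1978) liberal interval of 0.025 to 0.075; with 50 walks of length five, the first rate should come out well above 0.075 and the second near 0.05.

```python
import math
import random
import statistics

Z_CRIT = 1.96  # two-sided 5% normal critical value (approximation to t)


def random_walk(T, rng):
    y = [rng.gauss(0.0, 1.0)]
    for _ in range(T - 1):
        y.append(y[-1] + rng.gauss(0.0, 1.0))
    return y


def person_slope(y, t_c, sxx):
    """OLS slope of y on centered time t_c."""
    y_bar = statistics.fmean(y)
    return sum(tc * (v - y_bar) for tc, v in zip(t_c, y)) / sxx


def one_replication(n, T, rng):
    """Return (naive_reject, two_stage_reject) for the Time x P effect."""
    t_c = [t - (T - 1) / 2 for t in range(T)]
    sxx = sum(tc * tc for tc in t_c)
    p = [i % 2 for i in range(n)]  # balanced 0/1 predictor (assumes even n)
    rng.shuffle(p)
    walks = [random_walk(T, rng) for _ in range(n)]
    slopes = [person_slope(y, t_c, sxx) for y in walks]
    s0 = [b for b, g in zip(slopes, p) if g == 0]
    s1 = [b for b, g in zip(slopes, p) if g == 1]
    diff = statistics.fmean(s1) - statistics.fmean(s0)

    # Naive test: residuals around a common slope within each group, person
    # means removed (n person means + 2 group slopes estimated).
    group_slope = {0: statistics.fmean(s0), 1: statistics.fmean(s1)}
    ss = 0.0
    for y, g in zip(walks, p):
        y_bar = statistics.fmean(y)
        ss += sum((v - y_bar - group_slope[g] * tc) ** 2 for tc, v in zip(t_c, y))
    sigma2 = ss / (n * T - n - 2)
    se_naive = math.sqrt(sigma2 / sxx * (1 / len(s0) + 1 / len(s1)))

    # Two-stage test: compare person-specific slopes between the groups.
    pooled = ((len(s0) - 1) * statistics.variance(s0)
              + (len(s1) - 1) * statistics.variance(s1)) / (n - 2)
    se_two = math.sqrt(pooled * (1 / len(s0) + 1 / len(s1)))
    return abs(diff / se_naive) > Z_CRIT, abs(diff / se_two) > Z_CRIT


def rejection_rates(n=50, T=5, reps=400, seed=1978):
    rng = random.Random(seed)
    results = [one_replication(n, T, rng) for _ in range(reps)]
    return (sum(r[0] for r in results) / reps, sum(r[1] for r in results) / reps)


def bradley_liberal(rate, alpha=0.05):
    """Bradley's (1978) liberal robustness criterion: 0.5*alpha <= rate <= 1.5*alpha."""
    return 0.5 * alpha <= rate <= 1.5 * alpha


if __name__ == "__main__":
    naive, two_stage = rejection_rates()
    print(f"no slope heterogeneity modeled: {naive:.3f}, robust: {bradley_liberal(naive)}")
    print(f"slope heterogeneity modeled:    {two_stage:.3f}, robust: {bradley_liberal(two_stage)}")
```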
Although the fixed effects tests pertaining to slopes are accurate, the variance component test for slopes is inaccurate: it indicates significant variance between slopes even though the data-generating mechanism contains no heterogeneity in slopes. Thus, when a random coefficient model appropriately excludes heterogeneity in slopes, its fixed effects tests become inaccurate (see Table 3). On the other hand, when a random coefficient model includes a random effect for slopes (i.e., growth heterogeneity), the test on this random effect is spuriously significant (see Table 4). In either case, the random coefficient model cannot correctly represent the data-generating mechanism when that mechanism contains a stochastic trend.

The simulation results from the unconditional growth model indicate that the variance components for the random effects of intercepts and slopes are always significant if stochastic trends are present in the data. As discussed above, our literature review demonstrated that researchers relied on this evidence to justify their search for predictors of heterogeneity of intercepts and slopes. The results in Table 4 indicate that the Type I error rate for each fixed effect test maintains the nominal alpha level when a random effect for slopes is included in a conditional growth model. Because the primary focus of conditional models is the fixed effects tests of predictors, this may lead researchers to the mistaken belief that, as long as all random effects are included, the random coefficient model behaves well. Although the probability of a Type I error for a single predictor of intercept or slope heterogeneity may be 5%, as suggested by our simulation results, our literature review indicated that researchers typically include between four and eight predictors of heterogeneity of intercepts and slopes. As is the case for other modeling techniques, such as regression and structural equation models, the probability of obtaining at least one Type I error rises well beyond the nominal 5% level when a model contains several predictors. If eight independent tests are each conducted at a nominal 5% level, then the probability of at least one Type I error is $1 - (1 - 0.05)^8 \approx 0.34$, or 34%. If the tests are dependent, this probability may be lower or higher. In any case, if stochastic trends are present in the data, researchers cannot rely on random coefficient or latent growth curve models to protect against inferential mistakes, even when all random effects are modeled, because each examined predictor increases the risk of finding at least one spurious predictor of intercept or slope heterogeneity.

Table 4
Random Coefficient Model Parameter Estimates and Tests for the Conditional Growth Model With a Random Effect for Slopes Across 1,000 Replications
Column groups: fixed effects and variance components.
Note. N = number of individuals; T = length of each time series; $\hat{\gamma}_{00}$ = average estimate of the population intercept; $SE_{\gamma_{00}}$ = standard error of the intercept estimate; $p_{\gamma_{00}}$ = rejection rate of the hypothesis test on the population intercept; $\hat{\gamma}_{01}$ = estimate of the main effect of the Level 2 predictor; $SE_{\gamma_{01}}$ = standard error of the main effect of the Level 2 predictor; $p_{\gamma_{01}}$ = rejection rate of the hypothesis test on the main effect of the Level 2 predictor; $\hat{\gamma}_{10}$ = average estimate of the population slope; $SE_{\gamma_{10}}$ = standard error of the slope estimate; $p_{\gamma_{10}}$ = rejection rate of the hypothesis test on the population slope; $\hat{\gamma}_{11}$ = estimate of the moderating effect of the Level 2 predictor; $SE_{\gamma_{11}}$ = standard error of the moderating effect of the Level 2 predictor; $p_{\gamma_{11}}$ = rejection rate of the hypothesis test on the moderating effect of the Level 2 predictor; $\hat{\tau}_0^2$ = average variance of intercepts; $p_{\tau_0^2}$ = rejection rate of the hypothesis test on the variance of intercepts; $\hat{\tau}_1^2$ = average variance of slopes; $p_{\tau_1^2}$ = rejection rate of the hypothesis test on the variance of slopes; $\hat{\sigma}_\varepsilon^2$ = average within-person variance around the linear trajectories; $\hat{\rho}_{01}$ = average correlation between intercepts and slopes; $p_{\rho_{01}}$ = rejection rate of the hypothesis test on the correlation between intercepts and slopes.

Recommendations

Random walks and stochastic trends, as previously mentioned, are encountered in virtually all scientific disciplines. Economists, in particular, commonly deal with stochastic trends in their data. Because of the statistical and inferential problems that stochastic trends create in regression models, economists first attempt to determine whether the trends in their data are due to a deterministic or a stochastic process before applying a statistical model. To do so, they use one or more statistical tests known as unit root tests. If a unit root test indicates that the trends present in the data are not stochastic, economists frequently use regression models similar to those found in psychology.
However, if a unit root test indicates that a time series is not distinguishable from a random walk, economists use an alternative class of models that reflect the stochastic trend in the data. Further research is needed to evaluate comprehensively whether this practice of using unit root tests as a precondition for applying regression-based models is appropriate for psychological data structures. For now, we recommend that longitudinal data analysts report the results of one or more unit root tests and appropriately qualify inferences from random coefficient models if the trends present in the data are not distinguishable from random walks.

One example of a unit root test is the augmented Dickey-Fuller (ADF) test (Dickey & Fuller, 1979; Said & Dickey, 1984). It is the most common method used in economics to distinguish between deterministic and stochastic trends in growth data. For a single time series, the ADF regression is

$$\Delta Y_t = \alpha + \gamma Y_{t-1} + \sum_{i=1}^{p} \beta_i \Delta Y_{t-i} + \varepsilon_t, \qquad (13)$$

where $Y_t$ is the series, $\Delta Y_t = Y_t - Y_{t-1}$, $\alpha$ is the drift, $\gamma = 0$ is the null hypothesis associated with a random walk, $p$ is the lag order of the autoregressive process, the $\beta_i$ are the structural autoregressive effects, and $\varepsilon_t$ is the error term. If the null hypothesis, $\gamma = 0$, is rejected, the series is distinguishable from a random walk with drift. To run the test, the analyst needs to determine the lag structure of the time series and whether there is drift. The test has low power, and estimating unnecessary parameters for long lags and drift wastes degrees of freedom. The lag structure of a time series is investigated by examining the autocorrelation and partial autocorrelation functions, whereas the existence of drift is generally assessed visually. The free statistical software R provides the ADF test through add-on packages, and its base installation includes the autocorrelation and partial autocorrelation functions.
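A bare-bones version of the ADF regression in Equation 13, with a drift term and a user-chosen lag order $p$, is sketched below in standard-library Python. It is an illustration under stated assumptions, not a replacement for tested implementations: `ols` and `adf_tstat` are hypothetical helper names, the lag order is supplied by the analyst rather than selected automatically, and the resulting t statistic must be compared with Dickey-Fuller critical values (about -2.86 at the 5% level in large samples when a drift term is included), not with the usual t or normal values.

```python
import math
import random


def ols(X, y):
    """Least squares via the normal equations; returns (coefficients, standard errors)."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    # Invert X'X by Gauss-Jordan elimination with partial pivoting.
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(xtx)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        d = a[c][c]
        a[c] = [v / d for v in a[c]]
        for r in range(k):
            if r != c:
                f = a[r][c]
                a[r] = [v - f * w for v, w in zip(a[r], a[c])]
    inv = [row[k:] for row in a]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [v - sum(b * x for b, x in zip(beta, r)) for r, v in zip(X, y)]
    s2 = sum(e * e for e in resid) / (len(y) - k)
    return beta, [math.sqrt(s2 * inv[i][i]) for i in range(k)]


def adf_tstat(y, p=1):
    """Fit dY_t = alpha + gamma*Y_{t-1} + sum_i beta_i*dY_{t-i} + e_t by OLS.

    Returns (t statistic for gamma, estimate of gamma)."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]  # dy[t - 1] = Y_t - Y_{t-1}
    X, target = [], []
    for t in range(p + 1, len(y)):
        X.append([1.0, y[t - 1]] + [dy[t - 1 - i] for i in range(1, p + 1)])
        target.append(dy[t - 1])
    beta, se = ols(X, target)
    return beta[1] / se[1], beta[1]


if __name__ == "__main__":
    rng = random.Random(13)
    walk, ar = [0.0], [0.0]
    for _ in range(499):
        walk.append(walk[-1] + rng.gauss(0.0, 1.0))  # unit root
        ar.append(0.5 * ar[-1] + rng.gauss(0.0, 1.0))  # stationary AR(1)
    for name, series in (("random walk", walk), ("AR(1), phi = 0.5", ar)):
        t, g = adf_tstat(series, p=1)
        print(f"{name}: gamma = {g:.3f}, t = {t:.2f}, reject at 5%: {t < -2.86}")
```

For the stationary series, $\hat{\gamma}$ should sit near $0.5 - 1 = -0.5$ with a strongly negative t statistic; for the random walk, rejection should occur only about 5% of the time across repeated samples.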
The standard ADF test evaluates the trend in a single trajectory. Psychologists generally gather data on several individuals, so a panel version of the ADF is needed to determine whether the sample of trajectories, as a whole, is distinguishable from multiple random walks. To use the test, it is necessary to examine the autocorrelation and partial autocorrelation functions of each series to determine the most common lag structure across the sample and to determine whether drift exists in at least the majority of the series. This process is described in most introductory time series texts (e.g., Enders, 1989). Once these decisions are made, the panel version of the ADF test developed by Im, Pesaran, and Shin (2003) is computed by applying the standard ADF test to each series and then averaging the resulting t statistics for $\hat{\gamma}$ in Equation 13. This average is compared with a percentile (e.g., 90th or 95th) of the distribution of the same statistic computed on random walks with the specified lag order, drift, and length and number of time series (see Im et al., 2003). An example of running the panel ADF test is given in Appendix C using one of the data sets discussed in the next section.

If stochastic trends are present in psychological data, then the common longitudinal methods used in psychology (i.e., latent growth curve and random coefficient models) do not adequately capture the data-generating mechanism. As previously mentioned, economists use methods capable of modeling stochastic trends. Most commonly, they use autoregressive integrated moving average (ARIMA) models, in which the dependent variable is differenced the appropriate number of times to produce a series without a stochastic trend before the desired model parameters are estimated (Enders, 1989). Although this is the dominant approach in economics, it is primarily useful for a single time series.
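Returning to the panel unit root screening described earlier in this section, the idea can be sketched in code. The version below is a simplified Monte Carlo take on the Im, Pesaran, and Shin (2003) approach, not a reproduction of their published tables: it averages the per-series Dickey-Fuller t statistics for $\hat{\gamma}$ (a drift term is included and the lag order is fixed at $p = 0$ for brevity; extending it to $p > 0$ would require adding lagged differences as regressors), then locates that average in a null distribution obtained by applying the same procedure to simulated zero-drift random walks with the same number and length of series. Because large negative values count against a unit root, the sketch rejects when the observed average falls below the lower 5th percentile of the simulated values. All function names are hypothetical, and equal-length series are assumed.

```python
import math
import random
import statistics


def df_tstat(y):
    """Dickey-Fuller t statistic for gamma in dY_t = alpha + gamma*Y_{t-1} + e_t (p = 0)."""
    x = y[:-1]                                   # Y_{t-1}
    dy = [b - a for a, b in zip(y[:-1], y[1:])]  # dY_t
    n = len(dy)
    x_bar, d_bar = statistics.fmean(x), statistics.fmean(dy)
    sxx = sum((v - x_bar) ** 2 for v in x)
    gamma = sum((v - x_bar) * (d - d_bar) for v, d in zip(x, dy)) / sxx
    alpha = d_bar - gamma * x_bar
    s2 = sum((d - alpha - gamma * v) ** 2 for v, d in zip(x, dy)) / (n - 2)
    return gamma / math.sqrt(s2 / sxx)


def t_bar(panel):
    """Average of the per-series Dickey-Fuller t statistics (IPS-style t-bar)."""
    return statistics.fmean(df_tstat(y) for y in panel)


def simulate_walks(n_series, T, rng):
    """Zero-drift random walks with Y_0 ~ N(0, 1) and N(0, 1) increments."""
    panel = []
    for _series in range(n_series):
        y = [rng.gauss(0.0, 1.0)]
        for _t in range(T - 1):
            y.append(y[-1] + rng.gauss(0.0, 1.0))
        panel.append(y)
    return panel


def panel_unit_root_test(panel, reps=200, level=0.05, seed=2003):
    """Compare the observed t-bar with an approximate lower `level` quantile
    of t-bar computed on simulated random walks of the same shape."""
    rng = random.Random(seed)
    n_series, T = len(panel), len(panel[0])
    null = sorted(t_bar(simulate_walks(n_series, T, rng)) for _ in range(reps))
    critical = null[int(level * reps)]
    observed = t_bar(panel)
    return observed, critical, observed < critical  # True means: reject the unit root


if __name__ == "__main__":
    rng = random.Random(1)
    stationary = []
    for _series in range(30):
        y = [0.0]
        for _t in range(49):
            y.append(0.3 * y[-1] + rng.gauss(0.0, 1.0))
        stationary.append(y)
    print(panel_unit_root_test(stationary))                  # expected to reject
    print(panel_unit_root_test(simulate_walks(30, 50, rng)))  # usually does not reject
```

Simulating the null distribution for the actual panel shape is convenient for the short series common in psychology, where large-sample critical values may be a poor guide.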
Because they are designed for a single series, ARIMA methods are not particularly useful for the typical longitudinal data structures found in psychology. Less common in economics, but perhaps more promising, are state space models and seemingly unrelated time series equations (Chu & Durango-Cohen, 2008; Harvey & Shephard, 1993). Chow, Ho, Hamaker, and Dolan (2010) and Yang and Chow (2010) provide examples of state space analyses of psychological data. These methods make it possible to model deterministic and stochastic trends simultaneously, while offering the flexibility to include predictors of deterministic trend components. Although these models are most commonly applied to single time series, they may also be used to examine time series obtained from multiple participants (see Harvey & Koopman, 1996). State space models provide a general longitudinal modeling framework, and psychologists may find them useful even when their data do not contain stochastic trends.
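To give a flavor of the state space approach, the sketch below filters a single series with a local level model that combines a deterministic component (a fixed drift $\delta$) with a stochastic trend (a randomly wandering level): $Y_t = \mu_t + e_t$ and $\mu_t = \mu_{t-1} + \delta + \eta_t$, with $e_t \sim N(0, r)$ and $\eta_t \sim N(0, q)$. It is a textbook Kalman filter written for illustration under the assumption that $\delta$, $q$, and $r$ are known; in practice they would be estimated, for example by maximizing the Gaussian log-likelihood the function also returns, and models for multiple participants would call for a dedicated package. The function name `local_level_filter` is hypothetical. Setting $q = 0$ leaves only a deterministic trend plus noise, whereas $r = 0$ and $\delta = 0$ give a pure random walk.

```python
import math
import random


def local_level_filter(y, delta, q, r, m0=0.0, p0=1e6):
    """Kalman filter for Y_t = mu_t + e_t, mu_t = mu_{t-1} + delta + eta_t.

    Var(e_t) = r and Var(eta_t) = q. (m0, p0) are the prior mean and variance of
    the level before the first observation; a large p0 makes the prior nearly flat.
    Returns the filtered level means and the Gaussian log-likelihood (with a large
    p0, the first term mostly reflects the diffuse prior).
    """
    m, p = m0, p0
    means, loglik = [], 0.0
    for obs in y:
        # Predict: the level drifts by delta and picks up state noise with variance q.
        m_pred, p_pred = m + delta, p + q
        # Update with the new observation.
        f = p_pred + r          # one-step-ahead forecast variance
        v = obs - m_pred        # forecast error
        k = p_pred / f          # Kalman gain
        m, p = m_pred + k * v, (1.0 - k) * p_pred
        loglik += -0.5 * (math.log(2.0 * math.pi * f) + v * v / f)
        means.append(m)
    return means, loglik


if __name__ == "__main__":
    rng = random.Random(5)
    level, y = 0.0, []
    for _ in range(100):
        level += 0.2 + rng.gauss(0.0, 0.5)     # drift 0.2 plus a stochastic trend
        y.append(level + rng.gauss(0.0, 1.0))  # measurement noise
    means, ll = local_level_filter(y, delta=0.2, q=0.25, r=1.0)
    print(round(means[-1], 2), round(ll, 1))
```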
