Analysis of Response Time Distributions

Trisha Van Zandt
Ohio State University

Running Head: RT Distributions
Word Count: Approximately 25,000

Address correspondence to:
Trisha Van Zandt
Psychology Department
Ohio State University
1885 Neil Avenue Mall
Columbus, OH

Contents

Abstract
Analysis of Response Time Distributions
Characterization of Random Variables
    Density Function
    Distribution Function
    Survivor and Hazard Functions
Estimation
    Properties of Estimators
    Parameter Estimation
    Nonparametric Function Estimation
    The Density Function
    The Hazard Function
    Model Fitting
    Summary
Model and Hypothesis Testing
    Orderings of Distributions
    RT Decomposition
    Mixture Distributions
Concluding Remarks
References
Appendix

Abstract

This chapter addresses the problem of response time analysis, including estimates of the mean and variance, outlier techniques, estimation of distribution parameters, and function estimation. The use of distributional analysis in testing processing models is also discussed.

Analysis of Response Time Distributions

Response time is ubiquitous in experimental psychology. It is perhaps the most important measure used to investigate hypotheses about mental processing. Fifteen of the 24 studies published in the February 2000 issue of the Journal of Experimental Psychology: Human Perception and Performance used some form of response time analysis. Not restricted to cognitive psychology, response times (RTs) are collected routinely in empirical investigations in biological, social, developmental, and clinical psychology as well. RTs are collected from both human and animal subjects. Over 27,000 abstracts in the PsycINFO database spanning from 1887 to the end of April 2000 make reference to reaction or response time or latency.

One reason that RTs are so important in experimental psychology is that they, like other physical measurements such as length or weight or force, are defined on a ratio scale (Townsend, 1992). This means that we can bring all possible mathematical machinery to bear on the analysis of RT. We can devise precise mathematical models of cognitive processing, make predictions about process durations as a function of physical (numerical) variables in the experimental context, and describe how those processes change with changes in these variables. Since Kinnebrook's brief tenure in Maskelyne's observatory in 1796 (Mollon & Perkins, 1996), variations in RT have seemed to be a clean and easy way to get at how mental processing unfolds over time.

Although the use of RTs in cognitive research has been established for over a century, basic issues of analysis still arise. Techniques such as Neyman-Pearson null hypothesis testing are routinely applied to RTs, even though the basic assumptions required for such analyses, such as normality and independence, are known to be violated.[1] RT distributions are not normal; rather, they are positively skewed. Individual RTs are not typically independent of one another; some trial-by-trial sequential effects persist even in the most carefully controlled experiments. For these and other reasons, RT analysis is not always as straightforward as it appears.

[1] These violations are particularly obvious in the analysis of single-subject data, where raw RTs are subjected to Neyman-Pearson methods. However, even in the more common group analysis, where the mean RTs from individual subjects are used to compute the test statistics, the critical assumption of identically distributed observations is violated, even though the individual means can safely be assumed to be normal.

For the most part, RT analyses occur at the level of the means. That is, hypotheses are formulated with regard to predicted average increases or decreases in RT, as opposed to effects on less obvious parameters of the RT distribution such as skew. Of the fifteen studies in the issue of the journal cited above, twelve reported only mean RTs. The remaining three studies investigated the RT distributions in greater depth. This disparity actually reflects a recent trend: it is becoming more and more important to consider the overall distribution, as many important and interesting findings are typically hidden when collapsing observations into averages. Some of my colleagues have seen changes in patterns of RTs in their data across conditions but faced the frustration that ANOVAs on the mean RTs failed to show significant main effects. They then performed the same ANOVAs on the RT medians, or on some trimmed or otherwise modified averages, in an attempt to demonstrate for some measure of central tendency what was obvious at the level of the distribution.

The purpose of this chapter is to provide an outline of RT analyses, particularly at the distributional level. Space constrains me from writing a complete primer or handbook of such analyses, but appropriate references will be provided along the way. There are many statistical and mathematical packages that can help those who are less skilled at programming to perform these analyses, and pointers to these packages will also be provided. In particular, I have provided in an appendix to this chapter some MATLAB routines that can perform some of the analyses we will examine. I will begin by discussing RTs as random variables and the ways that random variables can be characterized. I will present problems of estimation, not only of central tendency but also of the parameters of a given theoretical distribution. I will also discuss estimation of the functional forms of the distribution. In the second half of the paper, I will give some thought to how distributional estimates can be used to test theories about the structure of processing.

Characterization of Random Variables

RTs are random variables. A sample of RTs, such as might be collected in an experiment (under fixed conditions), is drawn at random from the entire population or distribution of possible RTs. The observations in the sample are assumed to be independent and identically distributed, or iid.

That is, each observation was taken from exactly the same distribution as every other observation (there are no changes in the distributional parameters or in the form of the distribution across trials), and the observations are statistically independent of each other (so the probability of the i-th observation is unaffected by the value of the j-th observation). The iid assumption is important for a number of reasons, not the least of which is the desire that the many trials within identical experimental conditions all have exactly the same effect on the human observer. (Note, however, that the independence assumption is likely to be violated by sequential effects and parameter drift (Burbeck & Luce, 1982). Unfortunately, we don't have the tools to deal with this problem. We must assume that the covariances between observations are small enough that they can be neglected.)

A random variable can be characterized in a number of ways. One way is to specify parameters of the distribution from which the variable is sampled, such as its mean or variance. Another way is to specify the functional form of the distribution itself. There are several useful functional forms, including the density, distribution, survivor and hazard functions. Each of these different functions describes a different aspect of the behavior of the random variable, but the relationship between the functions is unique. That is, once a particular density function has been specified, for example, the distribution, survivor and hazard functions are then determined. Random variables from different distributions cannot have the same functional form, regardless of which kind of function is examined. Therefore, only one of these functions needs to be specified to completely characterize the behavior of the random variable. I will now discuss each functional form and the relationships between them. For additional information, the reader might consult Luce (1986), Townsend and Ashby (1983), or any good textbook on probability theory (e.g., Hoel, Port & Stone, 1971).

Density Function

The density functions $f(t)$ for an ex-Gaussian (the sum of a normal and an exponential random variable) and a Wald random variable are shown in Figure 1, top panel. The x-axis is time, or potential values of RT, and the y-axis is the height of the density function $f(t)$.

If RT were a discrete random variable, the height of the function at a point t would give the probability of observing an RT equal to t. However, RT is generally considered to be a continuous random variable, and therefore the value of the density function does not represent probability. Instead, it is the area under the density function that gives measures of probability. For example, suppose that we want to know the probability of observing an RT between the values a and b. This probability is measured by integrating the density function between a and b:

$$P(a \le RT \le b) = \int_a^b f(t)\,dt.$$

More generally, any positive function f that has a total area of 1 when integrated from negative to positive infinity is a density function.

For RT data, the shapes of the empirical (estimated) density functions are typically unimodal and positively skewed, like those shown in Figure 1. The majority of the observations are generally fast, but a large proportion of the observations are slower, pulling the positive tail of the density function to the right. There are a number of distributions that have density functions of this form, including the gamma, Wald, and Weibull distributions, but by far the most popular characterization of RT densities is that of the ex-Gaussian (e.g., Balota & Spieler, 1999; Heathcote, Popiel & Mewhort, 1991; Ratcliff & Murdock, 1976). The ex-Gaussian density can capture a very large number of shapes, from almost symmetric (like the normal density) to very asymmetric (like the exponential density, which is shown in Figure 2).

The density function is a useful way of characterizing RTs because it is an intuitive way to think about how likely different RTs are to be observed. The shape of the density function can be useful in discriminating between some classes of random variables, but it should be noted that many unimodal, positively skewed distributions exist, and many are flexible enough that they can look very similar to each other. This is demonstrated by the close correspondence between the ex-Gaussian and Wald densities in the figure. I will talk again about similarities between distributions at the close of this chapter.
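As a concrete check on the area interpretation above, the following minimal MATLAB sketch integrates an exponential density between two points. The exponential is a hypothetical stand-in chosen because its integral has a closed form, and the parameter value is illustrative:

    % Area under a density between a and b: P(a <= RT <= b).
    tau = 100;                             % illustrative mean (ms)
    f = @(t) (1/tau) * exp(-t/tau);        % exponential density
    a = 200; b = 400;
    pNumeric = integral(f, a, b);          % numerical area under f
    pExact = exp(-a/tau) - exp(-b/tau);    % closed-form check; both ~0.1170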

Distribution Function

The cumulative distribution functions (CDFs) $F(t)$ of the same ex-Gaussian and Wald random variables are shown in Figure 1, center panel. The x-axis gives the possible values of RT, and the y-axis gives the values of the CDF $F(t)$. For a particular point t along the x-axis, the value of $F(t)$ gives the probability of observing an RT less than the value t:

$$F(t) = P(RT < t).$$

The CDF is found from the density function by integration from $-\infty$ to the point t:

$$F(t) = \int_{-\infty}^{t} f(u)\,du.$$

Any nondecreasing positive function $F(t)$ with the properties that $\lim_{t\to-\infty} F(t) = 0$ and $\lim_{t\to\infty} F(t) = 1$ is a CDF.

The CDF gives the percentile ranks of each possible value of RT. In Figure 1, for example, the median RT can be found by drawing a horizontal line from the .50 point on the y-axis and dropping a vertical line to the x-axis at the point of intersection with the desired curve $F(t)$. This relationship between the CDF and the quantiles of the distribution is particularly useful for simulating different random variables. To simulate a random variable RT with a known CDF $F(t)$, select a point u on the y-axis at random (using a uniform [0, 1] generator), and then compute the corresponding quantile $RT = F^{-1}(u)$. Inverting F may require numerical methods, but is typically straightforward.

Unlike the density function, the CDF is not as useful in distinguishing between different kinds of random variables. Most CDFs are S-shaped; all distributions increase from 0 to 1. Therefore all CDFs tend to look similar regardless of the random variables they describe. For example, in Figure 1, the Wald and ex-Gaussian CDFs are practically identical.
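The inverse-CDF recipe just described takes only a few lines of MATLAB. A minimal sketch with illustrative ex-Gaussian parameter values: the exponential component is simulated by inverting its CDF in closed form, and the normal component is added directly.

    % Inverse-transform sampling: u is uniform on [0,1], and F^{-1}(u) for
    % the exponential CDF F(t) = 1 - exp(-t/tau) is -tau*log(1-u).
    mu = 400; sigma = 30; tau = 100;          % illustrative parameters (ms)
    n = 1000;
    u = rand(n, 1);                           % uniform [0,1] draws
    expPart = -tau * log(1 - u);              % exponential via F^{-1}(u)
    rt = mu + sigma * randn(n, 1) + expPart;  % ex-Gaussian = normal + exponential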

Survivor and Hazard Functions

The survivor and hazard functions are two alternative functions that characterize a random variable. These functions arise frequently in reliability and survival analysis (when machines break or people die), hence the choice of vocabulary.

The survivor function $\bar F(t)$ is the probability that the "lifetime" of an object is at least t, that is, the probability that failure occurs after t. In terms of RT,

$$\bar F(t) = P(RT > t) = 1 - F(t):$$

the survivor function is simply one minus the CDF. The hazard function $h(t)$ gives the likelihood that an event will occur in the next small interval dt in time, given that it has not occurred before that point in time. From the definition of a conditional probability,

$$h(t) = \lim_{dt\to 0} P(t \le RT \le t + dt \mid RT \ge t)/dt = \frac{f(t)}{\bar F(t)}.$$

The hazard functions for the ex-Gaussian and Wald distributions are shown in Figure 1, bottom panel. When $F(t)$ is differentiable, the hazard function can be expressed as a function of the survivor function:

$$h(t) = -\frac{d}{dt} \ln \bar F(t),$$

where ln indicates the natural logarithm. The function $-\ln \bar F(t)$ is called the log survivor function or integrated hazard function, and it can be useful for distinguishing between different distributions. We will later make use of it to answer questions about certain properties of cognitive processes. The density function can also be expressed in terms of the hazard function:

$$f(t) = h(t) \exp\left(-\int_{-\infty}^{t} h(u)\,du\right).$$

Some attention has been paid to hazard function analysis in RT work, because the hazard function can frequently be very useful in discriminating between different random variables (Ashby, Tein & Balakrishnan, 1993; Bloxom, 1984; Colonius, 1988; Luce, 1986; Maddox, Ashby & Gottlob, 1998; Thomas, 1971). Although it is very difficult to discriminate between the ex-Gaussian and Wald densities shown in the top panel of Figure 1, the differences between the two variables become clear by examining the hazard functions in the bottom panel. Whereas the ex-Gaussian hazard function increases to an asymptote, the Wald hazard function is nonmonotonic, increasing and then decreasing.
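Computing $h(t) = f(t)/\bar F(t)$ on a time grid is mechanical once f and F are available. A minimal sketch, using the exponential distribution because its constant hazard, $1/\tau$, makes an easy correctness check (the parameter value is illustrative):

    % Hazard function h(t) = f(t) / (1 - F(t)) evaluated on a grid.
    tau = 100;
    t = (0:1:1000)';
    f = (1/tau) * exp(-t/tau);    % exponential density
    F = 1 - exp(-t/tau);          % exponential CDF
    h = f ./ (1 - F);             % hazard; constant at 1/tau for this case
    plot(t, h)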

It has been suggested, therefore, that hazard function estimation may provide a way to identify RT distributions, and hence the process underlying the execution of an RT task (Burbeck & Luce, 1982). If we can determine how RTs are distributed, then we have gone a long way toward isolating the process responsible for generating the RTs. In practice, then, we might try to estimate one or more of these functions in an attempt to identify or rule out various distributional forms for consideration. In the next section, we will talk about problems associated with estimating these functions. The problem of estimation is not a trivial one, however, and I must begin by discussing estimation in general. I will then talk about estimation of parameters, and then estimation of functional forms.

Estimation

The goal of estimation is to determine from RT data the properties of the distribution from which the data were sampled. We may wish to know only gross properties such as the mean or skewness, or we may hope to determine the exact functional form of the distribution. The most common sort of statistical analysis involves inferences about the mean and variance of the population distribution. Given a particular sample of size n, we can attempt to estimate the central tendency and dispersion of the population, perhaps using the sample mean $\bar X$ and variance $s^2$. Or, we might try to estimate the shape of the distribution itself, perhaps by constructing a histogram indicating the relative frequency of each observation in the sample. Both of these estimation problems are commonly encountered, and they are not necessarily separate problems. We will discuss each in this section.

It is important to remember that any estimate, whether of a parameter or of a function, is a random variable. Associated with it is some degree of variation and some (often unknown) distributional form. The goal of estimation, therefore, is not just estimating values of parameters, but the estimation of those parameters in such a way that the distributions of those estimates have desirable properties. For example, the sample mean $\bar X$ is an estimate of a parameter, $\mu$, the population mean. The estimate $\bar X$ is a random variable, because its value will change for different samples, even when those samples are taken from the same population. There is a distribution associated with $\bar X$, which, by the Central Limit Theorem, we know to be approximately normal.

The mean of this distribution is $\mu$, the mean of the population from which the sample was drawn. We know something also about the variance of the distribution of $\bar X$. As long as the sample is iid, the variance of the distribution is $\sigma^2/n$, the variance of the population from which the sample is drawn divided by the size of the sample n.

I will begin by discussing problems of parameter estimation, including the mean and variance of the RT distribution, and also the more explicit problem of estimating the parameters of a theoretical RT distribution. I will then discuss the problem of distribution estimation, including the estimation of the density, distribution, and hazard functions. Throughout the rest of this chapter, I will refer to the data presented in Table 1 to motivate and illustrate the discussion. These data were collected in a simple detection experiment, roughly designed after an experiment presented by Townsend and Nozawa (1995), in which one, two or no dots appeared in two locations on a computer screen. The observer's task was to press one key as soon as a dot was detected, and the other key if no dots were detected. Each dot could be either large or small. Only the target (dot) present data are provided in the table, and error RTs are not included. This experiment is an example of what Townsend and Nozawa have called the double factorial paradigm, which has proved to be very important in examining hypotheses about processing information from multiple sources of information, as we will see later.

Properties of Estimators

Consider an iid sample $\{X_1, X_2, \ldots, X_n\}$ of size n from a population with mean $\mu$ and variance $\sigma^2$. The sample mean $\bar X = \sum_{i=1}^n X_i/n$ is an estimate of the parameter $\mu$. The sample mean has a number of desirable properties as an estimate of $\mu$: it is (1) unbiased, (2) consistent, (3) efficient, and, if the variable X is normally distributed, (4) a maximum likelihood estimate.

If an estimator is unbiased, then its expected value (mean) is equal to the parameter being estimated. For the sample mean, the expected value of $\bar X$ is equal to $\mu$, the parameter that $\bar X$ estimates. In other words, on average, the sample mean will equal the population mean.
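These two facts, $E(\bar X) = \mu$ and $\mathrm{Var}(\bar X) = \sigma^2/n$, are easy to verify by simulation. A minimal MATLAB sketch with illustrative parameter values:

    % Monte Carlo check: the sample mean is unbiased, and its variance
    % shrinks as sigma^2/n.
    mu = 500; sigma = 100; n = 50; reps = 5000;
    xbar = mean(mu + sigma * randn(n, reps));   % one sample mean per column
    [mean(xbar) - mu, var(xbar) - sigma^2/n]    % both differences near zero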

More generally, suppose that we wish to estimate a parameter $\theta$ with the estimator $\hat\theta$. The estimator $\hat\theta$ is unbiased if

$$E(\hat\theta) = \theta,$$

where E is the expected value operator, defined as

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx.$$

If an estimator is also consistent, then as the sample size n grows, the probability that it differs from the estimated parameter shrinks to zero. We sometimes call this property "convergence in probability." For the sample mean $\bar X$, this property takes the form of the Law of Large Numbers. As n grows very large, the probability that $\bar X$ differs from $\mu$ shrinks to zero. More generally, for a consistent estimator $\hat\theta$,

$$\lim_{n\to\infty} P(|\hat\theta - \theta| \ge \epsilon) = 0,$$

for any $\epsilon > 0$, no matter how small. For practical purposes, this means that the accuracy of the estimator $\hat\theta$ can be improved by increasing the sample size n. Consistency is therefore a very important property. We might wish to sacrifice unbiasedness for consistency, as long as the variance of the estimator decreases fairly rapidly with n.

A property closely related to consistency is that of asymptotic unbiasedness, meaning that as n grows, the expected value of the estimator approaches the value of the estimated parameter. Although this may seem just the same as consistency, it is not. It is a stronger property, meaning that the conditions under which it holds are more stringent than those for consistency. Therefore, an estimator may be consistent but asymptotically biased.

The third property, efficiency, refers to the variance of the estimator. Because the estimate of a parameter is a random variable, we would like the variance of that variable to be relatively small. So, if the estimator is unbiased, we can be fairly certain that it does not vary too much from the true value of the parameter. The sample mean is efficient, which means that it has a smaller variance relative to any other estimate of $\mu$ that we might choose, such as the median or the mode.

The fourth property, maximum likelihood, has to do with the probability of the observed sample. That is, there is some probability associated with having sampled exactly the observations that we obtained: an n-fold joint probability distribution.

To use a simplistic perspective, if we observe a particular sample, then it must have been highly probable, and so we should choose the estimates of the population parameters that make this probability as large as possible. If X is normally distributed, then the sample mean is a maximum likelihood estimator of $\mu$. Note that maximum likelihood estimators are not guaranteed to be unique or to exist at all. I will discuss maximum likelihood estimation in some detail later.

Although we have been talking about properties of point estimators, that is, estimators of a single population parameter, it is important to remember that these properties also hold for estimates of functions, such as densities or CDFs. For example, suppose that a density function were to be estimated by a relative frequency histogram of a sample. The height of the histogram at every point is a random variable, and the shape of the histogram will change with every new sample. Therefore, we can characterize empirical estimates of the density, distribution, survivor and hazard functions as unbiased, consistent, efficient, and so forth.

Parameter Estimation

Mean and Variance

By far the most common parameters estimated in an RT analysis are the mean and variance ($\mu$ and $\sigma^2$) of the RT distribution. However, because of the asymmetric shape of the RT distribution, it is important to recognize that these parameters are not necessarily (perhaps not even frequently) the best parameters with which to characterize the RT distribution. Because of the RT distribution's skewness (see Figure 1), the mean does not represent the most typical or likely RT. It is pulled upward, in the direction of the skew.

A similar problem exists with the variance $\sigma^2$. The large upper tail of the RT distribution has the effect of creating "outliers," values in the sample that are much longer than the majority of the observations. Outliers are a problem in all areas of statistical analysis, but the unusual aspect of outliers in RT data is that they potentially derive from the process of interest. That is, they are not necessarily outliers in the sense of contaminations of the data. Because the sample variance is greatly increased by such outliers, the power of the statistical tests to be performed on the data is greatly reduced.

As an example, consider the RTs shown in the condition labeled "Left(s)" in Table 1. These are the detection RTs for trials on which a single, small dot appeared in the left position. These data are positively skewed: although most of the observations fall between 300 ms and 600 ms, a small proportion of the sample extends as high as 1029 ms. The sample mean is $\bar X = 550.60$ ms and the sample standard deviation is $s = 164.58$ ms. Let's change the value of the slowest RT, 1029 ms, to 2060 ms, a value that might typically be observed in such an experiment. We find now that the sample mean is $\bar X = 573.50$ ms and the sample standard deviation is $s = 270.42$ ms. As a result of changing a single observation, the sample mean increased by 23 ms (almost a detectable effect, in some RT experiments) and the sample standard deviation increased by over 60%. We say that neither the mean $\mu$ nor the variance $\sigma^2$ is a robust parameter, meaning that very small changes in the shape of the distribution, like the change of a single observation, can produce very large changes in the values of these parameters. In a practical sense, this means that confidence intervals constructed for these parameters are likely to be too large and placed incorrectly, reducing the power of hypothesis tests involving the mean RT (Wilcox, 1998). This is particularly a concern for RT data, in which not only large degrees of skew are to be expected, but a good number of outliers (either extremely long or short RTs) are to be expected as well.

An alternative is to estimate parameters that aren't as sensitive to outliers or skew, such as the median or interquartile range. Using the same example as in the previous paragraph, the original sample has median Md = 485 ms and interquartile range IQR = 169 ms. After doubling the slowest RT to 2060 ms, these estimates are unchanged. Unfortunately, the standard errors of $\bar X$ and $s^2$ are typically considerably smaller than the standard errors of Md and IQR (Stuart & Ord, 1999). For example, while the standard error of $\bar X$ is $\sigma/\sqrt{n}$, if the sample is taken from a normal population the standard error of Md is approximately $1.25\sigma/\sqrt{n}$. Although the sampling distributions of $\bar X$, $s^2$, Md and IQR are asymptotically normal, the sample sizes required to approximate normality are much larger for Md and IQR than for $\bar X$ and $s^2$. Furthermore, whereas the sample mean $\bar X$ is an unbiased estimator of $\mu$, the sample median Md is a biased estimator of the population median when the population is skewed (Miller, 1988).[2]

[2] For even very skewed distributions, median bias is generally less than 10 ms for samples of size 25 or higher. When one or more groups has a sample size less than 25, the median should not be used, as the bias difference between the groups could introduce an artifactual effect. See Miller, 1988, for more details.
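The robustness contrast above is easy to reproduce. A minimal MATLAB sketch with a small hypothetical sample (not the Table 1 data): lengthening the single slowest observation moves the mean and standard deviation but leaves the median and interquartile range untouched.

    % Robustness demo: one extreme observation moves the mean and SD,
    % but not the median or the interquartile range.
    rt  = [350 420 450 480 485 520 560 610 700 1029]'; % hypothetical RTs (ms)
    rt2 = rt; rt2(end) = 2060;                         % double the slowest RT
    q = @(x, p) interp1(1:numel(x), sort(x), p*(numel(x)-1) + 1); % simple quantile
    [mean(rt) std(rt); mean(rt2) std(rt2)]             % the two rows differ
    [median(rt)  q(rt, .75)  - q(rt, .25); ...
     median(rt2) q(rt2, .75) - q(rt2, .25)]            % the two rows are identical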

These factors have prevented widespread use of the statistics Md and IQR, despite the fact that they are probably better for characterizing central tendency and dispersion in RT data than are $\bar X$ and $s^2$.

Using Monte Carlo simulations, Ratcliff (1993) investigated a number of methods of RT data treatment that reduce or eliminate the effects of outliers on the mean and variance. These included cutoff values, above and below which RTs are eliminated from the sample, and transformations such as the inverse and logarithm. For each of these strategies, he computed power and the probability of Type I errors for analyses of variance, with and without outliers mixed into the data. While no method had strong effects on the number of Type I errors, the method chosen had strong effects on power. Using Md as a measure of central tendency generally resulted in lower power than using cutoffs or transformations, probably because of the greater variance and bias of Md. Fixed cutoffs (e.g., 2000 or 2500 ms) maintained the highest power. The use of fixed cutoffs, however, is highly dependent on the range of the data in different experimental conditions; a 2000 ms cutoff in one experiment might eliminate half the observations in another experiment. Thus there is no hard and fast rule that could be used to establish cutoffs.

Because of this problem, it is common to find examples in which cutoffs are based on the sample standard deviation. For example, we might eliminate all observations greater than 3 standard deviations above the mean. Unfortunately, Ratcliff (1993) found that basing cutoffs on the standard deviation could have disastrous effects on power, depending on whether the experimental factors had their effects on the fast or slow RTs. Ratcliff instead recommended exploring a range of potential cutoff values for different experimental conditions. However, Ulrich and Miller (1994) have noted that using cutoffs can introduce asymmetric biases into statistics such as the sample mean, median, and standard deviation, and they caution strongly against the use of cutoffs without consideration of these effects. Van Selst and Jolicoeur (1994) present a procedure that produces a uniform bias across sample sizes, thus minimizing the potential for artifactual differences between conditions.

Unfortunately, even Van Selst and Jolicoeur's method produces a highly biased estimate (as great as 30 ms too small for the distributions they examined), and this could also result in significant artifacts, especially in the presence of floor or ceiling effects.

The next most powerful approach to minimizing the effects of outliers was the inverse transformation, which Ratcliff (1993) recommended as a way to verify the appropriateness of the selected cutoffs, if cutoffs were used. Transforming RTs to speeds, 1/RT, reduces the effect of slow outliers and maintains good power. In the example above taken from Table 1, the mean transformed RT is $\bar X = 1.947 \times 10^{-3}$/ms with standard deviation $s = 0.466 \times 10^{-3}$/ms. If the slowest RT is doubled, the mean transformed RT is $\bar X = 1.936 \times 10^{-3}$/ms with standard deviation $s = 0.494 \times 10^{-3}$/ms; there is some effect of the long outlier, but it is greatly reduced relative to its effect on the mean RT.

To see how outliers reduce power, consider the two conditions in the table denoted Both(ss) and Right(s). The nature of the experiment was such that we might have expected RTs in the Right(s) condition to be slower than RTs in the Both(ss) condition. This is because, essentially, twice the amount of information was available to the observer in the Both(ss) condition, resulting in a faster accumulation of evidence that targets were present. As predicted, the mean RT for Both(ss) responses is 524.61 ms and the mean RT for Right(s) responses is 563.70 ms. Unfortunately, this difference (39 ms) does not reach statistical significance (t(88) = 1.3734, p > .08). If we consider instead the response rates ($1.85 \times 10^{-3}$/ms and $2.00 \times 10^{-3}$/ms for Both(ss) and Right(s), respectively), then the rate difference ($-.15 \times 10^{-3}$/ms) is significant (t(88) = 1.94, p < .05).[3]

A large statistical literature has developed over the past few decades addressing robust statistics and how to reduce the effects of outliers or skewed distributions (Barnett & Lewis, 1994). Not many psychology researchers are considering these alternative techniques for their data. Wilcox (1997, 1998) has made a special effort to bring robust analyses to the attention of experimental psychologists, and hopefully, as more and more statistical packages incorporate robust techniques, we will see more interest in these alternatives in the years to come.

[3] Note that it is also possible to eliminate significant effects by transformation. Also note that transformed RTs will not necessarily be useful for testing some kinds of model predictions, e.g., serial stage models in which durations are summed. I am only advocating the use of transformed RTs for null hypothesis testing, not for verifying model predictions.

In the meantime, however, the best approach to dealing with outliers in RT data is probably the inverse transformation, which gives the greatest power. Cutoffs should be avoided, unless one is willing to undertake either Ulrich and Miller's (1994) or Van Selst and Jolicoeur's (1994) procedures. As with any procedure that involves removing data from a sample, when attempting to employ cutoffs to reduce the effects of outliers, statistical results should be presented for both the complete and trimmed data sets, so that readers are aware of potential artifactual results.

The Ex-Gaussian Parameters

Because outliers are not only bothersome but also potentially interesting in the context of RTs, some have argued that the ex-Gaussian distribution could be used as a parametric (although atheoretical) estimate of RT distributions (Ratcliff, 1979; Ratcliff & Murdock, 1976; Heathcote, Popiel & Mewhort, 1991) without excluding any data that are suspiciously slow or fast. The ex-Gaussian distribution, the distribution of the sum of a normal and an exponentially distributed variable, has three parameters. The normal component has parameters $\mu$ and $\sigma^2$, the normal mean and variance. The exponential component has parameter $\tau$, the exponential mean. Figure 2 shows the ex-Gaussian density presented in Figure 1, top panel, together with its component normal and exponential densities. (The exponential has been shifted from its minimum, 0, to demonstrate the relation between the tails of the exponential and ex-Gaussian densities.) The normal distribution determines the leading edge of the ex-Gaussian density, and the exponential distribution determines the skewness, or the height of the positive tail. The mean of the ex-Gaussian is $\mu + \tau$, and its variance is $\sigma^2 + \tau^2$.

Rather than discussing RT means and variances, estimates of the parameters $\mu$, $\sigma^2$, and $\tau$ could be used to characterize RT data and isolate the effects of experimental variables on either the slow or the fast RTs (Heathcote et al., 1991; Hockley, 1982). I estimated these parameters for the data from the Both(ss) and Right(s) conditions using maximum likelihood. For Both(ss), $\hat\mu = 390.47$ ms, $\hat\sigma = 16.60$ ms, and $\hat\tau = 134.09$ ms. For Right(s), $\hat\mu = 458.64$ ms, $\hat\sigma = 47.54$ ms, and $\hat\tau = 105.06$ ms. Several routines are publicly available to assist in performing these computations (Cousineau & Larochelle, 1997; Dawson, 1988; Heathcote, 1996), and MATLAB routines are provided in the Appendix.
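For readers who want the basic shape of such a routine before turning to the Appendix, here is a minimal sketch of maximum likelihood fitting of the ex-Gaussian by direct minimization of the negative log-likelihood. It is not the Appendix code; `rt` is assumed to hold one condition's RTs, the starting values are heuristic, and a production version would constrain $\sigma$ and $\tau$ to be positive.

    % Ex-Gaussian density: f(t) = (1/tau) exp(sigma^2/(2 tau^2) - (t-mu)/tau)
    %                             * Phi((t-mu)/sigma - sigma/tau),
    % where Phi is the standard normal CDF.
    Phi = @(z) 0.5 * (1 + erf(z / sqrt(2)));
    exgpdf = @(t, mu, sig, tau) (1/tau) * exp(sig^2/(2*tau^2) - (t - mu)/tau) ...
                                .* Phi((t - mu)/sig - sig/tau);
    % Negative log-likelihood for the sample (realmin guards against log(0)).
    nll = @(p, t) -sum(log(max(exgpdf(t, p(1), p(2), p(3)), realmin)));
    start = [median(rt), std(rt)/2, std(rt)/2];  % heuristic starting values
    phat = fminsearch(@(p) nll(p, rt), start);   % [mu-hat, sigma-hat, tau-hat]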

Given the estimated ex-Gaussian parameters, we might want to infer that the presence of two dots decreased $\mu$ and $\sigma$ and increased $\tau$. However, because the distributions of the estimates of these parameters are unknown, inferential statistical procedures are difficult. One approach is to estimate the standard errors of the estimates by "bootstrapping" the samples (Efron, 1979). This is a simple procedure in which bootstrapped samples of size n are obtained from the original sample by selecting n observations from the sample with replacement. The ex-Gaussian parameters are then estimated from each bootstrapped sample, and the standard deviation of those estimates is a (pretty good) estimate of the standard error. I simulated 100 bootstrapped samples for each condition and computed the standard errors of $\hat\mu$, $\hat\sigma$ and $\hat\tau$ to be 14.80 ms, 13.09 ms, and 22.37 ms, respectively, for the Both(ss) condition, and 25.00 ms, 23.02 ms, and 31.79 ms, respectively, for the Right(s) condition. Given these standard errors, we can argue that $\mu$ is greater for the Right(s) than for the Both(ss) condition. However, we cannot conclude that any differences exist in $\sigma$ or $\tau$ between the two conditions, because the variances of the estimates are too large. This finding is consistent with earlier work showing the variances of the estimates of $\sigma$ and $\tau$ to be quite large for smaller samples (Van Zandt, 2000).

In sum, the ex-Gaussian characterization of RTs could potentially be quite useful in skirting problems of skewness and outliers. Unfortunately, the sampling distributions of $\hat\mu$, $\hat\sigma$ and $\hat\tau$ cannot be determined explicitly, and their distributions also depend on the underlying (and unknown) RT distribution. It may not be possible, therefore, to argue conclusively about the effects that experimental manipulations have on these parameters. An additional problem arises when one considers that, despite the utility of the ex-Gaussian distribution, RTs are not generally distributed as ex-Gaussians (Burbeck & Luce, 1982; Luce, 1986; Van Zandt, 2001). Because the ex-Gaussian is an atheoretical model of the RT distribution, it is difficult, if not impossible, to attribute psychological meaning to changes in the different parameters. As Ratcliff (1993) argues, the use of the ex-Gaussian parameters to characterize RT distributions might not be very useful in the absence of a model explaining the processes that generated the RTs in the first place.
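A minimal sketch of the bootstrap procedure described above, assuming `rt` holds one condition's RTs and `fitexg` stands for an ex-Gaussian fitting routine such as the one sketched earlier (the name is hypothetical):

    % Bootstrap standard errors for the ex-Gaussian parameter estimates.
    B = 100;                          % number of bootstrapped samples
    n = numel(rt);
    est = zeros(B, 3);                % rows: [mu-hat, sigma-hat, tau-hat]
    for b = 1:B
        boot = rt(randi(n, n, 1));    % resample n RTs with replacement
        est(b, :) = fitexg(boot);     % refit each bootstrapped sample
    end
    se = std(est);                    % bootstrap standard errors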

Nonparametric Function Estimation

Some estimation procedures, such as least-squares minimization and maximum likelihood estimation, ensure that estimates are unbiased, consistent, efficient, and so forth, given certain constraints on the sample. Unfortunately, these constraints are rarely satisfied when dealing with RT distributions. Furthermore, when estimating density functions, the extent of bias depends on the true form of the underlying population distribution, which is unknown. Therefore, when estimating RT densities, the extent of error is also unknown.

There are a number of issues that bear on the estimation procedure. The issue of primary importance is whether the analysis is model-driven. That is, has a process been specified that states how RTs should be distributed and explains the relationship between the physical parameters of the experiment and the theoretical parameters of the RT distribution? If a model of this degree of precision has been specified, then a parametric procedure will be used to recover the theoretical parameters of the RT distribution. If not, then nonparametric procedures will probably be more appropriate.

Quantiles and the CDF

The CDF is the easiest of the functional forms to estimate, because an unbiased, consistent, maximum likelihood estimate for the CDF exists in the form of the cumulative relative frequency distribution, or EDF (empirical distribution function). That is,

$$\hat F(t) = [\text{number of observations less than or equal to } t]/n. \quad (1)$$

The EDF is an estimator of the percentile ranks of the possible RTs that might be observed, and it is asymptotically normal at every point. This means that at every point t, $\hat F(t)$ will be normally distributed around the value of the true CDF $F(t)$. Because the EDF $\hat F(t)$ is an unbiased estimate of the CDF, it should be noted that $1 - \hat F(t)$ is therefore an unbiased estimate of the survivor function. We will use the survivor function estimate later in this chapter.

It is often useful to compute the estimates $\hat F^{-1}(p)$ of the quantiles of an RT distribution (e.g., Logan, 1992; Van Zandt, 2000; Van Zandt, Colonius & Proctor, 2000). The CDF $F(t)$ can then be estimated by plotting p for a number of estimates $t_p = \hat F^{-1}(p)$.
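The EDF of Equation 1 is a one-liner in MATLAB, and plotting it requires only a sort. A minimal sketch, assuming `rt` holds the RTs from one condition:

    % EDF: Fhat(t) = (number of observations <= t) / n, evaluated pointwise.
    Fhat = @(t) mean(rt <= t);      % proportion of the sample at or below t
    % A staircase plot of the full EDF:
    rts = sort(rt);
    n = numel(rts);
    stairs(rts, (1:n)/n)            % y jumps by 1/n at each observation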

To estimate the p-th quantile $t_p$, where $P(RT \le t_p) = p$, the RTs are ordered from smallest to largest. The simplest way to estimate the p-th quantile is to find the np-th observation in the sample, if np is an integer. If np is not an integer, then an average of the [np]-th and ([np]+1)-th observations is computed, where [np] indicates the integer part of np. Typically the midpoint of the [np]-th and ([np]+1)-th observations is used, but other weighting schemes can be used as well (Davis & Steinberg, 1983).

As an example, consider the data in condition Left(s) in Table 1. We will estimate the median, the p = .50 quantile. There are n = 45 observations in this sample, so np = 22.5. The [np]-th observation is (counting frequencies from the smallest observed RT) $RT_{17} = 480$ ms, and the ([np]+1)-th observation is $RT_{18} = 485$ ms. The average of these two observations is 482.5 ms; therefore $\hat F^{-1}(.5) = Md = 482.5$ ms. Like the EDF, quantile estimates are asymptotically normal and unbiased. These characteristics of the EDF and quantile estimates make inferential statistical analyses easy to perform, because the sampling distributions of quantiles and percentile ranks are known exactly (Stuart & Ord, 1999).

A widely used method for estimating quantiles of RT distributions is called "Vincentizing" (Ratcliff, 1979). To describe this procedure, suppose that q quantiles are to be estimated from a sample of n observations. We will call the estimated quantiles "vincentiles" (Heathcote, Brown & Mewhort, 2000) to distinguish them from the quantiles estimated by the procedure described above. It is typically assumed (Ratcliff, 1979) that the vincentiles are, like quantiles, evenly spaced across the data, separating the data into q equally probable bins. Note, however, that the vincentiles are quantile midpoints, so that q + 1 bins are obtained through Vincentizing. For instance, suppose that 10 deciles were computed by the standard quantile estimation procedure described above. These estimates would correspond to the 10th, 20th, ... percentiles. Now suppose that 10 vincentiles were computed. We would assume that these vincentiles were located at approximately the 5th, 15th, 25th, ... percentiles, the midpoints of the decile ranges. Thus the vincentiles separate the sample into q + 1 groups, with the relative frequencies of the middle groups being 1/q and the relative frequencies of the slowest and fastest groups being 1/(2q).

Consider the RTs for condition Left(s) shown in Table 1. There are n = 45 observations in this condition, and we will estimate q = 5 vincentiles.

To do this, we first make q copies of each observation. Then, starting from the smallest RTs, we begin averaging the duplicated order statistics in groups of n. The first vincentile for this sample is therefore

$$V_1 = \frac{(5)(350) + (5)(365) + (5)(375) + (5)(409) + (15)(422) + (5)(423) + (5)(426)}{45} = 401.56 \text{ ms}.$$

The second vincentile is the average of the next n = 45 observations:

$$V_2 = \frac{(5)(428) + (5)(429) + (5)(449) + (15)(450) + (5)(454) + (10)(457)}{45} = 447.11 \text{ ms}.$$

The averaging procedure is continued throughout the remaining observations in the duplicated sample, yielding $V_3 = 490.67$ ms, $V_4 = 578.22$ ms, and $V_5 = 835.44$ ms. The vincentiles $V_1$, $V_2$, etc. should estimate the 10th, 30th, 50th, 70th and 90th quantiles.

It is informative to compare the values of the vincentiles to the estimates of the quantiles found using the standard quantile estimation procedure. For p = .10, [np] = 4 and [np] + 1 = 5. The average of the fourth and fifth order statistics (409 and 422 ms) is $\hat t_{.1} = 415.5$ ms. For p = .30, [np] = 13 and [np] + 1 = 14. The average of the 13th and 14th order statistics (450 and 450 ms) is $\hat t_{.3} = 450$ ms. Continuing in this way, we find $\hat t_{.5} = 482.50$ ms, $\hat t_{.7} = 567.00$ ms, and $\hat t_{.9} = 810$ ms. Although $\hat t_{.3} = 450$ ms is close to the value $V_2 = 447.11$ ms, none of the other vincentiles are as close to the estimated quantiles. Neither do the vincentiles divide the sample into equally frequent groups. It turns out that the only time the vincentiles are estimates of the quantiles to which they are supposed to correspond is when the sample is drawn from a symmetric distribution (Heathcote et al., 2000). Simulations show that the vincentiles do not correspond to known quantiles for nonsymmetric distributions, and therefore they are not useful as a way to estimate CDFs (Van Zandt, 2000).
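The duplicate-and-average computation above is compact in MATLAB. A minimal sketch, assuming `rt` holds one condition's RTs:

    % Vincentiles: duplicate each of the n ordered observations q times,
    % then average consecutive runs of n values in the duplicated sample.
    q = 5;
    s = sort(rt(:));                % order statistics, as a column
    n = numel(s);
    dup = kron(s, ones(q, 1));      % each order statistic repeated q times
    V = mean(reshape(dup, n, q));   % column means = the q vincentiles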

The major attraction of the Vincentizing procedure is that it allows for averaging of RT distributions across subjects in an experiment (Ratcliff, 1979). This is useful when small sample sizes prevent accurate estimates of RT distributions for individual subjects (although CDFs and quantiles can be accurately estimated with as few as 50 observations; see Van Zandt, 2000). To average RT distributions across subjects, the vincentiles are computed for each subject's data, and then the vincentiles are averaged. Because each vincentile corresponds (presumably) to a particular percentile rank, the resulting averages can then be used to construct the EDF. However, because the vincentiles do not typically correspond to known percentiles, averaging should be performed using standard quantiles rather than vincentiles. There seem to be no particular benefits to using vincentiles instead of quantiles, and using vincentiles may introduce error, depending on the goals of the analysis.

The Density Function

Probably the most popular and easiest method for density estimation is the simple histogram. Observations are binned, and the relative frequencies of the observations within each bin are used as a density estimate. Unfortunately, there is no best estimator for the density function as there was for the CDF. There is a large area in statistics devoted to density function estimation, only a little of which we will be able to touch on in this chapter. See Silverman (1986) for a basic and accessible treatment of this problem.

The issue of unbiased and consistent estimators for density functions is a tricky one. To illustrate why, let's consider a rather large class of density function estimators called general weight function estimators. The simple histogram is one member of this class. Associated with each of the estimators in this class is a parameter $h_n$, sometimes called a smoothing or bandwidth parameter, which depends on the sample size n. In the case of a simple histogram, $h_n$ would be the width of the bins. In general, the larger n is, the smaller $h_n$ needs to be. Under some fairly general constraints, to be sure that the estimate is asymptotically unbiased, it must be that $\lim_{n\to\infty} h_n = 0$. But even if this holds, to make sure the estimator is consistent, it must also be that $\lim_{n\to\infty} n h_n = \infty$. So $h_n$ must go to zero as n gets large, but it can't go to zero too quickly. For any particular density estimator, bias will be a function of the sample size n and the true underlying density, which in practice is always unknown.

Furthermore, RT densities present a special problem in their skew, which makes some potential estimators unsuitable. It is important to realize that a density estimate is probably biased, that the degree of bias will be unobservable, and that the bias will not necessarily get smaller with increases in sample size.

For this chapter, I present two types of density estimators. The simple and very popular histogram estimators are easy to compute but can be inaccurate. Kernel estimators are more accurate, but they are a little more difficult to understand and require more computation. These two types of estimators barely begin to cover the statistical literature on nonparametric density estimators; there are more accurate estimators than the ones presented here. However, it is especially important to remember that density estimation is an exercise in descriptive statistics only. Under no circumstances should parameter estimation (i.e., model fitting) depend in some way on a density estimate. Appropriate parameter estimation methods will be discussed shortly. For graphical purposes, the kernel estimator described below should suffice.

Histogram Estimators

Histogram estimators are perhaps the best-known density estimators. Construction of the histogram estimate is fairly simple, and involves selecting r "bins" with bin boundaries $\{t_0, t_1, t_2, \ldots, t_r\}$ along the time axis. The estimate is a function of the number of observations falling in each bin:

$$\hat f(t) = \frac{\text{number of observations in bin } i}{n h_i}, \qquad t_{i-1} \le t < t_i,$$

where $h_i = t_i - t_{i-1}$ is the width of the i-th bin. The bins can be of fixed width, or they can vary according to the density of the observations along the time axis. For fixed-width estimators, an origin $t_0$ and a bin width (bandwidth) $h_n$ are selected, and the bin boundaries are computed as $\{t_0, t_0 + h_n, t_0 + 2h_n, \ldots\}$. The histogram estimate for the Left(s) data is shown in Figure 3, using $t_0 = 200$ ms and $h_n = 50$ ms. Unfortunately, there is no automatic way to select $t_0$ or $h_n$, and no systematic way to adjust $h_n$ with sample size to ensure asymptotic unbiasedness or consistency of the estimator. An appropriate $t_0$ and $h_n$ must be selected after inspecting the sample for which the density is to be estimated.
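A minimal sketch of the fixed-width construction, assuming `rt` holds one condition's RTs; the origin and bandwidth here match the Figure 3 choices but would need inspection for a new sample:

    % Fixed-width histogram density estimate with origin t0 and bandwidth h.
    t0 = 200; h = 50;                        % origin and bin width (ms)
    nb = ceil((max(rt) - t0) / h);           % number of bins needed
    fhat = zeros(nb, 1);
    for i = 1:nb
        lo = t0 + (i-1)*h;                   % bin i covers [lo, lo+h)
        fhat(i) = sum(rt >= lo & rt < lo + h) / (numel(rt) * h);
    end
    bar(t0 + h/2 + (0:nb-1)'*h, fhat, 1)     % bars centered on each bin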

For this reason, variable-width histograms that specify the frequency of observations within each bin are frequently used in RT analysis. One such estimator is based on the vincentiles (Ratcliff, 1979). For this estimator, the vincentiles are assumed to divide the sample into equally probable intervals (with the exception of the fastest and slowest bins) as described above. The height of the estimate for each bin is then computed so that the area of the interval is equal to 1/q (q being the number of vincentiles).

I previously investigated the accuracy of histogram estimators, of both fixed and variable widths, for a number of different RT models (Van Zandt, 2000). Unfortunately, variable-width estimators based on quantiles and vincentiles were highly inaccurate and quite variable even for very large sample sizes. As I demonstrated above, the vincentiles do not divide the sample into groups of equal frequency, so the heights of the density estimate computed under this assumption are incorrect. The fixed-width histogram performed better than density estimates based on quantiles, but because of the lack of an algorithm to adjust $h_n$ with increasing sample size, accuracy did not improve with increases in sample size. Therefore, histogram estimates should not be relied upon for any serious RT distributional analysis.

Kernel Estimators

The kernel estimator is also a member of the class of weight function estimators. A "kernel" is simply a function K(x) that is integrated or, in the discrete case, summed, over x. To estimate the density at a particular point t, the kernel is summed over all of the observations $T_i$ in the sample. Another way of looking at the kernel estimate is that every point on the estimate is a weighted average of all of the observations in the sample. The general form of the kernel estimate is

$$\hat f(t) = \frac{1}{n h_n} \sum_{i=1}^{n} K\!\left(\frac{T_i - t}{h_n}\right). \quad (2)$$

Notice the presence of the bandwidth parameter $h_n$ in the denominator of the kernel argument. The larger $h_n$, the smoother the estimate will be: large values of $h_n$ will tend to minimize the deviations in the sample from the point t. The kernel K(x) is itself a density function, a positive function that integrates to 1 over all x, and is typically symmetric. The kernel estimate of the density at the point t, then, is found by centering the kernel at t and averaging the heights of the kernel at each of the observations $T_i$, scaled by the bandwidth $h_n$.
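A minimal sketch of Equation 2 with a Gaussian kernel, assuming `rt` holds one condition's RTs. The bandwidth here follows Silverman's (1986) normal-reference rule, a common default rather than anything prescribed in this chapter:

    % Gaussian kernel density estimate, Equation 2.
    n = numel(rt);
    h = 1.06 * std(rt) * n^(-1/5);                    % normal-reference bandwidth
    tg = linspace(min(rt) - 3*h, max(rt) + 3*h, 200); % evaluation grid
    fhat = zeros(size(tg));
    for i = 1:numel(tg)
        z = (rt - tg(i)) / h;                         % kernel argument (T_i - t)/h_n
        fhat(i) = mean(exp(-z.^2/2) / sqrt(2*pi)) / h;  % average kernel heights
    end
    plot(tg, fhat)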


STAT 6350 Analysis of Lifetime Data. Probability Plotting STAT 6350 Analysis of Lifetime Data Probability Plotting Purpose of Probability Plots Probability plots are an important tool for analyzing data and have been particular popular in the analysis of life

More information

Time-varying failure rate for system reliability analysis in large-scale railway risk assessment simulation

Time-varying failure rate for system reliability analysis in large-scale railway risk assessment simulation Time-varying failure rate for system reliability analysis in large-scale railway risk assessment simulation H. Zhang, E. Cutright & T. Giras Center of Rail Safety-Critical Excellence, University of Virginia,

More information

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract Journal of Data Science,17(1). P. 145-160,2019 DOI:10.6339/JDS.201901_17(1).0007 WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION Wei Xiong *, Maozai Tian 2 1 School of Statistics, University of

More information

Confidence intervals for kernel density estimation

Confidence intervals for kernel density estimation Stata User Group - 9th UK meeting - 19/20 May 2003 Confidence intervals for kernel density estimation Carlo Fiorio c.fiorio@lse.ac.uk London School of Economics and STICERD Stata User Group - 9th UK meeting

More information

AN EMPIRICAL LIKELIHOOD RATIO TEST FOR NORMALITY

AN EMPIRICAL LIKELIHOOD RATIO TEST FOR NORMALITY Econometrics Working Paper EWP0401 ISSN 1485-6441 Department of Economics AN EMPIRICAL LIKELIHOOD RATIO TEST FOR NORMALITY Lauren Bin Dong & David E. A. Giles Department of Economics, University of Victoria

More information

A Comparison of Sequential Sampling Models for Two-Choice Reaction Time

A Comparison of Sequential Sampling Models for Two-Choice Reaction Time Psychological Review Copyright 2004 by the American Psychological Association 2004, Vol. 111, No. 2, 333 367 0033-295X/04/$12.00 DOI: 10.1037/0033-295X.111.2.333 A Comparison of Sequential Sampling Models

More information

Probability and Statistics

Probability and Statistics Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be CHAPTER 4: IT IS ALL ABOUT DATA 4a - 1 CHAPTER 4: IT

More information

IENG581 Design and Analysis of Experiments INTRODUCTION

IENG581 Design and Analysis of Experiments INTRODUCTION Experimental Design IENG581 Design and Analysis of Experiments INTRODUCTION Experiments are performed by investigators in virtually all fields of inquiry, usually to discover something about a particular

More information

BNG 495 Capstone Design. Descriptive Statistics

BNG 495 Capstone Design. Descriptive Statistics BNG 495 Capstone Design Descriptive Statistics Overview The overall goal of this short course in statistics is to provide an introduction to descriptive and inferential statistical methods, with a focus

More information

Intuitive Biostatistics: Choosing a statistical test

Intuitive Biostatistics: Choosing a statistical test pagina 1 van 5 < BACK Intuitive Biostatistics: Choosing a statistical This is chapter 37 of Intuitive Biostatistics (ISBN 0-19-508607-4) by Harvey Motulsky. Copyright 1995 by Oxfd University Press Inc.

More information

SUMMARIZING MEASURED DATA. Gaia Maselli

SUMMARIZING MEASURED DATA. Gaia Maselli SUMMARIZING MEASURED DATA Gaia Maselli maselli@di.uniroma1.it Computer Network Performance 2 Overview Basic concepts Summarizing measured data Summarizing data by a single number Summarizing variability

More information

Estimation of Quantiles

Estimation of Quantiles 9 Estimation of Quantiles The notion of quantiles was introduced in Section 3.2: recall that a quantile x α for an r.v. X is a constant such that P(X x α )=1 α. (9.1) In this chapter we examine quantiles

More information

Recall the Basics of Hypothesis Testing

Recall the Basics of Hypothesis Testing Recall the Basics of Hypothesis Testing The level of significance α, (size of test) is defined as the probability of X falling in w (rejecting H 0 ) when H 0 is true: P(X w H 0 ) = α. H 0 TRUE H 1 TRUE

More information

Learning Objectives for Stat 225

Learning Objectives for Stat 225 Learning Objectives for Stat 225 08/20/12 Introduction to Probability: Get some general ideas about probability, and learn how to use sample space to compute the probability of a specific event. Set Theory:

More information

Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table

Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table Lesson Plan Answer Questions Summary Statistics Histograms The Normal Distribution Using the Standard Normal Table 1 2. Summary Statistics Given a collection of data, one needs to find representations

More information

Distribution Fitting (Censored Data)

Distribution Fitting (Censored Data) Distribution Fitting (Censored Data) Summary... 1 Data Input... 2 Analysis Summary... 3 Analysis Options... 4 Goodness-of-Fit Tests... 6 Frequency Histogram... 8 Comparison of Alternative Distributions...

More information

The Nonparametric Bootstrap

The Nonparametric Bootstrap The Nonparametric Bootstrap The nonparametric bootstrap may involve inferences about a parameter, but we use a nonparametric procedure in approximating the parametric distribution using the ECDF. We use

More information

2.0 Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table

2.0 Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table 2.0 Lesson Plan Answer Questions 1 Summary Statistics Histograms The Normal Distribution Using the Standard Normal Table 2. Summary Statistics Given a collection of data, one needs to find representations

More information

Descriptive Data Summarization

Descriptive Data Summarization Descriptive Data Summarization Descriptive data summarization gives the general characteristics of the data and identify the presence of noise or outliers, which is useful for successful data cleaning

More information

Why is the field of statistics still an active one?

Why is the field of statistics still an active one? Why is the field of statistics still an active one? It s obvious that one needs statistics: to describe experimental data in a compact way, to compare datasets, to ask whether data are consistent with

More information

Outline. Possible Reasons. Nature of Heteroscedasticity. Basic Econometrics in Transportation. Heteroscedasticity

Outline. Possible Reasons. Nature of Heteroscedasticity. Basic Econometrics in Transportation. Heteroscedasticity 1/25 Outline Basic Econometrics in Transportation Heteroscedasticity What is the nature of heteroscedasticity? What are its consequences? How does one detect it? What are the remedial measures? Amir Samimi

More information

Design and Implementation of CUSUM Exceedance Control Charts for Unknown Location

Design and Implementation of CUSUM Exceedance Control Charts for Unknown Location Design and Implementation of CUSUM Exceedance Control Charts for Unknown Location MARIEN A. GRAHAM Department of Statistics University of Pretoria South Africa marien.graham@up.ac.za S. CHAKRABORTI Department

More information

1 Degree distributions and data

1 Degree distributions and data 1 Degree distributions and data A great deal of effort is often spent trying to identify what functional form best describes the degree distribution of a network, particularly the upper tail of that distribution.

More information

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing 1 In most statistics problems, we assume that the data have been generated from some unknown probability distribution. We desire

More information

1 Overview. Coefficients of. Correlation, Alienation and Determination. Hervé Abdi Lynne J. Williams

1 Overview. Coefficients of. Correlation, Alienation and Determination. Hervé Abdi Lynne J. Williams In Neil Salkind (Ed.), Encyclopedia of Research Design. Thousand Oaks, CA: Sage. 2010 Coefficients of Correlation, Alienation and Determination Hervé Abdi Lynne J. Williams 1 Overview The coefficient of

More information

Continuous case Discrete case General case. Hazard functions. Patrick Breheny. August 27. Patrick Breheny Survival Data Analysis (BIOS 7210) 1/21

Continuous case Discrete case General case. Hazard functions. Patrick Breheny. August 27. Patrick Breheny Survival Data Analysis (BIOS 7210) 1/21 Hazard functions Patrick Breheny August 27 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/21 Introduction Continuous case Let T be a nonnegative random variable representing the time to an event

More information

Chapter 4. Characterizing Data Numerically: Descriptive Statistics

Chapter 4. Characterizing Data Numerically: Descriptive Statistics Chapter 4 Characterizing Data Numerically: Descriptive Statistics While visual representations of data are very useful, they are only a beginning point from which we may gain more information. Numerical

More information

Stochastic dominance with imprecise information

Stochastic dominance with imprecise information Stochastic dominance with imprecise information Ignacio Montes, Enrique Miranda, Susana Montes University of Oviedo, Dep. of Statistics and Operations Research. Abstract Stochastic dominance, which is

More information

When is a copula constant? A test for changing relationships

When is a copula constant? A test for changing relationships When is a copula constant? A test for changing relationships Fabio Busetti and Andrew Harvey Bank of Italy and University of Cambridge November 2007 usetti and Harvey (Bank of Italy and University of Cambridge)

More information

Comparing Measures of Central Tendency *

Comparing Measures of Central Tendency * OpenStax-CNX module: m11011 1 Comparing Measures of Central Tendency * David Lane This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 1.0 1 Comparing Measures

More information

Inferring from data. Theory of estimators

Inferring from data. Theory of estimators Inferring from data Theory of estimators 1 Estimators Estimator is any function of the data e(x) used to provide an estimate ( a measurement ) of an unknown parameter. Because estimators are functions

More information

Can we do statistical inference in a non-asymptotic way? 1

Can we do statistical inference in a non-asymptotic way? 1 Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.

More information

Chapter 9. Non-Parametric Density Function Estimation

Chapter 9. Non-Parametric Density Function Estimation 9-1 Density Estimation Version 1.2 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

Statistics - Lecture One. Outline. Charlotte Wickham  1. Basic ideas about estimation Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence

More information

A Note on Bayesian Inference After Multiple Imputation

A Note on Bayesian Inference After Multiple Imputation A Note on Bayesian Inference After Multiple Imputation Xiang Zhou and Jerome P. Reiter Abstract This article is aimed at practitioners who plan to use Bayesian inference on multiplyimputed datasets in

More information

Describing distributions with numbers

Describing distributions with numbers Describing distributions with numbers A large number or numerical methods are available for describing quantitative data sets. Most of these methods measure one of two data characteristics: The central

More information

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved.

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved. 1-1 Chapter 1 Sampling and Descriptive Statistics 1-2 Why Statistics? Deal with uncertainty in repeated scientific measurements Draw conclusions from data Design valid experiments and draw reliable conclusions

More information

Statistical Analysis of Backtracking on. Inconsistent CSPs? Irina Rish and Daniel Frost.

Statistical Analysis of Backtracking on. Inconsistent CSPs? Irina Rish and Daniel Frost. Statistical Analysis of Backtracking on Inconsistent CSPs Irina Rish and Daniel Frost Department of Information and Computer Science University of California, Irvine, CA 92697-3425 firinar,frostg@ics.uci.edu

More information

Finite Sample Performance of A Minimum Distance Estimator Under Weak Instruments

Finite Sample Performance of A Minimum Distance Estimator Under Weak Instruments Finite Sample Performance of A Minimum Distance Estimator Under Weak Instruments Tak Wai Chau February 20, 2014 Abstract This paper investigates the nite sample performance of a minimum distance estimator

More information

More on Estimation. Maximum Likelihood Estimation.

More on Estimation. Maximum Likelihood Estimation. More on Estimation. In the previous chapter we looked at the properties of estimators and the criteria we could use to choose between types of estimators. Here we examine more closely some very popular

More information

Chapter 9. Non-Parametric Density Function Estimation

Chapter 9. Non-Parametric Density Function Estimation 9-1 Density Estimation Version 1.1 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least

More information

Normal Probability Plot Probability Probability

Normal Probability Plot Probability Probability Modelling multivariate returns Stefano Herzel Department ofeconomics, University of Perugia 1 Catalin Starica Department of Mathematical Statistics, Chalmers University of Technology Reha Tutuncu Department

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

Introduction. Chapter 1

Introduction. Chapter 1 Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics

More information

Lecture 14 - P v.s. NP 1

Lecture 14 - P v.s. NP 1 CME 305: Discrete Mathematics and Algorithms Instructor: Professor Aaron Sidford (sidford@stanford.edu) February 27, 2018 Lecture 14 - P v.s. NP 1 In this lecture we start Unit 3 on NP-hardness and approximation

More information

One-Sample Numerical Data

One-Sample Numerical Data One-Sample Numerical Data quantiles, boxplot, histogram, bootstrap confidence intervals, goodness-of-fit tests University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/~eariasca/teaching.html

More information

1. Exploratory Data Analysis

1. Exploratory Data Analysis 1. Exploratory Data Analysis 1.1 Methods of Displaying Data A visual display aids understanding and can highlight features which may be worth exploring more formally. Displays should have impact and be

More information

Hazard Function, Failure Rate, and A Rule of Thumb for Calculating Empirical Hazard Function of Continuous-Time Failure Data

Hazard Function, Failure Rate, and A Rule of Thumb for Calculating Empirical Hazard Function of Continuous-Time Failure Data Hazard Function, Failure Rate, and A Rule of Thumb for Calculating Empirical Hazard Function of Continuous-Time Failure Data Feng-feng Li,2, Gang Xie,2, Yong Sun,2, Lin Ma,2 CRC for Infrastructure and

More information

Joint Estimation of Risk Preferences and Technology: Further Discussion

Joint Estimation of Risk Preferences and Technology: Further Discussion Joint Estimation of Risk Preferences and Technology: Further Discussion Feng Wu Research Associate Gulf Coast Research and Education Center University of Florida Zhengfei Guan Assistant Professor Gulf

More information

Goodness-of-Fit Tests With Right-Censored Data by Edsel A. Pe~na Department of Statistics University of South Carolina Colloquium Talk August 31, 2 Research supported by an NIH Grant 1 1. Practical Problem

More information

4 Testing Hypotheses. 4.1 Tests in the regression setting. 4.2 Non-parametric testing of survival between groups

4 Testing Hypotheses. 4.1 Tests in the regression setting. 4.2 Non-parametric testing of survival between groups 4 Testing Hypotheses The next lectures will look at tests, some in an actuarial setting, and in the last subsection we will also consider tests applied to graduation 4 Tests in the regression setting )

More information

Men. Women. Men. Men. Women. Women

Men. Women. Men. Men. Women. Women Math 203 Topics for second exam Statistics: the science of data Chapter 5: Producing data Statistics is all about drawing conclusions about the opinions/behavior/structure of large populations based on

More information

Probability Distributions

Probability Distributions CONDENSED LESSON 13.1 Probability Distributions In this lesson, you Sketch the graph of the probability distribution for a continuous random variable Find probabilities by finding or approximating areas

More information

AP Statistics Cumulative AP Exam Study Guide

AP Statistics Cumulative AP Exam Study Guide AP Statistics Cumulative AP Eam Study Guide Chapters & 3 - Graphs Statistics the science of collecting, analyzing, and drawing conclusions from data. Descriptive methods of organizing and summarizing statistics

More information

1/sqrt(B) convergence 1/B convergence B

1/sqrt(B) convergence 1/B convergence B The Error Coding Method and PICTs Gareth James and Trevor Hastie Department of Statistics, Stanford University March 29, 1998 Abstract A new family of plug-in classication techniques has recently been

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

Hypothesis Testing with the Bootstrap. Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods

Hypothesis Testing with the Bootstrap. Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods Hypothesis Testing with the Bootstrap Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods Bootstrap Hypothesis Testing A bootstrap hypothesis test starts with a test statistic

More information

Unit 2. Describing Data: Numerical

Unit 2. Describing Data: Numerical Unit 2 Describing Data: Numerical Describing Data Numerically Describing Data Numerically Central Tendency Arithmetic Mean Median Mode Variation Range Interquartile Range Variance Standard Deviation Coefficient

More information

Estimation and Confidence Intervals for Parameters of a Cumulative Damage Model

Estimation and Confidence Intervals for Parameters of a Cumulative Damage Model United States Department of Agriculture Forest Service Forest Products Laboratory Research Paper FPL-RP-484 Estimation and Confidence Intervals for Parameters of a Cumulative Damage Model Carol L. Link

More information

F78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives

F78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives F78SC2 Notes 2 RJRC Algebra It is useful to use letters to represent numbers. We can use the rules of arithmetic to manipulate the formula and just substitute in the numbers at the end. Example: 100 invested

More information

Maximum-Likelihood fitting

Maximum-Likelihood fitting CMP 0b Lecture F. Sigworth Maximum-Likelihood fitting One of the issues I want to address in this lecture is the fitting of distributions dwell times. We want to find the best curve to draw over a histogram,

More information

Range The range is the simplest of the three measures and is defined now.

Range The range is the simplest of the three measures and is defined now. Measures of Variation EXAMPLE A testing lab wishes to test two experimental brands of outdoor paint to see how long each will last before fading. The testing lab makes 6 gallons of each paint to test.

More information

What s New in Econometrics? Lecture 14 Quantile Methods

What s New in Econometrics? Lecture 14 Quantile Methods What s New in Econometrics? Lecture 14 Quantile Methods Jeff Wooldridge NBER Summer Institute, 2007 1. Reminders About Means, Medians, and Quantiles 2. Some Useful Asymptotic Results 3. Quantile Regression

More information

If we want to analyze experimental or simulated data we might encounter the following tasks:

If we want to analyze experimental or simulated data we might encounter the following tasks: Chapter 1 Introduction If we want to analyze experimental or simulated data we might encounter the following tasks: Characterization of the source of the signal and diagnosis Studying dependencies Prediction

More information

Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016

Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016 8. For any two events E and F, P (E) = P (E F ) + P (E F c ). Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016 Sample space. A sample space consists of a underlying

More information

9/2/2010. Wildlife Management is a very quantitative field of study. throughout this course and throughout your career.

9/2/2010. Wildlife Management is a very quantitative field of study. throughout this course and throughout your career. Introduction to Data and Analysis Wildlife Management is a very quantitative field of study Results from studies will be used throughout this course and throughout your career. Sampling design influences

More information

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood

Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Stat 542: Item Response Theory Modeling Using The Extended Rank Likelihood Jonathan Gruhl March 18, 2010 1 Introduction Researchers commonly apply item response theory (IRT) models to binary and ordinal

More information

Robustness. James H. Steiger. Department of Psychology and Human Development Vanderbilt University. James H. Steiger (Vanderbilt University) 1 / 37

Robustness. James H. Steiger. Department of Psychology and Human Development Vanderbilt University. James H. Steiger (Vanderbilt University) 1 / 37 Robustness James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) 1 / 37 Robustness 1 Introduction 2 Robust Parameters and Robust

More information

εx 2 + x 1 = 0. (2) Suppose we try a regular perturbation expansion on it. Setting ε = 0 gives x 1 = 0,

εx 2 + x 1 = 0. (2) Suppose we try a regular perturbation expansion on it. Setting ε = 0 gives x 1 = 0, 4 Rescaling In this section we ll look at one of the reasons that our ε = 0 system might not have enough solutions, and introduce a tool that is fundamental to all perturbation systems. We ll start with

More information

The University of Hong Kong Department of Statistics and Actuarial Science STAT2802 Statistical Models Tutorial Solutions Solutions to Problems 71-80

The University of Hong Kong Department of Statistics and Actuarial Science STAT2802 Statistical Models Tutorial Solutions Solutions to Problems 71-80 The University of Hong Kong Department of Statistics and Actuarial Science STAT2802 Statistical Models Tutorial Solutions Solutions to Problems 71-80 71. Decide in each case whether the hypothesis is simple

More information

Introduction to Statistical Analysis

Introduction to Statistical Analysis Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive

More information

Non-uniform coverage estimators for distance sampling

Non-uniform coverage estimators for distance sampling Abstract Non-uniform coverage estimators for distance sampling CREEM Technical report 2007-01 Eric Rexstad Centre for Research into Ecological and Environmental Modelling Research Unit for Wildlife Population

More information