
Supplementary Material for: Correction to Kyllingsbæk, Markussen, and Bundesen (2012)

Unfortunately, the computational shortcut used in Kyllingsbæk et al. (2012) (henceforth, the article) to fit the Poisson counter model to experimental data was built on an approximation that is not well-founded. In this supplement to the correction of the article, we show why this is so, refit both models to get a sense of the actual deviation, and offer a well-founded computational shortcut that strongly reduces the time needed to fit the data. The Poisson counter model fits did, fortunately, not deviate noticeably from those of the computational shortcut, nor did they invalidate any conclusions derived in the article. We nevertheless recommend that the Poisson counter model, or the well-founded computational shortcut, be used in any fitting routine.

Why the Computational Shortcut is Not Well-Founded

Consider the Poisson counter model proposed in the article (i.e., Equations A1-A4). Assume that R is the set of possible categorization reports (identifications) j of a stimulus i. For each j ∈ R there exists a counter X_j that, during the time (t - t_0), independently accumulates categorizations for response j at a Poisson processing rate v(i, j). The Poisson counter model proposes that the probability of categorizing stimulus i as belonging to category j consists of (1) the probability that counter j alone has the maximum number of counts and (2) the probability that counter j, together with one or more of the other counters, has the maximum number of counts and the participant chooses category j when guessing randomly among these. The computational shortcut suggested in the article (i.e., Equation A5) for approximating this joint probability is

\[
P^L_{\mathrm{Approx}}(i,j) = \frac{1}{Z_L} \sum_{n=1}^{\infty} \frac{v(i,j)^n (t-t_0)^n}{n!}\, e^{-v(i,j)(t-t_0)} \sum_{m=0}^{\infty} \frac{\bigl(\sum_{k \in R\setminus\{j\}} v(i,k)(t-t_0)\bigr)^m}{m!}\, e^{-\sum_{k\in R\setminus\{j\}} v(i,k)(t-t_0)} \left(\frac{n}{n+m}\right)^{L},
\]

where L is a large number and the factor Z_L is a normalization constant ensuring that

\[
\sum_{j\in R} P^L_{\mathrm{Approx}}(i,j) = 1 - e^{-\sum_{k\in R} v(i,k)(t-t_0)},
\]

which is the probability that at least one counter is greater than zero.
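To make Equation A5 concrete, the following Python sketch evaluates the shortcut numerically for a small, hypothetical set of processing rates; the truncation limits n_max and m_max and all numerical values are our own assumptions and are not taken from the article.

import math

def shortcut_prob(rates, j, t, t0, L=20, n_max=18, m_max=60):
    """Unnormalized P^L_Approx(i, j) of Equation A5, with the infinite sums
    over n and m truncated at n_max and m_max (an assumption of this sketch)."""
    tau = t - t0
    other = sum(r for k, r in rates.items() if k != j)  # sum of v(i,k) for k != j
    total = 0.0
    for n in range(1, n_max + 1):
        pn = (rates[j] * tau) ** n / math.factorial(n) * math.exp(-rates[j] * tau)
        inner = 0.0
        for m in range(0, m_max + 1):
            pm = (other * tau) ** m / math.factorial(m) * math.exp(-other * tau)
            inner += pm * (n / (n + m)) ** L
        total += pn * inner
    return total

# Hypothetical processing rates v(i, k) for a three-category example.
rates = {"A": 30.0, "B": 5.0, "C": 5.0}
t, t0 = 0.05, 0.01
raw = {j: shortcut_prob(rates, j, t, t0) for j in rates}
# Normalize so the probabilities sum to 1 - exp(-sum_k v(i,k) (t - t0)).
target = 1.0 - math.exp(-sum(rates.values()) * (t - t0))
Z = sum(raw.values()) / target
approx = {j: p / Z for j, p in raw.items()}
print(approx)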

If there are no counts, then the participant is assumed either to guess at category j, with probability P_g(j), or to report nothing, with probability 1 - Σ_{k∈R} P_g(k). Observe now that if m > 0, then (n/(n+m))^L → 0 as L → ∞, and

\[
P^{\infty}_{\mathrm{Approx}}(i,j) = \frac{1}{Z_\infty}\bigl(1 - e^{-v(i,j)(t-t_0)}\bigr)\, e^{-\sum_{k\in R\setminus\{j\}} v(i,k)(t-t_0)} = \frac{1}{Z_\infty}\, P(X_j > 0,\; X_k = 0 \text{ for } k \neq j),
\]

where Z_∞ is the normalization factor for L = ∞. This means that the computational shortcut evaluates the probability of the jth counter being larger than all the other counters conditionally on the event that only one counter is non-zero. The computational shortcut therefore does not capture the intuition of the Poisson counter model, and its approximation is thus not well-founded. To get a sense of the actual deviation, we let Y = #{k ∈ R : X_k > 0} denote the number of counters with non-zero counts, such that

\[
P^{\infty}_{\mathrm{Approx}}(i,j) = P(Y \geq 1)\, P(X_j > 0 \mid Y = 1) = P(Y \geq 1)\, P(X_j > X_k \text{ for } k \neq j \mid Y = 1).
\]

Even though the computational shortcut is not well-founded, we still expect that its parameter estimates are close to those of the Poisson counter model, and that the differences in fit may be visible only for intermediate t. To see this, consider the following. When t is long, the event Y = 1 will in general have a low probability according to the Poisson counter model. However, in this case it is very likely that the counter with the highest Poisson processing rate will be reported, and the computational shortcut provides a good approximation. When t is short, the event Y ≤ 1 will have probability close to 1, and the computational shortcut again provides a good approximation. For intermediate t, the computational shortcut exhibits the correct qualitative behavior, with a non-monotonic probability of erroneous report. In addition, the observations with short t are the most informative for the fits, and these are observations where the computational shortcut provides a good approximation.
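The conditioning on Y = 1 can also be illustrated by brute-force simulation. The following sketch (our own illustration, with hypothetical rates) draws the counters directly, estimates P_1(i, j) + P_2(i, j) under the full Poisson counter model, and estimates P(X_j > 0 | Y = 1), the quantity targeted by the shortcut in the limit L → ∞.

import math, random

def simulate(rates, t, t0, n_sims=100_000, seed=1):
    """Monte Carlo estimate of (a) the model probability of reporting each
    category given at least one count, and (b) P(X_j > 0 | Y = 1)."""
    rng = random.Random(seed)
    cats = list(rates)
    tau = t - t0
    report = {j: 0 for j in cats}
    only = {j: 0 for j in cats}
    n_y1 = 0
    for _ in range(n_sims):
        counts = {j: sample_poisson(rng, rates[j] * tau) for j in cats}
        m = max(counts.values())
        if m == 0:
            continue  # no counts at all: guessing or omission, handled separately
        winners = [j for j in cats if counts[j] == m]
        report[rng.choice(winners)] += 1  # random guess among the maximal counters
        nonzero = [j for j in cats if counts[j] > 0]
        if len(nonzero) == 1:
            n_y1 += 1
            only[nonzero[0]] += 1
    model = {j: report[j] / n_sims for j in cats}          # estimates P_1 + P_2
    conditional = {j: only[j] / max(n_y1, 1) for j in cats}
    return model, conditional

def sample_poisson(rng, lam):
    """Knuth's simple Poisson sampler (adequate for the small means used here)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

model, conditional = simulate({"A": 30.0, "B": 5.0, "C": 5.0}, t=0.05, t0=0.01)
print("model P1 + P2:", model)
print("P(X_j > 0 | Y = 1):", conditional)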

The Effect on Reported Results

We refitted the computational shortcut (with L = 20, as in the article) and the Poisson counter model to the data using ADMB (see Fournier et al., 2012).[1] In the following, we replicate the likelihood fits and goodness-of-fit tests for Experiments 1 and 2 reported in the article and compare these results to those of the Poisson counter model. We refer to the article for more information on the experiments.

[1] The infinite sums of Equations A2, A3, and A5 in the article are infeasible to compute, and we limited them to N = 18 (instead of N = ∞). The ADMB template code can be provided upon request.

Experiment 1: Four participants were asked to identify briefly presented single digits between 1 and 9. By briefly presented, we mean that the shown digit was masked at eight systematically varied exposure durations lying between 10 and 100 ms. In every experimental block, each of the nine digits was presented five times at each of the eight exposure durations, yielding a total of 360 trials per block. The order of presentation was randomized within each block of 360 trials. The participants ran a total of 20 blocks, resulting in 100 repetitions of each of the eight exposure durations for each of the nine digits and a total of 7,200 trials.

Figure 1: Maximum likelihood fits for Participant MF in Experiment 1. The graphs show the observed proportions of correct and erroneous reports for stimulus digits 2, 6, and 8 as functions of exposure duration. Left panels: Correct reports. Right panels: False reports. The error bars show 95% confidence intervals of the proportions. The continuous (resp. dotted) curves show the predictions generated by the overall maximum likelihood fit of the Poisson counter model (resp. the computational shortcut) to the results.

Figure 1 shows the observed proportions of correct reports (left panels) and erroneous reports (right panels) for three representative stimulus digits for the representative Participant MF. The likelihood fits to the data for Participant MF are also shown in Figure 1.
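The error bars in Figure 1 show 95% confidence intervals of the observed proportions. The article does not state which interval method was used, so the normal-approximation sketch below is only one plausible choice, applied to a hypothetical count of 83 correct reports out of the 100 repetitions of one stimulus-by-exposure-duration condition.

import math

def proportion_ci(successes, n, z=1.96):
    """Normal-approximation 95% confidence interval for a binomial proportion.
    The interval method is an assumption of this sketch."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

print(proportion_ci(83, 100))  # roughly (0.756, 0.904)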

Table 1: Maximum Likelihood Estimates for the Poisson Counter Model and the Computational Shortcut in Experiment 1 (Participants KK, MA, MF, and MR). For each participant and fit, the table gives the minimum, mean, and maximum of v(i,i), the minimum, mean, and maximum of v(i,≠i), the minimum, mean, and maximum of P_g, the estimate of t_0, and the negative log likelihood. Note: Model = Poisson counter model; Shortcut = computational shortcut; v(i,i) = Poisson processing rate for a correct report; v(i,≠i) = Poisson processing rate for an erroneous report; P_g = guessing probability; Min = minimum; Mean = mean; Max = maximum; t_0 = threshold for processing in seconds; Neg. log lik. = negative log likelihood value (-ln L(P) > 0 because P ∈ [0, 1]). ADMB minimizes the negative log likelihood function.

As can be seen from Table 1, the Poisson counter model and the computational shortcut give the same parameter estimates for the guessing probability P_g and for the slack of the Poisson process, t_0. Compared to the Poisson counter model, the likelihood fits of the computational shortcut have a tendency to underestimate the correct Poisson processing rates v(i,i) and to overestimate the erroneous Poisson processing rates v(i,≠i). We use ≠i to denote any categorization report that is not i.

For each participant, we recalculated the Monte Carlo tests of goodness-of-fit and the information theoretic measures for both the computational shortcut and the Poisson counter model. The left panel of Figure 2 shows a Q-Q plot of estimated versus simulated p values for both fits to Participant MF. The deviations between estimated and simulated p values were in general similar across the two fits (see the last row of Table 2). Only the Kolmogorov-Smirnov two-sample test for Participant KK differed. Table 2 shows the range and median of the information theoretic measures, the intercept and slope of the Kullback-Leibler divergence regressed against the Shannon entropy, and the p value of the Kolmogorov-Smirnov goodness-of-fit test for the Poisson counter model and the computational shortcut fits to each of the four participants. Even though both the Poisson counter model and the computational shortcut are by and large rejected by the goodness-of-fit test for all participants, the relative information loss (given by the slope) is quite low (below 4%). Thus, despite the significant deviations between data and fits, both yielded a fairly accurate account of the data from Experiment 1, including an account of the non-monotonic relationship between the proportion of erroneous reports and the exposure durations exemplified in Figure 1.
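For reference, the quantities reported in Table 2 can be computed per experimental condition as in the following sketch; the two response distributions below are hypothetical stand-ins for an empirical distribution and a fitted theoretical distribution.

import math

def shannon_entropy(p):
    """Shannon entropy of an empirical response distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def kl_divergence(p_emp, q_theo):
    """Kullback-Leibler divergence of the theoretical distribution q
    from the empirical distribution p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p_emp, q_theo) if pi > 0)

def regression(xs, ys):
    """Ordinary least squares slope and intercept of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx, my - (sxy / sxx) * mx

# Hypothetical per-condition distributions over the 9 response categories.
empirical   = [[0.70, 0.10, 0.05, 0.05, 0.03, 0.03, 0.02, 0.01, 0.01],
               [0.30, 0.20, 0.15, 0.10, 0.08, 0.07, 0.05, 0.03, 0.02]]
theoretical = [[0.68, 0.11, 0.06, 0.05, 0.03, 0.03, 0.02, 0.01, 0.01],
               [0.32, 0.19, 0.14, 0.11, 0.08, 0.06, 0.05, 0.03, 0.02]]

H = [shannon_entropy(p) for p in empirical]
D = [kl_divergence(p, q) for p, q in zip(empirical, theoretical)]
slope, intercept = regression(H, D)
print(H, D, slope, intercept)  # the D-H slope is the relative information loss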

Figure 2: Evaluations of the Poisson counter model (black) and the computational shortcut (gray) for Participant MF in Experiment 1. Left panel: Q-Q plot of estimated p values against p values simulated under the null hypothesis for all 72 experimental conditions with Participant MF in Experiment 1. A y = x reference line is also plotted. If the estimated p values came from a population with the same distribution as the p values simulated under the null hypothesis, the points should fall approximately along this reference line. Right panel: Kullback-Leibler divergence D of the theoretical response distribution from the empirical distribution, plotted against the Shannon entropy H of the empirical distribution, for all 72 experimental conditions with Participant MF in Experiment 1.

Table 2: Information Theoretic Measures and p Values for the Poisson Counter Model and the Computational Shortcut in Experiment 1 (Participants KK, MA, MF, and MR). For each participant and fit, the table gives the minimum, median, and maximum of H, the minimum, median, and maximum of D, the D-H intercept, the D-H slope, and the p value. Note: Model = Poisson counter model; Shortcut = computational shortcut; H = Shannon entropy of the empirical response distribution; D = Kullback-Leibler divergence of the theoretical response distribution from the empirical distribution; Min = minimum; Med = median; Max = maximum; D-H intercept = intercept from a linear regression of D on H; D-H slope = slope from a linear regression of D on H. Each p value was obtained by a Kolmogorov-Smirnov test summarizing the results of Monte Carlo tests based on the χ² test statistic.
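The p values summarized by the Kolmogorov-Smirnov test are Monte Carlo goodness-of-fit p values based on the χ² test statistic. A rough sketch of such a test for a single experimental condition (our own construction; the exact procedure in the article may differ in details) is:

import random

def chi2_stat(counts, probs, n):
    """Pearson chi-square statistic for observed counts against expected n*probs."""
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs) if p > 0)

def mc_p_value(observed, probs, n_sims=2000, seed=0):
    """Monte Carlo p value: share of data sets simulated from the fitted model
    whose chi-square statistic is at least as large as the observed one."""
    rng = random.Random(seed)
    n = sum(observed)
    t_obs = chi2_stat(observed, probs, n)
    cats = list(range(len(probs)))
    hits = 0
    for _ in range(n_sims):
        draws = rng.choices(cats, weights=probs, k=n)
        counts = [draws.count(c) for c in cats]
        if chi2_stat(counts, probs, n) >= t_obs:
            hits += 1
    return hits / n_sims

# Hypothetical condition: 100 trials, fitted model probabilities, observed counts.
probs = [0.70, 0.10, 0.05, 0.05, 0.03, 0.03, 0.02, 0.01, 0.01]
observed = [65, 12, 6, 7, 3, 3, 2, 1, 1]
print(mc_p_value(observed, probs))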

Experiment 2: Because Experiment 1 showed considerable variation in Poisson processing rates across the digit stimuli, the article also reported a second experiment investigating a more homogeneous stimulus material. To this end, the stimulus material consisted of otherwise identical Landolt rings with eight different gap orientations, evenly spread around the circle. In Experiment 2, four participants were thus asked to identify a briefly presented Landolt ring with possible gap orientations E, SE, S, SW, W, NW, N, or NE (according to a compass). Again, the shown Landolt ring was masked at eight systematically varied exposure durations lying between 10 and 100 ms. In every experimental block, each of the eight stimuli was presented five times at each of the eight exposure durations, yielding a total of 320 trials per block. The order of presentation was randomized within each block of 320 trials. The participants ran a total of 20 blocks, resulting in 100 repetitions of each of the eight exposure durations for each of the eight stimuli and a total of 6,400 trials.

Figure 3: Observed proportions of correct and erroneous reports for Landolt rings with gaps centered at E, SE, and S, respectively, as functions of exposure duration for Participant MF in Experiment 2. Left panels: Correct reports. Right panels: Erroneous reports. The error bars show 95% confidence intervals of the proportions. The continuous (resp. dotted) curves show the predictions generated by the overall maximum likelihood fit of the Poisson counter model (resp. the computational shortcut) to the results of Participant MF.

Figure 3 shows the observed proportions of correct reports (left panels) and erroneous reports (right panels) for three representative stimuli for the representative Participant MF. The likelihood fits to the data for Participant MF are shown in Figure 3.

Table 3: Maximum Likelihood Estimates for the Poisson Counter Model and the Computational Shortcut in Experiment 2 (Participants KK, SK, MF, and MR). For each participant and fit, the table gives the minimum, mean, and maximum of v(i,i), the minimum, mean, and maximum of v(i,≠i), the minimum, mean, and maximum of P_g, the estimate of t_0, and the negative log likelihood. Note: Model = Poisson counter model; Shortcut = computational shortcut; v(i,i) = Poisson processing rate for a correct report; v(i,≠i) = Poisson processing rate for an erroneous report; P_g = guessing probability; Min = minimum; Mean = mean; Max = maximum; t_0 = threshold for processing in seconds; Neg. log lik. = negative log likelihood value (-ln L(P) > 0 because P ∈ [0, 1]). ADMB minimizes the negative log likelihood function.

As can be seen from Table 3, the Poisson counter model and the computational shortcut give nearly the same parameter estimates for the guessing probability P_g and for the slack of the Poisson process, t_0. Compared to the Poisson counter model, the likelihood fits of the computational shortcut still have a tendency to underestimate the Poisson processing rates v(i,i) for the correct categorization, while overestimating the erroneous Poisson processing rates v(i,≠i).

For each participant in Experiment 2, we again recalculated the Monte Carlo tests of goodness-of-fit and the information theoretic measures for both the Poisson counter model and the computational shortcut. Figure 4 shows the Q-Q plot of estimated versus simulated p values for Participant MF.

Figure 4: Evaluations of the Poisson counter model (black) and the computational shortcut (gray) for Participant MF in Experiment 2. Left panel: Q-Q plot of estimated p values against p values simulated under the null hypothesis for all 64 experimental conditions with Participant MF in Experiment 2. A y = x reference line is also plotted. If the estimated p values came from a population with the same distribution as the p values simulated under the null hypothesis, the points should fall approximately along this reference line. Right panel: Kullback-Leibler divergence D of the theoretical response distribution from the empirical distribution, plotted against the Shannon entropy H of the empirical distribution, for all 64 experimental conditions with Participant MF in Experiment 2.

By the Kolmogorov-Smirnov two-sample test of the estimated p values against the simulated p values, the deviations between estimated and simulated p values were not significant, except for the fit by the computational shortcut to the data of Participant KK (see Table 4). Thus, by the Kolmogorov-Smirnov two-sample test, we found no signs of systematic deviations between data and fits.

Table 4: Information Theoretic Measures and p Values for the Poisson Counter Model and the Computational Shortcut in Experiment 2 (Participants KK, SK, MF, and MR). For each participant and fit, the table gives the minimum, median, and maximum of H, the minimum, median, and maximum of D, the D-H intercept, the D-H slope, and the p value. Note: Model = Poisson counter model; Shortcut = computational shortcut; H = Shannon entropy of the empirical response distribution; D = Kullback-Leibler divergence of the theoretical response distribution from the empirical distribution; Min = minimum; Med = median; Max = maximum; D-H intercept = intercept from a linear regression of D on H; D-H slope = slope from a linear regression of D on H. Each p value was obtained by a Kolmogorov-Smirnov test summarizing the results of Monte Carlo tests based on the χ² test statistic.

A Well-Founded Computational Shortcut

The probability P(i, j) of reporting category j in a given trial is given by the Poisson counter model in the article (i.e., Equations A1-A4). However, the infinite sums are infeasible to compute, and the summation over power sets in Equation A3 may be very time consuming. As a well-founded computational shortcut, we suggest that the probability P_1(i, j) + P_2(i, j) be approximated by P_1^N(i, j) + P_{2,λ}^N(i, j) and a normalization factor.

First, P_1^N(i, j) is the probability that count j is higher than any other count (Equation A2):

\[
P_1^N(i,j) = e^{-(t-t_0)\sum_{k\in R} v(i,k)} \sum_{n=1}^{N} \frac{v(i,j)^n (t-t_0)^n}{n!} \prod_{k\in R\setminus\{j\}} \sum_{m=0}^{n-1} \frac{v(i,k)^m (t-t_0)^m}{m!},
\]

where N is a finite positive integer. Second, P_{2,λ}^N(i, j) is the probability that at most λ counters in addition to counter j have maximum counts, and that the participant hits category j when guessing among the counters with maximum counts (Equation A3):

\[
P_{2,\lambda}^N(i,j) = e^{-(t-t_0)\sum_{k\in R} v(i,k)} \sum_{J\in \mathcal{P}_\lambda(R\setminus\{j\})\setminus\{\emptyset\}} \frac{1}{|J|+1} \sum_{n=1}^{N} \frac{v(i,j)^n (t-t_0)^n}{n!} \prod_{k\in J} \frac{v(i,k)^n (t-t_0)^n}{n!} \prod_{k\in R\setminus (J\cup\{j\})} \sum_{m=0}^{n-1} \frac{v(i,k)^m (t-t_0)^m}{m!}.
\]
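A direct transcription of P_1^N and P_{2,λ}^N into code could look as follows. This is our own Python sketch, not the ADMB template used for the fits; the rates are hypothetical, and the factor 1/(|J| + 1) implements the random guess among the tied counters.

import math
from itertools import combinations

def p1(rates, j, t, t0, N=18):
    """P_1^N(i, j): probability that counter j alone holds the maximum count."""
    tau = t - t0
    pre = math.exp(-tau * sum(rates.values()))
    total = 0.0
    for n in range(1, N + 1):
        term = (rates[j] * tau) ** n / math.factorial(n)
        for k, r in rates.items():
            if k != j:
                term *= sum((r * tau) ** m / math.factorial(m) for m in range(n))
        total += term
    return pre * total

def p2(rates, j, t, t0, N=18, lam=3):
    """P_{2,lambda}^N(i, j): at most lam other counters tie with counter j at the
    maximum, and the participant picks j when guessing among the tied counters."""
    tau = t - t0
    pre = math.exp(-tau * sum(rates.values()))
    others = [k for k in rates if k != j]
    total = 0.0
    for size in range(1, lam + 1):
        for J in combinations(others, size):
            guess = 1.0 / (len(J) + 1)  # random choice among the tied counters
            for n in range(1, N + 1):
                term = (rates[j] * tau) ** n / math.factorial(n)
                for k in J:
                    term *= (rates[k] * tau) ** n / math.factorial(n)
                for k in others:
                    if k not in J:
                        term *= sum((rates[k] * tau) ** m / math.factorial(m)
                                    for m in range(n))
                total += guess * term
    return pre * total

rates = {"A": 30.0, "B": 5.0, "C": 5.0}  # hypothetical Poisson processing rates
print(p1(rates, "A", t=0.05, t0=0.01), p2(rates, "A", t=0.05, t0=0.01, lam=2))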

P_λ(R∖{j}) is the limited power set (i.e., the set of all subsets with a cardinality of at most λ) of the set of categorizations other than j, and ∅ is the empty set. This limiting of the power set implies that the probability of having more than λ maximum counters, in addition to counter j, is assumed to be zero. Third, the probability that all counters are zero and the participant guesses at category j is

\[
P_3(i,j) = e^{-(t-t_0)\sum_{k\in R} v(i,k)}\, P_g(j).
\]

Conversely, the probability that all counters are zero and the participant does not guess is

\[
P_4(i) = e^{-(t-t_0)\sum_{k\in R} v(i,k)} \Bigl(1 - \sum_{k\in R} P_g(k)\Bigr).
\]

Next, we quantify the probability mass lost by the approximation of P_1(i, j) and P_2(i, j).

Approximation Difference: It can be shown that the stimulus i trial-by-trial report probabilities implied by the Poisson counter model, plus the probability P_4(i) that the participant reports nothing, sum to unity, that is,

\[
\sum_{j\in R} P_1(i,j) + \sum_{j\in R} P_2(i,j) + \sum_{j\in R} P_3(i,j) + P_4(i) = 1
\quad\Longleftrightarrow\quad
\sum_{j\in R} P_1(i,j) + \sum_{j\in R} P_2(i,j) + e^{-(t-t_0)\sum_{k\in R} v(i,k)} = 1.
\]

Notice now that our well-founded computational shortcut underestimates the probability masses of P_1(i, j) and P_2(i, j), respectively, such that

\[
\sum_{j\in R} P_1^N(i,j) + \sum_{j\in R} P_{2,\lambda}^N(i,j) + e^{-(t-t_0)\sum_{k\in R} v(i,k)} < 1.
\]

We can quantify this probability mass difference by Δ, defined through

\[
\sum_{j\in R} P_1^N(i,j) + \sum_{j\in R} P_{2,\lambda}^N(i,j) + \Delta = 1 - e^{-(t-t_0)\sum_{k\in R} v(i,k)}
\quad\Longleftrightarrow\quad
\Delta = 1 - \sum_{j\in R} P_1^N(i,j) - \sum_{j\in R} P_{2,\lambda}^N(i,j) - e^{-(t-t_0)\sum_{k\in R} v(i,k)}.
\]

Notice that the difference, Δ, increases as the underestimation becomes more severe.
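Reusing the functions p1 and p2 from the sketch above, the lost probability mass Δ can be evaluated for a given λ; the rates and timing parameters remain hypothetical.

import math

def delta(rates, t, t0, N=18, lam=2):
    """Probability mass lost by the truncated approximation:
    Delta = 1 - sum_j P_1^N - sum_j P_{2,lam}^N - exp(-(t - t0) * sum_k v(i,k))."""
    mass = sum(p1(rates, j, t, t0, N) + p2(rates, j, t, t0, N, lam) for j in rates)
    return 1.0 - mass - math.exp(-(t - t0) * sum(rates.values()))

rates = {"A": 30.0, "B": 5.0, "C": 5.0}
for lam in (1, 2):
    print(lam, delta(rates, t=0.05, t0=0.01, lam=lam))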

Normalization Constant: To ensure that the well-founded computational shortcut yields probabilities that sum to unity, we rescale P_1^N and P_{2,λ}^N by defining

\[
P^N_{\mathrm{Approx}}(i,j) := \frac{1}{Z}\bigl(P_1^N(i,j) + P_{2,\lambda}^N(i,j)\bigr),
\]

where Z is a normalization constant chosen such that Δ = 0. That is,

\[
\frac{1}{Z}\Bigl(\sum_{j\in R} P_1^N(i,j) + \sum_{j\in R} P_{2,\lambda}^N(i,j)\Bigr) = 1 - e^{-(t-t_0)\sum_{k\in R} v(i,k)}
\quad\Longleftrightarrow\quad
Z = \frac{\sum_{j\in R} P_1^N(i,j) + \sum_{j\in R} P_{2,\lambda}^N(i,j)}{1 - e^{-(t-t_0)\sum_{k\in R} v(i,k)}}.
\]

The normalization factor can be understood as follows. First, the Poisson counter model implies that

\[
\sum_{j\in R} P_1(i,j) + \sum_{j\in R} P_2(i,j) = 1 - e^{-(t-t_0)\sum_{k\in R} v(i,k)},
\]

and no rescaling is necessary (Z = 1). Second, when we use the well-founded computational shortcut, the probability masses of P_1(i, j) and P_2(i, j) are underestimated and rescaling is necessary (Z < 1). The more we underestimate, the more we need to rescale.

Report Accuracy: We thus propose that the well-founded computational shortcut is given by the stimulus i trial-by-trial report probabilities P^N_Approx(i, j), P_3(i, j), and P_4(i), which have the property that

\[
\sum_{j\in R}\bigl(P^N_{\mathrm{Approx}}(i,j) + P_3(i,j)\bigr) + P_4(i) = 1.
\]

Example of Tradeoff: As an example of the tradeoff between saved computational time and approximation accuracy, we estimated (with N = 18) an ad hoc measure of computational time for Participant MF's performance in Experiment 2 in the article and compared it to the approximation difference for each limited power set. As summarized in Table 5, for Participant MF in Experiment 2, the well-founded computational shortcut perfectly approximates the Poisson counter model when the power set is restricted to three other categorizations (λ = 3). The probability that at least four counters have maximum counts is thus zero. Applying the new shortcut with λ = 3 therefore does not cost anything in accuracy, but the computational time is reduced substantially (a saving of 10 hours and 44 minutes). If we are willing to accept a small underestimation, then we can restrict the power set to two other categorizations (λ = 2). This approximation underestimates the probability mass by at most 0.001, but the computational time is now reduced even further (a saving of 17 hours and 34 minutes). These observations are of course ad hoc and not necessarily robust to changes in, for example, participants and paradigms. However, we believe that it is fair to approximate the Poisson counter model by assuming zero probability of having more than two or three counters with maximal counts. Our recommendation is that the approximation difference be reported every time the well-founded computational shortcut is applied.
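Putting the pieces together, and again reusing p1 and p2 from the earlier sketch, the normalized report probabilities of the well-founded shortcut can be assembled as follows; the guessing probabilities P_g(j) are hypothetical.

import math

def approx_report_probs(rates, t, t0, guess, N=18, lam=2):
    """Normalized report probabilities of the well-founded shortcut:
    P^N_Approx(i, j) plus the guessing term P_3(i, j); together with
    P_4(i) (no report) they sum to one."""
    tau = t - t0
    no_count = math.exp(-tau * sum(rates.values()))       # P(all counters zero)
    raw = {j: p1(rates, j, t, t0, N) + p2(rates, j, t, t0, N, lam) for j in rates}
    Z = sum(raw.values()) / (1.0 - no_count)               # normalization constant
    p_report = {j: raw[j] / Z + no_count * guess[j] for j in rates}
    p_omit = no_count * (1.0 - sum(guess.values()))        # P_4(i): no report
    return p_report, p_omit

rates = {"A": 30.0, "B": 5.0, "C": 5.0}
guess = {"A": 1/3, "B": 1/3, "C": 1/3}                     # hypothetical P_g(j)
p_report, p_omit = approx_report_probs(rates, t=0.05, t0=0.01, guess=guess)
print(p_report, p_omit, sum(p_report.values()) + p_omit)   # last value is 1.0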

Table 5: Well-Founded Computational Shortcut Estimates for Participant MF in Experiment 2, as a function of the number of other counters (λ) included in the limited power set. For each value of λ, the table gives the minimum, mean, and maximum of v(i,i), v(i,≠i), and P_g, the estimate of t_0, the negative log likelihood, the minimum, mean, and maximum of Δ, and the time ratio. Note: Min = minimum; Mean = mean; Max = maximum; v(i,i) = Poisson processing rate for a correct categorization; v(i,≠i) = Poisson processing rate for a false categorization; P_g = guessing probability; t_0 = threshold for processing in seconds; Neg. log lik. = negative log likelihood value; Δ = difference between the model and the new shortcut in units of probability mass; Time Ratio = ratio between the computational time of the new shortcut and the computational time of the model, when using ADMB on an Intel Xeon 2.40 GHz with 11 GiB of system memory.

Conclusion

We conclude by restating our main points. When proposing their Poisson counter model, Kyllingsbæk et al. (2012) used a computational shortcut that strongly reduced the time needed to fit the Poisson counter model to experimental data. Unfortunately, the computational shortcut builds on the assumption that only one of the counters is non-zero and therefore suggests an approximation that is not well-founded in the Poisson counter model. To get a sense of how much the computational shortcut actually misfitted, we refitted both it and the Poisson counter model to the experimental data reported in the article. The Poisson counter model fits did, fortunately, not deviate noticeably from those produced by the computational shortcut, nor did they invalidate any conclusions derived in the article. This is because the computational shortcut exhibits the correct qualitative behavior, with a non-monotonic probability of erroneous reports. Finally, we proposed a well-founded computational shortcut that is consistent with the Poisson counter model and showed, by example, how much computational time can be saved by using it.

References

Fournier, D. A., Skaug, H. J., Ancheta, J., Ianelli, J., Magnusson, A., Maunder, M. N., Nielsen, A., & Sibert, J. (2012). AD Model Builder: Using automatic differentiation for statistical inference of highly parameterized complex nonlinear models. Optimization Methods and Software, 27(2).

Kyllingsbæk, S., Markussen, B., & Bundesen, C. (2012). Testing a Poisson counter model for visual identification of briefly presented, mutually confusable single stimuli in pure accuracy tasks. Journal of Experimental Psychology: Human Perception and Performance, 38(3).
