Statistical inference for MEG Vladimir Litvak Wellcome Trust Centre for Neuroimaging University College London, UK MEG-UK 2014 educational day
Talk aims: show main ideas of common methods; explain some of the terminology; give some useful tips; warn of pitfalls and common errors.
What is special about MEG? Multi-dimensionality. Presence of structure in the data and availability of prior knowledge about this structure. (Also in EEG.) Large volumes of data. Vast selection and diversity of potentially interesting data features.
The basics
What do we want to know? Is there an effect of the experimental factor expressed in the data? Where in the data space is the effect expressed? How large is the effect?
Four approaches to statistical testing. Parametric: SPM. Non-parametric (permutations): FieldTrip, SnPM. Bootstrap: EEGLAB, LIMO EEG. Bayesian: SPM.
Data example. Random presentation of faces and scrambled faces. Question: Is there a difference between the ERF of faces and scrambled faces?
Evoked field example. Focus on N170 field strength. [Figure: field strength per trial in each of the two conditions.]
$t = \dfrac{\hat{\mu}_1 - \hat{\mu}_2}{\hat{\sigma}\sqrt{1/n_1 + 1/n_2}}$
* Statistic: any function of the sample.
Inference: the undergraduate perspective. Idea: Find out the distribution of the chosen statistic under the assumption of no effect (null hypothesis H0). See how probable it would be to get a statistic value more extreme than the actual value u under this distribution (p-value): $p = P(t > u \mid H_0)$. Reject H0 if the p-value is below a pre-defined threshold α. α is also the false positive rate. Parametric (Neyman-Pearson) approach: only uses statistics whose distribution under H0 can be computed analytically, most commonly t, F, χ².
An alternative approach. H0 implies that the samples from the two conditions come from the same distribution. Thus, if trial labels were randomly reassigned (reshuffled), we could re-compute the statistic and thereby draw a sample from its distribution under H0.
trial          1          2          3          4          5
original       face       face       face       scrambled  scrambled
shuffling #1   face       scrambled  face       scrambled  face
shuffling #2   scrambled  face       face       face       scrambled
shuffling #3   face       scrambled  face       scrambled  face
For the example above there are 5!/(3!*2!) = 10 distinct label assignments.
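The count of distinct label assignments can be checked by enumeration. The following is a minimal sketch in Python (a language choice made for illustration, not something used in the talk); the variable names are hypothetical.

```python
from itertools import combinations
from math import comb

# Original labels of the five trials in the slide's toy example.
labels = ("face", "face", "face", "scrambled", "scrambled")
n_trials = len(labels)
n_face = labels.count("face")

# Each distinct relabelling is fixed by choosing which trials are called "face".
assignments = [
    tuple("face" if i in face_idx else "scrambled" for i in range(n_trials))
    for face_idx in combinations(range(n_trials), n_face)
]

# The enumeration agrees with the binomial coefficient 5! / (3! * 2!).
assert len(assignments) == comb(n_trials, n_face)
print(len(assignments), "distinct label assignments")
```

With so few trials the whole permutation distribution can be enumerated exactly; with realistic trial counts one usually draws random shufflings instead.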
Original example. Focus on N170 field strength. [Figure: field strength per trial with the original labels.] t = -10.1
Shuffling #1. Focus on N170 field strength. [Figure: field strength per trial with reshuffled labels.] t = 2.5
Shuffling #2. Focus on N170 field strength. [Figure: field strength per trial with reshuffled labels.] t = -1.2
Shuffling #3. Focus on N170 field strength. [Figure: field strength per trial with reshuffled labels.] t = 0.14
Permutation distribution. [Figure: permutation distribution of t; # shufflings = 3, then 250.] p < 1/250 = 0.004. * Note that you need at least N shufflings to show that p < 1/N.
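The shuffling procedure on the preceding slides can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the talk's implementation: the data are simulated, the t statistic uses a pooled variance, the test is two-sided, and all function and variable names are hypothetical.

```python
import numpy as np

def two_sample_t(x1, x2):
    """Two-sample t statistic with pooled variance (assumed form)."""
    n1, n2 = len(x1), len(x2)
    pooled_var = ((n1 - 1) * np.var(x1, ddof=1)
                  + (n2 - 1) * np.var(x2, ddof=1)) / (n1 + n2 - 2)
    return (np.mean(x1) - np.mean(x2)) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))

def permutation_test(x1, x2, n_shuffles=250, seed=0):
    """Return the observed t, the shuffled t values and the number of
    shufflings whose |t| is at least as large as the observed |t|."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x1, x2])
    n1 = len(x1)
    t_observed = two_sample_t(x1, x2)
    t_null = np.empty(n_shuffles)
    for i in range(n_shuffles):
        shuffled = rng.permutation(pooled)  # random reassignment of labels
        t_null[i] = two_sample_t(shuffled[:n1], shuffled[n1:])
    n_extreme = int(np.sum(np.abs(t_null) >= abs(t_observed)))
    return t_observed, t_null, n_extreme

# Simulated single-sensor field strengths (hypothetical numbers).
data_rng = np.random.default_rng(1)
faces = data_rng.normal(-2.0, 1.0, size=50)
scrambled = data_rng.normal(0.0, 1.0, size=50)

t_observed, t_null, n_extreme = permutation_test(faces, scrambled)
if n_extreme == 0:
    # No shuffling was as extreme: we can only state an upper bound on p.
    print(f"t = {t_observed:.1f}, p < 1/{len(t_null)}")
else:
    print(f"t = {t_observed:.1f}, p = {n_extreme / len(t_null):.3f}")
```

The resolution of the p-value is limited by the number of shufflings, which is why the sketch reports a bound rather than zero when no shuffled value reaches the observed one.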
Permutation test vs. parametric test. Unlike the parametric test, the permutation test does not assume that the samples come from a normal distribution, just that they come from the same distribution (under H0) and are independent. The permutation distribution can be computed for ANY statistic, not just t. The parametric framework allows for more complex experimental designs (see below). The permutation test is more computationally demanding. The parametric test is more sensitive if its assumptions (normality) hold. The parametric framework can provide confidence intervals on effect sizes while the permutation framework cannot.
Multiple comparisons problem and solutions
Multiple testing in MEG. [Figure: field strength data in 1D, 2D and 3D.] 1D: N = Fs*time ≈ 1000. 2D: N = # channels ≈ 300. 3D: N ≈ 300,000.
Multiple testing: time-frequency. 2D: N = Fs*time*# frequencies = 5,000. 4D: N ≈ 1,000,000.
Multiple comparisons problem (MCP). If we have 100,000 tests at α = 0.05, we expect 5,000 false positives. [Figure: use of uncorrected p-value, α = 0.1. Percentage of null pixels that are false positives in ten example images: 11.3%, 11.3%, 12.5%, 10.8%, 11.5%, 10.0%, 10.7%, 11.2%, 10.2%, 9.5%.]
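The arithmetic on this slide can be reproduced with a short simulation. The sketch below assumes independent tests whose p-values are uniform under H0 (true for continuous test statistics); the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tests, alpha = 100_000, 0.05

# Under H0 the p-value of a continuous test statistic is uniform on [0, 1].
p_values = rng.uniform(size=n_tests)

expected_false_positives = n_tests * alpha
observed_false_positives = int(np.sum(p_values < alpha))

# For independent tests, the chance of at least one false positive anywhere.
prob_any_false_positive = 1 - (1 - alpha) ** n_tests

print(f"expected {expected_false_positives:.0f}, observed {observed_false_positives}")
print(f"P(at least one false positive) = {prob_any_false_positive:.3f}")
```

The last quantity is the family-wise error rate of the uncorrected procedure, which is effectively 1 for this many tests.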
Neural Correlates of Interspecies Perspective Taking in the Post-Mortem Atlantic Salmon: An Argument For Proper Multiple Comparisons Correction. Bennett et al. (2010), Journal of Serendipitous and Unexpected Results. Final sentence of paper: "While we must guard against the elimination of legitimate results through Type II error, the alternative of continuing forward with uncorrected statistics cannot be an option."
Family-Wise Null Hypothesis: there is no effect anywhere. If we reject the null hypothesis for any test, we reject the family-wise null hypothesis. A false positive anywhere in the image gives a Family-Wise Error (FWE). Family-Wise Error rate (FWER) = corrected p-value. [Figure: example images thresholded with an uncorrected p-value, α = 0.1, and with a corrected p-value, α = 0.1; images containing an FWE are marked.]
MCP with no topological structure. Bonferroni correction: to ensure a particular FWER for N tests choose $\alpha = \alpha_{FWE}/N$. This correction does not require the tests to be independent but becomes very stringent if there is dependence. False Discovery Rate (FDR): instead of FWE, control the expected percentage of false detections out of all detections. At least one test should be significant at Bonferroni-corrected level. The more effects are detected, the more noise there can be. One should be careful about over-interpreting any specific activation. [Figure: control of False Discovery Rate at 10%. Percentage of activated pixels that are false positives in ten example images: 6.7%, 10.4%, 14.9%, 9.3%, 16.2%, 13.8%, 14.0%, 10.5%, 12.2%, 8.7%.]
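Both corrections on this slide operate on a plain list of p-values. The sketch below shows Bonferroni thresholding next to the Benjamini-Hochberg step-up procedure, which is one common way to control FDR; the slide does not name a specific FDR procedure, so that choice, the function names and the example p-values are assumptions.

```python
import numpy as np

def bonferroni(p_values, alpha_fwe=0.05):
    """Reject tests with p <= alpha_fwe / N."""
    p = np.asarray(p_values, dtype=float)
    return p <= alpha_fwe / p.size

def benjamini_hochberg(p_values, q=0.1):
    """Benjamini-Hochberg step-up: find the largest rank k with
    p_(k) <= q * k / N and reject the k smallest p-values."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, n + 1) / n
    reject = np.zeros(n, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()
        reject[order[:k + 1]] = True
    return reject

# Hypothetical p-values from ten tests.
p_demo = np.array([0.042, 0.9, 0.001, 0.205, 0.039, 0.065, 0.008, 0.074, 0.041, 0.212])

print("Bonferroni detections:", int(bonferroni(p_demo, alpha_fwe=0.1).sum()))
print("FDR detections:       ", int(benjamini_hochberg(p_demo, q=0.1).sum()))
```

FDR control typically yields more detections than Bonferroni on the same data, at the price that a known fraction of those detections is expected to be false.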
Random field theory - topological inference. [Figure: 2D Gaussian random field.]
$p_{FWE} \approx \lambda(\Omega)\,|\Lambda|^{1/2}\,(u^2 - 1)\exp(-u^2/2)\,/\,(2\pi)^2$
with search region $\Omega \subset \mathbb{R}^3$, volume $\lambda(\Omega)$, roughness $|\Lambda|^{1/2}$ and threshold $u$.
Expected number of peaks = normalised volume * peak density. Threshold increases (for high enough u): p decreases (less probable to cross). Search volume increases: p increases (more tests). Roughness decreases (smoothness increases): p decreases (fewer independent tests). Derived analytically for each statistic type: t, F, χ².
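The qualitative statements on this slide can be checked numerically. The sketch below evaluates the high-threshold approximation for a Gaussian field over a 3D search region, $p_{FWE} \approx \lambda(\Omega)\,|\Lambda|^{1/2}\,(u^2-1)\exp(-u^2/2)/(2\pi)^2$; the function name and the example volume and roughness values are hypothetical, and this is not how SPM computes corrected p-values in full.

```python
import math

def rft_p_fwe(u, volume, roughness):
    """High-threshold approximation for a 3D Gaussian random field:
    volume * roughness * (u^2 - 1) * exp(-u^2 / 2) / (2 * pi)^2."""
    return volume * roughness * (u ** 2 - 1) * math.exp(-u ** 2 / 2) / (2 * math.pi) ** 2

# Hypothetical search volume and roughness, chosen only for illustration.
volume, roughness = 1000.0, 0.1

for u in (3.0, 4.0, 5.0):
    print(f"u = {u}: p_FWE ~ {rft_p_fwe(u, volume, roughness):.5f}")
```

The approximation is only meaningful for high thresholds; at low u the expression can exceed 1 and should not be read as a probability.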
Peak, cluster and set level inference. Peak level test: height of local maxima. Cluster level test: spatial extent above u. Set level test: number of clusters above u. [Figure: thresholded statistic profile with two clusters, L1 > spatial extent threshold and L2 < spatial extent threshold; markers indicate what is significant at the set level, at the cluster level and at the peak level; arrows labelled sensitivity and regional specificity.] If you found a significant cluster between 50 and 200 ms, could you conclude that the brain can detect faces already at 50 ms after the stimulus?
Other approaches to MCP. In the permutation approach, design a statistic that pertains to the whole set of tests (maximal t in the volume, maximal cluster size, maximal sum of t in a cluster, etc.) and determine its permutation distribution. There is no need to treat scalp maps as continuous images. Clusters can be defined based on hand-crafted neighbourhood relations between sensors (FieldTrip). Details are flexible and the test will be valid as long as it is designed a priori and the original and shuffled data are treated in the same way.
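The simplest statistic of this kind is the maximal |t| over all tests. The sketch below is a minimal NumPy illustration on simulated data, not the FieldTrip or SnPM implementation; the data dimensions, effect size and function names are assumptions.

```python
import numpy as np

def t_map(a, b):
    """Two-sample t statistic (pooled variance) at every data point.
    a, b: arrays of shape (n_trials, n_points)."""
    n1, n2 = a.shape[0], b.shape[0]
    pooled_var = ((n1 - 1) * a.var(axis=0, ddof=1)
                  + (n2 - 1) * b.var(axis=0, ddof=1)) / (n1 + n2 - 2)
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))

def max_t_permutation(a, b, n_shuffles=500, seed=0):
    """Corrected p-value per point from the permutation distribution
    of the maximal |t| across all points."""
    rng = np.random.default_rng(seed)
    data = np.concatenate([a, b], axis=0)
    n1 = a.shape[0]
    t_observed = t_map(a, b)
    max_null = np.empty(n_shuffles)
    for i in range(n_shuffles):
        idx = rng.permutation(data.shape[0])  # shuffle trial labels
        # Shuffled data go through exactly the same pipeline as the original.
        max_null[i] = np.abs(t_map(data[idx[:n1]], data[idx[n1:]])).max()
    exceed = (max_null[:, None] >= np.abs(t_observed)[None, :]).sum(axis=0)
    # The observed labelling is counted as one of the permutations.
    p_corrected = (1 + exceed) / (n_shuffles + 1)
    return t_observed, p_corrected

# Simulated data: 40 trials per condition, 200 time points,
# with a true effect only in samples 80 to 99 (hypothetical numbers).
data_rng = np.random.default_rng(2)
cond_a = data_rng.normal(size=(40, 200))
cond_b = data_rng.normal(size=(40, 200))
cond_a[:, 80:100] += 2.0

t_observed, p_corrected = max_t_permutation(cond_a, cond_b)
print("points with corrected p < 0.05:", np.nonzero(p_corrected < 0.05)[0])
```

Replacing the maximum of |t| with maximal cluster size or maximal cluster sum changes only the line that computes `max_null` and the matching statistic for the observed data.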
The General Linear Model
Complication 2: accounting for additional factors. Focus on N170 field strength. [Figure: field strength per trial.]
$t = \dfrac{\hat{\mu}_1 - \hat{\mu}_2}{\hat{\sigma}\sqrt{1/n_1 + 1/n_2}}$
Data modeling. Data = β1 × Faces + β2 × Scrambled + error, i.e. $Y = \beta_1 X_1 + \beta_2 X_2 + \varepsilon$.
Design matrix. $Y = X\beta + \varepsilon$, with $X = [X_1\ X_2]$ and $\beta = [\beta_1, \beta_2]^T$.
General Linear Model. $Y = X\beta + \varepsilon$, where Y is N×1, X is N×p, β is p×1 and ε is N×1. N: # trials; p: # regressors. The GLM is defined by the design matrix X and the error distribution $\varepsilon \sim N(0, \sigma^2 I)$.
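Contrast: specifies a linear combination of the parameter vector, $c^T\beta$. Example: $c^T = [-1\ +1]$. ERP: faces < scrambled? = $\hat{\beta}_1 < \hat{\beta}_2$? = $-1 \cdot \hat{\beta}_1 + 1 \cdot \hat{\beta}_2 > 0$? Test H0: $c^T\beta = 0$. t-statistic map: t = contrast of estimated parameters / sqrt(variance estimate),
$t = \dfrac{c^T\hat{\beta}}{\sqrt{\hat{\sigma}^2\, c^T (X^T X)^{-1} c}}$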
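For the two-condition design, the contrast t statistic reduces to the ordinary two-sample t-test. The sketch below fits the GLM by least squares on simulated data and checks that equivalence; the data, the trial counts and the function name are assumptions made for illustration, and this is not SPM code.

```python
import numpy as np

def glm_t_contrast(Y, X, c):
    """t = c'b / sqrt(sigma2 * c' (X'X)^-1 c), with b the least-squares estimate."""
    Y, X, c = (np.asarray(v, dtype=float) for v in (Y, X, c))
    beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    residuals = Y - X @ beta_hat
    dof = X.shape[0] - np.linalg.matrix_rank(X)   # N - p for a full-rank design
    sigma2_hat = residuals @ residuals / dof      # error variance estimate
    contrast_var = sigma2_hat * (c @ np.linalg.pinv(X.T @ X) @ c)
    return (c @ beta_hat) / np.sqrt(contrast_var), beta_hat

# Simulated single-sensor data: 50 face trials, then 50 scrambled trials.
rng = np.random.default_rng(3)
n_faces, n_scrambled = 50, 50
X = np.zeros((n_faces + n_scrambled, 2))
X[:n_faces, 0] = 1.0        # regressor X1: faces
X[n_faces:, 1] = 1.0        # regressor X2: scrambled
Y = X @ np.array([-2.0, 0.0]) + rng.normal(size=n_faces + n_scrambled)

c = np.array([-1.0, 1.0])   # faces < scrambled?
t_glm, beta_hat = glm_t_contrast(Y, X, c)

# The same number from the classical two-sample formula with pooled variance.
y1, y2 = Y[:n_faces], Y[n_faces:]
pooled_var = ((n_faces - 1) * y1.var(ddof=1)
              + (n_scrambled - 1) * y2.var(ddof=1)) / (n_faces + n_scrambled - 2)
t_two_sample = (y2.mean() - y1.mean()) / np.sqrt(pooled_var * (1 / n_faces + 1 / n_scrambled))

print(f"GLM contrast t = {t_glm:.3f}, two-sample t = {t_two_sample:.3f}")
```

Adding covariates or more conditions only means adding columns to X and zeros or weights to c; the formula for t stays the same.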
More complicated design (ANCOVA). [Figure: design matrix with one row for each trial and columns for each trial-type (6) and for confounds (4).] This is the general linear model (GLM).
Tips and warnings. "After you've not fooled yourself, it's easy not to fool other scientists." Richard P. Feynman
Increasing sensitivity by limiting the search volume. Limit the test to where an effect is expected a priori. Reduce dimensionality by averaging where possible. [Figure: time-frequency images (time (s) vs. frequency (Hz)) showing a restricted search window and averaging over one dimension.]
Double dipping. [Figure: time courses (time (s)). A test point chosen from the difference between conditions is based on the same data as the test; a test point chosen from an orthogonal contrast (the mean across conditions) is based on independent data.] Kilner, Clin Neurophysiol., 2013
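The bias can be demonstrated on pure noise. The sketch below simulates two conditions with no true difference, picks a time point either from the condition difference or from the condition mean, and then runs an uncorrected t-test at that point. All settings (trial counts, number of time points, number of simulations) and names are hypothetical, and the critical value is hard-coded for 38 degrees of freedom.

```python
import numpy as np

# Two-sided 5% critical value of t with 38 degrees of freedom
# (2 x 20 trials - 2); it must be changed if n_trials changes.
T_CRIT = 2.024

def t_at(a, b, idx):
    """Pooled-variance two-sample t statistic at one time point."""
    x1, x2 = a[:, idx], b[:, idx]
    n1, n2 = len(x1), len(x2)
    pooled_var = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)
    return (x1.mean() - x2.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))

def false_positive_rates(n_sims=500, n_trials=20, n_times=50, seed=0):
    rng = np.random.default_rng(seed)
    hits_same, hits_orth = 0, 0
    for _ in range(n_sims):
        # Null data: both conditions are noise with identical distributions.
        a = rng.normal(size=(n_trials, n_times))
        b = rng.normal(size=(n_trials, n_times))
        difference = a.mean(axis=0) - b.mean(axis=0)
        mean = (a.mean(axis=0) + b.mean(axis=0)) / 2
        idx_same = int(np.argmax(np.abs(difference)))  # selection uses the tested effect
        idx_orth = int(np.argmax(np.abs(mean)))        # selection uses an orthogonal contrast
        hits_same += int(abs(t_at(a, b, idx_same)) > T_CRIT)
        hits_orth += int(abs(t_at(a, b, idx_orth)) > T_CRIT)
    return hits_same / n_sims, hits_orth / n_sims

fpr_same, fpr_orth = false_positive_rates()
print(f"selection on the difference: false positive rate = {fpr_same:.2f}")
print(f"selection on the mean:       false positive rate = {fpr_orth:.2f}")
```

In this balanced, equal-variance setting the mean and the difference are independent, so selecting on the mean leaves the nominal 5% rate intact; with unequal trial counts or variances that independence does not hold automatically.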
How impressive are impressive fits? Don't be too impressed by model fit to the data if the fit is shown for one of multiple data points selected on the basis of its goodness-of-fit. [Figure: regressor in the form of an M; F image for 100 x 100 random vectors; fit for the pixel with maximal F.]
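The demonstration on this slide is easy to repeat. The sketch below regresses 100 x 100 random vectors on an M-shaped regressor and looks at the best-fitting one; the vector length, the exact M shape and the variable names are assumptions. For a single regressor, F is a monotonic function of the correlation r, so the pixel with maximal F is the one with maximal |r|.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 20

# Hypothetical M-shaped regressor: two peaks with a dip in the middle.
x = np.linspace(-1.0, 1.0, n_samples)
regressor = 0.5 - np.abs(np.abs(x) - 0.5)

# 100 x 100 "pixels", each a vector of pure noise.
data = rng.normal(size=(100 * 100, n_samples))

def correlations(data, regressor):
    """Pearson correlation of every row of data with the regressor."""
    d = data - data.mean(axis=1, keepdims=True)
    r = regressor - regressor.mean()
    return d @ r / (np.linalg.norm(d, axis=1) * np.linalg.norm(r))

r_all = correlations(data, regressor)
# F statistic for a model with one regressor plus a constant.
f_all = r_all ** 2 * (n_samples - 2) / (1 - r_all ** 2)

best = int(np.argmax(f_all))
print(f"median |r| over all pixels: {np.median(np.abs(r_all)):.2f}")
print(f"|r| at the pixel with maximal F: {abs(r_all[best]):.2f}")
```

Plotting `data[best]` against the regressor gives a visually convincing fit even though every vector is noise, which is the point of the slide.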
The pitfalls of non-statistical inference. Random patterns in the data can easily appear physiologically meaningful due to covariance structure. Meaningful appearance can supplement but not replace properly assessed statistical significance. [Figure: original grand mean time-frequency response to button press (time (s) vs. frequency (Hz)); example design matrix; contrast images for the regression coefficient across subjects with 16 different random regressors.]
Take home messages. Parametric and permutation approaches are both valid statistical methods applicable to MEG. The parametric approach allows for more complex experimental designs (GLM) with a limited choice of statistics, while the permutation approach allows any statistic to be used but only with simple designs (t-test, regression). Unless you are interested in the thoughts of dead fish, control for multiple comparisons. Don't fool yourself.
Thanks to Tom Nichols, Guillaume Flandin, Gareth Barnes and Stefan Kiebel. Thank you for your attention.
References
Kilner J., Friston K.J. Topological inference for EEG and MEG data. Annals of Applied Statistics. 2010, 4(3):1272-1290.
Maris E., Oostenveld R. Nonparametric statistical testing of EEG- and MEG-data. J Neurosci Methods. 2007, 164(1):177-90.
Maris E. Statistical testing in electrophysiological studies. Psychophysiology. 2012, 49(4):549-65.
Friston K. Ten ironic rules for non-statistical reviewers. NeuroImage. 2012, 61(4):1300-10.
Gross et al. Good practice for conducting and reporting MEG research. NeuroImage. 2013, 65:349-63.
http://www.fil.ion.ucl.ac.uk/spm/doc/biblio/keyword/rft.html
http://www.fil.ion.ucl.ac.uk/spm/course/video/#meeg12
http://fieldtrip.fcdonders.nl/tutorial
RFT and alternative approaches
                                                 Parametric (RFT)   Permutation       Bootstrap
Implemented in                                   SPM                FieldTrip, SnPM   EEGLAB, LIMO EEG
Mathematically sound                             yes                yes               no (a heuristic)
Fast computation                                 yes                no                no
Can be applied to complex experimental designs   yes                no                yes
Can be applied to arbitrary statistics           no                 yes               yes