
Submitted to the Annals of Applied Statistics

SUPPLEMENTARY APPENDICES FOR WAVELET-DOMAIN REGRESSION AND PREDICTIVE INFERENCE IN PSYCHIATRIC NEUROIMAGING

By Philip T. Reiss, Lan Huo, Yihong Zhao, Clare Kelly and R. Todd Ogden

APPENDIX A: SIMULATION DETAILS

A.1. Comparative simulation study. Given the true coefficient image $\beta = \beta^{(k)}$ for $k = 1$ or $2$ as defined in §4, we generated continuous outcomes
\[
y_i = x_i^T \beta + \varepsilon_i \quad \text{with } \varepsilon_i \sim N(0, \sigma^2), \tag{A1}
\]
with $\sigma^2$ chosen as the solution to
\[
1 - \frac{\sigma^2}{s^2_{x\beta} + \sigma^2} = R^2, \tag{A2}
\]
where $s^2_{x\beta}$ is the sample variance of $x_1^T\beta, \ldots, x_n^T\beta$, representing the variance explained by the model, and $R^2$ is the specified value (0.1 or 0.5). The left side of (A2) is similar to what Tibshirani and Knight (1999) called the theoretical $R^2$, and has the interpretation that for responses generated according to (A1) and (A2), the coefficient of determination for the true model is approximately equal to the specified value $R^2$.
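Concretely, (A2) has the closed-form solution $\sigma^2 = s^2_{x\beta}(1 - R^2)/R^2$. A minimal R sketch of this generation step (our illustration, not the authors' code), assuming an $n \times p$ matrix `X` of vectorized images and a vectorized true coefficient image `beta`:

```r
# Sketch: generate Gaussian responses with a specified theoretical R^2, per (A1)-(A2).
# Assumes X (n x p matrix of vectorized images) and beta (length-p vector).
simulate_gaussian <- function(X, beta, R2) {
  xb <- as.vector(X %*% beta)        # x_i^T beta, i = 1, ..., n
  s2_xb <- var(xb)                   # sample variance explained by the model
  sigma2 <- s2_xb * (1 - R2) / R2    # closed-form solution of (A2)
  xb + rnorm(length(xb), sd = sqrt(sigma2))
}
```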

An $R^2$ analogue for logistic regression (Menard, 2000) is given by
\[
R_L^2 = 1 - \frac{\log(L_M)}{\log(L_0)}, \tag{A3}
\]
where $L_M$ is the likelihood of the given model and $L_0$ is the likelihood of the model containing only an intercept. For the logistic regression simulation settings we defined a theoretical version of $R_L^2$ analogous to the left side of (A2). Suppose we are simulating responses $y_i \sim \mathrm{Bernoulli}(p_i)$, $i = 1, \ldots, n$, where
\[
\log\frac{p_i}{1 - p_i} = \delta_0 + x_i^T\beta \tag{A4}
\]
for given $\delta_0$, $\beta$, and predictors $x_i$ ($i = 1, \ldots, n$). Let $E(\cdot)$ denote expectation under the assumed model (A4), and let $L_{\cdots}$ denote likelihood based on the true values of the parameters given in the subscript(s), with parameters not given in the subscripts set to zero. Then the proposed variant of (A3) for use in simulations is
\[
R_L^2 = 1 - \frac{E\log(L_{\delta_0,\beta})}{E\log(L_{\delta_0})} \tag{A5}
\]
\[
= 1 - \frac{\displaystyle\sum_{i=1}^n\left[\frac{\log\{1+\exp(-\delta_0-x_i^T\beta)\}}{1+\exp(-\delta_0-x_i^T\beta)} + \frac{\log\{1+\exp(\delta_0+x_i^T\beta)\}}{1+\exp(\delta_0+x_i^T\beta)}\right]}{\displaystyle\sum_{i=1}^n\left[\frac{\log\{1+\exp(-\delta_0)\}}{1+\exp(-\delta_0-x_i^T\beta)} + \frac{\log\{1+\exp(\delta_0)\}}{1+\exp(\delta_0+x_i^T\beta)}\right]}. \tag{A6}
\]
To perform simulations with a desired value of $R_L^2$, we take $\beta = s\beta_0$ in (A6) for a given $\beta_0$, and numerically solve for $s > 0$ such that (A6) equals the specified value.
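A minimal sketch of this root-finding step (our illustration, not the authors' code; the function name and the inputs `X`, `beta0`, `delta0` are assumptions), evaluating (A6) as a function of $s$ and solving with `uniroot`:

```r
# Sketch: find s > 0 so that the theoretical R_L^2 in (A6) equals a target value.
# Assumes X (n x p matrix), beta0 (length-p vector), delta0 (scalar).
theoretical_RL2 <- function(s, delta0, X, beta0) {
  eta <- delta0 + as.vector(X %*% (s * beta0))       # delta_0 + x_i^T beta
  num <- sum(log1p(exp(-eta)) / (1 + exp(-eta)) +
             log1p(exp( eta)) / (1 + exp( eta)))     # -E log L_{delta0, beta}
  den <- sum(log1p(exp(-delta0)) / (1 + exp(-eta)) +
             log1p(exp( delta0)) / (1 + exp( eta)))  # -E log L_{delta0}
  1 - num / den                                      # signs cancel in the ratio
}
# Example: solve for s attaining R_L^2 = 0.1 with delta0 = 0.
s_star <- uniroot(function(s) theoretical_RL2(s, delta0 = 0, X, beta0) - 0.1,
                  interval = c(1e-6, 100))$root
```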

We used 8-fold CV with 8 repetitions (see §3.6), i.e., minimization of (3) or (4) with $K = R = 8$, to choose from among the following candidate tuning parameter values. For FPCR, the number of basis functions along each dimension was chosen between 20 and 30, and the number of PCs was chosen from among the values 1, ..., 20; for given numbers of basis functions and of PCs, the roughness penalty parameter was chosen by restricted maximum likelihood (Reiss and Ogden, 2009; Wood, 2011). For the three wavelet methods, the decomposition level parameter $j_0$ was set to 4. For WPCR and WPLS, we retained $c = 200$ wavelet coefficients, and again chose from among 1–20 components. (In our experience, a small number of retained wavelet coefficients can sometimes attain the minimal CV, but at the expense of highly unstable estimates; we therefore chose the moderate value $c = 200$ rather than varying $c$ for what would likely be a minuscule reduction in CV.) For WNet, the elastic net mixing parameter $\alpha$ was chosen from among 0.1, 0.4, 0.7, 1, and a dense grid of candidate $\lambda$ values was automatically generated by the glmnet algorithm (Friedman, Hastie and Tibshirani, 2010) for each value of $\alpha$.
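For the WNet settings, a sketch of what this repeated-CV tuning could look like with glmnet (our illustration under the stated grids, not the authors' code; `W`, `y`, and the Gaussian-family default are assumptions — the logistic settings would add `family = "binomial"`):

```r
library(glmnet)

# Sketch: 8-fold CV with 8 repetitions over the elastic net mixing grid.
# Assumes W (n x c matrix of retained wavelet coefficients) and y (response).
alphas <- c(0.1, 0.4, 0.7, 1)
cv_curves <- lapply(alphas, function(a) {
  reps <- replicate(8, {                              # R = 8 repetitions
    folds <- sample(rep(1:8, length.out = nrow(W)))   # K = 8 folds
    cv.glmnet(W, y, alpha = a, foldid = folds)$cvm    # mean CV error per lambda
  })
  rowMeans(reps)                                      # average CV curve over repetitions
})
```

Because glmnet computes the same $\lambda$ path for each call with fixed data and $\alpha$, the CV curves align across repetitions and can be averaged directly.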

A.2. Permutation test simulation study. To test the power of the permutation procedure for model (A1), we again generated continuous responses with specified $R^2$ values, in the sense that $\sigma^2$ was chosen to satisfy (A2). For the linear model with a scalar covariate
\[
y_i = t_i\delta_1 + x_i^T\beta + \varepsilon_i, \tag{A7}
\]
we fixed the coefficient of determination $R_t^2 = 0.2$ for the scalar predictor, and set the partial coefficient of determination $R_{x|t}^2 = 0.02, 0.04, \ldots, 0.16$ for the image predictors (see Anderson-Sprecher, 1994), by choosing $\delta_1$, $\sigma^2$ in (A7) to satisfy the pair of equations
\[
1 - \frac{s^2_{x\beta}+\sigma^2}{s^2_{t\delta_1+x\beta}+\sigma^2} = R_t^2 = 0.2, \qquad 1 - \frac{\sigma^2}{s^2_{x\beta}+\sigma^2} = R_{x|t}^2, \tag{A8}
\]
where $s^2_{t\delta_1+x\beta}$ is the sample variance of the $t_i\delta_1 + x_i^T\beta$ ($i = 1, \ldots, n$). The second equation in (A8) can be solved directly for $\sigma^2$. Substituting this value into the first equation, and noting that $s^2_{t\delta_1+x\beta} = \delta_1^2 s_t^2 + 2\delta_1 s_{t,x\beta} + s^2_{x\beta}$ (where $s_t^2$ is the sample variance of the $t_i$'s and $s_{t,x\beta}$ is the sample covariance of the $t_i$'s and the $x_i^T\beta$'s), yields a quadratic equation that can be solved for $\delta_1$.
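A minimal sketch of this two-step solution of (A8) (our illustration; `solve_A8` and its inputs `t` and `xb` $= x_i^T\beta$ are hypothetical names, not the authors' code):

```r
# Sketch: choose sigma^2 and delta_1 to satisfy the pair of equations in (A8).
# Assumes t (vector of scalar covariates) and xb (vector of x_i^T beta values).
solve_A8 <- function(t, xb, R2_t = 0.2, R2_xt) {
  s2_xb  <- var(xb)
  sigma2 <- s2_xb * (1 - R2_xt) / R2_xt            # second equation, solved directly
  # Substituting into the first equation gives a quadratic in delta_1:
  #   s_t^2 delta^2 + 2 s_{t,xb} delta + (s2_xb + sigma2)(1 - 1/(1 - R2_t)) = 0
  a  <- var(t)
  b  <- 2 * cov(t, xb)
  cc <- (s2_xb + sigma2) * (1 - 1 / (1 - R2_t))    # negative, so a real root exists
  delta1 <- (-b + sqrt(b^2 - 4 * a * cc)) / (2 * a)  # take the positive root
  list(delta1 = delta1, sigma2 = sigma2)
}
```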

To evaluate the permutation test for logistic regression, we used the same set of $R^2$ values for the case without scalar covariates, and the same $R_t^2$ and $R_{x|t}^2$ values for the case with scalar covariates. As in §A.1, this required defining $R_t^2$ and $R_{x|t}^2$ (partial $R^2$) for simulating from a logistic regression model, in this case
\[
\log\frac{p_i}{1-p_i} = \delta_0 + t_i\delta_1 + x_i^T\beta. \tag{A9}
\]
With notation analogous to (A5), we define
\[
R_t^2 = 1 - \frac{E\log(L_{\delta_0,\delta_1})}{E\log(L_{\delta_0})}
= 1 - \frac{\displaystyle\sum_{i=1}^n\left[\frac{\log\{1+\exp(-\delta_0-t_i\delta_1)\}}{1+\exp(-\delta_0-t_i\delta_1-x_i^T\beta)} + \frac{\log\{1+\exp(\delta_0+t_i\delta_1)\}}{1+\exp(\delta_0+t_i\delta_1+x_i^T\beta)}\right]}{\displaystyle\sum_{i=1}^n\left[\frac{\log\{1+\exp(-\delta_0)\}}{1+\exp(-\delta_0-t_i\delta_1-x_i^T\beta)} + \frac{\log\{1+\exp(\delta_0)\}}{1+\exp(\delta_0+t_i\delta_1+x_i^T\beta)}\right]}
\]
and
\[
R_{x|t}^2 = 1 - \frac{E\log(L_{\delta_0,\delta_1,\beta})}{E\log(L_{\delta_0,\delta_1})}
= 1 - \frac{\displaystyle\sum_{i=1}^n\left[\frac{\log\{1+\exp(-\delta_0-t_i\delta_1-x_i^T\beta)\}}{1+\exp(-\delta_0-t_i\delta_1-x_i^T\beta)} + \frac{\log\{1+\exp(\delta_0+t_i\delta_1+x_i^T\beta)\}}{1+\exp(\delta_0+t_i\delta_1+x_i^T\beta)}\right]}{\displaystyle\sum_{i=1}^n\left[\frac{\log\{1+\exp(-\delta_0-t_i\delta_1)\}}{1+\exp(-\delta_0-t_i\delta_1-x_i^T\beta)} + \frac{\log\{1+\exp(\delta_0+t_i\delta_1)\}}{1+\exp(\delta_0+t_i\delta_1+x_i^T\beta)}\right]}. \tag{A10}
\]
Given $(t_i, x_i)$ ($i = 1, \ldots, n$), $\delta_0$, and $\beta_0$ such that $\beta = s\beta_0$ for some $s$, attaining specified values of $R_t^2$ and $R_{x|t}^2$ reduces to solving the above two equations for $\delta_1$ and $s$. Assuming the $x_i$'s have mean zero, we can simplify the problem via the approximation
\[
R_t^2 \approx 1 - \frac{\displaystyle\sum_{i=1}^n\left[\frac{\log\{1+\exp(-\delta_0-t_i\delta_1)\}}{1+\exp(-\delta_0-t_i\delta_1)} + \frac{\log\{1+\exp(\delta_0+t_i\delta_1)\}}{1+\exp(\delta_0+t_i\delta_1)}\right]}{\displaystyle\sum_{i=1}^n\left[\frac{\log\{1+\exp(-\delta_0)\}}{1+\exp(-\delta_0-t_i\delta_1)} + \frac{\log\{1+\exp(\delta_0)\}}{1+\exp(\delta_0+t_i\delta_1)}\right]},
\]
which no longer involves $\beta$. We treat this as an equality and solve it for $\delta_1$, then insert the result into (A10) and solve for $s$.

APPENDIX B: PERMUTATION OF RESIDUALS

The original permutation of regressor residuals (PRR) procedure of Potter (2005) differs somewhat from what we propose in §5.1 of the main text. The PRR procedure (adapted slightly to the image-predictor context) uses the design matrix
\[
[T \;\; \Pi(I-P_T)X] \tag{A11}
\]
rather than $[T \;\; P_T X + \Pi(I-P_T)X]$ as in (5); in other words, it simply replaces the $X$ portion of the design matrix with the permuted residuals, instead of adding the permuted residuals back to $P_T X$. For the unpenalized model considered by Potter (2005) (see also Section 2.4.3 of Ridgway, 2009), the simpler design matrix (A11) is equivalent to (5). But for penalized models such as the wavelet-domain elastic net, the two design matrices tend to produce slightly different results. We therefore prefer the permuted-data design matrix (5), which preserves the original data's dependence between the scalar and image predictors. In a different neuroimaging setting, Winkler et al. (2014) show that PRR (which they refer to as the "Smith procedure") compares favorably with other permutation test procedures for linear models with nuisance predictors.
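To make the contrast concrete, here is a minimal sketch of the two design matrices (our illustration, not the authors' code; `Tmat` and `X` are assumed inputs):

```r
# Sketch: Potter's PRR design matrix (A11) vs. the permuted-data matrix we prefer.
# Assumes Tmat (n x q matrix of scalar covariates) and X (n x p image matrix).
n   <- nrow(Tmat)
P_T <- Tmat %*% solve(crossprod(Tmat), t(Tmat))   # projection onto col space of Tmat
R   <- (diag(n) - P_T) %*% X                      # (I - P_T) X: residualized images
Pi  <- sample(n)                                  # a random permutation of 1, ..., n
design_prr      <- cbind(Tmat, R[Pi, ])               # [T, Pi (I - P_T) X], as in (A11)
design_permdata <- cbind(Tmat, P_T %*% X + R[Pi, ])   # [T, P_T X + Pi (I - P_T) X]
```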

APPENDIX C: LINEAR REGRESSION POWER SIMULATION RESULTS

Here we report linear regression simulation results for the permutation test procedure (see §5.1 for logistic regression results). We first considered the case without scalar covariates, and generated responses
\[
y_i = x_i^T\beta + \varepsilon_i \quad \text{with } \varepsilon_i \sim N(0, \sigma^2), \quad i = 1, \ldots, n = 333, \tag{A12}
\]
where $x_i \in \mathbb{R}^{64^2}$ is the $i$th image (expressed as a vector), $\beta$ is the true coefficient image shown in Figure A1(a) (similarly vectorized), and $\sigma^2$ is chosen to attain approximate $R^2$ values as in Supplementary Appendix A. We simulated 200 response vectors to assess the power to reject $H_0: \beta = 0$ at the .05 level for each of the $R^2$ values 0.04, 0.07, 0.1, 0.15, 0.2, 0.25, 0.3, as well as 1000 response vectors with $\beta = 0$ ($R^2 = 0$) to assess the type-I error rate. Next we considered testing the same null hypothesis for the model
\[
y_i = t_i\delta_1 + x_i^T\beta + \varepsilon_i \quad \text{with } \varepsilon_i \sim N(0, \sigma^2), \tag{A13}
\]
with a scalar covariate $t_i$ such that $R^2$ for the submodel $E(y_i \mid t_i) = t_i\delta_1$ is approximately 0.2. We generated the same number of response vectors as above for each of the above $R^2$ values, but here $R^2$ refers to the partial $R^2$ adjusting for $t_i$ (see Supplementary Appendix A.2).

Fig A1. (a) True coefficient image β used in the power study: gray denotes 0, black denotes 1. (b) Estimated probability of rejecting the null hypothesis β = 0 as a function of R², with 95% confidence intervals, for model (A12). (c) Same, for model (A13).

The results, displayed in Figure A1(b) and (c), indicate that the nominal type-I error rate is approximately attained for both models, and that the power exceeds 90% when $R^2$ is at least 0.15 for either model (A12) or model (A13).

APPENDIX D: SELECTING A SUBSAMPLE OF THE ADHD-200 DATA SET

Of the 776 individuals in the ADHD-200 training sample, we considered only the 450 who were right-handed and were either typically developing controls (340) or diagnosed with combined-type ADHD (110), the subtype expected to be most readily distinguishable from controls. Head motion artifacts have recently emerged as a major concern in the resting-state fMRI literature (e.g., Van Dijk, Sabuncu and Buckner, 2012). Since there is as yet no consensus on how to address this issue, we chose to sacrifice a considerable amount of data in order to minimize the risk of spurious findings due to motion artifacts. We excluded those subjects whose mean framewise displacement (FD) (Power et al., 2012), a motion score, exceeded 0.25. We then matched the control and ADHD groups on mean FD by dividing the sample into mean FD deciles, and then randomly subsampling either controls or ADHD individuals within each decile to attain roughly equal control-to-ADHD ratios for each decile. This reduced the number of subjects to 333 (257 controls and 76 with combined-type ADHD; 198 males, 135 females; age range 7.7–20.45).

The fALFF data were processed and made available by the Neuro Bureau via the NITRC repository; the data and full details of the image processing steps are available at http://www.nitrc.org/plugins/mwiki/index.php/neurobureau:athenapipeline. Nonzero fALFF values were recorded only for voxels within the brain, but due to inter-subject differences in scan volume coverage, the set of brain voxels varied somewhat among subjects. Our analysis included the 92 voxels located within the brain for all 333 subjects.

APPENDIX E: FROM 2D TO 3D PREDICTORS

We fitted model (8) by the wavelet-domain elastic net, using the same set of 333 individuals as in §6, but with 3D maps (a 32 × 32 × 32 set of voxels from the fALFF maps) rather than with 2D slices. Here, as in §4, we retained sufficiently many wavelet coefficients to capture 99.5% of the excess variance. We were particularly interested in whether the relative performance of lower versus higher values of α (less sparse versus more sparse fits) differed when 3D rather than 2D images were used. Figure A2 shows the observed CV deviance when we used 1, 16, or all 32 of the 32 × 32 axial slices. For 1 slice, the lowest CV score is attained with α = 1, i.e., the lasso. But for 16 or 32 slices, less sparse models, in particular α = 0.1, are favored. This suggests that as the number of voxels grows, choosing a sparse coefficient image incurs a higher cost in terms of predictive accuracy.

Fig A2. CV deviance for the wavelet-domain elastic net fitted to our subsample of the ADHD-200 data set using a 32 × 32 × 32 set of voxels from the fALFF images. (Panels show 1 slice, 16 slices, and 32 slices; curves for α = 0.1, 0.4, 0.7, 1 are plotted against λ.)
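As an illustration of this 3D pipeline, a sketch under our assumptions (not the authors' code): it uses the waveslim package's `dwt.3d` for the 3D wavelet transform (the `la8` filter and level `J = 4` are assumptions), and interprets the retention rule as keeping the highest-variance coefficients until 99.5% of the total coefficient variance is captured:

```r
library(waveslim)

# Sketch: 3D wavelet coefficients for each subject, then retain the
# highest-variance coefficients capturing 99.5% of the total variance.
# Assumes imgs is a list of 32 x 32 x 32 arrays, one per subject.
W <- t(sapply(imgs, function(a)
  unlist(dwt.3d(a, wf = "la8", J = 4))))   # one row of coefficients per subject
v <- apply(W, 2, var)                      # variance of each wavelet coefficient
ord <- order(v, decreasing = TRUE)
c_keep <- which(cumsum(v[ord]) / sum(v) >= 0.995)[1]  # smallest c reaching 99.5%
W_retained <- W[, ord[seq_len(c_keep)]]    # input to the elastic net fit
```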

REFERENCES

Anderson-Sprecher, R. (1994). Model comparisons and R². The American Statistician 48 113–117.

Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33 1–22.

Menard, S. (2000). Coefficients of determination for multiple logistic regression analysis. The American Statistician 54 17–24.

Potter, D. M. (2005). A permutation test for inference in logistic regression with small- and moderate-sized data sets. Statistics in Medicine 24 693–708.

Power, J. D., Barnes, K. A., Snyder, A. Z., Schlaggar, B. L. and Petersen, S. E. (2012). Spurious but systematic correlations in functional connectivity MRI networks arise from subject motion. NeuroImage 59 2142–2154.

Reiss, P. T. and Ogden, R. T. (2009). Smoothing parameter selection for a class of semiparametric linear models. Journal of the Royal Statistical Society: Series B 71 505–523.

Ridgway, G. R. (2009). Statistical analysis for longitudinal MR imaging of dementia. PhD thesis, University College London.

Tibshirani, R. and Knight, K. (1999). The covariance inflation criterion for adaptive model selection. Journal of the Royal Statistical Society: Series B 61 529–546.

Van Dijk, K. R. A., Sabuncu, M. R. and Buckner, R. L. (2012). The influence of head motion on intrinsic functional connectivity MRI. NeuroImage 59 431–438.

Winkler, A. M., Ridgway, G. R., Webster, M. A., Smith, S. M. and Nichols, T. E. (2014). Permutation inference for the general linear model. NeuroImage 92 381–397.

Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. Journal of the Royal Statistical Society: Series B 73 3–36.

Department of Child and Adolescent Psychiatry
New York University School of Medicine
1 Park Ave., 7th floor
New York, NY 10016
E-mail: phil.reiss@nyumc.org
lan.huo@nyumc.org
yihong.zhao@nyumc.org
amclarekelly@gmail.com

Department of Biostatistics
Columbia University
722 W. 168th St., 6th floor
New York, NY 10032
E-mail: to166@columbia.edu