Semiparametric Methods for Mapping Brain Development
University of Haifa, From the SelectedWorks of Philip T. Reiss, May 2012
Semiparametric Methods for Mapping Brain Development
Philip T. Reiss, Yin-Hsiu Chen, Lan Huo
Semiparametric methods for mapping brain development
Philip T. Reiss, New York University and Nathan Kline Institute (phil.reiss@nyumc.org)
Thomas R. Ten Have Symposium on Statistics in Psychiatry, University of Pennsylvania, May 23, 2012
Research supported in part by grant DMS , US National Science Foundation
Acknowledgments
This is joint work with Yin-Hsiu Chen and Lan Huo (NYU), building on earlier joint work with Lei Huang (NYU, now at Johns Hopkins), Thad Tarpey (Wright State), and Maarten Mennes (NYU, now at Radboud University Nijmegen). Aaron Alexander-Bloch, Armin Raznahan, and Jay Giedd (NIMH) supplied the data; and they, along with Eva Petkova and Xavier Castellanos (NYU), provided valuable feedback.
Cortical thickness
The cerebral cortex, a convoluted sheet of neurons, is the grey matter area of the brain responsible for information processing. Cortical thickness (CT), which varies from about mm, has been linked to cognitive function. Maps of CT can be derived from magnetic resonance images by measuring the distance from the pial surface to the grey/white matter boundary at many points (vertices).
An example cortical thickness map
NIMH cortical thickness data set
The Brain Imaging Unit at the NIMH Child Psychiatry Branch, headed by Dr. Jay Giedd, has been conducting longitudinal MRI studies for about two decades. They have kindly provided us with 1181 CT maps from 615 typically developing controls representing 398 families.
[Table: number of scans per individual vs. frequency]
Age at time of scan ranges from . Each map consists of CT values at vertices.
Neurodevelopmental trajectories
Overarching scientific question: characterizing the course of development of CT throughout the brain
- in normal development (implications for how we view "the teen brain")
- in neurological or psychiatric disorders
Key finding: CT has an inverted U-shaped trajectory, attributed to pruning of neurons. Maturation may be related to attainment of peak thickness ("delay theory" of ADHD).
[Figure: schematic CT-vs.-age trajectories for control and ADHD groups]
Standard way to map neurodevelopmental trajectories: polynomial functions of age (e.g., Zuo et al., 2010)
More formally: in a cross-sectional study of n individuals, for i = 1, ..., n, we have collected age x_i; CT values y_{i1}, ..., y_{iV}; and other demographic/clinical covariates (but we'll omit these for simplicity).
Standard approach for developmental neuroimaging data: for v = 1, ..., V, use stepwise testing or AIC to choose among
M_{0v}: E(y_{iv}) = \beta_{0v}
M_{1v}: E(y_{iv}) = \beta_{0v} + \beta_{1v} x_i
M_{2v}: E(y_{iv}) = \beta_{0v} + \beta_{1v} x_i + \beta_{2v} x_i^2
M_{3v}: E(y_{iv}) = \beta_{0v} + \beta_{1v} x_i + \beta_{2v} x_i^2 + \beta_{3v} x_i^3
Similarly, for a longitudinal study, we choose among M^u_{0v}, M^u_{1v}, M^u_{2v}, M^u_{3v}, where M^u_{1v}: E(y_{iv} | u_{iv}) = \beta_{0v} + \beta_{1v} x_i + u_{iv}, etc.
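The model-selection step above can be sketched in a few lines. This is a hedged illustration, not the authors' code: it simulates one vertex's cross-sectional data (all names and values below are hypothetical) and chooses among M_0 through M_3 by AIC.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
age = rng.uniform(5, 25, n)                                      # hypothetical ages
ct = 4.0 + 0.12 * age - 0.004 * age**2 + rng.normal(0, 0.2, n)   # inverted-U CT

def gaussian_aic(y, X):
    """AIC for ordinary least squares with Gaussian errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    k = X.shape[1] + 1                       # regression coefficients + error variance
    return len(y) * np.log(rss / len(y)) + 2 * k

# Candidate models M0..M3: polynomials in age of degree 0..3
aics = [gaussian_aic(ct, np.vander(age, d + 1, increasing=True)) for d in range(4)]
best_degree = int(np.argmin(aics))
```

With a quadratic truth, the quadratic and cubic models dominate the constant and linear ones; AIC's known tendency to overfit means degree 3 is occasionally selected.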
Some advantages of polynomial models:
- familiar
- relatively straightforward
- computationally fast
- age of peak CT falls out of the quadratic model: \beta_0 + \beta_1 x_i + \beta_2 x_i^2 peaks at x = -\beta_1 / (2\beta_2)
Some disadvantages:
- polynomials may not describe the trajectory well (e.g., a plateau)
- estimated peak age is highly sensitive to the sampled age range (Fjell et al., 2010)
Spline smoothing can remove both of these limitations.
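The quadratic peak-age formula x = -\beta_1/(2\beta_2) is easy to verify numerically. A toy sketch (the coefficient values are made up for illustration) recovers a known peak both from the true coefficients and from coefficients fitted to points on the curve:

```python
import numpy as np

# Quadratic mean curve beta0 + beta1*x + beta2*x^2 with a known peak
beta0, beta1, beta2 = 2.5, 0.6, -0.03        # peak at -0.6 / (2 * -0.03) = 10
peak_age = -beta1 / (2 * beta2)

# Same formula applied to coefficients estimated from noiseless curve points
x = np.linspace(4, 22, 50)
y = beta0 + beta1 * x + beta2 * x**2
b = np.polyfit(x, y, 2)                      # returns [beta2, beta1, beta0]
est_peak = -b[1] / (2 * b[0])
```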
So... why don't we just model vertex-wise developmental trajectories by penalized B-splines instead of polynomials? Three issues:
1. Conceptual complexity
2. Computational expense
3. Not clear how to map the results
Goals for the rest of the talk
- Consider how we can address these 3 issues
- Present preliminary applications of semiparametric regression to inverted-U-shaped developmental trajectories of cortical thickness
Issue 1: Conceptual complexity
To mitigate this problem, we adopt a minimalist approach to semiparametric regression (Ruppert, Wand and Carroll, 2003). A key tool is the mixed-model formulation of penalized splines, which offers a unifying, simplifying framework for modeling and inference.
Penalized B-splines
Given data (x_1, y_1), ..., (x_n, y_n) (univariate responses), we fit the model
y_i = g(x_i) + \epsilon_i, E(\epsilon_i) = 0
as follows.
1. Assume that g(x) = \theta^T b(x), where b(\cdot) = [b_1(\cdot), ..., b_K(\cdot)]^T and the b_j's are B-spline basis functions.
2. Estimate \theta by penalized least squares:
\hat\theta = \arg\min_{\theta \in R^K} \|y - B\theta\|^2 + \lambda \theta^T P \theta,
where the first term is the sum of squared errors and the second is a roughness functional; here y = (y_1, ..., y_n)^T, B is the n x K matrix with (i, j) entry b_j(x_i), \lambda is a tuning parameter, and P is a K x K penalty matrix.
The roughness penalty \lambda \theta^T P \theta in the criterion \|y - B\theta\|^2 + \lambda \theta^T P \theta prevents overfitting by shrinking \hat g(\cdot) = \hat\theta^T b(\cdot) toward \{\theta^T b(\cdot) : \theta^T P \theta = 0\} = \{\theta^T b(\cdot) : \theta \in null(P)\}; the smoothing parameter \lambda \ge 0 controls the degree of shrinkage.
Examples:
- P = (\int b_i'' b_j'')_{1 \le i,j \le K}, so \theta^T P \theta = \int g''(x)^2 dx: shrink toward linear g
- P = (\int b_i' b_j')_{1 \le i,j \le K}, so \theta^T P \theta = \int g'(x)^2 dx: shrink toward constant g
[Figure: example fits for small vs. large \lambda under each penalty]
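A minimal numerical sketch of this shrinkage behavior (not the talk's actual implementation): for simplicity it uses a linear truncated power basis rather than B-splines, with a penalty matrix P whose null space is the linear functions, so a large \lambda shrinks the fit toward a straight line, as in the first example above.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K_knots = 300, 20
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)

# Basis columns: [1, x, (x - kappa_1)_+, ..., (x - kappa_K)_+]
knots = np.linspace(0, 1, K_knots + 2)[1:-1]
B = np.column_stack([np.ones(n), x, np.clip(x[:, None] - knots[None, :], 0, None)])

# Penalty on knot coefficients only, so theta' P theta = 0 exactly for linear fits
P = np.diag([0.0, 0.0] + [1.0] * K_knots)

def pspline_fit(lam):
    """Penalized least squares: argmin ||y - B theta||^2 + lam * theta' P theta."""
    theta = np.linalg.solve(B.T @ B + lam * P, B.T @ y)
    return B @ theta

fit_small = pspline_fit(1e-6)   # nearly unpenalized: wiggly, close to the data
fit_large = pspline_fit(1e8)    # heavily penalized: essentially a straight line
```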
Connection with linear mixed models
With a change of basis B -> (X Z) where col(X) = null(P) (e.g., Wand and Ormerod, 2008), the criterion \|y - B\theta\|^2 + \lambda \theta^T P \theta becomes \|y - X\beta - Zu\|^2 + \lambda u^T u, which is, up to constants, proportional to -log[f(y | u) f(u)] for the linear mixed model y = X\beta + Zu + \epsilon with u ~ N[0, (\sigma^2/\lambda) I], \epsilon ~ N(0, \sigma^2 I) (cf. Speed, 1991). We can thus reduce nonparametric inference to mixed-model inference, with the smoothing parameter \lambda recast as a variance parameter (e.g., Ruppert, Wand and Carroll, 2003; Wood, 2011).
More specifically, the correspondence between the penalized spline problem
\hat\theta = \arg\min_{\theta \in R^K} \|y - B\theta\|^2 + \lambda \theta^T P \theta   (1)
and the linear mixed model
y = X\beta + Zu + \epsilon with u ~ N[0, (\sigma^2/\lambda) I], \epsilon ~ N(0, \sigma^2 I)   (2)
motivates basing inference on the modified profile likelihood, a.k.a. the restricted maximum likelihood (REML) criterion
l_R(\lambda | y) = -(1/2)(n - p) \log[y^T \{V_\lambda^{-1} - V_\lambda^{-1} X (X^T V_\lambda^{-1} X)^{-1} X^T V_\lambda^{-1}\} y] - (1/2) \log|V_\lambda| - (1/2) \log|X^T V_\lambda^{-1} X|
(with V_\lambda = I_n + \lambda^{-1} Z Z^T):
1. Testing H_0: \theta \in null(P) (e.g., a linear fit vs. a nonparametric alternative) reduces to testing \lambda = \infty (zero random-effect variance); we can use the Crainiceanu and Ruppert (2004) restricted likelihood ratio test statistic RLRT(y) = \sup_{\lambda \ge 0} 2 l_R(\lambda | y) - 2 l_R(\infty | y).
2. Do optimal smoothing by taking \hat\lambda = \arg\max_{\lambda \ge 0} l_R(\lambda | y) in (1).
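The REML criterion can be profiled directly on a grid. The sketch below uses synthetic data and my own grid choices (a real analysis would also compare the RLRT statistic to its exact null distribution, which this sketch omits); it parametrizes by 1/\lambda, so that 1/\lambda = 0 corresponds to the parametric null \lambda = \infty.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 150
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)

knots = np.linspace(0, 1, 17)[1:-1]
X = np.column_stack([np.ones(n), x])                   # unpenalized (null) part
Z = np.clip(x[:, None] - knots[None, :], 0, None)      # penalized part

def reml(lam_inv, y):
    """REML criterion l_R for V = I + lam_inv * Z Z'; lam_inv = 0 is the null fit."""
    V = np.eye(n) + lam_inv * Z @ Z.T
    Vi = np.linalg.inv(V)
    XtViX = X.T @ Vi @ X
    M = Vi - Vi @ X @ np.linalg.inv(XtViX) @ X.T @ Vi
    p = X.shape[1]
    return (-0.5 * (n - p) * np.log(y @ M @ y)
            - 0.5 * np.linalg.slogdet(V)[1]
            - 0.5 * np.linalg.slogdet(XtViX)[1])

grid = np.concatenate([[0.0], 10.0 ** np.arange(-4, 5)])   # grid over 1/lambda
profile = np.array([reml(g, y) for g in grid])
rlrt = 2 * (profile.max() - profile[0])    # grid point 0 plays the role of lambda = infinity
```

Because the data are strongly nonlinear, the RLRT statistic here is large; for a linear truth it would concentrate near zero (with mass exactly at zero).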
Upshot regarding issue 1 (conceptual complexity)
Linear mixed models are already in use by neuroimagers (for studies with longitudinal and/or family data). With penalized splines, estimation and testing for semiparametric regression can be reduced to the corresponding questions for mixed models.
Issue 2: Computational expense
For v = 1, ..., V with V ≈ 80000, the penalized spline problem
\hat\theta_v = \arg\min_{\theta \in R^K} \|y_v - B\theta\|^2 + \lambda \theta^T P \theta
gives rise to the REML criterion l_R(\lambda | y_v) and hence to
1. a problem of testing the parametric null H_{0v}: \theta \in null(P) via RLRT(y_v) = \sup_{\lambda \ge 0} 2 l_R(\lambda | y_v) - 2 l_R(\infty | y_v);
2. an optimal smoothing problem of finding \hat\lambda_v = \arg\max_{\lambda \ge 0} l_R(\lambda | y_v).
Question: how can we solve each of these problems efficiently V times?
Massively parallel RLRT: naïve approach
To approximate RLRT(y_v) = \sup_{\lambda \ge 0} 2 l_R(\lambda | y_v) - 2 l_R(\infty | y_v) for v = 1, ..., V:
1. choose a grid \lambda^{(1)} < ... < \lambda^{(G)};
2. find the maximum of each column of the G x V matrix [l_R(\lambda^{(g)} | y_v)]_{1 \le g \le G, 1 \le v \le V}.
Massively parallel RLRT: proposed fast approach
Computational shortcut: that big matrix [l_R(\lambda^{(g)} | y_v)]_{1 \le g \le G, 1 \le v \le V} can be shown to equal the G x V matrix with (g, v) entry y_v^T M_{\lambda^{(g)}} y_v, whose gth row is 1^T [Y \circ (M_{\lambda^{(g)}} Y)], where Y = (y_1 ... y_V), \circ denotes the elementwise product, and M_\lambda = V_\lambda^{-1} - V_\lambda^{-1} X (X^T V_\lambda^{-1} X)^{-1} X^T V_\lambda^{-1}.
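The shortcut, stacking all vertices into a matrix Y and computing each grid row at once as 1^T[Y \circ (M_\lambda Y)], can be checked numerically. A small sketch with synthetic dimensions (all sizes are hypothetical, far smaller than V ≈ 80000):

```python
import numpy as np

rng = np.random.default_rng(3)
n, V_pts, G = 80, 500, 6                 # scans, vertices, grid points (toy sizes)
x = np.sort(rng.uniform(0, 1, n))
Y = rng.normal(size=(n, V_pts))          # column v is the response vector y_v

knots = np.linspace(0, 1, 12)[1:-1]
X = np.column_stack([np.ones(n), x])
Z = np.clip(x[:, None] - knots[None, :], 0, None)

def M_matrix(lam):
    """M_lambda = V^-1 - V^-1 X (X' V^-1 X)^-1 X' V^-1, with V = I + (1/lam) Z Z'."""
    Vi = np.linalg.inv(np.eye(n) + (1.0 / lam) * Z @ Z.T)
    ViX = Vi @ X
    return Vi - ViX @ np.linalg.solve(X.T @ ViX, ViX.T)

grid = 10.0 ** np.linspace(-2, 3, G)

# Fast: all V quadratic forms y_v' M y_v at once, via columnwise sums of Y * (M Y)
fast = np.vstack([np.sum(Y * (M_matrix(lam) @ Y), axis=0) for lam in grid])

# Naive: explicit double loop over grid points and vertices, for comparison
naive = np.array([[Y[:, v] @ M_matrix(lam) @ Y[:, v] for v in range(V_pts)]
                  for lam in grid])
```

The fast version computes M_\lambda once per grid point and touches each vertex only through one matrix product, which is what makes the grid search feasible at whole-brain scale.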
Massively parallel smoothing
At the vth point (vertex), the smooth function g_v(x) = \theta_v^T b(x) is estimated via
\hat\theta_v = \arg\min_{\theta \in R^K} \|y_v - B\theta\|^2 + \lambda_v \theta^T P \theta,
where B = [b_j(x_i)]_{1 \le i \le n, 1 \le j \le K}. Two tasks must be performed for v = 1, ..., V with huge V:
1. Find the optimal \lambda_v, i.e., \hat\lambda_v = \arg\max_{\lambda \ge 0} l_R(\lambda | y_v). Use the same trick as for RLRT.
2. Given the optimal \lambda_v, compute \hat\theta_v = (B^T B + \hat\lambda_v P)^{-1} B^T y_v. The Demmler-Reinsch algorithm lets us do this in one big matrix multiplication.
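Task 2, computing \hat\theta_v = (B^T B + \lambda_v P)^{-1} B^T y_v with a different \lambda_v at each vertex, can be done for all vertices together via a Demmler-Reinsch-type diagonalization: after simultaneously diagonalizing B^T B and P, each \lambda_v acts by elementwise shrinkage. A sketch under simplifying assumptions (truncated power basis instead of B-splines; synthetic data and \lambda_v values):

```python
import numpy as np

rng = np.random.default_rng(4)
n, V_pts = 200, 400
x = np.sort(rng.uniform(0, 1, n))
Y = rng.normal(size=(n, V_pts))

knots = np.linspace(0, 1, 10)[1:-1]
B = np.column_stack([np.ones(n), x, np.clip(x[:, None] - knots[None, :], 0, None)])
P = np.diag([0.0, 0.0] + [1.0] * len(knots))
lams = 10.0 ** rng.uniform(-2, 4, V_pts)       # a different lambda_v per vertex

# Demmler-Reinsch step: B'B = R'R (Cholesky), then eigendecompose R^-T P R^-1
R = np.linalg.cholesky(B.T @ B).T              # upper-triangular factor
Rinv = np.linalg.inv(R)
s, U = np.linalg.eigh(Rinv.T @ P @ Rinv)
s = np.clip(s, 0.0, None)                      # guard against tiny negative eigenvalues

A = U.T @ Rinv.T @ (B.T @ Y)                   # one big K x V multiplication
Theta = Rinv @ U @ (A / (1.0 + s[:, None] * lams[None, :]))   # elementwise shrinkage

# Spot-check one column against the direct per-vertex solve
v = 7
direct = np.linalg.solve(B.T @ B + lams[v] * P, B.T @ Y[:, v])
```

The point of the diagonalization is that varying \lambda_v across vertices costs only an elementwise divide, rather than one linear solve per vertex.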
Approximate timing comparisons (in minutes) for a real-data example
[Table: naïve vs. proposed computation times for RLRT and for smoothing; values not preserved in transcription]
RLRT for null hypothesis of linear development
Issue 3: Mapping the estimates
Massively parallel smoothing yields an estimate \hat g_v(\cdot) of the mean developmental trajectory for each vertex v (v = 1, ..., V). How can we display a succinct summary of all these function estimates? Solution: treat them as functional data, and apply functional data clustering methodology (e.g., Tarpey and Kinateder, 2003).
Functional data clustering algorithm
1. We are interested in the shape, rather than the height, of the trajectories, so we cluster the estimated first derivatives: from \hat g_v(\cdot) = \hat\theta_v^T b(\cdot), form \hat g_v'(\cdot) = \hat\theta_v^T b'(\cdot) for v = 1, ..., V.
2. A truncated Karhunen-Loève expansion \hat g_v'(\cdot) \approx \sum_{m=1}^M c_{vm} \phi_m(\cdot) reduces the vth function to its functional principal component scores c_{v1}, ..., c_{vM} (e.g., Silverman, 1996; Ramsay and Silverman, 2005).
3. Apply k-means clustering to the c_{vm}'s.
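Steps 2 and 3 can be sketched with plain numpy. The synthetic curves below stand in for the estimated derivatives \hat g_v' (a real analysis would compute scores from the spline coefficients), and the k-means routine is a bare-bones Lloyd's algorithm with a deterministic farthest-point initialization:

```python
import numpy as np

rng = np.random.default_rng(5)
grid = np.linspace(0, 1, 50)

# Two hypothetical groups of derivative curves with opposite shapes
curves = np.vstack([
    np.cos(np.pi * grid) + 0.1 * rng.normal(size=(60, 50)),
    -np.cos(np.pi * grid) + 0.1 * rng.normal(size=(40, 50)),
])

# Truncated Karhunen-Loeve: center, then keep the top M principal component scores
M = 2
centered = curves - curves.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
scores = centered @ Vt[:M].T                   # score vectors (c_v1, ..., c_vM)

def kmeans(Xs, k=2, iters=50):
    """Lloyd's algorithm; initialize with point 0 and the point farthest from it."""
    centers = Xs[[0, int(np.argmax(((Xs - Xs[0]) ** 2).sum(-1)))]]
    for _ in range(iters):
        labels = np.argmin(((Xs[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.vstack([Xs[labels == j].mean(axis=0) for j in range(k)])
    return labels

labels = kmeans(scores)
```

With well-separated shapes, the two planted groups are recovered exactly; in practice one would inspect solutions for several k, as in the cluster maps that follow.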
2-cluster solution
3-cluster solution
4-cluster solution
5-cluster solution
Age of peak CT: some preliminary work
The proposed fast methods work mainly for cross-sectional analyses. Whole-brain longitudinal analyses still require hours (even for polynomial regression). But... we can apply our methods to repeated cross-sectional subsamples from the NIMH data set to study properties of cross-sectional analyses (e.g., variability).
[Table: samples compared (full; unrelated longitudinal, implemented via a gamm4 loop; cross-sectional, implemented via the proposed cross-sectional methods), with numbers of families, individuals, and scans not preserved in transcription]
[Figure: LH vertex 4124, cortical thickness vs. age, raw and with random effects subtracted off]
Example: sex differences in age of peak CT
[Figure: repeated cross-sectional sex-specific (female and male) curves of cortical thickness vs. age for one vertex]
Age of peak thickness: median over cross-sectional samples
[Figure: male and female maps]
CDF of age of peak CT
[Figure: for one vertex, female and male fitted CT curves, and cumulative distributions of age of peak CT across cross-sectional samples]
[Figure: vertex 65953, fitted female and male cortical thickness curves under 2nd vs. 3rd derivative penalties]
[Figure: cumulative distributions of age of peak CT, female and male, under 2nd vs. 3rd derivative penalties]
Discussion
We have outlined three challenges for semiparametric mapping of neurodevelopmental trajectories, and progress toward addressing them. In the coming years, longitudinal imaging data will become much more common, but most data sets will still be cross-sectional. Much work is needed in many areas, e.g., massively parallel varying-coefficient models for differently shaped trajectories in different demographic or diagnostic groups.
Thank you!
Ben Franklin's 13 Virtues: 1. Temperance 2. Silence 3. Order 4. Resolution 5. Computational Frugality 6. Industry 7. Sincerity 8. Justice 9. Moderation 10. Cleanliness 11. Tranquility 12. Chastity 13. Humility
References
Crainiceanu, C. M., and Ruppert, D. (2004). Likelihood ratio tests in linear mixed models with one variance component. Journal of the Royal Statistical Society, Series B 66.
Fjell, A. M., Walhovd, K. M., Westlye, L. T., Østby, Y., Tamnes, C. K., Jernigan, T. L., Gamst, A., and Dale, A. M. (2010). When does brain aging accelerate? Dangers of quadratic fits in cross-sectional studies. NeuroImage 50.
Ramsay, J. O., and Silverman, B. W. (2005). Functional Data Analysis, 2nd ed. New York: Springer.
Reiss, P. T., Huang, L., Chen, Y.-H., Tarpey, T., and Mennes, M. (2011). Massively parallel nonparametrics. Submitted.
Ruppert, D., Wand, M. P., and Carroll, R. J. (2003). Semiparametric Regression. Cambridge and New York: Cambridge University Press.
Silverman, B. W. (1996). Smoothed functional principal components analysis by choice of norm. Annals of Statistics 24.
Speed, T. (1991). Discussion of "That BLUP is a good thing: the estimation of random effects" by G. K. Robinson. Statistical Science 6.
Tarpey, T., and Kinateder, K. K. J. (2003). Clustering functional data. Journal of Classification 20.
Wand, M. P., and Ormerod, J. T. (2008). On semiparametric regression with O'Sullivan penalized splines. Australian & New Zealand Journal of Statistics 50.
Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. Journal of the Royal Statistical Society, Series B 73.
Zuo, X.-N., Kelly, A. M. C., Di Martino, A., Mennes, M., Margulies, D., Bangaru, S., Grzadzinski, R., Evans, A., Zang, Y., Castellanos, F. X., and Milham, M. P. (2010). Growing together and growing apart: regional differences in the lifespan developmental trajectories of functional homotopy. Journal of Neuroscience 30.
More information4. Distributions of Functions of Random Variables
4. Distributions of Functions of Random Variables Setup: Consider as given the joint distribution of X 1,..., X n (i.e. consider as given f X1,...,X n and F X1,...,X n ) Consider k functions g 1 : R n
More informationSVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning
SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning Mark Schmidt University of British Columbia, May 2016 www.cs.ubc.ca/~schmidtm/svan16 Some images from this lecture are
More informationOptimization Problems
Optimization Problems The goal in an optimization problem is to find the point at which the minimum (or maximum) of a real, scalar function f occurs and, usually, to find the value of the function at that
More informationSingle Index Quantile Regression for Heteroscedastic Data
Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University JSM, 2015 E. Christou, M. G. Akritas (PSU) SIQR JSM, 2015
More informationModel Comparison. Course on Bayesian Inference, WTCN, UCL, February Model Comparison. Bayes rule for models. Linear Models. AIC and BIC.
Course on Bayesian Inference, WTCN, UCL, February 2013 A prior distribution over model space p(m) (or hypothesis space ) can be updated to a posterior distribution after observing data y. This is implemented
More informationEnsemble estimation and variable selection with semiparametric regression models
Ensemble estimation and variable selection with semiparametric regression models Sunyoung Shin Department of Mathematical Sciences University of Texas at Dallas Joint work with Jason Fine, Yufeng Liu,
More informationModel comparison and selection
BS2 Statistical Inference, Lectures 9 and 10, Hilary Term 2008 March 2, 2008 Hypothesis testing Consider two alternative models M 1 = {f (x; θ), θ Θ 1 } and M 2 = {f (x; θ), θ Θ 2 } for a sample (X = x)
More informationData Mining Stat 588
Data Mining Stat 588 Lecture 9: Basis Expansions Department of Statistics & Biostatistics Rutgers University Nov 01, 2011 Regression and Classification Linear Regression. E(Y X) = f(x) We want to learn
More informationEstimation for nonparametric mixture models
Estimation for nonparametric mixture models David Hunter Penn State University Research supported by NSF Grant SES 0518772 Joint work with Didier Chauveau (University of Orléans, France), Tatiana Benaglia
More informationGLAM An Introduction to Array Methods in Statistics
GLAM An Introduction to Array Methods in Statistics Iain Currie Heriot Watt University GLAM A Generalized Linear Array Model is a low-storage, high-speed, method for multidimensional smoothing, when data
More informationAnalysis Methods for Supersaturated Design: Some Comparisons
Journal of Data Science 1(2003), 249-260 Analysis Methods for Supersaturated Design: Some Comparisons Runze Li 1 and Dennis K. J. Lin 2 The Pennsylvania State University Abstract: Supersaturated designs
More information41903: Introduction to Nonparametrics
41903: Notes 5 Introduction Nonparametrics fundamentally about fitting flexible models: want model that is flexible enough to accommodate important patterns but not so flexible it overspecializes to specific
More informationCOS 424: Interacting with Data
COS 424: Interacting with Data Lecturer: Rob Schapire Lecture #14 Scribe: Zia Khan April 3, 2007 Recall from previous lecture that in regression we are trying to predict a real value given our data. Specically,
More informationMorphometrics with SPM12
Morphometrics with SPM12 John Ashburner Wellcome Trust Centre for Neuroimaging, 12 Queen Square, London, UK. What kind of differences are we looking for? Usually, we try to localise regions of difference.
More informationKneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach"
Kneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach" Sonderforschungsbereich 386, Paper 43 (25) Online unter: http://epub.ub.uni-muenchen.de/
More informationISyE 691 Data mining and analytics
ISyE 691 Data mining and analytics Regression Instructor: Prof. Kaibo Liu Department of Industrial and Systems Engineering UW-Madison Email: kliu8@wisc.edu Office: Room 3017 (Mechanical Engineering Building)
More informationTolerance Bands for Functional Data
Tolerance Bands for Functional Data Lasitha N. Rathnayake and Pankaj K. Choudhary 1 Department of Mathematical Sciences, FO 35 University of Texas at Dallas Richardson, TX 75080-3021, USA Abstract Often
More informationData Mining Stat 588
Data Mining Stat 588 Lecture 02: Linear Methods for Regression Department of Statistics & Biostatistics Rutgers University September 13 2011 Regression Problem Quantitative generic output variable Y. Generic
More informationGaussian processes for spatial modelling in environmental health: parameterizing for flexibility vs. computational efficiency
Gaussian processes for spatial modelling in environmental health: parameterizing for flexibility vs. computational efficiency Chris Paciorek March 11, 2005 Department of Biostatistics Harvard School of
More informationDirect Learning: Linear Regression. Donglin Zeng, Department of Biostatistics, University of North Carolina
Direct Learning: Linear Regression Parametric learning We consider the core function in the prediction rule to be a parametric function. The most commonly used function is a linear function: squared loss:
More information( t) Cox regression part 2. Outline: Recapitulation. Estimation of cumulative hazards and survival probabilites. Ørnulf Borgan
Outline: Cox regression part 2 Ørnulf Borgan Department of Mathematics University of Oslo Recapitulation Estimation of cumulative hazards and survival probabilites Assumptions for Cox regression and check
More informationLecture 1: Bayesian Framework Basics
Lecture 1: Bayesian Framework Basics Melih Kandemir melih.kandemir@iwr.uni-heidelberg.de April 21, 2014 What is this course about? Building Bayesian machine learning models Performing the inference of
More informationRecap. HW due Thursday by 5 pm Next HW coming on Thursday Logistic regression: Pr(G = k X) linear on the logit scale Linear discriminant analysis:
1 / 23 Recap HW due Thursday by 5 pm Next HW coming on Thursday Logistic regression: Pr(G = k X) linear on the logit scale Linear discriminant analysis: Pr(G = k X) Pr(X G = k)pr(g = k) Theory: LDA more
More informationStatistical inference on Lévy processes
Alberto Coca Cabrero University of Cambridge - CCA Supervisors: Dr. Richard Nickl and Professor L.C.G.Rogers Funded by Fundación Mutua Madrileña and EPSRC MASDOC/CCA student workshop 2013 26th March Outline
More informationPart 2: Multivariate fmri analysis using a sparsifying spatio-temporal prior
Chalmers Machine Learning Summer School Approximate message passing and biomedicine Part 2: Multivariate fmri analysis using a sparsifying spatio-temporal prior Tom Heskes joint work with Marcel van Gerven
More informationBAYESIAN DECISION THEORY
Last updated: September 17, 2012 BAYESIAN DECISION THEORY Problems 2 The following problems from the textbook are relevant: 2.1 2.9, 2.11, 2.17 For this week, please at least solve Problem 2.3. We will
More informationPart 8: GLMs and Hierarchical LMs and GLMs
Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course
More informationAdaptive Piecewise Polynomial Estimation via Trend Filtering
Adaptive Piecewise Polynomial Estimation via Trend Filtering Liubo Li, ShanShan Tu The Ohio State University li.2201@osu.edu, tu.162@osu.edu October 1, 2015 Liubo Li, ShanShan Tu (OSU) Trend Filtering
More information