Multidimensional heritability analysis of neuroanatomical shape. Jingwei Li


Brain Imaging Genetics
Genetic variation, neuroanatomy, behavior, and cognition. Focus here: genetic variation and neuroanatomy.

Descriptors of Brain Structures
One-dimensional descriptors (Hibar 2015; Stein 2012; Sabuncu 2012): volume, surface area.
Drawback: limited in capturing anatomical variation; e.g., two structures with the same area can have different shapes.

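Descriptors of Brain Structures
Multi-dimensional shape descriptor: truncated Laplace-Beltrami spectrum (LBS)
Let $\psi: \mathbb{R}^n \to \mathbb{R}^{n+k}$ be a local parametrization of an $n$-dimensional submanifold $M$ of $\mathbb{R}^{n+k}$, with
$g_{ij} = \langle \partial_i \psi, \partial_j \psi \rangle$, $G = (g_{ij})$, $W = \sqrt{\det G}$, $(g^{ij}) = G^{-1}$, $1 \le i, j \le n$.
For real-valued functions $f$ and $\varphi$ defined on $M$:
Gradient (nabla) inner product: $\nabla f \cdot \nabla \varphi = \sum_{i,j} g^{ij}\, \partial_i f\, \partial_j \varphi$, where $\nabla f \cdot \nabla \varphi := \langle \operatorname{grad} f, \operatorname{grad} \varphi \rangle$.
Laplace-Beltrami operator: $\Delta f = \frac{1}{W} \sum_{i,j} \partial_i \left( g^{ij} W \partial_j f \right)$, where $\Delta f := \operatorname{div} \operatorname{grad} f$.
Solve the Laplacian eigenvalue problem $\Delta f = -\lambda f$ ($f$: eigenfunction, $\lambda \ge 0$: eigenvalue).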
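The operator formulas can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's code: it uses the unit sphere parametrized by $(\theta, \phi)$ as a hypothetical test surface, central finite differences for every derivative, and hypothetical helper names (`psi`, `jacobian`, `metric`, `laplace_beltrami`). On the unit sphere $\Delta \cos\theta = -2\cos\theta$, so this function is an eigenfunction with $\lambda = 2$ when the problem is written as $\Delta f = -\lambda f$.

```python
import numpy as np


def psi(u):
    """Hypothetical test surface: unit sphere, u = (theta, phi) -> R^3 (n = 2, k = 1)."""
    theta, phi = u
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])


def jacobian(fun, u, h=1e-4):
    """Central-difference derivatives; column i holds d fun / d u_i."""
    u = np.asarray(u, dtype=float)
    cols = []
    for i in range(u.size):
        step = np.zeros_like(u)
        step[i] = h
        cols.append((np.asarray(fun(u + step)) - np.asarray(fun(u - step))) / (2 * h))
    return np.stack(cols, axis=-1)


def metric(u):
    """G = (g_ij) with g_ij = <d_i psi, d_j psi>, and W = sqrt(det G)."""
    J = jacobian(psi, u)                  # shape (3, 2): columns are d_i psi
    G = J.T @ J
    return G, np.sqrt(np.linalg.det(G))


def laplace_beltrami(f, u, h=1e-3):
    """Delta f = (1/W) sum_i d_i ( W sum_j g^{ij} d_j f ), all derivatives numerical."""
    def flux(v):
        G, W = metric(v)
        grad_f = jacobian(f, v)           # (d_1 f, ..., d_n f)
        return W * np.linalg.solve(G, grad_f)   # components W * sum_j g^{ij} d_j f
    div_terms = jacobian(flux, u, h)      # entry [i, j] = d_j (flux_i)
    _, W = metric(u)
    return np.trace(div_terms) / W


u0 = np.array([1.0, 0.3])
value = laplace_beltrami(lambda v: np.cos(v[0]), u0)
print(value, -2 * np.cos(1.0))            # two nearly equal numbers
```

Nested finite differences are only for illustration; the paper's pipeline solves the discretized eigenvalue problem instead of evaluating $\Delta$ pointwise.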

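Descriptors of Brain Structures
Truncated LBS: from the eigenvalue problem to a variational problem
Multiply $\Delta f = -\lambda f$ by a test function $\varphi$ and integrate over $M$. By Green's formula (the boundary term vanishes for closed surfaces or under Neumann boundary conditions),
$\int_M \varphi\, \Delta f \, d\sigma = -\int_M \nabla f \cdot \nabla \varphi \, d\sigma$.
Since $\nabla f \cdot \nabla \varphi = \sum_{i,j} g^{ij}\, \partial_i f\, \partial_j \varphi$ and $\int_M \varphi\, \Delta f \, d\sigma = -\lambda \int_M \varphi f \, d\sigma$, we obtain the variational problem
$\int_M \sum_{i,j} g^{ij}\, \partial_i f\, \partial_j \varphi \, d\sigma = \lambda \int_M \varphi f \, d\sigma$.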

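Descriptors of Brain Structures
Truncated LBS: discretization of $\int_M \sum_{i,j} g^{ij}\, \partial_i f\, \partial_j \varphi \, d\sigma = \lambda \int_M \varphi f \, d\sigma$
Choose $n$ linearly independent form functions $\varphi_1(x), \varphi_2(x), \dots, \varphi_n(x)$ as basis functions (e.g., $x, x^2, x^3, \dots$) defined on the parameter space. Any eigenfunction $f$ can be approximated by its projection onto the basis:
$f(x) \approx F(x) = U_1 \varphi_1(x) + \dots + U_n \varphi_n(x)$.
To solve for $U = (U_1, \dots, U_n)^T$, substitute $F$ for $f$ and each $\varphi_l$ for $\varphi$ in the variational problem. Define
$A = (a_{lm})_{n \times n}$, $a_{lm} = \int \sum_{j,k} g^{jk}\, \partial_j \varphi_l\, \partial_k \varphi_m \, d\sigma$, and $B = (b_{lm})_{n \times n}$, $b_{lm} = \int \varphi_l \varphi_m \, d\sigma$.
This gives the generalized eigenvalue problem $AU = \lambda BU$.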
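A minimal sketch of this discretization, under simplifying assumptions that are not the paper's setup: a closed curve (a 1-D manifold) instead of a surface or solid, and piecewise-linear "hat" form functions instead of a polynomial basis (Reuter et al. use higher-order finite elements). The names `lbs_closed_curve` and `circle` are hypothetical. The code assembles $A$ and $B$ edge by edge and solves $AU = \lambda BU$; for a circle of radius $r$ the exact eigenvalues are $0, 1/r^2, 1/r^2, 4/r^2, 4/r^2, \dots$, which makes the output easy to check and also shows the $1/a^2$ scaling property.

```python
import numpy as np


def lbs_closed_curve(points, num_eigs):
    """First `num_eigs` eigenvalues of A U = lambda B U on a closed polygonal curve.

    Form functions are piecewise-linear hats, one per vertex, so
    a_lm = integral of phi_l' phi_m' ds   (stiffness)
    b_lm = integral of phi_l  phi_m  ds   (mass)
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    A = np.zeros((n, n))
    B = np.zeros((n, n))
    for i in range(n):
        j = (i + 1) % n                       # edge (i, j); wraps around: closed curve
        length = np.linalg.norm(pts[j] - pts[i])
        idx = np.ix_([i, j], [i, j])
        A[idx] += np.array([[1.0, -1.0], [-1.0, 1.0]]) / length
        B[idx] += np.array([[2.0, 1.0], [1.0, 2.0]]) * length / 6.0
    # Reduce to a standard symmetric problem with the Cholesky factor B = C C^T:
    # A U = lambda B U  <=>  (C^-1 A C^-T) V = lambda V, with V = C^T U.
    C = np.linalg.cholesky(B)
    S = np.linalg.solve(C, np.linalg.solve(C, A).T)
    S = (S + S.T) / 2                         # remove round-off asymmetry
    return np.linalg.eigvalsh(S)[:num_eigs]   # ascending; the first is ~0 (constants)


def circle(radius, n=400):
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return np.column_stack([radius * np.cos(t), radius * np.sin(t)])


print(np.round(lbs_closed_curve(circle(1.0), 5), 3))   # close to [0, 1, 1, 4, 4]
print(np.round(lbs_closed_curve(circle(2.0), 5), 3))   # scaled by 1/2^2
```

Linear elements slightly overestimate each eigenvalue; the error shrinks as the number of vertices grows.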

Descriptors of Brain Structures
Truncated LBS: solve the Laplacian eigenvalue problem defined on the brain region and keep the first M eigenvalues.
Properties (Reuter 2006):
Isometry invariant.
For planar shapes and 3D solids: isometry is equivalent to congruence (identical after a rigid-body transformation).
For surfaces: isometric surfaces need not be congruent.

Descriptors of Brain Structures
Truncated LBS: scaling property (Reuter 2006)
Scaling an n-dimensional manifold by a factor $a$ scales its eigenvalues by $1/a^2$.
In this paper, the eigenvalues are normalized: $\tilde{\lambda}_{i,m} = \lambda_{i,m} V_i^{2/3}$ ($i$: subject; $m$: dimension; $V_i$: volume of the structure in subject $i$). Because eigenvalues scale as $a^{-2}$ and volume as $a^3$, $\lambda V^{2/3}$ does not change under uniform scaling.
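The volume normalization can be checked on a shape whose spectrum is known in closed form. As a hypothetical example (not from the paper), the sketch below uses a rectangular box with Neumann boundary conditions, whose Laplacian eigenvalues are $\pi^2(k^2/a^2 + l^2/b^2 + m^2/c^2)$ for integers $k, l, m \ge 0$; the boundary condition is chosen only because it has a closed form, and the function names are illustrative. Scaling the box by 3 divides every eigenvalue by 9, while $\lambda V^{2/3}$ is unchanged because $V$ grows by $3^3$.

```python
import itertools
import math


def box_neumann_eigenvalues(a, b, c, count, max_index=6):
    """Smallest `count` Neumann eigenvalues of an a x b x c box (closed form)."""
    vals = [math.pi ** 2 * ((k / a) ** 2 + (l / b) ** 2 + (m / c) ** 2)
            for k, l, m in itertools.product(range(max_index), repeat=3)]
    return sorted(vals)[:count]


def volume_normalized(eigs, volume):
    """lambda_tilde = lambda * V^(2/3): unchanged when the shape is uniformly scaled."""
    return [lam * volume ** (2.0 / 3.0) for lam in eigs]


small = box_neumann_eigenvalues(1.0, 1.5, 2.0, 10)
large = box_neumann_eigenvalues(3.0, 4.5, 6.0, 10)        # same box scaled by a = 3
print([round(s / l, 6) for s, l in zip(small[1:], large[1:])])   # every ratio is a^2 = 9
print(volume_normalized(small, 1.0 * 1.5 * 2.0)[:4])
print(volume_normalized(large, 3.0 * 4.5 * 6.0)[:4])      # matches the line above
```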

Heritability
A phenotype (trait) can be influenced by both genetic and environmental effects. Heritability quantifies how much of the variation in a phenotype is due to variation in genetic factors.

Main Idea of This Paper
The truncated LBS represents a shape better than volume does.
Use the truncated LBS as the descriptor of each brain structure studied, compute its heritability, and compare it with volume-based heritability.
To fit the multi-dimensional truncated LBS into the GCTA (Genome-wide Complex Trait Analysis) heritability model (Yang 2011), propose a multi-dimensional heritability model.

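GCTA heritability model
$y$: $N \times 1$ trait vector ($N$: number of subjects)
$y = g + c + e$, $g \sim \mathcal{N}(0, \sigma_A^2 K)$, $c \sim \mathcal{N}(0, \sigma_C^2 \Lambda)$, $e \sim \mathcal{N}(0, \sigma_E^2 I)$
$g$: additive genetic component; $c$: common environmental component; $e$: unique environmental component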

GCTA heritability model: genetic similarity matrix $K$
Family studies: $K$ holds twice the kinship coefficients (the expected genetic relatedness), e.g., parent-offspring 0.5, identical twins 1, full siblings 0.5, half siblings 0.25.
Unrelated-subject studies: $K$ is estimated from genome-wide single-nucleotide polymorphism (SNP) data.

GCTA heritability model: what is a single-nucleotide polymorphism (SNP)?
Each locus on a DNA sequence is a single nucleotide: adenine (A), thymine (T), cytosine (C), or guanine (G).
A SNP is a DNA sequence variation in which a single nucleotide in the genome (or another shared sequence) differs between individuals, or between the paired chromosomes of one individual, e.g., AAGCCTA vs. AAGCTTA.
The variants at a SNP are alleles. With alleles A and a, each SNP has three genotypes, AA, Aa, and aa, coded as 0, 1, 2 (the count of one allele).

GCTA heritability model: computing genetic similarity from SNPs
Let $X$ be the #subjects × #SNPs genotype matrix. Standardize each column of $X$ to mean 0 and variance 1, then
$K = \frac{X X^T}{\#\text{SNPs}}$.
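A minimal sketch of this computation (the function name `genetic_relationship_matrix` and the simulated genotypes are hypothetical). It follows the slide's column standardization with the sample mean and population standard deviation, and drops monomorphic SNPs, which cannot be standardized; GCTA's own implementation standardizes with allele frequencies instead, so its values differ slightly. Genotypes are coded 0/1/2.

```python
import numpy as np


def genetic_relationship_matrix(genotypes):
    """K = X X^T / #SNPs, with X the column-standardized N x P genotype matrix."""
    G = np.asarray(genotypes, dtype=float)
    sd = G.std(axis=0)                    # population SD of each SNP
    keep = sd > 0                         # drop monomorphic SNPs (sd = 0)
    X = (G[:, keep] - G[:, keep].mean(axis=0)) / sd[keep]
    return X @ X.T / X.shape[1]


rng = np.random.default_rng(0)
genotypes = rng.binomial(2, 0.3, size=(50, 400))   # simulated: 50 subjects, 400 SNPs
K = genetic_relationship_matrix(genotypes)
print(K.shape, np.trace(K) / K.shape[0])           # mean diagonal is 1 (up to round-off)
```

Because every standardized column has sum of squares $N$, the diagonal of $K$ always averages to 1 for unrelated subjects, while the off-diagonal entries scatter around 0.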

GCTA heritability model: shared environment matrix $\Lambda$
$\Lambda$ encodes the environment shared between subjects.
Family studies: e.g., entry 1 for twins and non-twin siblings.
Unrelated-subject studies: $\Lambda$ vanishes (no $c$ term).

GCTA heritability model: $I$ is the identity matrix, i.e., unique environmental effects are independent across subjects.

GCTA heritability model: heritability
$h^2 = \frac{\sigma_A^2}{\sigma_A^2 + \sigma_C^2 + \sigma_E^2}$
$h^2$: the proportion of trait variance explained by the additive genetic component.
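To make the roles of $K$, $\Lambda$, and $I$ concrete, the sketch below builds the implied covariance $\sigma_A^2 K + \sigma_C^2 \Lambda + \sigma_E^2 I$ for one identical-twin pair and one fraternal (dizygotic) twin pair, using hypothetical variance components $\sigma_A^2 = 0.5$, $\sigma_C^2 = 0.2$, $\sigma_E^2 = 0.3$ and assuming each pair shares an environment. Function names are illustrative. The classical twin contrast $2(r_{MZ} - r_{DZ})$ then recovers $h^2$.

```python
import numpy as np


def implied_covariance(K, Lam, s2_A, s2_C, s2_E):
    """cov(y) = s2_A * K + s2_C * Lambda + s2_E * I under y = g + c + e."""
    return s2_A * K + s2_C * Lam + s2_E * np.eye(K.shape[0])


def heritability(s2_A, s2_C, s2_E):
    return s2_A / (s2_A + s2_C + s2_E)


# Subjects 0-1: identical twins (relatedness 1); subjects 2-3: fraternal twins (0.5).
K = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.5],
              [0.0, 0.0, 0.5, 1.0]])
# Each pair shares its environment (hypothetical); the two pairs share nothing.
Lam = np.array([[1.0, 1.0, 0.0, 0.0],
                [1.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 1.0],
                [0.0, 0.0, 1.0, 1.0]])

V = implied_covariance(K, Lam, 0.5, 0.2, 0.3)
r_mz, r_dz = V[0, 1], V[2, 3]        # correlations, since every variance is 1.0
print(r_mz, r_dz)                    # 0.7 0.45
print(heritability(0.5, 0.2, 0.3), 2 * (r_mz - r_dz))   # equal up to round-off
```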

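Multi-dimensional traits heritability model
$Y$: $N \times M$ trait matrix ($N$: number of subjects; $M$: number of dimensions)
$Y = G + C + E$, $\operatorname{vec}(G) \sim \mathcal{N}(0, \Sigma_A \otimes K)$, $\operatorname{vec}(C) \sim \mathcal{N}(0, \Sigma_C \otimes \Lambda)$, $\operatorname{vec}(E) \sim \mathcal{N}(0, \Sigma_E \otimes I)$
$\Sigma_A = (\sigma_{A,rs})_{M \times M}$: $\sigma_{A,rs}$ is the genetic covariance between the $r$-th and $s$-th trait dimensions.
$\Sigma_C = (\sigma_{C,rs})_{M \times M}$: $\sigma_{C,rs}$ is the common environmental covariance between the $r$-th and $s$-th trait dimensions.
$\Sigma_E = (\sigma_{E,rs})_{M \times M}$: $\sigma_{E,rs}$ is the unique environmental covariance between the $r$-th and $s$-th trait dimensions.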

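Multi-dimensional traits heritability model: Kronecker product ($\otimes$)
$\Sigma_A \otimes K = \begin{pmatrix} \sigma_{A,11} K & \sigma_{A,12} K & \cdots & \sigma_{A,1M} K \\ \sigma_{A,21} K & \sigma_{A,22} K & \cdots & \sigma_{A,2M} K \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{A,M1} K & \sigma_{A,M2} K & \cdots & \sigma_{A,MM} K \end{pmatrix}$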

Multi-dimensional traits heritability model: vectorization
$\operatorname{vec}([a_1, a_2, \dots, a_k]) = (a_1^T, a_2^T, \dots, a_k^T)^T$, i.e., the columns are stacked into one vector.
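In NumPy terms (an illustrative sketch; the helper `vec` and the example matrices are hypothetical), `vec` is a column-major reshape and $\Sigma_A \otimes K$ is `np.kron`. The checks confirm that block $(r, s)$ of $\Sigma_A \otimes K$ is $\sigma_{A,rs} K$, and that $G = L_K Z L_A^T$, with $K = L_K L_K^T$, $\Sigma_A = L_A L_A^T$ and $Z$ standard normal, has $\operatorname{cov}(\operatorname{vec}(G)) = \Sigma_A \otimes K$, which is one way to simulate from the model.

```python
import numpy as np


def vec(A):
    """Stack the columns of A: vec([a_1, ..., a_k]) = [a_1; ...; a_k]."""
    return np.asarray(A).reshape(-1, order="F")


Sigma_A = np.array([[2.0, 0.5],
                    [0.5, 1.0]])              # M = 2 trait dimensions (hypothetical)
K = np.array([[1.0, 0.1, 0.0],
              [0.1, 1.0, 0.2],
              [0.0, 0.2, 1.0]])               # N = 3 subjects (hypothetical)
N = K.shape[0]

cov_vecG = np.kron(Sigma_A, K)                # (M*N) x (M*N)
block_01 = cov_vecG[0:N, N:2 * N]             # covariance of column 0 with column 1 of G
print(np.allclose(block_01, Sigma_A[0, 1] * K))                      # True

# vec(L_K Z L_A^T) = (L_A kron L_K) vec(Z), so its covariance is
# (L_A kron L_K)(L_A kron L_K)^T = (L_A L_A^T) kron (L_K L_K^T) = Sigma_A kron K.
L_A, L_K = np.linalg.cholesky(Sigma_A), np.linalg.cholesky(K)
Z = np.random.default_rng(0).standard_normal((N, 2))
G = L_K @ Z @ L_A.T                           # one draw of the genetic component
print(np.allclose(vec(G), np.kron(L_A, L_K) @ vec(Z)))               # True
print(np.allclose(np.kron(L_A, L_K) @ np.kron(L_A, L_K).T, cov_vecG))  # True
print(vec([[1, 2], [3, 4]]))                  # [1 3 2 4]
```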

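Multi-dimensional traits heritability model: heritability
$h^2 = \frac{\operatorname{tr}\Sigma_A}{\operatorname{tr}\Sigma_A + \operatorname{tr}\Sigma_C + \operatorname{tr}\Sigma_E} = \sum_{m=1}^{M} \gamma_m h_m^2$,
where $\gamma_m = \frac{\sigma_{A,mm} + \sigma_{C,mm} + \sigma_{E,mm}}{\sum_{p=1}^{M} (\sigma_{A,pp} + \sigma_{C,pp} + \sigma_{E,pp})}$ and $h_m^2 = \frac{\sigma_{A,mm}}{\sigma_{A,mm} + \sigma_{C,mm} + \sigma_{E,mm}}$.
The multi-dimensional trait heritability is a weighted average of the heritabilities of the individual dimensions, each weighted by its share of the total phenotypic variance.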
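A short numerical check of this identity, with hypothetical $2 \times 2$ covariance matrices (function names are illustrative). Only the diagonals enter $h^2$; the off-diagonal covariances do not.

```python
import numpy as np


def multidim_heritability(S_A, S_C, S_E):
    """h^2 = tr(Sigma_A) / (tr(Sigma_A) + tr(Sigma_C) + tr(Sigma_E))."""
    return np.trace(S_A) / (np.trace(S_A) + np.trace(S_C) + np.trace(S_E))


def weights_and_per_dimension(S_A, S_C, S_E):
    """gamma_m: share of total variance in dimension m; h_m^2: its own heritability."""
    a, c, e = np.diag(S_A), np.diag(S_C), np.diag(S_E)
    total = a + c + e
    return total / total.sum(), a / total


S_A = np.array([[0.6, 0.1], [0.1, 0.2]])
S_C = np.array([[0.1, 0.0], [0.0, 0.1]])
S_E = np.array([[0.3, -0.05], [-0.05, 0.7]])

h2 = multidim_heritability(S_A, S_C, S_E)
gamma, h2_m = weights_and_per_dimension(S_A, S_C, S_E)
print(h2, gamma, h2_m)               # about 0.4, [0.5 0.5], [0.6 0.2]
print(np.isclose(h2, gamma @ h2_m))  # True: weighted average of per-dimension values
```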

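Multi-dimensional traits heritability model: properties
Invariant to rotations of the data. For any orthogonal $M \times M$ matrix $T$ ($T^T T = T T^T = I$), compare
(1) $Y = G + C + E$
(2) $YT = GT + CT + ET$
Then $h_T^2 = h^2$, where $h_T^2$ is the heritability from model (2) and $h^2$ the heritability from model (1).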
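A numerical illustration of the rotation property, assuming randomly generated positive semi-definite $\Sigma_A$, $\Sigma_C$, $\Sigma_E$ and a random orthogonal $T$ from a QR decomposition (all values hypothetical). Under model (2) the covariance matrices become $T^T \Sigma T$, and the total heritability stays the same.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 4


def random_psd(rng, M):
    B = rng.standard_normal((M, M))
    return B @ B.T                            # symmetric positive semi-definite


def h2(S_A, S_C, S_E):
    return np.trace(S_A) / (np.trace(S_A) + np.trace(S_C) + np.trace(S_E))


S_A, S_C, S_E = (random_psd(rng, M) for _ in range(3))
T, _ = np.linalg.qr(rng.standard_normal((M, M)))   # orthogonal: T^T T = T T^T = I

h2_model1 = h2(S_A, S_C, S_E)
h2_model2 = h2(T.T @ S_A @ T, T.T @ S_C @ T, T.T @ S_E @ T)
print(np.allclose(T.T @ T, np.eye(M)), np.isclose(h2_model1, h2_model2))   # True True
```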

Consider covariates
Sometimes we want to study effects after controlling for nuisance variables by regressing them out, e.g., age, gender, handedness.

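Consider covariates
$Y = XB + G + C + E$, with covariates $X$ ($N \times q$) and coefficients $B$ ($q \times M$); $\operatorname{vec}(G) \sim \mathcal{N}(0, \Sigma_A \otimes K)$, $\operatorname{vec}(C) \sim \mathcal{N}(0, \Sigma_C \otimes \Lambda)$, $\operatorname{vec}(E) \sim \mathcal{N}(0, \Sigma_E \otimes I)$.
Let $U$ be an $N \times (N - q)$ matrix with $U^T X = 0$, $U^T U = I$, and $U U^T = I - X (X^T X)^{-1} X^T$. Then
$\tilde{Y} = U^T Y = U^T G + U^T C + U^T E = \tilde{G} + \tilde{C} + \tilde{E}$,
$\operatorname{vec}(\tilde{G}) \sim \mathcal{N}(0, \Sigma_A \otimes U^T K U)$, $\operatorname{vec}(\tilde{C}) \sim \mathcal{N}(0, \Sigma_C \otimes U^T \Lambda U)$, $\operatorname{vec}(\tilde{E}) \sim \mathcal{N}(0, \Sigma_E \otimes I)$.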
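One way to construct $U$, sketched under the assumption that $X$ has full column rank: the last $N - q$ columns of the complete QR decomposition of $X$ form an orthonormal basis of the orthogonal complement of the column space of $X$. The function name, covariates, coefficients, and sizes below are hypothetical. The checks confirm the three defining properties and that $U^T$ removes $XB$ exactly.

```python
import numpy as np


def residualizing_basis(X):
    """U (N x (N - q)) with U^T X = 0, U^T U = I, U U^T = I - X (X^T X)^{-1} X^T."""
    X = np.asarray(X, dtype=float)
    q = X.shape[1]
    Q, _ = np.linalg.qr(X, mode="complete")   # Q is N x N orthogonal
    return Q[:, q:]


rng = np.random.default_rng(2)
N, q, M = 10, 3, 2
X = np.column_stack([np.ones(N),                     # intercept
                     rng.uniform(20, 35, N),         # e.g. age (hypothetical)
                     np.array([0, 1] * (N // 2))])   # e.g. gender (hypothetical)
U = residualizing_basis(X)

P_resid = np.eye(N) - X @ np.linalg.solve(X.T @ X, X.T)
print(np.allclose(U.T @ X, 0), np.allclose(U.T @ U, np.eye(N - q)),
      np.allclose(U @ U.T, P_resid))                 # True True True

# U^T (X B + E) = U^T E: covariate effects vanish; K would become U^T K U.
B = rng.standard_normal((q, M))
E = rng.standard_normal((N, M))
print(np.allclose(U.T @ (X @ B + E), U.T @ E))       # True
```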

Datasets
Genomics Superstruct Project (GSP; N = 1,320): unrelated subjects
Human Connectome Project (HCP; N = 590): 7 monozygotic twin pairs, 69 dizygotic twin pairs, 53 full siblings of twins, 55 singletons
Analysis: for each brain structure, two traits: volume and truncated LBS

Volume heritability (GSP data)
Before multiple-comparisons correction, 3 brain structures are significant; after correction, none is.
For most structures, parametric and nonparametric p-values are similar, which suggests the standard-error estimates are accurate.

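Volume heritability (GSP data)
Test-retest reliability: Lin's concordance correlation coefficient
$\rho_c = \frac{2 \rho \sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2}$
$\rho$: correlation coefficient; $\sigma_x^2, \sigma_y^2$: variances; $\mu_x, \mu_y$: means. $x, y$: measurements from repeated runs on separate days for the same set of subjects.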
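A minimal sketch of Lin's coefficient with population ($1/N$) moments, plus its average over the $M$ columns of a multi-dimensional trait. Function names and the toy data are hypothetical. A constant shift between runs lowers $\rho_c$ even though the Pearson correlation stays 1, which is why $\rho_c$ suits test-retest agreement.

```python
import numpy as np


def lin_ccc(x, y):
    """rho_c = 2 rho sigma_x sigma_y / (sigma_x^2 + sigma_y^2 + (mu_x - mu_y)^2)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    mu_x, mu_y = x.mean(), y.mean()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))          # equals rho * sigma_x * sigma_y
    return 2 * cov_xy / (x.var() + y.var() + (mu_x - mu_y) ** 2)


def mean_ccc(run1, run2):
    """Average CCC over the M columns (dimensions) of two N x M measurement runs."""
    run1, run2 = np.asarray(run1, dtype=float), np.asarray(run2, dtype=float)
    return float(np.mean([lin_ccc(run1[:, m], run2[:, m]) for m in range(run1.shape[1])]))


day1 = np.array([1.0, 2.0, 3.0, 4.0])
print(lin_ccc(day1, day1))          # 1.0
print(lin_ccc(day1, day1 + 1.0))    # 5/7, about 0.714: same correlation, shifted mean
```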

Truncated LBS heritability (GSP data)
Before multiple-comparisons correction, 7 brain structures are significant; after correction, 5 brain structures are significant.
For most structures, parametric and nonparametric p-values are similar, which suggests the standard-error estimates are accurate.
Standard errors are smaller than for volume-based heritability.

Truncated LBS heritability (GSP data)
Test-retest reliability: Lin's concordance correlation coefficient, averaged across the M dimensions
$\rho_c = \frac{2 \rho \sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2}$
$\rho$: correlation coefficient; $\sigma_x^2, \sigma_y^2$: variances; $\mu_x, \mu_y$: means. $x, y$: measurements from repeated runs on separate days for the same set of subjects.

Truncated LBS heritability (GSP data)

Truncated LBS heritability (HCP data)
Only structures with significant heritability are shown.

Structure          h²      Standard error
Accumbens area     0.309   0.16
Caudate            0.583   0.14
Cerebellum         0.653   0.10
Corpus callosum    0.558   0.136
Hippocampus        0.363   0.190
Third ventricle    0.536   0.134
Putamen            0.483   0.1

Estimates are consistently higher than in the GSP dataset. A possible reason: in unrelated subjects, only the genetic variation tagged by common SNPs is captured.

Visualizing the principal mode of shape variation
PCA is a rotation of the data, and the first PC of the LBS explains a large percentage of shape variation.
The heritability model is (1) invariant to rotation, and (2) the heritability of a multi-dimensional trait is the weighted average of each dimension's heritability.
Hence the heritability of the truncated LBS is the weighted average of the heritabilities of its first M PCs.
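A small numerical check of this argument, under assumptions not stated on the slide: the PCs are taken as eigenvectors of the phenotypic covariance $\Sigma_P = \Sigma_A + \Sigma_E$ (unrelated subjects, so no $\Sigma_C$), and all matrices are randomly generated. Rotating to the PC axes leaves $h^2$ unchanged, and each PC's weight $\gamma_m$ is then exactly its fraction of explained variance.

```python
import numpy as np

rng = np.random.default_rng(3)
M = 5


def random_psd(M):
    B = rng.standard_normal((M, M))
    return B @ B.T


S_A, S_E = random_psd(M), random_psd(M)
S_P = S_A + S_E                               # phenotypic covariance

evals, T = np.linalg.eigh(S_P)
order = np.argsort(evals)[::-1]               # PC1 first
evals, T = evals[order], T[:, order]          # columns of T: principal axes

S_A_pc, S_P_pc = T.T @ S_A @ T, T.T @ S_P @ T
gamma = np.diag(S_P_pc) / np.trace(S_P_pc)    # weight of each PC
h2_pc = np.diag(S_A_pc) / np.diag(S_P_pc)     # heritability of each PC
h2_total = np.trace(S_A) / np.trace(S_P)

print(np.allclose(gamma, evals / evals.sum()))   # True: weights = variance explained
print(np.isclose(gamma @ h2_pc, h2_total))       # True: weighted average of PC heritabilities
```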

Visualizing the principal mode of shape variation
Procedure (for one brain structure):
1. Register each subject's mask (1 inside the structure, 0 outside) to a commonly used template.
2. Create a population-average surface of the structure for plotting: take a weighted average of all subjects' registered mask images and extract the 0.5 isosurface of the averaged map.
   Weight: Gaussian kernel. Center: the average first-PC score. Distance: between each subject's first-PC score and the center. Width: chosen so that 500 shapes receive non-zero weights.
3. With the same Gaussian kernel, generate averaged maps from the shapes around +s.d. of the first PC (and likewise around −s.d.).
4. Plot the difference between the two maps from step 3 on the surface generated in step 2.
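A sketch of steps 2 to 4 on toy data. Several details are assumptions rather than the paper's exact procedure: masks are already registered and flattened to arrays, the kernel is truncated so that only the `n_support` shapes nearest the center get non-zero weight, and its width is set to one third of the truncation radius. All names and data are hypothetical, and isosurface extraction (e.g., marching cubes) is left out; thresholding at 0.5 stands in for it.

```python
import numpy as np


def kernel_average_mask(masks, pc1, center, n_support=500):
    """Gaussian-kernel weighted average of registered 0/1 masks along the PC1 axis.

    Returns the averaged map and the weights. Hypothetical choices: weights are zero
    beyond the n_support-th smallest distance, and sigma = that radius / 3.
    """
    masks = np.asarray(masks, dtype=float)
    dist = np.abs(np.asarray(pc1, dtype=float) - center)
    k = min(n_support, dist.size)
    radius = np.sort(dist)[k - 1]
    sigma = max(radius / 3.0, 1e-12)
    weights = np.exp(-0.5 * (dist / sigma) ** 2) * (dist <= radius)
    avg = np.tensordot(weights, masks, axes=1) / weights.sum()
    return avg, weights


# Toy data: 6 subjects, 5-voxel "volumes" (hypothetical).
pc1 = np.array([-2.0, -0.9, 0.0, 0.4, 1.3, 2.5])
masks = np.array([[0, 1, 1, 0, 1],
                  [0, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [1, 1, 1, 0, 0],
                  [1, 1, 1, 1, 0],
                  [1, 1, 0, 1, 0]])
sd = pc1.std()

template_map, w = kernel_average_mask(masks, pc1, pc1.mean(), n_support=3)
surface = template_map >= 0.5                 # stand-in for the 0.5 isosurface (step 2)
plus_map, _ = kernel_average_mask(masks, pc1, pc1.mean() + sd, n_support=3)
minus_map, _ = kernel_average_mask(masks, pc1, pc1.mean() - sd, n_support=3)
difference = plus_map - minus_map             # step 4: > 0 plotted red, < 0 blue
print(np.count_nonzero(w), surface.astype(int), np.round(difference, 2))
```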

Visualizing the principal mode of shape variation
Red: shapes around +s.d. are larger than those around −s.d.
Blue: shapes around −s.d. are larger than those around +s.d.

Strengths
Uses the truncated LBS instead of volume as the feature: it captures more shape variation, is isometry invariant, and requires no registration or mapping (Reuter 2006 & 2009).
Generalizes the concept of heritability to multi-dimensional phenotypes, with other applications (e.g., multiple tests of one behavior; disease studies).

Strengths
Lower variability of heritability estimates: the sampling variance under the multi-dimensional trait heritability model is at most that of the original GCTA model (unrelated-subject dataset), so heritability estimates are more precise and more significant.
Proposes a visualization method for shape variation, interpreted as shape variation along the first PC axis of the shape descriptor.

Weakness
The optimal number of eigenvalues may not be 50:
only 30, 50, and 70 were tested;
error bars for the different numbers of eigenvalues are not shown;
a number other than 50 (used in the paper) could lead to higher heritability and smaller error bars.

Weakness
The optimal number of eigenvalues may differ across brain structures:
Amygdala: heritability is similar for 30, 50, and 70 eigenvalues (it even decreases).
Third ventricle: heritability increases from 0.4 to 0.6.

Weakness
The link between the proposed visualization method and the LBS heritability is not clear.
The new method and model are compared only with volume-based GCTA heritability; more comparisons with the literature are needed (e.g., Gilmore 2010; Baaré 2001).

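Backup: invariance to rotations of the data
Tools: $\operatorname{vec}(AXB) = (B^T \otimes A)\operatorname{vec}(X)$ (here $A = I$, $X = G$, $B = T$); $\operatorname{cov}(AX) = A\operatorname{cov}(X)A^T$; $(A \otimes B)^T = A^T \otimes B^T$; $(A \otimes B)(C \otimes D) = AC \otimes BD$.
$\operatorname{cov}(\operatorname{vec}(GT)) = \operatorname{cov}((T^T \otimes I)\operatorname{vec}(G)) = (T^T \otimes I)(\Sigma_A \otimes K)(T^T \otimes I)^T = (T^T \otimes I)(\Sigma_A \otimes K)(T \otimes I) = T^T \Sigma_A T \otimes K$.
Similarly, $\operatorname{cov}(\operatorname{vec}(CT)) = T^T \Sigma_C T \otimes \Lambda$ and $\operatorname{cov}(\operatorname{vec}(ET)) = T^T \Sigma_E T \otimes I$.
Using $\operatorname{tr}(ABC) = \operatorname{tr}(BCA) = \operatorname{tr}(CAB)$, the associativity of matrix multiplication, and $T T^T = I$:
$h_T^2 = \frac{\operatorname{tr}(T^T\Sigma_A T)}{\operatorname{tr}(T^T\Sigma_A T) + \operatorname{tr}(T^T\Sigma_C T) + \operatorname{tr}(T^T\Sigma_E T)} = \frac{\operatorname{tr}(\Sigma_A T T^T)}{\operatorname{tr}(\Sigma_A T T^T) + \operatorname{tr}(\Sigma_C T T^T) + \operatorname{tr}(\Sigma_E T T^T)} = \frac{\operatorname{tr}\Sigma_A}{\operatorname{tr}\Sigma_A + \operatorname{tr}\Sigma_C + \operatorname{tr}\Sigma_E} = h^2$.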
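The matrix identities used in this proof can be verified numerically. The sketch below uses random, hypothetical $\Sigma_A$, $K$, $G$ and an orthogonal $T$, and checks the vectorization step, the Kronecker covariance step, and the trace step in turn.

```python
import numpy as np

rng = np.random.default_rng(4)
N, M = 4, 3


def vec(A):
    return A.reshape(-1, order="F")           # stack columns


root_A = rng.standard_normal((M, M))
Sigma_A = root_A @ root_A.T                   # random covariance (hypothetical)
root_K = rng.standard_normal((N, N))
K = root_K @ root_K.T                         # random similarity matrix (hypothetical)
T, _ = np.linalg.qr(rng.standard_normal((M, M)))   # orthogonal
I_N = np.eye(N)
G = rng.standard_normal((N, M))

# vec(G T) = (T^T kron I) vec(G)
step1 = np.allclose(vec(G @ T), np.kron(T.T, I_N) @ vec(G))
# (T^T kron I)(Sigma_A kron K)(T^T kron I)^T = (T^T Sigma_A T) kron K
lhs = np.kron(T.T, I_N) @ np.kron(Sigma_A, K) @ np.kron(T.T, I_N).T
step2 = np.allclose(lhs, np.kron(T.T @ Sigma_A @ T, K))
# tr(T^T Sigma_A T) = tr(Sigma_A T T^T) = tr(Sigma_A)
step3 = np.isclose(np.trace(T.T @ Sigma_A @ T), np.trace(Sigma_A))
print(step1, step2, step3)                    # True True True
```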

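Backup: the multi-dimensional trait heritability is a weighted average of the per-dimension heritabilities
$h^2 = \frac{\operatorname{tr}\Sigma_A}{\operatorname{tr}(\Sigma_A + \Sigma_C + \Sigma_E)} = \frac{\sum_{m=1}^{M}\sigma_{A,mm}}{\sum_{p=1}^{M}(\sigma_{A,pp} + \sigma_{C,pp} + \sigma_{E,pp})} = \sum_{m=1}^{M} \frac{\sigma_{A,mm} + \sigma_{C,mm} + \sigma_{E,mm}}{\sum_{p=1}^{M}(\sigma_{A,pp} + \sigma_{C,pp} + \sigma_{E,pp})} \cdot \frac{\sigma_{A,mm}}{\sigma_{A,mm} + \sigma_{C,mm} + \sigma_{E,mm}} = \sum_{m=1}^{M}\gamma_m h_m^2$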

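Backup: moment-matching estimator for unrelated subjects (no shared environmental component)
For trait dimensions $r, s$: $\operatorname{cov}(y_r, y_s) = \sigma_{A,rs} K + \sigma_{E,rs} I$, so for zero-mean traits $\mathbb{E}[y_r y_s^T] = \sigma_{A,rs} K + \sigma_{E,rs} I$.
To estimate $\sigma_{A,rs}$ and $\sigma_{E,rs}$, use the regression model
$\operatorname{vec}(y_r y_s^T) = y_s \otimes y_r = \sigma_{A,rs}\operatorname{vec}(K) + \sigma_{E,rs}\operatorname{vec}(I) + \text{residual}$.
The normal equations are
$\operatorname{vec}(K)^T (y_s \otimes y_r) = \sigma_{A,rs}\operatorname{vec}(K)^T\operatorname{vec}(K) + \sigma_{E,rs}\operatorname{vec}(K)^T\operatorname{vec}(I)$,
$\operatorname{vec}(I)^T (y_s \otimes y_r) = \sigma_{A,rs}\operatorname{vec}(I)^T\operatorname{vec}(K) + \sigma_{E,rs}\operatorname{vec}(I)^T\operatorname{vec}(I)$,
and with $\operatorname{vec}(A)^T\operatorname{vec}(B) = \operatorname{tr}(A^T B)$ they become
$y_r^T K y_s = \sigma_{A,rs}\operatorname{tr}(K^2) + \sigma_{E,rs}\operatorname{tr}(K)$,
$y_r^T y_s = \sigma_{A,rs}\operatorname{tr}(K) + \sigma_{E,rs}\operatorname{tr}(I) = \sigma_{A,rs}\operatorname{tr}(K) + \sigma_{E,rs} N$.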

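Backup: moment-matching estimator for unrelated subjects (no shared environmental component)
$\begin{pmatrix}\hat\sigma_{A,rs} \\ \hat\sigma_{E,rs}\end{pmatrix} = \begin{pmatrix}\operatorname{tr}(K^2) & \operatorname{tr}(K) \\ \operatorname{tr}(K) & N\end{pmatrix}^{-1}\begin{pmatrix} y_r^T K y_s \\ y_r^T y_s\end{pmatrix}$
$\hat\sigma_{A,rs} = \frac{y_r^T (N K - \operatorname{tr}(K) I) y_s}{N\operatorname{tr}(K^2) - \operatorname{tr}(K)^2} = \frac{y_r^T (K - \tau I) y_s}{\nu_K}$
$\hat\sigma_{E,rs} = \frac{y_r^T (\operatorname{tr}(K^2) I - \operatorname{tr}(K) K) y_s}{N\operatorname{tr}(K^2) - \operatorname{tr}(K)^2} = \frac{y_r^T (\kappa I - \tau K) y_s}{\nu_K}$
where $\tau = \operatorname{tr}(K)/N$, $\kappa = \operatorname{tr}(K^2)/N$, and $\nu_K = \operatorname{tr}(K^2) - \operatorname{tr}(K)^2/N = N(\kappa - \tau^2)$.
In matrix form: $\hat\Sigma_A = \frac{Y^T (K - \tau I) Y}{\nu_K}$, $\hat\Sigma_E = \frac{Y^T (\kappa I - \tau K) Y}{\nu_K}$.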
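A minimal sketch of this estimator (the function name `moment_matching` and all simulation settings are hypothetical). It simulates unrelated subjects with a GRM built from random genotypes, draws $Y = G + E$ from the multi-dimensional model, and computes $\hat\Sigma_A$, $\hat\Sigma_E$, and $\hat h^2 = \operatorname{tr}\hat\Sigma_A / (\operatorname{tr}\hat\Sigma_A + \operatorname{tr}\hat\Sigma_E)$. Because the column-centered GRM is singular, $G$ is drawn with a symmetric square root of $K$ rather than a Cholesky factor.

```python
import numpy as np


def moment_matching(Y, K):
    """Sigma_A = Y^T (K - tau I) Y / nu_K and Sigma_E = Y^T (kappa I - tau K) Y / nu_K."""
    Y = np.asarray(Y, dtype=float)
    N = K.shape[0]
    tr_K = np.trace(K)
    tr_K2 = np.sum(K * K)                 # tr(K^2) for symmetric K
    tau, kappa = tr_K / N, tr_K2 / N
    nu_K = tr_K2 - tr_K ** 2 / N          # = N (kappa - tau^2)
    I = np.eye(N)
    S_A = Y.T @ (K - tau * I) @ Y / nu_K
    S_E = Y.T @ (kappa * I - tau * K) @ Y / nu_K
    return S_A, S_E


rng = np.random.default_rng(5)
N, P, M = 400, 2000, 3
geno = rng.binomial(2, 0.3, size=(N, P)).astype(float)
X = (geno - geno.mean(axis=0)) / geno.std(axis=0)
K = X @ X.T / P                           # GRM from standardized genotypes

true_S_A = 0.4 * np.eye(M) + 0.1          # hypothetical genetic covariance
true_S_E = 0.6 * np.eye(M)                # hypothetical unique-environment covariance

w, V = np.linalg.eigh(K)
root_K = V * np.sqrt(np.clip(w, 0.0, None))          # root_K @ root_K.T = K
G = root_K @ rng.standard_normal((N, M)) @ np.linalg.cholesky(true_S_A).T
E = rng.standard_normal((N, M)) @ np.linalg.cholesky(true_S_E).T
Y = G + E

S_A_hat, S_E_hat = moment_matching(Y, K)
h2_hat = np.trace(S_A_hat) / (np.trace(S_A_hat) + np.trace(S_E_hat))
print(round(h2_hat, 3))   # noisy estimate of the true 1.5 / (1.5 + 1.8)
```

By construction the estimates solve the two normal equations of the previous slide exactly for every pair $(r, s)$; the sampling noise in $\hat h^2$ is the subject of the next backup slides.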

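Backup: sampling variance of the point estimator
Let $Q_A = \frac{K - \tau I}{\nu_K}$ and $Q_E = \frac{\kappa I - \tau K}{\nu_K}$, so that $t_A = \operatorname{tr}\hat\Sigma_A = \operatorname{tr}(Y^T Q_A Y)$ and $t_E = \operatorname{tr}\hat\Sigma_E = \operatorname{tr}(Y^T Q_E Y)$; write $t = (t_A, t_E)^T$.
The heritability estimate is a function of $t$: $\hat h_{\mathrm{SNP}}^2 = f(t) = \frac{t_A}{t_A + t_E}$. By the delta method,
$\operatorname{var}(\hat h_{\mathrm{SNP}}^2) = \operatorname{var}(f(t)) \approx \left(\frac{\partial f}{\partial t}\right)^T \operatorname{cov}(t) \frac{\partial f}{\partial t}$, with $\frac{\partial f}{\partial t} = \frac{1}{(t_A + t_E)^2}\begin{pmatrix} t_E \\ -t_A \end{pmatrix}$.
Define $V_{rs} = \operatorname{cov}(y_r, y_s) = \sigma_{A,rs} K + \sigma_{E,rs} I$.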

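Backup: sampling variance of the point estimator
For $\alpha, \beta \in \{A, E\}$:
$\operatorname{cov}(\operatorname{tr}(Y^T Q_\alpha Y), \operatorname{tr}(Y^T Q_\beta Y)) = \sum_{r,s=1}^{M}\operatorname{cov}(y_r^T Q_\alpha y_r, y_s^T Q_\beta y_s) = 2\sum_{r,s=1}^{M}\operatorname{tr}(Q_\alpha V_{rs} Q_\beta V_{rs})$,
using the covariance of Gaussian quadratic forms, $\operatorname{cov}(\varepsilon^T \Lambda_1 \varepsilon, \varepsilon^T \Lambda_2 \varepsilon) = 2\operatorname{tr}(\Lambda_1 \Sigma \Lambda_2 \Sigma) + 4\mu^T \Lambda_1 \Sigma \Lambda_2 \mu$, with $\mu = 0$ here.
Approximating $K \approx I$ inside $V_{rs}$ (so $V_{rs} \approx (\sigma_{A,rs} + \sigma_{E,rs}) I$) and using $\sum_{r,s=1}^{M}(\sigma_{A,rs} + \sigma_{E,rs})^2 = \operatorname{tr}[(\Sigma_A + \Sigma_E)^2]$:
$\operatorname{cov}(t) \approx 2\operatorname{tr}[(\Sigma_A + \Sigma_E)^2]\begin{pmatrix}\operatorname{tr}(Q_A^2) & \operatorname{tr}(Q_A Q_E) \\ \operatorname{tr}(Q_E Q_A) & \operatorname{tr}(Q_E^2)\end{pmatrix} = \frac{2\operatorname{tr}[(\Sigma_A + \Sigma_E)^2]}{\nu_K}\begin{pmatrix} 1 & -\tau \\ -\tau & \kappa \end{pmatrix}$.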

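Backup: sampling variance of the point estimator
$\operatorname{tr}(Q_A^2) = \frac{\operatorname{tr}[(K - \tau I)^2]}{\nu_K^2} = \frac{\operatorname{tr}(K^2) - 2\tau\operatorname{tr}(K) + \tau^2 N}{\nu_K^2} = \frac{\operatorname{tr}(K^2) - \operatorname{tr}(K)^2/N}{\nu_K^2} = \frac{1}{\nu_K}$
$\operatorname{tr}(Q_A Q_E) = \frac{\operatorname{tr}[(K - \tau I)(\kappa I - \tau K)]}{\nu_K^2} = \frac{\kappa\operatorname{tr}(K) - \tau\operatorname{tr}(K^2) - \tau\kappa N + \tau^2\operatorname{tr}(K)}{\nu_K^2} = \frac{-\tau N(\kappa - \tau^2)}{\nu_K^2} = -\frac{\tau}{\nu_K}$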

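Backup: sampling variance of the point estimator
$\operatorname{tr}(Q_E^2) = \frac{\operatorname{tr}[(\kappa I - \tau K)^2]}{\nu_K^2} = \frac{\kappa^2 N - 2\kappa\tau\operatorname{tr}(K) + \tau^2\operatorname{tr}(K^2)}{\nu_K^2} = \frac{\kappa N(\kappa - \tau^2)}{\nu_K^2} = \frac{\kappa}{\nu_K}$
Substituting $\operatorname{cov}(t)$ and $\partial f / \partial t$ into the delta method:
$\operatorname{var}(\hat h_{\mathrm{SNP}}^2) \approx \frac{2\operatorname{tr}[(\Sigma_A + \Sigma_E)^2]}{\nu_K}\cdot\frac{t_E^2 + 2\tau t_A t_E + \kappa t_A^2}{(t_A + t_E)^4}$.
With $\tau \approx 1$, $\kappa \approx 1$, $t_A + t_E \approx \operatorname{tr}\Sigma_A + \operatorname{tr}\Sigma_E$, and $\Sigma_P = \Sigma_A + \Sigma_E$:
$\operatorname{var}(\hat h_{\mathrm{SNP}}^2) \approx \frac{2\operatorname{tr}(\Sigma_P^2)}{\nu_K (t_A + t_E)^2} \approx \frac{2\operatorname{tr}(\Sigma_P^2)}{\nu_K [\operatorname{tr}(\Sigma_P)]^2}$.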

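Backup: sampling variance of the point estimator
For a univariate trait, $\operatorname{tr}(\Sigma_P^2) = [\operatorname{tr}(\Sigma_P)]^2$, so $\operatorname{var}(\hat h_{\mathrm{SNP}}^2) \approx \frac{2}{\nu_K}$.
For a multi-dimensional trait, with $\lambda_1, \dots, \lambda_M \ge 0$ the eigenvalues of $\Sigma_P$,
$\frac{\operatorname{tr}(\Sigma_P^2)}{[\operatorname{tr}(\Sigma_P)]^2} = \frac{\sum_{i=1}^{M}\lambda_i^2}{\left(\sum_{i=1}^{M}\lambda_i\right)^2} \le 1$, so the approximate variance satisfies $\operatorname{var}(\hat h_{\mathrm{SNP}}^2) \le \frac{2}{\nu_K}$.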
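The final approximation is easy to evaluate. A minimal sketch (the function name `approx_var_h2` and the toy matrices are hypothetical): it computes $\nu_K$ from $K$ and the approximate variance $2\operatorname{tr}(\Sigma_P^2) / (\nu_K [\operatorname{tr}\Sigma_P]^2)$, showing that a univariate trait gives $2/\nu_K$ and a multi-dimensional one gives a smaller value.

```python
import numpy as np


def approx_var_h2(Sigma_P, K):
    """var(h2_hat) ~= 2 tr(Sigma_P^2) / (nu_K tr(Sigma_P)^2), with tau ~= kappa ~= 1."""
    N = K.shape[0]
    nu_K = np.sum(K * K) - np.trace(K) ** 2 / N      # tr(K^2) - tr(K)^2 / N
    S_P = np.atleast_2d(np.asarray(Sigma_P, dtype=float))
    return 2.0 * np.trace(S_P @ S_P) / (nu_K * np.trace(S_P) ** 2), nu_K


N = 100
K = 0.9 * np.eye(N) + 0.1 * np.ones((N, N))   # toy K: unit diagonal, 0.1 off-diagonal
var_uni, nu_K = approx_var_h2(1.7, K)                  # univariate: Sigma_P is a scalar
var_multi, _ = approx_var_h2(np.diag([3.0, 1.0]), K)   # two dimensions

print(nu_K)                      # about 99 = 0.01 * N * (N - 1) for this K
print(var_uni * nu_K)            # about 2
print(var_multi * nu_K)          # about 2 * (9 + 1) / 4**2 = 1.25
```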