Multidimensional heritability analysis of neuroanatomical shape. Jingwei Li


Brain Imaging Genetics
[Diagram: Genetic Variation, Neuroanatomy, Behavior / Cognition]

Brain Imaging Genetics
[Diagram: Genetic Variation, Neuroanatomy]

Descriptors of Brain Structures
One-dimensional descriptors (Hibar 2015; Stein 2012; Sabuncu 2012)
- Volume
- Surface area
Drawbacks
- Limited in capturing anatomical variation: different shapes can have the same area

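Descriptors of Brain Structures
Multi-dimensional shape descriptor: truncated Laplace-Beltrami Spectrum (LBS)
Let $\psi: \mathbb{R}^n \to \mathbb{R}^{n+k}$ be a local parametrization of a submanifold $M$ of $\mathbb{R}^{n+k}$. For $i, j = 1, \dots, n$ define
$$g_{ij} = \langle \partial_i \psi, \partial_j \psi \rangle, \quad G = (g_{ij}), \quad W = \sqrt{\det G}, \quad (g^{ij}) = G^{-1}.$$
If $f$ and $\varphi$ are real-valued functions defined on $M$, then
$$\nabla(f, \varphi) = \sum_{i,j} g^{ij}\, \partial_i f\, \partial_j \varphi \quad \text{(Nabla operator)}$$
$$\Delta f = \frac{1}{W} \sum_{i,j} \partial_i \left( g^{ij}\, W\, \partial_j f \right) \quad \text{(Laplace-Beltrami operator)}$$
where $\nabla(f, \varphi) = \langle \operatorname{grad} f, \operatorname{grad} \varphi \rangle$ and $\Delta f = \operatorname{div}(\operatorname{grad} f)$.
Solve the Laplacian eigenvalue problem $\Delta f = -\lambda f$ ($f$: eigenfunction; $\lambda$: eigenvalue).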

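Descriptors of Brain Structures
Multi-dimensional shape descriptor: truncated Laplace-Beltrami Spectrum (LBS)
Translate the Laplacian eigenvalue problem $\Delta f = -\lambda f$ into a variational problem.
Green formula:
$$\iint \varphi\, \Delta f\, d\sigma = -\iint \nabla(f, \varphi)\, d\sigma$$
Since $\nabla(f, \varphi) = \sum_{i,j} g^{ij}\, \partial_i f\, \partial_j \varphi$ and $\iint \varphi\, \Delta f\, d\sigma = \iint \varphi\, (-\lambda f)\, d\sigma = -\lambda \iint \varphi f\, d\sigma$, we obtain the variational problem
$$\iint \sum_{i,j} g^{ij}\, \partial_i f\, \partial_j \varphi\, d\sigma = \lambda \iint \varphi f\, d\sigma.$$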

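Descriptors of Brain Structures
Multi-dimensional shape descriptor: truncated Laplace-Beltrami Spectrum (LBS)
Discretization of $\iint \sum_{i,j} g^{ij}\, \partial_i f\, \partial_j \varphi\, d\sigma = \lambda \iint \varphi f\, d\sigma$:
- Choose $n$ linearly independent form functions $\varphi_1(x), \varphi_2(x), \dots, \varphi_n(x)$ as basis functions (e.g. $x, x^2, x^3, \dots$) defined on the parameter space.
- Any eigenfunction $f$ can be approximately projected onto the basis functions: $f(x) \approx F(x) = U_1 \varphi_1(x) + \dots + U_n \varphi_n(x)$.
- To solve for $U = (U_1, \dots, U_n)^\top$, substitute $F$ for $f$ and each basis function for $\varphi$ in the variational problem.
Define
$$A = (a_{lm})_{n \times n} = \left( \iint \sum_{j,k} \partial_j \varphi_l\, \partial_k \varphi_m\, g^{jk}\, d\sigma \right)_{n \times n}, \qquad B = (b_{lm})_{n \times n} = \left( \iint \varphi_l\, \varphi_m\, d\sigma \right)_{n \times n}$$
=> $AU = \lambda BU$ (generalized eigenvalue problem)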
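
As a minimal sketch of how a generalized eigenvalue problem of the form AU = λBU can be solved numerically, the following assumes a toy setting that is not from the slides: the 1D interval [0, π] with zero boundary values, discretized with piecewise-linear hat functions. The function name, the mesh size, and the reduction through a Cholesky factor of B are illustrative assumptions, not the method used in the paper.

```python
import numpy as np

def fem_laplace_eigenvalues(n_elements=200, length=np.pi, count=5):
    """Smallest eigenvalues of -f'' = lambda * f on [0, length] with f = 0 at both ends."""
    h = length / n_elements
    n = n_elements - 1  # number of interior nodes (one hat function each)
    off = np.ones(n - 1)
    # Stiffness matrix A: integrals of products of basis-function derivatives.
    A = (2.0 * np.eye(n) - np.diag(off, 1) - np.diag(off, -1)) / h
    # Mass matrix B: integrals of products of the basis functions themselves.
    B = (4.0 * np.eye(n) + np.diag(off, 1) + np.diag(off, -1)) * h / 6.0
    # Reduce A U = lambda B U to a standard symmetric problem using B = L L^T.
    L = np.linalg.cholesky(B)
    L_inv = np.linalg.inv(L)
    C = L_inv @ A @ L_inv.T
    return np.sort(np.linalg.eigvalsh(C))[:count]

eigenvalues = fem_laplace_eigenvalues()
```

For this toy domain the exact eigenvalues are the squares of the integers (1, 4, 9, ...), so the computed values should approach them as the mesh is refined.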

Descriptors of Brain Structures
Multi-dimensional shape descriptor: truncated Laplace-Beltrami Spectrum (LBS)
- Solve a Laplacian eigenvalue problem defined on the brain region
- Obtain the first M eigenvalues
Properties (Reuter 2006):
- Isometry invariant
  - For planar shapes and 3D solids: isometry is equivalent to congruency (identical after rigid body transformation)
  - For surfaces: isometry does not imply congruency

Descriptors of Brain Structures
Multi-dimensional shape descriptor: truncated Laplace-Beltrami Spectrum (LBS)
- Solve a Laplacian eigenvalue problem defined on the brain region
- Obtain the first M eigenvalues
Properties (Reuter 2006):
- Isometry invariant
- Scaling an n-dimensional manifold by a factor $a$ scales the eigenvalues by a factor $1/a^2$
In this paper, eigenvalues are scaled:
$$\tilde{\lambda}_{i,m} = \lambda_{i,m}\, V_i^{2/3}$$
($i$: subject; $m$: dimension; $V_i$: volume of the structure in subject $i$)
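
The volume normalization can be checked on a shape whose spectrum is known in closed form. The sketch below assumes a toy example that is not from the slides, a cube with zero boundary values, whose Laplacian eigenvalues are π²(p² + q² + r²)/s² for side length s. The function names are hypothetical.

```python
import numpy as np

def normalize_spectrum(eigenvalues, volume):
    """Multiply each eigenvalue by volume^(2/3) to remove the effect of overall size."""
    return np.asarray(eigenvalues, dtype=float) * volume ** (2.0 / 3.0)

def cube_dirichlet_eigenvalues(side, count=5):
    """First Laplacian eigenvalues of a cube with zero boundary values (toy shape)."""
    modes = [(p, q, r) for p in range(1, 5) for q in range(1, 5) for r in range(1, 5)]
    values = sorted(np.pi ** 2 * (p * p + q * q + r * r) / side ** 2 for p, q, r in modes)
    return np.array(values[:count])

small = cube_dirichlet_eigenvalues(side=1.0)   # volume 1
large = cube_dirichlet_eigenvalues(side=3.0)   # same shape scaled by a = 3, volume 27

small_normalized = normalize_spectrum(small, volume=1.0)
large_normalized = normalize_spectrum(large, volume=27.0)
```

Scaling the cube by a = 3 divides the raw eigenvalues by 9, while the normalized spectra of the two cubes coincide.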

Heritability
- A phenotype/trait can be influenced by genetic and environmental effects.
- Heritability: how much of the variation in a phenotype/trait is due to variation in genetic factors.

Main Idea of This Paper
- The truncated LBS is more representative of a shape than volume.
- Use the truncated LBS as a descriptor for 12 brain regions to compute heritability, and compare it with volume-based heritability.
- To adapt the truncated LBS to the GCTA (Genome-wide Complex Trait Analysis) (Yang 2011) heritability model, propose a multi-dimensional heritability model.

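GCTA heritability model
$y$: $N \times 1$ trait vector ($N$: number of subjects)
$$y = g + c + e, \quad g \sim N(0, \sigma_A^2 K), \quad c \sim N(0, \sigma_C^2 \Lambda), \quad e \sim N(0, \sigma_E^2 I)$$
- $g$: additive genetic component
- $c$: common environmental component
- $e$: unique environmental component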

GCTA heritability model
$$y = g + c + e, \quad g \sim N(0, \sigma_A^2 K), \quad c \sim N(0, \sigma_C^2 \Lambda), \quad e \sim N(0, \sigma_E^2 I)$$
$K$: genetic similarity matrix
- Familial study: $K$ holds the expected genetic relatedness (twice the kinship coefficient), e.g. parent-offspring (0.5), identical twins (1), full siblings (0.5), half siblings (0.25)
- Unrelated subjects study: $K$ is estimated from genome-wide single-nucleotide polymorphism (SNP) data

GCTA heritability model
$$y = g + c + e, \quad g \sim N(0, \sigma_A^2 K), \quad c \sim N(0, \sigma_C^2 \Lambda), \quad e \sim N(0, \sigma_E^2 I)$$
What is a single-nucleotide polymorphism (SNP)?
- Each locus on a DNA sequence is a single nucleotide: adenine (A), thymine (T), cytosine (C), or guanine (G).
- SNP: a DNA sequence variation that occurs when a single nucleotide in the genome (or other shared sequence) differs between individuals or between the paired chromosomes of one subject, e.g. AAGCCTA and AAGCTTA.
- A SNP gives rise to alleles (variants of a given gene).
- Each SNP can have 3 genotypes: AA, Aa, aa (coded as 0-2).

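GCTA heritability model
$$y = g + c + e, \quad g \sim N(0, \sigma_A^2 K), \quad c \sim N(0, \sigma_C^2 \Lambda), \quad e \sim N(0, \sigma_E^2 I)$$
How to compute genetic similarity from SNPs:
- $X$: genotype matrix (#subjects × #SNPs)
- Standardize each column of $X$ (mean 0, variance 1)
- $K = \dfrac{X X^\top}{\#\text{SNPs}}$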
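
The standardize-then-multiply recipe can be sketched in a few lines. The genotypes below are simulated, and the function name, allele-frequency range, and handling of non-varying SNPs are assumptions made for illustration only.

```python
import numpy as np

def genetic_relationship_matrix(genotypes):
    """K = X X^T / #SNPs after standardizing each SNP column to mean 0, variance 1."""
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)
    sd = X.std(axis=0)
    keep = sd > 0  # drop SNPs that do not vary in this sample (assumed handling)
    X = X[:, keep] / sd[keep]
    return X @ X.T / X.shape[1]

# Simulated genotypes coded 0, 1, 2 for 50 subjects and 2000 SNPs.
rng = np.random.default_rng(0)
allele_freqs = rng.uniform(0.1, 0.9, size=2000)
genotypes = rng.binomial(2, allele_freqs, size=(50, 2000))

K = genetic_relationship_matrix(genotypes)
```

Because every column is standardized, the diagonal of K averages exactly 1, and for unrelated subjects the off-diagonal entries scatter around 0.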

GCTA heritability model
$$y = g + c + e, \quad g \sim N(0, \sigma_A^2 K), \quad c \sim N(0, \sigma_C^2 \Lambda), \quad e \sim N(0, \sigma_E^2 I)$$
$\Lambda$: shared environment matrix between the subjects
- Familial study: e.g. the entry is 1 for twins and non-twin siblings
- Unrelated subjects study: $\Lambda$ vanishes

GCTA heritability model
$$y = g + c + e, \quad g \sim N(0, \sigma_A^2 K), \quad c \sim N(0, \sigma_C^2 \Lambda), \quad e \sim N(0, \sigma_E^2 I)$$
$I$: identity matrix

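GCTA heritability model
$$y = g + c + e, \quad g \sim N(0, \sigma_A^2 K), \quad c \sim N(0, \sigma_C^2 \Lambda), \quad e \sim N(0, \sigma_E^2 I)$$
Heritability:
$$h^2 = \frac{\sigma_A^2}{\sigma_A^2 + \sigma_C^2 + \sigma_E^2}$$
$h^2$: the proportion of the variance in the trait explained by the variance in the additive genetic component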

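Multi-dimensional traits heritability model
$Y$: $N \times M$ trait matrix ($N$: number of subjects; $M$: number of dimensions)
$$Y = G + C + E, \quad \operatorname{vec}(G) \sim N(0, \Sigma_A \otimes K), \quad \operatorname{vec}(C) \sim N(0, \Sigma_C \otimes \Lambda), \quad \operatorname{vec}(E) \sim N(0, \Sigma_E \otimes I)$$
- $\Sigma_A = (\sigma_{A_{rs}})_{M \times M}$: $\sigma_{A_{rs}}$ is the genetic covariance between the r-th and s-th dimensions of the trait
- $\Sigma_C = (\sigma_{C_{rs}})_{M \times M}$: $\sigma_{C_{rs}}$ is the common environmental covariance between the r-th and s-th dimensions of the trait
- $\Sigma_E = (\sigma_{E_{rs}})_{M \times M}$: $\sigma_{E_{rs}}$ is the unique environmental covariance between the r-th and s-th dimensions of the trait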

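Multi-dimensional traits heritability model
$$Y = G + C + E, \quad \operatorname{vec}(G) \sim N(0, \Sigma_A \otimes K), \quad \operatorname{vec}(C) \sim N(0, \Sigma_C \otimes \Lambda), \quad \operatorname{vec}(E) \sim N(0, \Sigma_E \otimes I)$$
$\otimes$: Kronecker product
$$\Sigma_A \otimes K = \begin{pmatrix} \sigma_{A_{11}} K & \sigma_{A_{12}} K & \cdots & \sigma_{A_{1M}} K \\ \sigma_{A_{21}} K & \sigma_{A_{22}} K & \cdots & \sigma_{A_{2M}} K \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{A_{M1}} K & \sigma_{A_{M2}} K & \cdots & \sigma_{A_{MM}} K \end{pmatrix}$$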

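Multi-dimensional traits heritability model
$$Y = G + C + E, \quad \operatorname{vec}(G) \sim N(0, \Sigma_A \otimes K), \quad \operatorname{vec}(C) \sim N(0, \Sigma_C \otimes \Lambda), \quad \operatorname{vec}(E) \sim N(0, \Sigma_E \otimes I)$$
$\operatorname{vec}$ stacks the columns of a matrix into a single vector:
$$\operatorname{vec}\left([a_1, a_2, \dots, a_k]\right) = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_k \end{pmatrix}$$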
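
A small numerical sketch of the two pieces of notation, using made-up 2 × 2 and 3 × 3 matrices (all values are illustrative assumptions): the Kronecker product builds the block covariance of vec(G), and column stacking is what makes the identity vec(AXB) = (Bᵀ ⊗ A) vec(X) hold.

```python
import numpy as np

def vec(matrix):
    """Stack the columns of a matrix into one long vector (column-major order)."""
    return np.asarray(matrix).reshape(-1, order="F")

# Toy covariance between M = 2 trait dimensions and similarity among N = 3 subjects.
Sigma_A = np.array([[2.0, 0.5],
                    [0.5, 1.0]])
K = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

cov_vec_G = np.kron(Sigma_A, K)        # (M*N) x (M*N) covariance of vec(G)
N = K.shape[0]
block_12 = cov_vec_G[0:N, N:2 * N]     # block (1, 2), which equals sigma_A12 * K

# Numerical check of vec(A X B) = (B^T kron A) vec(X) on random matrices.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
X = rng.normal(size=(3, 2))
B = rng.normal(size=(2, 2))
lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
```

Note that NumPy flattens row by row by default, so `order="F"` is needed to match the column-stacking convention used here.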

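Multi-dimensional traits heritability model
$$Y = G + C + E, \quad \operatorname{vec}(G) \sim N(0, \Sigma_A \otimes K), \quad \operatorname{vec}(C) \sim N(0, \Sigma_C \otimes \Lambda), \quad \operatorname{vec}(E) \sim N(0, \Sigma_E \otimes I)$$
Heritability:
$$h^2 = \frac{\operatorname{tr}(\Sigma_A)}{\operatorname{tr}(\Sigma_A) + \operatorname{tr}(\Sigma_C) + \operatorname{tr}(\Sigma_E)} = \sum_{m=1}^{M} \gamma_m h_m^2$$
where
$$\gamma_m = \frac{\sigma_{A_{mm}} + \sigma_{C_{mm}} + \sigma_{E_{mm}}}{\sum_{p=1}^{M} \left( \sigma_{A_{pp}} + \sigma_{C_{pp}} + \sigma_{E_{pp}} \right)}, \qquad h_m^2 = \frac{\sigma_{A_{mm}}}{\sigma_{A_{mm}} + \sigma_{C_{mm}} + \sigma_{E_{mm}}}$$
The multi-dimensional trait heritability is a weighted average of the heritability of each dimension.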
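
The weighted-average reading can be verified on a tiny example. The variance components below are made-up numbers and the function name is hypothetical; only the diagonals of the three covariance matrices enter the calculation.

```python
import numpy as np

def multidim_heritability(Sigma_A, Sigma_C, Sigma_E):
    """Return (h2, weights gamma_m, per-dimension h2_m) for the trace-based definition."""
    a, c, e = (np.diag(np.asarray(S, dtype=float)) for S in (Sigma_A, Sigma_C, Sigma_E))
    total = a + c + e                 # phenotypic variance of each dimension
    h2_each = a / total               # heritability of each dimension
    weights = total / total.sum()     # share of total variance carried by each dimension
    h2 = a.sum() / total.sum()        # tr(Sigma_A) / tr(Sigma_A + Sigma_C + Sigma_E)
    return h2, weights, h2_each

Sigma_A = np.diag([2.0, 1.0])
Sigma_C = np.diag([1.0, 1.0])
Sigma_E = np.diag([1.0, 2.0])
h2, weights, h2_each = multidim_heritability(Sigma_A, Sigma_C, Sigma_E)
```

Here the two dimensions have heritabilities 0.5 and 0.25 with equal weights, so the multi-dimensional heritability is 0.375. Dimensions with larger total variance contribute more to the overall value.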

Multi-dimensional traits heritability model
Properties: invariant to rotations of the data
$$Y = G + C + E \quad (1)$$
$$YT = GT + CT + ET \quad (2)$$
If $T^\top T = T T^\top = I$, then
$$h_T^2 = h^2$$
($h_T^2$: heritability from model (2); $h^2$: heritability from model (1))
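
A quick numerical check of the rotation invariance, assuming randomly generated positive-definite covariance matrices and a random orthogonal matrix T obtained from a QR decomposition (all of these are illustrative choices, not data from the paper):

```python
import numpy as np

def trace_heritability(Sigma_A, Sigma_C, Sigma_E):
    """h2 = tr(Sigma_A) / (tr(Sigma_A) + tr(Sigma_C) + tr(Sigma_E))."""
    tA, tC, tE = np.trace(Sigma_A), np.trace(Sigma_C), np.trace(Sigma_E)
    return tA / (tA + tC + tE)

def random_spd(rng, m):
    """Random symmetric positive-definite m x m matrix (toy covariance)."""
    R = rng.normal(size=(m, m))
    return R @ R.T + m * np.eye(m)

rng = np.random.default_rng(1)
M = 4
Sigma_A, Sigma_C, Sigma_E = (random_spd(rng, M) for _ in range(3))

# Random orthogonal matrix: the Q factor of a QR decomposition.
T, _ = np.linalg.qr(rng.normal(size=(M, M)))

h2 = trace_heritability(Sigma_A, Sigma_C, Sigma_E)
# Rotating Y by T turns each covariance Sigma into T^T Sigma T.
h2_rotated = trace_heritability(T.T @ Sigma_A @ T, T.T @ Sigma_C @ T, T.T @ Sigma_E @ T)
```

The two values agree up to floating-point error because the trace is unchanged by an orthogonal change of basis.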

Consider covariates
Sometimes we want to study the effects after controlling for nuisance variables by regressing them out, e.g. age, gender, handedness.

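Consider covariates
$X$: covariates ($N \times q$)
$$Y = XB + G + C + E, \quad \operatorname{vec}(G) \sim N(0, \Sigma_A \otimes K), \quad \operatorname{vec}(C) \sim N(0, \Sigma_C \otimes \Lambda), \quad \operatorname{vec}(E) \sim N(0, \Sigma_E \otimes I)$$
Choose $U$ of size $N \times (N - q)$ such that
$$U^\top X = 0, \qquad U^\top U = I, \qquad U U^\top = I - X (X^\top X)^{-1} X^\top$$
Then
$$\tilde{Y} = U^\top Y = U^\top G + U^\top C + U^\top E = \tilde{G} + \tilde{C} + \tilde{E}$$
$$\operatorname{vec}(\tilde{G}) \sim N(0, \Sigma_A \otimes U^\top K U), \quad \operatorname{vec}(\tilde{C}) \sim N(0, \Sigma_C \otimes U^\top \Lambda U), \quad \operatorname{vec}(\tilde{E}) \sim N(0, \Sigma_E \otimes I)$$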
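
One way to obtain a matrix U with these three properties is to take the trailing columns of a complete QR decomposition of X. This is a sketch under the assumption that X has full column rank; the slides do not say how U is constructed, and the covariates below are made up.

```python
import numpy as np

def covariate_projection(X):
    """Return U (N x (N - q)) whose columns span the orthogonal complement of col(X)."""
    X = np.asarray(X, dtype=float)
    q = X.shape[1]
    Q, _ = np.linalg.qr(X, mode="complete")  # Q is an N x N orthogonal matrix
    return Q[:, q:]                          # the first q columns span col(X)

rng = np.random.default_rng(2)
N = 8
# Toy covariates: intercept, a continuous variable, and an alternating binary variable.
X = np.column_stack([np.ones(N), rng.normal(size=N), np.arange(N) % 2])
U = covariate_projection(X)

residual_maker = np.eye(N) - X @ np.linalg.inv(X.T @ X) @ X.T
Y = rng.normal(size=(N, 3))
Y_tilde = U.T @ Y                            # covariate-free traits, shape (N - q, M)
```

The transformed similarity matrices would then be `U.T @ K @ U` and `U.T @ Lambda @ U`, matching the model for the projected traits.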

Analysis
Datasets:
- Genomics Superstruct Project (GSP; N = 1320): unrelated subjects
- Human Connectome Project (HCP; N = 590)
  - 72 monozygotic twin pairs
  - 69 dizygotic twin pairs
  - 253 full siblings of twins
  - 55 singletons
12 brain structures
Traits:
- Volume
- Truncated LBS

Volume heritability (GSP data)
- Before multiple comparisons correction: 3/12 brain structures are significant
- After multiple comparisons correction: none is significant
- Most structures: parametric and nonparametric p values are similar => standard error estimates are accurate

Volume heritability (GSP data)
Test-retest reliability: Lin's concordance correlation coefficient
$$\rho_c = \frac{2 \rho\, \sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2}$$
- $\rho$: correlation coefficient between $x$ and $y$; $\sigma^2$: variance; $\mu$: mean
- $x$, $y$: repeated runs on separate days of the same set of subjects
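
Lin's concordance correlation coefficient is straightforward to compute from two repeated measurements. The sketch below uses population (divide-by-n) moments and made-up measurements; the function name is an assumption.

```python
import numpy as np

def lins_ccc(x, y):
    """Lin's concordance correlation coefficient between two repeated measurements."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    # rho * sigma_x * sigma_y is the covariance between x and y.
    cov_xy = ((x - mean_x) * (y - mean_y)).mean()
    return 2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)

session_1 = np.array([1.0, 2.0, 3.0, 4.0])
perfect = lins_ccc(session_1, session_1)        # identical sessions
shifted = lins_ccc(session_1, session_1 + 1.0)  # perfectly correlated but biased
```

Unlike the Pearson correlation, the coefficient drops below 1 when the second session is systematically shifted, which is why it suits test-retest reliability.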

Truncated LBS heritability (GSP data)
- Before multiple comparisons correction: 7/12 brain structures are significant
- After multiple comparisons correction: 5/12 brain structures are significant
- Most structures: parametric and nonparametric p values are similar => standard error estimates are accurate
- Smaller standard errors than volume-based heritability

Truncated LBS heritability (GSP data)
Test-retest reliability: Lin's concordance correlation coefficient, averaged across the M dimensions
$$\rho_c = \frac{2 \rho\, \sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2}$$
- $\rho$: correlation coefficient between $x$ and $y$; $\sigma^2$: variance; $\mu$: mean
- $x$, $y$: repeated runs on separate days of the same set of subjects

Truncated LBS heritability (GSP data)

Truncated LBS heritability (HCP data)

Structure          h^2      Standard error
Accumbens area     0.309    0.16
Caudate            0.583    0.14
Cerebellum         0.653    0.10
Corpus Callosum    0.558    0.136
Hippocampus        0.363    0.190
Third Ventricle    0.536    0.134
Putamen            0.483    0.1

- Only results for significant brain structures are shown
- Consistently higher than in the GSP dataset
- Possible reason: in unrelated subjects, only the variation tagged by common SNPs is captured

Visualizing principal mode of shape variation
- PCA is a kind of rotation of the data.
- The first PC of the LBS explains a large percentage of shape variation.
- Heritability model: (1) invariant to rotation; (2) heritability of a multi-dimensional trait = weighted average of each dimension's heritability.
- Hence the heritability of the truncated LBS is the weighted average of the heritabilities of the first M PCs.

Visualizing principal mode of shape variation
Procedure (for one brain structure):
1. Register each subject's mask (1 inside the structure, 0 outside) to a commonly used template.
2. Create a population average of the structure surface for plotting:
   - Compute a weighted average of all subjects' registered mask images
   - Weight: Gaussian kernel
     - Center: average of the first PC
     - Distance: between the subject-specific first PC and the center
     - Width: chosen so that 500 shapes have non-zero weights
   - Take the isosurface at 0.5 in the averaged map
3. Using the same Gaussian kernel, generate averaged maps from the shapes around +2 standard deviations of the first PC (and likewise around -2 s.d.).
4. Plot the difference between the two maps from step 3 on the surface generated in step 2.

Visualizing principal mode of shape variation
- Red: shapes around +2 s.d. are larger than those around -2 s.d.
- Blue: shapes around -2 s.d. are larger than those around +2 s.d.

Strengths
- Uses the truncated LBS instead of volume as features
  - Captures more shape variation
  - Isometry invariance
  - Does not require any registration or mapping (Reuter 2006 & 2009)
- Generalizes the concept of heritability to multi-dimensional phenotypes
  - Other applications (multiple tests of one behavior; disease studies)

Strengths
- Lower variability of heritability estimates
  - Multi-dimensional trait heritability model < original GCTA model (unrelated subject dataset)
  - Heritability estimates are more accurate and more significant
- Proposes a visualization method for shape variation
  - Interpretation: shape variation along the first PC axis of the shape descriptor

Weakness
- The optimal number of eigenvalues may not be 50
  - Only 30, 50, and 70 are tested
  - Error bars for the different numbers of eigenvalues are not shown
  - A number other than 50 (used in the paper) could lead to higher heritability and smaller error bars

Weakness
- The optimal number of eigenvalues can differ between brain structures
  - Amygdala: heritability is similar for 30, 50, and 70 eigenvalues (it even decreases)
  - Third ventricle: heritability increases from 0.4 to 0.6

Weakness
- The link between the proposed visualization method and LBS heritability is not clear.
- Only volume-based GCTA heritability is compared with the new method and the new model.
  - More comparisons with the literature are needed (e.g., Gilmore 2010; Baare 2001)

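Backup: invariant to rotations of data
$$\operatorname{cov}[\operatorname{vec}(GT)] = \operatorname{cov}[(T^\top \otimes I) \operatorname{vec}(G)] = (T^\top \otimes I)\, \operatorname{cov}[\operatorname{vec}(G)]\, (T \otimes I) = (T^\top \otimes I)(\Sigma_A \otimes K)(T \otimes I) = (T^\top \Sigma_A T) \otimes K$$
Identities used:
- Theorem: $\operatorname{vec}(AXB) = (B^\top \otimes A) \operatorname{vec}(X)$; here $A = I$, $X = G$, $B = T$
- $(A \otimes B)^\top = A^\top \otimes B^\top$
- $\operatorname{cov}(AX) = A \operatorname{cov}(X) A^\top$
- $(A \otimes B)(C \otimes D) = AC \otimes BD$
Similarly, $\operatorname{cov}[\operatorname{vec}(CT)] = (T^\top \Sigma_C T) \otimes \Lambda$ and $\operatorname{cov}[\operatorname{vec}(ET)] = (T^\top \Sigma_E T) \otimes I$. Therefore
$$h_T^2 = \frac{\operatorname{tr}(T^\top \Sigma_A T)}{\operatorname{tr}(T^\top \Sigma_A T) + \operatorname{tr}(T^\top \Sigma_C T) + \operatorname{tr}(T^\top \Sigma_E T)} = \frac{\operatorname{tr}(\Sigma_A T T^\top)}{\operatorname{tr}(\Sigma_A T T^\top) + \operatorname{tr}(\Sigma_C T T^\top) + \operatorname{tr}(\Sigma_E T T^\top)} = \frac{\operatorname{tr}(\Sigma_A)}{\operatorname{tr}(\Sigma_A) + \operatorname{tr}(\Sigma_C) + \operatorname{tr}(\Sigma_E)} = h^2$$
using the cyclic property of the trace, $\operatorname{tr}(ABC) = \operatorname{tr}(BCA) = \operatorname{tr}(CAB)$, and $T T^\top = I$.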

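Backup: multi-dimensional trait heritability is a weighted average of the heritability of each dimension
$$h^2 = \frac{\operatorname{tr}(\Sigma_A)}{\operatorname{tr}(\Sigma_A + \Sigma_C + \Sigma_E)} = \frac{\sum_{m=1}^{M} \sigma_{A_{mm}}}{\sum_{p=1}^{M} \sigma_{A_{pp}} + \sum_{p=1}^{M} \sigma_{C_{pp}} + \sum_{p=1}^{M} \sigma_{E_{pp}}}$$
$$= \sum_{m=1}^{M} \frac{\sigma_{A_{mm}} + \sigma_{C_{mm}} + \sigma_{E_{mm}}}{\sum_{p=1}^{M} \left( \sigma_{A_{pp}} + \sigma_{C_{pp}} + \sigma_{E_{pp}} \right)} \cdot \frac{\sigma_{A_{mm}}}{\sigma_{A_{mm}} + \sigma_{C_{mm}} + \sigma_{E_{mm}}} = \sum_{m=1}^{M} \gamma_m h_m^2$$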

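Backup: moment-matching estimator for unrelated subjects (no shared environmental component)
$$\operatorname{cov}(y_r, y_s) = \sigma_{A_{rs}} K + \sigma_{E_{rs}} I \quad \Rightarrow \quad y_r y_s^\top \approx \sigma_{A_{rs}} K + \sigma_{E_{rs}} I$$
To estimate $\sigma_{A_{rs}}$ and $\sigma_{E_{rs}}$, use a regression model:
$$\operatorname{vec}(y_r y_s^\top) = y_s \otimes y_r = \sigma_{A_{rs}} \operatorname{vec}(K) + \sigma_{E_{rs}} \operatorname{vec}(I)$$
Normal equations:
$$\operatorname{vec}(K)^\top (y_s \otimes y_r) = \sigma_{A_{rs}} \operatorname{vec}(K)^\top \operatorname{vec}(K) + \sigma_{E_{rs}} \operatorname{vec}(K)^\top \operatorname{vec}(I)$$
$$\operatorname{vec}(I)^\top (y_s \otimes y_r) = \sigma_{A_{rs}} \operatorname{vec}(I)^\top \operatorname{vec}(K) + \sigma_{E_{rs}} \operatorname{vec}(I)^\top \operatorname{vec}(I)$$
Transposing the left-hand sides gives $(y_s^\top \otimes y_r^\top) \operatorname{vec}(K) = y_r^\top K y_s$ and $(y_s^\top \otimes y_r^\top) \operatorname{vec}(I) = y_r^\top y_s$, so
$$y_r^\top K y_s = \sigma_{A_{rs}} \operatorname{tr}(K^2) + \sigma_{E_{rs}} \operatorname{tr}(K)$$
$$y_r^\top y_s = \sigma_{A_{rs}} \operatorname{tr}(K) + \sigma_{E_{rs}} \operatorname{tr}(I)$$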

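Backup: moment-matching estimator for unrelated subjects (no shared environmental component)
$$\begin{pmatrix} \hat{\sigma}_{A_{rs}} \\ \hat{\sigma}_{E_{rs}} \end{pmatrix} = \begin{pmatrix} \operatorname{tr}(K^2) & \operatorname{tr}(K) \\ \operatorname{tr}(K) & \operatorname{tr}(I) \end{pmatrix}^{-1} \begin{pmatrix} y_r^\top K y_s \\ y_r^\top y_s \end{pmatrix}$$
$$\hat{\sigma}_{A_{rs}} = \frac{y_r^\top \left( N K - \operatorname{tr}(K) I \right) y_s}{N \operatorname{tr}(K^2) - \operatorname{tr}^2(K)} = \frac{y_r^\top (K - \tau I)\, y_s}{\nu_K}$$
$$\hat{\sigma}_{E_{rs}} = \frac{y_r^\top \left( \operatorname{tr}(K^2) I - \operatorname{tr}(K) K \right) y_s}{N \operatorname{tr}(K^2) - \operatorname{tr}^2(K)} = \frac{y_r^\top (\kappa I - \tau K)\, y_s}{\nu_K}$$
where
$$\tau = \frac{\operatorname{tr}(K)}{N}, \qquad \kappa = \frac{\operatorname{tr}(K^2)}{N}, \qquad \nu_K = \operatorname{tr}(K^2) - \frac{\operatorname{tr}^2(K)}{N} = N(\kappa - \tau^2)$$
In matrix form:
$$\hat{\Sigma}_A = \frac{Y^\top (K - \tau I)\, Y}{\nu_K}, \qquad \hat{\Sigma}_E = \frac{Y^\top (\kappa I - \tau K)\, Y}{\nu_K}$$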
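
The closed-form estimator translates directly into matrix code. The sketch below is an illustration on random toy data, not the authors' implementation; the function name is hypothetical, and the similarity matrix K is built from random standardized columns rather than real genotypes.

```python
import numpy as np

def moment_matching(Y, K):
    """Moment-matching estimates of Sigma_A, Sigma_E and the trace-based heritability."""
    Y = np.asarray(Y, dtype=float)
    K = np.asarray(K, dtype=float)
    N = K.shape[0]
    tau = np.trace(K) / N
    kappa = np.trace(K @ K) / N
    nu_K = N * (kappa - tau ** 2)
    I = np.eye(N)
    Sigma_A = Y.T @ (K - tau * I) @ Y / nu_K
    Sigma_E = Y.T @ (kappa * I - tau * K) @ Y / nu_K
    h2 = np.trace(Sigma_A) / (np.trace(Sigma_A) + np.trace(Sigma_E))
    return Sigma_A, Sigma_E, h2

# Toy inputs: a similarity matrix from random standardized columns and random traits.
rng = np.random.default_rng(3)
N, P, M = 40, 300, 3
Z = rng.normal(size=(N, P))
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
K = Z @ Z.T / P
Y = rng.normal(size=(N, M))

Sigma_A, Sigma_E, h2 = moment_matching(Y, K)
```

With pure-noise traits like these, the estimate is not constrained to lie between 0 and 1; the example only illustrates the algebra. The estimates satisfy the two moment equations exactly for every pair of dimensions.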

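Backup: sampling variance of the point estimator
Define
$$Q_A \equiv \frac{K - \tau I}{\nu_K}, \qquad Q_E \equiv \frac{\kappa I - \tau K}{\nu_K}$$
$$t_A \equiv \operatorname{tr}(\hat{\Sigma}_A) = \operatorname{tr}(Y^\top Q_A Y), \qquad t_E \equiv \operatorname{tr}(\hat{\Sigma}_E) = \operatorname{tr}(Y^\top Q_E Y), \qquad t = \begin{pmatrix} t_A \\ t_E \end{pmatrix}$$
The heritability is a function of $t$:
$$f(t) = \frac{t_A}{t_A + t_E}$$
To first order (delta method),
$$\operatorname{var}(\hat{h}^2_{SNP}) = \operatorname{var}[f(t)] \approx \frac{\partial f(t)}{\partial t}\, \operatorname{cov}(t)\, \left( \frac{\partial f(t)}{\partial t} \right)^\top, \qquad \text{where } \frac{\partial f(t)}{\partial t} = \left( \frac{t_E}{(t_A + t_E)^2},\ -\frac{t_A}{(t_A + t_E)^2} \right)$$
Define $V_{rs} = \operatorname{cov}(y_r, y_s) = \sigma_{A_{rs}} K + \sigma_{E_{rs}} I$.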

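Backup: sampling variance of the point estimator
For $\alpha, \beta \in \{A, E\}$:
$$\operatorname{cov}\left[ \operatorname{tr}(Y^\top Q_\alpha Y),\ \operatorname{tr}(Y^\top Q_\beta Y) \right] = \sum_{r,s=1}^{M} \operatorname{cov}\left( y_r^\top Q_\alpha y_r,\ y_s^\top Q_\beta y_s \right) = 2 \sum_{r,s=1}^{M} \operatorname{tr}(Q_\alpha V_{rs} Q_\beta V_{rs})$$
This uses the covariance of quadratic forms: $\operatorname{cov}(\varepsilon^\top \Lambda_1 \varepsilon,\ \varepsilon^\top \Lambda_2 \varepsilon) = 2 \operatorname{tr}(\Lambda_1 \Sigma \Lambda_2 \Sigma) + 4 \mu^\top \Lambda_1 \Sigma \Lambda_2 \mu$; here $\mu = 0$.
$$\operatorname{cov}(t) = 2 \sum_{r,s=1}^{M} \begin{pmatrix} \operatorname{tr}(Q_A V_{rs} Q_A V_{rs}) & \operatorname{tr}(Q_A V_{rs} Q_E V_{rs}) \\ \operatorname{tr}(Q_E V_{rs} Q_A V_{rs}) & \operatorname{tr}(Q_E V_{rs} Q_E V_{rs}) \end{pmatrix} \approx 2 \sum_{r,s=1}^{M} (\sigma_{A_{rs}} + \sigma_{E_{rs}})^2 \begin{pmatrix} \operatorname{tr}(Q_A^2) & \operatorname{tr}(Q_A Q_E) \\ \operatorname{tr}(Q_E Q_A) & \operatorname{tr}(Q_E^2) \end{pmatrix}$$
$$= 2 \operatorname{tr}\left[ (\Sigma_A + \Sigma_E)^2 \right] \frac{1}{\nu_K} \begin{pmatrix} 1 & -\tau \\ -\tau & \kappa \end{pmatrix} \approx 2 \operatorname{tr}\left[ (\Sigma_A + \Sigma_E)^2 \right] \frac{1}{\nu_K} \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}$$
Approximations used (unrelated subjects): $K \approx I$, so $\tau \approx 1$, $\kappa \approx 1$, and $V_{rs} = \sigma_{A_{rs}} K + \sigma_{E_{rs}} I \approx \sigma_{A_{rs}} I + \sigma_{E_{rs}} I$.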

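Backup: sampling variance of the point estimator
$$\operatorname{tr}(Q_A^2) = \frac{\operatorname{tr}\left[ (K - \tau I)^2 \right]}{\nu_K^2} = \frac{\operatorname{tr}(K^2) - 2 \frac{\operatorname{tr}(K)}{N} \operatorname{tr}(K) + \frac{\operatorname{tr}^2(K)}{N^2} N}{\nu_K^2} = \frac{\operatorname{tr}(K^2) - \frac{\operatorname{tr}^2(K)}{N}}{\nu_K^2} = \frac{1}{\nu_K}$$
$$\operatorname{tr}(Q_A Q_E) = \frac{\operatorname{tr}\left[ (K - \tau I)(\kappa I - \tau K) \right]}{\nu_K^2} = \frac{\operatorname{tr}\left( \kappa K - \tau K^2 - \tau \kappa I + \tau^2 K \right)}{\nu_K^2} = \frac{-\frac{\operatorname{tr}(K) \operatorname{tr}(K^2)}{N} + \frac{\operatorname{tr}^3(K)}{N^2}}{\nu_K^2} = -\frac{\operatorname{tr}(K)}{N} \cdot \frac{\operatorname{tr}(K^2) - \frac{\operatorname{tr}^2(K)}{N}}{\nu_K^2} = -\frac{\tau}{\nu_K}$$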

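Backup: sampling variance of the point estimator
$$\operatorname{tr}(Q_E^2) = \frac{\operatorname{tr}\left[ (\kappa I - \tau K)^2 \right]}{\nu_K^2} = \frac{\operatorname{tr}\left( \kappa^2 I - 2 \kappa \tau K + \tau^2 K^2 \right)}{\nu_K^2} = \frac{\kappa \left[ \operatorname{tr}(K^2) - 2 \frac{\operatorname{tr}^2(K)}{N} + \frac{\operatorname{tr}^2(K)}{N} \right]}{\nu_K^2} = \frac{\kappa}{\nu_K}$$
$$\operatorname{var}(\hat{h}^2_{SNP}) = \operatorname{var}[f(t)] \approx \frac{\partial f(t)}{\partial t}\, \operatorname{cov}(t)\, \left( \frac{\partial f(t)}{\partial t} \right)^\top = \frac{2 \operatorname{tr}\left[ (\Sigma_A + \Sigma_E)^2 \right]}{\nu_K (t_A + t_E)^4} \begin{pmatrix} t_E & -t_A \end{pmatrix} \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} t_E \\ -t_A \end{pmatrix}$$
$$= \frac{2 \operatorname{tr}\left[ (\Sigma_A + \Sigma_E)^2 \right]}{\nu_K} \cdot \frac{(t_A + t_E)^2}{(t_A + t_E)^4} = \frac{2 \operatorname{tr}\left[ (\Sigma_A + \Sigma_E)^2 \right]}{\nu_K \left[ \operatorname{tr}(\Sigma_A) + \operatorname{tr}(\Sigma_E) \right]^2} = \frac{2}{\nu_K} \cdot \frac{\operatorname{tr}(\Sigma_P^2)}{\operatorname{tr}^2(\Sigma_P)}$$
where $\Sigma_P = \Sigma_A + \Sigma_E$.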

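Backup: sampling variance of the point estimator
For a univariate trait,
$$\frac{\operatorname{tr}(\Sigma_P^2)}{\operatorname{tr}^2(\Sigma_P)} = 1, \qquad \operatorname{var}(\hat{h}^2_{SNP}) = \frac{2}{\nu_K}$$
For a multi-dimensional trait, with $\lambda_i$ the eigenvalues of $\Sigma_P$,
$$\frac{\operatorname{tr}(\Sigma_P^2)}{\operatorname{tr}^2(\Sigma_P)} = \frac{\sum_{i=1}^{M} \lambda_i^2}{\left( \sum_{i=1}^{M} \lambda_i \right)^2} \le 1, \qquad \operatorname{var}(\hat{h}^2_{SNP}) \le \frac{2}{\nu_K}$$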
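
The final variance expression can be evaluated directly. The sketch below plugs in made-up covariance matrices and a toy similarity matrix (all values are assumptions) to show how a multi-dimensional trait shrinks the approximate sampling variance relative to the univariate bound 2/ν_K.

```python
import numpy as np

def heritability_sampling_variance(Sigma_A, Sigma_E, K):
    """Approximate var(h2_SNP) = (2 / nu_K) * tr(Sigma_P^2) / tr(Sigma_P)^2."""
    Sigma_P = np.atleast_2d(Sigma_A) + np.atleast_2d(Sigma_E)
    K = np.asarray(K, dtype=float)
    N = K.shape[0]
    nu_K = np.trace(K @ K) - np.trace(K) ** 2 / N
    ratio = np.trace(Sigma_P @ Sigma_P) / np.trace(Sigma_P) ** 2
    return 2.0 * ratio / nu_K

# Toy similarity matrix: 4 subjects with 0.1 similarity between every pair.
K = np.eye(4) + 0.1 * (np.ones((4, 4)) - np.eye(4))
nu_K = np.trace(K @ K) - np.trace(K) ** 2 / 4

var_univariate = heritability_sampling_variance(0.4, 0.6, K)
# Three uncorrelated dimensions with equal total variance.
var_multidim = heritability_sampling_variance(0.4 * np.eye(3), 0.6 * np.eye(3), K)
```

With three equally weighted, uncorrelated dimensions the ratio of traces is 1/3, so the approximate variance is one third of the univariate value. Strongly correlated dimensions would bring the ratio back toward 1.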