Lecture 2: Genetic Association Testing with Quantitative Traits. Summer Institute in Statistical Genetics 2017

Similar documents
Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5

Proportional Variance Explained by QLT and Statistical Power. Proportional Variance Explained by QTL and Statistical Power

Partitioning the Genetic Variance

(Genome-wide) association analysis

Lecture WS Evolutionary Genetics Part I 1

1 Springer. Nan M. Laird Christoph Lange. The Fundamentals of Modern Statistical Genetics

Lecture 6: Introduction to Quantitative genetics. Bruce Walsh lecture notes Liege May 2011 course version 25 May 2011

Linear Regression (1/1/17)

Partitioning the Genetic Variance. Partitioning the Genetic Variance

MODEL-FREE LINKAGE AND ASSOCIATION MAPPING OF COMPLEX TRAITS USING QUANTITATIVE ENDOPHENOTYPES

Lecture 1: Case-Control Association Testing. Summer Institute in Statistical Genetics 2015

Case-Control Association Testing. Case-Control Association Testing

The Quantitative TDT

Variance Components: Phenotypic, Environmental and Genetic

SNP Association Studies with Case-Parent Trios

Quantitative Genetics I: Traits controlled my many loci. Quantitative Genetics: Traits controlled my many loci

I Have the Power in QTL linkage: single and multilocus analysis

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Evolution of phenotypic traits

Association studies and regression

Partitioning Genetic Variance

Multiple regression. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar

Nature Genetics: doi: /ng Supplementary Figure 1. Number of cases and proxy cases required to detect association at designs.

Friday Harbor From Genetics to GWAS (Genome-wide Association Study) Sept David Fardo

Heritability estimation in modern genetics and connections to some new results for quadratic forms in statistics

An introduction to quantitative genetics

Lecture 11: Multiple trait models for QTL analysis

Quantitative Trait Variation

Lecture 2: Introduction to Quantitative Genetics

The concept of breeding value. Gene251/351 Lecture 5

Computational Systems Biology: Biology X

UNIT 8 BIOLOGY: Meiosis and Heredity Page 148

Variation and its response to selection

Calculation of IBD probabilities

Lecture 2. Basic Population and Quantitative Genetics

Lecture 3. Introduction on Quantitative Genetics: I. Fisher s Variance Decomposition

Introduction to Quantitative Genetics. Introduction to Quantitative Genetics

DNA polymorphisms such as SNP and familial effects (additive genetic, common environment) to

HERITABILITY ESTIMATION USING A REGULARIZED REGRESSION APPROACH (HERRA)

Lecture 4: Allelic Effects and Genetic Variances. Bruce Walsh lecture notes Tucson Winter Institute 7-9 Jan 2013

Evolutionary Genetics Midterm 2008

Quantitative characters II: heritability

Biometrical Genetics

Calculation of IBD probabilities

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

The Mystery of Missing Heritability: Genetic interactions create phantom heritability - Supplementary Information

Resemblance among relatives

Quantitative Genetics

Overview. Background

Régression en grande dimension et épistasie par blocs pour les études d association

BTRY 4830/6830: Quantitative Genomics and Genetics Fall 2014

Lecture 9. Short-Term Selection Response: Breeder s equation. Bruce Walsh lecture notes Synbreed course version 3 July 2013

SNP-SNP Interactions in Case-Parent Trios

Lecture 28: BLUP and Genomic Selection. Bruce Walsh lecture notes Synbreed course version 11 July 2013

Genotype Imputation. Biostatistics 666

BTRY 7210: Topics in Quantitative Genomics and Genetics

Introduction to Linkage Disequilibrium

Lecture 2. Fisher s Variance Decomposition

Supplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control

Breeding Values and Inbreeding. Breeding Values and Inbreeding

2. Map genetic distance between markers

Genetics and Natural Selection

heritable diversity feb ! gene 8840 biol 8990

Population Genetics. with implications for Linkage Disequilibrium. Chiara Sabatti, Human Genetics 6357a Gonda

Lecture 9. QTL Mapping 2: Outbred Populations

Causal inference in biomedical sciences: causal models involving genotypes. Mendelian randomization genes as Instrumental Variables

INTRODUCTION TO ANIMAL BREEDING. Lecture Nr 2. Genetics of quantitative (multifactorial) traits What is known about such traits How they are modeled

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

On the limiting distribution of the likelihood ratio test in nucleotide mapping of complex disease

1. they are influenced by many genetic loci. 2. they exhibit variation due to both genetic and environmental effects.

Methods for Cryptic Structure. Methods for Cryptic Structure

Prediction of the Confidence Interval of Quantitative Trait Loci Location

Objectives. Announcements. Comparison of mitosis and meiosis

Legend: S spotted Genotypes: P1 SS & ss F1 Ss ss plain F2 (with ratio) 1SS :2 WSs: 1ss. Legend W white White bull 1 Ww red cows ww ww red

Expression QTLs and Mapping of Complex Trait Loci. Paul Schliekelman Statistics Department University of Georgia

Quantitative Traits Modes of Selection

Lecture 6: Selection on Multiple Traits

Variance Component Models for Quantitative Traits. Biostatistics 666

EM algorithm. Rather than jumping into the details of the particular EM algorithm, we ll look at a simpler example to get the idea of how it works

Biometrical Genetics. Lindon Eaves, VIPBG Richmond. Boulder CO, 2012

On coding genotypes for genetic markers with multiple alleles in genetic association study of quantitative traits

Linkage and Linkage Disequilibrium

Power and sample size calculations for designing rare variant sequencing association studies.

Denisova cave. h,ps://

Solutions to Problem Set 4

Population Genetics I. Bio

The E-M Algorithm in Genetics. Biostatistics 666 Lecture 8

Evolution of quantitative traits

... x. Variance NORMAL DISTRIBUTIONS OF PHENOTYPES. Mice. Fruit Flies CHARACTERIZING A NORMAL DISTRIBUTION MEAN VARIANCE

Theoretical and computational aspects of association tests: application in case-control genome-wide association studies.

Chapter 2: Extensions to Mendel: Complexities in Relating Genotype to Phenotype.

Computational Systems Biology: Biology X

Bayesian Inference of Interactions and Associations

What kind of cell does it occur in? Produces diploid or haploid cells? How many cell divisions? Identical cells or different cells?

BIOL Evolution. Lecture 9

Introduction to Statistical Genetics (BST227) Lecture 6: Population Substructure in Association Studies

Natural Selection. Population Dynamics. The Origins of Genetic Variation. The Origins of Genetic Variation. Intergenerational Mutation Rate

Dropping Your Genes. A Simulation of Meiosis and Fertilization and An Introduction to Probability

BTRY 4830/6830: Quantitative Genomics and Genetics

Statistical issues in QTL mapping in mice

Transcription:

Lecture 2: Genetic Association Testing with Quantitative Traits Instructors: Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2017 1 / 29

Introduction to Quantitative Trait Mapping In the previous session, we gave an overview of association testing methods when the trait of interest is binary (e.g. 1/0, affected/unaffected, dead/alive), Phenotypes of interest are often quantitative, and in this session we focus on the topic of genetic association testing with quantitative traits. The field of quantitative genetics is the study of the inheritance of continuously measured traits and their mechanisms. Vast amounts of literature on this topic! 2 / 29

Introduction to Quantitative Trait Mapping Quantitative trait loci (QTL) mapping involves identifying genetic loci that influence the phenotypic variation of a quantitative trait. QTL mapping is commonly conducted with GWAS using common variants, such as variants with minor allele frequencies 1% 5% There generally is no simple Mendelian basis for variation of quantitative traits Some quantitative traits can be largely influenced by a single gene as well as by environmental factors 3 / 29

Introduction to Quantitative Trait Mapping Influences on a quantitative trait can also be due to a number of genes with similar (or differing) effects Many quantitative traits of interest are complex where phenotypic variation is due to a combination of both multiple genes and environmental factors Examples: Blood pressure, cholesterol levels, IQ, height, weight, etc. 4 / 29

Quantitative Genetic Model The classical quantitative genetics model introduced by Ronald Fisher (1918) is Y = G + E, where Y is the phenotypic value, G is the genetic value, and E is the environmental deviation. G is the combination of all genetic loci that influence the phenotypic value and E consists of all non-genetic factors that influence the phenotype The mean environmental deviation E is generally taken to be 0 so that the mean genotypic value is equal to the mean phenotypic value, i.e., E(Y ) = E(G) 5 / 29

Quantitative Genetic Model Consider a single locus. Fisher modeled the genotypic value G with a linear regression model (least squares) where the genotypic value can be partitioned into an additive component (A) and deviations from additivity as a result of dominance (D), where G = A + D 6 / 29

Linear Regression Model for Genetic Values Falconer model for single biallelic QTL a d m -a bb Bb BB Var (X) = Regression Variance + Residual Variance = Additive Variance + Dominance Variance 15 7 / 29

Components of Genetic Variance From the properties of least squares, the residuals are orthogonal to the fitted values, and thus Cov(A, D) = 0. So we have that or σ 2 A Var(G) = Var(A) + Var(D) σ 2 G = σ2 A + σ2 D is the additive genetic variance. It is the genetic variance associated with the average additive effects of alleles is the dominance genetic variance. It is the genetic variance associated with the dominance effects. σ 2 D 8 / 29

Heritability The heritability of a trait is written in terms of the components of variances of the trait. Remember that Y = G + E = A + D + E The following ratio of variance components h 2 = σ2 A σ 2 Y is defined to be the narrow-sense heritability (or simply heritability) h 2 is the proportion of the total phenotypic variance that is due to additive effects. Heritability can also be viewed as the extent to which phenotypes are determined by the alleles transmitted from the parents. 9 / 29

Heritability The broad-sense heritability is defined to be H 2 = σ2 G σ 2 Y H 2 is the proportion of the total phenotypic variance that is due to all genetic effects (additive and dominance) There are a number of methods for heritability estimation of a trait. 10 / 29

QTL Mapping For traits that are heritable, i.e., traits with a non-negligible genetic component that contributes to phenotypic variability, identifying (or mapping) QLT that influence the trait is often of interest. Linear regression models are commonly used for QTL mapping Linear regression models will often include a single genetic marker (e.g., a SNP) as predictor in the model, in addition to other relevant covariates (such as age, sex, etc.), with the quantitative phenotype as the response 11 / 29

Linear regression with SNPs Many analyses fit the additive model y = β 0 + β #minor alleles AA Aa aa cholesterol β β 0 1 2 12 / 29

Linear regression, with SNPs An alternative is the dominant model ; y = β 0 + β (G AA) AA Aa aa cholesterol β 0 1 1 13 / 29

Linear regression, with SNPs or the recessive model ; y = β 0 + β (G == aa) AA Aa aa cholesterol β 0 0 1 14 / 29

Linear regression, with SNPs Finally, the two degrees of freedom model ; y = β 0 + β Aa (G == Aa) + β aa (G == aa) AA Aa aa cholesterol 0 1 0 0 0 1 β Aa β aa 15 / 29

Additive Genetic Model Most GWAS perform single SNP association testing with linear regression assuming an additive model. 16 / 29

Additive Genetic Model The additive linear regression model also has a nice interpretation, as we saw from Fisher s classical quantitative trait model! The coefficient of determination (r 2 ) of an additive linear regression model gives an estimate of the proportion of phenotypic variation that is explained by the SNP (or SNPs) in the model, e.g., the SNP heritability 17 / 29

Additive Genetic Model Consider the following additive model for association testing with a quantitative trait and a SNP with alleles A and a: Y = β 0 + β 1 X + ɛ where X is the number of copies of the reference allele A. What would your interpretation of ɛ be for this particular model? 18 / 29

Association Testing with Additive Model Y = β 0 + β 1 X + ɛ Two test statistics for H 0 : β 1 = 0 versus H a : β 1 0 T = ˆβ 1 t N 2 N(0, 1) for large N var( ˆβ 1 ) where T 2 = ˆβ 2 1 var( ˆβ 1 ) F 1,N 2 χ 2 1 for large N var( ˆβ 1 ) = σ2 ɛ S XX and S XX is the corrected sum of squares for the X i s 19 / 29

Statistical Power for Detecting QTL Y = β 0 + β 1 X + ɛ We can also calculate the power for detecting a QTL for a given effect size β 1 for a SNP. For simplicity, assume that Y has been a standardized so that with σ 2 Y = 1. Let p be the frequency of the A allele in the population σ 2 Y = β2 1σ 2 X + σ2 ɛ = 2p(1 p)β 2 1 + σ 2 ɛ Let h 2 s = 2p(1 p)β 2 1, so we have σ2 Y = h2 s + σ 2 ɛ Interpret h 2 s (note that we assume that trait is standardized such that σ 2 Y = 1) 20 / 29

Statistical Power for Detecting QTL Also note that σ 2 ɛ = 1 h 2 s, so we can write Var( ˆβ 1 ) as the following: var( ˆβ 1 ) = σ2 ɛ S XX σ 2 ɛ N (2p(1 p)) = 1 h2 s 2Np(1 p) To calculate power of the test statistic T 2 for a given sample size N, we need to first obtain the expected value of the non-centrality parameter λ of the chi-squared (χ 2 ) distribution which is the expected value of the test statistic T squared: since h 2 s = 2p(1 p)β 2 1 λ = [E(T )] 2 β2 1 var( ˆβ 1 ) = Nh2 s 1 h 2 s 21 / 29

Required Sample Size for Power Can also obtain the required sample size given type-i error α and power 1 β, where the type II error is β : N = 1 h2 s h 2 s ( z(1 α/2) + z (1 β ) ) 2 where z (1 α/2) and z (1 β) are the (1 α/2)th and (1 β)th quantiles, respectively, for the standard normal distribution. 22 / 29

Statistical Power for Detecting QTL 23 / 29

Genetic Power Calculator (PGC) http://pngu.mgh.harvard.edu/~purcell/gpc/ 23 24 / 29

Missing Heritability Disease Number of loci Percent of Heritability Measure Explained Heritability Measure Age-related macular degeneration 5 50% Sibling recurrence risk Crohn s disease 32 20% Genetic risk (liability) Systemic lupus erythematosus 6 15% Sibling recurrence risk Type 2 diabetes 18 6% Sibling recurrence risk HDL cholesterol 7 5.2% Phenotypic variance Height 40 5% Phenotypic variance Early onset myocardial 9 2.8% Phenotypic infarction variance Fasting glucose 4 1.5% Phenotypic variance!"#$%='(a1 D&&06-%1.T01%+(0%-F8.6+??F%1:+?? >.10+10C%<a%mI4I%-'%mI4d h*+,-.-+-./0%-(+.-1c%n%/+(%0e8?+.,0;% ooin 25 / 29

!0,0-.6%2'=0(%O+?6*?+-'(%H$7+*,%2*(60??M 7--8CVV8,B*4:B747+(/+(;40;*Vm8*(60??VB86V 26 / 29

LD Mapping of QTL For GWAS, the QTL generally will not be genotyped in a study #$%&'()*)+,$-$./$,0**********************#$%&'()*1$2)+,$-$./$,0 Q 1 M 1 Q 1 M 1 Q 2 M 1 Q 1 M 1 Q 2 M 2 Q 1 M 1 Q 2 M 2 Q 2 M 1 Q 2 M 2 Q 2 M 2 Q 1 M 2 Q 2 M 2 Q 2 M 2 Q 1 M 2 Q 1 M 1 Q 1 M 1 27 / 29

LD Mapping of QTL Linkage disequilibrium around an ancestral mutation 28 / 29

LD Mapping of QTL r 2 = LD correlation between QTL and genotyped SNP Proportion of variance of the trait explained at a SNP r 2 h 2 s Required sample size for detection is N 1 r 2 h 2 s r 2 h 2 s ( z(1 α/2) + z (1 β) ) 2 Power of LD mapping depends on the experimental sample size, variance explained by the causal variant and LD with a genotyped SNP 29 / 29