I Have the Power in QTL linkage: single and multilocus analysis

Benjamin Neale (1), Sir Shaun Purcell (2) & Pak Sham (1, 3)
(1) SGDP, IoP, London, UK
(2) Harvard School of Public Health, Cambridge, MA, USA
(3) Department of Psychiatry, Hong Kong University, HK

Overview
1) Brief power primer
   Practical 1: Using GPC for elementary power calculations
2) Calculating power for QTL linkage analysis
   Practical 2: Using GPC for linkage power calculations
3) Structure of Mx power script

What will be discussed
- What is power? (refresher)
- Why and when to do power calculations?
- What affects power in linkage analysis?
- How do we calculate power for QTL linkage analysis?
  Practical 1: Using GPC for linkage power calculations
- The adequacy of additive single-locus analysis
  Practical 2: Using Mx for linkage power calculations

Needed for power calculations
- Test statistic
- Distribution of the test statistic under H0, to set the significance threshold
- Distribution of the test statistic under Ha, to calculate the probability of exceeding the significance threshold

[Figure: Standard case, alpha = 0.05. Sampling distributions P(T) of the test statistic T if H0 were true and if Ha were true, with α, β and POWER = 1 - β marked. The separation of the two distributions depends on effect size and sample size (NCP).]

Type-I & Type-II error probabilities

                         Accept H0                            Reject H0
Null hypothesis true     1-α                                  α (type-I error, false positive)
Null hypothesis false    β (type-II error, false negative)    1-β (power)

              Rejection of H0                       Nonrejection of H0
H0 true       Type I error at rate α                Nonsignificant result
HA true       Significant result, POWER = (1-β)     Type II error at rate β

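[Figure: Impact of effect size and N. Distributions P(T) of T, with the critical value, α and β marked.]

[Figure: Impact of α. Distributions P(T) of T, with the critical value, α and β marked.]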

[Figure: χ² distributions for 1, 2, 3 and 6 df. Source: http://www2.ipcku.kansai-u.ac.jp/~aki/pdf/chi21.htm]

Noncentral χ²
- Null χ² has µ = df and σ² = 2df
- Noncentral χ² has µ = df + λ and σ² = 2df + 4λ
where df are the degrees of freedom and λ is the noncentrality parameter
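The two moment formulas can be checked by simulation. The following is a minimal sketch, not part of the original slides: it builds noncentral χ² draws as sums of squared normals, putting all of the noncentrality on one component. The function names are illustrative.

```python
import random

def noncentral_chi2_moments(df, ncp):
    """Theoretical mean and variance of a noncentral chi-square."""
    return df + ncp, 2 * df + 4 * ncp

def simulate_noncentral_chi2(df, ncp, n_draws, seed=1):
    """Draw noncentral chi-square variates as sums of squared normals.

    The first normal has mean sqrt(ncp); the other df - 1 have mean 0.
    """
    rng = random.Random(seed)
    shift = ncp ** 0.5
    draws = []
    for _ in range(n_draws):
        total = (rng.gauss(0.0, 1.0) + shift) ** 2
        for _ in range(df - 1):
            total += rng.gauss(0.0, 1.0) ** 2
        draws.append(total)
    return draws

def sample_mean_var(values):
    """Sample mean and unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var

theory_mean, theory_var = noncentral_chi2_moments(df=3, ncp=4)
sim_mean, sim_var = sample_mean_var(simulate_noncentral_chi2(3, 4, 50_000))
print(theory_mean, theory_var)  # 7 22
print(round(sim_mean, 2), round(sim_var, 2))
```

The simulated values should land close to the theoretical ones, with the variance estimate noisier than the mean.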

[Figure: Noncentral χ² distributions with 3 degrees of freedom for λ = 1, 4, 9 and 16. Source: http://www2.ipcku.kansai-u.ac.jp/~aki/pdf/chi21.htm]

Short practical on GPC
Genetic Power Calculator (GPC) is an online resource for carrying out basic power calculations.
For our first example we will use the probability function calculator to play with power:
http://ibgwww.colorado.edu/~pshaun/gpc/

Parameters in probability function calculator
Click on the link to the probability function calculator. There are 4 main terms:
- X: critical value of the chi-square
- P(X>x): power
- df: degrees of freedom
- NCP: non-centrality parameter

Exercises
1) Find the power when NCP = 5, degrees of freedom = 1, and the critical X is 3.84
2) Find the NCP for power of 0.8, degrees of freedom = 1, and a critical X of 13.8

Answers
1) Power = 0.608922 when NCP = 5, degrees of freedom = 1, and the critical X is 3.84
2) NCP = 20.7613 when power is 0.8, degrees of freedom = 1, and the critical X is 13.8
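Both answers can be reproduced without GPC. The sketch below is an illustration, not the GPC implementation: for 1 df a noncentral χ² variate is the square of a normal with mean √NCP, so the tail probability reduces to two normal tail areas. The function names are made up for this example, and the inverse problem is solved by simple bisection.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def power_chi2_1df(ncp, critical_value):
    """P(X > critical_value) for a noncentral chi-square with 1 df.

    With 1 df, X = (Z + sqrt(ncp))**2 for a standard normal Z, so the
    upper tail is the probability that |Z + sqrt(ncp)| > sqrt(critical_value).
    """
    root_ncp = ncp ** 0.5
    root_crit = critical_value ** 0.5
    upper = 1.0 - _STD_NORMAL.cdf(root_crit - root_ncp)
    lower = _STD_NORMAL.cdf(-root_crit - root_ncp)
    return upper + lower

def ncp_for_power_1df(power, critical_value, hi=1000.0, tol=1e-10):
    """Bisection search for the NCP that yields the requested power."""
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if power_chi2_1df(mid, critical_value) < power:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

exercise_1 = power_chi2_1df(ncp=5, critical_value=3.84)
exercise_2 = ncp_for_power_1df(power=0.8, critical_value=13.8)
print(round(exercise_1, 4), round(exercise_2, 3))
```

The shortcut only works for 1 df. Other degrees of freedom need a general noncentral χ² routine.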

2) Power for QTL linkage
For chi-squared tests on large samples, power is determined by the non-centrality parameter (λ) and the degrees of freedom (df):

$$\lambda = E(2\ln L_A - 2\ln L_0) = E(2\ln L_A) - E(2\ln L_0)$$

where expectations are taken at the asymptotic values of the maximum likelihood estimates (MLE) under an assumed true model.

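Linkage test

$$-2\ln L = \ln|\Sigma| + \mathbf{x}'\Sigma^{-1}\mathbf{x}$$

Under H_A:

$$[\Sigma_L]_{ij} = \begin{cases} V_A + V_D + V_S + V_N & \text{for } i = j \\ \hat{\pi} V_A + \hat{z} V_D + V_S & \text{for } i \neq j \end{cases}$$

Under H_0:

$$[\Sigma_N]_{ij} = \begin{cases} V_A + V_D + V_S + V_N & \text{for } i = j \\ \tfrac{1}{2} V_A + \tfrac{1}{4} V_D + V_S & \text{for } i \neq j \end{cases}$$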

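Expected NCP

Linkage test:

$$\lambda_L = \ln|\Sigma_0| - \sum_{i=1}^{m} P_i \ln|\Sigma_i|$$

For sib-pairs under complete marker information:

$$\lambda_L = \ln|\Sigma_0| - \left[\tfrac{1}{4}\ln|\Sigma_{\pi=0}| + \tfrac{1}{2}\ln|\Sigma_{\pi=1/2}| + \tfrac{1}{4}\ln|\Sigma_{\pi=1}|\right]$$

The determinant of a 2-by-2 standardised covariance matrix is 1 - r², so

$$\lambda_L = -\tfrac{1}{4}\ln(1-r_0^2) - \tfrac{1}{2}\ln(1-r_1^2) - \tfrac{1}{4}\ln(1-r_2^2) + \ln(1-r_s^2)$$

where r_0, r_1 and r_2 are the sib correlations for pairs sharing 0, 1 and 2 alleles IBD, and r_s is the overall sib correlation.

Note: standardised trait. See Sham et al. (2000) AJHG, 66, for further details.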

Concrete example
200 sibling pairs; sibling correlation 0.5. To calculate the NCP if the QTL explained 10% of the variance:

$$\begin{aligned}
\lambda_L &= -\tfrac{1}{4}\ln(1-r_0^2) - \tfrac{1}{2}\ln(1-r_1^2) - \tfrac{1}{4}\ln(1-r_2^2) + \ln(1-r_s^2) \\
&= -\tfrac{1}{4}\ln(1-0.45^2) - \tfrac{1}{2}\ln(1-0.5^2) - \tfrac{1}{4}\ln(1-0.55^2) + \ln(1-0.5^2) \\
&= 0.0565 + 0.1438 + 0.0900 - 0.2877 \\
&= 0.002791
\end{aligned}$$

200 × 0.002791 = 0.5581
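The arithmetic of this example is easy to script. The sketch below assumes a standardised trait and complete marker information, as on the slide, and derives the IBD-specific correlations from the overall sib correlation and the QTL variances. The function name and arguments are illustrative.

```python
import math

def sib_pair_ncp(r_s, v_a, v_d=0.0):
    """Expected NCP per sib pair under complete marker information.

    r_s is the overall sibling correlation; v_a and v_d are the additive and
    dominance QTL variances as proportions of a standardised trait.
    """
    # Correlations for pairs sharing 0, 1 or 2 alleles IBD at the QTL.
    r0 = r_s - v_a / 2 - v_d / 4
    r1 = r_s - v_d / 4
    r2 = r_s + v_a / 2 + 3 * v_d / 4
    return (-0.25 * math.log(1 - r0 ** 2)
            - 0.50 * math.log(1 - r1 ** 2)
            - 0.25 * math.log(1 - r2 ** 2)
            + math.log(1 - r_s ** 2))

ncp_per_pair = sib_pair_ncp(r_s=0.5, v_a=0.1)
ncp_200_pairs = 200 * ncp_per_pair
print(round(ncp_per_pair, 6), round(ncp_200_pairs, 4))
```

With r_s = 0.5 and v_a = 0.1 the three correlations are 0.45, 0.5 and 0.55, matching the values used in the example.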

Approximation of NCP

$$\mathrm{NCP} \approx \frac{s(s-1)}{2}\,\frac{1+r^2}{(1-r^2)^2}\,\mathrm{Var}(r_\pi) = \frac{s(s-1)}{2}\,\frac{1+r^2}{(1-r^2)^2}\left[V_A^2\,\mathrm{Var}(\hat{\pi}) + V_D^2\,\mathrm{Var}(\hat{z}) + 2 V_A V_D\,\mathrm{Cov}(\hat{\pi},\hat{z})\right]$$

where s is the sibship size and r is the sibling correlation.

NCP per sibship is proportional to
- the # of pairs in the sibship (large sibships are powerful)
- the square of the additive QTL variance (decreases rapidly for QTL of v. small effect)
- the sibling correlation (structure of residual variance is important)
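A minimal sketch of the approximation, assuming fully informative IBD so that the moments of π and z come straight from the 1/4, 1/2, 1/4 sib-pair IBD distribution. The function names are illustrative, and the variance of the IBD-conditional correlation is expanded with the usual 2·Cov cross term.

```python
def ibd_moments():
    """Var(pi), Var(z) and Cov(pi, z) for sib pairs with known IBD."""
    # (probability, pi = proportion of alleles IBD, z = indicator of IBD 2)
    states = [(0.25, 0.0, 0.0), (0.50, 0.5, 0.0), (0.25, 1.0, 1.0)]
    mean_pi = sum(p * pi for p, pi, _ in states)
    mean_z = sum(p * z for p, _, z in states)
    var_pi = sum(p * (pi - mean_pi) ** 2 for p, pi, _ in states)
    var_z = sum(p * (z - mean_z) ** 2 for p, _, z in states)
    cov = sum(p * (pi - mean_pi) * (z - mean_z) for p, pi, z in states)
    return var_pi, var_z, cov

def approx_ncp(sibship_size, r, v_a, v_d=0.0):
    """Approximate NCP per sibship for a standardised trait."""
    var_pi, var_z, cov = ibd_moments()
    var_r = v_a ** 2 * var_pi + v_d ** 2 * var_z + 2 * v_a * v_d * cov
    n_pairs = sibship_size * (sibship_size - 1) / 2
    return n_pairs * (1 + r ** 2) / (1 - r ** 2) ** 2 * var_r

print(ibd_moments())
print(approx_ncp(sibship_size=2, r=0.5, v_a=0.1))
```

For a sib pair with r = 0.5 and a 10% additive QTL this gives roughly 0.00278, close to the 0.002791 per pair obtained from the exact expression in the concrete example.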

Using GPC
Comparison to Haseman-Elston regression linkage: Amos & Elston (1989), H-E regression
- 90% power (at significance level 0.05)
- QTL variance 0.5
- marker & major gene completely linked (θ = 0): 320 sib pairs
- if θ = 0.1: 778 sib pairs

GPC input parameters
- Proportions of variance
  - additive QTL variance
  - dominance QTL variance
  - residual variance (shared / nonshared)
- Recombination fraction (0 - 0.5)
- Sample size & sibship size (2 - 8)
- Type I error rate
- Type II error rate

GPC output parameters
- Expected sibling correlations
  - by IBD status at the QTL
  - by IBD status at the marker
- Expected NCP per sibship
- Power
  - at different levels of alpha, given sample size
- Sample size
  - for specified power
  - at different levels of alpha, given power
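The link between "expected NCP per sibship" and "sample size for specified power" is a division: find the total NCP that gives the target power, then divide by the per-sibship NCP. The sketch below assumes the test statistic is referred to a plain 1-df χ² distribution. GPC may treat the linkage test differently (for example as one-sided), so its figures need not match. All names are illustrative.

```python
import math
from statistics import NormalDist

_N = NormalDist()

def power_1df(ncp, alpha):
    """Power of a 1-df chi-square test at level alpha for a given NCP."""
    root_crit = _N.inv_cdf(1 - alpha / 2)  # square root of the critical value
    root_ncp = math.sqrt(ncp)
    return (1 - _N.cdf(root_crit - root_ncp)) + _N.cdf(-root_crit - root_ncp)

def required_ncp(alpha, power, hi=1000.0, tol=1e-10):
    """Smallest total NCP reaching the requested power (bisection)."""
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if power_1df(mid, alpha) < power:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def required_sibships(ncp_per_sibship, alpha, power):
    """Number of sibships needed, since the NCP adds across sibships."""
    return math.ceil(required_ncp(alpha, power) / ncp_per_sibship)

# Per-pair NCP of 0.002791 is the value from the concrete example above.
print(round(required_ncp(0.05, 0.8), 3))
print(required_sibships(0.002791, alpha=0.05, power=0.8))
```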

GPC http://ibgwww.colorado.edu/~pshaun/gpc/

Practical 2
Using GPC, what is the effect on power to detect linkage of:
1. QTL variance?
2. residual sibling correlation?
3. marker-QTL recombination fraction?

[Screenshot: GPC input]

[Screenshot: GPC output]

Practical 2
1) One good way of understanding power is to start with a basic case and then change the relevant factors, in both directions, one at a time
2) Let's begin with a basic case of:
   1) Additive QTL 0.15
   2) No dominance (check the box)
   3) Residual shared variance 0.35
   4) Residual nonshared environment 0.5
   5) Recombination fraction 0.1
   6) Sample size 200
   7) Sibship size 2
   8) User-defined Type I error rate 0.0001
   9) User-defined power 0.8

GPC
What happens when you vary:
1. QTL variance
2. Dominance vs. additive QTL variance
3. Residual sibling shared variance
4. Recombination fraction
5. Sibship sizes

[Figure: Pairs required (θ = 0, p = 0.05, power = 0.8). N (0 to 300,000) against QTL variance (0 to 0.5).]

[Figure: Pairs required (θ = 0, p = 0.05, power = 0.8). log N (1 to 1,000,000) against QTL variance (0 to 0.5).]

Effect of residual correlation
- QTL additive effects account for 10% of trait variance
- Sample size required for 80% power (α = 0.05)
- No dominance
- θ = 0.1
A: residual correlation 0.35
B: residual correlation 0.50
C: residual correlation 0.65

[Figure: Individuals required (0 to 25,000) for pairs, trios, quads and quints, at r = 0.35, r = 0.5 and r = 0.65.]

[Figure: Effect of incomplete linkage. Attenuation in NCP (0 to 1) against recombination fraction (0 to 0.5).]

[Figure: Effect of incomplete linkage. Attenuation in NCP (0 to 1) against map distance (0 to 100 cM).]
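The attenuation curves can be sketched numerically. This rests on two assumptions not stated on the slides: that for sib pairs the NCP is attenuated by (1 - 2θ)⁴ (the square of the (1 - 2θ)² correlation between IBD sharing at marker and QTL), and that map distance is converted to θ with the Haldane map function. Function names are illustrative.

```python
import math

def ncp_attenuation(theta):
    """Assumed attenuation of the sib-pair NCP at recombination fraction theta."""
    return (1 - 2 * theta) ** 4

def haldane_theta(map_distance_cm):
    """Recombination fraction from map distance in cM (Haldane map function)."""
    morgans = map_distance_cm / 100
    return 0.5 * (1 - math.exp(-2 * morgans))

for theta in (0.0, 0.1, 0.2, 0.5):
    print(theta, round(ncp_attenuation(theta), 4))

for cm in (0, 10, 20, 50):
    print(cm, round(ncp_attenuation(haldane_theta(cm)), 4))

# Rough cross-check against the Amos & Elston figures quoted earlier:
# 320 pairs at theta = 0 scales to about 781 pairs at theta = 0.1.
print(round(320 / ncp_attenuation(0.1)))
```

The scaled figure of about 781 pairs is close to the 778 pairs quoted for θ = 0.1, which is consistent with the assumed form.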

Some factors influencing power
1. QTL variance
2. Sib correlation
3. Sibship size
4. Marker informativeness & density
5. Phenotypic selection

Marker informativeness
Markers should be highly polymorphic
- alleles inherited from different sources are likely to be distinguishable
Heterozygosity (H) and Polymorphism Information Content (PIC)
- measure the number and frequency of alleles at a locus

Polymorphism Information Content
IF a parent is heterozygous, their gametes will usually be informative.
BUT if both parents & child are heterozygous for the same genotype, the origins of the child's alleles are ambiguous.
IF C = the probability of this occurring,

$$\mathrm{PIC} = H - C = 1 - \sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n}\sum_{j=i+1}^{n} 2\,p_i^2\,p_j^2$$
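The PIC formula translates directly into code. A minimal sketch with illustrative function names; allele frequencies are passed as a list that sums to 1.

```python
def heterozygosity(freqs):
    """H = 1 - sum of squared allele frequencies."""
    return 1 - sum(p * p for p in freqs)

def pic(freqs):
    """Polymorphism Information Content, PIC = H - C."""
    n = len(freqs)
    c = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            c += 2 * freqs[i] ** 2 * freqs[j] ** 2
    return heterozygosity(freqs) - c

print(pic([0.5, 0.5]))   # 0.375
print(pic([0.25] * 4))   # 0.703125
```

A biallelic marker with equal allele frequencies has PIC = 0.375, and adding equally frequent alleles raises it.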

[Figure: Singlepoint: Marker 1 and the trait locus, separated by recombination fraction θ1. Multipoint: trait locus positions T1 to T20 evaluated between Marker 1 and Marker 2.]

[Figure: Multipoint PIC: 10 cM map. Multipoint PIC (%) against position of trait locus (cM).]

[Figure: Multipoint PIC: 5 cM map, with the 10 cM map for comparison. Multipoint PIC (%) against position of trait locus (cM).]

The Singlepoint Information Content of the markers:
Locus 1 PIC = 0.375
Locus 2 PIC = 0.375
Locus 3 PIC = 0.375

The Multipoint Information Content of the markers:
Pos      MPIC
-10      22.9946
-9       24.9097
-8       26.9843
-7       29.2319
-6       31.6665
-5       34.304
-4       37.1609
-3       40.256
-2       43.6087
-1       47.2408
0        51.1754
1        49.6898
meaninf  50.2027

[Figure: MPIC (0 to 100) against position (-15 to 35 cM).]

[Figure: Selective genotyping. Panels: Unselected; Proband Selection; EDAC; Maximally Dissimilar; ASP; Extreme Discordant; EDAC; Mahalanobis Distance.]

E(-2LL)       Sib 1    Sib 2    Sib 3
0.00121621     1.00     1.00
0.14137692    -2.00     2.00
0.00957190     2.00     1.80     2.20
0.00005954    -0.50     0.50

[Figure: Sibship informativeness: sib pairs. Sibship NCP (0 to 1.6) against Sib 1 trait and Sib 2 trait (each from -4 to 4).]

Impact of selection

QTL power using Mx
- Power can be calculated theoretically or empirically
- We have shown the theoretical power calculations from Sham et al. (2000)
- Empirical calculations can be computed in Mx or from simulated data
- Most of us are too busy (or short of IQ points) to figure out the theoretical power calculation, so the empirical approach is useful

Mx power script
1) Download the script powerfeq.mx
2) I'll open it and walk through precisely what Mx is doing
3) Briefly, Mx requires that you set up the true model, using algebra to generate the variance-covariance matrices
4) Refit the model to those variance-covariance matrices, fixing the parameter you wish to test to 0
5) At the end of the script include the option power= α, df

Same again with raw data
Mx can now estimate the power distribution from raw data. The change in likelihood is taken to be the NCP, and this governs the power.
Download realfeqpower.mx; we will use the lipidall.dat data from Danielle's session. I've highlighted position 79, the maximum.

Summary
The power of linkage analysis is related to:
1. QTL variance
2. Sib correlation
3. Sibship size
4. Marker informativeness & density
5. Phenotypic selection

If we have time slide
We'll move on to 2-locus models

3) Single additive locus model
- locus A shows an association with the trait
- locus B appears unrelated
[Figure: trait by genotype at Locus A (AA, Aa, aa) and at Locus B (BB, Bb, bb).]

Joint analysis
- locus B modifies the effects of locus A: epistasis
[Figure: trait by genotype at locus A (AA, Aa, aa), shown separately for each genotype at locus B (BB, Bb, bb).]

Partitioning of effects
Each individual carries an M and a P allele at Locus A and at Locus B (four alleles in total).
- 4 main effects: additive effects
- 6 two-way interactions: dominance (M and P at the same locus) and additive-additive epistasis (one allele from each locus)
- 4 three-way interactions: additive-dominance epistasis
- 1 four-way interaction: dominance-dominance epistasis

One locus
Genotypic means
AA: m + a
Aa: m + d
aa: m - a
[Figure: genotypic scale with aa at -a, the midpoint at 0, Aa at d and AA at +a.]
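The genotypic means determine the additive and dominance QTL variances that the power calculations take as inputs. The sketch below uses the standard one-locus decomposition, which the slide does not state: it assumes Hardy-Weinberg proportions with allele A at frequency p, and checks the result against the variance computed directly from the genotype frequencies. Names are illustrative.

```python
def one_locus_variances(p, a, d):
    """Additive and dominance variance for genotypic values +a, d, -a.

    Assumes Hardy-Weinberg proportions; p is the frequency of allele A.
    """
    q = 1 - p
    avg_effect = a + d * (q - p)  # average effect of allele substitution
    v_a = 2 * p * q * avg_effect ** 2
    v_d = (2 * p * q * d) ** 2
    return v_a, v_d

def total_genetic_variance(p, a, d):
    """Variance of genotypic values computed directly from genotype frequencies."""
    q = 1 - p
    genotypes = [(p * p, a), (2 * p * q, d), (q * q, -a)]
    mean = sum(f * g for f, g in genotypes)
    return sum(f * (g - mean) ** 2 for f, g in genotypes)

v_a, v_d = one_locus_variances(p=0.5, a=1.0, d=1.0)
print(v_a, v_d, total_genetic_variance(0.5, 1.0, 1.0))
```

The constant m drops out because it shifts all three genotypic means equally.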

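Two loci
Genotypic means

        BB                       Bb                       bb
AA      m + a_A + a_B + aa       m + a_A + d_B + ad       m + a_A - a_B - aa
Aa      m + d_A + a_B + da       m + d_A + d_B + dd       m + d_A - a_B - da
aa      m - a_A + a_B - aa       m - a_A + d_B - ad       m - a_A - a_B + aa

(In the cells, aa, ad, da and dd are the epistatic interaction terms.)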

Expected sib correlation by IBD at the two loci (subscripts 1 and 2 denote the locus)

IBD at locus 1   IBD at locus 2   Expected sib correlation
0                0                σ²S
0                1                σ²A2/2 + σ²S
0                2                σ²A2 + σ²D2 + σ²S
1                0                σ²A1/2 + σ²S
1                1                σ²A1/2 + σ²A2/2 + σ²AA/4 + σ²S
1                2                σ²A1/2 + σ²A2 + σ²D2 + σ²AA/2 + σ²AD/2 + σ²S
2                0                σ²A1 + σ²D1 + σ²S
2                1                σ²A1 + σ²D1 + σ²A2/2 + σ²AA/2 + σ²DA/2 + σ²S
2                2                σ²A1 + σ²D1 + σ²A2 + σ²D2 + σ²AA + σ²AD + σ²DA + σ²DD + σ²S
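Every row of this table follows one rule: each variance component is weighted by the proportion of alleles shared IBD (π) for an additive factor and by the indicator of sharing both alleles (z) for a dominance factor, multiplied across the two loci for the epistatic terms. A minimal sketch with made-up variance component values; the dictionary keys are illustrative labels only.

```python
def two_locus_sib_cov(ibd1, ibd2, vc):
    """Expected sib covariance given IBD counts (0, 1 or 2) at two loci."""
    pi1, pi2 = ibd1 / 2, ibd2 / 2                      # proportion of alleles IBD
    z1, z2 = float(ibd1 == 2), float(ibd2 == 2)        # both alleles IBD
    return (pi1 * vc["A1"] + z1 * vc["D1"]
            + pi2 * vc["A2"] + z2 * vc["D2"]
            + pi1 * pi2 * vc["AA"] + pi1 * z2 * vc["AD"]
            + z1 * pi2 * vc["DA"] + z1 * z2 * vc["DD"]
            + vc["S"])

# Hypothetical variance components, chosen only to exercise the function.
example_vc = {"A1": 0.10, "D1": 0.02, "A2": 0.08, "D2": 0.01,
              "AA": 0.04, "AD": 0.02, "DA": 0.02, "DD": 0.01, "S": 0.30}

for ibd1 in (0, 1, 2):
    for ibd2 in (0, 1, 2):
        print(ibd1, ibd2, round(two_locus_sib_cov(ibd1, ibd2, example_vc), 4))
```

For a standardised trait these covariances are also the expected sib correlations.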

Estimating power for QTL models
Using Mx to calculate power:
i. Calculate expected covariance matrices under the full model
ii. Fit the model to the data with the value of interest fixed to its null value

          i. True model    ii. Submodel
Q         Q                0
S         S                S
N         N                N
-2LL      0.000            = NCP

Model misspecification
Using the domqtl.mx script

          i. True    ii. Full    iii. Null
Q_A       Q_A        Q_A         0
Q_D       Q_D        0           0
S         S          S           S
N         N          N           N
-2LL      0.000      T1          T2

Test:
- dominance only: T1
- additive & dominance: T2
- additive only: T2 - T1

Results
Using the domqtl.mx script

          i. True    ii. Full    iii. Null
Q_A       0.1        0.217       0
Q_D       0.1        0           0
S         0.4        0.367       0.475
N         0.4        0.417       0.525
-2LL      0.000      1.269       12.549

Test:
- dominance only (1 df): 1.269
- additive & dominance (2 df): 12.549
- additive only (1 df): 12.549 - 1.269 = 11.28
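The three NCPs can be turned into power at a chosen α. The sketch below is illustrative only: it refers each statistic to an ordinary noncentral χ² with the stated df, whereas variance components tested on a boundary may follow mixture distributions that this does not handle, and the slide does not say which sample size the NCPs correspond to. The 1-df case uses the normal shortcut; the 2-df case uses the Poisson-mixture series, which has a closed form for even df.

```python
import math
from statistics import NormalDist

_N = NormalDist()

def power_1df(ncp, alpha=0.05):
    """Upper tail of a noncentral chi-square (1 df) beyond the alpha critical value."""
    root_crit = _N.inv_cdf(1 - alpha / 2)
    root_ncp = math.sqrt(ncp)
    return (1 - _N.cdf(root_crit - root_ncp)) + _N.cdf(-root_crit - root_ncp)

def power_2df(ncp, alpha=0.05, terms=200):
    """Upper tail of a noncentral chi-square (2 df) via its Poisson mixture."""
    half_x = -math.log(alpha)          # critical value / 2, exact for 2 df
    half_ncp = ncp / 2
    poisson_weight = math.exp(-half_ncp)   # Poisson(j; ncp/2) at j = 0
    series_term = 1.0                      # half_x**j / j! at j = 0
    central_sum = 1.0                      # sum of series terms up to j
    total = 0.0
    for j in range(terms):
        # A central chi-square with 2 + 2j df has upper tail
        # exp(-half_x) * sum_{i=0..j} half_x**i / i!
        total += poisson_weight * math.exp(-half_x) * central_sum
        poisson_weight *= half_ncp / (j + 1)
        series_term *= half_x / (j + 1)
        central_sum += series_term
    return total

print("dominance only (1 df):      ", round(power_1df(1.269), 3))
print("additive & dominance (2 df):", round(power_2df(12.549), 3))
print("additive only (1 df):       ", round(power_1df(11.28), 3))
```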

Expected variances, covariances

              i. True    ii. Full    iii. Null
Var           1.00       1.0005      1.0000
Cov(IBD=0)    0.40       0.3667      0.4750
Cov(IBD=1)    0.45       0.4753      0.4750
Cov(IBD=2)    0.60       0.5839      0.4750

Potential importance of epistasis
A gene's effect might only be detected within a framework that accommodates epistasis.

                            Locus A
                            A1A1     A1A2     A2A2     Marginal
                 Freq.      0.25     0.50     0.25
Locus B  B1B1    0.25       0        0        1        0.25
         B1B2    0.50       0        0.5      0        0.25
         B2B2    0.25       1        0        0        0.25
         Marginal           0.25     0.25     0.25

Model    Parameters estimated
Full     V_A1    V_D1    V_A2    V_D2    V_AA    V_AD    V_DA    V_DD
-DD      V*_A1   V*_D1   V*_A2   V*_D2   V*_AA   V*_AD   V*_DA   -
-AD      V*_A1   V*_D1   V*_A2   V*_D2   V*_AA   -       -       -
-AA      V*_A1   V*_D1   V*_A2   V*_D2   -       -       -       -
-D       V*_A1   -       V*_A2   -       -       -       -       -
-A       V*_A1   -       -       -       -       -       -       -
H0       -       -       -       -       -       -       -       -

V_S and V_N estimated in all models

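True model VC
[Figure: variance components under the true model.]
Means matrix (entries in extraction order): 0 0 0 0 0 0 0 1 1

[Figure: NCP for test of linkage. NCP1: full model; NCP2: non-epistatic model.]

Apparent VC under non-epistatic model
[Figure: apparent variance components under the non-epistatic model.]
Means matrix (entries in extraction order): 0 0 0 0 0 0 0 1 1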

Summary
- Linkage has low power to detect QTL of small effect
- Using selected and/or larger sibships increases power
- Single locus additive analysis is usually acceptable

GPC: two-locus linkage
Using the module, for unlinked loci A and B with

Means:
0      0      1
0      0.5    0
1      0      0

Frequencies: p_A = p_B = 0.5

- Power of the full model to detect linkage?
- Power to detect epistasis?
- Power of the single additive locus model?
(1000 pairs, 20% joint QTL effect, VS = VN)