The Quantitative TDT

Similar documents
1 Springer. Nan M. Laird Christoph Lange. The Fundamentals of Modern Statistical Genetics

2. Map genetic distance between markers

SNP Association Studies with Case-Parent Trios

Lecture 9. QTL Mapping 2: Outbred Populations

1.5.1 ESTIMATION OF HAPLOTYPE FREQUENCIES:

Lecture 2: Genetic Association Testing with Quantitative Traits. Summer Institute in Statistical Genetics 2017

Calculation of IBD probabilities

Population Genetics. with implications for Linkage Disequilibrium. Chiara Sabatti, Human Genetics 6357a Gonda

Power and Design Considerations for a General Class of Family-Based Association Tests: Quantitative Traits

Case-Control Association Testing. Case-Control Association Testing

SNP-SNP Interactions in Case-Parent Trios

When one gene is wild type and the other mutant:

Calculation of IBD probabilities

UNIT 8 BIOLOGY: Meiosis and Heredity Page 148

TOPICS IN STATISTICAL METHODS FOR HUMAN GENE MAPPING

Affected Sibling Pairs. Biostatistics 666

Genetics (patterns of inheritance)

Lesson 4: Understanding Genetics

Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5

Natural Selection. Population Dynamics. The Origins of Genetic Variation. The Origins of Genetic Variation. Intergenerational Mutation Rate

Lecture 1: Case-Control Association Testing. Summer Institute in Statistical Genetics 2015

EXERCISES FOR CHAPTER 3. Exercise 3.2. Why is the random mating theorem so important?

Solutions to Problem Set 4

Lecture WS Evolutionary Genetics Part I 1

Guided Notes Unit 6: Classical Genetics

EXERCISES FOR CHAPTER 7. Exercise 7.1. Derive the two scales of relation for each of the two following recurrent series:

F1 Parent Cell R R. Name Period. Concept 15.1 Mendelian inheritance has its physical basis in the behavior of chromosomes

Linkage Mapping. Reading: Mather K (1951) The measurement of linkage in heredity. 2nd Ed. John Wiley and Sons, New York. Chapters 5 and 6.

LECTURE # How does one test whether a population is in the HW equilibrium? (i) try the following example: Genotype Observed AA 50 Aa 0 aa 50

Linkage and Linkage Disequilibrium

I Have the Power in QTL linkage: single and multilocus analysis

Patterns of inheritance

Breeding Values and Inbreeding. Breeding Values and Inbreeding

Lecture 2. Basic Population and Quantitative Genetics

Chapter 6 Linkage Disequilibrium & Gene Mapping (Recombination)

Evolutionary quantitative genetics and one-locus population genetics

The E-M Algorithm in Genetics. Biostatistics 666 Lecture 8

Solutions to Even-Numbered Exercises to accompany An Introduction to Population Genetics: Theory and Applications Rasmus Nielsen Montgomery Slatkin

Friday Harbor From Genetics to GWAS (Genome-wide Association Study) Sept David Fardo

Outline. P o purple % x white & white % x purple& F 1 all purple all purple. F purple, 224 white 781 purple, 263 white

SNP Association Studies with Case-Parent Trios

Lecture 6: Introduction to Quantitative genetics. Bruce Walsh lecture notes Liege May 2011 course version 25 May 2011

For 5% confidence χ 2 with 1 degree of freedom should exceed 3.841, so there is clear evidence for disequilibrium between S and M.

(Genome-wide) association analysis

Life Cycles, Meiosis and Genetic Variability24/02/2015 2:26 PM

Proportional Variance Explained by QLT and Statistical Power. Proportional Variance Explained by QTL and Statistical Power

Major Genes, Polygenes, and

Unit 6 Reading Guide: PART I Biology Part I Due: Monday/Tuesday, February 5 th /6 th

Introduction to Genetics

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

A General and Accurate Approach for Computing the Statistical Power of the Transmission Disequilibrium Test for Complex Disease Genes

MODEL-FREE LINKAGE AND ASSOCIATION MAPPING OF COMPLEX TRAITS USING QUANTITATIVE ENDOPHENOTYPES

Linear Regression (1/1/17)

The phenotype of this worm is wild type. When both genes are mutant: The phenotype of this worm is double mutant Dpy and Unc phenotype.

Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium. November 12, 2012

Concept 15.1 Mendelian inheritance has its physical basis in the behavior of chromosomes

REVISION: GENETICS & EVOLUTION 20 MARCH 2013

Preparing Case-Parent Trio Data and Detecting Disease-Associated SNPs, SNP Interactions, and Gene-Environment Interactions with trio

Lecture 9. Short-Term Selection Response: Breeder s equation. Bruce Walsh lecture notes Synbreed course version 3 July 2013

Introduction to Linkage Disequilibrium

genome a specific characteristic that varies from one individual to another gene the passing of traits from one generation to the next

Prediction of the Confidence Interval of Quantitative Trait Loci Location

3/4/2015. Review. Phenotype

BIOLOGY LTF DIAGNOSTIC TEST MEIOSIS & MENDELIAN GENETICS

7.014 Problem Set 6 Solutions

Outline for today s lecture (Ch. 14, Part I)

Introduction to Genetics

2014 Pearson Education, Inc.

On coding genotypes for genetic markers with multiple alleles in genetic association study of quantitative traits

Quantitative Genetics I: Traits controlled my many loci. Quantitative Genetics: Traits controlled my many loci

Objectives. Announcements. Comparison of mitosis and meiosis

Family resemblance can be striking!

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Variance Components: Phenotypic, Environmental and Genetic

Tilburg University. Published in: American Journal of Human Genetics. Document version: Peer reviewed version. Publication date: 2000

Evolution of phenotypic traits

STAT 536: Genetic Statistics

The universal validity of the possible triangle constraint for Affected-Sib-Pairs

Statistical Genetics I: STAT/BIOST 550 Spring Quarter, 2014

Labs 7 and 8: Mitosis, Meiosis, Gametes and Genetics

Introduction to Genetics

Lecture 1 Hardy-Weinberg equilibrium and key forces affecting gene frequency

Constructing a Pedigree

Introduc)on to Gene)cs How to Analyze Your Own Genome Fall 2013

Keys to Success on the Quarter 3 EXAM

Lecture 6. QTL Mapping

Introduction to Quantitative Genetics. Introduction to Quantitative Genetics

Overview. Background

Ch 11.Introduction to Genetics.Biology.Landis

1 Errors in mitosis and meiosis can result in chromosomal abnormalities.

Lecture 11: Multiple trait models for QTL analysis

Phasing via the Expectation Maximization (EM) Algorithm

The Lander-Green Algorithm. Biostatistics 666 Lecture 22

Unit 3 - Molecular Biology & Genetics - Review Packet

Stationary Distribution of the Linkage Disequilibrium Coefficient r 2

Inbreeding depression due to stabilizing selection on a quantitative character. Emmanuelle Porcher & Russell Lande

Lecture 2: Introduction to Quantitative Genetics

7.014 Problem Set 6. Question 1. MIT Department of Biology Introductory Biology, Spring 2004

Optimal Allele-Sharing Statistics for Genetic Mapping Using Affected Relatives

The phenotype of this worm is wild type. When both genes are mutant: The phenotype of this worm is double mutant Dpy and Unc phenotype.

Transcription:

The Quantitative TDT (Quantitative Transmission Disequilibrium Test) Warren J. Ewens NUS, Singapore 10 June, 2009

The initial aim of the (QUALITATIVE) TDT was to test for linkage between a marker locus M and a disease locus D. The aim of a QTDT is also to test for linkage between a marker locus M (with alleles M 1 and M 2 ) and a loci involved with a quantitative character. Notation for the qualitative TDT

D M θ D 1 M 1 D 2 M 2

θ is the recombination fraction between disease and marker loci. The null hypothesis (for linkage): θ = ½ (Disease and marker loci unlinked. i.e. on different chromosomes) The alternative hypothesis: θ < ½ (Disease and marker loci linked, i.e (usually) close on the same chromosome) So if we reject the null hypothesis, we get information on the location of the disease locus.

δ = Coefficient of linkage disequilibrium (LD) = population frequency of D 1 M 1 - population frequency of D 1 x population frequency of M 1 This is better called the coefficient of association. It is a purely statistical concept. Linkage is a genetic concept. However, tests of linkage are often conducted by tests of association. Why?????

The TDT (transmission-disequilibrium test) of linkage is a family-based test. It thus avoids the problems of population stratification that can arise with population-based association (two-by-two table) tests, which are tests of the null hypothesis δ = 0. δ = Coefficient of linkage disequilibrium (LD) = population frequency of D 1 M 1 - population frequency of D 1 x population frequency of M 1 This is better called the coefficient of association. It is a purely statistical concept. Linkage is a genetic concept. Tests of linkage are often conducted by testing whether δ = 0. Why?????

The basic unit is the family trio of mother, father and affected child. (More complicated cases, including pedigrees, can also be considered. Here, for simplicity, we focus throughout only on family trios.) Population stratification problems are overcome by basing the test on the within-family transmission numbers.

Only transmissions from heterozygous (M 1 M 2 ) parents are informative, so we (normally) only consider these. Given that the child is affected, the probability that any such parent transmits M 1 is not necessarily the Mendelian value ½ when θ < ½, i.e. disease and marker loci are linked.

Prob (M 1 M 2 parent transmits M 1 to an affected child) = ½ + δ( 1-2θ)K. Here K is a complicated constant depending on disease and marker allele frequencies, the nature of the disease (dominant, recessive, additive, etc). Note that if θ = ½, the two probabilities given above are equal. These two probabilities are also equal if δ = 0. Thus the test can be used (with care) as a test of the hypothesis δ = 0. The formula confirms the fact that the test has no power as a test of θ = ½ if δ = 0.

Notation (suitable later for QTDTs): For family trio # i, (i = 1, 2,, n) w i is the observed excess of M 1 genes transmitted to the child over the null hypothesis mean of this number. W i is the corresponding random variable. These two quantities are central to all that follows.

Example 1. Suppose that both parents are M 1 M 2. There are three possibilities for the child: M 1 M 1 : here w = 1 M 1 M 2 : here w = 0 M 2 M 2 : here w = -1 It follows that for these matings, Variance (W) = ½ under the null hypothesis.

Example 2. Suppose that one parent is M x M x (for x = 1 or 2) and the other parent is M 1 M 2. There are now two possibilities for the child: M 1 M x : here w = ½. M 2 M x : here w = -½. For these matings, Variance (W) = ¼ under the null hypothesis.

Then the standard TDT statistic, viz. (n 1 n 2 ) 2 /m can be written as: Var ( ( n i = 1 n i = 1 w w i i ) 2 ) H 0 This is (for all practical purposes) distributed as chisquare with one degree of freedom under the null hypothesis if m is large.

Quantitative TDTs Here we consider some quantitative measurement in the child in each trio (e.g. BMI), and not the qualitative state (affected / not affected). The approaches that we consider are those of: Allison (1997) Abecasis et al. (2000a, 2000b), Rabinowitz (1997), Monks-Kaplan (2000).

All the above are in the Abecasis QTDT package. The Rabinowitz procedure is also in the FBAT package. (There are several other approaches available that are not considered here. Also, the PLINK package uses the Abecasis methods.)

Again to keep things simple, we consider only the case of n family trios, each consisting of mother, father and child. We assume that we know the marker locus genotypes of all three members of each trio, and also the quantitative measurement (y), (for example BMI) in each child. The Allison and the Abecasis methods are regressionbased. The Rabinowitz and the Monks-Kaplan approaches are in a sense the converse of this see later. We consider first the Allison and Abecasis approaches.

Let Y i be the continuous phenotype (e.g. BMI) of the child in family i, (i = 1, 2,, n). This is taken as a random variable (hence the upper case notation). For both Allison and Abecasis, the model is Y i ~ N(μ i, σ 2 ) For the Allison and Abecasis models there is a null hypothesis value and an alternative hypothesis value for μ i.

Regression test notation PMT = parental mating type PMT 1: one parent is M 1 M 1, the other M 1 M 2 PMT 2: both parents are M 1 M 2 PMT 3: one parent is M 1 M 2, the other M 2 M 2

Allison linear For family (trio) i, the mean of the measured quantity in the child is: PMT 1: μ + β w PMT 2: μ + α 1 + β w PMT 3: μ + α 2 + β w The null hypothesis θ = ½ becomes β = 0. The alternative hypothesis makes no specification about β. (Why linear? Linear in w.)

Abecasis within only The mean of the measured quantity in the child is: μ + βw for all three mating types. In this model the null hypothesis θ = ½ becomes β = 0. The alternative hypothesis makes no specification about β.

Abecasis orthogonal The mean of the measured quantity in the child is: PMT 1: μ + βw PMT 2: μ + α + βw PMT 3: μ + 2α + βw The null hypothesis θ = ½ again becomes β = 0. The alternative hypothesis again makes no specification about β.

Allison quadratic The mean of the measured quantity in the child is: PMT 1: μ + β 1 w + β 2 w 2 PMT 2: μ + α 1 + β 1 w + β 2 w 2 PMT 3: μ + α 2 + β 1 w + β 2 w 2 The null hypothesis θ = ½ becomes β 1 = β 2 = 0. (This is the same null hypothesis as for the Allison linear model.) The alternative hypothesis makes no specification about β 1 and β 2.

Abecasis dominance The mean of the measured quantity in the child is: PMT 1: μ + βw + γd PMT 2: μ + α + βw + γd PMT 3: μ + 2α + βw + γd Here d = -1 if the child is homozygous (M 1 M 1 or M 2 M 2 ) and d = +1 if the child is heterozygous (M 1 M 2 ). In this model the null hypothesis is β = γ = 0. (Why??)

These models are all thus regression models. They take w as the independent variable. Also, some of them are nested within others. In all models there is a proportion (R 12 ) of the total sum of squares removed under the null hypothesis and a (larger) proportion (R 22 ) of the total sum of squares removed under the alternative hypothesis. This leads to standard regression hypothesis testing procedures.

Some examples of the test statistic: Allison linear: F = Abecasis orthogonal: F = 2 2 ( R R ) 2 2 ( 1 R )/( n 4) 2 1 2 2 ( R R ) 2 2 ( 1 R )/( n 3) 1 1 Abecasis within: F = (1 ( R R 2 2 2 1 R1 ) /( n 2 ) 2) Thus these procedures make an assumption of normality of Y. Note also the degrees of freedom.

The Rabinowitz Approach The main thing to remember about this approach (which is the basic FBAT approach) is that the quantitative measurements Y i are taken as given, (that is, these are the independent variables, and thus denoted by y i ), and the transmission information w i is then the dependent variable (and is thus denoted W i ). THUS THE MEANING OF THE RANDOM VARIABLES IS REVERSED COMPARED TO THE ALLISON AND ABECASIS REGRESSION MODELS. But: having W i as the random variable is in line with the assumptions of the original qualitative TDT.

The numerator component of the test statistic is: S = Σ i (y i - y ) W i Here W i is, as before, the (random) difference between the number of genes that the child in trio i has and its null hypothesis mean. Under the null hypothesis (no linkage between disease and marker loci) S has mean zero and variance V = Σ i (y i - ) 2 Var (W i ) y

Recall: Var(W i ) = ½ for M 1 M 2 M 1 M 2 matings, = ¼ for all other matings Thus the variance V is easily computed. The test statistic is then S / V, approx N(0,1) under H 0.

The numerator component of the Monks-Kaplan test statistic is S = Σ i (y i - y ) W i That is, it is identical to the numerator of the Rabinowitz statistic. The null hypothesis (no linkage between disease and marker) variance of S is estimated by V* = Σ i [(y i - y ) w i ] 2 The test statistic is then S/ V*, approx N(0,1) under H 0.

The reversed-role regression. The Allison and Abecasis procedures are regressions of Y on w. What about a reversed role regression of W on y? This regression model can be written as W i = α + β(y i - y ) + E. The estimate of the slope in this regression is i w i (y i - y )/ i (y i - y ) 2. The numerator is the same as the numerator in the Rabinowitz z statistic. Using standard regression methods, we would test H 0 via a t statistic, defined as t = numerator / s. (s = standard regression SD estimate of the numerator.)

That is, we test for non-zero slope of this regression line. But the original TDT is a test of a non-zero intercept of this line!!!! What is going on??? Isn t this very weird??

What are the properties, good and bad, of these procedures? Main property of all of them: they do not test for the absolute values of the w i. What they test is for changes in these values as a function of y, the phenotype in the child. This is obvious in the regression procedures, but is true also of the Rabinowitz procedure. ALL these procedures would be unchanged if any arbitrary constant were added to the transmission values w. This is in complete contrast to the aim of the qualitative TDT, which tests for the ABSOLUTE w values. Thus quantitative TDTs test quite a different null hypothesis than do qualitative TDTs.

The aim of using the transmission approach is to overcome problems of population stratification. Do these procedures do this? No the Abecasis procedures do not do this if mating type is associated with population strata. For the Abecasis within test. If mating type is associated with strata, the mean of the numerator in the Abecasis F ratio is σ 2 + positive term The mean of the denominator is σ 2 + a different positive term

The Rabinowitz and Monks-Kaplan procedures, and the Allison regression procedures, ARE immune to population stratification. There are many further considerations: power, dominance, using uninformative mating types (for example M 1 M 1 M 2 M 2 ) etc. The take-home message: use QTDT packages with extreme caution. (More details are available in a handout.)