Statistical issues in QTL mapping in mice

Similar documents
Gene mapping in model organisms

Introduction to QTL mapping in model organisms

Introduction to QTL mapping in model organisms

Mapping multiple QTL in experimental crosses

Introduction to QTL mapping in model organisms

The genomes of recombinant inbred lines

Introduction to QTL mapping in model organisms

Introduction to QTL mapping in model organisms

Mapping multiple QTL in experimental crosses

R/qtl workshop. (part 2) Karl Broman. Biostatistics and Medical Informatics University of Wisconsin Madison. kbroman.org

Multiple QTL mapping

Use of hidden Markov models for QTL mapping

Mapping QTL to a phylogenetic tree

Lecture 8. QTL Mapping 1: Overview and Using Inbred Lines

QTL Mapping I: Overview and using Inbred Lines

Linkage Mapping. Reading: Mather K (1951) The measurement of linkage in heredity. 2nd Ed. John Wiley and Sons, New York. Chapters 5 and 6.

Overview. Background

An applied statistician does probability:

Affected Sibling Pairs. Biostatistics 666

Tutorial Session 2. MCMC for the analysis of genetic data on pedigrees:

Prediction of the Confidence Interval of Quantitative Trait Loci Location

Binary trait mapping in experimental crosses with selective genotyping

56th Annual EAAP Meeting Uppsala, 2005

QTL Model Search. Brian S. Yandell, UW-Madison January 2017

Lecture 6. QTL Mapping

Methods for QTL analysis

Eiji Yamamoto 1,2, Hiroyoshi Iwata 3, Takanari Tanabata 4, Ritsuko Mizobuchi 1, Jun-ichi Yonemaru 1,ToshioYamamoto 1* and Masahiro Yano 5,6

Inferring Genetic Architecture of Complex Biological Processes

Lecture 11: Multiple trait models for QTL analysis

Causal Graphical Models in Systems Genetics

MULTIPLE-TRAIT MULTIPLE-INTERVAL MAPPING OF QUANTITATIVE-TRAIT LOCI ROBY JOEHANES

Model Selection for Multiple QTL

Causal Model Selection Hypothesis Tests in Systems Genetics

Lecture 9. QTL Mapping 2: Outbred Populations

Linkage analysis and QTL mapping in autotetraploid species. Christine Hackett Biomathematics and Statistics Scotland Dundee DD2 5DA

BAYESIAN MAPPING OF MULTIPLE QUANTITATIVE TRAIT LOCI

Calculation of IBD probabilities

QTL model selection: key players

A Statistical Framework for Expression Trait Loci (ETL) Mapping. Meng Chen

Solutions to Problem Set 4

Calculation of IBD probabilities

The Genomes of Recombinant Inbred Lines: The Gory Details

Fast Bayesian Methods for Genetic Mapping Applicable for High-Throughput Datasets

For 5% confidence χ 2 with 1 degree of freedom should exceed 3.841, so there is clear evidence for disequilibrium between S and M.

Genotype Imputation. Biostatistics 666

One-week Course on Genetic Analysis and Plant Breeding January 2013, CIMMYT, Mexico LOD Threshold and QTL Detection Power Simulation

Quantitative Genetics. February 16, 2010

Anumber of statistical methods are available for map- 1995) in some standard designs. For backcross populations,

Homework Assignment, Evolutionary Systems Biology, Spring Homework Part I: Phylogenetics:

Mixture Cure Model with an Application to Interval Mapping of Quantitative Trait Loci

Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5

The Admixture Model in Linkage Analysis

Variance Component Models for Quantitative Traits. Biostatistics 666

The Lander-Green Algorithm. Biostatistics 666 Lecture 22

MODEL-FREE LINKAGE AND ASSOCIATION MAPPING OF COMPLEX TRAITS USING QUANTITATIVE ENDOPHENOTYPES

(Genome-wide) association analysis

Lecture 2: Genetic Association Testing with Quantitative Traits. Summer Institute in Statistical Genetics 2017

DISSECTING the genetic architecture of complex traits is

theta H H H H H H H H H H H K K K K K K K K K K centimorgans

Locating multiple interacting quantitative trait. loci using rank-based model selection

Module 4: Bayesian Methods Lecture 9 A: Default prior selection. Outline

Teachers Guide. Overview

Quantile based Permutation Thresholds for QTL Hotspots. Brian S Yandell and Elias Chaibub Neto 17 March 2012

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

BS 50 Genetics and Genomics Week of Oct 3 Additional Practice Problems for Section. A/a ; B/B ; d/d X A/a ; b/b ; D/d

UNIT 8 BIOLOGY: Meiosis and Heredity Page 148

Quantitative Genetics I: Traits controlled my many loci. Quantitative Genetics: Traits controlled my many loci

I Have the Power in QTL linkage: single and multilocus analysis

Unit 3 - Molecular Biology & Genetics - Review Packet

Principles of QTL Mapping. M.Imtiaz

Detection of multiple QTL with epistatic effects under a mixed inheritance model in an outbred population

Case-Control Association Testing. Case-Control Association Testing

Lesson 4: Understanding Genetics

Linear Regression (1/1/17)

Outline. P o purple % x white & white % x purple& F 1 all purple all purple. F purple, 224 white 781 purple, 263 white

Users Manual of QTL IciMapping v3.1

The E-M Algorithm in Genetics. Biostatistics 666 Lecture 8

Expression QTLs and Mapping of Complex Trait Loci. Paul Schliekelman Statistics Department University of Georgia

Chapter 13 Meiosis and Sexual Reproduction

QTL model selection: key players

Meiosis. ~ fragmentation - pieces split off and each piece becomes a new organism - starfish

Inferring Causal Phenotype Networks from Segregating Populat

Users Guide for New BC s F t Tools for R/qtl

2. Map genetic distance between markers

Quantitative characters II: heritability

On the mapping of quantitative trait loci at marker and non-marker locations

THE problem of identifying the genetic factors un- QTL models because of their ability to separate linked

Chapter 11 INTRODUCTION TO GENETICS

Statistical Genetics I: STAT/BIOST 550 Spring Quarter, 2014

InGen: Dino Genetics Lab Post-Lab Activity: DNA and Genetics

Meiosis and Mendel. Chapter 6

MANY agriculturally and biomedically important

1 Springer. Nan M. Laird Christoph Lange. The Fundamentals of Modern Statistical Genetics

Lecture 5: BLUP (Best Linear Unbiased Predictors) of genetic values. Bruce Walsh lecture notes Tucson Winter Institute 9-11 Jan 2013

Solutions to Even-Numbered Exercises to accompany An Introduction to Population Genetics: Theory and Applications Rasmus Nielsen Montgomery Slatkin

EXERCISES FOR CHAPTER 3. Exercise 3.2. Why is the random mating theorem so important?

arxiv: v4 [stat.me] 2 May 2015

QTL mapping of Poisson traits: a simulation study

. Also, in this case, p i = N1 ) T, (2) where. I γ C N(N 2 2 F + N1 2 Q)

Modeling IBD for Pairs of Relatives. Biostatistics 666 Lecture 17

Transcription:

Statistical issues in QTL mapping in mice Karl W Broman Department of Biostatistics Johns Hopkins University http://www.biostat.jhsph.edu/~kbroman Outline Overview of QTL mapping The X chromosome Mapping multiple QTLs Recombinant inbred lines Heterogeneous stock and 8-way RILs 2 1

The intercross 3 The data Phenotypes, y i Genotypes, x ij = AA/AB/BB, at genetic markers A genetic map, giving the locations of the markers. 4 2

Goals Identify genomic regions (QTLs) that contribute to variation in the trait. Obtain interval estimates of the QTL locations. Estimate the effects of the QTLs. 5 Phenotypes 133 females (NOD B6) (NOD B6) 6 3

NOD 7 C57BL/6 8 4

Agouti coat 9 Genetic map 10 5

Genotype data 11 Goals Identify genomic regions (QTLs) that contribute to variation in the trait. Obtain interval estimates of the QTL locations. Estimate the effects of the QTLs. 12 6

Statistical structure Missing data: markers QTL Model selection: genotypes phenotype 13 Models: recombination No crossover interference Locations of breakpoints according to a Poisson process. Genotypes along chromosome follow a Markov chain. Clearly wrong, but super convenient. 14 7

Models: gen phe Phenotype = y, whole-genome genotype = g Imagine that p sites are all that matter. E(y g) = m(g 1,,g p ) SD(y g) = s(g 1,,g p ) Simplifying assumptions: SD(y g) = s, independent of g y g ~ normal( m(g 1,,g p ), s ) m(g 1,,g p ) = m + a j 1{g j = AB} + b j 1{g j = BB} 15 Interval mapping Lander and Botstein 1989 Imagine that there is a single QTL, at position z. Let q i = genotype of mouse i at the QTL, and assume y i q i ~ normal( m(q i ), s ) We won t know q i, but we can calculate p ig = Pr(q i = g marker data) y i, given the marker data, follows a mixture of normal distributions with known mixing proportions (the p ig ). Use an EM algorithm to get MLEs of q = (m AA, m AB, m BB, s). Measure the evidence for a QTL via the LOD score, which is the log 10 likelihood ratio comparing the hypothesis of a single QTL at position z to the hypothesis of no QTL anywhere. 16 8

LOD curves 17 LOD thresholds To account for the genome-wide search, compare the observed LOD scores to the distribution of the maximum LOD score, genome-wide, that would be obtained if there were no QTL anywhere. The 95th percentile of this distribution is used as a significance threshold. Such a threshold may be estimated via permutations (Churchill and Doerge 1994). 18 9

Permutation test Shuffle the phenotypes relative to the genotypes. Calculate M* = max LOD*, with the shuffled data. Repeat many times. LOD threshold = 95th percentile of M*. P-value = Pr(M* M) 19 Permutation distribution 20 10

Chr 9 and 11 21 Non-normal traits 22 11

Non-normal traits Standard interval mapping assumes that the residual variation is normally distributed (and so the phenotype distribution follows a mixture of normal distributions). In reality: we see binary traits, counts, skewed distributions, outliers, and all sorts of odd things. Interval mapping, with LOD thresholds derived via permutation tests, often performs fine anyway. Alternatives to consider: Nonparametric linkage analysis (Kruglyak and Lander 1995). Transformations (e.g., log or square root). Specially-tailored models (e.g., a generalized linear model, the Cox proportional hazards model, the model of Broman 2003). 23 Split by sex 24 12

Split by sex 25 Split by parent-of-origin 26 13

Split by parent-of-origin Percent of individuals with phenotype Genotype at D15Mit252 Genotype at D19Mit59 P-O-O AA AB AA AB Dad 63% 54% 75% 43% Mom 57% 23% 38% 40% 27 The X chromosome (N B) (N B) (B N) (B N) 28 14

The X chromosome BB BY? NN NY? Different degrees of freedom Autosome NN : NB : BB Females, one direction NN : NB Both sexes, both dir. NY : NN : NB : BB : BY fi Need an X-chr-specific LOD threshold. Null model should include a sex effect. 29 Chr 9 and 11 30 15

Epistasis 31 Going after multiple QTLs Greater ability to detect QTLs. Separate linked QTLs. Learn about interactions between QTLs (epistasis). 32 16

Model selection Choose a class of models. Additive; pairwise interactions; regression trees Fit a model (allow for missing genotype data). Linear regression; ML via EM; Bayes via MCMC Search model space. Forward/backward/stepwise selection; MCMC Compare models. BIC d (g) = log L(g) + (d/2) g log n Miss important loci include extraneous loci. 33 Special features Relationship among the covariates. Missing covariate information. Identify the key players vs. minimize prediction error. 34 17

Opportunities for improvements Each individual is unique. Must genotype each mouse. Unable to obtain multiple invasive phenotypes (e.g., in multiple environmental conditions) on the same genotype. Relatively low mapping precision. Æ Design a set of inbred mouse strains. Genotype once. Study multiple phenotypes on the same genotype. 35 Recombinant inbred lines 36 18

AXB/BXA panel 37 AXB/BXA panel 38 19

The usual analysis Calculate the phenotype average within each strain. Use these strain averages for QTL mapping as with a backcross (taking account of the map expansion in RILs). Can we do better? With the above data: Ave. no. mice per strain = 15.8 (SD = 8.4) Range of no. mice per strain = 3 39 39 A simple model for RILs y si = m + b x s + h s + e si x s = 0 or 1, according to genotype at putative QTL s s 2 h s = strain (polygenic) effect ~ normal(0, ) s e 2 e si = residual environment effect ~ normal(0, ) y s = Â i y si n s var(y s ) = s s 2 +s e 2 n s 40 20

RIL analysis If and were known: s s 2 s e 2 Work with the strain averages, Weight by Equivalently, weight by n s n s h 2 + (1- h 2 ) where 1 { s 2 +s 2 e n s } h 2 = s s 2 (s s 2 +s e 2 ) y s { } Equal n s : The usual analysis is fine. h 2 large: Weight the strains equally. h 2 small: Weight the strains by n s. 41 LOD curves 42 21

Chr 7 and 19 43 Recombination fractions 44 22

Chr 7 and 19 45 RI lines Advantages Each strain is a eternal resource. Only need to genotype once. Reduce individual variation by phenotyping multiple individuals from each strain. Study multiple phenotypes on the same genotype. Greater mapping precision. Disadvantages Time and expense. Available panels are generally too small (10-30 lines). Can learn only about 2 particular alleles. All individuals homozygous. 46 23

The RIX design 47 Heterogeneous stock McClearn et al. (1970) Mott et al. (2000); Mott and Flint (2002) Start with 8 inbred strains. Randomly breed 40 pairs. Repeat the random breeding of 40 pairs for each of ~60 generations (30 years). The genealogy (and protocol) is not completely known. 48 24

Heterogeneous stock 49 Heterogeneous stock Advantages Disadvantages Great mapping precision. Learn about 8 alleles. Time. Each individual is unique. Need extremely dense markers. 50 25

The Collaborative Cross 51 Genome of an 8-way RI 52 26

The Collaborative Cross Advantages Great mapping precision. Eternal resource. Genotype only once. Study multiple invasive phenotypes on the same genotype. Barriers Advantages not widely appreciated. Ask one question at a time, or Ask many questions at once? Time. Expense. Requires large-scale collaboration. 53 To be worked out Breakpoint process along an 8-way RI chromosome. Reconstruction of genotypes given multipoint marker data. Single-QTL analyses. Mixed models, with random effects for strains and genotypes/alleles. Power and precision (relative to an intercross). 54 27

Haldane & Waddington 1930 r = recombination fraction per meiosis between two loci Autosomes Pr(G 1 =AA) = Pr(G 1 =BB) = 1/2 Pr(G 2 =BB G 1 =AA) = Pr(G 2 =AA G 1 =BB) = 4r / (1+6r) X chromosome Pr(G 1 =AA) = 2/3 Pr(G 1 =BB) = 1/3 Pr(G 2 =BB G 1 =AA) = 2r / (1+4r) Pr(G 2 =AA G 1 =BB) = 4r / (1+4r) Pr(G 2 G 1 ) = (8/3) r / (1+4r) 55 8-way RILs Autosomes Pr(G 1 = i) = 1/8 Pr(G 2 = j G 1 = i) = r / (1+6r) for i j Pr(G 2 G 1 ) = 7r / (1+6r) X chromosome Pr(G 1 =AA) = Pr(G 1 =BB) = Pr(G 1 =EE) = Pr(G 1 =FF) =1/6 Pr(G 1 =CC) = 1/3 Pr(G 2 =AA G 1 =CC) = r / (1+4r) Pr(G 2 =CC G 1 =AA) = 2r / (1+4r) Pr(G 2 =BB G 1 =AA) = r / (1+4r) Pr(G 2 G 1 ) = (14/3) r / (1+4r) 56 28

Acknowledgments Terry Speed, Univ. of California, Berkeley and WEHI Tom Brodnicki, WEHI Gary Churchill, The Jackson Laboratory Joe Nadeau, Case Western Reserve Univ. 57 29