Introduction to QTL mapping in model organisms

Similar documents
Introduction to QTL mapping in model organisms

Introduction to QTL mapping in model organisms

Introduction to QTL mapping in model organisms

Introduction to QTL mapping in model organisms

Mapping multiple QTL in experimental crosses

Gene mapping in model organisms

R/qtl workshop. (part 2) Karl Broman. Biostatistics and Medical Informatics University of Wisconsin Madison. kbroman.org

Statistical issues in QTL mapping in mice

Mapping multiple QTL in experimental crosses

Multiple QTL mapping

Mapping QTL to a phylogenetic tree

Use of hidden Markov models for QTL mapping

Binary trait mapping in experimental crosses with selective genotyping

QTL Mapping I: Overview and using Inbred Lines

Anumber of statistical methods are available for map- 1995) in some standard designs. For backcross populations,

Methods for QTL analysis

Prediction of the Confidence Interval of Quantitative Trait Loci Location

Overview. Background

QTL Model Search. Brian S. Yandell, UW-Madison January 2017

Lecture 8. QTL Mapping 1: Overview and Using Inbred Lines

Mixture Cure Model with an Application to Interval Mapping of Quantitative Trait Loci

Lecture 9. QTL Mapping 2: Outbred Populations

Fast Bayesian Methods for Genetic Mapping Applicable for High-Throughput Datasets

Inferring Causal Phenotype Networks from Segregating Populat

Expression QTLs and Mapping of Complex Trait Loci. Paul Schliekelman Statistics Department University of Georgia

A new simple method for improving QTL mapping under selective genotyping

A Statistical Framework for Expression Trait Loci (ETL) Mapping. Meng Chen

Multiple interval mapping for ordinal traits

Quantile-based permutation thresholds for QTL hotspot analysis: a tutorial

BAYESIAN MAPPING OF MULTIPLE QUANTITATIVE TRAIT LOCI

. Also, in this case, p i = N1 ) T, (2) where. I γ C N(N 2 2 F + N1 2 Q)

Tutorial Session 2. MCMC for the analysis of genetic data on pedigrees:

Lecture 6. QTL Mapping

QTL model selection: key players

THE data in the QTL mapping study are usually composed. A New Simple Method for Improving QTL Mapping Under Selective Genotyping INVESTIGATION

Locating multiple interacting quantitative trait. loci using rank-based model selection

An applied statistician does probability:

BTRY 4830/6830: Quantitative Genomics and Genetics

One-week Course on Genetic Analysis and Plant Breeding January 2013, CIMMYT, Mexico LOD Threshold and QTL Detection Power Simulation

Affected Sibling Pairs. Biostatistics 666

Lecture WS Evolutionary Genetics Part I 1

The evolutionary success of flowering plants is to a certain

The Admixture Model in Linkage Analysis

Lecture 11: Multiple trait models for QTL analysis

Linkage analysis and QTL mapping in autotetraploid species. Christine Hackett Biomathematics and Statistics Scotland Dundee DD2 5DA

MULTIPLE-TRAIT MULTIPLE-INTERVAL MAPPING OF QUANTITATIVE-TRAIT LOCI ROBY JOEHANES

Quantitative genetics

On the mapping of quantitative trait loci at marker and non-marker locations

How to display data badly

... x. Variance NORMAL DISTRIBUTIONS OF PHENOTYPES. Mice. Fruit Flies CHARACTERIZING A NORMAL DISTRIBUTION MEAN VARIANCE

Nonparametric Modeling of Longitudinal Covariance Structure in Functional Mapping of Quantitative Trait Loci

Quantile based Permutation Thresholds for QTL Hotspots. Brian S Yandell and Elias Chaibub Neto 17 March 2012

Causal Model Selection Hypothesis Tests in Systems Genetics

Inferring Genetic Architecture of Complex Biological Processes

Genotype Imputation. Biostatistics 666

Linkage Mapping. Reading: Mather K (1951) The measurement of linkage in heredity. 2nd Ed. John Wiley and Sons, New York. Chapters 5 and 6.

I Have the Power in QTL linkage: single and multilocus analysis

arxiv: v4 [stat.me] 2 May 2015

ANIMAL models and their corresponding genomes parameter estimation, because the sample size for the

1 Springer. Nan M. Laird Christoph Lange. The Fundamentals of Modern Statistical Genetics

Causal Graphical Models in Systems Genetics

QTL mapping of Poisson traits: a simulation study

Users Manual of QTL IciMapping v3.1

Module 4: Bayesian Methods Lecture 9 A: Default prior selection. Outline

Review Article Statistical Methods for Mapping Multiple QTL

Quantitative Genetics. February 16, 2010

THE problem of identifying the genetic factors un- QTL models because of their ability to separate linked

Multiple-Interval Mapping for Quantitative Trait Loci Controlling Endosperm Traits. Chen-Hung Kao 1

I i=1 1 I(J 1) j=1 (Y ij Ȳi ) 2. j=1 (Y j Ȳ )2 ] = 2n( is the two-sample t-test statistic.

Detection of multiple QTL with epistatic effects under a mixed inheritance model in an outbred population

The genomes of recombinant inbred lines

Chapter 2. Review of basic Statistical methods 1 Distribution, conditional distribution and moments

QTL model selection: key players

(a) (3 points) Construct a 95% confidence interval for β 2 in Equation 1.

Users Guide for New BC s F t Tools for R/qtl

Association studies and regression

Tail probability of linear combinations of chi-square variables and its application to influence analysis in QTL detection

QUANTITATIVE trait analysis has many applica- several crosses, since the correlations may not be the

MODEL-FREE LINKAGE AND ASSOCIATION MAPPING OF COMPLEX TRAITS USING QUANTITATIVE ENDOPHENOTYPES

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

TECHNICAL REPORT NO April 22, Causal Model Selection Tests in Systems Genetics 1

For 5% confidence χ 2 with 1 degree of freedom should exceed 3.841, so there is clear evidence for disequilibrium between S and M.

Proportional Variance Explained by QLT and Statistical Power. Proportional Variance Explained by QTL and Statistical Power

2. Map genetic distance between markers

Estimation: Problems & Solutions

A multivariate model for ordinal trait analysis

Major Genes, Polygenes, and

STATISTICS 3A03. Applied Regression Analysis with SAS. Angelo J. Canty

QUANTITATIVE trait analysis plays an important observed. These assumptions are likely to be false when

Chapter 17: Undirected Graphical Models

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari

Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5

Causal Model Selection Hypothesis Tests. in Systems Genetics

Calculation of IBD probabilities

Model Selection for Multiple QTL

Bayesian Regression (1/31/13)

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Principles of QTL Mapping. M.Imtiaz

Variance Component Models for Quantitative Traits. Biostatistics 666

The E-M Algorithm in Genetics. Biostatistics 666 Lecture 8

Transcription:

Introduction to QTL mapping in model organisms Karl Broman Biostatistics and Medical Informatics University of Wisconsin Madison kbroman.org github.com/kbroman @kwbroman Backcross P 1 P 2 P 1 F 1 BC 4

Intercross P 1 P 2 F 1 F 1 F 2 5 Phenotype data Body weight 30 35 40 45 Heart weight 80 100 120 140 160 180 200 30 35 40 45 80 100 120 140 160 180 200 Body weight Heart weight Sugiyama et al. (2002) Physiol Genomics 10:5 12 6

Genetic map 0 20 Location (cm) 40 60 80 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 Chromosome 7 Genotype data 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 150 100 Individuals 50 20 40 60 80 Markers 8

Goals Identify quantitative trait loci (QTL) (and interactions among QTL) Interval estimates of QTL location Estimated QTL effects 9 Statistical structure QTL Covariates Markers Phenotype The missing data problem: Markers QTL The model selection problem: QTL, covariates phenotype 10

ANOVA at marker loci Also known as marker regression. Split mice into groups according to genotype at a marker. Do a t-test / ANOVA. Repeat for each marker. 30 35 40 45 Genotype at D15Mit184 Body weight CC CB BB Genotype at D6Mit259 CC CB BB 11 ANOVA at marker loci Advantages Simple. Easily incorporates covariates. Easily extended to more complex models. Doesn t require a genetic map. Disadvantages Must exclude individuals with missing genotype data. Imperfect information about QTL location. Suffers in low density scans. Only considers one QTL at a time. 12

Interval mapping Lander & Botstein (1989) Assume a single QTL model. Each position in the genome, one at a time, is posited as the putative QTL. Let q = 1/0 if the (unobserved) QTL genotype is BB/AB. (Or 2/1/0 if the QTL genotype is BB/AB/AA in an intercross.) Assume y q N(µ q, σ) Given genotypes at linked markers, y mixture of normal dist ns with mixing proportions Pr(q marker data): 13 Genotype probabilities A B B A A? Calculate Pr(q marker data), assuming No crossover interference No genotyping errors Or use the hidden Markov model (HMM) technology To allow for genotyping errors To incorporate dominant markers (Still assume no crossover interference.) 14

The normal mixtures 7 cm 13 cm AB/AB M 1 Q M 2 µ 0 µ 1 Two markers separated by 20 cm, with the QTL closer to the left marker. The figure at right shows the distributions of the phenotype conditional on the genotypes at the two markers. AB/BB BB/AB BB/BB The dashed curves correspond to the components of the mixtures. 20 40 µ 0 60 µ 1 80 100 µ 0 µ 0 µ 1 µ 1 Phenotype 15 Interval mapping Let p ij = Pr(q i = j marker data) y i q i N(µ qi, σ 2 ) Pr(y i marker data, µ 0, µ 1, σ) = j p ij f(y i ; µ j, σ) where f(y; µ, σ) = exp[ (y µ) 2 /(2σ 2 )]/ 2πσ 2 Log likelihood: l(µ 0, µ 1, σ) = i log Pr(y i marker data, µ 0, µ 1, σ) Maximum likelihood estimates (MLEs) of µ 0, µ 1, σ: values for which l(µ 0, µ 1, σ) is maximized. 16

Dempster et al. (1977) E step: Let M step: Let w (k) ij ˆµ (k) j EM algorithm = Pr(q i = j y i, marker data, ˆµ (k 1) 0, ˆµ (k 1) 1, ˆσ (k 1) ) = p ij f(y i ;ˆµ (k 1) j,ˆσ (k 1) ) j p ij f(y i ;ˆµ (k 1) j,ˆσ (k 1) ) = i y iw (k) ij / i w(k) ij ˆσ (k) = i The algorithm: Start with w (1) ij j w(k) ij (y i ˆµ (k) j ) 2 /n = p ij ; iterate the E & M steps until convergence. 17 LOD scores The LOD score is a measure of the strength of evidence for the presence of a QTL at a particular location. LOD(λ) = log 10 likelihood ratio comparing the hypothesis of a QTL at position λ versus that of no QTL = log 10 { Pr(y QTL at λ,ˆµ0λ,ˆµ 1λ,ˆσ λ ) Pr(y no QTL,ˆµ,ˆσ) } ˆµ 0λ, ˆµ 1λ, ˆσ λ are the MLEs, assuming a single QTL at position λ. No QTL model: The phenotypes are independent and identically distributed (iid) N(µ, σ 2 ). 19

LOD curves 7 6 Body weight Heart weight 5 LOD score 4 3 2 1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 Chromosome 20 Interval mapping Advantages Takes proper account of missing data. Allows examination of positions between markers. Gives improved estimates of QTL effects. Provides pretty graphs. Disadvantages Increased computation time. Requires specialized software. Difficult to generalize. Only considers one QTL at a time. 22

LOD thresholds Large LOD scores indicate evidence for the presence of a QTL Question: How large is large? LOD threshold = 95 %ile of distr n of max LOD, genome-wide, if there are no QTLs anywhere Derivation: Analytical calculations (L & B 1989) Simulations (L & B 1989) Permutation tests (Churchill & Doerge 1994) 23 Null distribution of the LOD score Null distribution derived by computer simulation of backcross with genome of typical size. Dashed curve: distribution of LOD score at any one point. Solid curve: distribution of maximum LOD score, genome-wide. 0 1 2 3 4 5 LOD score 24

Permutation test markers individuals genotype data phenotypes LOD scores maximum LOD score 25 Permutation results 0 1 2 3 4 5 6 Genome wide maximum LOD score 26

LOD support intervals 7 6 1.5 5 LOD score 4 3 2 1 0 1.5 LOD support interval 0 10 20 30 40 50 Map position (cm) 29 Modelling multiple QTL Reduce residual variation = increased power Separate linked QTL Identify interactions among QTL 30

Epistasis in BC Additive Epistatic 100 100 H H 80 80 Ave. phenotype 60 40 A QTL 2 Ave. phenotype 60 40 A QTL 2 20 20 0 0 A H A H QTL 1 QTL 1 31 Epistasis in F 2 Additive Epistatic 100 100 B B 80 H QTL 2 80 QTL 2 Ave. phenotype 60 40 A Ave. phenotype 60 40 H 20 20 0 0 A A H B QTL 1 A H B QTL 1 32

References Broman KW (2001) Review of statistical methods for QTL mapping in experimental crosses. Lab Animal 30:44 52 A review for non-statisticians. Lynch M, Walsh B (1998) Genetics and analysis of quantitative traits. Sinauer Associates, Sunderland, MA, chapter 15 Chapter on QTL mapping. Lander ES, Botstein D (1989) Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121:185 199 The seminal paper. Churchill GA, Doerge RW (1994) Empirical threshold values for quantitative trait mapping. Genetics 138:963 971 LOD thresholds by permutation tests. Strickberger MW (1985) Genetics, 3rd edition. Macmillan, New York, chapter 11. An old but excellent general genetics textbook with a very interesting discussion of epistasis. 33 URLs http://kbroman.org http://kbroman.org/pages/teaching.html http://www.biostat.wisc.edu/~kbroman/d3/em_alg http://www.biostat.wisc.edu/~kbroman/d3/lod_and_effect http://www.biostat.wisc.edu/~kbroman/d3/lod_random 34