Introduction to QTL mapping in model organisms

Similar documents
Introduction to QTL mapping in model organisms

Introduction to QTL mapping in model organisms

Introduction to QTL mapping in model organisms

Introduction to QTL mapping in model organisms

Mapping multiple QTL in experimental crosses

Gene mapping in model organisms

Mapping multiple QTL in experimental crosses

Statistical issues in QTL mapping in mice

Multiple QTL mapping

R/qtl workshop. (part 2) Karl Broman. Biostatistics and Medical Informatics University of Wisconsin Madison. kbroman.org

Mapping QTL to a phylogenetic tree

Use of hidden Markov models for QTL mapping

Binary trait mapping in experimental crosses with selective genotyping

Anumber of statistical methods are available for map- 1995) in some standard designs. For backcross populations,

QTL Mapping I: Overview and using Inbred Lines

QTL Model Search. Brian S. Yandell, UW-Madison January 2017

Methods for QTL analysis

Prediction of the Confidence Interval of Quantitative Trait Loci Location

Overview. Background

An applied statistician does probability:

Lecture 9. QTL Mapping 2: Outbred Populations

Lecture 8. QTL Mapping 1: Overview and Using Inbred Lines

Inferring Causal Phenotype Networks from Segregating Populat

Expression QTLs and Mapping of Complex Trait Loci. Paul Schliekelman Statistics Department University of Georgia

Mixture Cure Model with an Application to Interval Mapping of Quantitative Trait Loci

Fast Bayesian Methods for Genetic Mapping Applicable for High-Throughput Datasets

A new simple method for improving QTL mapping under selective genotyping

Lecture 6. QTL Mapping

Multiple interval mapping for ordinal traits

A Statistical Framework for Expression Trait Loci (ETL) Mapping. Meng Chen

Quantile-based permutation thresholds for QTL hotspot analysis: a tutorial

BTRY 4830/6830: Quantitative Genomics and Genetics

BAYESIAN MAPPING OF MULTIPLE QUANTITATIVE TRAIT LOCI

Locating multiple interacting quantitative trait. loci using rank-based model selection

. Also, in this case, p i = N1 ) T, (2) where. I γ C N(N 2 2 F + N1 2 Q)

Tutorial Session 2. MCMC for the analysis of genetic data on pedigrees:

Lecture WS Evolutionary Genetics Part I 1

Lecture 11: Multiple trait models for QTL analysis

QTL model selection: key players

THE data in the QTL mapping study are usually composed. A New Simple Method for Improving QTL Mapping Under Selective Genotyping INVESTIGATION

One-week Course on Genetic Analysis and Plant Breeding January 2013, CIMMYT, Mexico LOD Threshold and QTL Detection Power Simulation

Linkage Mapping. Reading: Mather K (1951) The measurement of linkage in heredity. 2nd Ed. John Wiley and Sons, New York. Chapters 5 and 6.

Affected Sibling Pairs. Biostatistics 666

The Admixture Model in Linkage Analysis

Inferring Genetic Architecture of Complex Biological Processes

Quantile based Permutation Thresholds for QTL Hotspots. Brian S Yandell and Elias Chaibub Neto 17 March 2012

MULTIPLE-TRAIT MULTIPLE-INTERVAL MAPPING OF QUANTITATIVE-TRAIT LOCI ROBY JOEHANES

Quantitative genetics

The evolutionary success of flowering plants is to a certain

On the mapping of quantitative trait loci at marker and non-marker locations

... x. Variance NORMAL DISTRIBUTIONS OF PHENOTYPES. Mice. Fruit Flies CHARACTERIZING A NORMAL DISTRIBUTION MEAN VARIANCE

For 5% confidence χ 2 with 1 degree of freedom should exceed 3.841, so there is clear evidence for disequilibrium between S and M.

Nonparametric Modeling of Longitudinal Covariance Structure in Functional Mapping of Quantitative Trait Loci

Linkage analysis and QTL mapping in autotetraploid species. Christine Hackett Biomathematics and Statistics Scotland Dundee DD2 5DA

I i=1 1 I(J 1) j=1 (Y ij Ȳi ) 2. j=1 (Y j Ȳ )2 ] = 2n( is the two-sample t-test statistic.

Causal Model Selection Hypothesis Tests in Systems Genetics

Users Manual of QTL IciMapping v3.1

Module 4: Bayesian Methods Lecture 9 A: Default prior selection. Outline

Variance Component Models for Quantitative Traits. Biostatistics 666

I Have the Power in QTL linkage: single and multilocus analysis

arxiv: v4 [stat.me] 2 May 2015

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011

Causal Graphical Models in Systems Genetics

Detection of multiple QTL with epistatic effects under a mixed inheritance model in an outbred population

1 Springer. Nan M. Laird Christoph Lange. The Fundamentals of Modern Statistical Genetics

QTL mapping of Poisson traits: a simulation study

Association studies and regression

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

TECHNICAL REPORT NO April 22, Causal Model Selection Tests in Systems Genetics 1

Review Article Statistical Methods for Mapping Multiple QTL

Quantitative Genetics. February 16, 2010

Genotype Imputation. Biostatistics 666

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Multiple-Interval Mapping for Quantitative Trait Loci Controlling Endosperm Traits. Chen-Hung Kao 1

ANIMAL models and their corresponding genomes parameter estimation, because the sample size for the

MS&E 226: Small Data. Lecture 11: Maximum likelihood (v2) Ramesh Johari

The genomes of recombinant inbred lines

On Computation of P-values in Parametric Linkage Analysis

QTL model selection: key players

(a) (3 points) Construct a 95% confidence interval for β 2 in Equation 1.

Last lecture 1/35. General optimization problems Newton Raphson Fisher scoring Quasi Newton

MATH5745 Multivariate Methods Lecture 07

Users Guide for New BC s F t Tools for R/qtl

STATISTICS 3A03. Applied Regression Analysis with SAS. Angelo J. Canty

Analysis of Variance

THE problem of identifying the genetic factors un- QTL models because of their ability to separate linked

Selecting explanatory variables with the modified version of Bayesian Information Criterion

The E-M Algorithm in Genetics. Biostatistics 666 Lecture 8

Tail probability of linear combinations of chi-square variables and its application to influence analysis in QTL detection

MODEL-FREE LINKAGE AND ASSOCIATION MAPPING OF COMPLEX TRAITS USING QUANTITATIVE ENDOPHENOTYPES

QUANTITATIVE trait analysis has many applica- several crosses, since the correlations may not be the

Bayesian Regression (1/31/13)

The Lander-Green Algorithm. Biostatistics 666 Lecture 22

Model Selection for Multiple QTL

Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5

Lecture 3: Linear Models. Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012

BIO5312 Biostatistics Lecture 13: Maximum Likelihood Estimation

Hierarchical Generalized Linear Models for Multiple QTL Mapping

Proportional Variance Explained by QLT and Statistical Power. Proportional Variance Explained by QTL and Statistical Power

Linear Regression (1/1/17)

Transcription:

Introduction to QTL mapping in model organisms Karl W Broman Department of Biostatistics and Medical Informatics University of Wisconsin Madison www.biostat.wisc.edu/~kbroman [ Teaching Miscellaneous lectures] 1 Backcross P 1 P 2 P 1 F 1 BC 4

Intercross P 1 P 2 F 1 F 1 F 2 5 Phenotype data log 2 liver 5 6 7 8 log 2 spleen 6 7 8 9 10 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 6 7 8 9 10 log 2 liver log 2 spleen Male Female 6

Genetic map 0 20 Location (cm) 40 60 80 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 X Chromosome 7 Genotype data 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 X 250 200 Individuals 150 100 50 10 20 30 40 50 60 Markers 8

Goals Identify quantitative trait loci (QTL) (and interactions among QTL) Interval estimates of QTL location Estimated QTL effects 9 Statistical structure QTL Covariates Markers Phenotype The missing data problem: Markers QTL The model selection problem: QTL, covariates phenotype 10

ANOVA at marker loci Also known as marker regression. Split mice into groups according to genotype at a marker. Do a t-test / ANOVA. Repeat for each marker. 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 Genotype at D16Mit30 log 2 liver SS BS BB Genotype at D14Mit54 SS BS BB 11 ANOVA at marker loci Advantages Simple. Easily incorporates covariates. Easily extended to more complex models. Doesn t require a genetic map. Disadvantages Must exclude individuals with missing genotype data. Imperfect information about QTL location. Suffers in low density scans. Only considers one QTL at a time. 12

Interval mapping Lander & Botstein (1989) Assume a single QTL model. Each position in the genome, one at a time, is posited as the putative QTL. Let q = 1/0 if the (unobserved) QTL genotype is BB/AB. (Or 2/1/0 if the QTL genotype is BB/AB/AA in an intercross.) Assume y q N(µ q, σ) Given genotypes at linked markers, y mixture of normal dist ns with mixing proportions Pr(q marker data): QTL genotype M 1 M 2 BB AB BB BB (1 r L )(1 r R )/(1 r) r L r R /(1 r) BB AB (1 r L )r R /r r L (1 r R )/r AB BB r L (1 r R )/r (1 r L )r R /r AB AB r L r R /(1 r) (1 r L )(1 r R )/(1 r) 13 Genotype probabilities A B B A A? Calculate Pr(q marker data), assuming No crossover interference No genotyping errors Or use the hidden Markov model (HMM) technology To allow for genotyping errors To incorporate dominant markers (Still assume no crossover interference.) 14

The normal mixtures 7 cm 13 cm AB/AB M 1 Q M 2 µ 0 µ 1 Two markers separated by 20 cm, with the QTL closer to the left marker. The figure at right shows the distributions of the phenotype conditional on the genotypes at the two markers. AB/BB BB/AB BB/BB The dashed curves correspond to the components of the mixtures. 20 40 µ 0 60 µ 1 80 100 µ 0 µ 0 µ 1 µ 1 Phenotype 15 Interval mapping Let p ij = Pr(q i = j marker data) y i q i N(µ qi, σ 2 ) Pr(y i marker data, µ 0, µ 1, σ) = j p ij f(y i ; µ j, σ) where f(y; µ, σ) = exp[ (y µ) 2 /(2σ 2 )]/ 2πσ 2 Log likelihood: l(µ 0, µ 1, σ) = i log Pr(y i marker data, µ 0, µ 1, σ) Maximum likelihood estimates (MLEs) of µ 0, µ 1, σ: values for which l(µ 0, µ 1, σ) is maximized. 16

Dempster et al. (1977) E step: Let M step: Let w (k) ij ˆµ (k) j EM algorithm = Pr(q i = j y i, marker data, ˆµ (k 1) 0, ˆµ (k 1) 1, ˆσ (k 1) ) = p ij f(y i ;ˆµ (k 1) j,ˆσ (k 1) ) j p ij f(y i ;ˆµ (k 1) j,ˆσ (k 1) ) = i y iw (k) ij / i w(k) ij ˆσ (k) = i The algorithm: Start with w (1) ij j w(k) ij (y i ˆµ (k) j ) 2 /n = p ij ; iterate the E & M steps until convergence. 17 LOD scores The LOD score is a measure of the strength of evidence for the presence of a QTL at a particular location. LOD(λ) = log 10 likelihood ratio comparing the hypothesis of a QTL at position λ versus that of no QTL = log 10 { Pr(y QTL at λ,ˆµ0λ,ˆµ 1λ,ˆσ λ ) Pr(y no QTL,ˆµ,ˆσ) } ˆµ 0λ, ˆµ 1λ, ˆσ λ are the MLEs, assuming a single QTL at position λ. No QTL model: The phenotypes are independent and identically distributed (iid) N(µ, σ 2 ). 18

LOD curves 12 log 2 liver log 2 spleen 10 8 LOD score 6 4 2 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 Chromosome 19 Interval mapping Advantages Takes proper account of missing data. Allows examination of positions between markers. Gives improved estimates of QTL effects. Provides pretty graphs. Disadvantages Increased computation time. Requires specialized software. Difficult to generalize. Only considers one QTL at a time. 20

LOD thresholds Large LOD scores indicate evidence for the presence of a QTL Question: How large is large? LOD threshold = 95 %ile of distr n of max LOD, genome-wide, if there are no QTLs anywhere Derivation: Analytical calculations (L & B 1989) Simulations (L & B 1989) Permutation tests (Churchill & Doerge 1994) 21 Null distribution of the LOD score Null distribution derived by computer simulation of backcross with genome of typical size. Dashed curve: distribution of LOD score at any one point. Solid curve: distribution of maximum LOD score, genome-wide. 0 1 2 3 4 5 LOD score 22

Permutation test markers individuals genotype data phenotypes LOD scores maximum LOD score 23 Permutation results 0 1 2 3 4 5 6 Genome wide maximum LOD score 24

LOD support intervals 8 1.5 6 LOD score 4 2 1.5 LOD support interval 0 0 10 20 30 40 50 Map position (cm) 26 Modelling multiple QTL Reduce residual variation = increased power Separate linked QTL Identify interactions among QTL 27

Epistasis in BC Additive Epistatic 100 100 H H 80 80 Ave. phenotype 60 40 A QTL 2 Ave. phenotype 60 40 A QTL 2 20 20 0 0 A H A H QTL 1 QTL 1 28 Epistasis in F 2 Additive Epistatic 100 100 B B 80 H QTL 2 80 QTL 2 Ave. phenotype 60 40 A Ave. phenotype 60 40 H 20 20 0 0 A A H B QTL 1 A H B QTL 1 29

References Broman KW (2001) Review of statistical methods for QTL mapping in experimental crosses. Lab Animal 30:44 52 A review for non-statisticians. Lynch M, Walsh B (1998) Genetics and analysis of quantitative traits. Sinauer Associates, Sunderland, MA, chapter 15 Chapter on QTL mapping. Lander ES, Botstein D (1989) Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121:185 199 The seminal paper. Churchill GA, Doerge RW (1994) Empirical threshold values for quantitative trait mapping. Genetics 138:963 971 LOD thresholds by permutation tests. Strickberger MW (1985) Genetics, 3rd edition. Macmillan, New York, chapter 11. An old but excellent general genetics textbook with a very interesting discussion of epistasis. 30