Multiple QTL mapping


Karl W Broman
Department of Biostatistics, Johns Hopkins University
www.biostat.jhsph.edu/~kbroman [Teaching: Miscellaneous lectures]

Why?
- Reduce residual variation = increased power
- Separate linked QTL
- Identify interactions among QTL

Hypothesis testing?
- In the past, QTL mapping has been regarded as a task of hypothesis testing: Is this a QTL? Much of the focus has been on adjusting for test multiplicity.
- It is better to view the problem as one of model selection: What set of QTL are well supported? Is there evidence for QTL-QTL interactions?
- Model = a defined set of QTL and QTL-QTL interactions (and possibly covariates and QTL-covariate interactions).

Statistical structure
[Diagram relating Markers, QTL, Covariates, and Phenotype.]
- The missing data problem: Markers → QTL
- The model selection problem: QTL, covariates → phenotype

Starting points
- Single-QTL scan (i.e., interval mapping)
  - Loci with marginal effects should appear
- 2-dim, 2-QTL scan
  - Ability to separate linked QTL
  - Identify interacting loci

Example: 1d scan
[Figure: LOD score (y-axis) from a single-QTL genome scan across chromosomes 1 to 19 (x-axis), with labeled significance threshold lines.]

Example: 2d scan
[Figure: results of the two-dimensional, two-QTL scan; both axes are chromosomes 1 to 19.]

LOD_i and LOD_fv1
[Figure: two-dimensional scan results for selected chromosomes, showing LOD_i and LOD_fv1; both axes are labeled Chromosome.]

LOD_av1 and LOD_fv1
[Figure: two-dimensional scan results for selected chromosomes, showing LOD_av1 and LOD_fv1; both axes are labeled Chromosome.]

QTL effects
[Figure: three panels of average phenotype (y-axis) by genotype, A or H (x-axis): one single-marker effect plot and two two-marker interaction plots.]

Exploratory methods
- Condition on a large-effect QTL
  - Reduce residual variation
  - Conditional LOD score: $\mathrm{LOD}(q_2 \mid q_1) = \log_{10} \left\{ \frac{\Pr(\text{data} \mid q_1, q_2)}{\Pr(\text{data} \mid q_1)} \right\}$
- Piece together the putative QTL from the 1d and 2d scans
  - Omit loci that no longer look interesting (drop-one-at-a-time analysis)
  - Study potential interactions among the identified loci
  - Scan for additional loci (perhaps allowing interactions), conditional on these
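When the residuals are assumed normal, each likelihood in the conditional LOD reduces to a function of the residual sum of squares, so the score can be computed from two model fits. The sketch below illustrates that identity; the function name `lod` and all numeric values are hypothetical and are not taken from the slides.

```python
import math

def lod(rss_reduced, rss_full, n):
    """LOD score comparing two nested normal linear models fit to n individuals.

    Assumes normally distributed residuals, so that the log10 likelihood
    ratio equals (n/2) * log10(RSS_reduced / RSS_full).
    """
    return (n / 2.0) * math.log10(rss_reduced / rss_full)

# Hypothetical residual sums of squares (illustrative values only).
n = 100
rss_null = 500.0     # no QTL
rss_q1 = 400.0       # QTL q1 only
rss_q1_q2 = 340.0    # QTL q1 and q2

lod_q1 = lod(rss_null, rss_q1, n)          # q1 versus the null model
lod_q1_q2 = lod(rss_null, rss_q1_q2, n)    # (q1, q2) versus the null model
cond_lod = lod(rss_q1, rss_q1_q2, n)       # LOD(q2 | q1)

print(round(cond_lod, 3))
```

Under this assumption the conditional LOD is simply the difference between the two-QTL LOD and the single-QTL LOD, which is why conditioning on a large-effect QTL can be read as a change of baseline.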

Condition on D3M3
[Figure: LOD curves by chromosome (1 to 19), comparing interval mapping (IM) with a scan conditional on D3M3.]

Drop-one-at-a-time
[Table: chr, pos, df, LOD, and % var for each term in the fitted multiple-QTL model, followed by the overall LOD and % var.]

Refined positions
[Table: chr, pos, df, LOD, and % var for each term after refining the QTL positions, followed by the overall LOD and % var.]

Scan for further QTL
[Figure: LOD curves by chromosome (1 to 19), comparing interval mapping (IM) with a scan conditional on the identified QTL.]

Composite interval mapping
- Identify a set of markers, $S = \{x_1, x_2, \ldots, x_k\}$, as proxies for QTL
- Scan the genome, using these markers as covariates
- What do we do in the case of missing marker genotype data?
- At a position $q$ far from any of the marker covariates, compare $S \cup \{q\}$ and $S$
- Within some fixed window of a marker covariate $x$, compare $(S \setminus \{x\}) \cup \{q\}$ and $S \setminus \{x\}$
- The key issue: How to select $S$?
- QTL Cartographer: forward selection to some fixed number of markers

CIM results
[Figure: four panels (1, 3, 5, and 7 marker covariates), each comparing IM and CIM LOD curves by chromosome.]

Perfect data situation
To ease discussion, we'll focus on a simple special case:
- Complete marker genotype data
- The markers are the only putative QTL
- Normally distributed residuals
Example model (in a backcross): $y_i = \mu + \beta_1 q_{i1} + \beta_2 q_{i2} + \beta_3 q_{i3} + \epsilon_i$, with $\epsilon_i \sim N(0, \sigma^2)$
- The $q_{ij}$ are 0/1 variables (QTL genotypes)
- $\mu$ and the $\beta$s are parameters, estimated by least squares
- Fitted values: $\hat{y}_i = \hat{\mu} + \hat{\beta}_1 q_{i1} + \hat{\beta}_2 q_{i2} + \hat{\beta}_3 q_{i3}$
- $\mathrm{RSS} = \sum_i (y_i - \hat{y}_i)^2$ indicates model fit.
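As an illustration of the least squares fit and the RSS for the three-QTL backcross model, here is a minimal sketch using only the Python standard library. The helper names `solve` and `fit` and the toy data are assumptions made for this example; a real analysis would use dedicated QTL software.

```python
from itertools import product

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit(q, y):
    """Least squares fit of y_i = mu + sum_j beta_j q_ij + e_i.

    q: rows of 0/1 QTL genotypes; y: phenotypes.
    Returns ([mu, beta_1, ...], RSS).
    """
    x = [[1.0] + [float(g) for g in row] for row in q]   # intercept column
    p = len(x[0])
    xtx = [[sum(r[j] * r[k] for r in x) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(x, y)) for j in range(p)]
    coef = solve(xtx, xty)                               # normal equations
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in x]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return coef, rss

# Toy data: every genotype combination twice, with residuals +1 and -1.
q = [g for g in product([0, 1], repeat=3) for _ in range(2)]
noise = [1.0, -1.0] * 8
y = [50.0 + 10.0 * a + 5.0 * b + 0.0 * c + e for (a, b, c), e in zip(q, noise)]

coef, rss = fit(q, y)
print([round(v, 6) for v in coef], round(rss, 6))
```

The toy residuals are balanced within each genotype class, so the estimates recover the generating coefficients and the RSS equals the sum of the squared residuals.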

The problem
Consider all putative QTL and QTL × QTL interactions: $y = \mu + \sum_j \beta_j q_j + \sum_{j<k} \gamma_{jk} q_j q_k + \epsilon$
- Which $\beta_j \neq 0$?
- Which $\gamma_{jk} \neq 0$?

Model selection
- Class of models
  - Additive models
  - + pairwise interactions
  - + higher-order interactions
- Model comparison
  - Estimated prediction error
  - AIC, BIC, penalized likelihood
  - Bayes
- Model fit
  - Maximum likelihood
  - Haley-Knott regression
  - Extended Haley-Knott
  - Multiple imputation
  - MCMC
- Model search
  - Forward selection
  - Backward elimination
  - Stepwise selection
  - Randomized algorithms

Intercross: class of models
- Always bring in both degrees of freedom with a QTL, or try to distinguish additivity/dominance/recessiveness?
- Always bring in all four d.f. with a QTL:QTL interaction, or try to distinguish ways to subdivide the 3 × 3 table?
[Diagrams: groupings of the genotypes A, H, and B for a single QTL and for the 3 × 3 two-QTL table.]

Regression tree
[Figure: a 3 × 3 table of mean phenotypes by genotype (A, H, B) at QTL 1 and QTL 2, with the corresponding regression tree splitting on A versus H or B.]

Model comparison
- Imagine you could fit all possible models; which would you like best? This issue is like that of LOD thresholds, but more complex.
- For models with the same number of parameters (QTLs and interactions), we prefer the one with the best fit (smallest RSS or largest likelihood).
- If you fit more parameters, you'll get a better fit. How much better must the fit be before including additional terms?
- I like a form of penalized likelihood.

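The additive QTL case
- n backcross mice; M markers
- $x_{ij}$ = genotype (1/0) of mouse i at marker j
- $y_i$ = phenotype (trait value) of mouse i
$y_i = \mu + \sum_{j=1}^{M} \beta_j x_{ij} + \epsilon_i$. Which $\beta_j \neq 0$?
$\mathrm{BIC}_\delta = \log \mathrm{RSS} + \text{no. markers} \times \left( \delta \, \frac{\log n}{n} \right)$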
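To make the penalty concrete, the following sketch evaluates BIC_δ for a few candidate models. The function name `bic_delta` and the RSS values are hypothetical. Natural logarithms are used for both terms; using base 10 for both would rescale the criterion without changing which model is ranked best.

```python
import math

def bic_delta(rss, n_markers, n, delta=1.0):
    """BIC_delta = log(RSS) + (no. markers) * delta * log(n) / n; smaller is better."""
    return math.log(rss) + n_markers * delta * math.log(n) / n

n = 200  # number of backcross individuals (illustrative)

# Hypothetical fits: a 2-marker model and two possible 3-marker extensions.
two_markers = bic_delta(380.0, 2, n)
small_gain = bic_delta(376.0, 3, n)   # third marker barely reduces the RSS
large_gain = bic_delta(350.0, 3, n)   # third marker reduces the RSS substantially

print(round(two_markers, 4), round(small_gain, 4), round(large_gain, 4))
```

In this toy comparison the third marker is worth adding only when the drop in log RSS exceeds the per-marker penalty δ log(n) / n.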

Choice of δ
- Smaller δ: include more loci; higher false positive rate
- Larger δ: include fewer loci; lower false positive rate
- Let T = 95% genome-wide LOD threshold (compare single-QTL models to the null model)
- Choose $\delta = 2T / \log_{10} n$
- With this choice of δ, in the absence of QTLs, we'll include at least one extraneous locus 5% of the time.
- Note that now we have $\mathrm{BIC}_\delta = \log_{10} \mathrm{RSS} + \text{no. markers} \times \left( \frac{2T}{n} \right)$, so minimizing $\mathrm{BIC}_\delta$ is equivalent to maximizing $\mathrm{LOD} - \text{no. markers} \times T$.
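The link between BIC_δ and a penalized LOD score can be written out in a few lines. The derivation below is a sketch that takes $\delta = 2T / \log_{10} n$ and assumes normally distributed residuals, so that $\mathrm{LOD}(\gamma) = \frac{n}{2} \log_{10}\{\mathrm{RSS}_0 / \mathrm{RSS}(\gamma)\}$. Here $p$ denotes the number of markers in model $\gamma$ and $\mathrm{RSS}_0$ the residual sum of squares of the null model; both symbols are introduced for this illustration.

```latex
\begin{align*}
\mathrm{BIC}_\delta(\gamma)
  &= \log_{10} \mathrm{RSS}(\gamma) + p \, \frac{\delta \log_{10} n}{n}
   = \log_{10} \mathrm{RSS}(\gamma) + p \, \frac{2T}{n} \\
\mathrm{LOD}(\gamma)
  &= \frac{n}{2} \log_{10} \frac{\mathrm{RSS}_0}{\mathrm{RSS}(\gamma)}
  \;\Longrightarrow\;
  \log_{10} \mathrm{RSS}(\gamma) = \log_{10} \mathrm{RSS}_0 - \frac{2}{n} \, \mathrm{LOD}(\gamma) \\
\mathrm{BIC}_\delta(\gamma)
  &= \log_{10} \mathrm{RSS}_0 - \frac{2}{n} \bigl( \mathrm{LOD}(\gamma) - p \, T \bigr)
\end{align*}
```

Since $\log_{10} \mathrm{RSS}_0$ does not depend on the model, the model with the smallest $\mathrm{BIC}_\delta$ is the one with the largest $\mathrm{LOD}(\gamma) - p\,T$: each added marker must raise the LOD score by at least $T$ to be retained.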

Model search
- Consider the case of additive QTL models, with 100 putative QTLs. There are $2^{100} \approx 10^{30}$ possible models, far more than can be inspected individually.
- Need a way to search through this space, to find the good ones.
- This is really a matter of grunt work. (More is better; the tradeoff is with computational time.)
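One simple way to search the model space is greedy forward selection, scoring each model on the path with BIC_δ. The sketch below is one possible implementation under simplifying assumptions: additive effects only, complete 0/1 genotype data, and, in the toy data, independent marker columns rather than linked markers. The function name `forward_selection` and the simulated effects are hypothetical.

```python
import math
import random

def forward_selection(x, y, delta=1.0):
    """Greedy forward selection of marker columns; returns the best model on the
    path (list of column indices) and its BIC_delta."""
    n, m = len(y), len(x[0])
    ybar = sum(y) / n
    resid = [v - ybar for v in y]            # centring accounts for the intercept
    cols = []
    for j in range(m):
        cbar = sum(row[j] for row in x) / n
        cols.append([row[j] - cbar for row in x])
    rss = sum(r * r for r in resid)
    penalty = delta * math.log(n) / n
    chosen, remaining = [], set(range(m))
    best_model, best_bic = [], math.log(rss)

    def gain(j):
        # Drop in RSS from adding column j, which is kept orthogonal to chosen columns.
        norm = sum(c * c for c in cols[j])
        if norm < 1e-9:
            return 0.0
        dot = sum(c * r for c, r in zip(cols[j], resid))
        return dot * dot / norm

    while remaining:
        j = max(sorted(remaining), key=gain)
        g = gain(j)
        if g <= 0.0 or rss - g <= 1e-9:
            break
        norm = sum(c * c for c in cols[j])
        b = sum(c * r for c, r in zip(cols[j], resid)) / norm
        resid = [r - b * c for r, c in zip(resid, cols[j])]
        remaining.remove(j)
        for k in remaining:                  # orthogonalise the other candidates
            proj = sum(a * c for a, c in zip(cols[k], cols[j])) / norm
            cols[k] = [a - proj * c for a, c in zip(cols[k], cols[j])]
        rss -= g
        chosen.append(j)
        bic = math.log(rss) + len(chosen) * penalty
        if bic < best_bic:
            best_model, best_bic = list(chosen), bic
    return best_model, best_bic

# Toy data: 10 independent markers, true effects at columns 2 and 7.
rng = random.Random(1)
n, m = 200, 10
x = [[rng.randint(0, 1) for _ in range(m)] for _ in range(n)]
y = [10.0 * row[2] + 6.0 * row[7] + rng.gauss(0.0, 1.0) for row in x]

model, bic = forward_selection(x, y, delta=1.0)
print(sorted(model), round(bic, 3))
```

Forward selection fits on the order of M models per step instead of all 2^M, at the cost of never revisiting earlier choices; backward elimination or randomized searches trade more computation for a wider look at the space.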

Target
- Selection of a model includes two types of errors:
  - Miss important terms (QTLs or interactions)
  - Include extraneous terms
- Unlike in hypothesis testing, we can make both errors at the same time!
- Identify as many correct terms as possible, while controlling the rate of inclusion of extraneous terms.
- You can't know the performance of your procedure with your data; for that you would need to know the truth.
- You can know:
  - How a particular procedure performs in simulated cases
  - How a procedure performs in simulated data close to what you've inferred

What is special here?
- Goal: identify the major players
- A continuum of ordinal-valued covariates (the genetic loci)
- Association among the covariates
  - Loci on different chromosomes are independent
  - Along a chromosome, a very simple (and known) correlation structure
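The known correlation structure can be made explicit for a backcross. With 0/1 genotypes and recombination fraction r between two loci, the genotype correlation is 1 - 2r. The sketch below additionally assumes no crossover interference (the Haldane map function), which is an assumption of this example; the function names are hypothetical.

```python
import math

def haldane_rf(d_cm):
    """Recombination fraction for map distance d_cm (in cM), Haldane map function."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def backcross_corr(d_cm):
    """Correlation between 0/1 backcross genotypes at two loci d_cm apart."""
    return 1.0 - 2.0 * haldane_rf(d_cm)

for d in (0.0, 10.0, 50.0, 100.0):
    print(d, round(haldane_rf(d), 4), round(backcross_corr(d), 4))
```

Under these assumptions the correlation decays as exp(-d/50) with distance d in cM, and loci on different chromosomes (r = 1/2) are uncorrelated.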

A simulation study
- Backcross with n=250
- No crossover interference
- 9 chr, each 100 cM
- Markers at 10 cM spacing; complete genotype data
- 7 QTL
  - One pair in coupling
  - One pair in repulsion
  - Three unlinked QTL
- Heritability = 50%
- 2000 simulation replicates
[Figure: genetic map of chromosomes 1 to 9.]
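A design of this kind can be simulated in a few lines. The sketch below generates backcross genotypes on 9 chromosomes of 100 cM with markers every 10 cM and no crossover interference, then adds a phenotype with a target heritability. The QTL positions and effect sizes are hypothetical placeholders (the slides do not give them), the sample size is arbitrary, and the function names are invented for this example.

```python
import math
import random

def simulate_backcross(n, n_chr=9, chr_len=100.0, spacing=10.0, seed=0):
    """Return geno[i][c][k]: 0/1 genotype of individual i at marker k of chromosome c."""
    rng = random.Random(seed)
    n_mar = int(round(chr_len / spacing)) + 1
    # Recombination fraction between adjacent markers (Haldane, no interference).
    r = 0.5 * (1.0 - math.exp(-2.0 * spacing / 100.0))
    geno = []
    for _ in range(n):
        ind = []
        for _ in range(n_chr):
            chrom = [rng.randint(0, 1)]
            for _ in range(n_mar - 1):
                g = chrom[-1]
                chrom.append(1 - g if rng.random() < r else g)
            ind.append(chrom)
        geno.append(ind)
    return geno

def simulate_phenotype(geno, qtl, h2=0.5, seed=1):
    """Additive QTL effects plus normal noise scaled to heritability h2.

    qtl: list of (chromosome index, marker index, effect).
    """
    rng = random.Random(seed)
    gval = [sum(eff * ind[c][k] for c, k, eff in qtl) for ind in geno]
    mean = sum(gval) / len(gval)
    var_g = sum((v - mean) ** 2 for v in gval) / len(gval)
    sd_e = math.sqrt(var_g * (1.0 - h2) / h2)
    return [v + rng.gauss(0.0, sd_e) for v in gval]

# Hypothetical QTL: a pair in coupling, a pair in repulsion, three unlinked.
qtl = [(0, 3, 1.0), (0, 7, 1.0),     # coupling: same-sign effects, same chromosome
       (1, 3, 1.0), (1, 7, -1.0),    # repulsion: opposite-sign effects
       (2, 5, 1.0), (3, 5, 1.0), (4, 5, 1.0)]

geno = simulate_backcross(n=200)
y = simulate_phenotype(geno, qtl, h2=0.5)
print(len(geno), len(geno[0]), len(geno[0][0]), len(y))
```

Repeating the simulation with different seeds and applying each selection procedure to every replicate is how performance measures such as the average number of correct and extraneous loci can be estimated.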

Methods
- ANOVA at marker loci
- Composite interval mapping (CIM)
- Forward selection with permutation tests
- Forward selection with BIC_δ
- Backward elimination with BIC_δ
- FS followed by BE with BIC_δ
- MCMC with BIC_δ
A selected marker was deemed correct if it was within 10 cM of a QTL (i.e., correct or adjacent).
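The scoring rule for selected markers can be sketched as a small classifier. The 10 cM window follows the text; the split of incorrect markers into "linked" (on a chromosome that carries a QTL) and "unlinked" (on a chromosome without one) is an assumption of this example, as are the function name and the toy positions. Counting markers rather than detected QTL is also a simplification.

```python
def classify(selected, qtl, window=10.0):
    """Count selected markers as correct, extraneous linked, or extraneous unlinked.

    selected, qtl: lists of (chromosome, position in cM).
    """
    qtl_chr = {c for c, _ in qtl}
    counts = {"correct": 0, "extraneous_linked": 0, "extraneous_unlinked": 0}
    for c, pos in selected:
        if any(c == qc and abs(pos - qp) <= window for qc, qp in qtl):
            counts["correct"] += 1
        elif c in qtl_chr:
            counts["extraneous_linked"] += 1
        else:
            counts["extraneous_unlinked"] += 1
    return counts

# Hypothetical truth and selection result.
qtl = [(1, 30.0), (1, 70.0), (3, 50.0)]
selected = [(1, 40.0), (1, 100.0), (3, 50.0), (6, 20.0)]

counts = classify(selected, qtl)
print(counts)
```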

Correct
[Figure: average number chosen (y-axis) by method (x-axis): ANOVA; CIM (3, 5, 7, 9, 11); forward selection with permutation tests (fs, perm); and BIC_δ with fs, be, fs/be, and mcmc.]

Extraneous linked
[Figure: average number chosen (y-axis) by method (x-axis), same methods as for the "Correct" figure.]

Extraneous unlinked
[Figure: average number chosen (y-axis) by method (x-axis), same methods as for the "Correct" figure.]

QTL in coupling
[Figure: average number chosen (y-axis) by method (x-axis), same methods as for the "Correct" figure.]

QTL in repulsion
[Figure: average number chosen (y-axis) by method (x-axis), same methods as for the "Correct" figure.]

Unlinked QTL
[Figure: average number chosen (y-axis) by method (x-axis), same methods as for the "Correct" figure.]

Epistasis
- γ = model; $|\gamma|_m$ = no. main effects; $|\gamma|_i$ = no. interactions
- Additive QTL case: $\mathrm{LOD}(\gamma) - |\gamma|_m T_m$
- With pairwise interactions: $\mathrm{LOD}(\gamma) - |\gamma|_m T_m - |\gamma|_i T_i$
- Need a more complex penalty on interactions.
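The penalized LOD criterion with separate penalties for main effects and interactions is simple to evaluate once the thresholds are fixed. In this sketch the threshold values T_m and T_i and the model LOD scores are hypothetical numbers chosen for illustration; in practice the thresholds would come from permutation or simulation results.

```python
def penalized_lod(lod, n_main, n_int, t_main, t_int):
    """Penalized LOD: LOD(model) - (no. main effects) * T_m - (no. interactions) * T_i."""
    return lod - n_main * t_main - n_int * t_int

T_M, T_I = 3.0, 4.0   # hypothetical penalties for main effects and interactions

# Hypothetical models: three additive QTL, with and without one interaction.
additive = penalized_lod(12.0, n_main=3, n_int=0, t_main=T_M, t_int=T_I)
with_interaction = penalized_lod(15.0, n_main=3, n_int=1, t_main=T_M, t_int=T_I)

print(additive, with_interaction)
```

Here the interaction raises the LOD by 3, which is less than its penalty of 4, so the additive model scores higher.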

Models as graphs
[Figure: a model drawn as a graph on the nodes A, B, C, and D.]

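Bayesian methods
- The likelihood function: $L_\gamma(\theta) = \Pr(\text{data} \mid \theta, \gamma)$, where γ = model and θ = parameters
- Frequentists
  - $\hat{L}_\gamma = \max_\theta L_\gamma(\theta)$
  - Penalize model complexity
- Bayesians
  - Prior: $\Pr(\gamma)$, $\Pr(\theta \mid \gamma)$
  - Posterior: $\Pr(\gamma \mid \text{data}) \propto \Pr(\gamma) \int \Pr(\theta \mid \gamma) \, L_\gamma(\theta) \, d\theta$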
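Once the integral over θ (the marginal likelihood of each model) is available, the posterior over models follows by normalizing prior times marginal likelihood. The sketch below assumes the log marginal likelihoods have already been computed by some other means; the numeric values, the uniform prior, and the function name are hypothetical.

```python
import math

def posterior_model_probs(log_marginal, prior):
    """Posterior model probabilities from log marginal likelihoods and prior probabilities.

    Works on the log scale and subtracts the maximum to avoid numerical underflow.
    """
    log_post = [lm + math.log(p) for lm, p in zip(log_marginal, prior)]
    top = max(log_post)
    weights = [math.exp(v - top) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]

# Three hypothetical models with a uniform prior.
log_marginal = [-120.0, -118.0, -119.0]
prior = [1.0 / 3.0] * 3

probs = posterior_model_probs(log_marginal, prior)
print([round(p, 4) for p in probs])
```

Normalizing over an explicit list is only feasible for a handful of models; with a model space as large as the one considered in QTL mapping, the posterior is typically explored by sampling (for example MCMC) rather than by enumeration.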

Summary
- QTL mapping is a model selection problem (rather than hypothesis testing).
- Model selection =
  - Select a class of models
  - Select a method for fitting models
  - Select a criterion for comparing models
  - Select a method of searching model space
- Key issue: the comparison of models.
- Large-scale computer simulations are necessary for assessing the performance of procedures.

References
- Broman KW, Speed TP (2002) A model selection approach for the identification of quantitative trait loci in experimental crosses (with discussion). J R Statist Soc B 64:641-656, 737-775
  Contains the simulation study described above.
- Zeng Z-B, Kao C-H, Basten CJ (1999) Estimating the genetic architecture of quantitative traits. Genet Res 74:279-289
  Another paper on the model selection aspects of QTL mapping.
- Sillanpää MJ, Corander J (2002) Model choice in gene mapping: what and why. Trends Genet 18:301-307
  A good review of model selection in QTL mapping.
- Miller AJ (2002) Subset selection in regression, 2nd edition. Chapman & Hall, New York
  A good book on model selection.
- Yi N (2004) A unified Markov chain Monte Carlo framework for mapping multiple quantitative trait loci. Genetics 167:967-975
  A Bayesian approach for QTL mapping.
- Yi N, Yandell BS, Churchill GA, Allison DB, Eisen EJ, Pomp D (2005) Bayesian model selection for genome-wide epistatic QTL analysis. Genetics 170:1333-1344
  A Bayesian approach for identifying interacting QTL.