Mapping multiple QTL in experimental crosses

Karl W Broman
Department of Biostatistics & Medical Informatics
University of Wisconsin, Madison
www.biostat.wisc.edu/~kbroman

Human vs mouse
[Figure; image credit: www.daviddeen.com]

Backcross
[Figure: backcross breeding scheme, from the parental strains (P) through the F1 to the backcross (BC) generation]

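Intercross
[Figure: intercross breeding scheme, from the parental strains (P) through the F1 to the F2 generation]

Genetic map
[Figure: marker locations (cM) by chromosome, including the X chromosome]

Data
Sugiyama et al. Genomics 71:70-77, 2001
250 male mice from the backcross (A × B) × B
Blood pressure after two weeks drinking water with 1% NaCl

Phenotype data
[Figure: distribution of blood pressure]

Genotype data
[Figure: genotypes, individuals by markers]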

Goals
Identify quantitative trait loci (QTL) (and interactions among QTL)
Interval estimates of QTL location
Estimated QTL effects

Statistical structure
[Diagram: markers, QTL, covariates and phenotype]
The missing data problem: markers ↔ QTL
The model selection problem: QTL, covariates → phenotype

ANOVA at marker loci
Split mice into groups according to genotype at a marker.
Do a t-test / ANOVA.
Repeat for each marker.
[Figure: blood pressure by genotype (BB, AB) at two markers]

Interval mapping
Lander & Botstein (1989)
Assume a single QTL model.
Consider each position in the genome, one at a time, as the location of the putative QTL.
Let $q$ code the (unobserved) QTL genotype, with one value per genotype class: BB/AB in a backcross, or AA/AB/BB in an intercross.
Assume $y \mid q \sim N(\mu_q, \sigma)$.
Calculate $p_q = \Pr(q \mid \text{marker data})$.
Then $y \mid \text{marker data} \sim \sum_q p_q \, \phi(y \mid \mu_q, \sigma)$.
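The two interval-mapping quantities above, $p_q$ and the mixture density, can be sketched for a backcross as follows. This is a minimal illustration, not the deck's implementation. It assumes the Haldane map function (no crossover interference), codes genotypes as 1 for AB and 0 for BB, and treats $\sigma$ as a standard deviation. All function names are hypothetical.

```python
import math

def haldane_rf(d_cm):
    """Recombination fraction for a map distance in cM (Haldane map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def prob_qtl_ab(left, right, d_left, d_right):
    """Pr(QTL genotype is AB | flanking marker genotypes) in a backcross.

    left, right: flanking marker genotypes, 1 = AB, 0 = BB (assumed coding).
    d_left, d_right: distance in cM from the putative QTL to each marker.
    """
    r_l, r_r = haldane_rf(d_left), haldane_rf(d_right)

    def joint(q):
        # Pr(left, right | QTL genotype q); the prior Pr(q) = 1/2 cancels.
        p_l = (1.0 - r_l) if left == q else r_l
        p_r = (1.0 - r_r) if right == q else r_r
        return p_l * p_r

    return joint(1) / (joint(1) + joint(0))

def normal_pdf(y, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_density(y, p_ab, mu_bb, mu_ab, sigma):
    """Density of y given marker data: sum over q of p_q * phi(y | mu_q, sigma)."""
    return (1.0 - p_ab) * normal_pdf(y, mu_bb, sigma) + p_ab * normal_pdf(y, mu_ab, sigma)
```

For example, `prob_qtl_ab(1, 0, 5.0, 15.0)` gives the probability that the QTL genotype is AB for a mouse that is AB at the left marker and BB at the right marker, with the putative QTL 5 cM from the left marker.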

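LOD scores
LOD($\lambda$) = $\log_{10}$ likelihood ratio comparing the hypothesis of a QTL at position $\lambda$ versus that of no QTL:
$\mathrm{LOD}(\lambda) = \log_{10} \left\{ \frac{\Pr(y \mid \text{QTL at } \lambda, \hat\mu_{q\lambda}, \hat\sigma_\lambda)}{\Pr(y \mid \text{no QTL}, \hat\mu, \hat\sigma)} \right\}$
$\hat\mu_{q\lambda}$, $\hat\sigma_\lambda$ are the MLEs, assuming a single QTL at position $\lambda$.
No QTL model: the phenotypes are iid $N(\mu, \sigma^2)$.

Permutation test
[Diagram: genotype data (individuals by markers) and phenotypes give LOD scores, summarized by the maximum LOD score]

LOD curves
[Figure: LOD score by chromosome]

Permutation results
[Figure: distribution of the genome-wide maximum LOD score]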
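The LOD score and the permutation test can be sketched together. This is a simplified illustration under stated assumptions: LOD scores are computed only at markers (the ANOVA-at-markers version, not interval mapping), genotypes are complete, and the toy data use unlinked markers. The identity LOD = (n/2) log10(RSS0/RSS1) follows from the normal model with maximum likelihood estimates. Function names and the simulated data are hypothetical.

```python
import math
import random

def lod_at_marker(genotypes, phenotypes):
    """log10 likelihood ratio: separate mean per genotype group vs. one common mean."""
    n = len(phenotypes)
    grand = sum(phenotypes) / n
    rss0 = sum((y - grand) ** 2 for y in phenotypes)
    rss1 = 0.0
    for g in set(genotypes):
        ys = [y for gi, y in zip(genotypes, phenotypes) if gi == g]
        m = sum(ys) / len(ys)
        rss1 += sum((y - m) ** 2 for y in ys)
    return (n / 2.0) * math.log10(rss0 / rss1)

def max_lod(marker_genotypes, phenotypes):
    """Genome-wide maximum LOD score over all markers."""
    return max(lod_at_marker(g, phenotypes) for g in marker_genotypes)

def permutation_threshold(marker_genotypes, phenotypes, n_perm=200, level=0.95, seed=1):
    """Shuffle phenotypes relative to genotypes; return a percentile of the maximum LOD."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        maxima.append(max_lod(marker_genotypes, shuffled))
    maxima.sort()
    rank = min(n_perm, max(1, math.ceil(level * n_perm)))  # nearest-rank percentile
    return maxima[rank - 1]

def simulate_backcross(n=120, n_markers=6, effect=1.5, seed=0):
    """Toy data: unlinked 0/1 markers, with a QTL sitting exactly at the first marker."""
    rng = random.Random(seed)
    markers = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_markers)]
    phenotypes = [effect * markers[0][i] + rng.gauss(0.0, 1.0) for i in range(n)]
    return markers, phenotypes
```

Shuffling the phenotypes breaks any genotype-phenotype association while preserving the correlation among markers, which is why the maximum LOD over the genome is the statistic recorded for each permutation.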

LOD curves
[Figure: LOD score by chromosome, with the 5% threshold marked]

Modeling multiple QTL
Reduce residual variation → increased power
Separate linked QTL
Identify interactions among QTL (epistasis)

Estimated effects
[Figure: blood pressure by genotype (BB, BA) at four loci]

2-dim, 2-QTL scan
For each pair of positions $(s, t)$, fit the following models:
$H_f: y = \mu + \beta_1 q_1 + \beta_2 q_2 + \gamma q_1 q_2 + \epsilon$
$H_a: y = \mu + \beta_1 q_1 + \beta_2 q_2 + \epsilon$
$H_1: y = \mu + \beta_1 q_1 + \epsilon$
$H_0: y = \mu + \epsilon$
$\mathrm{LOD}_f(s, t) = \log_{10} \{ L_f(s, t) / L_0 \}$
$\mathrm{LOD}_i(s, t) = \log_{10} \{ L_f(s, t) / L_a(s, t) \}$

LOD_i and LOD_f
[Figure: two-dimensional scan by chromosome, showing LOD_i and LOD_f]
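For one pair of positions, the four models and the resulting LOD scores can be sketched with ordinary least squares. This is an illustration under simplifying assumptions: the genotypes `q1` and `q2` are fully observed 0/1 vectors that are not collinear (in a real scan they are inferred from marker data), and the small normal-equations solver stands in for a proper linear algebra routine. `LOD_a` and `LOD_1` are defined here by analogy with `LOD_f`; all names are hypothetical.

```python
import math

def rss(columns, y):
    """Residual sum of squares from regressing y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gaussian elimination.
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            for k in range(c, p + 1):
                A[r][k] -= f * A[c][k]
    beta = [0.0] * p
    for c in reversed(range(p)):
        beta[c] = (A[c][p] - sum(A[c][k] * beta[k] for k in range(c + 1, p))) / A[c][c]
    return sum((y[i] - sum(X[i][a] * beta[a] for a in range(p))) ** 2 for i in range(n))

def two_qtl_lods(q1, q2, y):
    """LOD scores for H_1, H_a and H_f against H_0, plus the interaction LOD."""
    n = len(y)
    interaction = [a * b for a, b in zip(q1, q2)]
    rss0 = rss([], y)

    def lod(rss_model):
        # Normal model with MLEs: LOD = (n/2) * log10(RSS_0 / RSS_model).
        return (n / 2.0) * math.log10(rss0 / rss_model)

    lod_1 = lod(rss([q1], y))
    lod_a = lod(rss([q1, q2], y))
    lod_f = lod(rss([q1, q2, interaction], y))
    return {"LOD_1": lod_1, "LOD_a": lod_a, "LOD_f": lod_f, "LOD_i": lod_f - lod_a}
```

Because the models are nested, LOD_f is at least LOD_a, so LOD_i is never negative; it measures how much the interaction term alone improves the fit.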

LOD_i and LOD_fv1
[Figure: two-dimensional scan showing LOD_i and LOD_fv1]

LOD_i and LOD_av1 for a single chromosome
[Figure: LOD_i and LOD_av1 by position (cM) on one chromosome]

Estimated effects
[Figure: blood pressure by genotype (BB, BA) for two pairs of loci]

Hypothesis testing?
In the past, QTL mapping has been regarded as a task of hypothesis testing.
  Is this a QTL?
Much of the focus has been on adjusting for test multiplicity.
It is better to view the problem as one of model selection.
  What set of QTL are well supported?
  Is there evidence for QTL-QTL interactions?
Model = a defined set of QTL and QTL-QTL interactions (and possibly covariates and QTL-covariate interactions).

Model selection
Class of models: additive models; + pairwise interactions; + higher-order interactions; regression trees
Model fit: maximum likelihood; Haley-Knott regression; extended Haley-Knott; multiple imputation; MCMC
Model comparison: estimated prediction error; AIC, BIC, penalized likelihood; Bayes
Model search: forward selection; backward elimination; stepwise selection; randomized algorithms

What is special here?
Goal: identify the major players
A continuum of ordinal-valued covariates (the genetic loci)
Association among the covariates
  Loci on different chromosomes are independent
  Along a chromosome, a very simple (and known) correlation structure

Target
Selection of a model includes two types of errors:
  Miss important terms (QTLs or interactions)
  Include extraneous terms
Unlike in hypothesis testing, we can make both errors at the same time.
Identify as many correct terms as possible, while controlling the rate of inclusion of extraneous terms.

Exploratory methods
Condition on a large-effect QTL
  Reduce residual variation
  Conditional LOD score: $\mathrm{LOD}(q_2 \mid q_1) = \log_{10} \left\{ \frac{\Pr(\text{data} \mid q_1, q_2)}{\Pr(\text{data} \mid q_1)} \right\}$
Piece together the putative QTL from the 1d and 2d scans
  Omit loci that no longer look interesting (drop-one-at-a-time analysis)
  Study potential interactions among the identified loci
  Scan for additional loci (perhaps allowing interactions), conditional on these
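One way to read the conditional LOD score: if both likelihoods are compared against the same no-QTL model, it is the difference of two ordinary LOD scores. The notation $\mathrm{LOD}(q_1, q_2)$ and $\mathrm{LOD}(q_1)$ for the two-QTL and one-QTL scores is introduced here for illustration and does not appear in the slides.

```latex
\begin{aligned}
\mathrm{LOD}(q_2 \mid q_1)
  &= \log_{10} \frac{\Pr(\text{data} \mid q_1, q_2)}{\Pr(\text{data} \mid q_1)} \\
  &= \log_{10} \frac{\Pr(\text{data} \mid q_1, q_2)}{\Pr(\text{data} \mid \text{no QTL})}
   - \log_{10} \frac{\Pr(\text{data} \mid q_1)}{\Pr(\text{data} \mid \text{no QTL})} \\
  &= \mathrm{LOD}(q_1, q_2) - \mathrm{LOD}(q_1)
\end{aligned}
```

So a second locus earns a large conditional LOD only to the extent that it improves on the model that already contains the first locus.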

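Automation
Assistance to the masses
Understanding performance
Many phenotypes

Experience
Controls rate of inclusion of extraneous terms
Forward selection over-selects
Forward selection followed by backward elimination works as well as MCMC
Need to define performance criteria
Need large-scale simulations
Broman & Speed, JRSS B 64:641-656, 2002

Additive QTL
Simple situation: dense markers; complete genotype data; no epistasis
$y = \mu + \sum_j \beta_j q_j + \epsilon$    Which $\beta_j \neq 0$?
$\mathrm{LOD}_\delta(\gamma) = \mathrm{LOD}(\gamma) - T |\gamma|$, where $|\gamma|$ is the number of QTL in model $\gamma$
0 vs 1 QTL: $\mathrm{LOD}_\delta(\emptyset) = 0$ and $\mathrm{LOD}_\delta(\{\lambda\}) = \mathrm{LOD}(\{\lambda\}) - T$

Epistasis
$y = \mu + \sum_j \beta_j q_j + \sum_{j<k} \gamma_{jk} q_j q_k + \epsilon$
$\mathrm{LOD}_{\delta\epsilon}(\gamma) = \mathrm{LOD}(\gamma) - (T_m |\gamma|_m + T_i |\gamma|_i)$, where $|\gamma|_m$ and $|\gamma|_i$ are the numbers of main effects and interactions in model $\gamma$
$T_m$ = as chosen previously
$T_i$ = ?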
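The penalized LOD criterion amounts to simple arithmetic once each candidate model's LOD score is known. The sketch below assumes one penalty per main effect and one per pairwise interaction. The LOD scores and penalty values in the usage example are made-up placeholders, not results from the slides, and the function names are hypothetical.

```python
def penalized_lod(lod, n_main, n_int, t_main, t_int):
    """LOD(gamma) - (T_m * |gamma|_m + T_i * |gamma|_i)."""
    return lod - (t_main * n_main + t_int * n_int)

def best_model(candidates, t_main, t_int):
    """Pick the candidate with the largest penalized LOD.

    candidates: list of (label, lod, n_main, n_int) tuples.
    """
    return max(candidates, key=lambda c: penalized_lod(c[1], c[2], c[3], t_main, t_int))

# Placeholder LOD scores for four nested models (illustrative values only).
candidates = [
    ("no QTL", 0.0, 0, 0),
    ("1 QTL", 4.0, 1, 0),
    ("2 QTL, additive", 6.0, 2, 0),
    ("2 QTL + interaction", 7.0, 2, 1),
]
chosen = best_model(candidates, t_main=3.0, t_int=2.5)
```

With these placeholder numbers the second QTL raises the LOD by 2.0, less than the main-effect penalty of 3.0, so the single-QTL model has the highest penalized LOD. The null model always scores 0, so a model is preferred to it only when its LOD exceeds its total penalty.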

Idea
Imagine there are two additive QTL and consider a 2d, 2-QTL scan.
$T_i$ = 95th percentile of the distribution of $\max \mathrm{LOD}_f(s, t) - \max \mathrm{LOD}_a(s, t)$
For the mouse genome:
  $T_m$ = 2.69 (BC) or 3.52 (F2)
  $T_i^H$ = 2.62 (BC) or 4.28 (F2)

Models as graphs
[Diagram: a model drawn as a graph on nodes A, B, C, D]

Idea
Imagine there is one QTL and consider a 2d, 2-QTL scan.
$T_m + T_i$ = 95th percentile of the distribution of $\max \mathrm{LOD}_f(s, t) - \max \mathrm{LOD}_1(s)$
For the mouse genome:
  $T_m$ = 2.69 (BC) or 3.52 (F2)
  $T_i^H$ = 2.62 (BC) or 4.28 (F2)
  $T_i^L$ = 1.19 (BC) or 2.69 (F2)

Results
[Figure: selected model drawn as a graph, annotated with LOD scores]
Backcross penalties: $T_m$ = 2.69, $T_i^H$ = 2.62, $T_i^L$ = 1.19, $T_m + T_i^H$ = 5.31, $T_m + T_i^L$ = 3.88
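The percentile definitions above can be turned into a small calculation once per-replicate maxima are available. The sketch assumes each input list holds one value per simulated or permuted data set (how those data sets are generated is outside the sketch), that $T_m$ is the same percentile of the single-QTL maximum, and that the two "Idea" slides correspond to the heavy and light interaction penalties respectively. The last point is an interpretation of the slides, and all names are hypothetical.

```python
import math

def percentile(values, level=0.95):
    """Nearest-rank percentile of a list of numbers."""
    ordered = sorted(values)
    rank = min(len(ordered), max(1, math.ceil(level * len(ordered))))
    return ordered[rank - 1]

def derive_penalties(max_lod1, max_lodf, max_loda, level=0.95):
    """Return (T_m, heavy T_i, light T_i) from per-replicate maximum LOD scores.

    max_lod1: maximum single-QTL LOD per replicate.
    max_lodf: maximum full-model LOD from the 2d scan per replicate.
    max_loda: maximum additive-model LOD from the 2d scan per replicate.
    """
    t_m = percentile(max_lod1, level)
    # Heavy penalty: percentile of max LOD_f - max LOD_a.
    t_i_heavy = percentile([f - a for f, a in zip(max_lodf, max_loda)], level)
    # Light penalty: percentile of max LOD_f - max LOD_1 equals T_m + T_i.
    t_i_light = percentile([f - o for f, o in zip(max_lodf, max_lod1)], level) - t_m
    return t_m, t_i_heavy, t_i_light
```

In practice the number of replicates would be in the thousands so that the 95th percentile is estimated stably; the tiny lists that suit a unit test are only for checking the arithmetic.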

Add an interaction?
Add another QTL?
Add another QTL?
Add a pair of QTL?
[Figures: candidate models drawn as graphs, each annotated with LOD scores and compared against the penalties]
Penalties shown on each slide: $T_m$ = 2.69, $T_i^H$ = 2.62, $T_i^L$ = 1.19, $T_m + T_i^H$ = 5.31, $T_m + T_i^L$ = 3.88

To do
Improve search procedures
Study performance (especially relative to other approaches)
Measuring model uncertainty
Measuring uncertainty in QTL location

Acknowledgments
Ani Manichaikul, Johns Hopkins University
Gary Churchill, Jackson Laboratory
Śaunak Sen, University of California, San Francisco
Terry Speed, University of California, Berkeley
Brian Yandell, University of Wisconsin, Madison
Fumihiro Sugiyama, now at University of Tsukuba, Japan
Bev Paigen, Jackson Laboratory

Summary
QTL mapping is a model selection problem
The criterion for comparing models is most important
We're focusing on a penalized likelihood method and are close to a practicable solution