R/qtl workshop. (part 2) Karl Broman. Biostatistics and Medical Informatics University of Wisconsin Madison. kbroman.org

Similar documents
Mapping multiple QTL in experimental crosses

Mapping multiple QTL in experimental crosses

Multiple QTL mapping

QTL Model Search. Brian S. Yandell, UW-Madison January 2017

Introduction to QTL mapping in model organisms

Statistical issues in QTL mapping in mice

Introduction to QTL mapping in model organisms

Gene mapping in model organisms

Mapping QTL to a phylogenetic tree

Introduction to QTL mapping in model organisms

Introduction to QTL mapping in model organisms

Introduction to QTL mapping in model organisms

QTL Mapping I: Overview and using Inbred Lines

Overview. Background

Causal Model Selection Hypothesis Tests in Systems Genetics

Selecting explanatory variables with the modified version of Bayesian Information Criterion

Quantile-based permutation thresholds for QTL hotspot analysis: a tutorial

Use of hidden Markov models for QTL mapping

Binary trait mapping in experimental crosses with selective genotyping

QTL model selection: key players

Model Selection for Multiple QTL

Lecture 8. QTL Mapping 1: Overview and Using Inbred Lines

One-week Course on Genetic Analysis and Plant Breeding January 2013, CIMMYT, Mexico LOD Threshold and QTL Detection Power Simulation

An applied statistician does probability:

Causal Graphical Models in Systems Genetics

How the mean changes depends on the other variable. Plots can show what s happening...

Variance Component Models for Quantitative Traits. Biostatistics 666

QTL model selection: key players

How to display data badly

Locating multiple interacting quantitative trait. loci using rank-based model selection

Inferring Causal Phenotype Networks from Segregating Populat

Inferring Genetic Architecture of Complex Biological Processes

Prediction of the Confidence Interval of Quantitative Trait Loci Location

Lecture 11: Multiple trait models for QTL analysis

TECHNICAL REPORT NO April 22, Causal Model Selection Tests in Systems Genetics 1

Multiple interval mapping for ordinal traits

Hierarchical Generalized Linear Models for Multiple QTL Mapping

Statistics 262: Intermediate Biostatistics Model selection

MLR Model Selection. Author: Nicholas G Reich, Jeff Goldsmith. This material is part of the statsteachr project

Tutorial Session 2. MCMC for the analysis of genetic data on pedigrees:

TECHNICAL REPORT NO December 1, 2008

A Statistical Framework for Expression Trait Loci (ETL) Mapping. Meng Chen

Quantitative Genetics. February 16, 2010

THE problem of identifying the genetic factors un- QTL models because of their ability to separate linked

MOST complex traits are influenced by interacting

Anumber of statistical methods are available for map- 1995) in some standard designs. For backcross populations,

Standard Errors & Confidence Intervals. N(0, I( β) 1 ), I( β) = [ 2 l(β, φ; y) β i β β= β j

Genome-wide Multiple Loci Mapping in Experimental Crosses by the Iterative Adaptive Penalized Regression

Users Manual of QTL IciMapping v3.1

Proportional Variance Explained by QLT and Statistical Power. Proportional Variance Explained by QTL and Statistical Power

Causal Model Selection Hypothesis Tests. in Systems Genetics

Eiji Yamamoto 1,2, Hiroyoshi Iwata 3, Takanari Tanabata 4, Ritsuko Mizobuchi 1, Jun-ichi Yonemaru 1,ToshioYamamoto 1* and Masahiro Yano 5,6

BAYESIAN MAPPING OF MULTIPLE QUANTITATIVE TRAIT LOCI

Case-Control Association Testing. Case-Control Association Testing

Biostatistics-Lecture 16 Model Selection. Ruibin Xi Peking University School of Mathematical Sciences

Lecture 2: Genetic Association Testing with Quantitative Traits. Summer Institute in Statistical Genetics 2017

Quantile based Permutation Thresholds for QTL Hotspots. Brian S Yandell and Elias Chaibub Neto 17 March 2012

Linkage analysis and QTL mapping in autotetraploid species. Christine Hackett Biomathematics and Statistics Scotland Dundee DD2 5DA

Linear Regression (1/1/17)

Affected Sibling Pairs. Biostatistics 666

MODEL-FREE LINKAGE AND ASSOCIATION MAPPING OF COMPLEX TRAITS USING QUANTITATIVE ENDOPHENOTYPES

. Also, in this case, p i = N1 ) T, (2) where. I γ C N(N 2 2 F + N1 2 Q)

A partial regression coefficient analysis framework to infer candidate genes potentially causal to traits in recombinant inbred lines

On the mapping of quantitative trait loci at marker and non-marker locations

Multiple linear regression S6

Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5

Lecture 9. QTL Mapping 2: Outbred Populations

Genotype Imputation. Biostatistics 666

Tail probability of linear combinations of chi-square variables and its application to influence analysis in QTL detection

Hotspots and Causal Inference For Yeast Data

(Genome-wide) association analysis

The E-M Algorithm in Genetics. Biostatistics 666 Lecture 8

Expression QTLs and Mapping of Complex Trait Loci. Paul Schliekelman Statistics Department University of Georgia

Consistent high-dimensional Bayesian variable selection via penalized credible regions

Lecture 8 Genomic Selection

MULTIPLE-TRAIT MULTIPLE-INTERVAL MAPPING OF QUANTITATIVE-TRAIT LOCI ROBY JOEHANES

Fast Bayesian Methods for Genetic Mapping Applicable for High-Throughput Datasets

The Admixture Model in Linkage Analysis

arxiv: v4 [stat.me] 2 May 2015

Methods for QTL analysis

Dimensionality Reduction Techniques (DRT)

Causal Network Models for Correlated Quantitative Traits. outline

Calculation of IBD probabilities

Niche Modeling. STAMPS - MBL Course Woods Hole, MA - August 9, 2016

1 Springer. Nan M. Laird Christoph Lange. The Fundamentals of Modern Statistical Genetics

CS6220: DATA MINING TECHNIQUES

Chapter 3. Linear Models for Regression

Calculation of IBD probabilities

Model Selection. Frank Wood. December 10, 2009

I Have the Power in QTL linkage: single and multilocus analysis

CS6220: DATA MINING TECHNIQUES

The genomes of recombinant inbred lines

Bayesian construction of perceptrons to predict phenotypes from 584K SNP data.

Final Review. Yang Feng. Yang Feng (Columbia University) Final Review 1 / 58

Sparse Linear Models (10/7/13)

Chapter 1 Statistical Inference

Linear Regression In God we trust, all others bring data. William Edwards Deming

Confirmatory Factor Analysis: Model comparison, respecification, and more. Psychology 588: Covariance structure and factor models

25 : Graphical induced structured input/output models

A new simple method for improving QTL mapping under selective genotyping

Transcription:

R/qtl workshop (part 2) Karl Broman Biostatistics and Medical Informatics University of Wisconsin Madison kbroman.org github.com/kbroman @kwbroman

Example Sugiyama et al. Genomics 71:70-77, 2001 250 male mice from the backcross (A B) B Blood pressure after two weeks drinking water with 1% NaCl 90 100 110 120 130 Blood pressure 2

Genetic map 0 20 40 Location (cm) 60 80 100 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 X Chromosome 3

Genotype data 250 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 1819 X 200 Individuals 150 100 50 50 100 150 Markers 4

Goals Identify quantitative trait loci (QTL) (and interactions among QTL) Interval estimates of QTL location Estimated QTL effects 5

LOD curves 8 6 LOD score 4 5% 2 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 Chromosome 6

Estimated effects Chr 1 @ 48 cm Chr 4 @ 30 cm Chr 6 @ 24 cm Chr 15 @ 20 cm 106 106 106 106 104 104 104 104 blood pressure 102 102 102 102 100 100 100 100 98 98 98 98 BB BA BB BA BB BA BB BA Genotype Genotype Genotype Genotype 7

Modeling multiple QTL Reduce residual variation increased power Separate linked QTL Identify interactions among QTL (epistasis) 8

2-dim, 2-QTL scan For all pairs of positions, fit the following models: H f : y = µ + β 1 q 1 + β 2 q 2 + γq 1 q 2 + ɛ H a : y = µ + β 1 q 1 + β 2 q 2 + +ɛ H 1 : y = µ + β 1 q 1 + ɛ H 0 : y = µ + ɛ log 10 likelihoods: l f (s, t) l a (s, t) l 1 (s) l 0 9

2-dim, 2-QTL scan LOD scores: LOD f (s, t) = l f (s, t) l 0 LOD a (s, t) = l a (s, t) l 0 LOD i (s, t) = l f (s, t) l a (s, t) LOD 1 (s) = l 1 (s) l 0 10

Results: LOD i and LOD f 19 18 17 16 15 14 13 12 11 3 10 Chromosome 10 9 8 7 6 5 4 3 2 1 5 2 1 0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16171819 Chromosome 11

Results: LOD i and LOD f 15 4 7 3 10 Chromosome 6 4 2 5 1 1 0 0 1 4 6 7 15 Chromosome 12

Summaries Consider each pair of chromosomes, (j, k), and let c(s) denote the chromosome for position s. M f (j, k) = M a (j, k) = M 1 (j, k) = max LOD f(s, t) c(s)=j,c(t)=k max LOD a(s, t) c(s)=j,c(t)=k max LOD 1(s) c(s)=j or k M i (j, k) = M f (j, k) M a (j, k) M fv1 (j, k) = M f (j, k) M 1 (j, k) M av1 (j, k) = M a (j, k) M 1 (j, k) 13

Results: LOD i and LOD fv1 15 4 6 7 3 Chromosome 6 4 2 4 2 1 1 0 0 1 4 6 7 15 Chromosome 14

[R/qtl] 15

Thresholds A pair of chromosomes (j, k) is considered interesting if: M f (j, k) > T f and { M fv1 (j, k) > T fv1 or M i (j, k) > T i } or M a (j, k) > T a and M av1 (j, k) > T av1 where the thresholds (T f, T fv1, T i, T a, T av1 ) are determined by a permutation test with a 2d scan 16

2d scan summary pos1f pos2f lod.full lod.fv1 lod.int c1:c4 71.3 30.0 14.36 6.78 0.27 c6:c15 55.0 20.5 6.91 4.95 2.92 c1:c1 39.3 78.3 5.10 1.58 0.09 pos1a pos2a lod.add lod.av1 c1:c4 68.3 30.0 14.09 6.50 c6:c15 24.0 22.5 3.99 2.03 c1:c1 48.3 79.3 5.02 1.50 17

Estimated effects 1 x 4 6 x 15 110 Chr 1 genotype 110 Chr 6 genotype BB BA BB BA blood pressure 105 100 105 100 95 95 BB BA BB BA Chr 4 genotype Chr 15 Genotype 18

Chr 1: LOD i and LOD av1 100 1.5 1.5 80 Position (cm) 60 1 1 40 0.5 0.5 20 20 40 60 80 100 Position (cm) 0 0 19

[R/qtl] 20

Hypothesis testing? In the past, QTL mapping has been regarded as a task of hypothesis testing. Is this a QTL? Much of the focus has been on adjusting for test multiplicity. It is better to view the problem as one of model selection. What set of QTL are well supported? Is there evidence for QTL-QTL interactions? Model = a defined set of QTL and QTL-QTL interactions (and possibly covariates and QTL-covariate interactions). 21

Model selection Class of models Additive models + pairwise interactions + higher-order interactions Regression trees Model comparison Estimated prediction error AIC, BIC, penalized likelihood Bayes Model fit Maximum likelihood Haley-Knott regression extended Haley-Knott Multiple imputation MCMC Model search Forward selection Backward elimination Stepwise selection Randomized algorithms 22

Target Selection of a model includes two types of errors: Miss important terms (QTLs or interactions) Include extraneous terms Unlike in hypothesis testing, we can make both errors at the same time. Identify as many correct terms as possible, while controlling the rate of inclusion of extraneous terms. 23

What is special here? Goal: identify the major players A continuum of ordinal-valued covariates (the genetic loci) Association among the covariates Loci on different chromosomes are independent Along chromosome, a very simple (and known) correlation structure 24

Exploratory methods Condition on a large-effect QTL Reduce residual variation Conditional LOD score: { } Pr(data q1, q 2 ) LOD(q 2 q 1 ) = log 10 Pr(data q 1 ) Piece together the putative QTL from the 1d and 2d scans Omit loci that no longer look interesting (drop-one-at-a-time analysis) Study potential interactions among the identified loci Scan for additional loci (perhaps allowing interactions), conditional on these 25

Controlling for chr 4 8 Interval mapping Control for chr 4 6 LOD score 4 2 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 Chromosome 26

Drop-one-QTL table df LOD %var 1@68.3 1 6.30 11.0 4@30.0 1 12.21 20.1 6@61.0 2 7.93 13.6 15@17.5 2 7.14 12.3 6@61.0 : 15@17.5 1 5.68 9.9 27

[R/qtl] 28

Automation Assistance to non-specialists Understanding performance Many phenotypes 29

Additive QTL Simple situation: Dense markers Complete genotype data No epistasis y = µ + β j q j + ɛ which β j 0? plod(γ) = LOD(γ) T γ 30

Additive QTL Simple situation: Dense markers Complete genotype data No epistasis y = µ + β j q j + ɛ which β j 0? plod(γ) = LOD(γ) T γ 0 vs 1 QTL: plod( ) = 0 plod({λ}) = LOD(λ) T 30

Additive QTL Simple situation: Dense markers Complete genotype data No epistasis y = µ + β j q j + ɛ which β j 0? plod(γ) = LOD(γ) T γ For the mouse genome: T = 2.69 (BC) or 3.52 (F 2 ) 30

Experience Controls rate of inclusion of extraneous terms Forward selection over-selects Forward selection followed by backward elimination works as well as MCMC Need to define performance criteria Need large-scale simulations Broman & Speed, JRSS B 64:641-656, 2002 31

[R/qtl] 32

Epistasis y = µ + β j q j + γ jk q j q k + ɛ plod(γ) = LOD(γ) T m γ m T i γ i T m = as chosen previously T i =? 33

Idea 1 Imagine there are two additive QTL and consider a 2d, 2-QTL scan. T i = 95th percentile of the distribution of max LOD f (s, t) max LOD a (s, t) 34

Idea 1 Imagine there are two additive QTL and consider a 2d, 2-QTL scan. T i = 95th percentile of the distribution of max LOD f (s, t) max LOD a (s, t) For the mouse genome: T m = 2.69 (BC) or 3.52 (F 2 ) T H i = 2.62 (BC) or 4.28 (F 2 ) 34

Idea 2 Imagine there is one QTL and consider a 2d, 2-QTL scan. T m + T i = 95th percentile of the distribution of max LOD f (s, t) max LOD 1 (s) 35

Idea 2 Imagine there is one QTL and consider a 2d, 2-QTL scan. T m + T i = 95th percentile of the distribution of max LOD f (s, t) max LOD 1 (s) For the mouse genome: T m = 2.69 (BC) or 3.52 (F 2 ) T H i = 2.62 (BC) or 4.28 (F 2 ) T L i = 1.19 (BC) or 2.69 (F 2 ) 35

Models as graphs A C B D 36

Results 1 4 LOD = 23.1 6 15 37

Results 6.3 1 4 LOD = 23.1 6 15 T m = 2.69 T i H = 2.62 T i L = 1.19 T m + T i H = 5.31 T m + T i L = 3.88 2T m = 5.38 37

Results 6.3 1 4 12.2 7.9 6 15 7.1 5.7 T m = 2.69 T i H = 2.62 T i L = 1.19 T m + T i H = 5.31 T m + T i L = 3.88 2T m = 5.38 37

Profile LOD curves 4@30.0 12 10 Profile LOD score 8 6 1@68.3 6@61.0 15@17.5 4 2 0 1 4 6 15 Chromosome 38

Drop-one-QTL table df LOD %var 1@68.3 1 6.30 11.0 4@30.0 1 12.21 20.1 6@61.0 2 7.93 13.6 15@17.5 2 7.14 12.3 6@61.0 : 15@17.5 1 5.68 9.9 39

Add an interaction? 1 4 6 15 0.6 T m = 2.69 T i H = 2.62 T i L = 1.19 T m + T i H = 5.31 T m + T i L = 3.88 2T m = 5.38 40

Add an interaction? 1 4 6 15 0.1 T m = 2.69 T i H = 2.62 T i L = 1.19 T m + T i H = 5.31 T m + T i L = 3.88 2T m = 5.38 40

Add an interaction? 1 4 6 15 0.4 T m = 2.69 T i H = 2.62 T i L = 1.19 T m + T i H = 5.31 T m + T i L = 3.88 2T m = 5.38 40

Add an interaction? 1 4 6 15 0.2 T m = 2.69 T i H = 2.62 T i L = 1.19 T m + T i H = 5.31 T m + T i L = 3.88 2T m = 5.38 40

Add an interaction? 1 4 6 15 0.0 T m = 2.69 T i H = 2.62 T i L = 1.19 T m + T i H = 5.31 T m + T i L = 3.88 2T m = 5.38 40

Add another QTL? 1.87 1b 1 4 1.52 2 6 15 1.62 5 T m = 2.69 T i H = 2.62 T i L = 1.19 T m + T i H = 5.31 T m + T i L = 3.88 2T m = 5.38 41

Add another QTL? 1 4 6 15 2.66 7 T m = 2.69 T i H = 2.62 T i L = 1.19 T m + T i H = 5.31 T m + T i L = 3.88 2T m = 5.38 41

Add another QTL? 1 4 2.82 4b 6 15 T m = 2.69 T i H = 2.62 T i L = 1.19 T m + T i H = 5.31 T m + T i L = 3.88 2T m = 5.38 41

Add a pair of QTL? 1 4 3a 4.60 3b 6 15 T m = 2.69 T i H = 2.62 T i L = 1.19 T m + T i H = 5.31 T m + T i L = 3.88 2T m = 5.38 42

Summary QTL mapping is a model selection problem The criterion for comparing models is most important We re focusing on a penalized likelihood method, with penalties derived from permutation tests with 1d and 2d scans Manichaikul et al., Genetics 181:1077 1086, 2009 43