R/qtl workshop (part 2) Karl Broman Biostatistics and Medical Informatics University of Wisconsin Madison kbroman.org github.com/kbroman @kwbroman
Example Sugiyama et al. Genomics 71:70-77, 2001 250 male mice from the backcross (A B) B Blood pressure after two weeks drinking water with 1% NaCl 90 100 110 120 130 Blood pressure 2
Genetic map 0 20 40 Location (cm) 60 80 100 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 X Chromosome 3
Genotype data 250 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 1819 X 200 Individuals 150 100 50 50 100 150 Markers 4
Goals Identify quantitative trait loci (QTL) (and interactions among QTL) Interval estimates of QTL location Estimated QTL effects 5
LOD curves 8 6 LOD score 4 5% 2 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 Chromosome 6
Estimated effects Chr 1 @ 48 cm Chr 4 @ 30 cm Chr 6 @ 24 cm Chr 15 @ 20 cm 106 106 106 106 104 104 104 104 blood pressure 102 102 102 102 100 100 100 100 98 98 98 98 BB BA BB BA BB BA BB BA Genotype Genotype Genotype Genotype 7
Modeling multiple QTL Reduce residual variation increased power Separate linked QTL Identify interactions among QTL (epistasis) 8
2-dim, 2-QTL scan For all pairs of positions, fit the following models: H f : y = µ + β 1 q 1 + β 2 q 2 + γq 1 q 2 + ɛ H a : y = µ + β 1 q 1 + β 2 q 2 + +ɛ H 1 : y = µ + β 1 q 1 + ɛ H 0 : y = µ + ɛ log 10 likelihoods: l f (s, t) l a (s, t) l 1 (s) l 0 9
2-dim, 2-QTL scan LOD scores: LOD f (s, t) = l f (s, t) l 0 LOD a (s, t) = l a (s, t) l 0 LOD i (s, t) = l f (s, t) l a (s, t) LOD 1 (s) = l 1 (s) l 0 10
Results: LOD i and LOD f 19 18 17 16 15 14 13 12 11 3 10 Chromosome 10 9 8 7 6 5 4 3 2 1 5 2 1 0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16171819 Chromosome 11
Results: LOD i and LOD f 15 4 7 3 10 Chromosome 6 4 2 5 1 1 0 0 1 4 6 7 15 Chromosome 12
Summaries Consider each pair of chromosomes, (j, k), and let c(s) denote the chromosome for position s. M f (j, k) = M a (j, k) = M 1 (j, k) = max LOD f(s, t) c(s)=j,c(t)=k max LOD a(s, t) c(s)=j,c(t)=k max LOD 1(s) c(s)=j or k M i (j, k) = M f (j, k) M a (j, k) M fv1 (j, k) = M f (j, k) M 1 (j, k) M av1 (j, k) = M a (j, k) M 1 (j, k) 13
Results: LOD i and LOD fv1 15 4 6 7 3 Chromosome 6 4 2 4 2 1 1 0 0 1 4 6 7 15 Chromosome 14
[R/qtl] 15
Thresholds A pair of chromosomes (j, k) is considered interesting if: M f (j, k) > T f and { M fv1 (j, k) > T fv1 or M i (j, k) > T i } or M a (j, k) > T a and M av1 (j, k) > T av1 where the thresholds (T f, T fv1, T i, T a, T av1 ) are determined by a permutation test with a 2d scan 16
2d scan summary pos1f pos2f lod.full lod.fv1 lod.int c1:c4 71.3 30.0 14.36 6.78 0.27 c6:c15 55.0 20.5 6.91 4.95 2.92 c1:c1 39.3 78.3 5.10 1.58 0.09 pos1a pos2a lod.add lod.av1 c1:c4 68.3 30.0 14.09 6.50 c6:c15 24.0 22.5 3.99 2.03 c1:c1 48.3 79.3 5.02 1.50 17
Estimated effects 1 x 4 6 x 15 110 Chr 1 genotype 110 Chr 6 genotype BB BA BB BA blood pressure 105 100 105 100 95 95 BB BA BB BA Chr 4 genotype Chr 15 Genotype 18
Chr 1: LOD i and LOD av1 100 1.5 1.5 80 Position (cm) 60 1 1 40 0.5 0.5 20 20 40 60 80 100 Position (cm) 0 0 19
[R/qtl] 20
Hypothesis testing? In the past, QTL mapping has been regarded as a task of hypothesis testing. Is this a QTL? Much of the focus has been on adjusting for test multiplicity. It is better to view the problem as one of model selection. What set of QTL are well supported? Is there evidence for QTL-QTL interactions? Model = a defined set of QTL and QTL-QTL interactions (and possibly covariates and QTL-covariate interactions). 21
Model selection Class of models Additive models + pairwise interactions + higher-order interactions Regression trees Model comparison Estimated prediction error AIC, BIC, penalized likelihood Bayes Model fit Maximum likelihood Haley-Knott regression extended Haley-Knott Multiple imputation MCMC Model search Forward selection Backward elimination Stepwise selection Randomized algorithms 22
Target Selection of a model includes two types of errors: Miss important terms (QTLs or interactions) Include extraneous terms Unlike in hypothesis testing, we can make both errors at the same time. Identify as many correct terms as possible, while controlling the rate of inclusion of extraneous terms. 23
What is special here? Goal: identify the major players A continuum of ordinal-valued covariates (the genetic loci) Association among the covariates Loci on different chromosomes are independent Along chromosome, a very simple (and known) correlation structure 24
Exploratory methods Condition on a large-effect QTL Reduce residual variation Conditional LOD score: { } Pr(data q1, q 2 ) LOD(q 2 q 1 ) = log 10 Pr(data q 1 ) Piece together the putative QTL from the 1d and 2d scans Omit loci that no longer look interesting (drop-one-at-a-time analysis) Study potential interactions among the identified loci Scan for additional loci (perhaps allowing interactions), conditional on these 25
Controlling for chr 4 8 Interval mapping Control for chr 4 6 LOD score 4 2 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 Chromosome 26
Drop-one-QTL table df LOD %var 1@68.3 1 6.30 11.0 4@30.0 1 12.21 20.1 6@61.0 2 7.93 13.6 15@17.5 2 7.14 12.3 6@61.0 : 15@17.5 1 5.68 9.9 27
[R/qtl] 28
Automation Assistance to non-specialists Understanding performance Many phenotypes 29
Additive QTL Simple situation: Dense markers Complete genotype data No epistasis y = µ + β j q j + ɛ which β j 0? plod(γ) = LOD(γ) T γ 30
Additive QTL Simple situation: Dense markers Complete genotype data No epistasis y = µ + β j q j + ɛ which β j 0? plod(γ) = LOD(γ) T γ 0 vs 1 QTL: plod( ) = 0 plod({λ}) = LOD(λ) T 30
Additive QTL Simple situation: Dense markers Complete genotype data No epistasis y = µ + β j q j + ɛ which β j 0? plod(γ) = LOD(γ) T γ For the mouse genome: T = 2.69 (BC) or 3.52 (F 2 ) 30
Experience Controls rate of inclusion of extraneous terms Forward selection over-selects Forward selection followed by backward elimination works as well as MCMC Need to define performance criteria Need large-scale simulations Broman & Speed, JRSS B 64:641-656, 2002 31
[R/qtl] 32
Epistasis y = µ + β j q j + γ jk q j q k + ɛ plod(γ) = LOD(γ) T m γ m T i γ i T m = as chosen previously T i =? 33
Idea 1 Imagine there are two additive QTL and consider a 2d, 2-QTL scan. T i = 95th percentile of the distribution of max LOD f (s, t) max LOD a (s, t) 34
Idea 1 Imagine there are two additive QTL and consider a 2d, 2-QTL scan. T i = 95th percentile of the distribution of max LOD f (s, t) max LOD a (s, t) For the mouse genome: T m = 2.69 (BC) or 3.52 (F 2 ) T H i = 2.62 (BC) or 4.28 (F 2 ) 34
Idea 2 Imagine there is one QTL and consider a 2d, 2-QTL scan. T m + T i = 95th percentile of the distribution of max LOD f (s, t) max LOD 1 (s) 35
Idea 2 Imagine there is one QTL and consider a 2d, 2-QTL scan. T m + T i = 95th percentile of the distribution of max LOD f (s, t) max LOD 1 (s) For the mouse genome: T m = 2.69 (BC) or 3.52 (F 2 ) T H i = 2.62 (BC) or 4.28 (F 2 ) T L i = 1.19 (BC) or 2.69 (F 2 ) 35
Models as graphs A C B D 36
Results 1 4 LOD = 23.1 6 15 37
Results 6.3 1 4 LOD = 23.1 6 15 T m = 2.69 T i H = 2.62 T i L = 1.19 T m + T i H = 5.31 T m + T i L = 3.88 2T m = 5.38 37
Results 6.3 1 4 12.2 7.9 6 15 7.1 5.7 T m = 2.69 T i H = 2.62 T i L = 1.19 T m + T i H = 5.31 T m + T i L = 3.88 2T m = 5.38 37
Profile LOD curves 4@30.0 12 10 Profile LOD score 8 6 1@68.3 6@61.0 15@17.5 4 2 0 1 4 6 15 Chromosome 38
Drop-one-QTL table df LOD %var 1@68.3 1 6.30 11.0 4@30.0 1 12.21 20.1 6@61.0 2 7.93 13.6 15@17.5 2 7.14 12.3 6@61.0 : 15@17.5 1 5.68 9.9 39
Add an interaction? 1 4 6 15 0.6 T m = 2.69 T i H = 2.62 T i L = 1.19 T m + T i H = 5.31 T m + T i L = 3.88 2T m = 5.38 40
Add an interaction? 1 4 6 15 0.1 T m = 2.69 T i H = 2.62 T i L = 1.19 T m + T i H = 5.31 T m + T i L = 3.88 2T m = 5.38 40
Add an interaction? 1 4 6 15 0.4 T m = 2.69 T i H = 2.62 T i L = 1.19 T m + T i H = 5.31 T m + T i L = 3.88 2T m = 5.38 40
Add an interaction? 1 4 6 15 0.2 T m = 2.69 T i H = 2.62 T i L = 1.19 T m + T i H = 5.31 T m + T i L = 3.88 2T m = 5.38 40
Add an interaction? 1 4 6 15 0.0 T m = 2.69 T i H = 2.62 T i L = 1.19 T m + T i H = 5.31 T m + T i L = 3.88 2T m = 5.38 40
Add another QTL? 1.87 1b 1 4 1.52 2 6 15 1.62 5 T m = 2.69 T i H = 2.62 T i L = 1.19 T m + T i H = 5.31 T m + T i L = 3.88 2T m = 5.38 41
Add another QTL? 1 4 6 15 2.66 7 T m = 2.69 T i H = 2.62 T i L = 1.19 T m + T i H = 5.31 T m + T i L = 3.88 2T m = 5.38 41
Add another QTL? 1 4 2.82 4b 6 15 T m = 2.69 T i H = 2.62 T i L = 1.19 T m + T i H = 5.31 T m + T i L = 3.88 2T m = 5.38 41
Add a pair of QTL? 1 4 3a 4.60 3b 6 15 T m = 2.69 T i H = 2.62 T i L = 1.19 T m + T i H = 5.31 T m + T i L = 3.88 2T m = 5.38 42
Summary QTL mapping is a model selection problem The criterion for comparing models is most important We re focusing on a penalized likelihood method, with penalties derived from permutation tests with 1d and 2d scans Manichaikul et al., Genetics 181:1077 1086, 2009 43