Human vs mouse Mapping multiple QTL in experimental crosses Karl W Broman Department of Biostatistics & Medical Informatics University of Wisconsin Madison www.biostat.wisc.edu/~kbroman www.daviddeen.com 3 Backcross P P P F BC
Intercross Genetic map P P F F Location (cm) 8 F 3 5 8 9 3 5 8 9 X 5 Phenotype data Genotype data Sugiyama et al. Genomics :-, 5 male mice from the backcross (A B) B Blood pressure after two weeks drinking water with % NaCl 5 3 5 8 9 3 5 89 X Individuals 5 5 9 3 Blood pressure 5 5 Markers 8
Goals Identify quantitative trait loci (QTL) (and interactions among QTL) Interval estimates of QTL location Estimated QTL effects 9 Statistical structure QTL Markers Phenotype Covariates The missing data problem: Markers QTL The model selection problem: QTL, covariates phenotype ANOVA at marker loci Split mice into groups according to genotype at a marker. Do a t-test / ANOVA. Repeat for each marker. 9 Genotype at DMit blood pressure AB Genotype at DMit AB Interval mapping Lander & Botstein (989) Assume a single QTL model. Consider each position in the genome, one at a time, as the location of the putative QTL. Let q = / if the (unobserved) QTL genotype is /AB. (Or // if the QTL genotype is AA/AB/ in an intercross.) Assume y q N(µ q, σ) Calculate p q = Pr(q marker data). y marker data q p q φ(y µ q, σ)
LOD scores Permutation test LOD(λ) = log likelihood ratio comparing the hypothesis of a QTL at position λ versus that of no QTL { } Pr(y QTL at λ,ˆµqλ,ˆσ = log λ ) Pr(y no QTL,ˆµ,ˆσ) markers ˆµ qλ, ˆσ λ are the MLEs, assuming a single QTL at position λ. individuals genotype data phenotypes LOD scores maximum LOD score No QTL model: The phenotypes are iid N(µ, σ ). 3 5 LOD curves Permutation results 8 LOD score 3 5 8 9 3 5 8 9 3 5 Genome wide maximum LOD score
LOD curves Modeling multiple QTL 8 Reduce residual variation increased power Separate linked QTL Identify interactions among QTL (epistasis) LOD score 5% 3 5 8 9 3 5 8 9 Estimated effects -dim, -QTL scan Chr @ 8 cm Chr @ 3 cm Chr @ cm Chr 5 @ cm For each pair of positions, fit the following models: H f : y = µ + β q + β q + γq q + ɛ H a : y = µ + β q + β q + ɛ blood pressure H : y = µ + β q + ɛ H : y = µ + ɛ 98 98 98 98 BA BA BA BA Genotype Genotype Genotype Genotype 9
-dim, -QTL scan LOD i and LOD f For each pair of positions, fit the following models: H f : y = µ + β q + β q + γq q + ɛ H a : y = µ + β q + β q + ɛ H : y = µ + β q + ɛ H : y = µ + ɛ LOD f (s, t) = log { L f (s, t) / L } 9 8 5 3 9 8 5 3 3 5 3 5 8 9 3 5 89 -dim, -QTL scan LOD i and LOD f For each pair of positions, fit the following models: H f : y = µ + β q + β q + γq q + ɛ 5 H a : y = µ + β q + β q + ɛ H : y = µ + β q + ɛ H : y = µ + ɛ LOD i (s, t) = log { L f (s, t) / L a (s, t) } 3 5 5 3 5
LOD i and LOD fv Chr : LOD i and LOD av 5.5.5 3 8 Position (cm).5.5 5 8 Position (cm) 8 Estimated effects Hypothesis testing? x x 5 In the past, QTL mapping has been regarded as a task of hypothesis testing. Chr genotype BA Chr genotype BA Is this a QTL? Much of the focus has been on adjusting for test multiplicity. blood pressure 5 5 It is better to view the problem as one of model selection. What set of QTL are well supported? Is there evidence for QTL-QTL interactions? 95 BA 95 BA Model = a defined set of QTL and QTL-QTL interactions (and possibly covariates and QTL-covariate interactions). Chr genotype Chr 5 Genotype 9
Model selection What is special here? Class of models Additive models + pairwise interactions + higher-order interactions Regression trees Model fit Maximum likelihood Haley-Knott regression extended Haley-Knott Multiple imputation MCMC Model comparison Estimated prediction error AIC, BIC, penalized likelihood Bayes Model search Forward selection Backward elimination Stepwise selection Randomized algorithms Goal: identify the major players A continuum of ordinal-valued covariates (the genetic loci) Association among the covariates Loci on different chromosomes are independent Along chromosome, a very simple (and known) correlation structure 3 3 Target Exploratory methods Selection of a model includes two types of errors: Miss important terms (QTLs or interactions) Include extraneous terms Unlike in hypothesis testing, we can make both errors at the same time. Identify as many correct terms as possible, while controlling the rate of inclusion of extraneous terms. Condition on a large-effect QTL Reduce residual variation Conditional LOD score: { } Pr(data q, q ) LOD(q q ) = log Pr(data q ) Piece together the putative QTL from the d and d scans Omit loci that no longer look interesting (drop-one-at-a-time analysis) Study potential interactions among the identified loci Scan for additional loci (perhaps allowing interactions), conditional on these 3 33
Automation Experience Assistance to the masses Understanding performance Many phenotypes Controls rate of inclusion of extraneous terms Forward selection over-selects Forward selection followed by backward elimination works as well as MCMC Need to define performance criteria Need large-scale simulations Broman & Speed, JRSS B :-5, 3 Additive QTL Epistasis Simple situation: Dense markers Complete genotype data No epistasis y = µ + β j q j + γ jk q j q k + ɛ LOD δɛ (γ) = LOD(γ) T m γ m + T i γ i y = µ + β j q j + ɛ which β j? T m = as chosen previously LOD δ (γ) = LOD(γ) T γ T i =? vs QTL: LOD δ ( ) = LOD δ ({λ}) = LOD({λ}) T 39
Idea Models as graphs Imagine there are two additive QTL and consider a d, -QTL scan. T i = 95th percentile of the distribution of max LOD f (s, t) max LOD a (s, t) A C For the mouse genome: T m =.9 (BC) or 3.5 (F ) T H i =. (BC) or.8 (F ) B D Idea Results Imagine there is one QTL and consider a d, -QTL scan. T m + T i = 95th percentile of the distribution of max LOD f (s, t) max LOD (s).3. For the mouse genome: T m =.9 (BC) or 3.5 (F ) T H i =. (BC) or.8 (F ) T L i =.9 (BC) or.9 (F ).9 5. 5. T m =.9 T i H =. T i L =.9 T m + T i H = 5.3 T m + T i L = 3.88 5
Add an interaction? Add another QTL?. 5.8 5 T m =.9 T i H =. T i L =.9 T m + T i H = 5.3 T m + T i L = 3.88 5 T m =.9 T i H =. T i L =.9 T m + T i H = 5.3 T m + T i L = 3.88 58 Add another QTL? Add a pair of QTL?.8.5 5. 5. 5 3a 3b T m =.9 T i H =. T i L =.9 T m + T i H = 5.3 T m + T i L = 3.88 5 T m =.9 T i H =. T i L =.9 T m + T i H = 5.3 T m + T i L = 3.88 59
To do Acknowledgments Improve search procedures Study performance (especially relative to other approaches) Measuring model uncertainty Measuring uncertainty in QTL location Ani Manichaikul Gary Churchill Śaunak Sen Terry Speed Brian Yandell Johns Hopkins University Jackson Laboratory University of California, San Francisco University of California, Berkeley University of Wisconsin, Madison Fumihiro Sugiyama Bev Paigen now at University of Tsukuba, Japan Jackson Laboratory Summary QTL mapping is a model selection problem The criterion for comparing models is most important We re focusing on a penalized likelihood method and are close to a practiceable solution