Gene mapping in model organisms

Karl W Broman
Department of Biostatistics, Johns Hopkins University
http://www.biostat.jhsph.edu/~kbroman

Goal
  Identify genes that contribute to common human diseases.

[Figure: Inbred mice]

Advantages of the mouse
  Small and cheap
  Inbred lines
  Large, controlled crosses
  Experimental interventions
  Knock-outs and knock-ins

The mouse as a model
  Same genes?
    The genes involved in a phenotype in the mouse may also be involved in similar phenotypes in the human.
  Similar complexity?
    The complexity of the etiology underlying a mouse phenotype provides some indication of the complexity of similar human phenotypes.
  Transfer of statistical methods.
    The statistical methods developed for gene mapping in the mouse serve as a basis for similar methods applicable in direct human studies.

[Figure: The intercross]

The data
  Phenotypes, $y_i$
  Genotypes, $x_{ij}$ = AA/AB/BB, at genetic markers
  A genetic map, giving the locations of the markers.

Phenotypes
  133 females, (NOD × B6) × (NOD × B6)

[Figure: NOD]

[Figure: C57BL/6]

[Figure: Agouti coat]

[Figure: Genetic map]

[Figure: Genotype data]

Goals
  Identify genomic regions (QTLs) that contribute to variation in the trait.
  Obtain interval estimates of the QTL locations.
  Estimate the effects of the QTLs.

Statistical structure
  Missing data: markers → QTL
  Model selection: genotypes → phenotype

Models: recombination
  No crossover interference.
  Locations of breakpoints according to a Poisson process.
  Genotypes along a chromosome follow a Markov chain.
  Clearly wrong, but super convenient.
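The recombination model above can be sketched in a few lines. This is a minimal illustration, not part of the original slides: it assumes the Haldane map function (the standard consequence of the no-interference Poisson model) to turn map distances into recombination fractions, and the function names are hypothetical.

```python
import math
import random


def haldane_rf(distance_cM):
    """Recombination fraction for a map distance in cM, assuming no interference."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cM / 100.0))


def simulate_gamete(positions_cM, rng):
    """One meiotic product of an F1 parent: a two-state Markov chain along the markers."""
    allele = rng.choice("AB")
    gamete = [allele]
    for left, right in zip(positions_cM, positions_cM[1:]):
        # Switch allele with probability equal to the recombination fraction.
        if rng.random() < haldane_rf(right - left):
            allele = "B" if allele == "A" else "A"
        gamete.append(allele)
    return gamete


def simulate_intercross_genotypes(positions_cM, rng):
    """Intercross genotypes (AA/AB/BB) from two independent F1 gametes."""
    maternal = simulate_gamete(positions_cM, rng)
    paternal = simulate_gamete(positions_cM, rng)
    return ["".join(sorted(pair)) for pair in zip(maternal, paternal)]


if __name__ == "__main__":
    rng = random.Random(1)
    print(simulate_intercross_genotypes([0, 10, 20, 30, 40], rng))
```

Because each gamete is a Markov chain, the genotype at any position depends on the rest of the chromosome only through its nearest neighbours, which is what makes the later HMM calculations tractable.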

Models: gen → phe
  Phenotype = $y$, whole-genome genotype = $g$
  Imagine that $p$ sites are all that matter.
    $E(y \mid g) = \mu(g_1, \dots, g_p)$
    $SD(y \mid g) = \sigma(g_1, \dots, g_p)$
  Simplifying assumptions:
    $SD(y \mid g) = \sigma$, independent of $g$
    $y \mid g \sim \text{normal}(\mu(g_1, \dots, g_p), \sigma)$
    $\mu(g_1, \dots, g_p) = \mu + \sum_{j=1}^{p} \left( \alpha_j 1\{g_j = AB\} + \beta_j 1\{g_j = BB\} \right)$

Before you do anything
  Check data quality:
    Genetic markers on the correct chromosomes
    Markers in the correct order
    Identify and resolve likely errors in the genotype data
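Returning to the genotype-to-phenotype model (normal errors with a common SD, and per-site effects α_j for AB and β_j for BB), the following is a small sketch of how a phenotype would be generated under those simplifying assumptions. The function names and the example effect sizes are illustrative assumptions, not values from the slides.

```python
import random


def mean_phenotype(genotypes, mu, alpha, beta):
    """mu + sum_j (alpha_j * 1{g_j = AB} + beta_j * 1{g_j = BB})."""
    total = mu
    for g, a, b in zip(genotypes, alpha, beta):
        if g == "AB":
            total += a
        elif g == "BB":
            total += b
    return total


def simulate_phenotype(genotypes, mu, alpha, beta, sigma, rng):
    """Draw y | g ~ normal(mean_phenotype(g), sigma), with sigma independent of g."""
    return rng.gauss(mean_phenotype(genotypes, mu, alpha, beta), sigma)


if __name__ == "__main__":
    rng = random.Random(0)
    g = ["AA", "AB", "BB"]  # genotypes at p = 3 sites
    print(mean_phenotype(g, 10.0, [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
    print(simulate_phenotype(g, 10.0, [1.0, 2.0, 3.0], [4.0, 5.0, 6.0], 1.0, rng))
```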

The simplest method: marker regression
  Consider a single marker.
  Split mice into groups according to their genotype at that marker.
  Do an ANOVA (or t-test).
  Repeat for each marker.

Marker regression
  Advantages
    + Simple
    + Easily incorporates covariates
    + Easily extended to more complex models
  Disadvantages
    - Must exclude individuals with missing genotype data
    - Imperfect information about QTL location
    - Suffers in low-density scans
    - Only considers one QTL at a time
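As a rough sketch of marker regression, the code below groups individuals by genotype at one marker and compares the residual sums of squares with and without the grouping. It reports the result as a LOD score, (n/2) log10(RSS0/RSS1), which for a normal model is a monotone function of the ANOVA F statistic; that choice of scale, the function names, and the use of `None` for missing genotypes are assumptions made for illustration.

```python
import math


def marker_lod(phenotypes, genotypes):
    """LOD score for one marker; individuals with missing genotype (None) are dropped."""
    groups = {}
    for y, g in zip(phenotypes, genotypes):
        if g is None:
            continue
        groups.setdefault(g, []).append(y)
    values = [y for ys in groups.values() for y in ys]
    n = len(values)
    grand_mean = sum(values) / n
    rss_null = sum((y - grand_mean) ** 2 for y in values)
    rss_marker = 0.0
    for ys in groups.values():
        group_mean = sum(ys) / len(ys)
        rss_marker += sum((y - group_mean) ** 2 for y in ys)
    # Assumes some within-group variation, so rss_marker > 0.
    return (n / 2) * math.log10(rss_null / rss_marker)


def marker_scan(phenotypes, marker_genotypes):
    """Repeat the single-marker analysis for each marker (one list of genotypes per marker)."""
    return [marker_lod(phenotypes, column) for column in marker_genotypes]


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    markers = [
        ["AA", "AA", "AB", "AB", "BB", "BB"],
        ["AA", "AB", "BB", "AA", "AB", "BB"],
    ]
    print(marker_scan(y, markers))
```

Dropping individuals with a missing genotype is exactly the first disadvantage listed above; interval mapping avoids it by using genotype probabilities instead.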

Interval mapping (Lander and Botstein 1989)
  Imagine that there is a single QTL, at position $z$.
  Let $q_i$ = genotype of mouse $i$ at the QTL, and assume
    $y_i \mid q_i \sim \text{normal}(\mu(q_i), \sigma)$
  We won't know $q_i$, but we can calculate (by an HMM)
    $p_{ig} = \Pr(q_i = g \mid \text{marker data})$
  $y_i$, given the marker data, follows a mixture of normal distributions with known mixing proportions (the $p_{ig}$).
  Use an EM algorithm to get MLEs of $\theta = (\mu_{AA}, \mu_{AB}, \mu_{BB}, \sigma)$.
  Measure the evidence for a QTL via the LOD score, which is the $\log_{10}$ likelihood ratio comparing the hypothesis of a single QTL at position $z$ to the hypothesis of no QTL anywhere.

Interval mapping
  Advantages
    + Takes proper account of missing data
    + Allows examination of positions between markers
    + Gives improved estimates of QTL effects
    + Provides pretty graphs
  Disadvantages
    - Increased computation time
    - Requires specialized software
    - Difficult to generalize
    - Only considers one QTL at a time
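The EM step of interval mapping at a single position can be sketched as below. This is an illustrative sketch under stated assumptions: the genotype probabilities p_ig are taken as given (the HMM that produces them is not shown), every genotype has nonzero total probability, and the function names, iteration cap, and tolerance are hypothetical choices.

```python
import math


def normal_pdf(y, mu, sigma):
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def interval_mapping_lod(y, probs, max_iter=500, tol=1e-10):
    """EM for a normal mixture with known mixing proportions probs[i][g].

    Returns (LOD, genotype means, sigma) for one putative QTL position.
    """
    n, k = len(y), len(probs[0])
    # Null model: a single normal distribution (no QTL anywhere).
    mu0 = sum(y) / n
    sigma0 = math.sqrt(sum((v - mu0) ** 2 for v in y) / n)
    loglik0 = sum(math.log10(normal_pdf(v, mu0, sigma0)) for v in y)
    # Starting values: probability-weighted means and the null SD.
    mu = [sum(p[g] * v for p, v in zip(probs, y)) / sum(p[g] for p in probs)
          for g in range(k)]
    sigma = sigma0
    previous = -math.inf
    loglik = previous
    for _ in range(max_iter):
        # E-step: posterior genotype weights given phenotype and marker data.
        weights, loglik = [], 0.0
        for v, p in zip(y, probs):
            joint = [p[g] * normal_pdf(v, mu[g], sigma) for g in range(k)]
            total = sum(joint)
            loglik += math.log10(total)
            weights.append([j / total for j in joint])
        if loglik - previous < tol:
            break
        previous = loglik
        # M-step: weighted means and a pooled SD.
        mu = [sum(w[g] * v for w, v in zip(weights, y)) / sum(w[g] for w in weights)
              for g in range(k)]
        sigma = math.sqrt(sum(w[g] * (v - mu[g]) ** 2
                              for w, v in zip(weights, y) for g in range(k)) / n)
    return loglik - loglik0, mu, sigma


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    # Columns are Pr(AA), Pr(AB), Pr(BB) for each mouse at the tested position.
    probs = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.1, 0.8, 0.1],
             [0.0, 0.9, 0.1], [0.0, 0.2, 0.8], [0.0, 0.1, 0.9]]
    print(interval_mapping_lod(y, probs))
```

When the genotype probabilities are all 0 or 1 (a fully typed marker), the mixture collapses and the LOD agrees with the marker-regression value, which is a convenient sanity check.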

[Figure: LOD curves]

LOD thresholds
  To account for the genome-wide search, compare the observed LOD scores to the distribution of the maximum LOD score, genome-wide, that would be obtained if there were no QTL anywhere.
  The 95th percentile of this distribution is used as a significance threshold.
  Such a threshold may be estimated via permutations (Churchill and Doerge 1994).

Permutation test
  Shuffle the phenotypes relative to the genotypes.
  Calculate $M^* = \max \text{LOD}^*$, with the shuffled data.
  Repeat many times.
  LOD threshold = 95th percentile of $M^*$
  P-value = $\Pr(M^* \ge M)$

[Figure: Permutation distribution]
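The permutation procedure above can be sketched as follows. For self-containment the scan statistic here is a simple single-marker LOD; the function names, the number of permutations, and the particular percentile convention are assumptions for illustration.

```python
import math
import random


def marker_lod(phenotypes, genotypes):
    """Single-marker LOD: (n/2) * log10(RSS_null / RSS_marker), complete data assumed."""
    groups = {}
    for y, g in zip(phenotypes, genotypes):
        groups.setdefault(g, []).append(y)
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss_null = sum((y - grand_mean) ** 2 for y in phenotypes)
    rss_marker = 0.0
    for ys in groups.values():
        group_mean = sum(ys) / len(ys)
        rss_marker += sum((y - group_mean) ** 2 for y in ys)
    return (n / 2) * math.log10(rss_null / rss_marker)


def max_lod(phenotypes, marker_genotypes):
    return max(marker_lod(phenotypes, column) for column in marker_genotypes)


def permutation_test(phenotypes, marker_genotypes, n_perm=1000, level=0.95, seed=0):
    """Return (observed max LOD M, LOD threshold, permutation p-value)."""
    rng = random.Random(seed)
    observed = max_lod(phenotypes, marker_genotypes)
    shuffled = list(phenotypes)
    null_maxima = []
    for _ in range(n_perm):
        # Shuffle phenotypes relative to genotypes; genotype structure is preserved.
        rng.shuffle(shuffled)
        null_maxima.append(max_lod(shuffled, marker_genotypes))
    null_maxima.sort()
    index = min(n_perm - 1, math.ceil(level * n_perm) - 1)
    threshold = null_maxima[index]
    p_value = sum(m >= observed for m in null_maxima) / n_perm
    return observed, threshold, p_value


if __name__ == "__main__":
    y = [0.1, 0.3, -0.2, 0.0, 5.2, 4.9, 5.1, 4.7, 10.3, 9.8, 10.1, 9.9]
    markers = [["AA"] * 4 + ["AB"] * 4 + ["BB"] * 4, ["AA", "AB", "BB"] * 4]
    print(permutation_test(y, markers, n_perm=200))
```

Shuffling whole phenotypes, rather than individual genotypes, keeps the correlation between linked markers intact, so the null distribution of the genome-wide maximum reflects the actual marker map.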

[Figure: Chr 9 and 11]

[Figure: Epistasis]

Going after multiple QTLs
  Greater ability to detect QTLs.
  Separate linked QTLs.
  Learn about interactions between QTLs (epistasis).

Multiple QTL mapping
  Simplistic but illustrative situation:
    No missing genotype data
    Dense markers (so ignore positions between markers)
    No gene-gene interactions
  $y = \sum_j \beta_j x_j + \epsilon$
  Which $\beta_j \neq 0$?
  Model selection in regression

Model selection
  Choose a class of models
    Additive; pairwise interactions; regression trees
  Fit a model (allow for missing genotype data)
    Linear regression; ML via EM; Bayes via MCMC
  Search model space
    Forward/backward/stepwise selection; MCMC
  Compare models
    $\text{BIC}_\delta(\gamma) = -\log L(\gamma) + (\delta/2)\,|\gamma| \log n$
    Trade-off: miss important loci vs. include extraneous loci.

Special features
  Relationship among the covariates
  Missing covariate information
  Identify the key players vs. minimize prediction error
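For the simplified setting (complete genotypes, additive effects), forward selection with a BIC_δ-style criterion can be sketched as below. The sketch assumes a normal linear model, for which −log L equals (n/2) log RSS up to constants, and treats smaller criterion values as better; the helper names and the toy data are hypothetical, and the least-squares solver assumes the selected columns are not collinear.

```python
import math


def rss(y, columns):
    """Residual sum of squares from least squares of y on an intercept plus columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gaussian elimination.
    M = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for j in range(c, k + 1):
                M[r][j] -= f * M[c][j]
    b = [0.0] * k
    for r in reversed(range(k)):
        b[r] = (M[r][k] - sum(M[r][j] * b[j] for j in range(r + 1, k))) / M[r][r]
    return sum((y[i] - sum(X[i][j] * b[j] for j in range(k))) ** 2 for i in range(n))


def bic_delta(y, columns, delta=1.0):
    """(n/2) log RSS + (delta/2) |gamma| log n; delta = 1 is ordinary BIC up to a factor of 2."""
    n = len(y)
    return (n / 2) * math.log(rss(y, columns)) + (delta / 2) * len(columns) * math.log(n)


def forward_selection(y, markers, delta=1.0):
    """Add the marker that most improves the criterion until no marker improves it."""
    selected = []
    best = bic_delta(y, [], delta)
    while len(selected) < len(markers):
        scores = {j: bic_delta(y, [markers[i] for i in selected + [j]], delta)
                  for j in range(len(markers)) if j not in selected}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break
        selected.append(j_best)
        best = scores[j_best]
    return selected, best


if __name__ == "__main__":
    markers = [
        [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1],
        [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
        [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1],
    ]
    noise = [0.05, -0.03, 0.02, -0.04, 0.01, 0.03, -0.02, 0.04, -0.05, 0.02, -0.01, -0.02]
    y = [3 * a + 2 * c + e for a, c, e in zip(markers[0], markers[2], noise)]
    print(forward_selection(y, markers, delta=2.0))
```

Larger values of δ penalize model size more heavily, which shifts the balance toward missing loci rather than including extraneous ones.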

Opportunities for improvements
  Each individual is unique.
    Must genotype each mouse.
    Unable to obtain multiple invasive phenotypes (e.g., in multiple environmental conditions) on the same genotype.
  Relatively low mapping precision.
  Design a set of inbred mouse strains.
    Genotype once.
    Study multiple phenotypes on the same genotype.

[Figure: Recombinant inbred lines]

[Figure: AXB/BXA panel]

[Figure: AXB/BXA panel]

[Figure: LOD curves]

[Figure: Chr 7 and 19]

[Figure: Pairwise recombination fractions. Upper triangle: recombination fractions. Lower triangle: likelihood ratios. Red = association; blue = no association.]

RI lines
  Advantages
    + Each strain is an eternal resource.
        Only need to genotype once.
        Reduce individual variation by phenotyping multiple individuals from each strain.
        Study multiple phenotypes on the same genotype.
    + Greater mapping precision.
  Disadvantages
    - Time and expense.
    - Available panels are generally too small (10-30 lines).
    - Can learn only about 2 particular alleles.
    - All individuals homozygous.

[Figure: The RIX design]

[Figure: The Collaborative Cross]

[Figure: Genome of an 8-way RI]

The Collaborative Cross
  Advantages
    + Great mapping precision.
    + Eternal resource.
        Genotype only once.
        Study multiple invasive phenotypes on the same genotype.
  Barriers
    - Advantages not widely appreciated.
        Ask one question at a time, or ask many questions at once?
    - Time.
    - Expense.
    - Requires large-scale collaboration.

To be worked out
  Breakpoint process along an 8-way RI chromosome.
  Reconstruction of genotypes given multipoint marker data.
  QTL analyses.
    Mixed models, with random effects for strains and genotypes/alleles.
  Power and precision (relative to an intercross).

Haldane & Waddington 1931
  $r$ = recombination fraction per meiosis between two loci
  Autosomes
    $\Pr(G_1 = AA) = \Pr(G_1 = BB) = 1/2$
    $\Pr(G_2 = BB \mid G_1 = AA) = \Pr(G_2 = AA \mid G_1 = BB) = 4r / (1 + 6r)$
  X chromosome
    $\Pr(G_1 = AA) = 2/3$, $\Pr(G_1 = BB) = 1/3$
    $\Pr(G_2 = BB \mid G_1 = AA) = 2r / (1 + 4r)$
    $\Pr(G_2 = AA \mid G_1 = BB) = 4r / (1 + 4r)$
    $\Pr(G_2 \neq G_1) = (8/3)\, r / (1 + 4r)$
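The two-way formulas above are easy to check numerically. The sketch below evaluates them with exact fractions and confirms that the overall X-chromosome value is the average of the two conditional probabilities weighted by Pr(G_1); the function names are hypothetical.

```python
from fractions import Fraction


def ril_autosome_switch(r):
    """Pr(G2 differs from G1) at two autosomal loci in a two-way RIL."""
    return 4 * r / (1 + 6 * r)


def ril_x_switch(r):
    """X chromosome: conditional switch probabilities and their weighted average."""
    from_aa = 2 * r / (1 + 4 * r)   # Pr(G2 = BB | G1 = AA)
    from_bb = 4 * r / (1 + 4 * r)   # Pr(G2 = AA | G1 = BB)
    overall = Fraction(2, 3) * from_aa + Fraction(1, 3) * from_bb
    return from_aa, from_bb, overall


if __name__ == "__main__":
    r = Fraction(1, 10)
    print(ril_autosome_switch(r))   # 1/4
    print(ril_x_switch(r))          # (1/7, 2/7, 4/21)
    print(Fraction(8, 3) * r / (1 + 4 * r))  # 4/21, matching the overall value
```

At r = 1/10 the autosomal probability is 1/4, two and a half times the per-meiosis value, which illustrates how the repeated generations of inbreeding expand the map.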

8-way RILs
  Autosomes
    $\Pr(G_1 = i) = 1/8$
    $\Pr(G_2 = j \mid G_1 = i) = r / (1 + 6r)$ for $i \neq j$
    $\Pr(G_2 \neq G_1) = 7r / (1 + 6r)$
  X chromosome
    $\Pr(G_1 = AA) = \Pr(G_1 = BB) = \Pr(G_1 = EE) = \Pr(G_1 = FF) = 1/6$
    $\Pr(G_1 = CC) = 1/3$
    $\Pr(G_2 = AA \mid G_1 = CC) = r / (1 + 4r)$
    $\Pr(G_2 = CC \mid G_1 = AA) = 2r / (1 + 4r)$
    $\Pr(G_2 = BB \mid G_1 = AA) = r / (1 + 4r)$
    $\Pr(G_2 \neq G_1) = (14/3)\, r / (1 + 4r)$

Areas for research
  Model selection procedures for QTL mapping
  Gene expression microarrays + QTL mapping
  Combining multiple crosses
  Association analysis: mapping across mouse strains
  Analysis of multi-way recombinant inbred lines
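Going back to the 8-way RIL formulas, a quick exact-arithmetic check shows how the overall probabilities follow from the per-strain transitions. The X-chromosome part rests on an assumption that the slide does not state explicitly: that strains A, B, E and F are interchangeable, so the transitions listed for A apply to each of them. The function names are hypothetical.

```python
from fractions import Fraction


def eightway_autosome_switch(r):
    """Sum of the seven equal transition probabilities r / (1 + 6r)."""
    return 7 * (r / (1 + 6 * r))


def eightway_x_switch(r):
    """Overall Pr(G2 != G1) on the X chromosome, assuming A, B, E, F are symmetric."""
    step = r / (1 + 4 * r)
    # From A (or B, E, F): to each of the other three of {A, B, E, F}, plus to C.
    from_abef = 3 * step + 2 * step
    # From C: to each of A, B, E, F.
    from_c = 4 * step
    return 4 * Fraction(1, 6) * from_abef + Fraction(1, 3) * from_c


if __name__ == "__main__":
    r = Fraction(1, 10)
    print(eightway_autosome_switch(r))            # 7/16
    print(eightway_x_switch(r))                   # 1/3
    print(Fraction(14, 3) * r / (1 + 4 * r))      # 1/3, matching the slide's formula
```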

References
  Broman KW (2001) Review of statistical methods for QTL mapping in experimental crosses. Lab Animal 30:44-52
  Jansen RC (2001) Quantitative trait loci in inbred lines. In Balding DJ et al., Handbook of statistical genetics, Wiley, New York, pp 567-597
  Lander ES, Botstein D (1989) Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121:185-199
  Churchill GA, Doerge RW (1994) Empirical threshold values for quantitative trait mapping. Genetics 138:963-971
  Kruglyak L, Lander ES (1995) A nonparametric approach for mapping quantitative trait loci. Genetics 139:1421-1428
  Broman KW (2003) Mapping quantitative trait loci in the case of a spike in the phenotype distribution. Genetics 163:1169-1175
  Miller AJ (2002) Subset selection in regression, 2nd edition. Chapman & Hall, New York

More references
  Broman KW, Speed TP (2002) A model selection approach for the identification of quantitative trait loci in experimental crosses (with discussion). J R Statist Soc B 64:641-656, 737-775
  Zeng Z-B, Kao C-H, Basten CJ (1999) Estimating the genetic architecture of quantitative traits. Genet Res 74:279-289
  Mott R, Talbot CJ, Turri MG, Collins AC, Flint J (2000) A method for fine mapping quantitative trait loci in outbred animal stocks. Proc Natl Acad Sci U S A 97:12649-12654
  Mott R, Flint J (2002) Simultaneous detection and fine mapping of quantitative trait loci in mice using heterogeneous stocks. Genetics 160:1609-1618
  The Complex Trait Consortium (2004) The Collaborative Cross, a community resource for the genetic analysis of complex traits. Nature Genetics 36:1133-1137
  Broman KW. The genomes of recombinant inbred lines. Genetics, in press

Software
  R/qtl: http://www.biostat.jhsph.edu/~kbroman/qtl
  Mapmaker/QTL: http://www.broad.mit.edu/genome_software
  Mapmanager QTX: http://www.mapmanager.org/mmqtx.html
  QTL Cartographer: http://statgen.ncsu.edu/qtlcart/index.php
  Multimapper: http://www.rni.helsinki.fi/~mjs