Bayesian Model Selection for Multiple QTL. Brian S. Yandell

Similar documents
QTL model selection: key players

QTL model selection: key players

Model Selection for Multiple QTL

Inferring Genetic Architecture of Complex Biological Processes

contact information & resources

1. what is the goal of QTL study?

what is Bayes theorem? posterior = likelihood * prior / C

Multiple QTL mapping

Mapping multiple QTL in experimental crosses

QTL Model Search. Brian S. Yandell, UW-Madison January 2017

Introduction to QTL mapping in model organisms

Introduction to QTL mapping in model organisms

Causal Graphical Models in Systems Genetics

Statistical issues in QTL mapping in mice

QTL Mapping I: Overview and using Inbred Lines

Mapping multiple QTL in experimental crosses

Seattle Summer Institute : Systems Genetics for Experimental Crosses

Overview. Background

Gene mapping in model organisms

Introduction to QTL mapping in model organisms

R/qtl workshop. (part 2) Karl Broman. Biostatistics and Medical Informatics University of Wisconsin Madison. kbroman.org

Lecture 8. QTL Mapping 1: Overview and Using Inbred Lines

Tutorial Session 2. MCMC for the analysis of genetic data on pedigrees:

Introduction to QTL mapping in model organisms

Fast Bayesian Methods for Genetic Mapping Applicable for High-Throughput Datasets

Detection of multiple QTL with epistatic effects under a mixed inheritance model in an outbred population

Introduction to QTL mapping in model organisms

BAYESIAN MAPPING OF MULTIPLE QUANTITATIVE TRAIT LOCI

Causal Model Selection Hypothesis Tests in Systems Genetics

Mapping QTL to a phylogenetic tree

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011

Quantile-based permutation thresholds for QTL hotspot analysis: a tutorial

Inferring Causal Phenotype Networks from Segregating Populat

Module 4: Bayesian Methods Lecture 9 A: Default prior selection. Outline

Hierarchical Generalized Linear Models for Multiple QTL Mapping

Causal Network Models for Correlated Quantitative Traits. outline

Contents. Part I: Fundamentals of Bayesian Inference 1

Bayesian Methods for Machine Learning

MANY complex human diseases and traits of bio- tive corrections for multiple testing. Non-Bayesian model

STA 4273H: Statistical Machine Learning

Use of hidden Markov models for QTL mapping

Quantile based Permutation Thresholds for QTL Hotspots. Brian S Yandell and Elias Chaibub Neto 17 March 2012

Lecture 9. QTL Mapping 2: Outbred Populations

Consistent high-dimensional Bayesian variable selection via penalized credible regions

Expression QTLs and Mapping of Complex Trait Loci. Paul Schliekelman Statistics Department University of Georgia

BTRY 4830/6830: Quantitative Genomics and Genetics

MCMC IN THE ANALYSIS OF GENETIC DATA ON PEDIGREES

MOST complex traits are influenced by interacting

Lecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis

Bayesian Partition Models for Identifying Expression Quantitative Trait Loci

Bayesian QTL mapping using skewed Student-t distributions

Lecture 8 Genomic Selection

A Statistical Framework for Expression Trait Loci (ETL) Mapping. Meng Chen

Principles of Bayesian Inference

Bayesian Inference of Interactions and Associations

Variance Component Models for Quantitative Traits. Biostatistics 666

Bayesian reversible-jump for epistasis analysis in genomic studies

Locating multiple interacting quantitative trait. loci using rank-based model selection

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Bayesian linear regression

Estimation of Parameters in Random. Effect Models with Incidence Matrix. Uncertainty

CSC 2541: Bayesian Methods for Machine Learning

MCMC: Markov Chain Monte Carlo

GENOMIC SELECTION WORKSHOP: Hands on Practical Sessions (BL)

Stat 516, Homework 1

Bayesian Regression Linear and Logistic Regression

Bayesian shrinkage approach in variable selection for mixed

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016

Markov Chain Monte Carlo methods

Principles of QTL Mapping. M.Imtiaz

CPSC 540: Machine Learning

Bayesian Nonparametric Regression for Diabetes Deaths

A Sequential Bayesian Approach with Applications to Circadian Rhythm Microarray Gene Expression Data

MULTIPLE-TRAIT MULTIPLE-INTERVAL MAPPING OF QUANTITATIVE-TRAIT LOCI ROBY JOEHANES

Selecting explanatory variables with the modified version of Bayesian Information Criterion

Lecture 13: Population Structure. October 8, 2012

Lecture 2: Genetic Association Testing with Quantitative Traits. Summer Institute in Statistical Genetics 2017

For 5% confidence χ 2 with 1 degree of freedom should exceed 3.841, so there is clear evidence for disequilibrium between S and M.

TECHNICAL REPORT NO December 1, 2008

The Bayesian Choice. Christian P. Robert. From Decision-Theoretic Foundations to Computational Implementation. Second Edition.

Calculation of IBD probabilities

Evolution of phenotypic traits

One-week Course on Genetic Analysis and Plant Breeding January 2013, CIMMYT, Mexico LOD Threshold and QTL Detection Power Simulation

Binary trait mapping in experimental crosses with selective genotyping

Forward Problems and their Inverse Solutions

I Have the Power in QTL linkage: single and multilocus analysis

Genome-wide Multiple Loci Mapping in Experimental Crosses by the Iterative Adaptive Penalized Regression

Or How to select variables Using Bayesian LASSO

Statistical Data Analysis Stat 3: p-values, parameter estimation

1.5.1 ESTIMATION OF HAPLOTYPE FREQUENCIES:

Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5

Quantitative characters II: heritability

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

The Quantitative TDT

Bayesian model selection in graphs by using BDgraph package

A Bayesian Approach to Detect Quantitative Trait Loci Using Markov Chain Monte Carlo

Linkage Mapping. Reading: Mather K (1951) The measurement of linkage in heredity. 2nd Ed. John Wiley and Sons, New York. Chapters 5 and 6.

Genotype Imputation. Biostatistics 666

THE problem of identifying the genetic factors un- QTL models because of their ability to separate linked

Supplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control

Transcription:

Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison www.stat.wisc.edu/~yandell/statgen Jackson Laboratory, October 005 October 005 Jax Workshop Brian S. Yandell 1 outline 1. what is the goal of QTL study?. Bayesian priors & posteriors 3. model search using MCMC Gibbs sampler and Metropolis-Hastings 4. model assessment Bayes factors & model averaging 5. data examples in detail plants & animals October 005 Jax Workshop Brian S. Yandell

1. what is the goal of QTL study? major QTL on linkage map 3 4 1 additive effect 5 0 1 3 major QTL Pareto diagram of QTL effects minor QTL 0 5 10 15 0 5 30 rank order of QTL polygenes October 005 Jax Workshop Brian S. Yandell 3 interval mapping basics observed measurements y = phenotypic trait m = markers & linkage map i = individual index (1,,n) missing data missing marker data missing q = QT genotypes alleles QQ, Qq, or qq at locus unknown quantities λ = QT locus (or loci) µ = phenotype model parameters H = QTL model/genetic architecture pr(q m,λ,h) genotype model grounded by linkage map, experimental cross recombination yields multinomial for q given m pr(y q,µ,h) phenotype model distribution shape (assumed normal here) unknown parameters µ (could be non-parametric) observed mx Yy Qq unknown λ θµ after Sen Churchill (001) October 005 Jax Workshop Brian S. Yandell 4

. Bayesian priors & posteriors augment data (y,m) with missing genotypes q study model parameters (µ,λ) given data (y,m,q) properties of posterior pr(µ,λ,q y,m ) average posterior over missing genotypes q pr(µ,λ y,m ) = sum q posterior sample from posterior in some clever way multiple imputation (Sen Churchill 00) Markov chain Monte Carlo (MCMC) posterior = likelihood * prior / constant pr( µ, λ, q y, m) = pr( y q, µ )pr( q m, λ)pr( µ )pr( λ m) pr( y m) October 005 Jax Workshop Brian S. Yandell 5 Bayes posterior vs. maximum likelihood classical approach maximizes likelihood Bayesian posterior averages over other parameters LOD( λ ) = log {max µ pr( y m, µ, λ)} + c LPD( λ) = log 10 10 pr( y m, µ, λ) = {pr( λ m) q pr( y m, µ, λ)pr( µ ) dµ } + C pr( y q, µ )pr( q m, λ) October 005 Jax Workshop Brian S. Yandell 6

BC with 1 QTL: IM vs. BIM LOD and LPD: QTL at 100 substitution effect lod 1 10 8 6 4 0 black=bim purple=im nd QTL? nd QTL? substitution effect 1 5 50 1000-0.5 0.0 0.5 1.0 1.5.0 blue=ideal 0 50 100 150 00 0 50 100 150 00 Map position (cm) Map position (cm) October 005 Jax Workshop Brian S. Yandell 7 how does phenotype y improve posterior for genotype q? D4Mit41 D4Mit14 bp 10 110 100 90 what are probabilities for genotype q between markers? recombinants AA:AB all 1:1 if ignore y and if we use y? AA AA AB AA AA AB AB AB Genotype October 005 Jax Workshop Brian S. Yandell 8

posterior on QTL genotypes full conditional of q given data, parameters proportional to prior pr(q m, λ ) weight toward q that agrees with flanking markers proportional to likelihood pr(y q,µ) weight toward q with similar phenotype values phenotype and flanking markers may conflict posterior recombination balances these two weights this is the E-step of EM computations pr( q y, m, µ, λ) = pr( y q, µ )pr( q m, λ) pr( y m, µ, λ) October 005 Jax Workshop Brian S. Yandell 9 prior & posteriors: genotypic means µ q data means prior mean data mean n large n small prior 6 8 10 1 14 16 qq Qq QQ y = phenotype values October 005 Jax Workshop Brian S. Yandell 10

prior & posteriors: genotypic means µ q shrink posterior from sample genotypic mean toward overall mean partition individuals into sets S q by genotype q hyperparameter κ is related to heritability prior: µ q ~ N( y, κσ ) posterior given data: shrinkage factor: µ ~ q N b q y y b q q = sum Sq, n κnq = 1 κn + 1 q y n q q σ + (1 b ), q y bq n q q = count{ S q } October 005 Jax Workshop Brian S. Yandell 11 what if variance σ is unknown? sample variance is proportional to chi-square ns / σ ~ χ ( n ) likelihood of sample variance s given n, σ conjugate prior is inverse chi-square ντ / σ ~ χ ( ν ) prior of population variance σ given ν, τ posterior is weighted average of likelihood and prior (ντ +ns ) / σ ~ χ ( ν+n ) posterior of population variance σ given n, s, ν, τ empirical choice of hyper-parameters τ = s /3, ν=6 E(σ ν,τ ) = s /, Var(σ ν,τ ) = s 4 /4 October 005 Jax Workshop Brian S. Yandell 1

multiple QTL phenotype model phenotype affected by genotype & environment pr(y q,µ) ~ N(µ q, σ ) y = µ q + environment µ q = β 0 +sum {j in H} β j (q) number of terms in QTL model H nqtl (3 nqtl for F ) partition genotypic mean into QTL effects µ q = β 0 + β 1 (q 1 ) + β (q ) + β 1 (q 1,q ) µ q = mean + main effects + epistatic interactions partition prior and posterior (details omitted) October 005 Jax Workshop Brian S. Yandell 13 prior & posterior on QTL model H index model H by number of QTL what prior on number of QTL? uniform over some range Poisson with prior mean geometric with prior mean prior influences posterior good: reflects prior belief push data in discovery process bad: skeptic revolts! answer depends on guess prior probability 0.00 0.10 0.0 0.30 0 4 6 8 10 m = number of QTL October 005 Jax Workshop Brian S. Yandell 14 e e p p e p u exponential Poisson uniform u u p eu u pu u u e e p e p e p e p e p e p p e u u u u

epistatic interactions model space issues -QTL interactions only? Fisher-Cockerham partition vs. tree-structured? general interactions among multiple QTL model search issues epistasis between significant QTL check all possible pairs when QTL included? allow higher order epistasis? epistasis with non-significant QTL whole genome paired with each significant QTL? pairs of non-significant QTL? Yi Xu (000) Genetics; Yi, Xu, Allison (003) Genetics; Yi (004) October 005 Jax Workshop Brian S. Yandell 15 3. QTL Model Search using MCMC construct Markov chain around posterior want posterior as stable distribution of Markov chain in practice, the chain tends toward stable distribution initial values may have low posterior probability burn-in period to get chain mixing well update QTL model components from full conditionals update locus λ given q,h (using Metropolis-Hastings step) update genotypes q given λ,µ,y,h (using Gibbs sampler) update effects µ given q,y,h (using Gibbs sampler) update QTL model H given λ,µ,y,q (using Gibbs or M-H) ( λ, q, µ, H) ( λ, q, µ, H) ~ pr( λ, q, µ, H Y, X) ( λ, q, µ, H) L ( λ, q, 1 µ, H) N October 005 Jax Workshop Brian S. Yandell 16

Gibbs sampler idea two correlated normals (genotypic means in BC) could draw samples from both together but easier to sample one at a time Gibbs sampler: sample each from its full conditional pick order of sampling at random repeat N times µ µ µ QQ QQ Qq, µ Qq given µ given µ ~ N(0,1) but cor( µ Qq QQ ~ N ~ N ( ρµ Qq,1 ρ ) ( ρµ,1 ρ ) October 005 Jax Workshop Brian S. Yandell 17 QQ QQ, µ Qq ) = ρ Gibbs sampler samples: ρ = 0.6 N = 50 samples N = 00 samples Gibbs: mean 1 - -1 0 1 0 10 0 30 40 50 Markov chain index - -1 0 1 Gibbs: mean 1 Gibbs: mean - -1 0 1 3 Gibbs: mean - -1 0 1 3 Gibbs: mean - -1 0 1 3 Gibbs: mean 1 - -1 0 1 3 0 50 100 150 00 Markov chain index - -1 0 1 3 Gibbs: mean 1 Gibbs: mean - -1 0 1 Gibbs: mean - -1 0 1 Gibbs: mean - -1 0 1 0 10 0 30 40 50 Markov chain index - -1 0 1 Gibbs: mean 1 0 50 100 150 00 Markov chain index - -1 0 1 3 Gibbs: mean 1 October 005 Jax Workshop Brian S. Yandell 18

How to sample a locus λ? cannot easily sample from locus full conditional pr(λ m,q) = pr(λ) pr(q m,λ) / constant constant determined by averaging over all possible genotypes over entire map Gibbs sampler will not work in general but can use method based on ratios of probabilities Metropolis-Hastings is extension of Gibbs sampler October 005 Jax Workshop Brian S. Yandell 19 Metropolis-Hastings idea want to study distribution f(λ) take Monte Carlo samples unless too complicated pick an arbitrary distribution g(λ, λ * ) draw Metropolis-Hastings samples: current sample value λ propose new value λ * from g(λ, λ * ) accept new value with prob A * * f ( λ ) g( λ, λ) A = min 1, * f ( λ) g( λ, λ ) Gibbs sampler is special case g(λ, λ * ) = f(λ * ) always accept new proposal: A = 1 0.0 0. 0.4 0.0 0. 0.4 f(λ) 0 4 6 8 10 g(λ λ * ) -4-0 4 October 005 Jax Workshop Brian S. Yandell 0

mcmc sequence 0 000 4000 6000 8000 10000 MCMC realization pr(θ Y) 0.0 0.1 0. 0.3 0.4 0.5 0.6 0 4 6 8 10 θ 3 4 5 6 7 8 θ λ λ added twist: occasionally propose from whole domain October 005 Jax Workshop Brian S. Yandell 1 Metropolis-Hastings samples mcmc sequence 0 50 150 N = 00 samples N = 1000 samples narrow g wide g narrow g wide g mcmc sequence 0 50 150 mcmc sequence 0 400 800 mcmc sequence 0 400 800 pr(θ Y) 0 4 6 0 4 6 8 θλ pr(θ Y) 0.0 0.4 0.8 1. 0 4 6 8 θλ pr(θ Y) 0.0 1.0.0 0 4 6 8 θλ pr(θ Y) 0.0 0. 0.4 0.6 0 4 6 8 λθ 0 4 6 8 0 4 6 8 0 4 6 8 0 4 6 8 λθ λθ λθ θλ October 005 Jax Workshop Brian S. Yandell

sampling across QTL models H 0 λ 1 λ m+1 λ λ m L action steps: draw one of three choices update QTL model H with probability 1-b(H)-d(H) update current model using full conditionals sample QTL loci, effects, and genotypes add a locus with probability b(h) propose a new locus along genome innovate new genotypes at locus and phenotype effect decide whether to accept the birth of new locus drop a locus with probability d(h) propose dropping one of existing loci decide whether to accept the death of locus October 005 Jax Workshop Brian S. Yandell 3 reversible jump MCMC consider known genotypes q at known loci λ models with 1 or QTL M-H step between 1-QTL and -QTL models model changes dimension (via careful bookkeeping) consider mixture over QTL models H nqtl = 1 : Y = β 0 + β ( q 1 1 ) + e nqtl = : Y = β 0 + β ( q 1 1 ) + β ( q ) + e October 005 Jax Workshop Brian S. Yandell 4

Gibbs sampler with loci indicators partition genome into intervals at most one QTL per interval interval = 1 cm in length assume QTL in middle of interval use loci to indicate presence/absence of QTL in each interval γ = 1 if QTL in interval γ = 0 if no QTL Gibbs sampler on loci indicators see work of Nengjun Yi (and earlier work of Ina Hoeschele) Y = β + γ β ( q ) + γ β ( q ) + 0 1 1 1 1 e October 005 Jax Workshop Brian S. Yandell 5 Bayesian shrinkage estimation soft loci indicators strength of evidence for λ j depends on variance of β j similar to γ > 0 on grey scale include all possible loci in model pseudo-markers at 1cM intervals Wang et al. (005 Genetics) Shizhong Xu group at U CA Riverside Y = β + β ( q ) + β ( q ) +... + 0 1 1 1 e β ( q j j ) ~ N (0, σ j ), σ j ~ inverse - chisquare October 005 Jax Workshop Brian S. Yandell 6

geometry of effect samples (collinearity of QTL yields correlated estimates) a short sequence first 1000 with m<3 β b 0.0 0.05 0.10 0.15 b 0.0 0.1 0. 0.3 0.4 β 0.05 0.10 0.15-0.3-0. -0.1 0.0 0.1 0. β b1 b1 1 β 1 October 005 Jax Workshop Brian S. Yandell 7 4. Model Assessment balance model fit against model complexity smaller model bigger model model fit miss key features fits better prediction may be biased no bias interpretation easier more complicated parameters low variance high variance information criteria: penalize L by model size H compare IC = log L( H y ) + penalty( H ) Bayes factors: balance posterior by prior choice compare pr( data y model H ) October 005 Jax Workshop Brian S. Yandell 8

QTL Bayes factors BF = posterior odds / prior odds BF equivalent to BIC simple comparison: 1 vs QTL same as LOD test general comparison of models want Bayes factor >> 1 nqtl = number of QTL indexes model complexity genetic architecture also important BF nqtl,nqtl + 1 prior probability 0.00 0.10 0.0 0.30 pr( nqtl data) / pr( nqtl) = pr( nqtl + 1data) / pr( nqtl + 1) e e exponential p Poisson e u uniform p p u u p eu u pu u u e e p p e e p e p e p e u u pu p e u 0 4 6 8 10 m = number of QTL October 005 Jax Workshop Brian S. Yandell 9 Bayes factors to assess models Bayes factor: which model best supports the data? ratio of posterior odds to prior odds ratio of model likelihoods equivalent to LR statistic when comparing two nested models simple hypotheses (e.g. 1 vs QTL) Bayes Information Criteria (BIC) Schwartz introduced for model selection in general settings penalty to balance model size (p = number of parameters) B 1 pr( model1 Y ) / pr( model Y ) pr( Y model1) = = pr( model ) / pr( model ) pr( Y model ) log( B 1 1 ) = log( LR) ( p p )log( n) October 005 Jax Workshop Brian S. Yandell 30 1

simulations and data studies simulated F intercross, 8 QTL (Stephens, Fisch 1998) n=00, heritability = 50% detected 3 QTL increase to detect all 8 n=500, heritability to 97% QTL chr loci effect 1 1 11 3 1 50 5 3 3 6 + 4 6 107 3 5 6 15 +3 6 8 3 4 7 8 54 +1 8 9 195 + ch1 ch ch3 ch4 ch5 ch6 ch7 ch8 ch9 ch10 frequency in % 0 10 0 30 40 Genetic map posterior 7 8 9 10 11 1 13 number of QTL 0 50 100 150 00 October 005 Jax Workshop Brian S. Yandell 31 loci pattern across genome notice which chromosomes have persistent loci best pattern found 4% of the time Chromosome m 1 3 4 5 6 7 8 9 10 Count of 8000 8 0 1 0 0 0 1 0 3371 9 3 0 1 0 0 0 1 0 751 7 0 1 0 0 0 1 1 0 377 9 0 1 0 0 0 1 0 18 9 0 1 0 0 3 0 1 0 18 9 0 1 0 0 0 0 198 October 005 Jax Workshop Brian S. Yandell 3

B. napus 8-week vernalization whole genome study 108 plants from double haploid similar genetics to backcross: follow 1 gamete parents are Major (biennial) and Stellar (annual) 300 markers across genome 19 chromosomes average 6cM between markers median 3.8cM, max 34cM 83% markers genotyped phenotype is days to flowering after 8 weeks of vernalization (cooling) Stellar parent requires vernalization to flower available in R/bim package Ferreira et al. (1994); Kole et al. (001); Schranz et al. (00) October 005 Jax Workshop Brian S. Yandell 33 Markov chain Monte Carlo sequence burnin (sets up chain) mcmc sequence number of QTL environmental variance h = heritability (genetic/total variance) LOD = likelihood nqtl 0 5 10 15 0 5 envvar 0.006 0.010 0.014 herit 0.0 0. 0.4 0.6-50000 -40000-30000 -0000-10000 0 burnin sequence -50000-40000 -30000-0000 -10000 0 burnin sequence nqtl 4 6 8 10 envvar 0.004 0.010 0.016-50000 -40000-30000 -0000-10000 0 burnin sequence LOD 0 5 10 15 0 herit 0.1 0.3 0.5 0 e+00 e+05 4 e+05 6 e+05 8 e+05 1 e+0 mcmc sequence 0 e+00 e+05 4 e+05 6 e+05 8 e+05 1 e+0 mcmc sequence 0 e+00 e+05 4 e+05 6 e+05 8 e+05 1 e+0 mcmc sequence LOD 5 10 15 0-50000 -40000-30000 -0000-10000 0 0 e+00 e+05 4 e+05 6 e+05 8 e+05 1 e+0 burnin sequence mcmc sequence October 005 Jax Workshop Brian S. Yandell 34

MCMC sampled loci tgf1 wg3f7c subset of chromosomes N, N3, N16 points jittered for view blue lines at markers note concentration on chromosome N includes all models MCMC sampled loci 0 0 40 60 80 100 10 tgh10b wg1a10 ecd1a E35M59.117 E38M50.133 Aca1 tg6a1 wg5a5 wg8g1b wg6b10 E35M6.117 E33M59.59 wg7f3a ec3a8 E3M49.73 wg3g11 wg1g8a wgd11b ec3b1 E33M49.117 ece5a isoidh wge1b E33M49.11 wg6b wg9c7 E33M6.140 E33M50.183 wg4a4b E3M61.18 ec4e7a wg5b1a wg6c6 ec4g7b wg7a11 E35M48.148 wg4f4c E33M49.165 ecd8a E33M47.338 E38M50.159 E3M50.409 wg4d10 ec4h7 wg1g10b E38M6.9 E33M49.491 tg5d9b sl E35M59.85 g6 wg7f5b E3M47.159 ec5e1a wg5a1b wga3c ec3e1b N N3 N16 chromosome October 005 Jax Workshop Brian S. Yandell 35 row 1: # QTL row : pattern Bayesian model assessment col 1: posterior col : Bayes factor note error bars on bf evidence suggests 4-5 QTL N(-3),N3,N16 QTL posterior 0.0 0.1 0. 0.3 model posterior 0.0 0.1 0. 0.3 QTL posterior 1 3 5 7 9 11 number of QTL * pattern posterior :,1 3:*,1 :,13 :,3 :,16 :,11 3:*,3 :,15 4:*,3,16 3:*,13 :,14 1 3 5 7 9 11 13 model index October 005 Jax Workshop Brian S. Yandell 36 posterior / prior 1 5 50 500 posterior / prior 5 e-01 1 e+01 5 e+0 Bayes factor ratios strong moderate weak 1 3 5 7 9 11 number of QTL 1 Bayes factor ratios 3 strong moderate weak 3 4 3 1 3 5 7 9 11 13 model index

Bayesian estimates of loci & effects model averaging: at least 4 QTL histogram of loci blue line is density red lines at estimates estimate additive effects (red circles) grey points sampled from posterior blue line is cubic spline dashed line for SD loci histogram 0.00 0.0 0.04 0.06 additive -0.08-0.0 0.0 napus8 summaries with pattern 1,1,,3 and m 4 N N3 N16 0 50 100 150 00 50 N N3 N16 0 50 100 150 00 50 October 005 Jax Workshop Brian S. Yandell 37 Bayesian model diagnostics pattern: N(),N3,N16 col 1: density col : boxplots by m environmental variance σ =.008, σ =.09 heritability h = 5% LOD = 16 (highly significant) but note change with m density density density 0 50 100 00 0 1 3 4 5 0.00 0.04 0.08 0.1 0.004 0.006 0.008 0.010 0.01 marginal envvar, m 4 0. 0.3 0.4 0.5 0.6 0.7 marginal heritability, m 4 5 10 15 0 5 marginal LOD, m 4 4 5 6 7 8 9 11 1 envvar conditional on number of QTL October 005 Jax Workshop Brian S. Yandell 38 envvar heritability LOD 0.006 0.008 0.010 0.30 0.40 0.50 0.60 4 5 6 7 8 9 11 1 heritability conditional on number of QT 10 1 14 16 18 0 4 5 6 7 8 9 11 1 LOD conditional on number of QTL

studying diabetes in an F segregating cross of inbred lines B6.ob x BTBR.ob F1 F selected mice with ob/ob alleles at leptin gene (chr 6) measured and mapped body weight, insulin, glucose at various ages (Stoehr et al. 000 Diabetes) sacrificed at 14 weeks, tissues preserved gene expression data Affymetrix microarrays on parental strains, F1 key tissues: adipose, liver, muscle, β-cells novel discoveries of differential expression (Nadler et al. 000 PNAS; Lan et al. 00 in review; Ntambi et al. 00 PNAS) RT-PCR on 108 F mice liver tissues 15 genes, selected as important in diabetes pathways SCD1, PEPCK, ACO, FAS, GPAT, PPARgamma, PPARalpha, G6Pase, PDI, October 005 Jax Workshop Brian S. Yandell 39 Multiple Interval Mapping (QTLCart) SCD1: multiple QTL plus epistasis! effect (add=blue, dom=red) -0.5 0.0 0.5 1.0 LOD 0 4 6 8 0 50 100 150 00 50 300 chr chr5 chr9 0 50 100 150 00 50 300 chr chr5 chr9 October 005 Jax Workshop Brian S. Yandell 40

Bayesian model assessment: number of QTL for SCD1 QTL posterior Bayes factor ratios QTL posterior 0.00 0.05 0.10 0.15 0.0 0.5 posterior / prior 1 5 10 50 500 strong moderate weak 1 3 4 5 6 7 8 9 11 13 1 3 4 5 6 7 8 9 11 13 number of QTL number of QTL October 005 Jax Workshop Brian S. Yandell 41 Bayesian LOD and h for SCD1 density 0.00 0.10 LOD 5 10 15 0 0 5 10 15 0 marginal LOD, m 1 1 3 4 5 6 7 8 9 10 1 14 LOD conditional on number of QTL density 0 1 3 4 heritability 0.1 0.3 0.5 0.0 0.1 0. 0.3 0.4 0.5 0.6 0.7 marginal heritability, m 1 1 3 4 5 6 7 8 9 10 1 14 heritability conditional on number of QTL October 005 Jax Workshop Brian S. Yandell 4

Bayesian model assessment: chromosome QTL pattern for SCD1 model posterior 0.00 0.05 0.10 0.15 3:1,,3 4:*1,,3 pattern posterior 4:1,,*3 4:1,*,3 5:3*1,,3 5:*1,,*3 5:*1,*,3 6:3*1,,*3 6:3*1,*,3 5:1,*,*3 6:4*1,,3 6:*1,*,*3 :1,3 3:*1, :1, 1 3 5 7 9 11 13 15 model index posterior / prior 0. 0.4 0.6 0.8 3 4 4 4 Bayes factor ratios 5 6 moderate 6 5 5 6 5 6 weak 3 1 3 5 7 9 11 13 15 model index October 005 Jax Workshop Brian S. Yandell 43 trans-acting QTL for SCD1 (no epistasis yet: see Yi, Xu, Allison 003) dominance? October 005 Jax Workshop Brian S. Yandell 44

-D scan: assumes only QTL! epistasis LOD peaks joint LOD peaks October 005 Jax Workshop Brian S. Yandell 45 sub-peaks can be easily overlooked! October 005 Jax Workshop Brian S. Yandell 46

epistatic model fit October 005 Jax Workshop Brian S. Yandell 47 Cockerham epistatic effects October 005 Jax Workshop Brian S. Yandell 48

marginal heritability by locus black = total blue = additive red = dominant purple = add x add October 005 Jax Workshop Brian S. Yandell 49 marginal LOD by locus black = total blue = additive red = dominant purple = add x add green = scanone October 005 Jax Workshop Brian S. Yandell 50

Bayesian & classical (HK) October 005 Jax Workshop Brian S. Yandell 51 sex-adjusted anova for SCD1 (only 3-way sex interactions significant) Summary for fit QTL Method is: imp Number of observations: 107 Full model result ---------------------------------- Model formula is: y ~ sex * (Q1 + Q + Q3 + Q1:Q3 + Q:Q3) df SS MS LOD %var Pvalue(Chi) Pvalue(F) Model 9 109.004 3.7593806 5.79574 67.05143 7.86961e-13 1.59058e-09 Error 77 53.5761 0.695748 Total 106 16.59465 Drop one QTL at a time ANOVA table: ---------------------------------- df Type III SS LOD %var F value Pvalue sex 15 36.369 1.039.368 3.485 0.000154 Chr@105 1 41.48 13.66 5.369 4.940 5.38e-06 Chr5@67 1 39.771 1.901 4.460 4.764 8.93e-06 Chr9@15 0 59.54 17.365 36.60 4.79 1.87e-06 Chr@105:Chr9@15 8 33.637 11.3 0.687 6.043 5.11e-06 Chr5@67:Chr9@15 8 30.04 10.344 18.476 5.397.1e-05 sex:chr@105 6 18.499 6.89 11.377 4.431 0.000669 sex:chr5@67 6 18.130 6.773 11.151 4.343 0.000793 sex:chr9@15 10 5.945 9.176 15.957 3.79 0.000413 sex:chr@105:chr9@15 4 17.444 6.549 10.79 6.68 0.0000 sex:chr5@67:chr9@15 4 15.78 5.981 9.673 5.65 0.000483 October 005 Jax Workshop Brian S. Yandell 5

anova for SCD1 (sex terms n.s.) Model: y ~ Chr@105 + Chr5@67 + Chr9@67 + Chr@80 + Chr9@67^ + Chr@105:Chr9@67 + Chr5@67:Chr9@67^ Df Sum Sq Mean Sq F value Pr(>F) Model 7 69.060 69.060 73.095 <.e-16 *** Error 99 93.535 0.945 Total 106 16.595 70.005 Single term deletions Df Sum of Sq RSS AIC F value Pr(F) <none> 93.535.99 Chr@80 1 9.073 10.608 8.5 9.6036 0.0058 ** Chr@105 1 0.073 93.607 18.40 0.0769 0.78140 Chr5@67 1 0.18 93.753 18.568 0.307 0.63084 Chr9@67 1 7.156 100.691 6.07 7.5746 0.00704 ** Chr9@67^ 1 0.106 93.641 18.440 0.113 0.73800 Chr@105:Chr9@67 1 15.61 109.147 34.836 16.546 9.64e-05 *** Chr5@67:Chr9@67^ 1 7.11 100.745 6.65 7.6318 0.006838 ** October 005 Jax Workshop Brian S. Yandell 53 epistatic interaction plots (chr 5x9 p=0.007; chr x9 p<0.0001) mostly axd mostly axa October 005 Jax Workshop Brian S. Yandell 54

SCD1 on chrx9 by sex October 005 Jax Workshop Brian S. Yandell 55 obesity in CAST/Ei BC onto M16i 41 mice (Daniel Pomp) (13 male, 08 female) 9 microsatellites on 19 chromosomes 114 cm map subcutaneous fat pads pre-adjusted for sex and dam effects Yi, Yandell, Churchill, Allison, Eisen, Pomp (005) Genetics (in press) October 005 Jax Workshop Brian S. Yandell 56

non-epistatic analysis LOD score 0 5 10 15 0 Bayes factor 0 50 100 150 00 50 300 single QTL LOD profile multiple QTL Bayes factor profile October 005 Jax Workshop Brian S. Yandell 57 posterior profile of main effects in epistatic analysis Heritability Main effect 0.00 0.05 0.10 0.15 0.0-0.8-0.4 0.0 0. 0.4 Bayes factor 0 10 0 30 40 50 main effects & heritability profile Bayes factor profile October 005 Jax Workshop Brian S. Yandell 58

posterior profile of main effects in epistatic analysis October 005 Jax Workshop Brian S. Yandell 59 model selection via Bayes factors for epistatic model number of QTL QTL pattern October 005 Jax Workshop Brian S. Yandell 60

posterior probability of effects 0.0 0. 0.4 0.6 0.8 1.0 Posterior probability Chr13(0,4)*Chr15(1,31) Chr7(50,75)*Chr19(15,45) Chr(7,85)*Chr14(1,41) Chr15(1,31)*Chr19(15,45) Chr(7,85)*Chr13(0,4) Chr1(6,54)*Chr18(43,71) Chr14(1,41) Chr7(50,75) Chr19(15,45) Chr1(6,54) Chr18(43,71) Chr15(1,31) Chr13(0,4) Chr(7,85) October 005 Jax Workshop Brian S. Yandell 61 scatterplot estimates of epistatic loci October 005 Jax Workshop Brian S. Yandell 6

stronger epistatic effects October 005 Jax Workshop Brian S. Yandell 63 Bayesian software for QTLs R/bim: Bayesian IM (Satagopan Yandell 1996; Gaffney 001)* www.stat.wisc.edu/~yandell/qtl/software/bmapqtl www.r-project.org contributed package to CRAN version available within WinQTLCart (statgen.ncsu.edu/qtlcart) R/bmqtl: Bayesian Multiple QTL (N Yi, T Mehta & BS Yandell)* epistasis, new MCMC algorithms, extensive graphics extension of R/bim; due out Fall 005 R/qtl (Broman et al. 003 Bioinformatics)* biosun01.biostat.jhsph.edu/~kbroman/software www.r-project.org contributed package Pseudomarker (Sen, Churchill 00 Genetics) www.jax.org/staff/churchill/labsite/software Bayesian QTL / Multimapper Sillanpää Arjas (1998) www.rni.helsinki.fi/~mjs Stephens & Fisch (1998 Biometrics) R/bqtl (C Berry, hacuna.ucsd.edu/bqtl) * Hao Wu, Jackson Labs, provide crucial computing support October 005 Jax Workshop Brian S. Yandell 64

R/bim: our software www.stat.wisc.edu/~yandell/qtl/software/bmapqtl R contributed library (www.r-project.org) library(bim) is cross-compatible with library(qtl) Bayesian module within WinQTLCart WinQTLCart output can be processed using R library Software history initially designed by JM Satagopan (1996) major revision and extension by PJ Gaffney (001) whole genome multivariate update of effects; long range position updates substantial improvements in speed, efficiency pre-burnin: initial prior number of QTL very large upgrade (H Wu, PJ Gaffney, CF Jin, BS Yandell 003) epistasis in progress (H Wu, BS Yandell, N Yi 004) October 005 Jax Workshop Brian S. Yandell 65 many thanks Jackson Labs Gary Churchill Hao Wu U AL Birmingham David Allison Nengjun Yi Tapan Mehta Tom Osborn David Butruille Marcio Ferrera Josh Udahl Pablo Quijada Alan Attie Jonathan Stoehr Hong Lan Susie Clee Jessica Byers Michael Newton Daniel Sorensen Daniel Gianola Liang Li my students Jaya Satagopan Fei Zou Patrick Gaffney Chunfang Jin Elias Chaibub Neto USDA Hatch, NIH/NIDDK (Attie), NIH/R01 (Yi) October 005 Jax Workshop Brian S. Yandell 66