contact information & resources

Similar documents
Inferring Genetic Architecture of Complex Biological Processes

QTL model selection: key players

Model Selection for Multiple QTL

QTL model selection: key players

what is Bayes theorem? posterior = likelihood * prior / C

Bayesian Model Selection for Multiple QTL. Brian S. Yandell

1. what is the goal of QTL study?

Multiple QTL mapping

Causal Graphical Models in Systems Genetics

Statistical issues in QTL mapping in mice

Mapping multiple QTL in experimental crosses

Introduction to QTL mapping in model organisms

Gene mapping in model organisms

Seattle Summer Institute : Systems Genetics for Experimental Crosses

Introduction to QTL mapping in model organisms

Fast Bayesian Methods for Genetic Mapping Applicable for High-Throughput Datasets

Introduction to QTL mapping in model organisms

Inferring Causal Phenotype Networks from Segregating Populat

QTL Model Search. Brian S. Yandell, UW-Madison January 2017

Introduction to QTL mapping in model organisms

Mapping multiple QTL in experimental crosses

Overview. Background

Quantile based Permutation Thresholds for QTL Hotspots. Brian S Yandell and Elias Chaibub Neto 17 March 2012

Causal Model Selection Hypothesis Tests in Systems Genetics

Introduction to QTL mapping in model organisms

QTL Mapping I: Overview and using Inbred Lines

BAYESIAN MAPPING OF MULTIPLE QUANTITATIVE TRAIT LOCI

R/qtl workshop. (part 2) Karl Broman. Biostatistics and Medical Informatics University of Wisconsin Madison. kbroman.org

Expression QTLs and Mapping of Complex Trait Loci. Paul Schliekelman Statistics Department University of Georgia

Detection of multiple QTL with epistatic effects under a mixed inheritance model in an outbred population

Use of hidden Markov models for QTL mapping

BTRY 4830/6830: Quantitative Genomics and Genetics

Lecture 9. QTL Mapping 2: Outbred Populations

Mapping QTL to a phylogenetic tree

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011

Contents. Part I: Fundamentals of Bayesian Inference 1

A Statistical Framework for Expression Trait Loci (ETL) Mapping. Meng Chen

Evolution of phenotypic traits

Quantile-based permutation thresholds for QTL hotspot analysis: a tutorial

Lecture WS Evolutionary Genetics Part I 1

Tutorial Session 2. MCMC for the analysis of genetic data on pedigrees:

Lecture 8. QTL Mapping 1: Overview and Using Inbred Lines

Latent Variable models for GWAs

Bayesian Partition Models for Identifying Expression Quantitative Trait Loci

Bayesian Methods for Machine Learning

Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information #

Genotype Imputation. Biostatistics 666

Linear Regression (1/1/17)

Association Testing with Quantitative Traits: Common and Rare Variants. Summer Institute in Statistical Genetics 2014 Module 10 Lecture 5

Bayesian Inference of Interactions and Associations

Causal Network Models for Correlated Quantitative Traits. outline

Lecture 2: Genetic Association Testing with Quantitative Traits. Summer Institute in Statistical Genetics 2017

STA 4273H: Statistical Machine Learning

MULTIPLE-TRAIT MULTIPLE-INTERVAL MAPPING OF QUANTITATIVE-TRAIT LOCI ROBY JOEHANES

Major Genes, Polygenes, and

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Principles of Bayesian Inference

Statistical Genetics I: STAT/BIOST 550 Spring Quarter, 2014

MODEL-FREE LINKAGE AND ASSOCIATION MAPPING OF COMPLEX TRAITS USING QUANTITATIVE ENDOPHENOTYPES

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Joint Analysis of Multiple Gene Expression Traits to Map Expression Quantitative Trait Loci. Jessica Mendes Maia

Bayesian model selection: methodology, computation and applications

Bayesian variable selection via. Penalized credible regions. Brian Reich, NCSU. Joint work with. Howard Bondell and Ander Wilson

Expression Data Exploration: Association, Patterns, Factors & Regression Modelling

TECHNICAL REPORT NO December 1, 2008

Methods for QTL analysis

Bayesian construction of perceptrons to predict phenotypes from 584K SNP data.

An Introduction to the spls Package, Version 1.0

Bayesian reversible-jump for epistasis analysis in genomic studies

Principles of QTL Mapping. M.Imtiaz

I Have the Power in QTL linkage: single and multilocus analysis

Hierarchical Generalized Linear Models for Multiple QTL Mapping

Prediction of the Confidence Interval of Quantitative Trait Loci Location

MCMC IN THE ANALYSIS OF GENETIC DATA ON PEDIGREES

1 Springer. Nan M. Laird Christoph Lange. The Fundamentals of Modern Statistical Genetics

Stat 5101 Lecture Notes

Homework Assignment, Evolutionary Systems Biology, Spring Homework Part I: Phylogenetics:

Long-Term Response and Selection limits

Estimation of Parameters in Random. Effect Models with Incidence Matrix. Uncertainty

Quantitative Genomics and Genetics BTRY 4830/6830; PBSB

Bayesian Nonparametric Regression for Diabetes Deaths

Module 4: Bayesian Methods Lecture 9 A: Default prior selection. Outline

Subject CS1 Actuarial Statistics 1 Core Principles

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models

For 5% confidence χ 2 with 1 degree of freedom should exceed 3.841, so there is clear evidence for disequilibrium between S and M.

Supplementary Materials for Molecular QTL Discovery Incorporating Genomic Annotations using Bayesian False Discovery Rate Control

(Genome-wide) association analysis

STA 4273H: Statistical Machine Learning

Statistical Data Analysis Stat 3: p-values, parameter estimation

Lecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis

Cheng-Ruei Lee 1 *, Jill T. Anderson 2, Thomas Mitchell-Olds 1,3. Abstract. Introduction

MOST complex traits are influenced by interacting

1.5.1 ESTIMATION OF HAPLOTYPE FREQUENCIES:

Locating multiple interacting quantitative trait. loci using rank-based model selection

Doing Bayesian Integrals

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008

Making rating curves - the Bayesian approach

Partitioning Genetic Variance

Lecture 8 Genomic Selection

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Transcription:

Seattle Summer Institute 006 Advanced QTL Brian S. Yandell University of Wisconsin-Madison Bayesian QTL mapping & model selection data examples in detail multiple phenotypes & microarrays software demo & automated strategy QTL : Overview Seattle SISG: Yandell 006 1 contact information & resources email: byandell@wisc.edu web: www.stat.wisc.edu/~yandell/statgen QTL & microarray resources references, software, people thanks: students: Jaya Satagopan, Pat Gaffney, Fei Zou, Amy Jin, W. Whipple Neely faculty/staff: Alan Attie, Michael Newton, Nengjun Yi, Gary Churchill, Hong Lan, Christina Kendziorski, Tom Osborn, Jason Fine, Tapan Mehta, Hao Wu, Samprit Banerjee, Daniel Shriner QTL : Overview Seattle SISG: Yandell 006

Bayesian Interval Mapping 1. what is goal of QTL study? -8. Bayesian QTL mapping 9-0 3. Markov chain sampling 1-7 4. sampling across architectures 8-34 5. epistatic interactions 35-4 6. comparing models 43-46 QTL : Bayes Seattle SISG: Yandell 006 1 1. what is the goal of QTL study? uncover underlying biochemistry identify how networks function, break down find useful candidates for (medical) intervention epistasis may play key role statistical goal: maximize number of correctly identified QTL basic science/evolution how is the genome organized? identify units of natural selection additive effects may be most important (Wright/Fisher debate) statistical goal: maximize number of correctly identified QTL select elite individuals predict phenotype (breeding value) using suite of characteristics (phenotypes) translated into a few QTL statistical goal: mimimize prediction error QTL : Bayes Seattle SISG: Yandell 006

cross two inbred lines linkage disequilibrium associations linked segregating QTL (after Gary Churchill) QTL Marker Trait QTL : Bayes Seattle SISG: Yandell 006 3 pragmatics of multiple QTL evaluate some objective for model given data classical likelihood Bayesian posterior search over possible genetic architectures (models) number and positions of loci gene action: additive, dominance, epistasis estimate features of model means, variances & covariances, confidence regions marginal or conditional distributions art of model selection how select best or better model(s)? how to search over useful subset of possible models? QTL : Bayes Seattle SISG: Yandell 006 4

advantages of multiple QTL approach improve statistical power, precision increase number of QTL detected better estimates of loci: less bias, smaller intervals improve inference of complex genetic architecture patterns and individual elements of epistasis appropriate estimates of means, variances, covariances asymptotically unbiased, efficient assess relative contributions of different QTL improve estimates of genotypic values less bias (more accurate) and smaller variance (more precise) mean squared error = MSE = (bias) + variance QTL : Bayes Seattle SISG: Yandell 006 5 major QTL on linkage map 3 Pareto diagram of QTL effects additive effect 0 1 3 major QTL (modifiers) minor QTL polygenes 1 4 5 0 5 10 15 0 5 30 rank order of QTL QTL : Bayes Seattle SISG: Yandell 006 6

limits of multiple QTL? limits of statistical inference power depends on sample size, heritability, environmental variation best model balances fit to data and complexity (model size) genetic linkage = correlated estimates of gene effects limits of biological utility sampling: only see some patterns with many QTL marker assisted selection (Bernardo 001 Crop Sci) 10 QTL ok, 50 QTL are too many phenotype better predictor than genotype when too many QTL increasing sample size may not give multiple QTL any advantage hard to select many QTL simultaneously 3 m possible genotypes to choose from QTL : Bayes Seattle SISG: Yandell 006 7 QTL below detection level? problem of selection bias QTL of modest effect only detected sometimes their effects are biased upwards when detected probability that QTL detected avoids sharp in/out dichotomy avoid pitfalls of one best model examine better models with more probable QTL build m = number of QTL detected into QTL model directly allow uncertainty in genetic architecture model selection over genetic architecture QTL : Bayes Seattle SISG: Yandell 006 8

. Bayesian QTL mapping Reverend Thomas Bayes (170-1761) part-time mathematician buried in Bunhill Cemetary, Moongate, London famous paper in 1763 Phil Trans Roy Soc London was Bayes the first with this idea? (Laplace?) basic idea (from Bayes original example) two billiard balls tossed at random (uniform) on table where is first ball if the second is to its left? prior: anywhere on the table posterior: more likely toward right end of table QTL : Bayes Seattle SISG: Yandell 006 9 Bayes posterior for normal data n large n small prior prior mean actual mean n large n small prior prior mean actual mean 6 8 10 1 14 16 6 8 10 1 14 16 y = phenotype values y = phenotype values small prior variance large prior variance QTL : Bayes Seattle SISG: Yandell 006 10

Bayes posterior for normal data model y i = µ + e i environment e ~ N( 0, σ ), σ known likelihood y~n( µ, σ ) prior µ ~N( µ 0, κσ ), κ known posterior: mean tends to sample mean single individual µ ~N( µ 0 + b 1 (y 1 µ 0 ), b 1 σ ) sample of n individuals µ ~ N with y ( bn y + (1 bn ) µ 0, bnσ / n) = sum y / n { i= 1,..., n} i fudge factor (shrinks to 1) κn bn = 1 κn + 1 QTL : Bayes Seattle SISG: Yandell 006 11 Bayesian QTL: key players observed measurements y = phenotypic trait m = markers & linkage map i = individual index (1,,n) missing data missing marker data missing q = QT genotypes alleles QQ, Qq, or qq at locus unknown quantities λ = QT locus (or loci) µ = phenotype model parameters H = QTL model/genetic architecture pr(q m,λ,h) genotype model grounded by linkage map, experimental cross recombination yields multinomial for q given m pr(y q,µ,h) phenotype model distribution shape (assumed normal here) unknown parameters µ (could be non-parametric) observed mx Yy Qq unknown λ θµ H after Sen Churchill (001) QTL : Bayes Seattle SISG: Yandell 006 1

pr(y q,µ) phenotype model n large n small prior 6 8 10 1 14 16 qq Qq QQ y = phenotype values QTL : Bayes Seattle SISG: Yandell 006 13 Bayes posterior QTL means posterior centered on sample genotypic mean but shrunken slightly toward overall mean prior: µ q ~ N ( y, κσ ) posterior: µ ~ N q ( bq yq + (1 bq ) y, bqσ / nq ) fudge factor: n b q q = count{ q κnq = 1 κn + 1 q i = q}, y q = sum y { qi = q} i / n q QTL : Bayes Seattle SISG: Yandell 006 14

partition of multiple QTL effects partition genotype-specific mean into QTL effects µ q = mean + main effects + epistatic interactions µ q = µ + β q = µ +sum j in H β qj priors on mean and effects µ ~ N(µ 0, κ 0 σ ) grand mean β q ~ N(0, κ 1 σ ) model-independent genotypic effect β qj ~ N(0, κ 1 σ / H ) effects down-weighted by size of H determine hyper-parameters via empirical Bayes h 1 h σ σ G µ 0 Y and κ1 = QTL : Bayes Seattle SISG: Yandell 006 15 pr(q m,λ) recombination model pr(q m,λ) = pr(geno map, locus) pr(geno flanking markers, locus) q? m1 m m 3 m m 4 5 m6 markers λ distance along chromosome QTL : Bayes Seattle SISG: Yandell 006 16

how does phenotype Y improve posterior for genotype Q? D4Mit41 D4Mit14 bp 10 110 100 90 what are probabilities for genotype Q between markers? recombinants AA:AB all 1:1 if ignore Y and if we use Y? AA AA AB AA AA AB AB AB Genotype QTL : Bayes Seattle SISG: Yandell 006 17 posterior on QTL genotypes full conditional for q depends data for individual i proportional to prior pr(q m i, λ ) weight toward q that agrees with flanking markers proportional to likelihood pr(y i q,µ) weight toward q so that group mean µ q y i phenotype and prior recombination may conflict posterior recombination balances these two weights this is E step in EM for classical QTL analysis pr( q mi, λ)pr( yi q, µ ) pr( q yi, mi, µ, λ) = pr( y m, µ, λ) i i QTL : Bayes Seattle SISG: Yandell 006 18

Bayesian model posterior augment data (y,m) with unknowns q study unknowns (µ,λ,q) given data (y,m) properties of posterior pr(µ,λ,q y,m ) sample from posterior in some clever way multiple imputation or MCMC pr( y q, µ )pr( q m, λ)pr( µ )pr( λ m) pr( q, µ, λ y, m) = pr( y m) pr( µ, λ y, m) = sum q pr( q, µ, λ y, n) QTL : Bayes Seattle SISG: Yandell 006 19 Bayesian priors for QTL missing genotypes q pr( q m, λ ) recombination model is formally a prior effects ( µ, σ ) prior = pr( µ q σ ) pr(σ ) use conjugate priors for normal phenotype pr( µ q σ ) = normal pr(σ ) = inverse chi-square each locus λ may be uniform over genome pr(λ m ) = 1 / length of genome combined prior pr( q, µ, λ m ) = pr( q m, λ ) pr( µ ) pr(λ m ) QTL : Bayes Seattle SISG: Yandell 006 0

3. Markov chain sampling of architectures construct Markov chain around posterior want posterior as stable distribution of Markov chain in practice, the chain tends toward stable distribution initial values may have low posterior probability burn-in period to get chain mixing well hard to sample (q, µ, λ, H) from joint posterior update (q,µ,λ) from full conditionals for model H update genetic architecture H ( q, µ, λ, H) ~ pr( q, µ, λ, H y, m) ( q, µ, λ, H) 1 ( q, µ, λ, H) L ( q, µ, λ, H) N QTL : Bayes Seattle SISG: Yandell 006 1 MCMC sampling of (λ,q,µ) Gibbs sampler genotypes q effects µ not loci λ q ~ pr( q y, m, µ, λ) pr( y q, µ )pr( µ ) µ ~ pr( y q) Metropolis-Hastings sampler extension of Gibbs sampler does not require normalization pr( q m ) = sum λ pr( q m, λ ) pr(λ ) i pr( q m, λ)pr( λ m) λ ~ pr( q m) i QTL : Bayes Seattle SISG: Yandell 006

full conditional for locus cannot easily sample from locus full conditional pr(λ y,m,µ,q) = pr( λ m,q) = pr( q m, λ ) pr(λ ) / constant constant is very difficult to compute explicitly must average over all possible loci λ over genome must do this for every possible genotype q Gibbs sampler will not work in general but can use method based on ratios of probabilities Metropolis-Hastings is extension of Gibbs sampler QTL : Bayes Seattle SISG: Yandell 006 3 Gibbs sampler idea toy problem want to study two correlated effects could sample directly from their bivariate distribution instead use Gibbs sampler: sample each effect from its full conditional given the other pick order of sampling at random repeat many times µ 1 0 1 ~ N, µ 0 ρ µ ~ N 1 µ ~ N ( ρµ,1 ρ ) ( ρµ,1 ρ ) QTL : Bayes Seattle SISG: Yandell 006 4 1 ρ 1

Gibbs sampler samples: ρ = 0.6 N = 50 samples N = 00 samples Gibbs: mean 1 - -1 0 1 0 10 0 30 40 50 Markov chain index - -1 0 1 Gibbs: mean 1 Gibbs: mean - -1 0 1 3 Gibbs: mean - -1 0 1 3 Gibbs: mean - -1 0 1 3 Gibbs: mean 1 - -1 0 1 3 0 50 100 150 00 Markov chain index - -1 0 1 3 Gibbs: mean 1 Gibbs: mean - -1 0 1 Gibbs: mean - -1 0 1 Gibbs: mean - -1 0 1 0 10 0 30 40 50 Markov chain index - -1 0 1 Gibbs: mean 1 0 50 100 150 00 Markov chain index - -1 0 1 3 Gibbs: mean 1 QTL : Bayes Seattle SISG: Yandell 006 5 Metropolis-Hastings idea want to study distribution f(λ) take Monte Carlo samples unless too complicated take samples using ratios of f Metropolis-Hastings samples: propose new value λ * near (?) current value λ from some distribution g accept new value with prob a Gibbs sampler: a = 1 always a = * * f ( λ ) g( λ λ) min 1, * f ( λ) g( λ λ ) 0 4 6 8 10-4 - 0 4 QTL : Bayes Seattle SISG: Yandell 006 6 0.0 0. 0.4 0.0 0. 0.4 f(λ) g(λ λ * )

Metropolis-Hastings samples mcmc sequence 0 50 150 N = 00 samples N = 1000 samples narrow g wide g narrow g wide g mcmc sequence 0 50 150 mcmc sequence 0 400 800 mcmc sequence 0 400 800 histogram pr(θ Y) 0 4 6 0 4 6 8 0 4 6 8 0 4 6 8 0 4 6 8 θ λ θλ θ λ θ λ histogram pr(θ Y) 0.0 0.4 0.8 1. histogram pr(θ Y) 0.0 1.0.0 histogram pr(θ Y) 0.0 0. 0.4 0.6 0 4 6 8 0 4 6 8 0 4 6 8 0 4 6 8 θ θ θ θ QTL : Bayes Seattle SISG: Yandell 006 7 4. sampling across architectures search across genetic architectures M of various sizes allow change in number of QTL allow change in types of epistatic interactions methods for search reversible jump MCMC Gibbs sampler with loci indicators complexity of epistasis Fisher-Cockerham effects model general multi-qtl interaction & limits of inference QTL : Bayes Seattle SISG: Yandell 006 8

model selection in regression consider known genotypes q at known loci λ models with 1 or QTL jump between 1-QTL and -QTL models adjust parameters when model changes β q1 estimate changes between models 1 and due to collinearity of QTL genotypes m = 1 : µ q = µ + β q1 m = : µ q = µ + β q1 + β q QTL : Bayes Seattle SISG: Yandell 006 9 geometry of reversible jump Move Between Models Reversible Jump Sequence b 0.0 0. 0.4 0.6 0.8 β m= m=1 c1 = 0.7 b 0.0 0. 0.4 0.6 0.8 β 0.0 0. 0.4 0.6 0.8 0.0 0. 0.4 0.6 0.8 β b1 b1 1 β 1 QTL : Bayes Seattle SISG: Yandell 006 30

geometry allowing q and λ to change a short sequence first 1000 with m<3 β b 0.0 0.05 0.10 0.15 0.05 0.10 0.15 b1 b 0.0 0.1 0. 0.3 0.4 β -0.3-0. -0.1 0.0 0.1 0. b1 β 1 β 1 QTL : Bayes Seattle SISG: Yandell 006 31 collinear QTL = correlated effects 4-week 8-week additive effect -0.6-0.4-0. 0.0 cor = -0.81 additive effect -0.3-0. -0.1 0.0 cor = -0.7-0.6-0.4-0. 0.0 0. additive 1 effect 1-0. -0.1 0.0 0.1 0. additive 1 effect 1 linked QTL = collinear genotypes correlated estimates of effects (negative if in coupling phase) sum of linked effects usually fairly constant QTL : Bayes Seattle SISG: Yandell 006 3

reversible jump MCMC idea 0 λ 1 λ m+1 λ λ m L Metropolis-Hastings updates: draw one of three choices update m-qtl model with probability 1-b(m+1)-d(m) update current model using full conditionals sample m QTL loci, effects, and genotypes add a locus with probability b(m+1) propose a new locus and innovate new genotypes & genotypic effect decide whether to accept the birth of new locus drop a locus with probability d(m) propose dropping one of existing loci decide whether to accept the death of locus Satagopan Yandell (1996, 1998); Sillanpaa Arjas (1998); Stevens Fisch (1998) these build on RJ-MCMC idea of Green (1995); Richardson Green (1997) QTL : Bayes Seattle SISG: Yandell 006 33 Gibbs sampler with loci indicators consider only QTL at pseudomarkers every 1- cm modest approximation with little bias use loci indicators in each pseudomarker δ = 1 if QTL present δ = 0 if no QTL present Gibbs sampler on loci indicators δ relatively easy to incorporate epistasis Yi, Yandell, Churchill, Allison, Eisen, Pomp (005 Genetics) (see earlier work of Nengjun Yi and Ina Hoeschele) µ = µ + δ β + δ q β 1 q1 q QTL : Bayes Seattle SISG: Yandell 006 34

5. Gene Action and Epistasis additive, dominant, recessive, general effects of a single QTL (Gary Churchill) QTL : Bayes Seattle SISG: Yandell 006 35 additive effects of two QTL (Gary Churchill) µ q = µ + β q1 + β q QTL : Bayes Seattle SISG: Yandell 006 36

Epistasis (Gary Churchill) The allelic state at one locus can mask or uncover the effects of allelic variation at another. - W. Bateson, 1907. QTL : Bayes Seattle SISG: Yandell 006 37 epistasis in parallel pathways (GAC) Z keeps trait value low neither E 1 nor E is rate limiting X Y E 1 E Z loss of function alleles are segregating from parent A at E 1 and from parent B at E QTL : Bayes Seattle SISG: Yandell 006 38

epistasis in a serial pathway (GAC) Z keeps trait value high neither E 1 nor E is rate limiting X E 1 Y E Z loss of function alleles are segregating from parent B at E 1 and from parent A at E QTL : Bayes Seattle SISG: Yandell 006 39 y = µ + e,var( e) = σ G q1 var( µ ) = σ h QTL with epistasis same phenotype model overview q q partition of genotypic value with epistasis µ = µ + β q G σg = σ + σ + β q = h QTL : Bayes Seattle SISG: Yandell 006 40 1 1 + β + h q1 partition of genetic variance & heritability = σ + σ + σ + h 1 1

epistatic interactions model space issues -QTL interactions only? or general interactions among multiple QTL? partition of effects Fisher-Cockerham or tree-structured or? model search issues epistasis between significant QTL check all possible pairs when QTL included? allow higher order epistasis? epistasis with non-significant QTL whole genome paired with each significant QTL? pairs of non-significant QTL? Yi Xu (000) Genetics; Yi, Xu, Allison (003) Genetics; Yi et al. (005) Genetics QTL : Bayes Seattle SISG: Yandell 006 41 limits of epistatic inference power to detect effects epistatic model size grows exponentially H = 3 nqtl for general interactions power depends on ratio of n to model size want n / H to be fairly large (say > 5) n = 100, nqtl = 3, n / H 4 empty cells mess up adjusted (Type 3) tests missing q 1 Q / q 1 Q or q 1 Q q 3 / q 1 Q q 3 genotype null hypotheses not what you would expect can confound main effects and interactions can bias AA, AD, DA, DD partition QTL : Bayes Seattle SISG: Yandell 006 4

6. comparing QTL models balance model fit with model "complexity want maximum likelihood without too complicated a model information criteria quantifies the balance Bayes information criteria (BIC) for likelihood Bayes factors for Bayesian approach QTL : Bayes Seattle SISG: Yandell 006 43 Bayes factors & BIC pr( model1 Y ) / pr( model Y ) pr( Y model1) B 1 = = pr( model ) / pr( model ) pr( Y model ) what is a Bayes factor? ratio of posterior odds to prior odds ratio of model likelihoods BF is equivalent to LR statistic when comparing two nested models simple hypotheses (e.g. 1 vs QTL) BF is equivalent to Bayes Information Criteria (BIC) for general comparison of any models want Bayes factor to be substantially larger than 1 (say 10 or more) 1 1 log( B1) = log( LR) ( p p ) log( n) QTL : Bayes Seattle SISG: Yandell 006 44

Bayes factors and genetic model H H = number of QTL prior pr(h) chosen by user posterior pr(h y,m) sampled marginal histogram shape affected by prior pr(h) pr( H y, m) / pr( H ) + 1 = pr( H + 1 y, m) / pr( H + 1) BF H,H prior probability 0.00 0.10 0.0 0.30 pattern of QTL across genome gene action and epistasis e e exponential p Poisson e u uniform p p u u p eu u pu u u e e p p e e p e p e p e u u pu p e u 0 4 6 8 10 m = number of QTL QTL : Bayes Seattle SISG: Yandell 006 45 issues in computing Bayes factors BF insensitive to shape of prior on nqtl geometric, Poisson, uniform precision improves when prior mimics posterior BF sensitivity to prior variance on effects θ prior variance should reflect data variability resolved by using hyper-priors automatic algorithm; no need for user tuning easy to compute Bayes factors from samples sample posterior using MCMC posterior pr(nqtl y,m) is marginal histogram QTL : Bayes Seattle SISG: Yandell 006 46

examples in detail simulation study (after Stephens & Fisch (1998) days to flower for Brassica napus (plant) (n = 108) single chromosome with linked loci whole genome gonad shape in Drosophila spp. (insect) (n = 1000) multiple traits reduced by PC many QTL and epistasis expression phenotype (SCD1) in mice (n = 108) multiple QTL and epistasis obesity in mice (n = 41) epistatic QTLs with no main effects QTL : Data Seattle SISG: Yandell 006 1 simulation with 8 QTL simulated F intercross, 8 QTL (Stephens, Fisch 1998) n=00, heritability = 50% detected 3 QTL increase to detect all 8 n=500, heritability to 97% QTL chr loci effect 1 1 11 3 1 50 5 3 3 6 + 4 6 107 3 5 6 15 +3 6 8 3 4 7 8 54 +1 8 9 195 + ch1 ch ch3 ch4 ch5 ch6 ch7 ch8 ch9 ch10 frequency in % 0 10 0 30 40 Genetic map posterior 7 8 9 10 11 1 13 number of QTL 0 50 100 150 00 QTL : Data Seattle SISG: Yandell 006

loci pattern across genome notice which chromosomes have persistent loci best pattern found 4% of the time Chromosome m 1 3 4 5 6 7 8 9 10 Count of 8000 8 0 1 0 0 0 1 0 3371 9 3 0 1 0 0 0 1 0 751 7 0 1 0 0 0 1 1 0 377 9 0 1 0 0 0 1 0 18 9 0 1 0 0 3 0 1 0 18 9 0 1 0 0 0 0 198 QTL : Data Seattle SISG: Yandell 006 3 Brassica napus: 1 chromosome 4-week & 8-week vernalization effect log(days to flower) genetic cross of Stellar (annual canola) Major (biennial rapeseed) 105 F1-derived double haploid (DH) lines homozygous at every locus (QQ or qq) 10 molecular markers (RFLPs) on LG9 two QTLs inferred on LG9 (now chromosome N) corroborated by Butruille (1998) exploiting synteny with Arabidopsis thaliana QTL : Data Seattle SISG: Yandell 006 4

Brassica 4- & 8-week data 8-week.5 3.0 3.5.5 3.5.5 3.0 3.5 4.0 4-week 0 4 6 8 10 8-week vernalization 0 4 6 8 summaries of raw data joint scatter plots (identity line) separate histograms.5 3.0 3.5 4.0 4-week vernalization QTL : Data Seattle SISG: Yandell 006 5 Brassica credible regions 4-week 8-week additive -0.6-0.4-0. 0.0 0. additive -0.3-0. -0.1 0.0 0.1 0. 0 40 60 80 distance (cm) 0 40 60 80 distance (cm) QTL : Data Seattle SISG: Yandell 006 6

B. napus 8-week vernalization whole genome study 108 plants from double haploid similar genetics to backcross: follow 1 gamete parents are Major (biennial) and Stellar (annual) 300 markers across genome 19 chromosomes average 6cM between markers median 3.8cM, max 34cM 83% markers genotyped phenotype is days to flowering after 8 weeks of vernalization (cooling) Stellar parent requires vernalization to flower Ferreira et al. (1994); Kole et al. (001); Schranz et al. (00) QTL : Data Seattle SISG: Yandell 006 7 row 1: # QTL row : pattern Bayesian model assessment col 1: posterior col : Bayes factor note error bars on bf evidence suggests 4-5 QTL N(-3),N3,N16 QTL posterior 0.0 0.1 0. 0.3 model posterior 0.0 0.1 0. 0.3 QTL posterior 1 3 5 7 9 11 number of QTL * pattern posterior :,1 3:*,1 :,13 :,3 :,16 :,11 3:*,3 :,15 4:*,3,16 3:*,13 :,14 1 3 5 7 9 11 13 1 3 5 7 9 11 13 model index model index QTL : Data Seattle SISG: Yandell 006 8 posterior / prior 1 5 50 500 posterior / prior 5 e-01 1 e+01 5 e+0 Bayes factor ratios strong moderate weak 1 3 5 7 9 11 number of QTL 1 Bayes factor ratios 3 strong moderate weak 3 4 3

Bayesian estimates of loci & effects histogram of loci blue line is density red lines at estimates loci histogram 0.00 0.0 0.04 0.06 napus8 summaries with pattern 1,1,,3 and m 4 N N3 N16 0 50 100 150 00 50 estimate additive effects (red circles) grey points sampled from posterior blue line is cubic spline dashed line for SD additive -0.08-0.0 0.0 N N3 N16 0 50 100 150 00 50 QTL : Data Seattle SISG: Yandell 006 9 Bayesian model diagnostics pattern: N(),N3,N16 col 1: density col : boxplots by m environmental variance σ =.008, σ =.09 heritability h = 5% LOD = 16 (highly significant) but note change with m density density density 0 50 100 00 0 1 3 4 5 0.00 0.04 0.08 0.1 0.004 0.006 0.008 0.010 0.01 marginal envvar, m 4 0. 0.3 0.4 0.5 0.6 0.7 marginal heritability, m 4 5 10 15 0 5 marginal LOD, m 4 4 5 6 7 8 9 11 1 envvar conditional on number of QTL QTL : Data Seattle SISG: Yandell 006 10 envvar heritability LOD 0.006 0.008 0.010 0.30 0.40 0.50 0.60 4 5 6 7 8 9 11 1 heritability conditional on number of QT 10 1 14 16 18 0 4 5 6 7 8 9 11 1 LOD conditional on number of QTL

shape phenotype in BC study indexed by PC1 Liu et al. (1996) Genetics QTL : Data Seattle SISG: Yandell 006 11 shape phenotype via PC Liu et al. (1996) Genetics QTL : Data Seattle SISG: Yandell 006 1

Zeng et al. (000) CIM vs. MIM composite interval mapping (Liu et al. 1996) narrow peaks miss some QTL multiple interval mapping (Zeng et al. 000) triangular peaks both conditional 1-D scans fixing all other "QTL" QTL : Data Seattle SISG: Yandell 006 13 cim CIM, MIM and IM pairscan mim QTL : Data Seattle SISG: Yandell 006 14

QTL + epistasis: IM versus multiple imputation IM pairscan multiple imputation QTL : Data Seattle SISG: Yandell 006 15 multiple QTL: CIM, MIM and BIM cim mim bim QTL : Data Seattle SISG: Yandell 006 16

studying diabetes in an F segregating cross of inbred lines B6.ob x BTBR.ob F1 F selected mice with ob/ob alleles at leptin gene (chr 6) measured and mapped body weight, insulin, glucose at various ages (Stoehr et al. 000 Diabetes) sacrificed at 14 weeks, tissues preserved gene expression data Affymetrix microarrays on parental strains, F1 key tissues: adipose, liver, muscle, β-cells novel discoveries of differential expression (Nadler et al. 000 PNAS; Lan et al. 00 in review; Ntambi et al. 00 PNAS) RT-PCR on 108 F mice liver tissues 15 genes, selected as important in diabetes pathways SCD1, PEPCK, ACO, FAS, GPAT, PPARgamma, PPARalpha, G6Pase, PDI, QTL : Data Seattle SISG: Yandell 006 17 Multiple Interval Mapping (QTLCart) SCD1: multiple QTL plus epistasis! effect (add=blue, dom=red) -0.5 0.0 0.5 1.0 LOD 0 4 6 8 0 50 100 150 00 50 300 chr chr5 chr9 0 50 100 150 00 50 300 chr chr5 chr9 QTL : Data Seattle SISG: Yandell 006 18

Bayesian model assessment: number of QTL for SCD1 QTL posterior Bayes factor ratios QTL posterior 0.00 0.05 0.10 0.15 0.0 0.5 posterior / prior 1 5 10 50 500 strong moderate weak 1 3 4 5 6 7 8 9 11 13 1 3 4 5 6 7 8 9 11 13 number of QTL number of QTL QTL : Data Seattle SISG: Yandell 006 19 Bayesian LOD and h for SCD1 density 0.00 0.10 LOD 5 10 15 0 0 5 10 15 0 marginal LOD, m 1 1 3 4 5 6 7 8 9 10 1 14 LOD conditional on number of QTL density 0 1 3 4 heritability 0.1 0.3 0.5 0.0 0.1 0. 0.3 0.4 0.5 0.6 0.7 marginal heritability, m 1 1 3 4 5 6 7 8 9 10 1 14 heritability conditional on number of QTL QTL : Data Seattle SISG: Yandell 006 0

Bayesian model assessment: chromosome QTL pattern for SCD1 model posterior 0.00 0.05 0.10 0.15 3:1,,3 4:*1,,3 pattern posterior 4:1,,*3 4:1,*,3 5:3*1,,3 5:*1,,*3 5:*1,*,3 6:3*1,,*3 6:3*1,*,3 5:1,*,*3 6:4*1,,3 6:*1,*,*3 :1,3 3:*1, :1, 1 3 5 7 9 11 13 15 model index posterior / prior 0. 0.4 0.6 0.8 3 4 4 4 Bayes factor ratios 5 6 moderate 6 5 5 6 5 6 weak 3 1 3 5 7 9 11 13 15 model index QTL : Data Seattle SISG: Yandell 006 1 trans-acting QTL for SCD1 (no epistasis yet: see Yi, Xu, Allison 003) dominance? QTL : Data Seattle SISG: Yandell 006

-D scan: assumes only QTL! epistasis LOD peaks joint LOD peaks QTL : Data Seattle SISG: Yandell 006 3 sub-peaks can be easily overlooked! QTL : Data Seattle SISG: Yandell 006 4

epistatic model fit QTL : Data Seattle SISG: Yandell 006 5 Cockerham epistatic effects QTL : Data Seattle SISG: Yandell 006 6

obesity in CAST/Ei BC onto M16i 41 mice (Daniel Pomp) (13 male, 08 female) 9 microsatellites on 19 chromosomes 114 cm map subcutaneous fat pads pre-adjusted for sex and dam effects Yi, Yandell, Churchill, Allison, Eisen, Pomp (005) Genetics (in press) QTL : Data Seattle SISG: Yandell 006 7 non-epistatic analysis LOD score 0 5 10 15 0 Bayes factor 0 50 100 150 00 50 300 single QTL LOD profile multiple QTL Bayes factor profile QTL : Data Seattle SISG: Yandell 006 8

posterior profile of main effects in epistatic analysis Heritability Main effect 0.00 0.05 0.10 0.15 0.0-0.8-0.4 0.0 0. 0.4 Bayes factor 0 10 0 30 40 50 main effects & heritability profile Bayes factor profile QTL : Data Seattle SISG: Yandell 006 9 posterior profile of main effects in epistatic analysis QTL : Data Seattle SISG: Yandell 006 30

model selection via Bayes factors for epistatic model number of QTL QTL pattern QTL : Data Seattle SISG: Yandell 006 31 posterior probability of effects 0.0 0. 0.4 0.6 0.8 1.0 Posterior probability Chr13(0,4)*Chr15(1,31) Chr7(50,75)*Chr19(15,45) Chr(7,85)*Chr14(1,41) Chr15(1,31)*Chr19(15,45) Chr(7,85)*Chr13(0,4) Chr1(6,54)*Chr18(43,71) Chr14(1,41) Chr7(50,75) Chr19(15,45) Chr1(6,54) Chr18(43,71) Chr15(1,31) Chr13(0,4) Chr(7,85) QTL : Data Seattle SISG: Yandell 006 3

scatterplot estimates of epistatic loci QTL : Data Seattle SISG: Yandell 006 33 stronger epistatic effects QTL : Data Seattle SISG: Yandell 006 34

model selection for pairs QTL : Data Seattle SISG: Yandell 006 35 our RJ-MCMC software R: www.r-project.org freely available statistical computing application R library(bim) builds on Broman s library(qtl) QTLCart: statgen.ncsu.edu/qtlcart Bmapqtl incorporated into QTLCart (S Wang 003) www.stat.wisc.edu/~yandell/qtl/software/bmqtl R/bim initially designed by JM Satagopan (1996) major revision and extension by PJ Gaffney (001) whole genome, multivariate and long range updates speed improvements, pre-burnin built as official R library (H Wu, Yandell, Gaffney, CF Jin 003) R/bmqtl collaboration with N Yi, H Wu, GA Churchill initial working module: Winter 005 improved module and official release: Summer/Fall 005 major NIH grant (PI: Yi) QTL : Data Seattle SISG: Yandell 006 36

Multiple Traits & Microarrays 1. why study multiple traits together? -10 diabetes case study. design issues 11-13 selective phenotyping 3. why are traits correlated? 14-17 close linkage or pleiotropy? 4. modern high throughput 18-31 principal components & discriminant analysis 5. graphical models 3-36 building causal biochemical networks Traits NCSU QTL II: Yandell 005 1 1. why study multiple traits together? avoid reductionist approach to biology address physiological/biochemical mechanisms Schmalhausen (194); Falconer (195) separate close linkage from pleiotropy 1 locus or linked loci? identify epistatic interaction or canalization influence of genetic background establish QTL x environment interactions decompose genetic correlation among traits increase power to detect QTL Traits NCSU QTL II: Yandell 005

Type Diabetes Mellitus Traits NCSU QTL II: Yandell 005 3 Insulin Requirement decompensation from Traits Unger & Orci FASEB J. (001) 15,31 NCSU QTL II: Yandell 005 4

glucose insulin (courtesy AD Attie) Traits NCSU QTL II: Yandell 005 5 studying diabetes in an F segregating cross of inbred lines B6.ob x BTBR.ob F1 F selected mice with ob/ob alleles at leptin gene (chr 6) measured and mapped body weight, insulin, glucose at various ages (Stoehr et al. 000 Diabetes) sacrificed at 14 weeks, tissues preserved gene expression data Affymetrix microarrays on parental strains, F1 (Nadler et al. 000 PNAS; Ntambi et al. 00 PNAS) RT-PCR for a few mrna on 108 F mice liver tissues (Lan et al. 003 Diabetes; Lan et al. 003 Genetics) Affymetrix microarrays on 60 F mice liver tissues design (Jin et al. 004 Genetics tent. accept) analysis (work in prep.) Traits NCSU QTL II: Yandell 005 6

why map gene expression as a quantitative trait? cis- or trans-action? does gene control its own expression? or is it influenced by one or more other genomic regions? evidence for both modes (Brem et al. 00 Science) simultaneously measure all mrna in a tissue ~5,000 mrna active per cell on average ~30,000 genes in genome use genetic recombination as natural experiment mechanics of gene expression mapping measure gene expression in intercross (F) population map expression as quantitative trait (QTL) adjust for multiple testing Traits NCSU QTL II: Yandell 005 7 LOD map for PDI: cis-regulation (Lan et al. 003) Traits NCSU QTL II: Yandell 005 8

mapping microarray data single gene expression as trait (single QTL) Dumas et al. (000 J Hypertens) overview, wish lists Jansen, Nap (001 Trends Gen); Cheung, Spielman (00); Doerge (00 Nat Rev Gen); Bochner (003 Nat Rev Gen) microarray scan via 1 QTL interval mapping Brem et al. (00 Science); Schadt et al. (003 Nature); Yvert et al. (003 Nat Gen) found putative cis- and trans- acting genes multivariate and multiple QTL approach Lan et al. (003 Genetics) Traits NCSU QTL II: Yandell 005 9 Traits NCSU QTL II: Yandell 005 10

. design issues for expensive phenotypes (thanks to CF Amy Jin) microarray analysis ~ $1000 per mouse can only afford to assay 60 of 108 in panel wish to not lose much power to detect QTL selective phenotyping genotype all individuals in panel select subset for phenotyping previous studies can provide guide Traits NCSU QTL II: Yandell 005 11 selective phenotyping emphasize additive effects in F F design: 1QQ:Qq:1qq best design for additive only: 1QQ:1Qq drop heterozygotes (Qq) reduce sample size by half with no power loss emphasize general effects in F best design: 1QQ:1Qq:1qq drop half of heterozygotes (5% reduction) multiple loci same idea but care is needed drop 7/16 of sample for two unlinked loci Traits NCSU QTL II: Yandell 005 1

is this relevant to large QTL studies? why not phenotype entire mapping panel? selectively phenotype subset of 50-67% may capture most effects with little loss of power two-stage selective phenotyping? genotype & phenotype subset of 100-300 could selectively phenotype using whole genome QTL map to identify key genomic regions selectively phenotype subset using key regions Traits NCSU QTL II: Yandell 005 13 3. why are traits correlated? environmental correlation non-genetic, controllable by design historical correlation (learned behavior) physiological correlation (same body) genetic correlation pleiotropy one gene, many functions common biochemical pathway, splicing variants close linkage two tightly linked genes genotypes Q are collinear Traits NCSU QTL II: Yandell 005 14

interplay of pleiotropy & correlation pleiotropy only correlation only Korol et al. (001) both Traits NCSU QTL II: Yandell 005 15 3 correlated traits (Jiang Zeng 1995) ellipses centered on genotypic value width for nominal frequency main axis angle environmental correlation 3 QTL, F ρ P = 0.3, ρ G = 0.54, ρ E = 0. 7 genotypes note signs of genetic and environmental correlation jiang1 - -1 0 1-3 - -1 0 1 3-3 - -1 0 1 3 jiang3 ρ P = -0.07, ρ G = -0., ρ E = 0 jiang jiang3 Traits NCSU QTL II: Yandell 005 16 jiang -3 - -1 0 1 3 jiang1 - -1 0 1 ρ P = 0.06, ρ G = 0.68, ρ E = -0. - -1 0 1

pleiotropy or close linkage? traits, qtl/trait pleiotropy @ 54cM linkage @ 114,18cM Jiang Zeng (1995) Traits NCSU QTL II: Yandell 005 17 4. modern high throughput biology measuring the molecular dogma of biology DNA RNA protein metabolites measured one at a time only a few years ago massive array of measurements on whole systems ( omics ) thousands measured per individual (experimental unit) all (or most) components of system measured simultaneously whole genome of DNA: genes, promoters, etc. all expressed RNA in a tissue or cell all proteins all metabolites systems biology: focus on network interconnections chains of behavior in ecological community underlying biochemical pathways genetics as one experimental tool perturb system by creating new experimental cross each individual is a unique mosaic Traits NCSU QTL II: Yandell 005 18

coordinated expression in mouse genome (Schadt et al. 003) expression pleiotropy in yeast genome (Brem et al. 00) Traits NCSU QTL II: Yandell 005 19 finding heritable traits (from Christina Kendziorski) reduce 30,000 traits to 300-3,000 heritable traits probability a trait is heritable pr(h Y,Q) = pr(y Q,H) pr(h Q) / pr(y Q) Bayes rule pr(y Q) = pr(y Q,H) pr(h Q) + pr(y Q, not H) pr(not H Q) phenotype averaged over genotypic mean µ pr(y Q, not H) = f 0 (Y) = f(y G ) pr(g) dg pr(y Q, H) = f 1 (Y Q) = q f 0 (Y q ) if not H if heritable Y q = {Y i Q i =q} = trait values with genotype Q=q Traits NCSU QTL II: Yandell 005 0

hierarchical model for expression phenotypes (EB arrays: Christina Kendziorski) f ( ) Y ~ G QQ QQ ~ f Qq ( G Qq ) mrna phenotype models given genotypic mean G q Y Y f ( ) qq ~ G qq G QQ G Qq G qq G q ~ pr ( ) common prior on G q across all mrna (use empirical Bayes to estimate prior) G QQ G Qq G qq Traits NCSU QTL II: Yandell 005 1 expression meta-traits: pleiotropy reduce 3,000 heritable traits to 3 meta-traits(!) what are expression meta-traits? pleiotropy: a few genes can affect many traits transcription factors, regulators weighted averages: Z = YW principle components, discriminant analysis infer genetic architecture of meta-traits model selection issues are subtle missing data, non-linear search what is the best criterion for model selection? time consuming process heavy computation load for many traits subjective judgement on what is best Traits NCSU QTL II: Yandell 005

PC for two correlated mrna ettf1 7.6 7.8 8.0 8. 8.4 8.6 PC (7%) -0. -0.1 0.0 0.1 0. 8. 8.4 8.6 8.8 9.0 9. 9.4-0.5 0.0 0.5 etif3s6 PC1 (93%) Traits NCSU QTL II: Yandell 005 3 PC across microarray functional groups Affy chips on 60 mice ~40,000 mrna 500+ mrna show DE (via EB arrays with marker regression) 1500+ organized in 85 functional groups -35 mrna / group which are interesting? examine PC1, PC circle size = # unique mrna Traits NCSU QTL II: Yandell 005 4

84 PC meta-traits by functional group focus on interesting groups Traits NCSU QTL II: Yandell 005 5 red lines: peak for PC meta-trait black/blue: peaks for mrna traits arrows: cis-action? Traits NCSU QTL II: Yandell 005 6

(portion of) chr 4 region chr 15 region? Traits NCSU QTL II: Yandell 005 7 interaction plots for DA meta-traits DA for all pairs of markers: separate 9 genotypes based on markers (a) same locus pair found with PC meta-traits (b) Chr region interesting from biochemistry (Jessica Byers) (c) Chr 5 & Chr 9 identified as important for insulin, SCD Traits NCSU QTL II: Yandell 005 8

comparison of PC and DA meta-traits on 1500+ mrna traits genotypes from Chr 4/Chr 15 locus pair (circle=centroid) PC captures spread without genotype DA creates best separation by genotype PC (1%) -10-5 0 5 10 DA1 (37%) -3 - -1 0 1 3 4 H.H A.B B.B A.B A.B H.B H.H H.H A.H A.B H.H B.A H.A H.H H.H H.A A.A H.HB.B H.H H.B H.A H.H H.B H.A H.B A.B H.H H.A B.A A.A A.A H.B -15-10 -5 0 5 10 15 PC1 (5%) H.H A.B B.B A.B H.H H.BA.B H.H PC ignores genotype A.H A.B B.A H.H H.A B.B H.HH.A H.H H.B A.A H.H.A H.H H.H H.B A.B H.A H.AH.H H.BH.H H.H H.A A.B correlation of PC and DA meta-traits A.B A.A H.B H.B A.H A.H A.A B.AH.A H.B H.A A.H A.A A.H A.H B.A A.H A.H A.H H.H A.H A.H A.H A.H H.H A.H A.A H.A A.H B.A A.H A.H DA (18%) -3 - -1 0 1 3 DA (18%) -3 - -1 0 1 3 H.H A.B H.B H.B H.AH.B B.A H.A H.H A.H.A B.A H.B H.A H.B H.H H.A A.H H.H H.B H.A A.A H.H H.H H.A H.H B.A H.H H.H H.H A.B B.B A.B H.H A.A A.A A.A H.H A.B A.B B.B A.B A.H A.H -3 - -1 0 1 3 4 DA1 (37%) H.B H.H A.H H.B H.A H.A H.A H.H H.B H.H A.HH.B H.B H.H H.H H.H B.B H.H A.A A.B DA uses genotype A.A A.A A.B H.H A.B H.B A.H A.H H.A B.A H.A H.A H.HH.A A.A H.H B.A A.H A.H A.H A.HH.H A.H A.B B.B B.A A.H A.B A.H A.H A.H H.H A.B A.H note better spread of circles -15-10 -5 0 5 10 15-10 -5 0 5 10 PC1 (5%) PC (1%) Traits NCSU QTL II: Yandell 005 9 relating meta-traits to mrna traits SCD trait log expression DA meta-trait standard units Traits NCSU QTL II: Yandell 005 30

DA: a cautionary tale (184 mrna with cor > 0.5; mouse 13 drives heritability) Traits NCSU QTL II: Yandell 005 31 building graphical models infer genetic architecture of meta-trait E(Z Q, M) = µ q = β 0 + {q in M} β qk find mrna traits correlated with meta-trait Z YW for modest number of traits Y extend meta-trait genetic architecture M = genetic architecture for Y expect subset of QTL to affect each mrna may be additional QTL for some mrna Traits NCSU QTL II: Yandell 005 3

posterior for graphical models posterior for graph given multivariate trait & architecture pr(g Y, Q, M) = pr(y Q, G) pr(g M) / pr(y Q) pr(g M) = prior on valid graphs given architecture multivariate phenotype averaged over genotypic mean µ pr(y Q, G) = f 1 (Y Q, G) = q f 0 (Yq G) f 0 (Y q G) = f(y q µ, G) pr(µ) dµ graphical model G implies correlation structure on Y genotype mean prior assumed independent across traits pr(µ) = t pr(µ t ) Traits NCSU QTL II: Yandell 005 33 from graphical models to pathways build graphical models QTL RNA1 RNA class of possible models best model = putative biochemical pathway parallel biochemical investigation candidate genes in QTL regions laboratory experiments on pathway components Traits NCSU QTL II: Yandell 005 34

graphical models (with Elias Chaibub) f 1 (Y Q, G=g) = f 1 (Y 1 Q) f 1 (Y Q, Y 1 ) QTL DNA RNA protein unobservable meta-trait QTL D1 R1 P1 observable cis-action? D R P observable trans-action Traits NCSU QTL II: Yandell 005 35 summary expression QTL are complicated need to consider multiple interacting QTL coherent approach for high-throughput traits identify heritable traits dimension reduction to meta-traits mapping genetic architecture extension via graphical models to networks many open questions model selection computation efficiency inference on graphical models Traits NCSU QTL II: Yandell 005 36