Seattle Summer Institute : Systems Genetics for Experimental Crosses


Brian S. Yandell, UW-Madison
Elias Chaibub Neto, Sage Bionetworks

"Real knowledge is to know the extent of one's ignorance." Confucius (on a bench in Seattle)

SysGen: Overview Seattle SISG: Yandell

Daily Schedule
Monday
  8:30-10  Introductions; Overview of Systems Genetics
  :30-12   QTL Model Selection
  :30-3    Gene Mapping for Multiple Correlated Traits
  :30-5    Hands On Lab: R/qtl
Tuesday
  8:30-10  Permutation Tests for Correlated Traits
  :30-12   Scanning the Genome for Causal Architecture
  :30-3    Causal Phenotype Models Driven by QTL
  :30-5    Hands On Lab: R/qtlhot, R/qtlnet
Wednesday
  8:30-10  Incorporating Biological Knowledge
  :30-12   Platforms for eQTL Analysis

Overview of Systems Genetics
- Big idea: how do genes affect organisms?
- Measuring system(s) state(s) of an organism
- QTL mapping as tool toward goal
- Making sense of multiple traits
- Connecting traits to biochemical pathways
- Putting it all together: workflows

How do genes affect organisms?
- Dogma (with exceptions)
  - DNA -> RNA -> protein -> phenotype
  - redundancy/overlap of biochemical pathways
- System state of organism
  - accumulated effects over time of many genes
  - environmental influences

[Figure: Biochemical Pathways chart, Gerhard Michal, Boehringer Mannheim]

systems genetics approach
- study genetic architecture of quantitative traits
  - in model systems, and ultimately humans
- interrogate single resource population for variation
  - DNA sequence, transcript abundance, proteins, metabolites
  - multiple organismal phenotypes
  - multiple environments
- detailed map of genetic variants associated with
  - each organismal phenotype in each environment
- functional context to interpret phenotypes
  - genetic underpinnings of multiple phenotypes
  - genetic basis of genotype by environment interaction
Sieberts, Schadt (2007 Mamm Genome); Emilsson et al. (2008 Nature); Chen et al. (Nature); Ayroles et al. MacKay (2009 Nature Genetics)

Measuring an organism
- Phenotype measurement is challenging!
- Cannot measure exactly what is important
- Instead measure multiple related traits
  - Multiple traits at one time
  - Same trait measured over time

QTL as tool toward goal
- Identifying important genomic region(s)
  - But they may contain many genes
  - Journey from QTL to gene
  - References
- Corroborative evidence from multiple traits
  - Reassurance
  - Increased power?
  - Evidence at a system level (pathways, etc.)?

[Figure (after Gary Churchill): cross two inbred lines -> linkage disequilibrium -> associations -> linked segregating QTL; diagram nodes: QTL, Marker, Trait]

Making sense of multiple traits
- Aligning QTL mapping results
  - Mapping correlated traits
  - Inferring hot spots where many traits map
- Organizing traits into correlated sets
  - Function, clustering, QTL alignment
- Inferring (causal) networks

eQTL Tools Seattle SISG: Yandell

Genetic architecture of gene expression in 6 tissues.
A: Tissue-specific panels illustrate the relationship between the genomic location of a gene (y axis) and where that gene's mRNA shows an eQTL (LOD > 5), as a function of genome position (x axis). Circles represent eQTLs that showed either cis linkage (black) or trans linkage (colored) according to LOD score. Genomic hot spots, where many eQTLs map in trans, are apparent as vertical bands that show either tissue selectivity (e.g., Chr 6 in the islet) or are present in all tissues (e.g., Chr 17).
B: The total number of eQTLs identified in 5 cM genomic windows is plotted for each tissue; total eQTLs for all positions is shown in the upper right corner of each panel. The peak number of eQTLs exceeding 1000 per 5 cM is shown for islets (Chrs 2, 6 and 17), liver (Chrs 2 and 17) and kidney (Chr 17).

Figure 4. Tissue-specific hotspots with eQTL and SNP architecture for Chrs 1, 2 and 17. The number of eQTLs for each tissue (left axis) and the number of SNPs between B6 and BTBR (right axis) that were identified within a 5 cM genomic window is shown for Chr 1 (A), Chr 2 (B) and Chr 17 (C). The locations of tissue-specific hotspots are identified by their number corresponding to that in Table 1. eQTL and SNP architecture is shown for all chromosomes in supplementary material.

BxH ApoE-/- chr 2: causal architecture
[Figure: hotspot, 12 causal calls]

BxH ApoE-/- causal network for transcription factor Pscdbp
[Figure: causal network; causal trait highlighted; work of Elias Chaibub Neto]

Connecting to biochemical pathways
- Gene ontology (GO)
  - Functional groups
  - Gene enrichment tests
- KO, PPI, TF, interactome databases
  - Networks built from databases
  - Hybrid networks using QTL and databases
- Proof of concept experiments
  - Do findings apply to your organisms?

[Figure: KEGG pathway: pparg in mouse]

[Figure: phenotypic buffering of molecular QTL; Fu et al. Jansen (2009 Nature Genetics)]

Putting it all together: workflows
- Ideally have all tools & data connected
  - Reduce duplication of copies, effort
  - Reduce errors, save time
- Make tools more broadly available
  - User-friendly interfaces
  - Documentation & examples
- Enable comparison of methods
  - Reduce start-up time & translation errors

[Figure: Swertz & Jansen (2007)]

what is the goal of QTL study?
- uncover underlying biochemistry
  - identify how networks function, break down
  - find useful candidates for (medical) intervention
  - epistasis may play key role
  - statistical goal: maximize number of correctly identified QTL
- basic science/evolution
  - how is the genome organized?
  - identify units of natural selection
  - additive effects may be most important (Wright/Fisher debate)
  - statistical goal: maximize number of correctly identified QTL
- select elite individuals
  - predict phenotype (breeding value) using suite of characteristics (phenotypes) translated into a few QTL
  - statistical goal: minimize prediction error

problems of single QTL approach
- wrong model: biased view
  - fool yourself: bad guess at locations, effects
  - detect ghost QTL between linked loci
  - miss epistasis completely
- low power
- bad science
  - use best tools for the job
  - maximize scarce research resources
  - leverage already big investment in experiment

advantages of multiple QTL approach
- improve statistical power, precision
  - increase number of QTL detected
  - better estimates of loci: less bias, smaller intervals
- improve inference of complex genetic architecture
  - patterns and individual elements of epistasis
  - appropriate estimates of means, variances, covariances
    - asymptotically unbiased, efficient
  - assess relative contributions of different QTL
- improve estimates of genotypic values
  - less bias (more accurate) and smaller variance (more precise)
  - mean squared error: $\mathrm{MSE} = (\mathrm{bias})^2 + \mathrm{variance}$

Pareto diagram of QTL effects
[Figure: additive effect plotted against rank order of QTL; major QTL on linkage map, then minor QTL (modifiers), then polygenes]

Gene Action and Epistasis
[Figure (Gary Churchill): additive, dominant, recessive, general effects of a single QTL]

additive effects of two QTL (Gary Churchill)
$\mu_q = \mu + \beta_{q1} + \beta_{q2}$

Epistasis (Gary Churchill)
"The allelic state at one locus can mask or uncover the effects of allelic variation at another." - W. Bateson

epistasis in parallel pathways (GAC)
- Z keeps trait value low
- neither E1 nor E2 is rate limiting
- loss of function alleles are segregating from parent A at E1 and from parent B at E2
[Diagram: X -> Z via enzyme E1; Y -> Z via enzyme E2]

epistasis in a serial pathway (GAC)
- Z keeps trait value high
- either E1 or E2 is rate limiting
- loss of function alleles are segregating from parent B at E1 or from parent A at E2
[Diagram: X -> Y via enzyme E1; Y -> Z via enzyme E2]

Bayesian vs. classical QTL study
- classical study
  - maximize over unknown effects
  - test for detection of QTL at loci
  - model selection in stepwise fashion
- Bayesian study
  - average over unknown effects
  - estimate chance of detecting QTL
  - sample all possible models
- both approaches
  - average over missing QTL genotypes
  - scan over possible loci

QTL 2: Overview Seattle SISG: Yandell

Bayesian idea
- Reverend Thomas Bayes
  - part-time mathematician
  - buried in Bunhill Cemetery, Moorgate, London
  - famous paper in 1763 Phil Trans Roy Soc London
  - was Bayes the first with this idea? (Laplace?)
- basic idea (from Bayes' original example)
  - two billiard balls tossed at random (uniform) on table
  - where is first ball if the second is to its left?
    - prior: anywhere on the table
    - posterior: more likely toward right end of table

QTL model selection: key players
- observed measurements
  - y = phenotypic trait
  - m = markers & linkage map
  - i = individual index (1,...,n)
- missing data
  - missing marker data
  - q = QT genotypes
    - alleles QQ, Qq, or qq at locus
- unknown quantities
  - λ = QT locus (or loci)
  - μ = phenotype model parameters
  - γ = QTL model/genetic architecture
- pr(q | m,λ,γ) genotype model
  - grounded by linkage map, experimental cross
  - recombination yields multinomial for q given m
- pr(y | q,μ,γ) phenotype model
  - distribution shape (assumed normal here)
  - unknown parameters μ (could be non-parametric)
[Diagram, after Sen Churchill (2001): observed m, y; missing q; unknown λ, μ, γ]
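The billiard example on the "Bayesian idea" slide can be checked by simulation. The following is a minimal Python sketch, assuming a table of length 1 with 0 as the left end; the function name and the number of draws are illustrative choices, not part of the course material. Conditioning on "second ball to the left of the first" gives the first ball a posterior density of 2x on [0, 1], so its mean should sit near 2/3.

```python
import random

def first_ball_given_second_left(n_draws=100_000, seed=1):
    """Keep the first ball's position whenever the second lands to its left."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_draws):
        # Prior: both balls land uniformly on a table of length 1.
        first, second = rng.random(), rng.random()
        if second < first:
            kept.append(first)
    return kept

positions = first_ball_given_second_left()
posterior_mean = sum(positions) / len(positions)
share_right_half = sum(p > 0.5 for p in positions) / len(positions)
print(f"posterior mean {posterior_mean:.3f}, share in right half {share_right_half:.3f}")
```

The share of kept draws in the right half of the table estimates the posterior probability that the first ball lies there, which the 2x density puts at 3/4.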

Bayes posterior vs. maximum likelihood
- LOD: classical Log ODds
  - maximize likelihood over effects μ
  - R/qtl scanone/scantwo: method = "em"
- LPD: Bayesian Log Posterior Density
  - average posterior over effects μ
  - R/qtl scanone/scantwo: method = "imp"

$\mathrm{LOD}(\lambda) = \log_{10}\{\max_{\mu} \mathrm{pr}(y \mid m, \mu, \lambda)\} + c$
$\mathrm{LPD}(\lambda) = \log_{10}\{\mathrm{pr}(\lambda \mid m) \int \mathrm{pr}(y \mid m, \mu, \lambda)\,\mathrm{pr}(\mu)\,d\mu\} + C$
likelihood mixes over missing QTL genotypes:
$\mathrm{pr}(y \mid m, \mu, \lambda) = \sum_q \mathrm{pr}(y \mid q, \mu)\,\mathrm{pr}(q \mid m, \lambda)$

LOD & LPD: 1 QTL
[Figure: n.ind = 100, 1 cM marker spacing]

LOD & LPD: 1 QTL
[Figure: n.ind = 100, 10 cM marker spacing]

marginal LOD or LPD
- compare two genetic architectures (γ2, γ1) at each locus
  - with (γ2) or without (γ1) another QTL at locus λ
    - preserve model hierarchy (e.g. drop any epistasis with QTL at λ)
  - with (γ2) or without (γ1) epistasis with QTL at locus λ
  - γ2 contains γ1 as a sub-architecture
- allow for multiple QTL besides locus being scanned
  - architectures γ1 and γ2 may have QTL at several other loci
  - use marginal LOD, LPD or other diagnostic
  - posterior, Bayes factor, heritability

$\mathrm{LOD}(\lambda \mid \gamma_2) - \mathrm{LOD}(\lambda \mid \gamma_1)$
$\mathrm{LPD}(\lambda \mid \gamma_2) - \mathrm{LPD}(\lambda \mid \gamma_1)$

LPD: 1 QTL vs. multi-QTL
marginal contribution to LPD from QTL at λ
[Figure: panels for 1st QTL and 2nd QTL]

substitution effect: 1 QTL vs. multi-QTL
single QTL effect vs. marginal effect from QTL at λ
[Figure: panels for 1st QTL and 2nd QTL]

why use a Bayesian approach?
- first, do both classical and Bayesian
  - always nice to have a separate validation
  - each approach has its strengths and weaknesses
- classical approach works quite well
  - selects large effect QTL easily
  - directly builds on regression ideas for model selection
- Bayesian approach is comprehensive
  - samples most probable genetic architectures
  - formalizes model selection within one framework
  - readily (!) extends to more complicated problems

comparing models
- balance model fit against model complexity
  - want to fit data well (maximum likelihood)
  - without getting too complicated a model

                     smaller model        bigger model
fit model            miss key features    fits better
estimate phenotype   may be biased        no bias
predict new data     may be biased        no bias
interpret model      easier               more complicated
estimate effects     low variance         high variance

QTL software options
- methods
  - approximate QTL by markers
  - exact multiple QTL interval mapping
- software platforms
  - MapMaker/QTL (obsolete)
  - QTLCart (statgen.ncsu.edu/qtlcart)
  - R/qtl
  - R/qtlbim
  - Yandell, Bradbury (2007) book chapter

QTL software platforms
- QTLCart (statgen.ncsu.edu/qtlcart)
  - includes features of original MapMaker/QTL
    - not designed for building a linkage map
  - easy to use Windows version WinQTLCart
  - based on Lander-Botstein maximum likelihood LOD
    - extended to marker cofactors (CIM) and multiple QTL (MIM)
    - epistasis, some covariates (GxE)
    - stepwise model selection using information criteria
  - some multiple trait options
  - OK graphics
- R/qtl
  - includes functionality of classical interval mapping
  - many useful tools to check genotype data, build linkage maps
  - excellent graphics
  - several methods for 1-QTL and 2-QTL mapping
    - epistasis, covariates (GxE)
  - tools available for multiple QTL model selection

QTL Model Selection
1. Bayesian strategy
2. Markov chain sampling
3. sampling genetic architectures
4. criteria for model selection

Model Selection Seattle SISG: Yandell

QTL model selection: key players
- observed measurements
  - y = phenotypic trait
  - m = markers & linkage map
  - i = individual index (1,...,n)
- missing data
  - missing marker data
  - q = QT genotypes
    - alleles QQ, Qq, or qq at locus
- unknown quantities
  - λ = QT locus (or loci)
  - μ = phenotype model parameters
  - γ = QTL model/genetic architecture
- pr(q | m,λ,γ) genotype model
  - grounded by linkage map, experimental cross
  - recombination yields multinomial for q given m
- pr(y | q,μ,γ) phenotype model
  - distribution shape (assumed normal here)
  - unknown parameters μ (could be non-parametric)
[Diagram, after Sen Churchill (2001): observed m, y; missing q; unknown λ, μ, γ]

QTL mapping (from ZB Zeng)
[Diagram: markers M and locus λ -> genotypes Q via genotype model pr(q | m,λ,γ); genotypes Q -> phenotype via phenotype model pr(y | q,μ,γ)]

classical likelihood approach
- genotype model pr(q | m,λ,γ)
  - missing genotypes q depend on observed markers m across genome
- phenotype model pr(y | q,μ,γ)
  - link phenotypes y to genotypes q

$\mathrm{LOD}(\lambda) = \log_{10}\{\max_{\mu} \mathrm{pr}(y \mid m, \mu, \lambda)\} + c$
likelihood mixes over missing QTL genotypes:
$\mathrm{pr}(y \mid m, \mu, \lambda) = \sum_q \mathrm{pr}(y \mid q, \mu)\,\mathrm{pr}(q \mid m, \lambda)$
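The mixture likelihood on the "classical likelihood approach" slide can be written out for a toy backcross with two genotype classes. This is a hedged Python sketch: the phenotypes, genotype probabilities, means and standard deviation are made-up values, and the difference computed at the end is a log10 likelihood ratio at fixed parameters, not a true LOD, which would maximize over μ (for instance by EM).

```python
import math

def normal_pdf(y, mean, sd):
    """Normal density, the phenotype model pr(y | q, mu) assumed on the slides."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def log10_mixture_likelihood(y, geno_probs, means, sd):
    """log10 of prod_i sum_q pr(y_i | q, mu) * pr(q | m_i, lambda)."""
    total = 0.0
    for y_i, probs in zip(y, geno_probs):
        mix = sum(p * normal_pdf(y_i, mu, sd) for p, mu in zip(probs, means))
        total += math.log10(mix)
    return total

# Hypothetical data: four individuals, genotype probabilities at one locus.
y = [9.8, 10.1, 12.2, 11.9]
geno_probs = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]

with_qtl = log10_mixture_likelihood(y, geno_probs, means=(10.0, 12.0), sd=0.5)
no_qtl = log10_mixture_likelihood(y, geno_probs, means=(11.0, 11.0), sd=0.5)
lod_like = with_qtl - no_qtl
print(f"log10 likelihood ratio at these parameter values: {lod_like:.2f}")
```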

EM approach
- Iterate E and M steps
  - expectation (E): geno probs pr(q | m,λ,γ)
  - maximization (M): pheno model parameters
    - mean, effects, variance
  - careful attention when many QTL present
- Multiple papers by Zhao-Bang Zeng and others
  - Start with simple initial model
  - Add QTL, epistatic effects sequentially

classic model search
- initial model from single QTL analysis
- search for additional QTL
- search for epistasis between pairs of QTL
  - Both in model? One in model? Neither?
- Refine model
  - Update QTL positions
  - Check if existing QTL can be dropped
- Analogous to stepwise regression
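The E and M steps on the "EM approach" slide can be sketched for one locus in a backcross. This is a minimal Python illustration under stated assumptions: two genotype classes, a common variance, made-up phenotypes and marker-based genotype probabilities, and a fixed number of iterations instead of a convergence test. The E step weights each genotype by its probability given markers and phenotype; the M step updates the means and standard deviation.

```python
import math

def em_single_qtl(y, prior_q1, n_iter=50):
    """EM at a fixed locus; prior_q1[i] = pr(q_i = 1 | m_i, lambda).
    Returns (mu0, mu1, sd)."""
    n = len(y)
    mean_y = sum(y) / n
    sd = math.sqrt(sum((v - mean_y) ** 2 for v in y) / n)
    mu0, mu1 = min(y), max(y)  # crude starting values (an assumption)
    for _ in range(n_iter):
        # E step: posterior weight of genotype 1 for each individual.
        w = []
        for y_i, p in zip(y, prior_q1):
            a = (1.0 - p) * math.exp(-0.5 * ((y_i - mu0) / sd) ** 2)
            b = p * math.exp(-0.5 * ((y_i - mu1) / sd) ** 2)
            w.append(b / (a + b))
        # M step: weighted means and pooled standard deviation.
        s1 = sum(w)
        s0 = n - s1
        mu1 = sum(w_i * y_i for w_i, y_i in zip(w, y)) / s1
        mu0 = sum((1.0 - w_i) * y_i for w_i, y_i in zip(w, y)) / s0
        ss = sum(w_i * (y_i - mu1) ** 2 + (1.0 - w_i) * (y_i - mu0) ** 2
                 for w_i, y_i in zip(w, y))
        sd = math.sqrt(ss / n)
    return mu0, mu1, sd

# Hypothetical phenotypes and genotype probabilities from flanking markers.
y = [9.8, 10.1, 10.3, 12.2, 11.9, 12.1]
prior_q1 = [0.1, 0.1, 0.2, 0.9, 0.9, 0.8]
mu0, mu1, sd = em_single_qtl(y, prior_q1)
print(f"mu0 = {mu0:.2f}, mu1 = {mu1:.2f}, sd = {sd:.2f}")
```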

comparing models (details later)
- balance model fit against model complexity
  - want to fit data well (maximum likelihood)
  - without getting too complicated a model

                     smaller model        bigger model
fit model            miss key features    fits better
estimate phenotype   may be biased        no bias
predict new data     may be biased        no bias
interpret model      easier               more complicated
estimate effects     low variance         high variance

Bayesian strategy for QTL study
- augment data (y,m) with missing genotypes q
- study unknowns (μ,λ,γ) given augmented data (y,m,q)
  - find better genetic architectures γ
  - find most likely genomic regions = QTL = λ
  - estimate phenotype parameters = genotype means = μ
- sample from posterior in some clever way
  - multiple imputation (Sen Churchill 2002)
  - Markov chain Monte Carlo (MCMC) (Satagopan et al. 1996; Yi et al. 2005, 2007)

$\text{posterior} = \dfrac{\text{likelihood} \times \text{prior}}{\text{constant}}$
$\text{posterior for } q, \mu, \lambda, \gamma = \dfrac{\text{phenotype likelihood} \times [\text{prior for } q, \mu, \lambda, \gamma]}{\text{constant}}$
$\mathrm{pr}(q, \mu, \lambda, \gamma \mid y, m) = \dfrac{\mathrm{pr}(y \mid q, \mu, \gamma)\,[\mathrm{pr}(q \mid m, \lambda, \gamma)\,\mathrm{pr}(\mu \mid \gamma)\,\mathrm{pr}(\lambda \mid m, \gamma)\,\mathrm{pr}(\gamma)]}{\mathrm{pr}(y \mid m)}$

Bayes posterior for normal data
[Figure: two panels, small prior variance and large prior variance; x axis y = phenotype values; curves for the prior and for posteriors with n small and n large, with prior mean and actual mean marked]

Posterior on genotypic means?
phenotype model pr(y | q,μ)
[Figure: x axis y = phenotype values for genotypes qq, Qq, QQ; data means, prior mean and posterior means shown for n small and n large]

Bayes posterior QTL means
posterior centered on sample genotypic mean, but shrunken slightly toward overall mean
- phenotype mean: $E(y \mid q) = \mu_q$, $V(y \mid q) = \sigma^2$
- genotypic prior: $E(\mu_q) = \bar{y}$, $V(\mu_q) = \kappa\sigma^2$
- posterior: $E(\mu_q \mid y) = b_q \bar{y}_q + (1 - b_q)\,\bar{y}$, $V(\mu_q \mid y) = b_q \sigma^2 / n_q$
- shrinkage: $n_q = \mathrm{count}\{q_i = q\}$, $\bar{y}_q = \sum_{\{q_i = q\}} y_i / n_q$, $b_q = \dfrac{\kappa n_q}{\kappa n_q + 1} \to 1$

QTL 2: Bayes Seattle SISG: Yandell

pr(q | m,λ) recombination model
pr(q | m,λ) = pr(geno | map, locus) ≈ pr(geno | flanking markers, locus)
[Diagram: markers m1, ..., m6 along the chromosome with unknown genotype q? at locus λ; x axis is distance along chromosome]
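The shrinkage formulas on the "Bayes posterior QTL means" slide translate directly into a few lines of code. The sketch below is in Python and uses made-up phenotypes and genotype labels; `kappa` and `sigma2` are treated as known here, which is a simplifying assumption (the slides note that details for the unexplained variance are omitted).

```python
from statistics import mean

def posterior_genotype_means(y, q, kappa, sigma2):
    """Posterior mean and variance of each genotypic mean mu_q,
    using b_q = kappa * n_q / (kappa * n_q + 1)."""
    grand_mean = mean(y)
    result = {}
    for genotype in sorted(set(q)):
        ys = [y_i for y_i, q_i in zip(y, q) if q_i == genotype]
        n_q = len(ys)
        b_q = kappa * n_q / (kappa * n_q + 1.0)
        post_mean = b_q * mean(ys) + (1.0 - b_q) * grand_mean
        post_var = b_q * sigma2 / n_q
        result[genotype] = (post_mean, post_var)
    return result

# Hypothetical F2-style data with genotypes coded qq, Qq, QQ.
y = [9.9, 10.2, 11.0, 11.1, 10.9, 12.1, 12.0]
q = ["qq", "qq", "Qq", "Qq", "Qq", "QQ", "QQ"]
for genotype, (m, v) in posterior_genotype_means(y, q, kappa=1.0, sigma2=0.25).items():
    print(f"{genotype}: posterior mean {m:.3f}, variance {v:.4f}")
```

With a small `kappa` the genotypic means are pulled strongly toward the grand mean; as `kappa * n_q` grows, `b_q` approaches 1 and the posterior mean approaches the sample genotypic mean.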

what are likely QTL genotypes q? how does phenotype y improve guess?
- what are probabilities for genotype q between markers?
  - recombinants AA:AB all 1:1 if ignore y
  - and if we use y?
[Figure: phenotype (bp) by genotype at flanking markers D4Mit41 and D4Mit214; genotype classes AA/AA, AA/AB, AB/AA, AB/AB]
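The question "what are probabilities for genotype q between markers?" has a closed-form answer for a backcross when only the two flanking markers are used. The Python sketch below assumes no crossover interference, the Haldane map function, and equal prior probability for AA and AB; the function names and the 5 cM distances are illustrative.

```python
import math

def haldane_r(d_cm):
    """Recombination fraction for a map distance in cM (Haldane, no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def prob_q_is_AA(left, right, d_left_cm, d_right_cm):
    """pr(q = AA | flanking marker genotypes) for a backcross.
    left/right are 'AA' or 'AB'; distances are from each marker to the locus."""
    r1, r2 = haldane_r(d_left_cm), haldane_r(d_right_cm)
    # Probability of the observed markers if q = AA (prior 1/2 cancels in the ratio).
    if_AA = ((1.0 - r1) if left == "AA" else r1) * ((1.0 - r2) if right == "AA" else r2)
    # Probability of the observed markers if q = AB.
    if_AB = (r1 if left == "AA" else (1.0 - r1)) * (r2 if right == "AA" else (1.0 - r2))
    return if_AA / (if_AA + if_AB)

for left, right in [("AA", "AA"), ("AA", "AB"), ("AB", "AA"), ("AB", "AB")]:
    p = prob_q_is_AA(left, right, 5.0, 5.0)
    print(f"markers {left}/{right}: pr(q = AA) = {p:.4f}")
```

For a locus midway between recombinant markers the two genotypes come out 1:1, matching the slide's remark about recombinants when y is ignored.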

posterior on QTL genotypes q
- full conditional of q given data, parameters
  - proportional to prior pr(q | m,λ)
    - weight toward q that agrees with flanking markers
  - proportional to likelihood pr(y | q,μ)
    - weight toward q with similar phenotype values
  - posterior recombination model balances these two
- this is the E-step of EM computations

$\mathrm{pr}(q \mid y, m, \mu, \lambda) = \dfrac{\mathrm{pr}(y \mid q, \mu)\,\mathrm{pr}(q \mid m, \lambda)}{\mathrm{pr}(y \mid m, \mu, \lambda)}$

Where are the loci λ on the genome?
- prior over genome for QTL positions
  - flat prior = no prior idea of loci
  - or use prior studies to give more weight to some regions
- posterior depends on QTL genotypes q
  - pr(λ | m,q) = pr(λ) pr(q | m,λ) / constant
  - constant determined by averaging
    - over all possible genotypes q
    - over all possible loci λ on entire map
- no easy way to write down posterior
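The full conditional for a single individual's genotype combines the marker-based prior with the phenotype likelihood and then normalizes. A minimal Python sketch, assuming a normal phenotype model with known genotype means and standard deviation; all numbers are made up for illustration.

```python
import math

def normal_pdf(y, mean, sd):
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def genotype_posterior(y_i, prior, means, sd):
    """pr(q | y_i, m_i, mu, lambda) for each genotype q.
    prior[q] = pr(q | m_i, lambda); means[q] = genotypic mean mu_q."""
    weights = [p * normal_pdf(y_i, mu, sd) for p, mu in zip(prior, means)]
    total = sum(weights)  # this is pr(y_i | m_i, mu, lambda)
    return [w / total for w in weights]

# Markers alone say 1:1; a phenotype of 11.5 shifts weight to the second genotype.
posterior = genotype_posterior(11.5, prior=(0.5, 0.5), means=(10.0, 12.0), sd=1.0)
print([round(p, 3) for p in posterior])
```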

what is the genetic architecture γ?
- which positions correspond to QTLs?
  - priors on loci (previous slide)
- which QTL have main effects?
  - priors for presence/absence of main effects
    - same prior for all QTL
    - can put prior on each d.f. (1 for BC, 2 for F2)
- which pairs of QTL have epistatic interactions?
  - prior for presence/absence of epistatic pairs
    - depends on whether 0, 1, 2 QTL have main effects
    - epistatic effects less probable than main effects

γ = genetic architecture
- loci: main QTL, epistatic pairs
- effects: add, dom; aa, ad, dd

Bayesian priors & posteriors
- augmenting with missing genotypes q
  - prior is recombination model
  - posterior is (formally) E step of EM algorithm
- sampling phenotype model parameters μ
  - prior is flat normal at grand mean (no information)
  - posterior shrinks genotypic means toward grand mean
  - (details for unexplained variance omitted here)
- sampling QTL loci λ
  - prior is flat across genome (all loci equally likely)
- sampling QTL genetic architecture model γ
  - number of QTL
    - prior is Poisson with mean from previous IM study
  - genetic architecture of main effects and epistatic interactions
    - priors on epistasis depend on presence/absence of main effects

Markov chain sampling
- construct Markov chain around posterior
  - want posterior as stable distribution of Markov chain
  - in practice, the chain tends toward stable distribution
    - initial values may have low posterior probability
    - burn-in period to get chain mixing well
- sample QTL model components from full conditionals
  - sample locus λ given q,γ (using Metropolis-Hastings step)
  - sample genotypes q given λ,μ,y,γ (using Gibbs sampler)
  - sample effects μ given q,y,γ (using Gibbs sampler)
  - sample QTL model γ given λ,μ,y,q (using Gibbs or M-H)

$(\lambda, q, \mu, \gamma) \sim \mathrm{pr}(\lambda, q, \mu, \gamma \mid y, m)$
$(\lambda, q, \mu, \gamma)_1 \to (\lambda, q, \mu, \gamma)_2 \to \cdots \to (\lambda, q, \mu, \gamma)_N$

MCMC sampling of unknowns (q,μ,λ) for given genetic architecture γ
- Gibbs sampler
  - genotypes q: $q \sim \mathrm{pr}(q \mid y_i, m_i, \mu, \lambda)$
  - effects μ: $\mu \sim \dfrac{\mathrm{pr}(y \mid q, \mu)\,\mathrm{pr}(\mu)}{\mathrm{pr}(y \mid q)}$
  - not loci λ: $\lambda \sim \dfrac{\mathrm{pr}(q \mid m, \lambda)\,\mathrm{pr}(\lambda \mid m)}{\mathrm{pr}(q \mid m)}$
- Metropolis-Hastings sampler
  - extension of Gibbs sampler
  - does not require normalization $\mathrm{pr}(q \mid m) = \sum_{\lambda} \mathrm{pr}(q \mid m, \lambda)\,\mathrm{pr}(\lambda)$

Gibbs sampler for two genotypic means
- want to study two correlated effects
  - could sample directly from their bivariate distribution
  - assume correlation ρ is known
- instead use Gibbs sampler:
  - sample each effect from its full conditional given the other
  - pick order of sampling at random
  - repeat many times

$\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right)$
$\mu_1 \sim N(\rho\mu_2,\ 1 - \rho^2)$
$\mu_2 \sim N(\rho\mu_1,\ 1 - \rho^2)$
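The two-mean Gibbs sampler described above is short enough to write out in full. This is a Python sketch under the slide's assumptions (standard bivariate normal, known correlation ρ); the starting point, seed and sample size are arbitrary choices. Each sweep draws one effect from its full conditional given the other, in random order.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """Gibbs samples of (mu1, mu2) from a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho * rho)  # sd of each full conditional
    mu1, mu2 = 0.0, 0.0
    samples = []
    for _ in range(n_samples):
        if rng.random() < 0.5:  # pick order of sampling at random
            mu1 = rng.gauss(rho * mu2, cond_sd)
            mu2 = rng.gauss(rho * mu1, cond_sd)
        else:
            mu2 = rng.gauss(rho * mu1, cond_sd)
            mu1 = rng.gauss(rho * mu2, cond_sd)
        samples.append((mu1, mu2))
    return samples

def correlation(pairs):
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    s12 = sum((a - m1) * (b - m2) for a, b in pairs)
    s11 = sum((a - m1) ** 2 for a, _ in pairs)
    s22 = sum((b - m2) ** 2 for _, b in pairs)
    return s12 / math.sqrt(s11 * s22)

samples = gibbs_bivariate_normal(rho=0.6, n_samples=20_000)
sample_corr = correlation(samples)
print(f"sample correlation {sample_corr:.3f} (target 0.6)")
```

Successive draws are correlated, so short runs such as N = 50 can look quite different from the target distribution; longer runs fill it in.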

Gibbs sampler samples: ρ = 0.6
[Figure: two columns, N = 50 samples and N = 200 samples; panels plot Gibbs mean 1 and Gibbs mean 2 against Markov chain index, and Gibbs mean 2 against Gibbs mean 1]

full conditional for locus
- cannot easily sample from locus full conditional
  - pr(λ | y,m,μ,q) = pr(λ | m,q) = pr(q | m,λ) pr(λ) / constant
- constant is very difficult to compute explicitly
  - must average over all possible loci λ over genome
  - must do this for every possible genotype q
- Gibbs sampler will not work in general
  - but can use method based on ratios of probabilities
  - Metropolis-Hastings is extension of Gibbs sampler

Metropolis-Hastings idea
- want to study distribution f(λ)
  - take Monte Carlo samples
  - unless too complicated
  - take samples using ratios of f
- Metropolis-Hastings samples:
  - propose new value λ*
    - near (?) current value λ
    - from some distribution g
  - accept new value with prob a
    - Gibbs sampler: a = 1 always

$a = \min\left(1,\ \dfrac{f(\lambda^*)\,g(\lambda^* - \lambda)}{f(\lambda)\,g(\lambda - \lambda^*)}\right)$
[Figure: target density f(λ) and proposal density g(λ - λ*)]

Metropolis-Hastings for locus λ
[Figure: mcmc sequence plotted against λ, and histogram estimate of pr(θ | Y)]
added twist: occasionally propose from entire genome
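The propose-and-accept loop on the "Metropolis-Hastings idea" slide can be sketched as follows. This Python example makes two simplifying assumptions that are not from the slides: the proposal g is a symmetric uniform window around the current value, so the g terms cancel in the acceptance ratio, and the target f is an unnormalized normal density centred at 5, standing in for a locus posterior. Only ratios of f are ever needed.

```python
import math
import random

def metropolis_hastings(log_f, start, width, n_samples, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric uniform proposal."""
    rng = random.Random(seed)
    current = start
    chain = []
    for _ in range(n_samples):
        proposal = current + rng.uniform(-width, width)  # propose near current value
        log_ratio = log_f(proposal) - log_f(current)     # symmetric g cancels
        accept_prob = math.exp(min(0.0, log_ratio))      # a = min(1, ratio)
        if rng.random() < accept_prob:
            current = proposal
        chain.append(current)
    return chain

def log_target(x):
    """Unnormalized log density of the hypothetical target, N(5, 1)."""
    return -0.5 * (x - 5.0) ** 2

chain = metropolis_hastings(log_target, start=0.0, width=2.0, n_samples=20_000)
draws = chain[2_000:]  # discard a burn-in period
post_mean = sum(draws) / len(draws)
print(f"posterior mean estimate {post_mean:.2f}")
```

A narrow window accepts most proposals but moves slowly; a wide window moves far but is rejected often.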

Metropolis-Hastings samples
[Figure: N = 200 samples and N = 1000 samples, each with narrow g and wide g; top row shows the mcmc sequence against θ (λ), bottom row shows histograms of pr(θ | Y)]

sampling genetic architectures
- search across genetic architectures γ of various sizes
  - allow change in number of QTL
  - allow change in types of epistatic interactions
- methods for search
  - reversible jump MCMC
  - Gibbs sampler with loci indicators
- complexity of epistasis
  - Fisher-Cockerham effects model
  - general multi-QTL interaction & limits of inference

reversible jump MCMC
- consider known genotypes q at 2 known loci λ
  - models with 1 or 2 QTL
- M-H step between 1-QTL and 2-QTL models
  - model changes dimension (via careful bookkeeping)
  - consider mixture over QTL models H

$\gamma = 1\ \text{QTL}: \quad Y = \beta_0 + \beta(q_1) + e$
$\gamma = 2\ \text{QTL}: \quad Y = \beta_0 + \beta_1(q_1) + \beta_2(q_2) + e$

geometry of reversible jump
[Figure: two panels, "Move Between Models" and "Reversible Jump Sequence"; axes β1 and β2, showing moves between m = 1 and m = 2; c21 = 0.7]

geometry allowing q and λ to change
[Figure: two panels, "a short sequence" and "first 1000 with m<3"; axes β1 and β2]

collinear QTL = correlated effects
[Figure: additive effect 2 against additive effect 1 for the 4-week and 8-week traits; one panel is labelled cor = -0.7]
- linked QTL = collinear genotypes
  - correlated estimates of effects (negative if in coupling phase)
  - sum of linked effects usually fairly constant

sampling across QTL models γ
[Diagram: chromosome from 0 to L with loci λ1, λ2, ..., λm and a proposed new locus λm+1]
action steps: draw one of three choices
- update QTL model γ with probability 1-b(γ)-d(γ)
  - update current model using full conditionals
  - sample QTL loci, effects, and genotypes
- add a locus with probability b(γ)
  - propose a new locus along genome
  - innovate new genotypes at locus and phenotype effect
  - decide whether to accept the birth of new locus
- drop a locus with probability d(γ)
  - propose dropping one of existing loci
  - decide whether to accept the death of locus

Gibbs sampler with loci indicators
- consider only QTL at pseudomarkers
  - every 1-2 cM
  - modest approximation with little bias
- use loci indicators in each pseudomarker
  - γ = 1 if QTL present
  - γ = 0 if no QTL present
- Gibbs sampler on loci indicators γ
  - relatively easy to incorporate epistasis
  - Yi, Yandell, Churchill, Allison, Eisen, Pomp (2005 Genetics)
    - (see earlier work of Nengjun Yi and Ina Hoeschele)

$\mu_q = \mu + \gamma_1 \beta_1(q_1) + \gamma_2 \beta_2(q_2), \quad \gamma_k = 0, 1$

Bayesian shrinkage estimation
- soft loci indicators
  - strength of evidence for λ_j depends on γ
  - 0 ≤ γ ≤ 1 (grey scale)
  - shrink most γs to zero
- Wang et al. (2005 Genetics)
  - Shizhong Xu group at U CA Riverside

$\mu_q = \beta_0 + \gamma_1 \beta_1(q_1) + \gamma_2 \beta_2(q_2), \quad 0 \le \gamma_k \le 1$

other model selection approaches
- include all potential loci in model
- assume true model is sparse in some sense
- Sparse partial least squares
  - Chun, Keles (2009 Genetics; 2010 JRSSB)
- LASSO model selection
  - Foster (2006); Foster Verbyla Pitchford (2007 JABES)
  - Xu (2007 Biometrics); Yi Xu (2007 Genetics)
  - Shi Wahba Wright Klein Klein (2008 Stat & Infer)

4. criteria for model selection
balance fit against complexity
- classical information criteria
  - penalize likelihood L by model size |γ|
  - IC = -2 log L(γ | y) + penalty(γ)
  - maximize over unknowns
- Bayes factors
  - marginal posteriors pr(y | γ)
  - average over unknowns

classical information criteria
- start with likelihood L(γ | y, m)
  - measures fit of architecture (γ) to phenotype (y)
    - given marker data (m)
  - genetic architecture (γ) depends on parameters
    - have to estimate loci (λ) and effects (μ)
- complexity related to number of parameters
  - |γ| = size of genetic architecture
  - BC: |γ| = 1 + n.qtl + n.qtl(n.qtl - 1) = 1 + 4 + 12 = 17 (for n.qtl = 4)
  - F2: |γ| = 1 + 2n.qtl + 4n.qtl(n.qtl - 1) = 1 + 8 + 48 = 57 (for n.qtl = 4)

classical information criteria
- construct information criteria
  - balance fit to complexity
  - Akaike: AIC = -2 log(L) + 2|γ|
  - Bayes/Schwarz: BIC = -2 log(L) + |γ| log(n)
  - Broman: BIC_δ = -2 log(L) + δ|γ| log(n)
  - general form: IC = -2 log(L) + |γ| D(n)
- compare models
  - hypothesis testing: designed for one comparison
    - -2 log[LR(γ1, γ2)] = L(y | m, γ2) - L(y | m, γ1)
  - model selection: penalize complexity
    - IC(γ1, γ2) = -2 log[LR(γ1, γ2)] + (|γ2| - |γ1|) D(n)

information criteria vs. model size
- WinQTL 2.0
- SCD data on F2
- A = AIC, 1 = BIC(1), 2 = BIC(2), d = BIC(δ)
- models
  - 1, 2, 3, 4 QTL
  - epistasis 2:2 AD
[Figure: information criteria plotted against number of model parameters p, with and without epistasis]
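The model sizes and criteria from the information-criteria slides can be computed directly. This Python sketch uses the slide formulas for |γ| in a backcross (BC) and an F2, and the general form IC = -2 log(L) + |γ| D(n); the log-likelihood value, sample size and δ below are made-up numbers for illustration.

```python
import math

def model_size(n_qtl, cross):
    """|gamma| as on the slides: mean + main effects + pairwise epistasis terms."""
    if cross == "BC":
        return 1 + n_qtl + n_qtl * (n_qtl - 1)
    if cross == "F2":
        return 1 + 2 * n_qtl + 4 * n_qtl * (n_qtl - 1)
    raise ValueError("cross must be 'BC' or 'F2'")

def information_criterion(log_lik, size, penalty_per_param):
    """General form IC = -2 log(L) + |gamma| * D(n)."""
    return -2.0 * log_lik + size * penalty_per_param

def aic(log_lik, size):
    return information_criterion(log_lik, size, 2.0)

def bic(log_lik, size, n, delta=1.0):
    """BIC for delta = 1; Broman's BIC_delta otherwise."""
    return information_criterion(log_lik, size, delta * math.log(n))

size_bc = model_size(4, "BC")
size_f2 = model_size(4, "F2")
log_lik, n = -150.0, 100  # hypothetical fitted log-likelihood and sample size
print(size_bc, size_f2)
print(aic(log_lik, size_bc), bic(log_lik, size_bc, n), bic(log_lik, size_bc, n, delta=2.0))
```

Smaller values are better; raising δ penalizes each extra parameter more heavily and so favors smaller architectures.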

Bayes factors
- ratio of model likelihoods
  - ratio of posterior to prior odds for architectures
  - averaged over unknowns

$B_{12} = \dfrac{\mathrm{pr}(\gamma_1 \mid y, m) / \mathrm{pr}(\gamma_2 \mid y, m)}{\mathrm{pr}(\gamma_1) / \mathrm{pr}(\gamma_2)} = \dfrac{\mathrm{pr}(y \mid m, \gamma_1)}{\mathrm{pr}(y \mid m, \gamma_2)}$

- roughly equivalent to BIC
  - BIC maximizes over unknowns
  - BF averages over unknowns

$-2\log(B_{12}) = -2\log(LR) - (|\gamma_2| - |\gamma_1|)\log(n)$

scan of marginal Bayes factor & effect
[Figure: genome scan of marginal Bayes factor and effect]

issues in computing Bayes factors
- BF insensitive to shape of prior on γ
  - geometric, Poisson, uniform
  - precision improves when prior mimics posterior
- BF sensitivity to prior variance on effects θ
  - prior variance should reflect data variability
  - resolved by using hyper-priors
    - automatic algorithm; no need for user tuning
- easy to compute Bayes factors from samples
  - sample posterior using MCMC
  - posterior pr(γ | y, m) is marginal histogram

Bayes factors & genetic architecture γ
- |γ| = number of QTL
  - prior pr(γ) chosen by user
  - posterior pr(γ | y,m)
    - sampled marginal histogram
    - shape affected by prior pr(γ)
- pattern of QTL across genome
- gene action and epistasis

$BF_{\gamma_1, \gamma_2} = \dfrac{\mathrm{pr}(\gamma_1 \mid y, m) / \mathrm{pr}(\gamma_1)}{\mathrm{pr}(\gamma_2 \mid y, m) / \mathrm{pr}(\gamma_2)}$

[Figure: prior probability against m = number of QTL for three priors: e = exponential, p = Poisson, u = uniform]
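Computing a Bayes factor from MCMC output amounts to dividing the sampled posterior frequency of each architecture by its prior probability. The Python sketch below treats γ simply as the number of QTL, uses a Poisson prior, and stands in a made-up list of sampled model sizes for real MCMC output; all of these are assumptions for illustration.

```python
import math
from collections import Counter

def poisson_prior(m, mean):
    """Prior probability of m QTL under a Poisson prior with the given mean."""
    return math.exp(-mean) * mean ** m / math.factorial(m)

def bayes_factor(samples, prior, g1, g2):
    """BF = [pr(g1 | y, m) / pr(g1)] / [pr(g2 | y, m) / pr(g2)],
    with posteriors estimated by the marginal histogram of the samples."""
    counts = Counter(samples)
    n = len(samples)
    post1, post2 = counts[g1] / n, counts[g2] / n
    return (post1 / prior(g1)) / (post2 / prior(g2))

# Hypothetical MCMC record of the number of QTL in each saved iteration.
n_qtl_samples = [1] * 10 + [2] * 60 + [3] * 30
bf_2_vs_3 = bayes_factor(n_qtl_samples, lambda m: poisson_prior(m, 2.0), 2, 3)
print(f"BF for 2 vs 3 QTL: {bf_2_vs_3:.3f}")
```

Architectures that are rarely or never sampled give unstable or undefined ratios, which is one reason precision improves when the prior mimics the posterior.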

BF sensitivity to fixed prior for effects
[Figure: Bayes factors B45, B34, B23, B12 plotted against hyper-prior heritability h²]
$\beta_{qj} \sim N(0,\ \sigma_G^2 / m), \quad \sigma_G^2 = h^2 \sigma_{\text{total}}^2, \quad h^2 \text{ fixed}$

BF insensitivity to random effects prior
[Figure: left panel, hyper-prior density 2*Beta(a,b) for several (a,b), including (1,9), (2,10), (1,3), (1,1); right panel, insensitivity to hyper-prior: Bayes factors B34, B23, B12 against hyper-parameter heritability E(h²)]
$\beta_{qj} \sim N(0,\ \sigma_G^2 / m), \quad \sigma_G^2 = h^2 \sigma_{\text{total}}^2, \quad \tfrac{1}{2} h^2 \sim \mathrm{Beta}(a, b)$

Multiple Correlated Traits
- Pleiotropy vs. close linkage
- Analysis of covariance
  - Regress one trait on another before QTL search
- Classic GxE analysis
- Formal joint mapping (MTM)
- Seemingly unrelated regression (SUR)
- Reducing many traits to one
  - Principal components for similar traits

Correlated Traits SISG (c) Yandell

co-mapping multiple traits
- avoid reductionist approach to biology
  - address physiological/biochemical mechanisms
  - Schmalhausen (1942); Falconer (1952)
- separate close linkage from pleiotropy
  - 1 locus or 2 linked loci?
- identify epistatic interaction or canalization
  - influence of genetic background
- establish QTL x environment interactions
- decompose genetic correlation among traits
- increase power to detect QTL

Two types of data
- Design I: multiple traits on same individual
  - Related measurements, say of shape or size
  - Same measurement taken over time
  - Correlation within an individual
- Design II: multiple traits on different individuals
  - Same measurement in two crosses
  - Male vs. female differences
  - Different individuals in different locations
  - No correlation between individuals

interplay of pleiotropy & correlation
[Figure, Korol et al. (2001): three panels: pleiotropy only, correlation only, both]

Brassica napus: 2 correlated traits
- 4-week & 8-week vernalization effect
  - log(days to flower)
- genetic cross of
  - Stellar (annual canola)
  - Major (biennial rapeseed)
- 105 F1-derived double haploid (DH) lines
  - homozygous at every locus (QQ or qq)
- 10 molecular markers (RFLPs) on LG9
  - two QTLs inferred on LG9 (now chromosome N2)
  - corroborated by Butruille (1998)
  - exploiting synteny with Arabidopsis thaliana

QTL with GxE or Covariates
- adjust phenotype by covariate
  - covariate(s) = environment(s) or other trait(s)
- additive covariate
  - covariate adjustment same across genotypes
  - usual analysis of covariance (ANCOVA)
- interacting covariate
  - address GxE
  - capture genotype-specific relationship among traits
- another way to think of multiple trait analysis
  - examine single phenotype adjusted for others

R/qtl & covariates
- additive and/or interacting covariates
- test for QTL after adjusting for covariates

    ## Get Brassica data.
    library(qtlbim)
    data(Bnapus)
    Bnapus <- calc.genoprob(Bnapus, step = 2, error = 0.01)

    ## Scatterplot of two phenotypes: 4wk & 8wk flower time.
    plot(Bnapus$pheno$log10flower4, Bnapus$pheno$log10flower8)

    ## Unadjusted IM scans of each phenotype.
    fl8 <- scanone(Bnapus, pheno.col = find.pheno(Bnapus, "log10flower8"))
    fl4 <- scanone(Bnapus, pheno.col = find.pheno(Bnapus, "log10flower4"))
    plot(fl4, fl8, chr = "N2", col = rep(1, 2), lty = 1:2,
         main = "solid = 4wk, dashed = 8wk", lwd = 4)

R/qtl & covariates (continued)

  ## IM scan of 8wk adjusted for 4wk.
  ## Adjustment independent of genotype.
  fl8.4a <- scanone(Bnapus, pheno.col = find.pheno(Bnapus, "log10flower8"),
                    addcov = Bnapus$pheno$log10flower4)

  ## IM scan of 8wk adjusted for 4wk.
  ## Adjustment changes with genotype.
  fl8.4 <- scanone(Bnapus, pheno.col = find.pheno(Bnapus, "log10flower8"),
                   intcov = Bnapus$pheno$log10flower4)

  ## Line types match the title: solid, dashed, dotted.
  plot(fl8, fl8.4a, fl8.4, chr = "N2", col = rep(1, 3), lty = 1:3,
       main = "solid = 8wk, dashed = addcov, dotted = intcov")

scatterplot adjusted for covariate

  ## Set up data frame with peak markers, traits.
  markers <- c("E38M50.133", "ec2e5a", "wg7f3a")
  tmpdata <- data.frame(pull.geno(Bnapus)[, markers])
  tmpdata$fl4 <- Bnapus$pheno$log10flower4
  tmpdata$fl8 <- Bnapus$pheno$log10flower8

  ## Scatterplots grouped by marker.
  library(lattice)
  xyplot(fl8 ~ fl4, tmpdata, group = wg7f3a,
         col = "black", pch = 3:4, cex = 2, type = c("p", "r"),
         xlab = "log10(4wk flower time)",
         ylab = "log10(8wk flower time)",
         main = "marker at 47cM")
  xyplot(fl8 ~ fl4, tmpdata, group = E38M50.133,
         col = "black", pch = 3:4, cex = 2, type = c("p", "r"),
         xlab = "log10(4wk flower time)",
         ylab = "log10(8wk flower time)",
         main = "marker at 80cM")

Multiple trait mapping
  Joint mapping of QTL
    testing and estimating QTL affecting multiple traits
  Testing pleiotropy vs. close linkage
    One QTL or two closely linked QTLs
  Testing QTL x environment interaction
  Comprehensive model of multiple traits
    Separate genetic & environmental correlation

Formal Tests: 2 traits
  $y_1 \sim N(\mu_{q1}, \sigma^2)$ for group 1 with QTL at location $\lambda_1$
  $y_2 \sim N(\mu_{q2}, \sigma^2)$ for group 2 with QTL at location $\lambda_2$
  Pleiotropy vs. close linkage
    test QTL at same location: $\lambda_1 = \lambda_2$
    likelihood ratio test (LOD): null forces same location
  if pleiotropic ($\lambda_1 = \lambda_2$)
    test for same mean: $\mu_{q1} = \mu_{q2}$
    likelihood ratio test (LOD)
      null forces same mean, location
      alternative forces same location
    only makes sense if traits are on same scale
    test sex or location effect

3 correlated traits (Jiang Zeng 1995)
  ellipses centered on genotypic value
  width for nominal frequency
  main axis angle: environmental correlation
  3 QTL, F2
  note signs of genetic and environmental correlation
  [figure: three panels with annotations $\rho_P = 0.3$, $\rho_G = 0.54$; $\rho_P = 0.06$, $\rho_G = 0.68$; $\rho_P = -0.07$, $\rho_G = -0.22$; the $\rho_E$ values and the genotype count are missing in the transcript]

pleiotropy or close linkage?
  2 traits, 2 qtl/trait
  54cM
  114,128cM
  Jiang Zeng (1995)
  [figure]

More detail for 2 traits
  $y_1 \sim N(\mu_{q1}, \sigma^2)$ for group 1
  $y_2 \sim N(\mu_{q2}, \sigma^2)$ for group 2
  two possible QTLs at locations $\lambda_1$ and $\lambda_2$
  effect $\beta_{kj}$ in group k for QTL at location $\lambda_j$
    $\mu_{q1} = \mu_1 + \beta_{11}(q_1) + \beta_{12}(q_2)$
    $\mu_{q2} = \mu_2 + \beta_{21}(q_1) + \beta_{22}(q_2)$
  classical: test $\beta_{kj} = 0$ for various combinations

seemingly unrelated regression (SUR)
    $\mu_{q1} = \mu_1 + \gamma_{11}\beta_{q11} + \gamma_{12}\beta_{q12}$
    $\mu_{q2} = \mu_2 + \gamma_{21}\beta_{q21} + \gamma_{22}\beta_{q22}$
  indicators $\gamma_{kj}$ are 0 (no QTL) or 1 (QTL)
  include $\gamma$s in formal model selection

SUR for multiple loci across genome
  consider only QTL at pseudomarkers (lecture 2)
  use loci indicators $\gamma_j$ (= 0 or 1) for each pseudomarker
  use SUR indicators $\gamma_{kj}$ (= 0 or 1) for each trait
  Gibbs sampler on both indicators
    Banerjee, Yandell, Yi (2008 Genetics)
    $\mu_{q1} = \mu_1 + \gamma_1\gamma_{11}\beta_{11}(q_1) + \gamma_2\gamma_{12}\beta_{12}(q_2) + \dots$
    $\mu_{q2} = \mu_2 + \gamma_1\gamma_{21}\beta_{21}(q_1) + \gamma_2\gamma_{22}\beta_{22}(q_2) + \dots$

Simulation
  5 QTL, 2 traits, n = 200
  TMV vs. SUR
  [figure]


R/qtlbim and GxE
  similar idea to R/qtl
    fixed and random additive covariates
    GxE with fixed covariate
  multiple trait analysis tools coming soon
    theory & code mostly in place
    properties under study
    expect in R/qtlbim later this year
    Samprit Banerjee (N Yi, advisor)

reducing many phenotypes to 1
  Drosophila mauritiana x D. simulans
    reciprocal backcrosses, ~500 per bc
  response is shape of reproductive piece
    trace edge, convert to Fourier series
    reduce dimension: first principal component
  many linked loci
    brief comparison of CIM, MIM, BIM

PC for two correlated phenotypes
  [figure: scatterplot of two traits (axis labels ettf and etif3s6) with PC1 (93%) and PC2 (7%)]
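Reducing two correlated phenotypes to a first principal component can be sketched as follows. The data are simulated and the names (`pheno`, `pc1`, `varExplained`) are hypothetical; the shares of variance will not match the 93% / 7% split shown on the slide.

```r
## Two correlated phenotypes (simulated stand-ins for the slide's traits).
set.seed(2)
n <- 200
trait1 <- rnorm(n)
trait2 <- 0.9 * trait1 + rnorm(n, sd = 0.3)
pheno <- cbind(trait1, trait2)

## Principal components of the centered, scaled phenotypes.
pc <- prcomp(pheno, center = TRUE, scale. = TRUE)
varExplained <- pc$sdev^2 / sum(pc$sdev^2)   # share of variance per component

## PC1 scores: one composite phenotype per individual, ready for a QTL scan.
pc1 <- pc$x[, 1]
```

The vector `pc1` could then be mapped like any single trait, which is the dimension-reduction step the slide describes.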

shape phenotype via PC
  Liu et al. (1996) Genetics
  [figure]

shape phenotype in BC study indexed by PC1
  Liu et al. (1996) Genetics
  [figure]

CIM vs. MIM
  Zeng et al. (2000)
  composite interval mapping (Liu et al. 1996)
    narrow peaks
    miss some QTL
  multiple interval mapping (Zeng et al. 2000)
    triangular peaks
  both conditional 1-D scans
    fixing all other "QTL"

CIM, MIM and IM pairscan
  [figure: panels labeled cim, mim, and 2-D im]

multiple QTL: CIM, MIM and BIM
  [figure: panels labeled cim, mim, bim]

Quantile-based Permutation Thresholds for QTL Hotspots
Brian S Yandell and Elias Chaibub Neto
17 March 2012
MSRC Yandell

Fisher on inference
  "We may at once admit that any inference from the particular to the general must be attended with some degree of uncertainty, but this is not the same as to admit that such inference cannot be absolutely rigorous, for the nature and degree of the uncertainty may itself be capable of rigorous expression."
  Sir Ronald A Fisher (1935) The Design of Experiments

Why study hotspots?
  How do genotypes affect phenotypes?
    genotypes = DNA markers for an individual
    phenotypes = traits measured on an individual
      (clinical traits, thousands of mRNA expression levels)
  QTL hotspots = genomic locations affecting many traits
    common feature in genetical genomics studies
    biologically interesting--may harbor critical regulators
  But are these hotspots real? Or are they spurious or random?
    non-genetic correlation from other environmental factors


Genetic architecture of gene expression in 6 tissues. (A) Tissue-specific panels illustrate the relationship between the genomic location of a gene (y axis) and where that gene's mRNA shows an eQTL (LOD > 5), as a function of genome position (x axis). Circles represent eQTLs that showed either cis-linkage (black) or trans-linkage (colored) according to LOD score. Genomic hot spots, where many eQTLs map in trans, are apparent as vertical bands that show either tissue selectivity (e.g., Chr 6 in the islet) or are present in all tissues (e.g., Chr 17). (B) The total number of eQTLs identified in 5 cM genomic windows is plotted for each tissue; total eQTLs for all positions is shown in the upper right corner of each panel. The peak number of eQTLs exceeding 1000 per 5 cM is shown for islets (Chrs 2, 6 and 17), liver (Chrs 2 and 17) and kidney (Chr 17).

Figure 4. Tissue-specific hotspots with eQTL and SNP architecture for Chrs 1, 2 and 17. The number of eQTLs for each tissue (left axis) and the number of SNPs between B6 and BTBR (right axis) identified within a 5 cM genomic window are shown for Chr 1 (A), Chr 2 (B) and Chr 17 (C). The locations of tissue-specific hotspots are identified by numbers corresponding to those in Table 1. eQTL and SNP architecture is shown for all chromosomes in supplementary material.

How large a hotspot is large?
  recently proposed empirical test
    Breitling et al. Jansen (2008)
  hotspot = count traits above LOD threshold
    LOD = rescaled likelihood ratio ~ F statistic
  assess null distribution with permutation test
    extension of Churchill and Doerge (1994)
    extension of Fisher's permutation t-test

Single trait permutation threshold T
  Churchill Doerge (1994)
  Null distribution of max LOD
    Permute single trait separate from genotype
    Find max LOD over genome
    Repeat 1000 times
  Find 95% permutation threshold T
  Identify interesting peaks above T in data
  Controls genome-wide error rate (GWER)
    Chance of detecting at least one peak above T
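The single-trait permutation threshold can be sketched in base R. This is a simplified illustration, not R/qtl: genotypes are simulated as unlinked 0/1 markers, the LOD comes from single-marker regression, and names such as `lodScan` and `maxLod` are assumptions for the example.

```r
set.seed(3)
n <- 100; nMarkers <- 50
## Hypothetical genotypes coded 0/1, unlinked for simplicity.
geno  <- matrix(rbinom(n * nMarkers, 1, 0.5), n, nMarkers)
pheno <- rnorm(n) + 0.8 * geno[, 10]     # one simulated QTL at marker 10

## Single-marker regression LOD at every marker:
## LOD = -(n/2) * log10(1 - r^2), with r the trait-genotype correlation.
lodScan <- function(y, g) {
  r <- as.vector(cor(y, g))
  -(length(y) / 2) * log10(1 - r^2)
}

obs <- lodScan(pheno, geno)

## Shuffle the trait to break any QTL, record the genome-wide max LOD, repeat.
nPerm  <- 1000
maxLod <- replicate(nPerm, max(lodScan(sample(pheno), geno)))

## 95% permutation threshold T and the markers exceeding it in the data.
threshold <- quantile(maxLod, 0.95)
peaks <- which(obs > threshold)
```

Taking the maximum over the genome inside each permutation is what makes the threshold genome-wide rather than per marker.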

Single trait permutation schema
  [figure: genotypes; phenotype; LOD over genome; max LOD]
  1. shuffle phenotypes to break QTL
  2. repeat 1000 times and summarize

Hotspot count threshold N(T)
  Breitling et al. Jansen (2008)
  Null distribution of max count above T
    Find single-trait 95% LOD threshold T
    Find max count of traits with LODs above T
    Repeat 1000 times
  Find 95% count permutation threshold N
  Identify counts of LODs above T in data
    Locus-specific counts identify hotspots
  Controls GWER in some way

Hotspot permutation schema
  [figure: genotypes; phenotypes; LOD at each locus for each phenotype over genome; count LODs at locus over threshold T; max count N over genome]
  1. shuffle phenotypes by row to break QTL, keep correlation
  2. repeat 1000 times and summarize

spurious hotspot permutation histogram for hotspot size above 1-trait threshold
  [figure: histogram]
  95% threshold at N > 82 using single trait threshold T = 3.41

Hotspot sizes based on count of LODs above single-trait threshold
  [figure]
  5 peaks above count threshold N = 82
  all traits counted are nominally significant
  but no adjustment for multiple testing across traits

hotspot permutation test (Breitling et al. Jansen 2008 PLoS Genetics)
  for original dataset and each permuted set:
    Set single trait LOD threshold T
      Could use Churchill-Doerge (1994) permutations
    Count number of traits (N) with LOD above T
      Do this at every marker (or pseudomarker)
      Probably want to smooth counts somewhat
  find count with at most 5% of permuted sets above (critical value) as count threshold
  conclude original counts above threshold are real
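The hotspot count permutation test can be sketched as below. It is a toy version under stated assumptions: simulated unlinked markers, single-marker regression LODs, a fixed single-trait threshold `lodThr` taken as given, no smoothing of counts, and hypothetical names throughout. Whole rows of the trait matrix are shuffled together, so the link to genotypes is broken while correlation among traits is kept.

```r
set.seed(4)
n <- 80; nMarkers <- 40; nTraits <- 60
geno <- matrix(rbinom(n * nMarkers, 1, 0.5), n, nMarkers)
## Traits share a non-genetic latent factor: correlated, but with no QTL.
latent <- rnorm(n)
pheno  <- sapply(seq_len(nTraits), function(k) 0.7 * latent + rnorm(n))

## LOD matrix (markers x traits) from single-marker regression.
lodMatrix <- function(y, g) {
  r <- cor(g, y)
  -(nrow(y) / 2) * log10(1 - r^2)
}
## Largest per-marker count of traits with LOD above the threshold.
maxCount <- function(y, g, lodThr) max(rowSums(lodMatrix(y, g) > lodThr))

lodThr <- 3      # single-trait LOD threshold T, assumed already determined
nPerm  <- 200
## Permute individuals (rows) of the whole trait matrix at once.
nullCounts <- replicate(nPerm, maxCount(pheno[sample(n), ], geno, lodThr))

## 95% count threshold N and the loci whose observed counts exceed it.
countThr  <- quantile(nullCounts, 0.95)
obsCounts <- rowSums(lodMatrix(pheno, geno) > lodThr)
hotspots  <- which(obsCounts > countThr)
```

Shuffling each trait column separately would destroy the trait correlation and understate how large spurious hotspots can be, which is the "wrong way" contrasted in the deck.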

permutation across traits (Breitling et al. Jansen 2008 PLoS Genetics)
  [figure: "right way" and "wrong way" to permute a table of strain by marker and gene expression]
  break correlation between markers and traits
  but preserve correlation among traits

quality vs. quantity in hotspots (Chaibub Neto et al. in review)
  detecting single trait with very large LOD
    control FWER across genome
    control FWER across all traits
  finding small hotspots with significant traits
    all with large LODs
    could indicate a strongly disrupted signal pathway
  sliding LOD threshold across hotspot sizes

Rethinking the approach
  Breitling et al. depends highly on T
    Threshold T based on single trait
    but interested in multiple correlated traits
  want to control hotspot GWER (hGWER_N)
    chance of detecting at least one spurious hotspot of size N or larger
  N = 1
    chance of detecting at least 1 peak above threshold across all traits and whole genome
    Use permutation null distribution of maximum LOD scores across all transcripts and all genomic locations

Hotspot architecture using multiple trait GWER threshold (T_1 = 7.12)
  [figure]
  count of all traits with LOD above T_1 = 7.12
  all traits counted are significant
  conservative adjustment for multiple traits

locus-specific LOD quantiles in data for 10 (black), 20 (blue), 50 (red) traits
  [figure]

locus-specific LOD quantiles
  Quantile: what is the LOD value for which at least 10 (or 20 or 50) traits are at or above it?
  Breitling hotspots (chr 2, 3, 12, 14, 15) have many traits with high LODs
  [table: max LOD quantile by trait count; columns color, count, chr 3, chr 8, chr 12, chr 14; rows black, blue, red; values missing in transcript]

Hotspot permutation revisited
  [figure: genotypes; phenotypes; LOD at each locus over genome per phenotype; quantile = N-th largest LOD at each locus; max LOD quantile over genome]
  1. shuffle phenotypes by row to break QTL, keep correlation
  2. repeat 1000 times and summarize

Tail distribution of LOD quantiles and size-specific thresholds
  What is locus-specific (spurious) hotspot?
    all traits in hotspot have LOD above null threshold
  Small spurious hotspots have higher minimum LODs
    min of 10 values > min of 20 values
  Large spurious hotspots have many small LODs
    most are below single-trait threshold
  Null thresholds depend on hotspot size
    Decrease with spurious hotspot size (starting at N = 1)
    Are truncated at single-trait threshold for large sizes
  Chen Storey (2007) studied LOD quantiles
    For multiple peaks on a single trait
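The quantile-based version replaces counts with the N-th largest LOD at each locus. The sketch below is a base R illustration on simulated data, not the R/qtlhot implementation; the hotspot sizes in `sizes`, the simulated traits, and every function name are assumptions.

```r
set.seed(5)
n <- 80; nMarkers <- 40; nTraits <- 60
geno   <- matrix(rbinom(n * nMarkers, 1, 0.5), n, nMarkers)
latent <- rnorm(n)
pheno  <- sapply(seq_len(nTraits), function(k) 0.7 * latent + rnorm(n))

## LOD matrix (markers x traits) from single-marker regression.
lodMatrix <- function(y, g) {
  r <- cor(g, y)
  -(nrow(y) / 2) * log10(1 - r^2)
}

sizes <- c(1, 5, 10, 20)   # hotspot sizes N of interest (hypothetical choice)

## For each size N: genome-wide max of the N-th largest LOD per locus.
maxQuantile <- function(lod, sizes) {
  sorted <- apply(lod, 1, sort, decreasing = TRUE)   # traits x markers
  apply(sorted[sizes, , drop = FALSE], 1, max)
}

## Null distribution: permute rows of the trait matrix, keep trait correlation.
nPerm <- 200
nullQ <- replicate(nPerm,
                   maxQuantile(lodMatrix(pheno[sample(n), ], geno), sizes))

## One 95% LOD threshold per hotspot size (rows of nullQ index the sizes).
thresholds <- apply(nullQ, 1, quantile, probs = 0.95)
names(thresholds) <- sizes
```

The resulting thresholds cannot increase with N, matching the slide's point that small spurious hotspots need higher LODs than large ones; truncation at the single-trait threshold is left out of this sketch.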

genome-wide LOD permutation threshold vs. spurious hotspot size
  [figure]
  smaller spurious hotspots have higher LOD thresholds
  larger spurious hotspots allow many traits with small LODs (below T = 3.41)

Hotspot architecture using multiple trait GWER threshold (T_1 = 7.12)
  [figure]

hotspot architectures using LOD thresholds for 10 (black), 20 (blue), 50 (red) traits
  [figure]

Sliding threshold between multiple trait (T_1 = 7.12) and single trait (T_0 = 3.41) GWER
  [figure]
  T_1 = 7.12 controls GWER across all traits
  T_0 = 3.41 controls GWER for single trait

Hotspot size significance profile
  Construction
    Fix significance level (say 5%)
    At each locus, find largest hotspot that is significant using sliding threshold
    Plot as profile across genome
  Interpretation
    Large hotspots were already significant
    Traits with LOD > 7.12 could be hubs
    Smaller hotspots identified by fewer large LODs (chr 8)
    Subjective choice on what to investigate (chr 13, 5?)

Hotspot size significance profile
  [figure]

Yeast study
  120 individuals
  6000 traits
  250 markers
  1000 permutations
  1.8 * 10^10 linear models

Mouse study
  [count missing in transcript] individuals
  30,000 traits * 6 tissues
  2000 markers
  1000 permutations
  1.8 * 10^13 linear models
  1000 x more than yeast study

Scaling up permutations
  tremendous computing resource needs
    Multiple analyses, periodically redone
      Algorithms improve
      Gene annotation and sequence data evolve
    Verification of properties of methods
      Theory gives easy cutoff values (LOD > 3) that may not be relevant
      Need to carefully develop re-sampling methods (permutations, etc.)
    Storage of raw, processed and summary data (and metadata)
      Terabyte(s) of backed-up storage (soon petabytes and more)
      Web access tools
  high throughput computing platforms (Condor)
    Reduce months or years to hours or days
    Free up your mind to think about science rather than mechanics
    Free up your desktop/laptop for more immediate tasks
  Need local (regional) infrastructure
    Who maintains the machines, algorithms?
    Who can talk to you in plain language?

CHTC use: one small project
  Open Science Grid Glidein Usage (4 feb 2012)
  [table: columns group, hours, percent; rows 1 BMRB, 2 Biochem_Attie, 3 Statistics_Wahba; hours and percent values missing in transcript]

Breitling et al. (2008) hotspot size thresholds from permutations
  [figure]

Breitling Method
  [figure]

Chaibub Neto sliding LOD thresholds
  [figure]

Sliding LOD method
  [figure]

What's next?
  Further assess properties (power of test)
  Drill into identified hotspots
    Find correlated subsets of traits
    Look for local causal agents (cis traits)
    Build causal networks (another talk)
    Validate findings for narrow hotspot
  Incorporate as tool in pipeline
    Increase access for discipline researchers
    Increase visibility of method

References
  Chaibub Neto E, Keller MP, Broman AF, Attie AD, Jansen RC, Broman KW, Yandell BS. Quantile-based permutation thresholds for QTL hotspots. Genetics (in review).
  Breitling R, Li Y, Tesson BM, Fu J, Wu C, Wiltshire T, Gerrits A, Bystrykh LV, de Haan G, Su AI, Jansen RC (2008) Genetical Genomics: Spotlight on QTL Hotspots. PLoS Genetics 4: e
  Churchill GA, Doerge RW (1994) Empirical threshold values for quantitative trait mapping. Genetics 138:


[figure sequence: skeleton search by conditional independence tests, each step shown beside the true graph]

initial network: test each edge in turn (keep edge, move to next edge, and so on)
After all zero order conditional independence tests, the algorithm moves to first order conditional independence tests.

first order: move to next edge; drop edge, or keep edge and change the conditioning set
After all first order conditional independence tests, the algorithm moves to second order conditional independence tests.

second order: move to next edge; drop edge
After all second order conditional independence tests, the algorithm moves to third order, fourth order, and so on.
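The order-by-order edge removal walked through above can be sketched in base R. This is a simplified variant, not the R/qdg or pcalg code: conditioning sets are drawn from the union of both nodes' current neighbors, independence is judged with a Fisher z test on partial correlations, and the names `pcor`, `isIndep`, and `pcSkeleton` as well as the defaults `alpha = 0.01` and `maxOrder = 2` are assumptions.

```r
## Partial correlation of i and j given set S, from the correlation matrix.
pcor <- function(R, i, j, S) {
  idx <- c(i, j, S)
  P <- solve(R[idx, idx])               # precision matrix of the sub-block
  -P[1, 2] / sqrt(P[1, 1] * P[2, 2])
}
## Fisher z test: TRUE when independence is NOT rejected at level alpha.
isIndep <- function(r, n, k, alpha) {
  z <- 0.5 * log((1 + r) / (1 - r))
  sqrt(n - k - 3) * abs(z) <= qnorm(1 - alpha / 2)
}
## Skeleton search: start complete, drop edges at order 0, 1, 2, ...
pcSkeleton <- function(X, alpha = 0.01, maxOrder = 2) {
  n <- nrow(X); p <- ncol(X)
  R <- cor(X)
  adj <- matrix(TRUE, p, p); diag(adj) <- FALSE
  for (k in 0:maxOrder) {
    for (i in 1:(p - 1)) for (j in (i + 1):p) {
      if (!adj[i, j]) next                      # edge already dropped
      nbrs <- setdiff(which(adj[i, ] | adj[j, ]), c(i, j))
      if (length(nbrs) < k) next                # no conditioning set of size k
      sets <- if (k == 0) list(integer(0)) else
        combn(length(nbrs), k, FUN = function(ix) nbrs[ix], simplify = FALSE)
      for (S in sets) {
        if (isIndep(pcor(R, i, j, S), n, k, alpha)) {
          adj[i, j] <- adj[j, i] <- FALSE       # drop edge
          break
        }                                       # else keep edge, change cond set
      }
    }
  }
  adj
}

## Example: chain y1 -> y2 -> y3; the y1 - y3 edge is the candidate to drop
## once y2 is conditioned on (first order).
set.seed(6)
n <- 500
y1 <- rnorm(n); y2 <- y1 + rnorm(n); y3 <- y2 + rnorm(n)
skel <- pcSkeleton(cbind(y1, y2, y3))
```

The result is an undirected skeleton; orienting the edges is a separate step, which the later slides handle using QTL.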


[figure: MCMC move from saved network k to saved network k + 1; steps: select edge, drop edge, identify parents, orphan nodes, reverse edge, find new parents; from Grzegorczyk and Husmeier (2008)]

Trace plots of the logarithmic scores of the DAGs after the burn-in phase. From Grzegorczyk and Husmeier (2008).


R/qtlnet available at CRAN


Expression Modules
Brian S. Yandell (with slides from Steve Horvath, UCLA, and Mark Keller, UW-Madison)
Modules/Pathways SISG (c) 2012 Brian S Yandell

Weighted models for insulin
  [figure panels: "Detected by scanone"; "Detected by Ping's multiqtl model"]
  # transcripts that match weighted insulin model in each of 4 tissues:
    tissue     # transcripts
    Islet      1984
    Adipose    605
    Liver      485
    Gastroc    404

insulin main effects
  Ping Wang
  [figure: Chr 2, Chr 9, Chr 12, Chr 14, Chr 16, Chr 17, Chr 19]
  How many islet transcripts show this same genetic dependence at these loci?

Expression Networks
  Zhang & Horvath (2005)
  organize expression traits using correlation
  adjacency: $a_{ij} = |\mathrm{cor}(x_i, x_j)|^{\beta}$, $\beta = 6$
  connectivity: $k_i = \sum_l a_{il}$
  topological overlap: $\mathrm{TOM}_{ij} = \dfrac{a_{ij} + \sum_l a_{il} a_{jl}}{1 - a_{ij} + \min(k_i, k_j)}$

Using the topological overlap matrix (TOM) to cluster genes
  modules correspond to branches of the dendrogram
  [figure: TOM plot; genes correspond to rows and columns of the TOM matrix; hierarchical clustering dendrogram; modules correspond to branches]

module traits highly correlated
  adjacency attenuates correlation
  can separate positive, negative correlation
  summarize module
    eigengene
    weighted average of traits
  relate module
    to clinical traits
    map eigengene
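The adjacency, connectivity, topological overlap, and eigengene steps can be sketched in base R. This is not the WGCNA package; it follows the unsigned adjacency $|\mathrm{cor}|^{\beta}$ with $\beta = 6$ from Zhang & Horvath (2005), and the simulated two-module data, the average-linkage choice, and all variable names are assumptions.

```r
set.seed(7)
n <- 60
## Two hypothetical modules of 5 traits each, driven by separate latent factors.
f1 <- rnorm(n); f2 <- rnorm(n)
expr <- cbind(sapply(1:5, function(i) f1 + rnorm(n, sd = 0.5)),
              sapply(1:5, function(i) f2 + rnorm(n, sd = 0.5)))

beta <- 6
a <- abs(cor(expr))^beta            # adjacency: soft-thresholded correlation
diag(a) <- 0
conn <- rowSums(a)                  # connectivity k_i = sum_l a_il

## Topological overlap: (a_ij + sum_l a_il a_jl) / (1 - a_ij + min(k_i, k_j)).
shared <- a %*% a                   # sum over l of a_il * a_jl (diag(a) is 0)
tom <- (a + shared) / (1 - a + outer(conn, conn, pmin))
diag(tom) <- 1

## Cluster on TOM dissimilarity; modules are branches of the dendrogram.
tree   <- hclust(as.dist(1 - tom), method = "average")
module <- cutree(tree, k = 2)

## Module eigengene: first principal component of the module's traits.
eigengene <- prcomp(expr[, module == 1], scale. = TRUE)$x[, 1]
```

The `eigengene` vector is one value per individual, so it can be related to clinical traits or mapped as a single phenotype, as the slide suggests.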

advantages of Horvath modules
  emphasize modules (pathways) instead of individual genes
    Greatly alleviates the problem of multiple comparisons
    ~20 module comparisons versus 1000s of gene comparisons
  intramodular connectivity k_i finds key drivers (hub genes)
    quantifies module membership (centrality)
    highly connected genes have an increased chance of validation
  module definition is based on gene expression data
    no prior pathway information is used for module definition
    two modules (eigengenes) can be highly correlated
  unified approach for relating variables
    compare data sets on same mathematical footing
  scale-free: zoom in and see similar structure

Ping Wang modules for 1984 transcripts with similar genetic architecture as insulin
  [figure: modules; one module contains the insulin trait]

Islet modules
  [figure: modules across chromosomes, with insulin trait marked]

Islet enrichment for modules
  [table: columns Module, Pvalue, Qvalue, Count, Size, Term; numeric columns missing in transcript; the label "Insulin" appears among the MAGENTA rows]
  BLUE: biosynthetic process; cellular lipid metabolic process; lipid biosynthetic process; lipid metabolic process
  GREEN: phosphate transport; intermediate filament based process; ion transport
  PURPLE: nucleobase, nucleoside, nucleotide and nucleic acid metabolic process
  BLACK: sensory perception of sound
  MAGENTA: cell cycle process; microtubule based process; mitotic cell cycle; M phase; cell division; cell cycle phase; mitosis; M phase of mitotic cell cycle
  YELLOW: cell projection organization and biogenesis; cell part morphogenesis; cell projection morphogenesis
  RED: steroid hormone receptor signaling pathway; reproductive process
  BROWN: response to pheromone
  TURQUOISE: enzyme linked receptor protein signaling pathway; morphogenesis of an epithelium; morphogenesis of embryonic epithelium; anatomical structure morphogenesis
  PINK: vesicle organization and biogenesis; regulation of apoptosis

ontologies
  Cellular component (GOCC)
  Biological process (GOBP)
  Molecular function (GOMF)
  hierarchy of classification
    general to specific
    based on extensive literature search, predictions
  prone to errors, historical inaccuracies

Bayesian causal phenotype network incorporating genetic variation and biological knowledge
  Brian S Yandell, Jee Young Moon, University of Wisconsin-Madison
  Elias Chaibub Neto, Sage Bionetworks
  Xinwei Deng, VA Tech

BTBR mouse is insulin resistant; B6 is not
  make both obese
  [figure: glucose and insulin; Alan Attie, Biochemistry]
  (courtesy AD Attie)

bigger picture
  how do DNA, RNA, proteins, metabolites regulate each other?
  regulatory networks from microarray expression data
    time series measurements or transcriptional perturbations
    segregating population: genotype as driving perturbations
  goal: discover causal regulatory relationships among phenotypes
  use knowledge of regulatory relationships from databases
    [remainder of this bullet unreadable in transcript]

BxH ApoE-/- chr 2: hotspot
  [figure: x% threshold on number of traits; DNA, local gene, distant genes]
  Data: Ghazalpour et al. (2006) PLoS Genetics

causal model selection choices in context of larger, unknown network
  [figure: four panels relating focal trait and target trait: causal, reactive, correlated, uncorrelated]

causal architecture references
  BIC: Schadt et al. (2005) Nature Genet
  CIT: Millstein et al. (2009) BMC Genet
  Aten et al. Horvath (2008) BMC Sys Bio
  CMST: Chaibub Neto et al. (2010) PhD thesis
    Chaibub Neto et al. (2012) Genetics (in review)

BxH ApoE-/- study
  Ghazalpour et al. (2006) PLoS Genetics
  [figure]

QTL-driven directed graphs
  given genetic architecture (QTLs), what causal network structure is supported by data?
  R/qdg available at [URL missing in transcript]
  references
    Chaibub Neto, Ferrara, Attie, Yandell (2008) Inferring causal phenotype networks from segregating populations. Genetics 179: [doi:genetics ]
    Ferrara et al. Attie (2008) Genetic networks of liver metabolism revealed by integration of metabolic and transcriptomic profiling. PLoS Genet 4: e [doi: /journal.pgen ]

partial correlation (PC) skeleton
  [figure: true graph; correlations; 1st order partial correlations; drop edge]

partial correlation (PC) skeleton
  [figure: true graph; 1st order partial correlations; 2nd order partial correlations; drop edge]

edge direction: which is causal?
  [figure: direction due to QTL]

test edge direction using LOD score
  [figure]

reverse edges using QTLs
  [figure: true graph]

causal graphical models in systems genetics
  What if genetic architecture and causal network are unknown?
    jointly infer both using iteration
  Chaibub Neto, Keller, Attie, Yandell (2010) Causal Graphical Models in Systems Genetics: a unified framework for joint inference of causal network and genetic architecture for correlated phenotypes. Ann Appl Statist 4: [doi: /09-aoas288]
  R/qtlnet available from [URL missing in transcript]
  Related references
    Schadt et al. Lusis (2005 Nat Genet); Li et al. Churchill (2006 Genetics); Chen Emmert-Streib Storey (2007 Genome Bio); Liu de la Fuente Hoeschele (2008 Genetics); Winrow et al. Turek (2009 PLoS ONE); Hageman et al. Churchill (2011 Genetics)

Basic idea of QTLnet
  iterate between finding QTL and network
  genetic architecture given causal network
    trait y depends on parents pa(y) in network
    QTL for y found conditional on pa(y)
      Parents pa(y) are interacting covariates for QTL scan
  causal network given genetic architecture
    build (adjust) causal network given QTL
    each direction change may alter neighbor edges

missing data method: MCMC
  known phenotypes Y, genotypes Q
  unknown graph G
  want to study Pr(Y | G, Q)
  break down in terms of individual edges
    Pr(Y | G, Q) = product over traits i of Pr(y_i | pa(y_i), Q) (a sum on the log scale)
  sample new values for individual edges
    given current value of all other edges
  repeat many times and average results
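The node-by-node breakdown of the likelihood can be sketched with one regression per trait on its parents. This base R illustration leaves out the genotypes Q for brevity (in QTLnet each trait's model would also include its QTL); the parent lists, simulated data, and the names `nodeLogLik` and `dagLogLik` are assumptions.

```r
## Log-likelihood of one node given its parents: a normal linear regression.
nodeLogLik <- function(X, j, pa) {
  fit <- if (length(pa) == 0) lm(X[, j] ~ 1) else
    lm(X[, j] ~ X[, pa, drop = FALSE])
  as.numeric(logLik(fit))
}
## Log-likelihood of a DAG: sum of node terms (product on the original scale).
## parents[[j]] holds the column indices of the parents of node j.
dagLogLik <- function(X, parents) {
  sum(sapply(seq_along(parents), function(j) nodeLogLik(X, j, parents[[j]])))
}

set.seed(9)
n <- 300
y1 <- rnorm(n); y2 <- y1 + rnorm(n); y3 <- y2 + rnorm(n)
X <- cbind(y1, y2, y3)

empty   <- list(integer(0), integer(0), integer(0))   # no edges
chain   <- list(integer(0), 1, 2)                     # y1 -> y2 -> y3
reverse <- list(2, 3, integer(0))                     # y3 -> y2 -> y1

llEmpty   <- dagLogLik(X, empty)
llChain   <- dagLogLik(X, chain)
llReverse <- dagLogLik(X, reverse)
```

The chain and its reversal give the same likelihood from phenotypes alone, which is why edge direction needs extra information such as QTL, as the earlier "edge direction" slides indicate.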

MCMC steps for QTLnet
  propose new causal network G with simple changes to current network:
    change edge direction
    add or drop edge
  find any new genetic architectures Q
    update phenotypes when parents pa(y) change in new G
  compute likelihood for new network and QTL
    Pr(Y | G, Q)
  accept or reject new network and QTL
    usual Metropolis-Hastings idea

BxH ApoE-/- causal network for transcription factor Pscdbp
  [figure: causal trait]
  work of Elias Chaibub Neto
  Data: Ghazalpour et al. (2006) PLoS Genetics

scaling up to larger networks
  reduce complexity of graphs
    use prior knowledge to constrain valid edges
    restrict number of causal edges into each node
  make task parallel: run on many machines
    pre-compute conditional probabilities
    run multiple parallel Markov chains
  rethink approach
    LASSO, sparse PLS, other optimization methods

graph complexity with node parents
  [figure: two graphs of a node with parents (pa1 alone; pa1, pa2, pa3) and offspring (of1, of2, of3)]

parallel phases for larger projects
  Phase 1: identify parents
  Phase 2: compute BICs
    BIC = LOD - penalty
    all possible parents to all nodes
  Phase 3: store BICs
  Phase 4: run Markov chains
  [figure: flow diagram of the parallel phases]

parallel implementation
  R/qtlnet available at [URL missing in transcript]
  Condor cluster: chtc.cs.wisc.edu
    System Of Automated Runs (SOAR)
    ~2000 cores in pool shared by many scientists
    automated run of new jobs placed in project
  [figure: Phase 2 and Phase 4]

single edge updates
  [figure: trace plot with burnin marked; run count truncated in transcript (",000 runs")]

neighborhood edge reversal
  [figure: select edge; drop edge; identify parents; orphan nodes; reverse edge; find new parents]
  Grzegorczyk M. and Husmeier D. (2008) Machine Learning 71 (2-3), 2

neighborhood for reversals only
  [figure: trace plot with burnin marked; run count truncated in transcript (",000 runs")]

how to use functional information?
  functional grouping from prior studies
    may or may not indicate direction
    gene ontology (GO), KEGG
    knockout (KO) panels
    protein-protein interaction (PPI) database
    transcription factor (TF) database
  methods using only this information
  priors for QTL-driven causal networks
    more weight to local (cis) QTLs?

modeling biological knowledge
  infer graph $G_Y$ from biological knowledge B
    $\Pr(G_Y \mid B, W) = \exp(-W\,|B - G_Y|) / \text{constant}$
    B = prob of edge given TF, PPI, KO database
      derived using previous experiments, papers, etc.
    $G_Y$ = 0-1 matrix for graph with directed edges
  W = inferred weight of biological knowledge
    W = 0: no influence; W large: assumed correct
    $\Pr(W \mid B) = \phi \exp(-\phi W)$ exponential
  Werhli and Husmeier (2007) J Bioinfo Comput Biol

combining eQTL and bio knowledge
  probability for graph G and bio-weights W given phenotypes Y, genotypes Q, bio info B
    $\Pr(G, W \mid Y, Q, B) = c\,\Pr(Y \mid G, Q)\,\Pr(G \mid B, W, Q)\,\Pr(W \mid B)$
  $\Pr(Y \mid G, Q)$ is genetic architecture (QTLs)
    using parent nodes of each trait as covariates
  $\Pr(G \mid B, W, Q) = \Pr(G_Y \mid B, W)\,\Pr(G_{Q \to Y} \mid Q)$
    $\Pr(G_Y \mid B, W)$ relates graph to biological info
    $\Pr(G_{Q \to Y} \mid Q)$ relates genotype to phenotype
  Moon JY, Chaibub Neto E, Deng X, Yandell BS (2011) Growing graphical models to infer causal phenotype networks. In Probabilistic Graphical Models Dedicated to Applications in Genetics. Sinoquet C, Mourad R, eds. (in review)
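The graph prior above can be sketched as an unnormalized log probability. This assumes the energy is the sum of absolute differences between B and the 0-1 graph matrix, in the style of Werhli and Husmeier (2007); the 3-node matrices, the weight values, and the function names are hypothetical.

```r
## Unnormalized log prior of a graph: -W * sum |B_ij - G_ij|.
logPriorGraph <- function(G, B, W) -W * sum(abs(B - G))

## Exponential prior on the weight W with rate phi.
logPriorWeight <- function(W, phi = 1) dexp(W, rate = phi, log = TRUE)

## Hypothetical knowledge matrix: B[i, j] = prior probability of edge i -> j.
B <- matrix(c(0,   0.9, 0.1,
              0.1, 0,   0.8,
              0.1, 0.2, 0), 3, 3, byrow = TRUE)

## A graph that agrees with B (1 -> 2, 2 -> 3) and one with edges reversed.
gAgree <- matrix(c(0, 1, 0,
                   0, 0, 1,
                   0, 0, 0), 3, 3, byrow = TRUE)
gDisagree <- t(gAgree)

lpAgree    <- logPriorGraph(gAgree, B, W = 2)
lpDisagree <- logPriorGraph(gDisagree, B, W = 2)
lpNoWeight <- logPriorGraph(gDisagree, B, W = 0)   # W = 0: knowledge ignored
```

With W = 0 every graph gets the same prior value, and larger W penalizes disagreement with B more strongly, matching the "no influence" and "assumed correct" extremes on the slide.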

encoding biological knowledge B: transcription factors, DNA binding (causation)
  $B_{ij} = \dfrac{\lambda e^{-\lambda p}}{\lambda e^{-\lambda p} + (1 - e^{-\lambda})}$
  p = p-value for TF binding of i → j
  truncated exponential ($\lambda$) when TF i → j
  uniform if no detection relationship
  Bernard, Hartemink (2005) Pac Symp Biocomp

encoding biological knowledge B: protein-protein interaction (association)
  $B_{ij} = B_{ji} = \dfrac{\text{posterior odds}}{1 + \text{posterior odds}}$
  post odds = prior odds * LR
  use positive and negative gold standards
  Jansen et al. (2003) Science
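Both encodings reduce to short functions. The sketch below assumes the TF formula as reconstructed above (equal prior weight on edge and no edge) and uses made-up values for `lambda`, the p-values, the prior odds, and the likelihood ratio; the function names are hypothetical.

```r
## TF binding: edge probability from a binding p-value.
## Truncated exponential density if the edge exists, uniform if it does not.
tfEdgeProb <- function(p, lambda) {
  num <- lambda * exp(-lambda * p)
  num / (num + (1 - exp(-lambda)))
}

## PPI: posterior odds = prior odds * likelihood ratio, then odds -> probability.
ppiEdgeProb <- function(priorOdds, lr) {
  postOdds <- priorOdds * lr
  postOdds / (1 + postOdds)
}

## Small p-values give edge probabilities near 1; large ones push toward 0.
tfStrong <- tfEdgeProb(0.001, lambda = 10)
tfWeak   <- tfEdgeProb(0.5,   lambda = 10)

## Even prior odds and an uninformative likelihood ratio give probability 0.5.
ppiNeutral <- ppiEdgeProb(priorOdds = 1, lr = 1)
ppiStrong  <- ppiEdgeProb(priorOdds = 1 / 600, lr = 3000)
```

The TF entries are directional (i → j), while the PPI entries are symmetric, so in a full B matrix `ppiEdgeProb` would fill both B[i, j] and B[j, i].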

encoding biological knowledge B: gene ontology (association)
  $B_{ij} = B_{ji} = c\,\mathrm{mean}(\mathrm{sim}(GO_i, GO_j))$
  GO = molecular function, processes of gene
  sim = maximum information content across common parents of pair of genes
  Lord et al. (2003) Bioinformatics

MCMC with pathway information
  sample new network G* from proposal R(G* | G)
    add or drop edges; switch causal direction
  sample QTLs Q* from proposal R(Q* | Q, Y)
    e.g. Bayesian QTL mapping given pa(y)
  accept new network (G*, Q*) with probability
    $A = \min\left(1, \dfrac{f(G, Q \mid G^*, Q^*)}{f(G^*, Q^* \mid G, Q)}\right)$
    $f(G, Q \mid G^*, Q^*) = \dfrac{\Pr(Y \mid G^*, Q^*)\,\Pr(G^* \mid B, W, Q^*)}{R(G^* \mid G)\,R(Q^* \mid Q, Y)}$
  sample W* from proposal R(W* | W)
  accept new weight W* with probability [expression missing in transcript]
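The accept/reject step can be sketched generically on the log scale. This is the usual Metropolis-Hastings rule, not the R/qtlnet code; the toy target (an exponential density for a positive weight), the random-walk proposal, and the names `mhAccept` and `logTarget` are assumptions.

```r
## Metropolis-Hastings acceptance on the log scale.
## logTargetNew / logTargetOld: log target density at proposed / current state.
## logPropNew: log R(new | old); logPropOld: log R(old | new).
mhAccept <- function(logTargetNew, logTargetOld,
                     logPropNew = 0, logPropOld = 0) {
  logA <- (logTargetNew - logPropNew) - (logTargetOld - logPropOld)
  log(runif(1)) < logA          # accept with probability min(1, exp(logA))
}

## Toy chain: sample a positive weight w from an exponential target.
set.seed(8)
logTarget <- function(w) dexp(w, rate = 1, log = TRUE)   # -Inf when w < 0
w <- 1
draws <- numeric(5000)
for (t in seq_along(draws)) {
  wNew <- w + rnorm(1, sd = 0.5)                # symmetric random-walk proposal
  if (mhAccept(logTarget(wNew), logTarget(w))) w <- wNew
  draws[t] <- w
}
```

With a symmetric proposal the two proposal terms cancel, so the defaults of 0 suffice; for network moves such as edge reversal the proposal is generally not symmetric and both terms would need to be supplied.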

ROC curve simulation
  [figure: True Positive Rate vs. False Positive Rate; open = QTLnet, closed = phenotypes only; curves: QTLnet prior, QTLnet WH prior, Expression]

integrated ROC curve
  [figure: Area Under ROC vs. δ = accuracy of B; panels "x2: genetics pathways"; curves: QTLnet prior, QTLnet WH prior, Expression]
  probability classifier ranks true > false edges

weight on biological knowledge
  [figure: density of W in three panels: incorrect, non-informative, correct (panel labels as transcribed: δ: 0.1, δ: 0, δ: 0.1)]

yeast data: partial success
  26 genes
  36 inferred edges
  dashed: indirect (2)
  starred: direct (3)
  missed (39)
  reversed (0)
  [figure: network of ALG7, NDD1, STB1, HTA1, CDC5, SWI4, SWI5, EGT2, CLB2, FKH2, FAR1, MCM1, ACE2, CLN1, SWI6, PCL2, MBP1, CDC6, CTS1, SIC1, CLN2, CLB5, CDC20, ASH1, CDC21, FKH1]
  Data: Brem, Kruglyak (2005) PNAS

phenotypic buffering of molecular QTL
  Fu et al. Jansen (2009 Nature Genetics)
  [figure]

limits of causal inference
  Computing costs already discussed
  Noisy data leads to false positive causal calls
    Faithfulness assumption violated
    Depends on sample size and omic technology
    And on graph complexity (d = maximal path length i → j)
  Profound limits
  Uhler C, Raskutti G, Buhlmann P, Yu B (2012 in prep) Geometry of faithfulness assumption in causal inference.
  Yang Li, Bruno M. Tesson, Gary A. Churchill, Ritsert C. Jansen (2010) Critical reasoning on causal inference in genome-wide linkage and association studies. Trends in Genetics 26:

sizes for reliable causal inference
  genome wide linkage & association
  Li, Tesson, Churchill, Jansen (2010) Trends in Genetics
  [figure]

limits of causal inference
  unfaithful: false positive edges
  $\lambda = \min \mathrm{cor}(Y_i, Y_j)$
  $\lambda = c\sqrt{dp/n}$
  d = max degree
  p = # nodes
  n = sample size
  Uhler, Raskutti, Buhlmann, Yu (2012 in prep)

Thanks!
  Grant support
    NIH/NIDDK 58037, NIH/NIGMS 74244
    NCI/ICBP U54-CA
    NIH/R01MH
  Collaborators on papers and ideas
    Alan Attie & Mark Keller, Biochemistry
    Karl Broman, Aimee Broman, Christina Kendziorski

Computational Infrastructure for Systems Genetics Analysis
Brian Yandell, UW-Madison
high-throughput analysis of systems data
enable biologists & analysts to share tools
eQTL Tools Seattle SISG: Yandell

byandell@wisc.edu
UW-Madison: Alan Attie, Christina Kendziorski, Karl Broman, Mark Keller, Andrew Broman, Aimee Broman, YounJeong Choi, Elias Chaibub Neto, Jee Young Moon, John Dawson, Ping Wang
NIH Grants DK58037, DK66369, GM74244, GM69430, EY18869
Jackson Labs (HTDAS): Gary Churchill, Ricardo Verdugo, Keith Sheppard
UC-Denver (PhenoGen): Boris Tabakoff, Cheryl Hornbaker, Laura Saba, Paula Hoffman
Labkey Software: Mark Igra
U Groningen (XGAP): Ritsert Jansen, Morris Swertz, Pjotr Prins, Danny Arends
Broad Institute: Jill Mesirov, Michael Reich

experimental context
B6 x BTBR obese mouse cross
  model for diabetes and obesity
  500+ mice from intercross (F2)
  collaboration with Rosetta/Merck
genotypes
  5K SNP Affymetrix mouse chip
  care in curating genotypes! (map version, errors, ...)
phenotypes
  clinical phenotypes (>100 / mouse)
  gene expression traits (>40,000 / mouse / tissue)
  other molecular phenotypes

how does one filter traits?
want to reduce to manageable set
  10/100/1000: depends on needs/tools
  How many can the biologist handle?
how can we create such sets?
  data-driven procedures
    correlation-based modules
      Zhang & Horvath 2005 SAGMB, Keller et al Genome Res, Li et al Hum Mol Gen
    mapping-based focus on genome region
  function-driven selection with database tools
    GO, KEGG, etc
    Incomplete knowledge leads to bias
  random sample
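As a rough illustration of the correlation-based route to a manageable trait set, traits can be grouped by hierarchical clustering on a distance of 1 minus absolute correlation. This is a minimal sketch on simulated data, not the weighted network method of Zhang & Horvath; the trait matrix, the two latent factors, and the choice of two modules are all assumptions made for the example.

```r
set.seed(1)
n  <- 100                      # hypothetical number of mice
z1 <- rnorm(n)                 # latent factor driving module 1
z2 <- rnorm(n)                 # latent factor driving module 2

# ten simulated traits: five follow z1, five follow z2
traits <- cbind(
  sapply(1:5, function(i) z1 + rnorm(n, sd = 0.3)),
  sapply(1:5, function(i) z2 + rnorm(n, sd = 0.3))
)
colnames(traits) <- paste0("trait", 1:10)

# distance between traits: 1 - |correlation|
d <- as.dist(1 - abs(cor(traits)))

# average-linkage clustering, cut into two modules
tree    <- hclust(d, method = "average")
modules <- cutree(tree, k = 2)
print(split(names(modules), modules))
```

In practice the number of modules is not known in advance, so the cut height or module count is itself a choice that depends on needs and tools.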

why build Web eQTL tools?
common storage/maintenance of data
  one well-curated copy
  central repository
  reduce errors, ensure analysis on same data
automate commonly used methods
  biologist gets immediate feedback
  statistician can focus on new methods
  codify standard choices

how does one build tools?
no one solution for all situations
use existing tools wherever possible
  new tools take time and care to build!
  downloaded databases must be updated regularly
human component is key
  need informatics expertise
  need continual dialog with biologists
build bridges (interfaces) between tools
  Web interface uses PHP
  commands are created dynamically for R
continually rethink & redesign organization

perspectives for building a community where disease data and models are shared
Benefits of wider access to datasets and models:
  1 catalyze new insights on disease & methods
  2 enable deeper comparison of methods & results
Lessons Learned:
  1 need quick feedback between biologists & analysts
  2 involve biologists early in development
  3 repeated use of pipelines leads to documented learning from experience, increased rigor in methods
Challenges Ahead:
  1 stitching together components as coherent system
  2 ramping up to ever larger molecular datasets

Swertz & Jansen (2007)

collaborative portal (LabKey)
systems genetics portal (PhenoGen)
[Figure: workflow cycle with labels: iterate many times; view results (R graphics, GenomeSpace tools); get data (GEO, Sage); run pipeline (CLIO, XGAP, HTDAS)]

analysis pipeline acts on objects (extends concept of GenePattern)
[Figure: pipeline diagram with labels: input, pipeline, output, settings, checks]

pipeline is composed of many steps
[Figure: pipeline diagram with nodes I, A, B, C, D, E, O; labels: combine datasets, compare methods, alternative path]

causal model selection choices in context of larger, unknown network
focal trait, target trait: causal
focal trait, target trait: reactive
focal trait, target trait: correlated
focal trait, target trait: uncorrelated
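The four choices for a focal and a target trait can be illustrated by scoring each one as a pair of regressions and adding their BIC values. This is a simplified sketch on simulated data and is not the causal model selection test procedure used in the course software. The simulated QTL `q`, the traits `y1` (focal) and `y2` (target), and the way each choice is encoded as regressions are assumptions made for illustration.

```r
set.seed(42)
n  <- 300
# hypothetical F2 genotype coded additively as 0, 1, 2
q  <- sample(0:2, n, replace = TRUE, prob = c(0.25, 0.5, 0.25))
y1 <- q + rnorm(n)    # focal trait driven by the QTL
y2 <- y1 + rnorm(n)   # target trait driven by the focal trait (truly causal)

# BIC of a two-equation model is the sum of the BICs of its regressions
joint_bic <- function(f1, f2) BIC(lm(f1)) + BIC(lm(f2))

bic <- c(
  causal       = joint_bic(y1 ~ q, y2 ~ y1),      # q -> y1 -> y2
  reactive     = joint_bic(y2 ~ q, y1 ~ y2),      # q -> y2 -> y1
  correlated   = joint_bic(y1 ~ q, y2 ~ q + y1),  # both mapped to q, still dependent
  uncorrelated = joint_bic(y1 ~ q, y2 ~ q)        # both mapped to q, independent given q
)
print(round(bic, 1))
```

Smaller BIC is better. The causal and correlated encodings are usually close, since the latter only adds one parameter, which is one reason formal tests are needed rather than a simple ranking.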

BxH ApoE-/- chr 2: causal architecture
hotspot
12 causal calls

BxH ApoE-/- causal network for transcription factor Pscdbp
causal trait
work of Elias Chaibub Neto

collaborative portal (LabKey)
systems genetics portal (PhenoGen)
[Figure: workflow cycle with labels: update periodically; iterate many times; view results (R graphics, GenomeSpace tools); get data (GEO, Sage); develop analysis methods & algorithms; run pipeline (CLIO, XGAP, HTDAS)]

[Figure: pipeline diagram with labels: input, pipeline, output, preserve history, settings, checks, raw package R&D code]

Model/View/Controller (MVC) software architecture
isolate domain logic from input and presentation
permit independent development, testing, maintenance
[Figure: MVC diagram. Controller: input/response; View: render for interaction; Model: domain-specific logic; arrows labeled system actions, user changes]


automated R script

    # load the project package for the B6 x BTBR cross
    library('b6btbr07')

    # run the scan for islet traits on chromosome 17, both sexes
    out <- multtrait(cross.name = 'b6btbr07',
                     filename = 'scanone_ csv',
                     category = 'islet',
                     chr = c(17),
                     threshold.level = 0.05,
                     sex = 'both')

    # write the text summary to a file
    sink('scanone_ txt')
    print(summary(out))
    sink()

    # write the plots to numbered bitmap files
    bitmap('scanone_ %03d.bmp',
           height = 12, width = 16, res = 72, pointsize = 20)
    plot(out, use.cm = TRUE)
    dev.off()



Binary trait mapping in experimental crosses with selective genotyping Genetics: Published Articles Ahead of Print, published on May 4, 2009 as 10.1534/genetics.108.098913 Binary trait mapping in experimental crosses with selective genotyping Ani Manichaikul,1 and Karl W.

More information

Bayesian model selection: methodology, computation and applications

Bayesian model selection: methodology, computation and applications Bayesian model selection: methodology, computation and applications David Nott Department of Statistics and Applied Probability National University of Singapore Statistical Genomics Summer School Program

More information

Bayesian Inference of Interactions and Associations

Bayesian Inference of Interactions and Associations Bayesian Inference of Interactions and Associations Jun Liu Department of Statistics Harvard University http://www.fas.harvard.edu/~junliu Based on collaborations with Yu Zhang, Jing Zhang, Yuan Yuan,

More information

Consistent high-dimensional Bayesian variable selection via penalized credible regions

Consistent high-dimensional Bayesian variable selection via penalized credible regions Consistent high-dimensional Bayesian variable selection via penalized credible regions Howard Bondell bondell@stat.ncsu.edu Joint work with Brian Reich Howard Bondell p. 1 Outline High-Dimensional Variable

More information

Module 4: Bayesian Methods Lecture 9 A: Default prior selection. Outline

Module 4: Bayesian Methods Lecture 9 A: Default prior selection. Outline Module 4: Bayesian Methods Lecture 9 A: Default prior selection Peter Ho Departments of Statistics and Biostatistics University of Washington Outline Je reys prior Unit information priors Empirical Bayes

More information

arxiv: v4 [stat.me] 2 May 2015

arxiv: v4 [stat.me] 2 May 2015 Submitted to the Annals of Statistics arxiv: arxiv:0000.0000 BAYESIAN NONPARAMETRIC TESTS VIA SLICED INVERSE MODELING By Bo Jiang, Chao Ye and Jun S. Liu Harvard University, Tsinghua University and Harvard

More information

Affected Sibling Pairs. Biostatistics 666

Affected Sibling Pairs. Biostatistics 666 Affected Sibling airs Biostatistics 666 Today Discussion of linkage analysis using affected sibling pairs Our exploration will include several components we have seen before: A simple disease model IBD

More information

arxiv: v1 [stat.ap] 7 Oct 2010

arxiv: v1 [stat.ap] 7 Oct 2010 The Annals of Applied Statistics 2010, Vol. 4, No. 1, 320 339 DOI: 10.1214/09-AOAS288 c Institute of Mathematical Statistics, 2010 arxiv:1010.1402v1 [stat.ap] 7 Oct 2010 CAUSAL GRAPHICAL MODELS IN SYSTEMS

More information

Hierarchical Generalized Linear Models for Multiple QTL Mapping

Hierarchical Generalized Linear Models for Multiple QTL Mapping Genetics: Published Articles Ahead of Print, published on January 1, 009 as 10.1534/genetics.108.099556 Hierarchical Generalized Linear Models for Multiple QTL Mapping Nengun Yi 1,* and Samprit Baneree

More information

25 : Graphical induced structured input/output models

25 : Graphical induced structured input/output models 10-708: Probabilistic Graphical Models 10-708, Spring 2016 25 : Graphical induced structured input/output models Lecturer: Eric P. Xing Scribes: Raied Aljadaany, Shi Zong, Chenchen Zhu Disclaimer: A large

More information

MODEL-FREE LINKAGE AND ASSOCIATION MAPPING OF COMPLEX TRAITS USING QUANTITATIVE ENDOPHENOTYPES

MODEL-FREE LINKAGE AND ASSOCIATION MAPPING OF COMPLEX TRAITS USING QUANTITATIVE ENDOPHENOTYPES MODEL-FREE LINKAGE AND ASSOCIATION MAPPING OF COMPLEX TRAITS USING QUANTITATIVE ENDOPHENOTYPES Saurabh Ghosh Human Genetics Unit Indian Statistical Institute, Kolkata Most common diseases are caused by

More information

. Also, in this case, p i = N1 ) T, (2) where. I γ C N(N 2 2 F + N1 2 Q)

. Also, in this case, p i = N1 ) T, (2) where. I γ C N(N 2 2 F + N1 2 Q) Supplementary information S7 Testing for association at imputed SPs puted SPs Score tests A Score Test needs calculations of the observed data score and information matrix only under the null hypothesis,

More information

MOST complex traits are influenced by interacting

MOST complex traits are influenced by interacting Copyright Ó 2009 by the Genetics Society of America DOI: 10.1534/genetics.108.099556 Hierarchical Generalized Linear Models for Multiple Quantitative Trait Locus Mapping Nengjun Yi*,1 and Samprit Banerjee

More information

Enhancing eqtl Analysis Techniques with Special Attention to the Transcript Dependency Structure

Enhancing eqtl Analysis Techniques with Special Attention to the Transcript Dependency Structure Enhancing eqtl Analysis Techniques with Special Attention to the Transcript Dependency Structure by John C. Schwarz A dissertation submitted to the faculty of the University of North Carolina at Chapel

More information

Teachers Guide. Overview

Teachers Guide. Overview Teachers Guide Overview BioLogica is multilevel courseware for genetics. All the levels are linked so that changes in one level are reflected in all the other levels. The BioLogica activities guide learners

More information

Biol 206/306 Advanced Biostatistics Lab 12 Bayesian Inference Fall 2016

Biol 206/306 Advanced Biostatistics Lab 12 Bayesian Inference Fall 2016 Biol 206/306 Advanced Biostatistics Lab 12 Bayesian Inference Fall 2016 By Philip J. Bergmann 0. Laboratory Objectives 1. Learn what Bayes Theorem and Bayesian Inference are 2. Reinforce the properties

More information