4.1 basic idea of interval mapping

Size: px
Start display at page:

Download "4.1 basic idea of interval mapping"

Transcription

1 4 Interval Mappng for a Sngle TL basc dea of nterval mappng nterval mappng by maxmum lkelhood maxmum lkelhood usng EM MCMC Bayesan nterval mappng "natural" Bayesan prors multple mputaton MCMC bootstrapped varance estmates advantages & shortcomngs of IM Haley-Knott regresson approxmaton ch Broman Churchll Yandell Zeng 4. basc dea of nterval mappng study propertes of lkelhood at each possble TL treatng TL as mssng data assumng only a sngle TL (for now) recall lkelhood as mxture over unknown TL lkelhood product of sum of products complcated to evaluate--requres teraton L( θ λ Y X ) pr( Y X θ λ) prod prod pr( Y sum X θ λ) pr( X λ)pr( Y θ ) ch Broman Churchll Yandell Zeng

2 uncertanty n TL genotype how to mprove guess on wth data parameters? pror recombnaton: pr( X λ ) posteror recombnaton: pr( Y X θ λ ) man phlosophes for assessng lkelhood maxmum lkelhood: study peak(s) Bayesan analyss: study whole shape mplementaton methodologes Expectaton-Maxmzaton (EM) Markov chan Monte Carlo (MCMC) multple mputaton genetc algorthms GEE ch Broman Churchll Yandell Zeng 3 posteror on TL genotypes full condtonal of gven data parameters proportonal to pror pr( X λ ) weght toward that agrees wth flankng markers proportonal to lkelhood pr(y θ) weght toward so that group mean G Y phenotype and flankng markers may conflct posteror recombnaton balances these two weghts pr( X λ)pr( Y θ ) pr( Y X θ λ) pr( Y X θ λ) ch Broman Churchll Yandell Zeng 4

3 how does phenotype Y affect? D4Mt4 D4Mt4 bp what are probabltes for genotype between markers? recombnants AA:AB all : f gnore Y and f we use Y? AA AA AB AA AA AB AB AB Genotype ch Broman Churchll Yandell Zeng 5 maxmum lkelhood (ML) dea pck TL locus λ (usually scan whole genome) fnd ML estmates of gene acton θ gven λ maxmum lkelhood at peak of lkelhood slope (dervatve wth respect to θ) s zero sometmes maxmum s at a boundary (non-zero slope) slope s weghted average usng posterors for cannot wrte estmate n "closed form" need to know θ to estmate t! terate toward the maxmum n some clever way dl( θ λ Y X ) sum dθ d log(pr( Y θ )) pr( Y X θ λ) dθ ch Broman Churchll Yandell Zeng 6

4 Bayesan model posteror augment data (YX) wth unknowns study unknowns (θλ) gven data (YX) ~ pr( Y X θ λ ) no longer need weghted average over nstead we average over to study parameters pr(θλ YX) sum pr(θλ YX) study propertes of posteror need to specfy prors for (θλ) denomnator s very dffcult to compute n practce drawng samples from posteror n some clever way pr( θ λ Y X ) pr( X λ)pr( Y θ )pr( λ X )pr( θ ) pr( Y X ) ch Broman Churchll Yandell Zeng 7 4. nterval mappng by ML search whole genome for putatve TL "profle" lkelhood across all possble λ fnd ML estmate of θ gven λ ML estmate of (θλ) at maxmum over genome L ( ˆ θ Y ) prod 0 0 L( ˆ θ λ Y X ) prod sum LOD( λ) log f ( Y ˆ µ s L( ˆ θ λ Y X ) ˆ pr( X λ) f ( Y Gˆ ˆ σ 0 L0 ( θ0 Y ) ch Broman Churchll Yandell Zeng 8 ) pool )

5 LOD for hyper dataset X chromosomes hghest LOD on 4 other TL? lod X ch Broman Churchll Yandell Zeng 9 LOD(λ) on chr 4 of hyper 8 6 EM HK IMP EM "exact" Haley-Knott regresson sngle mputaton lod 4 all agree at the peak and mostly at markers note marker spacng Map poston (cm) ch Broman Churchll Yandell Zeng 0

6 EM method for nterval mappng fx a possble TL λ terate between expectaton & maxmzaton lkelhood ncreases wth each teraton stop teratng when the change s "neglgble" ntal values P pr( X λ ) recombnaton model n the absence of data or use Haley-Knott regresson estmates of θ ch Broman Churchll Yandell Zeng EM method for nterval mappng E-step: estmate posteror recombnaton P pr( Y X θ λ ) estmate for every ndvdual genotype depends on effects θ M-steps: maxmze lkelhood for θ may be many parameters techncal pont: cauton on parallel updates solve system of equaton: dervatves set to zero depends on P 0 sum P d log(pr( Y θ )) dθ ch Broman Churchll Yandell Zeng

7 4.. M-step for normal phenotype Y G + e e ~ N(0σ ) pr(y θ ) f(y G σ ) see notes n book for dervatve detals E-step estmates: Gˆ ˆ σ sum sum Y P P ( Y Gˆ ) P / n /sum ch Broman Churchll Yandell Zeng ML va MCMC basc dea of smulated annealng start wth non-nformatve prors on (θλ) sample from posteror (somehow ) gradually shrnk prors toward ML estmate slght dffculty need to know (θλ) to sample from posteror teraton leads to Markov chan pont of ths secton MCMC does not mply a Bayesan perspectve! ch Broman Churchll Yandell Zeng 4

8 4.3 Bayesan nterval mappng sample mssng genotypes decouple effects θ from TL λ but depends on (θλ) and vce versa also need to specfy prors pr( X λ)pr( λ X ) λ ~ pr( X ) ~ pr( Y X θ λ) pr( Y θ )pr( θ ) θ ~ pr( Y ) ch Broman Churchll Yandell Zeng Bayesan prors for TL locus λ may be unform over genome pr(λ X ) / length of genome mssng genotypes pr( X λ ) recombnaton model s formally a pror effects θ (Gσ ) G (G G q G qq ) conjugate prors for normal phenotype G ~ N(µ κσ ) σ ~ nverse-χ (ντ ) or ντ / σ ~ χ ch Broman Churchll Yandell Zeng 6

9 effect of pror varance on posteror κ0.5.0 κ κ0.5 κ normal pror posteror for n posteror for n 5 true mean (sold black) (dotted blue) (dashed red) (green arrow) ch Broman Churchll Yandell Zeng 7 detals of phenotype prors prors depend on "hyper-parameters" G ~ N(µ κσ ) center around phenotype grand mean κσ σ G genetc varance κ σ G /σ h / ( h ) h σ G /(σ G +σ ) hertablty σ ~ nverse-χ (ντ ) or ντ / σ ~ χ τ s total sample varance ν pror degrees of freedom small nteger ch Broman Churchll Yandell Zeng 8

10 ch Broman Churchll Yandell Zeng 9 Y G + E posteror for sngle ndvdual envron E ~ N(0σ ) σ known lkelhood pr(y Gσ ) N(Y Gσ ) pror pr(g σ µκ) N(G µκσ ) posteror N(G µ+b (Y-µ) B σ ) Y G + E posteror for sample of n ndvduals shrnkage weghts B n go to Bayes for normal data sum wth ) ( N ) pr( + + n n B n Y Y n B Y B G Y G n n n κ κ σ µ µ κ µ σ ch Broman Churchll Yandell Zeng 0 Y G + E genetc qqq envron E ~ N(0σ ) σ known parameters θ (Gσ ) lkelhood pr(y Gσ ) N(Y G σ ) pror pr(g σ µκ) N(G µκσ ) posteror: posteror by T genetc value sum } count{ ) ( N ) pr( } : { + + n n B n Y Y n n B Y B G Y G κ κ σ µ µ κ µ σ

11 Emprcal Bayes: choosng hyper-parameters How do we choose hyper-parameters µκ? Emprcal Bayes: margnalze over pror estmate µκ from margnal posteror lkelhood pr(y Gσ ) N(Y G( )σ ) pror pr(g σ µκ) N(G µκσ ) margnal pr(y σ µκ) N(Y µ (κ +)σ ) estmates EB posteror ˆ µ Y s sum ( Y Y ) / n ˆ σ s /( κ + ) s /( h ) pr( G Y ) N G Y + B ( Y Y ) B ˆ σ n ch Broman Churchll Yandell Zeng What f varance σ s unknown? recall that sample varance s proportonal to ch-square pr(s σ ) χ (ns /σ n) or equvalently ns /σ σ ~ χ n conjugate pror s nverse ch-square pr(σ ντ ) nv-χ (σ ντ ) or equvalently ντ /σ ντ ~ χ ν emprcal choce: τ s /3 ν6 E(σ ντ ) s / Var(σ ντ ) s 4 /4 posteror gven data pr(σ Yντ ) nv-χ (σ ν+n (ντ +ns )/(ν+n)) weghted average of pror and data ch Broman Churchll Yandell Zeng

12 jont effects posteror detals Y G( ) + E genetc qqq envron E ~ N(0σ ) parameters θ (Gσ ) lkelhood pr(y Gσ ) N(Y G( )σ ) pror pr(g σ µκ) N(G µ σ /κ) posteror: pr(σ ντ ) nv-χ (σ ντ ) pr( G Y σ µ κ ) N G Y + B ( Y ντ + ns pr( σ Y G ν τ ) nv - χ n σ ν + ν + n n ( ) n σ Y ) B n wth B s sum Y G( ) / κ + n ch Broman Churchll Yandell Zeng Bayesan multple mputaton basc dea mpute multple copes of mssng genotypes sample ~pr( X λ) weghted to appear as draws from posteror average out gene effects θ study posteror for putatve TL λ most effectve for multple TL use sngle TL to ntroduce dea consder all loc as possble TL sample on grd Λ of `pseudomarkers' (every cm) smlar to nterval map scan of whole genome ch Broman Churchll Yandell Zeng 4

13 mportance samplng dea draw samples from one dstrbuton 3 n ~ f() weght them approprately by ω() sample summares from dstrbuton g() g() f()ω() / constant mean for f sum / n mean for g sum ω( ) / sum ω( ) ch Broman Churchll Yandell Zeng 5 example: mean copes of genotype qq q sum copes 0 true g draw f /3 /3 /3.0 weght ω f ω /3 /3 /3 4/3 mportance samplng g f 0.75ω sample mean f /00. mean g /30.08 ch Broman Churchll Yandell Zeng 6

14 what are approprate weghts? deally draw genotype from posteror want sample ~ g() sum θ pr( YXθλ)pr(θ ) but have sample ~ f() pr( Xλ) approprate weghts ω(λ YX) pr(λ X) sum θ pr(y θ )pr(θ ) estmate margnal posteror for TL λ draw N samples from pror at each TL λ 3 N ~ pr( Xλ) pr(λ YX) sum ω(λ YX) pr( Xλ) / constant sum j ω( j λ YX) / constant constant s summed over all λ but not actually needed ch Broman Churchll Yandell Zeng 7 relatng weghts to posteror posteror s smply averaged over θ weghts comprse terms except pr( Xλ) estmatng weghts: see Sen & Churchll pr( λ Y X ) sum θ pr( θ λ Y X ) pr( X λ)pr( λ X ) sumθ pr( Y θ )pr( θ ) pr( Y X ) pr( X λ) ω( λ Y X ) / pr( Y X ) ch Broman Churchll Yandell Zeng 8

15 estmatng effects va mputaton multple mputaton averages over effects dffcult to study posteror of effects drectly can estmate usual summares E( θ Y X ) sum sum sum λ λ j E( θ Y ) pr( X λ) ω( λ Y X ) / pr( Y X ) E( θ Y E( θ Y ) pr( Y X ) j ) ω( j λ Y X ) / constant ch Broman Churchll Yandell Zeng Bayesan MCMC Markov chan Monte Carlo Monte Carlo samples along a Markov chan What s a Markov chan? What s MCMC? Samplng from full condtonals Gbbs sampler Metropols-Hastngs ch Broman Churchll Yandell Zeng 30

16 What s a Markov chan? future gven present s ndependent of past update chan based on current value can make chan arbtrarly complcated chan converges to stable pattern π() we wsh to study pr() p /( p + q) p -p 0 -q q ch Broman Churchll Yandell Zeng 3 Markov chan dea p pr() p /( p + q) -p 0 -q q state tme ch Broman Churchll Yandell Zeng 3

17 Markov chan Monte Carlo can study arbtrarly complex models need only specfy how parameters affect each other can reduce to specfyng full condtonals construct Markov chan wth rght model jont posteror of unknowns as lmtng stable dstrbuton update unknowns gven data and all other unknowns sample from full condtonals cycle at random through all parameters next step depends only on current values nce Markov chans have nce propertes sample summares make sense consder almost as random sample from dstrbuton ergodc theorem and all that stuff ch Broman Churchll Yandell Zeng 33 Markov chan Monte Carlo dea have posteror pr(θ Y) want to draw samples propose θ ~ pr(θ Y) (deal: Gbbs sample) propose new θ nearby accept f more probable toss con f less probable based on relatve heghts (Metropols-Hastngs) pr(θ Y) θ ch Broman Churchll Yandell Zeng 34

18 mcmc sequence MCMC realzaton pr(θ Y) θ θ added twst: occasonally propose from whole doman ch Broman Churchll Yandell Zeng 35 margnal posterors jont posteror pr(λθ YX) pr(θ)pr(λ)pr( Xλ)pr(Y θ) /constant genetc effects observed pr(θ YX) sum pr(θ Y) pr( YX) X Y TL locus mssng pr(λ YX) sum pr(λ X) pr( YX) unknown TL genotypes more complcated λ θ pr( YX) sum λθ pr( YXλθ ) pr(λθ YX) mpossble to separate λ and θ n sum ch Broman Churchll Yandell Zeng 36

19 Why not Ordnary Monte Carlo? ndependent samples of jont dstrbuton channg (or peelng) of effects pr(θ Y)pr(G Yσ )pr(σ Y) possble analytcally here gven genotypes Monte Carlo: draw N samples from posteror sample varance σ sample genetc values G gven varance σ but we know markers X not genotypes! would have messy average over possble pr(θ YX) sum pr(θ Y) pr( YX) ch Broman Churchll Yandell Zeng 37 MCMC Idea for TLs construct Markov chan around posteror want posteror as stable dstrbuton of Markov chan n practce the chan tends toward stable dstrbuton ntal values may have low posteror probablty burn-n perod to get chan mxng well update components from full condtonals update effects θ gven genotypes & trats update locus λ gven genotypes & marker map update genotypes gven trats marker map locus & effects ( λ θ) ( λ θ) ~ pr( λ θ Y X) ( ) λ θ L λ θ N ( ) ch Broman Churchll Yandell Zeng 38

20 sample from full condtonals hard to sample from jont posteror update each unknown gven all others examne posteror: keep terms wth unknown normalzng denomnator make a dstrbuton pr( X λ)pr( λ X ) λ ~ pr( X ) ~ pr( Y X θ λ) pr( Y θ )pr( θ ) θ ~ pr( Y ) ch Broman Churchll Yandell Zeng 39 sample from full condtonals for model wth m TL hard to sample from jont posteror pr(λθ YX) pr(θ)pr(λ)pr( Xλ)pr(Y θ) /constant easy to sample parameters from full condtonals full condtonal for genetc effects pr(θ YXλ) pr(θ Y) pr(θ) pr(y θ) /constant full condtonal for TL locus pr(λ YXθ) pr(λ X) pr(λ) pr( Xλ) /constant full condtonal for TL genotypes pr( YXλθ ) pr( Xλ) pr(y θ) /constant observed X Y mssng unknown λ θ ch Broman Churchll Yandell Zeng 40

21 Gbbs sampler dea want to study two correlated normals could sample drectly from bvarate normal Gbbs sampler: sample each from ts full condtonal pck order of samplng at random repeat N tmes θ θ θ θ µ ρ ~ N θ θ µ ρ ~ N µ ρ µ ρ ~ N µ ρ ( µ + ρ( θ µ ) ρ ) ( µ + ρ( θ µ ) ρ ) ch Broman Churchll Yandell Zeng 4 Gbbs sampler samples: ρ 0.6 N 50 samples N 00 samples Gbbs: mean Markov chan ndex Gbbs: mean Gbbs: mean Gbbs: mean Gbbs: mean Gbbs: mean Markov chan ndex Gbbs: mean Gbbs: mean Gbbs: mean Gbbs: mean Markov chan ndex Gbbs: mean Markov chan ndex Gbbs: mean ch Broman Churchll Yandell Zeng 4

22 Gbbs Sampler: effects & genotypes for gven locus λ can sample effects θ and genotypes effects parameter vector θ (Gσ ) wth G(G qq G q G ) mssng genotype vector ( n ) Gbbs sampler: update one at a tme va full condtonals randomly select order of unknowns update each gven current values of all others locus λ and data (YX) sample varance σ gven Y and genetc values G sample genotype gven markers X and locus λ can do block updates f more effcent sample all genetc values G gven Y and varance σ ch Broman Churchll Yandell Zeng 43 phenotype model: alternate form genetc value G n cell means form easy but often useful to model effects drectly sort out addtve and domnance effects useful for reduced models wth multple TL TL man effects and nteractons (parwse 3-way etc.) we only consder addtve effects here G qq µ a G q µg µ + a recodng for regresson model for genotype qq 0 for genotype q for genotype G( ) µ + a ch Broman Churchll Yandell Zeng 44

23 ch Broman Churchll Yandell Zeng 45 MCMC run of mean & addtve MCMC run/ mean frequency MCMC run/ addtve frequency ch Broman Churchll Yandell Zeng 46 MCMC run for varance MCMC run varance frequency

24 mssng marker data sample mssng marker data a la T genotypes full condtonal for mssng markers depends on flankng markers possble flankng TL can explctly decompose by ndvdual bnomal (or trnomal) probablty pr( X k aaaa or AA Y X θ λ) pr( X X k k X λ) ch Broman Churchll Yandell Zeng 47 Metropols-Hastngs dea want to study dstrbuton f(θ) take Monte Carlo samples unless too complcated Metropols-Hastngs samples: current sample value θ propose new value θ * from some dstrbuton g(θθ * ) Gbbs sampler: g(θθ * ) f(θ * ) accept new value wth prob A Gbbs sampler: A A * * f ( θ ) g( θ θ ) mn * f ( θ ) g( θ θ ) ch Broman Churchll Yandell Zeng f(θ) g(θ θ * )

25 Metropols-Hastngs samples mcmc sequence N 00 samples N 000 samples narrow g wde g narrow g wde g mcmc sequence mcmc sequence mcmc sequence pr(θ Y) θ pr(θ Y) θ pr(θ Y) θ pr(θ Y) θ θ θ θ θ ch Broman Churchll Yandell Zeng 49 full condtonal for locus cannot easly sample from locus full condtonal pr(λ YXθ) pr(λ X) pr(λ) pr( Xλ) /constant cannot explctly determne full condtonal dffcult to normalze need to average over all possble genotypes over entre map Gbbs sampler wll not work but can use method based on ratos of probabltes ch Broman Churchll Yandell Zeng 50

26 ch Broman Churchll Yandell Zeng 5 Metropols-Hastngs Step pck new locus based upon current locus propose new locus from dstrbuton q( ) pck value near current one? pck unformly across genome? accept new locus wth probablty a() Gbbs sampler s specal case of M-H always accept new proposal acceptance nsures rght stable dstrbuton accept new proposal wth probablty A otherwse stck wth current value ) ( ) ( ) ( ) ( mn ) ( * * new old old old new new new old q q A λ λ λ π λ λ λ π λ λ x x ch Broman Churchll Yandell Zeng MCMC run/ dstance (cm) frequency MCMC Run for locus at 40cM

27 Care & Use of MCMC sample chan for long run ( ) longer for more complcated lkelhoods use dagnostc plots to assess mxng standard error of estmates use hstogram of posteror compute varance of posteror--just another summary studyng the Markov chan Monte Carlo error of seres (Geyer 99) tme seres estmate based on lagged auto-covarances convergence dagnostcs for proper mxng ch Broman Churchll Yandell Zeng bootstrapped varance estmates (re)sample (Y X ) wth replacement create bootstrap sample "new" data of sze n estmate loc λ and effects θ repeat ths N tmes construct summares of these mean varance medan percentle construct 95% confdence ntervals for λ and θ (.5%le 97.5%le) order estmates pck number.05n and.975n ch Broman Churchll Yandell Zeng 54

28 4.5 advantages & shortcomngs of IM advantages over sngle marker analyss can nfer poston and effect of TL estmated locatons & effects almost unbased f only one segregatng TL per chromosome requres fewer ndvduals for detecton of TL ch Broman Churchll Yandell Zeng 55 not an nterval test shortcomngs of IM cannot say whether or not TL s n an nterval not ndependent of effects of TL outsde nterval can gve false postves due to lnkage hgh LOD score due to nearby TL less of a problem for unlnked TL can detect "ghost TL" hgher peak between two lnked TL estmated poston and effect are based ch Broman Churchll Yandell Zeng 56

29 4.6 Haley-Knott Regresson Approxmaton lkelhood mxes over mssng genotypes normal data mxture of normals approxmate mxture by one normal just estmate mean and varance advantages works well for closely spaced markers mean s correct can explot flankng markers for mssng data calculatons are easy and fast (PLABTL) dsadvantages varance depends on marker genotypes and spacngs approxmaton errors accumulate for multple TL ch Broman Churchll Yandell Zeng 57 Haley-Knott regresson dea replace mssng genotypes by expected values P E( X λ) sum pr( X λ) ft regresson model (e.g. addtve gene acton) Y µ + αp + e n assume constant varance correct mean E(Y X θ λ) P wrong varance V (Y X θ λ) σ sum [pr( X λ)] ch Broman Churchll Yandell Zeng 58

30 Haley-Knott and EM both use expected value of genotypes HK: P E( X λ) pror expectaton EM: P E( Y X θ λ) pror expectaton both solve regresson problems for effects dfference s n teraton HK s frst step teraton EM terates E-step and M-steps to convergence ch Broman Churchll Yandell Zeng 59

multiple QTL likelihood

multiple QTL likelihood Bayesan Interval Mappng multple TL lkelhood compare CIM MIM mputaton BIM Drosophla shape example Bayesan dea Who was Bayes? What s Bayes theorem? Bayesan Bayes factors and margnal posterors Markov chan

More information

Markov Chain Monte Carlo (MCMC), Gibbs Sampling, Metropolis Algorithms, and Simulated Annealing Bioinformatics Course Supplement

Markov Chain Monte Carlo (MCMC), Gibbs Sampling, Metropolis Algorithms, and Simulated Annealing Bioinformatics Course Supplement Markov Chan Monte Carlo MCMC, Gbbs Samplng, Metropols Algorthms, and Smulated Annealng 2001 Bonformatcs Course Supplement SNU Bontellgence Lab http://bsnuackr/ Outlne! Markov Chan Monte Carlo MCMC! Metropols-Hastngs

More information

Parametric fractional imputation for missing data analysis. Jae Kwang Kim Survey Working Group Seminar March 29, 2010

Parametric fractional imputation for missing data analysis. Jae Kwang Kim Survey Working Group Seminar March 29, 2010 Parametrc fractonal mputaton for mssng data analyss Jae Kwang Km Survey Workng Group Semnar March 29, 2010 1 Outlne Introducton Proposed method Fractonal mputaton Approxmaton Varance estmaton Multple mputaton

More information

Checking Pairwise Relationships. Lecture 19 Biostatistics 666

Checking Pairwise Relationships. Lecture 19 Biostatistics 666 Checkng Parwse Relatonshps Lecture 19 Bostatstcs 666 Last Lecture: Markov Model for Multpont Analyss X X X 1 3 X M P X 1 I P X I P X 3 I P X M I 1 3 M I 1 I I 3 I M P I I P I 3 I P... 1 IBD states along

More information

Target tracking example Filtering: Xt. (main interest) Smoothing: X1: t. (also given with SIS)

Target tracking example Filtering: Xt. (main interest) Smoothing: X1: t. (also given with SIS) Target trackng example Flterng: Xt Y1: t (man nterest) Smoothng: X1: t Y1: t (also gven wth SIS) However as we have seen, the estmate of ths dstrbuton breaks down when t gets large due to the weghts becomng

More information

Markov Chain Monte Carlo Lecture 6

Markov Chain Monte Carlo Lecture 6 where (x 1,..., x N ) X N, N s called the populaton sze, f(x) f (x) for at least one {1, 2,..., N}, and those dfferent from f(x) are called the tral dstrbutons n terms of mportance samplng. Dfferent ways

More information

Outline for today. Markov chain Monte Carlo. Example: spatial statistics (Christensen and Waagepetersen 2001)

Outline for today. Markov chain Monte Carlo. Example: spatial statistics (Christensen and Waagepetersen 2001) Markov chan Monte Carlo Rasmus Waagepetersen Department of Mathematcs Aalborg Unversty Denmark November, / Outlne for today MCMC / Condtonal smulaton for hgh-dmensonal U: Markov chan Monte Carlo Consder

More information

CS 2750 Machine Learning. Lecture 5. Density estimation. CS 2750 Machine Learning. Announcements

CS 2750 Machine Learning. Lecture 5. Density estimation. CS 2750 Machine Learning. Announcements CS 750 Machne Learnng Lecture 5 Densty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square CS 750 Machne Learnng Announcements Homework Due on Wednesday before the class Reports: hand n before

More information

EM and Structure Learning

EM and Structure Learning EM and Structure Learnng Le Song Machne Learnng II: Advanced Topcs CSE 8803ML, Sprng 2012 Partally observed graphcal models Mxture Models N(μ 1, Σ 1 ) Z X N N(μ 2, Σ 2 ) 2 Gaussan mxture model Consder

More information

Hierarchical Bayes. Peter Lenk. Stephen M Ross School of Business at the University of Michigan September 2004

Hierarchical Bayes. Peter Lenk. Stephen M Ross School of Business at the University of Michigan September 2004 Herarchcal Bayes Peter Lenk Stephen M Ross School of Busness at the Unversty of Mchgan September 2004 Outlne Bayesan Decson Theory Smple Bayes and Shrnkage Estmates Herarchcal Bayes Numercal Methods Battng

More information

The EM Algorithm (Dempster, Laird, Rubin 1977) The missing data or incomplete data setting: ODL(φ;Y ) = [Y;φ] = [Y X,φ][X φ] = X

The EM Algorithm (Dempster, Laird, Rubin 1977) The missing data or incomplete data setting: ODL(φ;Y ) = [Y;φ] = [Y X,φ][X φ] = X The EM Algorthm (Dempster, Lard, Rubn 1977 The mssng data or ncomplete data settng: An Observed Data Lkelhood (ODL that s a mxture or ntegral of Complete Data Lkelhoods (CDL. (1a ODL(;Y = [Y;] = [Y,][

More information

Global Sensitivity. Tuesday 20 th February, 2018

Global Sensitivity. Tuesday 20 th February, 2018 Global Senstvty Tuesday 2 th February, 28 ) Local Senstvty Most senstvty analyses [] are based on local estmates of senstvty, typcally by expandng the response n a Taylor seres about some specfc values

More information

( ) ( ) ( ) ( ) STOCHASTIC SIMULATION FOR BLOCKED DATA. Monte Carlo simulation Rejection sampling Importance sampling Markov chain Monte Carlo

( ) ( ) ( ) ( ) STOCHASTIC SIMULATION FOR BLOCKED DATA. Monte Carlo simulation Rejection sampling Importance sampling Markov chain Monte Carlo SOCHASIC SIMULAIO FOR BLOCKED DAA Stochastc System Analyss and Bayesan Model Updatng Monte Carlo smulaton Rejecton samplng Importance samplng Markov chan Monte Carlo Monte Carlo smulaton Introducton: If

More information

First Year Examination Department of Statistics, University of Florida

First Year Examination Department of Statistics, University of Florida Frst Year Examnaton Department of Statstcs, Unversty of Florda May 7, 010, 8:00 am - 1:00 noon Instructons: 1. You have four hours to answer questons n ths examnaton.. You must show your work to receve

More information

Quantifying Uncertainty

Quantifying Uncertainty Partcle Flters Quantfyng Uncertanty Sa Ravela M. I. T Last Updated: Sprng 2013 1 Quantfyng Uncertanty Partcle Flters Partcle Flters Appled to Sequental flterng problems Can also be appled to smoothng problems

More information

Review: Fit a line to N data points

Review: Fit a line to N data points Revew: Ft a lne to data ponts Correlated parameters: L y = a x + b Orthogonal parameters: J y = a (x ˆ x + b For ntercept b, set a=0 and fnd b by optmal average: ˆ b = y, Var[ b ˆ ] = For slope a, set

More information

MLE and Bayesian Estimation. Jie Tang Department of Computer Science & Technology Tsinghua University 2012

MLE and Bayesian Estimation. Jie Tang Department of Computer Science & Technology Tsinghua University 2012 MLE and Bayesan Estmaton Je Tang Department of Computer Scence & Technology Tsnghua Unversty 01 1 Lnear Regresson? As the frst step, we need to decde how we re gong to represent the functon f. One example:

More information

Multipoint Analysis for Sibling Pairs. Biostatistics 666 Lecture 18

Multipoint Analysis for Sibling Pairs. Biostatistics 666 Lecture 18 Multpont Analyss for Sblng ars Bostatstcs 666 Lecture 8 revously Lnkage analyss wth pars of ndvduals Non-paraetrc BS Methods Maxu Lkelhood BD Based Method ossble Trangle Constrant AS Methods Covered So

More information

Parametric fractional imputation for missing data analysis

Parametric fractional imputation for missing data analysis Secton on Survey Research Methods JSM 2008 Parametrc fractonal mputaton for mssng data analyss Jae Kwang Km Wayne Fuller Abstract Under a parametrc model for mssng data, the EM algorthm s a popular tool

More information

j) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1

j) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1 Random varables Measure of central tendences and varablty (means and varances) Jont densty functons and ndependence Measures of assocaton (covarance and correlaton) Interestng result Condtonal dstrbutons

More information

MATH 829: Introduction to Data Mining and Analysis The EM algorithm (part 2)

MATH 829: Introduction to Data Mining and Analysis The EM algorithm (part 2) 1/16 MATH 829: Introducton to Data Mnng and Analyss The EM algorthm (part 2) Domnque Gullot Departments of Mathematcal Scences Unversty of Delaware Aprl 20, 2016 Recall 2/16 We are gven ndependent observatons

More information

ENG 8801/ Special Topics in Computer Engineering: Pattern Recognition. Memorial University of Newfoundland Pattern Recognition

ENG 8801/ Special Topics in Computer Engineering: Pattern Recognition. Memorial University of Newfoundland Pattern Recognition EG 880/988 - Specal opcs n Computer Engneerng: Pattern Recognton Memoral Unversty of ewfoundland Pattern Recognton Lecture 7 May 3, 006 http://wwwengrmunca/~charlesr Offce Hours: uesdays hursdays 8:30-9:30

More information

Linear Approximation with Regularization and Moving Least Squares

Linear Approximation with Regularization and Moving Least Squares Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...

More information

CIS526: Machine Learning Lecture 3 (Sept 16, 2003) Linear Regression. Preparation help: Xiaoying Huang. x 1 θ 1 output... θ M x M

CIS526: Machine Learning Lecture 3 (Sept 16, 2003) Linear Regression. Preparation help: Xiaoying Huang. x 1 θ 1 output... θ M x M CIS56: achne Learnng Lecture 3 (Sept 6, 003) Preparaton help: Xaoyng Huang Lnear Regresson Lnear regresson can be represented by a functonal form: f(; θ) = θ 0 0 +θ + + θ = θ = 0 ote: 0 s a dummy attrbute

More information

See Book Chapter 11 2 nd Edition (Chapter 10 1 st Edition)

See Book Chapter 11 2 nd Edition (Chapter 10 1 st Edition) Count Data Models See Book Chapter 11 2 nd Edton (Chapter 10 1 st Edton) Count data consst of non-negatve nteger values Examples: number of drver route changes per week, the number of trp departure changes

More information

Machine learning: Density estimation

Machine learning: Density estimation CS 70 Foundatons of AI Lecture 3 Machne learnng: ensty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square ata: ensty estmaton {.. n} x a vector of attrbute values Objectve: estmate the model of

More information

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number

More information

princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 3: Large deviations bounds and applications Lecturer: Sanjeev Arora

princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 3: Large deviations bounds and applications Lecturer: Sanjeev Arora prnceton unv. F 13 cos 521: Advanced Algorthm Desgn Lecture 3: Large devatons bounds and applcatons Lecturer: Sanjeev Arora Scrbe: Today s topc s devaton bounds: what s the probablty that a random varable

More information

Economics 130. Lecture 4 Simple Linear Regression Continued

Economics 130. Lecture 4 Simple Linear Regression Continued Economcs 130 Lecture 4 Contnued Readngs for Week 4 Text, Chapter and 3. We contnue wth addressng our second ssue + add n how we evaluate these relatonshps: Where do we get data to do ths analyss? How do

More information

Negative Binomial Regression
