Mximum Likelihood Estimtion for Allele Frequencies Biosttistics 666
Previous Series of Lectures: Introduction to Colescent Models Comuttionlly efficient frmework Alterntive to forwrd simultions Amenble to nlyticl solutions Predictions bout sequence vrition Number of olymorhisms Frequency of olymorhisms Distribution of olymorhisms cross hlotyes
Next Series of Lectures Estimting llele nd hlotye frequencies from genotye dt Mximum likelihood roch Aliction of n E-M lgorithm Chllenges Using informtion from relted individuls Allowing for non-codominnt genotyes Allowing for mbiguity in hlotye ssignments
Mximum Likelihood A generl frmework for estimting model rmeters Find rmeter vlues tht mximize the robbility of the observed dt Lern bout oultion chrcteristics E.g. llele frequencies, oultion size Using secific smle E.g. set sequences, unrelted individuls, or even fmilies Alicble to mny different roblems
Exmle: Allele Frequencies Consider A smle of n chromosomes X of these re of tye Prmeter of interest is llele frequency L( n, X ) n X n X ( ) X
Evlute for vrious rmeters - L 0.0.0 0.000 0. 0.8 0.088 0.4 0.6 0.5 0.6 0.4 0. 0.8 0. 0.006.0 0.0 0.000 For n = 0 nd X = 4
Likelihood Plot 0.4 For n = 0 nd X = 4 0.3 Likelihood 0. 0. 0 0.0 0. 0.4 0.6 0.8.0 Allele Frequency
In this cse The likelihood tells us the dt is most robble if = 0.4 The likelihood curve llows us to evlute lterntives Is = 0.8 ossibility? Is = 0. ossibility?
Exmle: Estimting 4N Consider S olymorhisms in smle of n sequences L( n, S) P ( S ) n Where P n is clculted using the Q n nd P functions defined reviously
Likelihood Likelihood Plot With n = 5, S = 0 MLE 4N
Mximum Likelihood Estimtion Two bsic stes ) Write down likelihood function L( x) f ( x ) b) Find vlue of ˆ tht mximizes L( x) In rincile, licble to ny roblem where likelihood function exists
MLEs Prmeter vlues tht mximize likelihood where observtions hve mximum robbility Finding MLEs is n otimiztion roblem How do MLEs comre to other estimtors?
Comring Estimtors How do MLEs rte in terms of Unbisedness Consistency Efficiency For review, see Grthwite, Jolliffe, Jones (995) Sttisticl Inference, Prentice Hll
Anlyticl Solutions Write out log-likelihood ( dt) ln L( dt) Clculte derivtive of likelihood d ( dt) d Find zeros for derivtive function
Informtion The second derivtive is lso extremely useful The seed t which log-likelihood decreses Provides n symtotic vrince for estimtes I V d dt d E I ) ( ˆ
Allele Frequency Estimtion When individul chromosomes re observed this is not so tricky Wht bout with genotyes? Wht bout with rent-offsring irs?
Coming u We will wlk through llele frequency estimtion in three distinct settings: Smles single chromosomes Smles of unrelted Individuls Smles of rents nd offsring
I. Single Alleles Observed Consider A smle of n chromosomes X of these re of tye Prmeter of interest is llele frequency L( n, X ) n X n X ( ) X
Some Notes The following two likelihoods re just s good: For ML estimtion, constnt fctors in likelihood don t mtter n i x x n X n X i i n x x x L X n n X L ) ( ),..., ; ( ) ( ), ; (
Anlytic Solution The log-likelihood ln L( The derivtive n, X ) ln n X X ln ( n X )ln( ) d ln L( d X ) X n X Find zero
Smles of Individul Chromosomes The nturl estimtor (where we count the roortion of sequences of rticulr tye) nd the MLE give identicl solutions Mximum likelihood rovides justifiction for using the nturl estimtor
II. Genotyes Observed Use nottion n ij to denote the number of individuls with genotye i / j Smle of n individuls Genotye Counts Genotye A A A A A A Totl Observed Counts n n n n=n +n +n Frequency.0
Allele Frequencies by Counting A nturl estimte for llele frequencies is to clculte the roortion of individuls crrying ech llele Allele Counts Genotye A A Totl Observed Counts n = n + n n = n + n n=n +n Frequency =n /n =n /n.0
MLE using genotye dt Consider smle such s... Genotye Counts Genotye A A A A A A Totl Observed Counts n n n n=n +n +n Frequency.0 The likelihood s function of llele frequencies is L( ; n) n n!! n! n! n n ² ² n q q
Which gives Log-likelihood nd its derivtive ln d d L n n n ln n n n n n ( ) ln( ) C Giving the MLE s ˆ n n n n n
Smles of Unrelted Individuls Agin, nturl estimtor (where we count the roortion of lleles of rticulr tye) nd the MLE give identicl solutions Mximum likelihood rovides justifiction for using the nturl estimtor
III. Prent-Offsring Pirs Child Prent A A A A A A A A 0 + A A 3 4 5 3 + 4 + 5 A A 0 6 7 6 + 7 + 3 + 4 + 6 5 + 7 N irs
Probbility for Ech Observtion Child Prent A A A A A A A A A A A A.0
Probbility for Ech Observtion Child Prent A A A A A A A A 3 0 A A A A 0 3.0
Which gives C B B C B 7 6 5 4 3 6 5 4 3 ˆ 3 3 ln L
Which gives C B B C B C B 7 6 5 4 3 6 5 4 3 3 7 6 5 4 3 3 ˆ 3 3 ) ln( ln constnt ln ln ln ln ln ln L
Smles of Prent Offsring-Pirs The nturl estimtor (where we count the roortion of lleles of rticulr tye) nd the MLE no longer give identicl solutions In this cse, we exect the MLE to be more ccurte
Comring Smling Strtegies We cn comre smling strtegies by clculting the informtion for ech one Which one to you exect to be most informtive? I V d dt d E I ) ( ˆ
How informtive is ech setting? Single chromosomes Vr( ) N q chromosomes Unrelted individuls Vr( ) N q individuls Prent offsring irs Vr( ) 3N q irs 4
Other Likelihoods Allele frequencies when individuls re Dignosed for Mendelin disorder Genotyed t two neighboring loci Phenotyed for the ABO blood grous Mny other interesting roblems but some hve no nlyticl solution
Tody s Summry Exmles of Mximum Likelihood Allele Frequency Estimtion Allele counts Genotye counts Pirs of Individuls
Tke home reding Excoffier nd Sltkin (995) Mol Biol Evol :9-97 Introduces the E-M lgorithm Widely used for mximizing likelihoods in genetic roblems
Proerties of Estimtors For Review
Unbisedness An estimtor is unbised if E( ˆ) bis( ˆ) E( ˆ) Multile unbised estimtors my exist Other roerties my be desirble
Consistency An estimtor is consistent if for ny P ˆ 0 s n Estimte converges to true vlue in robbility with incresing smle size
Men Squred Error MSE is defined s If MSE 0 s n then the estimtor must be consistent
Efficiency The reltive efficiency of two estimtors is the rtio of their vrinces vr( ˆ ) if then ˆ vr( ˆ ) ismore efficient Comrison only meningful for estimtors with equl bises
Sufficiency Consider Observtions X, X, X n Sttistic T(X, X, X n ) T is sufficient sttistic if it includes ll informtion bout rmeter in the smle Distribution of X i conditionl on T is indeendent of Posterior distribution of conditionl on T is indeendent of X i
Miniml Sufficient Sttistic There cn be mny lterntive sufficient sttistics. A sttistic is miniml sufficient sttistic if it cn be exressed s function of every other sufficient sttistic.
Tyicl Proerties of MLEs Bis Cn be bised or unbised Consistency Subject to regulrity conditions, MLEs re consistent Efficiency Tyiclly, MLEs re symtoticlly efficient estimtors Sufficiency Often, but not lwys Cox nd Hinkley, 974
Strtegies for Likelihood Otimiztion For Review
Generic Aroches Suitble for when nlyticl solutions re imrcticl Brcketing Simlex Method Newton-Rhson
Brcketing Find 3 oints such tht < b < c L( b ) > L( ) nd L( b ) > L( c ) Serch for mximum by Select tril oint in intervl Kee mximum nd flnking oints
Brcketing 6 5 3 4
The Simlex Method Clculte likelihoods t simlex vertices Geometric she with k+ corners E.g. tringle in k = dimensions At ech ste, move the high vertex in the direction of lower oints
The Simlex Method II high reflection Originl Simlex low contrction reflection nd exnsion multile contrction
One rmeter mximiztion Simle but inefficient roch Consider Prmeters = (,,, k ) Likelihood function L (; x) Mximize with resect to ech i in turn Cycle through rmeters
The Inefficiency
Steeest Descent Consider Prmeters = (,,, k ) Likelihood function L (; x) Score vector S d ln( L) d d ln( L) d,..., d ln( L) d k Find mximum long + S
Still inefficient Consecutive stes re erendiculr!
Locl Aroximtions to Log-Likelihood Function informtion mtrix the observed is ) ( ² vector the score is ) ( the loglikelihoodfunction is ) ( ln ) ( where ) ( ) ( ) ( ) ( ) ( the neighboorhoodof In i i i t i i i i d d L S θ I θ S θ θ θ θ I θ θ θ θ θ θ θ
Newton s Method S I θ θ 0 θ θ I S θ θ I θ θ θ θ S θ θ oint new tril nd get ) ( to zero... by setting its derivtive ) ( ) ( ) ( ) ( ) ( the roximtion Mximize i i i i t i i i
Fisher Scoring Use exected informtion mtrix insted of observed informtion: d ( ) E d insted of d ( dt) d Comred to Newton-Rhson: Converges fster when estimtes re oor. Converges slower when close to MLE.