Week 8: Testing trees, ootstraps, jackknifes, gene frequencies Genome 570 ebruary, 2016 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.1/69
density e log (density) Normal distribution: curvature of log of height x x 1 σ 2π e 1 2 (x µ) 2 σ 2 (constant stuff) 1 2 (x µ) 2 σ 2 Taking the logarithm of the height of the density curve of a normal distribution whose variance is σ 2, we see that it is a quadratic curve whose curvature is 1/σ 2 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.2/69
The likelihood curve is nearly a normal distribution for large amounts of data θ θ the value for our data set θ from t(x), the "sufficient statistic" If we have large amounts of data, the values of parameters we need to try are all very similar, and the shape of the distribution (which is nearly normal) will not be too different for these values. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.3/69
The likelihood curve is nearly a normal distribution for large amounts of data θ θ the value for our data set θ from t(x), the "sufficient statistic" If we have large amounts of data, the values of parameters we need to try are all very similar, and the shape of the distribution (which is nearly normal) will not be too different for these values. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.4/69
The likelihood curve is nearly a normal distribution for large amounts of data θ θ the value for our data set θ from t(x), the "sufficient statistic" If we have large amounts of data, the values of parameters we need to try are all very similar, and the shape of the distribution (which is nearly normal) will not be too different for these values. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.5/69
The likelihood curve is nearly a normal distribution for large amounts of data θ θ the value for our data set θ from t(x), the "sufficient statistic" If we have large amounts of data, the values of parameters we need to try are all very similar, and the shape of the distribution (which is nearly normal) will not be too different for these values. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.6/69
The likelihood curve is nearly a normal distribution for large amounts of data θ θ the value for our data set θ from t(x), the "sufficient statistic" If we have large amounts of data, the values of parameters we need to try are all very similar, and the shape of the distribution (which is nearly normal) will not be too different for these values. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.7/69
The likelihood curve is nearly a normal distribution for large amounts of data θ θ the value for our data set θ from t(x), the "sufficient statistic" If we have large amounts of data, the values of parameters we need to try are all very similar, and the shape of the distribution (which is nearly normal) will not be too different for these values. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.8/69
The likelihood curve is nearly a normal distribution for large amounts of data θ θ the value for our data set θ from t(x), the "sufficient statistic" If we have large amounts of data, the values of parameters we need to try are all very similar, and the shape of the distribution (which is nearly normal) will not be too different for these values. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.9/69
The likelihood curve is nearly a normal distribution for large amounts of data θ θ the value for our data set θ from t(x), the "sufficient statistic" If we have large amounts of data, the values of parameters we need to try are all very similar, and the shape of the distribution (which is nearly normal) will not be too different for these values. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.10/69
urvatures and covariances of ML estimates ML estimates have covariances computable from curvatures of the expected log-likelihood: ] Var[ θ 1 / ( d 2 (log(l)) dθ 2 The same is true when there are multiple parameters: ] Var[ θ V 1 ) where ( 2 ) log(l) ij = θ i θ j Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.11/69
With large amounts of data, asymptotically When the true value of θ is θ 0, ˆθ θ 0 v N(0, 1) Since 1/v is the negative of the curvature of the log-likelihood: lnl(θ 0 ) = lnl(ˆθ) 1 2 (θ 0 ˆθ) 2 v so that twice the difference of log-likelihoods is the square of a normal: 2 ( ) ln L(ˆθ) lnl(θ 0 ) χ 2 1 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.12/69
orresponding results for multiple parameters lnl(θ 0 ) lnl(θ 0 ) 1 2 (θ 0 θ) T (θ 0 θ) (θ θ 0 ) T (θ θ 0 ) χ 2 p so that the log-likelihood difference is: 2 ( ) ln L(ˆθ) lnl(θ 0 ) χ 2 p When in the (true) null hypothesis θ 0 we have q of the p parameters constrained: ( ) 2 ln L(ˆθ) lnl(θ 0 ) χ 2 q Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.13/69
log-likelihood curve Likelihood curve in one parameter Ln (Likelihood) length of a branch in the tree Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.14/69
Its maximum likelihood estimate Likelihood curve in one parameter and the maximum likelihood estimate Ln (Likelihood) length of a branch in the tree maximum likelihood estimate (ML) Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.15/69
The(approximate, asymptotic) confidence interval Likelihood curve in one parameter and the maximum likelihood estimate and confidence interval derived from it Ln (Likelihood) 1/2 the value of a chi square with 1 d.f. significant at 95% 95% confidence interval length of a branch in the tree maximum likelihood estimate (ML) Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.16/69
ontours of a log-likelihood surface in two dimensions length of branch 2 length of branch 1 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.17/69
ontours of a log-likelihood surface in two dimensions length of branch 2 ML length of branch 1 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.18/69
Log-likelihood-based confidence set for two variables shaded area is the joint confidence interval length of branch 2 height of this contour is less than at the peak by an amount equal to 1/2 the chi square value with two degrees of freedom which is significant at 95% level length of branch 1 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.19/69
onfidence interval for one variable length of branch 2 height of this contour is less than at the peak by an amount equal to 1/2 the chi square value with one degree of freedom which is significant at 95% level length of branch 1 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.20/69
onfidence interval for the other variable length of branch 2 height of this contour is less than at the peak by an amount equal to 1/2 the chi square value with one degree of freedom which is significant at 95% level length of branch 1 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.21/69
ln L Likelihood ratio interval for a parameter 2620 2625 2630 2635 2640 5 10 20 50 100 200 Transition / transversion ratio Inferring the transition/transversion ratio for an 84 model with the 14-species primate mitochondria data set. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.22/69
LRTofamolecularclock howmanyparameters? onstraints for a clock v 1 v 2 v 4 v 5 v 1 = v 2 v 6 v 3 v v 4 = 5 v 8 v 1 + v v 6 = 3 v 7 v v 3 + 7 = v 4 + v 8 How does each equation constrain the branch lengths in the unrooted tree? What about the red equation? Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.23/69
Likelihood Ratio Test for a molecular clock Using the 7-species mitochondrial N data set (the great apes plus ovine and Mouse), we get with Ts/Tn = 30 and an 84 model: Tree ln L No clock 1372.77620 lock 1414.45053 ifference 41.67473 hi-square statistic: 2 41.675 = 83.35, with n 2 = 5 degrees of freedom highly significant. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.24/69
Model selection using the LRT Parameters 29 84, T estimated 28 81 84, T=2 27 26 K2P, T estimated 25 Jukes antor K2P, T=2 The problem with using likelihood ratio tests is the multiplicity of tests and the multiple routes to the same hypotheses. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.25/69
The kaike Information riterion ompare between hypotheses 2 lnl + 2p (the same as reducing the log-likelihood by the number of parameters) Number of Model ln L parameters I Jukes-antor 3068.29186 25 6186.58 K2P, R = 2.0 2953.15830 25 5956.32 K2P, R = 1.889 2952.94264 26 5957.89 81 2935.25430 28 5926.51 84, R = 2.0 2680.32982 28 5416.66 84, R = 28.95 2616.3981 29 5290.80 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.26/69
ln Likelihood anwetesttreesusingthelrt? 192 194 t 196 t 198 t 200 0.2 0.1 t 0.0 0.1 0.2 If so, how many degrees of freedom for the comparison of the two peaks? These are three-species clocklike trees (shown here plotted in a profile log-likelihood plot plotting the highest likelihood for each value of the interior branch length). Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.27/69 t
The bootstrap estimate of θ (unknown) true value of θ 150 data points empirical distribution of sample (unknown) true distribution ootstrap replicates (each 150 draws) istribution of estimates of parameters n example with mixed normal distributions. raw from the empirical distribution 150 times if there are 150 data points. With replacement! Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.28/69
The bootstrap for phylogenies Original ata sites sequences ootstrap sample #1 sites stimate of the tree sequences sample same number of sites, with replacement ootstrap sample #2 sequences sites sample same number of sites, with replacement ootstrap estimate of the tree, #1 (and so on) ootstrap estimate of the tree, #2 rawing columns of the data matrix, with replacement. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.29/69
partitiondefinedbyabranchinthefirsttree Trees: How many times each partition of species is found: 1 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.30/69
nother partition from the first tree Trees: How many times each partition of species is found: 1 1 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.31/69
The third partition from that tree Trees: How many times each partition of species is found: 1 1 1 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.32/69
Partitions from the second tree Trees: How many times each partition of species is found: 1 2 1 1 1 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.33/69
Partitions from the third tree Trees: How many times each partition of species is found: 2 2 1 1 1 1 1 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.34/69
Partitions from the fourth tree Trees: How many times each partition of species is found: 3 2 1 1 1 2 2 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.35/69
Partitions from the fifth tree Trees: How many times each partition of species is found: 3 3 1 1 1 2 1 3 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.36/69
The table of partitions from all trees Trees: How many times each partition of species is found: 3 3 1 1 1 2 1 3 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.37/69
The majority-rule consensus tree Trees: How many times each partition of species is found: 3 3 1 1 1 2 1 3 60 60 60 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.38/69
WhywilltheMRconsensusgiveatree? Suppose that for each partition in a tree we construct a (fake) morphological character with 0 for one set in the partition, 1 for the other. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.39/69
WhywilltheMRconsensusgiveatree? Suppose that for each partition in a tree we construct a (fake) morphological character with 0 for one set in the partition, 1 for the other. Such a character is compatible with a tree if (and only if) the tree contains that partition. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.39/69
WhywilltheMRconsensusgiveatree? Suppose that for each partition in a tree we construct a (fake) morphological character with 0 for one set in the partition, 1 for the other. Such a character is compatible with a tree if (and only if) the tree contains that partition. If two of these characters both occur in more than 50% of the trees, they must co-occur in at least one tree. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.39/69
WhywilltheMRconsensusgiveatree? Suppose that for each partition in a tree we construct a (fake) morphological character with 0 for one set in the partition, 1 for the other. Such a character is compatible with a tree if (and only if) the tree contains that partition. If two of these characters both occur in more than 50% of the trees, they must co-occur in at least one tree. Thus the set of these characters that occur in more then 50% of the trees are all pairwise compatible. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.39/69
WhywilltheMRconsensusgiveatree? Suppose that for each partition in a tree we construct a (fake) morphological character with 0 for one set in the partition, 1 for the other. Such a character is compatible with a tree if (and only if) the tree contains that partition. If two of these characters both occur in more than 50% of the trees, they must co-occur in at least one tree. Thus the set of these characters that occur in more then 50% of the trees are all pairwise compatible. y the Pairwise ompatibility Theorem (remember that?) they must then be jointly compatible Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.39/69
WhywilltheMRconsensusgiveatree? Suppose that for each partition in a tree we construct a (fake) morphological character with 0 for one set in the partition, 1 for the other. Such a character is compatible with a tree if (and only if) the tree contains that partition. If two of these characters both occur in more than 50% of the trees, they must co-occur in at least one tree. Thus the set of these characters that occur in more then 50% of the trees are all pairwise compatible. y the Pairwise ompatibility Theorem (remember that?) they must then be jointly compatible So there must be a tree that contains them all. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.39/69
The MR tree with 14-species primate mtn data ovine Mouse Squir Monk 35 42 72 74 84 80 himp Human Gorilla Orang 77 49 100 99 99 Gibbon Rhesus Mac Jpn Macaq rab.mac arbmacaq Tarsier Lemur Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.40/69
Potential problems with the bootstrap 1. Sites may not evolve independently 2. Sites may not come from a common distribution (but can consider them sampled from a mixture of possible distributions) 3. If do not know which branch is of interest at the outset, a multiple-tests" problem means P values are overstated 4. P values are biased (too conservative) 5. ootstrapping does not correct biases in phylogeny methods Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.41/69
Other resampling methods elete-half jackknife. Sample a random 50% of the sites, without replacement. elete-1/e jackknife (arris et. al. 1996) (too little deletion from a statistical viewpoint). Reweighting characters by choosing weights from an exponential distribution. In fact, reweighting them by any exchangeable weights having coefficient of variation of 1 Parametric bootstrap simulate data sets of this size assuming the estimate of the tree is the truth (to correct for correlation among adjacent sites) (Künsch, 1989) lock-bootstrapping sample n/b blocks of b adjacent sites. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.42/69
With the delete-half jackknife 50 69 72 84 80 ovine Mouse Squir Monk himp Human Gorilla Orang 32 80 100 98 99 Gibbon Rhesus Mac Jpn Macaq rab.mac arbmacaq 59 Tarsier Lemur Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.43/69
ootstrap versus jackknife in a simple case xact computation of the effects of deletion fraction for the jackknife n 1 n 2 n characters (suppose 1 and 2 are conflicting groups) m 1 m 2 n(1 δ) characters We can compute for various n s the probabilities of getting more evidence for group 1 than for group 2 typical result is for n 1 = 10, n 2 = 8, n = 100 : ootstrap Jackknife δ = 1/2 δ = 1/e Prob( m >m ) 1 2 0.6384 0.5923 0.6441 Prob( m >m ) 1 2 0.7230 0.7587 0.8040 Prob( m >m ) 1 2 + 1 Prob( m =m ) 2 1 2 0.6807 0.6755 0.7240 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.44/69
Probability of a character being omitted from a bootstrap N (1 1/N) N 1 0 11 0.35049 25 0.36040 100 0.36603 2 0.25 12 0.35200 30 0.36166 150 0.36665 3 0.29630 13 0.35326 35 0.36256 200 0.36696 4 0.31641 14 0.35434 40 0.36323 250 0.36714 5 0.32768 15 0.35526 45 0.36375 300 0.36727 6 0.33490 16 0.35607 50 0.36417 500 0.36751 7 0.33992 17 0.35679 60 0.36479 1000 0.36770 8 0.34361 18 0.35742 70 0.36524 0.36788 9 0.34644 19 0.35798 80 0.36557 10 0.34868 20 0.35849 90 0.36583 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.45/69
toyexampletoexaminebiasof Pvalues True value of mean istribution of individual values of x True distribution of sample means stimated distributions of sample means "Topology" II 0 "Topology" I ssuming a normal distribution, trying to infer whether the mean is above 0, when the mean is unknown and the variance known to be 1 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.46/69
iasinthepvalues note that the true P is more extreme than the average of the P s P estimate of the "phylogeny" topology II 0 topology I the true mean Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.47/69
HowmuchbiasinthePvalues? 1.0 0.8 verage P 0.6 0.4 0.2 0.0 0.0 0.2 0.4 0.6 0.8 1.0 True P Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.48/69
iasinthepvalueswithdifferentpriors 1.00 Probability of correct topology 0.80 0.60 0.40 0.20 n σ 2 = 2.0 1.0 0.1 0.00 0.00 0.50 1.00 P for expectation of µ Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.49/69
The parametric bootstrap computer simulation estimation of tree data set #1 T 1 estimate of tree data set #2 T 2 original data data set #3 T 3 data set #100 T 100 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.50/69
The parametric bootstrap with the primates data ovine 83 98 78 81 93 98 96 Lemur Tarsier Squirrel Monkey Mouse Jp Macacque 98 arbary Mac 82 rab ating Mac Rhesus Mac Gorilla 95 himp 96 Human Orang Gibbon Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.51/69
Goldman s test using simulation (related to the "parametric bootstrap") no clock tree T log likelihood l data 2 ( l l ) c clock T c l c simulating data sets... data data data... data estimating clocklike and nonclocklike trees from each data set...... 2 ( l l ) 2 ( l l ) 2 ( l l ) 2 ( l l c c c c ) 2 ( l l c ) Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.52/69
n outcome of rownian motion on a 5-species tree Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.53/69
n outcome of rownian motion on a 5-species tree Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.54/69
n outcome of rownian motion on a 5-species tree Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.55/69
n outcome of rownian motion on a 5-species tree Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.56/69
rownian motion along a tree x 1 x x x 1 8 2 v x x 1 2 8 v 2 x 8 x x 8 9 v 8 x 3 x x 1 3 8 9 v 3 x 9 x x 9 0 v 9 x 6 x x 5 7 x x 6 10 x x v 7 10 x x x 6 v7 4 5 11 x 10 v x x 5 x x 10 11 4 12 v 10 v x 4 11 x x 11 12 v x 11 12 x x 12 0 v 12 x 0 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.57/69
istribution of tips on a tree under rownian Motion root v 3 0 3 v 1 v 2 1 2 Tip 1 is the sum of two independent changes each of which is drawn from a normal distribution (with mean 0 and variances v 3 and v 1 ) so it is normally distributed with mean 0 and variance v 3 + v 1. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.58/69
istribution of tips on a tree under rownian Motion root v 3 0 3 v 1 v 2 1 2 Tip 1 is the sum of two independent changes each of which is drawn from a normal distribution (with mean 0 and variances v 3 and v 1 ) so it is normally distributed with mean 0 and variance v 3 + v 1. Similarly for tip 2 (variance is v 3 + v 2 ). Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.58/69
istribution of tips on a tree under rownian Motion root v 3 0 3 v 1 v 2 1 2 Tip 1 is the sum of two independent changes each of which is drawn from a normal distribution (with mean 0 and variances v 3 and v 1 ) so it is normally distributed with mean 0 and variance v 3 + v 1. Similarly for tip 2 (variance is v 3 + v 2 ). They share branch 3, and the change there affects both random variables. So they are not independent or uncorrelated. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.58/69
istribution of tips on a tree under rownian Motion root v 3 0 3 v 1 v 2 1 2 Tip 1 is the sum of two independent changes each of which is drawn from a normal distribution (with mean 0 and variances v 3 and v 1 ) so it is normally distributed with mean 0 and variance v 3 + v 1. Similarly for tip 2 (variance is v 3 + v 2 ). They share branch 3, and the change there affects both random variables. So they are not independent or uncorrelated. Variance is the expectation of the square (of deviation from the mean), and covariance is the expectation of the product of those deviations, for the two variables. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.58/69
istribution of tips on a tree under rownian Motion root v 3 0 3 v 1 v 2 1 2 Tip 1 is the sum of two independent changes each of which is drawn from a normal distribution (with mean 0 and variances v 3 and v 1 ) so it is normally distributed with mean 0 and variance v 3 + v 1. Similarly for tip 2 (variance is v 3 + v 2 ). They share branch 3, and the change there affects both random variables. So they are not independent or uncorrelated. Variance is the expectation of the square (of deviation from the mean), and covariance is the expectation of the product of those deviations, for the two variables. In fact the covariance of the values at tip 1 and tip 2 is the variance of the shared term that is the same in both of them, so it is v 3. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.58/69
ovariances of species on the tree 2 v 1 + v 8 + v 9 v 8 + v 9 v 9 0 0 0 0 v 8 + v 9 v 2 + v 8 + v 9 v 9 0 0 0 0 v 9 v 9 v 3 + v 9 0 0 0 0 0 0 0 v 4 + v 12 v 12 v 12 v 12 0 0 0 v 12 v 5 + v 11 + v 12 v 11 + v 12 v 11 + v 12 6 0 0 0 v 4 12 v 11 + v 12 v 6 + v 10 + v 11 + v 12 v 10 + v 11 + v 12 3 7 5 0 0 0 v 12 v 11 + v 12 v 10 + v 11 + v 12 v 7 + v 10 + v 11 + v 12 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.59/69
ovariances are of form a b c 0 0 0 0 b d c 0 0 0 0 c c e 0 0 0 0 0 0 0 f g g g 0 0 0 g h i i 0 0 0 g i j k 0 0 0 g i k l Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.60/69
Likelihood under rownian motion with two species x 1 x 2 v 1 v 2 x 0 f ( x; µ, σ 2) = ) 1 ( σ 2π exp (x µ)2 2σ 2 L = p i=1 ( 1 (2π) exp 1 v 1 v 2 2 [ (x 1i x 0i ) 2 + (x 2i x 0i ) 2 v 1 v 2 ]) Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.61/69
What happens if we estimate means and branch lengths? o we get the right answer if we estimate for each coordinate (each character) the value at the root and the branch lengths v 1 and v 2? ctually no. elow, we will do this by finding values of these that maximize the likelihood, and show that the likelihood becomes infinite if either v 1 or v 2 approaches zero. ven if we constrain there to be a clock, so v 1 = v 2 and look only at their sum v 1 + v 2 this turns out to be half as big as the truth, even with an infinite number of characters. Why? The problem seems to be that we are estimating too many parameters. There is one parameter (the root value) for each character. So the ratio of data to parameters does not rise to infinity as we increase the number of parameters. In circumstances like this, likelihood methods can misbehave. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.62/69
The solution: don t infer ancestors; use RML We can eliminate these problems by: 1. o not infer the states of the interior nodes. 2. Use only the relative positions of the tips. This eliminates the starting state at the root. It is RML, a variant of ML that loses almost no statistical power. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.63/69
Minimizing for each character i so: and then: Q = (x 1i x 0i ) 2 v 1 + (x 2i x 0i ) 2 v 2 dq dx 0i = 2 (x 1i x 0i ) v 1 2 (x 2i x 0i ) v 2 = 0 x 0i = 1 v 1 x 1i + 1 v 2 x 2i 1 v 1 + 1 v 2 So that we have a maximum likelihood estimate of the starting value x 0i for each character. The result is that Q = (x 1i x 2i ) 2 v 1 + v 2 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.64/69
Likelihood after estimating initial coordinates Substituting in our estimates of x 0i, we end up with L = ( 1 (2π) p (v 1 v 2 ) exp 1 1 2 p 2 p i=1 ) (x 1i x 2i ) 2 v 1 + v 2 and this finally turns into: lnl = p ln(2π) 1 2 p ln(v 1v 2 ) 1 2 p i=1 (x 1i x 2i ) 2 v 1 + v 2 This actually goes to infinity as either v 1 or v 2 goes to zero! This is related to the problem that dwards and avalli-sforza had with their maximum likelihood method in 1964. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.65/69
Ifthereisaclock... If instead we constrain v 1 = v 2 because assume a clock: ln L = K p ln(v 1 + v 2 ) 1 2 2 (v 1 + v 2 ) which leads to v 1 = v 2 = 2 /(4p) (which is half as big as it should be!) The number of parameters being estimated is p + 1, which rises as we consider more characters. The fact that the ratio of data to parameters does not rise without limit is the reason why likelihood misbehaves in this case. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.66/69
The difference between ML and RML Information we use for ML inference: species 2 species 1 species 4 species 3 1.0 2.0 3.0 4.0 Information we use for RML inference: species 2 species 1 species 4 species 3 1.0+x 2.0+x 3.0+x 4.0+x oes it matter that we don t know x? It makes it unnecessary to estimate the starting value x 0, and that eliminates p parameters. It means that the ratio of data to parameters does then rise as we add characters. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.67/69
Using only differences between populations(rml) We assume that we have observed only the differences x 1i x 2i, and not the actual locations on the phenotype scale. Then ( ) p 1 L = exp 1 (x 1i x 2i ) 2 2π v1 + v 2 2 v 1 + v 2 i=1 lnl = K p 2 ln(v 1 + v 2 ) + 1 2 (v 1 + v 2 ) n (x i1 x i2 ) 2 i=1 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.68/69
Likelihood with two species using RML lnl = K p 2 ln (v 1 + v 2 ) + 2 2 (v 1 + v 2 ) lnl = K p 2 ln (v T) + 2 2 v T v T = 2 /p The number of parameters being estimated is 1 (it is the sum v 1 + v 2 ). The number of parameters does not rise as we consider more characters. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.69/69
Pruning a tree in the rownian motion case x 1 x 2 v 1 v 2 x 1 x 2 x 3 x 4 + δ = v v 1 2 v + v 1 2 v 1 v 2 v 5 v 3 v 4 x 12 x 3 x 4 v x + v x 2 1 1 2 x = 12 v + v 1 2 v 6 δ v 3 v 4 v 5 v 6 The likelihood for the tree is the product of the linkelihoods for these two trees. y repeatedly applying this we can decompose the tree into n 1 independent two-species trees. Getting their likelihoods is easy. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.70/69