Week 8: Testing trees, Bootstraps, jackknifes, gene frequencies

Size: px
Start display at page:

Download "Week 8: Testing trees, Bootstraps, jackknifes, gene frequencies"

Transcription

1 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies Genome 570 ebruary, 2016 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.1/69

2 density e log (density) Normal distribution: curvature of log of height x x 1 σ 2π e 1 2 (x µ) 2 σ 2 (constant stuff) 1 2 (x µ) 2 σ 2 Taking the logarithm of the height of the density curve of a normal distribution whose variance is σ 2, we see that it is a quadratic curve whose curvature is 1/σ 2 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.2/69

3 The likelihood curve is nearly a normal distribution for large amounts of data θ θ the value for our data set θ from t(x), the "sufficient statistic" If we have large amounts of data, the values of parameters we need to try are all very similar, and the shape of the distribution (which is nearly normal) will not be too different for these values. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.3/69

4 The likelihood curve is nearly a normal distribution for large amounts of data θ θ the value for our data set θ from t(x), the "sufficient statistic" If we have large amounts of data, the values of parameters we need to try are all very similar, and the shape of the distribution (which is nearly normal) will not be too different for these values. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.4/69

5 The likelihood curve is nearly a normal distribution for large amounts of data θ θ the value for our data set θ from t(x), the "sufficient statistic" If we have large amounts of data, the values of parameters we need to try are all very similar, and the shape of the distribution (which is nearly normal) will not be too different for these values. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.5/69

6 The likelihood curve is nearly a normal distribution for large amounts of data θ θ the value for our data set θ from t(x), the "sufficient statistic" If we have large amounts of data, the values of parameters we need to try are all very similar, and the shape of the distribution (which is nearly normal) will not be too different for these values. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.6/69

7 The likelihood curve is nearly a normal distribution for large amounts of data θ θ the value for our data set θ from t(x), the "sufficient statistic" If we have large amounts of data, the values of parameters we need to try are all very similar, and the shape of the distribution (which is nearly normal) will not be too different for these values. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.7/69

8 The likelihood curve is nearly a normal distribution for large amounts of data θ θ the value for our data set θ from t(x), the "sufficient statistic" If we have large amounts of data, the values of parameters we need to try are all very similar, and the shape of the distribution (which is nearly normal) will not be too different for these values. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.8/69

9 The likelihood curve is nearly a normal distribution for large amounts of data θ θ the value for our data set θ from t(x), the "sufficient statistic" If we have large amounts of data, the values of parameters we need to try are all very similar, and the shape of the distribution (which is nearly normal) will not be too different for these values. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.9/69

10 The likelihood curve is nearly a normal distribution for large amounts of data θ θ the value for our data set θ from t(x), the "sufficient statistic" If we have large amounts of data, the values of parameters we need to try are all very similar, and the shape of the distribution (which is nearly normal) will not be too different for these values. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.10/69

11 urvatures and covariances of ML estimates ML estimates have covariances computable from curvatures of the expected log-likelihood: ] Var[ θ 1 / ( d 2 (log(l)) dθ 2 The same is true when there are multiple parameters: ] Var[ θ V 1 ) where ( 2 ) log(l) ij = θ i θ j Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.11/69

12 With large amounts of data, asymptotically When the true value of θ is θ 0, ˆθ θ 0 v N(0, 1) Since 1/v is the negative of the curvature of the log-likelihood: lnl(θ 0 ) = lnl(ˆθ) 1 2 (θ 0 ˆθ) 2 v so that twice the difference of log-likelihoods is the square of a normal: 2 ( ) ln L(ˆθ) lnl(θ 0 ) χ 2 1 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.12/69

13 orresponding results for multiple parameters lnl(θ 0 ) lnl(θ 0 ) 1 2 (θ 0 θ) T (θ 0 θ) (θ θ 0 ) T (θ θ 0 ) χ 2 p so that the log-likelihood difference is: 2 ( ) ln L(ˆθ) lnl(θ 0 ) χ 2 p When in the (true) null hypothesis θ 0 we have q of the p parameters constrained: ( ) 2 ln L(ˆθ) lnl(θ 0 ) χ 2 q Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.13/69

14 log-likelihood curve Likelihood curve in one parameter Ln (Likelihood) length of a branch in the tree Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.14/69

15 Its maximum likelihood estimate Likelihood curve in one parameter and the maximum likelihood estimate Ln (Likelihood) length of a branch in the tree maximum likelihood estimate (ML) Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.15/69

16 The(approximate, asymptotic) confidence interval Likelihood curve in one parameter and the maximum likelihood estimate and confidence interval derived from it Ln (Likelihood) 1/2 the value of a chi square with 1 d.f. significant at 95% 95% confidence interval length of a branch in the tree maximum likelihood estimate (ML) Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.16/69

17 ontours of a log-likelihood surface in two dimensions length of branch 2 length of branch 1 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.17/69

18 ontours of a log-likelihood surface in two dimensions length of branch 2 ML length of branch 1 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.18/69

19 Log-likelihood-based confidence set for two variables shaded area is the joint confidence interval length of branch 2 height of this contour is less than at the peak by an amount equal to 1/2 the chi square value with two degrees of freedom which is significant at 95% level length of branch 1 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.19/69

20 onfidence interval for one variable length of branch 2 height of this contour is less than at the peak by an amount equal to 1/2 the chi square value with one degree of freedom which is significant at 95% level length of branch 1 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.20/69

21 onfidence interval for the other variable length of branch 2 height of this contour is less than at the peak by an amount equal to 1/2 the chi square value with one degree of freedom which is significant at 95% level length of branch 1 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.21/69

22 ln L Likelihood ratio interval for a parameter Transition / transversion ratio Inferring the transition/transversion ratio for an 84 model with the 14-species primate mitochondria data set. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.22/69

23 LRTofamolecularclock howmanyparameters? onstraints for a clock v 1 v 2 v 4 v 5 v 1 = v 2 v 6 v 3 v v 4 = 5 v 8 v 1 + v v 6 = 3 v 7 v v = v 4 + v 8 How does each equation constrain the branch lengths in the unrooted tree? What about the red equation? Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.23/69

24 Likelihood Ratio Test for a molecular clock Using the 7-species mitochondrial N data set (the great apes plus ovine and Mouse), we get with Ts/Tn = 30 and an 84 model: Tree ln L No clock lock ifference hi-square statistic: = 83.35, with n 2 = 5 degrees of freedom highly significant. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.24/69

25 Model selection using the LRT Parameters 29 84, T estimated , T= K2P, T estimated 25 Jukes antor K2P, T=2 The problem with using likelihood ratio tests is the multiplicity of tests and the multiple routes to the same hypotheses. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.25/69

26 The kaike Information riterion ompare between hypotheses 2 lnl + 2p (the same as reducing the log-likelihood by the number of parameters) Number of Model ln L parameters I Jukes-antor K2P, R = K2P, R = , R = , R = Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.26/69

27 ln Likelihood anwetesttreesusingthelrt? t 196 t 198 t t If so, how many degrees of freedom for the comparison of the two peaks? These are three-species clocklike trees (shown here plotted in a profile log-likelihood plot plotting the highest likelihood for each value of the interior branch length). Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.27/69 t

28 The bootstrap estimate of θ (unknown) true value of θ 150 data points empirical distribution of sample (unknown) true distribution ootstrap replicates (each 150 draws) istribution of estimates of parameters n example with mixed normal distributions. raw from the empirical distribution 150 times if there are 150 data points. With replacement! Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.28/69

29 The bootstrap for phylogenies Original ata sites sequences ootstrap sample #1 sites stimate of the tree sequences sample same number of sites, with replacement ootstrap sample #2 sequences sites sample same number of sites, with replacement ootstrap estimate of the tree, #1 (and so on) ootstrap estimate of the tree, #2 rawing columns of the data matrix, with replacement. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.29/69

30 partitiondefinedbyabranchinthefirsttree Trees: How many times each partition of species is found: 1 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.30/69

31 nother partition from the first tree Trees: How many times each partition of species is found: 1 1 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.31/69

32 The third partition from that tree Trees: How many times each partition of species is found: Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.32/69

33 Partitions from the second tree Trees: How many times each partition of species is found: Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.33/69

34 Partitions from the third tree Trees: How many times each partition of species is found: Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.34/69

35 Partitions from the fourth tree Trees: How many times each partition of species is found: Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.35/69

36 Partitions from the fifth tree Trees: How many times each partition of species is found: Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.36/69

37 The table of partitions from all trees Trees: How many times each partition of species is found: Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.37/69

38 The majority-rule consensus tree Trees: How many times each partition of species is found: Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.38/69

39 WhywilltheMRconsensusgiveatree? Suppose that for each partition in a tree we construct a (fake) morphological character with 0 for one set in the partition, 1 for the other. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.39/69

40 WhywilltheMRconsensusgiveatree? Suppose that for each partition in a tree we construct a (fake) morphological character with 0 for one set in the partition, 1 for the other. Such a character is compatible with a tree if (and only if) the tree contains that partition. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.39/69

41 WhywilltheMRconsensusgiveatree? Suppose that for each partition in a tree we construct a (fake) morphological character with 0 for one set in the partition, 1 for the other. Such a character is compatible with a tree if (and only if) the tree contains that partition. If two of these characters both occur in more than 50% of the trees, they must co-occur in at least one tree. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.39/69

42 WhywilltheMRconsensusgiveatree? Suppose that for each partition in a tree we construct a (fake) morphological character with 0 for one set in the partition, 1 for the other. Such a character is compatible with a tree if (and only if) the tree contains that partition. If two of these characters both occur in more than 50% of the trees, they must co-occur in at least one tree. Thus the set of these characters that occur in more then 50% of the trees are all pairwise compatible. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.39/69

43 WhywilltheMRconsensusgiveatree? Suppose that for each partition in a tree we construct a (fake) morphological character with 0 for one set in the partition, 1 for the other. Such a character is compatible with a tree if (and only if) the tree contains that partition. If two of these characters both occur in more than 50% of the trees, they must co-occur in at least one tree. Thus the set of these characters that occur in more then 50% of the trees are all pairwise compatible. y the Pairwise ompatibility Theorem (remember that?) they must then be jointly compatible Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.39/69

44 WhywilltheMRconsensusgiveatree? Suppose that for each partition in a tree we construct a (fake) morphological character with 0 for one set in the partition, 1 for the other. Such a character is compatible with a tree if (and only if) the tree contains that partition. If two of these characters both occur in more than 50% of the trees, they must co-occur in at least one tree. Thus the set of these characters that occur in more then 50% of the trees are all pairwise compatible. y the Pairwise ompatibility Theorem (remember that?) they must then be jointly compatible So there must be a tree that contains them all. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.39/69

45 The MR tree with 14-species primate mtn data ovine Mouse Squir Monk himp Human Gorilla Orang Gibbon Rhesus Mac Jpn Macaq rab.mac arbmacaq Tarsier Lemur Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.40/69

46 Potential problems with the bootstrap 1. Sites may not evolve independently 2. Sites may not come from a common distribution (but can consider them sampled from a mixture of possible distributions) 3. If do not know which branch is of interest at the outset, a multiple-tests" problem means P values are overstated 4. P values are biased (too conservative) 5. ootstrapping does not correct biases in phylogeny methods Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.41/69

47 Other resampling methods elete-half jackknife. Sample a random 50% of the sites, without replacement. elete-1/e jackknife (arris et. al. 1996) (too little deletion from a statistical viewpoint). Reweighting characters by choosing weights from an exponential distribution. In fact, reweighting them by any exchangeable weights having coefficient of variation of 1 Parametric bootstrap simulate data sets of this size assuming the estimate of the tree is the truth (to correct for correlation among adjacent sites) (Künsch, 1989) lock-bootstrapping sample n/b blocks of b adjacent sites. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.42/69

48 With the delete-half jackknife ovine Mouse Squir Monk himp Human Gorilla Orang Gibbon Rhesus Mac Jpn Macaq rab.mac arbmacaq 59 Tarsier Lemur Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.43/69

49 ootstrap versus jackknife in a simple case xact computation of the effects of deletion fraction for the jackknife n 1 n 2 n characters (suppose 1 and 2 are conflicting groups) m 1 m 2 n(1 δ) characters We can compute for various n s the probabilities of getting more evidence for group 1 than for group 2 typical result is for n 1 = 10, n 2 = 8, n = 100 : ootstrap Jackknife δ = 1/2 δ = 1/e Prob( m >m ) Prob( m >m ) Prob( m >m ) Prob( m =m ) Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.44/69

50 Probability of a character being omitted from a bootstrap N (1 1/N) N Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.45/69

51 toyexampletoexaminebiasof Pvalues True value of mean istribution of individual values of x True distribution of sample means stimated distributions of sample means "Topology" II 0 "Topology" I ssuming a normal distribution, trying to infer whether the mean is above 0, when the mean is unknown and the variance known to be 1 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.46/69

52 iasinthepvalues note that the true P is more extreme than the average of the P s P estimate of the "phylogeny" topology II 0 topology I the true mean Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.47/69

53 HowmuchbiasinthePvalues? verage P True P Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.48/69

54 iasinthepvalueswithdifferentpriors 1.00 Probability of correct topology n σ 2 = P for expectation of µ Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.49/69

55 The parametric bootstrap computer simulation estimation of tree data set #1 T 1 estimate of tree data set #2 T 2 original data data set #3 T 3 data set #100 T 100 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.50/69

56 The parametric bootstrap with the primates data ovine Lemur Tarsier Squirrel Monkey Mouse Jp Macacque 98 arbary Mac 82 rab ating Mac Rhesus Mac Gorilla 95 himp 96 Human Orang Gibbon Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.51/69

57 Goldman s test using simulation (related to the "parametric bootstrap") no clock tree T log likelihood l data 2 ( l l ) c clock T c l c simulating data sets... data data data... data estimating clocklike and nonclocklike trees from each data set ( l l ) 2 ( l l ) 2 ( l l ) 2 ( l l c c c c ) 2 ( l l c ) Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.52/69

58 n outcome of rownian motion on a 5-species tree Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.53/69

59 n outcome of rownian motion on a 5-species tree Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.54/69

60 n outcome of rownian motion on a 5-species tree Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.55/69

61 n outcome of rownian motion on a 5-species tree Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.56/69

62 rownian motion along a tree x 1 x x x v x x v 2 x 8 x x 8 9 v 8 x 3 x x v 3 x 9 x x 9 0 v 9 x 6 x x 5 7 x x 6 10 x x v 7 10 x x x 6 v x 10 v x x 5 x x v 10 v x 4 11 x x v x x x 12 0 v 12 x 0 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.57/69

63 istribution of tips on a tree under rownian Motion root v v 1 v Tip 1 is the sum of two independent changes each of which is drawn from a normal distribution (with mean 0 and variances v 3 and v 1 ) so it is normally distributed with mean 0 and variance v 3 + v 1. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.58/69

64 istribution of tips on a tree under rownian Motion root v v 1 v Tip 1 is the sum of two independent changes each of which is drawn from a normal distribution (with mean 0 and variances v 3 and v 1 ) so it is normally distributed with mean 0 and variance v 3 + v 1. Similarly for tip 2 (variance is v 3 + v 2 ). Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.58/69

65 istribution of tips on a tree under rownian Motion root v v 1 v Tip 1 is the sum of two independent changes each of which is drawn from a normal distribution (with mean 0 and variances v 3 and v 1 ) so it is normally distributed with mean 0 and variance v 3 + v 1. Similarly for tip 2 (variance is v 3 + v 2 ). They share branch 3, and the change there affects both random variables. So they are not independent or uncorrelated. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.58/69

66 istribution of tips on a tree under rownian Motion root v v 1 v Tip 1 is the sum of two independent changes each of which is drawn from a normal distribution (with mean 0 and variances v 3 and v 1 ) so it is normally distributed with mean 0 and variance v 3 + v 1. Similarly for tip 2 (variance is v 3 + v 2 ). They share branch 3, and the change there affects both random variables. So they are not independent or uncorrelated. Variance is the expectation of the square (of deviation from the mean), and covariance is the expectation of the product of those deviations, for the two variables. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.58/69

67 istribution of tips on a tree under rownian Motion root v v 1 v Tip 1 is the sum of two independent changes each of which is drawn from a normal distribution (with mean 0 and variances v 3 and v 1 ) so it is normally distributed with mean 0 and variance v 3 + v 1. Similarly for tip 2 (variance is v 3 + v 2 ). They share branch 3, and the change there affects both random variables. So they are not independent or uncorrelated. Variance is the expectation of the square (of deviation from the mean), and covariance is the expectation of the product of those deviations, for the two variables. In fact the covariance of the values at tip 1 and tip 2 is the variance of the shared term that is the same in both of them, so it is v 3. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.58/69

68 ovariances of species on the tree 2 v 1 + v 8 + v 9 v 8 + v 9 v v 8 + v 9 v 2 + v 8 + v 9 v v 9 v 9 v 3 + v v 4 + v 12 v 12 v 12 v v 12 v 5 + v 11 + v 12 v 11 + v 12 v 11 + v v 4 12 v 11 + v 12 v 6 + v 10 + v 11 + v 12 v 10 + v 11 + v v 12 v 11 + v 12 v 10 + v 11 + v 12 v 7 + v 10 + v 11 + v 12 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.59/69

69 ovariances are of form a b c b d c c c e f g g g g h i i g i j k g i k l Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.60/69

70 Likelihood under rownian motion with two species x 1 x 2 v 1 v 2 x 0 f ( x; µ, σ 2) = ) 1 ( σ 2π exp (x µ)2 2σ 2 L = p i=1 ( 1 (2π) exp 1 v 1 v 2 2 [ (x 1i x 0i ) 2 + (x 2i x 0i ) 2 v 1 v 2 ]) Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.61/69

71 What happens if we estimate means and branch lengths? o we get the right answer if we estimate for each coordinate (each character) the value at the root and the branch lengths v 1 and v 2? ctually no. elow, we will do this by finding values of these that maximize the likelihood, and show that the likelihood becomes infinite if either v 1 or v 2 approaches zero. ven if we constrain there to be a clock, so v 1 = v 2 and look only at their sum v 1 + v 2 this turns out to be half as big as the truth, even with an infinite number of characters. Why? The problem seems to be that we are estimating too many parameters. There is one parameter (the root value) for each character. So the ratio of data to parameters does not rise to infinity as we increase the number of parameters. In circumstances like this, likelihood methods can misbehave. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.62/69

72 The solution: don t infer ancestors; use RML We can eliminate these problems by: 1. o not infer the states of the interior nodes. 2. Use only the relative positions of the tips. This eliminates the starting state at the root. It is RML, a variant of ML that loses almost no statistical power. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.63/69

73 Minimizing for each character i so: and then: Q = (x 1i x 0i ) 2 v 1 + (x 2i x 0i ) 2 v 2 dq dx 0i = 2 (x 1i x 0i ) v 1 2 (x 2i x 0i ) v 2 = 0 x 0i = 1 v 1 x 1i + 1 v 2 x 2i 1 v v 2 So that we have a maximum likelihood estimate of the starting value x 0i for each character. The result is that Q = (x 1i x 2i ) 2 v 1 + v 2 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.64/69

74 Likelihood after estimating initial coordinates Substituting in our estimates of x 0i, we end up with L = ( 1 (2π) p (v 1 v 2 ) exp p 2 p i=1 ) (x 1i x 2i ) 2 v 1 + v 2 and this finally turns into: lnl = p ln(2π) 1 2 p ln(v 1v 2 ) 1 2 p i=1 (x 1i x 2i ) 2 v 1 + v 2 This actually goes to infinity as either v 1 or v 2 goes to zero! This is related to the problem that dwards and avalli-sforza had with their maximum likelihood method in Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.65/69

75 Ifthereisaclock... If instead we constrain v 1 = v 2 because assume a clock: ln L = K p ln(v 1 + v 2 ) (v 1 + v 2 ) which leads to v 1 = v 2 = 2 /(4p) (which is half as big as it should be!) The number of parameters being estimated is p + 1, which rises as we consider more characters. The fact that the ratio of data to parameters does not rise without limit is the reason why likelihood misbehaves in this case. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.66/69

76 The difference between ML and RML Information we use for ML inference: species 2 species 1 species 4 species Information we use for RML inference: species 2 species 1 species 4 species x 2.0+x 3.0+x 4.0+x oes it matter that we don t know x? It makes it unnecessary to estimate the starting value x 0, and that eliminates p parameters. It means that the ratio of data to parameters does then rise as we add characters. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.67/69

77 Using only differences between populations(rml) We assume that we have observed only the differences x 1i x 2i, and not the actual locations on the phenotype scale. Then ( ) p 1 L = exp 1 (x 1i x 2i ) 2 2π v1 + v 2 2 v 1 + v 2 i=1 lnl = K p 2 ln(v 1 + v 2 ) (v 1 + v 2 ) n (x i1 x i2 ) 2 i=1 Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.68/69

78 Likelihood with two species using RML lnl = K p 2 ln (v 1 + v 2 ) (v 1 + v 2 ) lnl = K p 2 ln (v T) v T v T = 2 /p The number of parameters being estimated is 1 (it is the sum v 1 + v 2 ). The number of parameters does not rise as we consider more characters. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.69/69

79 Pruning a tree in the rownian motion case x 1 x 2 v 1 v 2 x 1 x 2 x 3 x 4 + δ = v v 1 2 v + v 1 2 v 1 v 2 v 5 v 3 v 4 x 12 x 3 x 4 v x + v x x = 12 v + v 1 2 v 6 δ v 3 v 4 v 5 v 6 The likelihood for the tree is the product of the linkelihoods for these two trees. y repeatedly applying this we can decompose the tree into n 1 independent two-species trees. Getting their likelihoods is easy. Week 8: Testing trees, ootstraps, jackknifes, gene frequencies p.70/69

Week 7: Bayesian inference, Testing trees, Bootstraps

Week 7: Bayesian inference, Testing trees, Bootstraps Week 7: ayesian inference, Testing trees, ootstraps Genome 570 May, 2008 Week 7: ayesian inference, Testing trees, ootstraps p.1/54 ayes Theorem onditional probability of hypothesis given data is: Prob

More information

Bootstraps and testing trees. Alog-likelihoodcurveanditsconfidenceinterval

Bootstraps and testing trees. Alog-likelihoodcurveanditsconfidenceinterval ootstraps and testing trees Joe elsenstein epts. of Genome Sciences and of iology, University of Washington ootstraps and testing trees p.1/20 log-likelihoodcurveanditsconfidenceinterval 2620 2625 ln L

More information

Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30

Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30 Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 27. Phylogeny methods, part 7 (Bootstraps, etc.) p.1/30 A non-phylogeny

More information

= 1 = 4 3. Odds ratio justification for maximum likelihood. Likelihoods, Bootstraps and Testing Trees. Prob (H 2 D) Prob (H 1 D) Prob (D H 2 )

= 1 = 4 3. Odds ratio justification for maximum likelihood. Likelihoods, Bootstraps and Testing Trees. Prob (H 2 D) Prob (H 1 D) Prob (D H 2 ) 4 1 1/3 1 = 4 3 1 4 1/3 1 = 1 12 Odds ratio justification for maximum likelihood Likelihoods, ootstraps and Testing Trees Joe elsenstein the data H 1 Hypothesis 1 H 2 Hypothesis 2 the symbol for given

More information

Week 5: Distance methods, DNA and protein models

Week 5: Distance methods, DNA and protein models Week 5: Distance methods, DNA and protein models Genome 570 February, 2016 Week 5: Distance methods, DNA and protein models p.1/69 A tree and the expected distances it predicts E A 0.08 0.05 0.06 0.03

More information

Phylogenetic inference

Phylogenetic inference Phylogenetic inference Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, March 7 th 016 After this lecture, you can discuss (dis-) advantages of different information types

More information

Inference of phylogenies, with some thoughts on statistics and geometry p.1/31

Inference of phylogenies, with some thoughts on statistics and geometry p.1/31 Inference of phylogenies, with some thoughts on statistics and geometry Joe Felsenstein University of Washington Inference of phylogenies, with some thoughts on statistics and geometry p.1/31 Darwin s

More information

Concepts and Methods in Molecular Divergence Time Estimation

Concepts and Methods in Molecular Divergence Time Estimation Concepts and Methods in Molecular Divergence Time Estimation 26 November 2012 Prashant P. Sharma American Museum of Natural History Overview 1. Why do we date trees? 2. The molecular clock 3. Local clocks

More information

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

More information

Dr. Amira A. AL-Hosary

Dr. Amira A. AL-Hosary Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

More information

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley

Integrative Biology 200 PRINCIPLES OF PHYLOGENETICS Spring 2018 University of California, Berkeley Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley B.D. Mishler Feb. 14, 2018. Phylogenetic trees VI: Dating in the 21st century: clocks, & calibrations;

More information

Maximum Likelihood Estimation; Robust Maximum Likelihood; Missing Data with Maximum Likelihood

Maximum Likelihood Estimation; Robust Maximum Likelihood; Missing Data with Maximum Likelihood Maximum Likelihood Estimation; Robust Maximum Likelihood; Missing Data with Maximum Likelihood PRE 906: Structural Equation Modeling Lecture #3 February 4, 2015 PRE 906, SEM: Estimation Today s Class An

More information

Inferring Molecular Phylogeny

Inferring Molecular Phylogeny Dr. Walter Salzburger he tree of life, ustav Klimt (1907) Inferring Molecular Phylogeny Inferring Molecular Phylogeny 55 Maximum Parsimony (MP): objections long branches I!! B D long branch attraction

More information

Statistical nonmolecular phylogenetics: can molecular phylogenies illuminate morphological evolution?

Statistical nonmolecular phylogenetics: can molecular phylogenies illuminate morphological evolution? Statistical nonmolecular phylogenetics: can molecular phylogenies illuminate morphological evolution? 30 July 2011. Joe Felsenstein Workshop on Molecular Evolution, MBL, Woods Hole Statistical nonmolecular

More information

STAT 135 Lab 5 Bootstrapping and Hypothesis Testing

STAT 135 Lab 5 Bootstrapping and Hypothesis Testing STAT 135 Lab 5 Bootstrapping and Hypothesis Testing Rebecca Barter March 2, 2015 The Bootstrap Bootstrap Suppose that we are interested in estimating a parameter θ from some population with members x 1,...,

More information

Interval Estimation III: Fisher's Information & Bootstrapping

Interval Estimation III: Fisher's Information & Bootstrapping Interval Estimation III: Fisher's Information & Bootstrapping Frequentist Confidence Interval Will consider four approaches to estimating confidence interval Standard Error (+/- 1.96 se) Likelihood Profile

More information

Appendix from L. J. Revell, On the Analysis of Evolutionary Change along Single Branches in a Phylogeny

Appendix from L. J. Revell, On the Analysis of Evolutionary Change along Single Branches in a Phylogeny 008 by The University of Chicago. All rights reserved.doi: 10.1086/588078 Appendix from L. J. Revell, On the Analysis of Evolutionary Change along Single Branches in a Phylogeny (Am. Nat., vol. 17, no.

More information

Mathematical statistics

Mathematical statistics October 4 th, 2018 Lecture 12: Information Where are we? Week 1 Week 2 Week 4 Week 7 Week 10 Week 14 Probability reviews Chapter 6: Statistics and Sampling Distributions Chapter 7: Point Estimation Chapter

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 2009 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Paul has many great tools for teaching phylogenetics at his web site: http://hydrodictyon.eeb.uconn.edu/people/plewis

More information

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200 Spring 2018 University of California, Berkeley

PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION Integrative Biology 200 Spring 2018 University of California, Berkeley "PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200 Spring 2018 University of California, Berkeley D.D. Ackerly Feb. 26, 2018 Maximum Likelihood Principles, and Applications to

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 200 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics)

Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics) Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics) Probability quantifies randomness and uncertainty How do I estimate the normalization and logarithmic slope of a X ray continuum, assuming

More information

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline Phylogenetics Todd Vision iology 522 March 26, 2007 pplications of phylogenetics Studying organismal or biogeographic history Systematics ating events in the fossil record onservation biology Studying

More information

Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees Constructing Evolutionary/Phylogenetic Trees 2 broad categories: istance-based methods Ultrametric Additive: UPGMA Transformed istance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

More information

Political Science 236 Hypothesis Testing: Review and Bootstrapping

Political Science 236 Hypothesis Testing: Review and Bootstrapping Political Science 236 Hypothesis Testing: Review and Bootstrapping Rocío Titiunik Fall 2007 1 Hypothesis Testing Definition 1.1 Hypothesis. A hypothesis is a statement about a population parameter The

More information

1/24/2008. Review of Statistical Inference. C.1 A Sample of Data. C.2 An Econometric Model. C.4 Estimating the Population Variance and Other Moments

1/24/2008. Review of Statistical Inference. C.1 A Sample of Data. C.2 An Econometric Model. C.4 Estimating the Population Variance and Other Moments /4/008 Review of Statistical Inference Prepared by Vera Tabakova, East Carolina University C. A Sample of Data C. An Econometric Model C.3 Estimating the Mean of a Population C.4 Estimating the Population

More information

EVOLUTIONARY DISTANCES

EVOLUTIONARY DISTANCES EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:

More information

Topic 19 Extensions on the Likelihood Ratio

Topic 19 Extensions on the Likelihood Ratio Topic 19 Extensions on the Likelihood Ratio Two-Sided Tests 1 / 12 Outline Overview Normal Observations Power Analysis 2 / 12 Overview The likelihood ratio test is a popular choice for composite hypothesis

More information

Parameter Estimation and Fitting to Data

Parameter Estimation and Fitting to Data Parameter Estimation and Fitting to Data Parameter estimation Maximum likelihood Least squares Goodness-of-fit Examples Elton S. Smith, Jefferson Lab 1 Parameter estimation Properties of estimators 3 An

More information

Mathematical statistics

Mathematical statistics October 1 st, 2018 Lecture 11: Sufficient statistic Where are we? Week 1 Week 2 Week 4 Week 7 Week 10 Week 14 Probability reviews Chapter 6: Statistics and Sampling Distributions Chapter 7: Point Estimation

More information

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics - in deriving a phylogeny our goal is simply to reconstruct the historical relationships between a group of taxa. - before we review the

More information

Institute of Actuaries of India

Institute of Actuaries of India Institute of Actuaries of India Subject CT3 Probability & Mathematical Statistics May 2011 Examinations INDICATIVE SOLUTION Introduction The indicative solution has been written by the Examiners with the

More information

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree) I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by

More information

Maximum Likelihood Until recently the newest method. Popularized by Joseph Felsenstein, Seattle, Washington.

Maximum Likelihood Until recently the newest method. Popularized by Joseph Felsenstein, Seattle, Washington. Maximum Likelihood This presentation is based almost entirely on Peter G. Fosters - "The Idiot s Guide to the Zen of Likelihood in a Nutshell in Seven Days for Dummies, Unleashed. http://www.bioinf.org/molsys/data/idiots.pdf

More information

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive. Additive distances Let T be a tree on leaf set S and let w : E R + be an edge-weighting of T, and assume T has no nodes of degree two. Let D ij = e P ij w(e), where P ij is the path in T from i to j. Then

More information

Statistical Data Analysis Stat 3: p-values, parameter estimation

Statistical Data Analysis Stat 3: p-values, parameter estimation Statistical Data Analysis Stat 3: p-values, parameter estimation London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515 Glen Cowan Physics Department Royal Holloway,

More information

Lab 9: Maximum Likelihood and Modeltest

Lab 9: Maximum Likelihood and Modeltest Integrative Biology 200A University of California, Berkeley "PRINCIPLES OF PHYLOGENETICS" Spring 2010 Updated by Nick Matzke Lab 9: Maximum Likelihood and Modeltest In this lab we re going to use PAUP*

More information

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification, Likelihood Let P (D H) be the probability an experiment produces data D, given hypothesis H. Usually H is regarded as fixed and D variable. Before the experiment, the data D are unknown, and the probability

More information

Goodness of Fit Goodness of fit - 2 classes

Goodness of Fit Goodness of fit - 2 classes Goodness of Fit Goodness of fit - 2 classes A B 78 22 Do these data correspond reasonably to the proportions 3:1? We previously discussed options for testing p A = 0.75! Exact p-value Exact confidence

More information

Hypothesis Testing with the Bootstrap. Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods

Hypothesis Testing with the Bootstrap. Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods Hypothesis Testing with the Bootstrap Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods Bootstrap Hypothesis Testing A bootstrap hypothesis test starts with a test statistic

More information

This does not cover everything on the final. Look at the posted practice problems for other topics.

This does not cover everything on the final. Look at the posted practice problems for other topics. Class 7: Review Problems for Final Exam 8.5 Spring 7 This does not cover everything on the final. Look at the posted practice problems for other topics. To save time in class: set up, but do not carry

More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Spring 016 Full points may be obtained for correct answers to eight questions. Each numbered question which may have several parts is worth

More information

Generalized Linear Models (1/29/13)

Generalized Linear Models (1/29/13) STA613/CBB540: Statistical methods in computational biology Generalized Linear Models (1/29/13) Lecturer: Barbara Engelhardt Scribe: Yangxiaolu Cao When processing discrete data, two commonly used probability

More information

Maximum Likelihood Tree Estimation. Carrie Tribble IB Feb 2018

Maximum Likelihood Tree Estimation. Carrie Tribble IB Feb 2018 Maximum Likelihood Tree Estimation Carrie Tribble IB 200 9 Feb 2018 Outline 1. Tree building process under maximum likelihood 2. Key differences between maximum likelihood and parsimony 3. Some fancy extras

More information

Mathematical statistics

Mathematical statistics October 18 th, 2018 Lecture 16: Midterm review Countdown to mid-term exam: 7 days Week 1 Chapter 1: Probability review Week 2 Week 4 Week 7 Chapter 6: Statistics Chapter 7: Point Estimation Chapter 8:

More information

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

Statistics - Lecture One. Outline. Charlotte Wickham  1. Basic ideas about estimation Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence

More information

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky MOLECULAR PHYLOGENY "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky EVOLUTION - theory that groups of organisms change over time so that descendeants differ structurally

More information

Evolutionary Models. Evolutionary Models

Evolutionary Models. Evolutionary Models Edit Operators In standard pairwise alignment, what are the allowed edit operators that transform one sequence into the other? Describe how each of these edit operations are represented on a sequence alignment

More information

First Year Examination Department of Statistics, University of Florida

First Year Examination Department of Statistics, University of Florida First Year Examination Department of Statistics, University of Florida August 20, 2009, 8:00 am - 2:00 noon Instructions:. You have four hours to answer questions in this examination. 2. You must show

More information

Lecture 4. Models of DNA and protein change. Likelihood methods

Lecture 4. Models of DNA and protein change. Likelihood methods Lecture 4. Models of DNA and protein change. Likelihood methods Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 4. Models of DNA and protein change. Likelihood methods p.1/36

More information

ST495: Survival Analysis: Hypothesis testing and confidence intervals

ST495: Survival Analysis: Hypothesis testing and confidence intervals ST495: Survival Analysis: Hypothesis testing and confidence intervals Eric B. Laber Department of Statistics, North Carolina State University April 3, 2014 I remember that one fateful day when Coach took

More information

Tree of Life iological Sequence nalysis Chapter http://tolweb.org/tree/ Phylogenetic Prediction ll organisms on Earth have a common ancestor. ll species are related. The relationship is called a phylogeny

More information

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley

PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION Integrative Biology 200B Spring 2009 University of California, Berkeley "PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley B.D. Mishler Jan. 22, 2009. Trees I. Summary of previous lecture: Hennigian

More information

T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS

T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS In our work on hypothesis testing, we used the value of a sample statistic to challenge an accepted value of a population parameter. We focused only

More information

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2011 University of California, Berkeley

PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION Integrative Biology 200B Spring 2011 University of California, Berkeley "PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2011 University of California, Berkeley B.D. Mishler Feb. 1, 2011. Qualitative character evolution (cont.) - comparing

More information

Lecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis

Lecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis Lecture 3 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics,

More information

BTRY 4830/6830: Quantitative Genomics and Genetics

BTRY 4830/6830: Quantitative Genomics and Genetics BTRY 4830/6830: Quantitative Genomics and Genetics Lecture 23: Alternative tests in GWAS / (Brief) Introduction to Bayesian Inference Jason Mezey jgm45@cornell.edu Nov. 13, 2014 (Th) 8:40-9:55 Announcements

More information

Testing Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata

Testing Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata Maura Department of Economics and Finance Università Tor Vergata Hypothesis Testing Outline It is a mistake to confound strangeness with mystery Sherlock Holmes A Study in Scarlet Outline 1 The Power Function

More information

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Paul has many great tools for teaching phylogenetics at his web site: http://hydrodictyon.eeb.uconn.edu/people/plewis

More information

Recall the Basics of Hypothesis Testing

Recall the Basics of Hypothesis Testing Recall the Basics of Hypothesis Testing The level of significance α, (size of test) is defined as the probability of X falling in w (rejecting H 0 ) when H 0 is true: P(X w H 0 ) = α. H 0 TRUE H 1 TRUE

More information

Hypothesis Testing - Frequentist

Hypothesis Testing - Frequentist Frequentist Hypothesis Testing - Frequentist Compare two hypotheses to see which one better explains the data. Or, alternatively, what is the best way to separate events into two classes, those originating

More information

POLI 8501 Introduction to Maximum Likelihood Estimation

POLI 8501 Introduction to Maximum Likelihood Estimation POLI 8501 Introduction to Maximum Likelihood Estimation Maximum Likelihood Intuition Consider a model that looks like this: Y i N(µ, σ 2 ) So: E(Y ) = µ V ar(y ) = σ 2 Suppose you have some data on Y,

More information

Chapter 6. Estimation of Confidence Intervals for Nodal Maximum Power Consumption per Customer

Chapter 6. Estimation of Confidence Intervals for Nodal Maximum Power Consumption per Customer Chapter 6 Estimation of Confidence Intervals for Nodal Maximum Power Consumption per Customer The aim of this chapter is to calculate confidence intervals for the maximum power consumption per customer

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and

More information

Statistical Distribution Assumptions of General Linear Models

Statistical Distribution Assumptions of General Linear Models Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions

More information

Practice Problems Section Problems

Practice Problems Section Problems Practice Problems Section 4-4-3 4-4 4-5 4-6 4-7 4-8 4-10 Supplemental Problems 4-1 to 4-9 4-13, 14, 15, 17, 19, 0 4-3, 34, 36, 38 4-47, 49, 5, 54, 55 4-59, 60, 63 4-66, 68, 69, 70, 74 4-79, 81, 84 4-85,

More information

Some New Aspects of Dose-Response Models with Applications to Multistage Models Having Parameters on the Boundary

Some New Aspects of Dose-Response Models with Applications to Multistage Models Having Parameters on the Boundary Some New Aspects of Dose-Response Models with Applications to Multistage Models Having Parameters on the Boundary Bimal Sinha Department of Mathematics & Statistics University of Maryland, Baltimore County,

More information

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution Today s topics Inferring phylogeny Introduction! Distance methods! Parsimony method!"#$%&'(!)* +,-.'/01!23454(6!7!2845*0&4'9#6!:&454(6 ;?@AB=C?DEF Overview of phylogenetic inferences Methodology Methods

More information

Chapter 4: Factor Analysis

Chapter 4: Factor Analysis Chapter 4: Factor Analysis In many studies, we may not be able to measure directly the variables of interest. We can merely collect data on other variables which may be related to the variables of interest.

More information

User s Manual for. Continuous. (copyright M. Pagel) Mark Pagel School of Animal and Microbial Sciences University of Reading Reading RG6 6AJ UK

User s Manual for. Continuous. (copyright M. Pagel) Mark Pagel School of Animal and Microbial Sciences University of Reading Reading RG6 6AJ UK User s Manual for Continuous (copyright M. Pagel) Mark Pagel School of Animal and Microbial Sciences University of Reading Reading RG6 6AJ UK email: m.pagel@rdg.ac.uk (www.ams.reading.ac.uk/zoology/pagel/)

More information

Statistics and Data Analysis

Statistics and Data Analysis Statistics and Data Analysis The Crash Course Physics 226, Fall 2013 "There are three kinds of lies: lies, damned lies, and statistics. Mark Twain, allegedly after Benjamin Disraeli Statistics and Data

More information

The Surprising Conditional Adventures of the Bootstrap

The Surprising Conditional Adventures of the Bootstrap The Surprising Conditional Adventures of the Bootstrap G. Alastair Young Department of Mathematics Imperial College London Inaugural Lecture, 13 March 2006 Acknowledgements Early influences: Eric Renshaw,

More information

Advanced Quantitative Methods: maximum likelihood

Advanced Quantitative Methods: maximum likelihood Advanced Quantitative Methods: Maximum Likelihood University College Dublin 4 March 2014 1 2 3 4 5 6 Outline 1 2 3 4 5 6 of straight lines y = 1 2 x + 2 dy dx = 1 2 of curves y = x 2 4x + 5 of curves y

More information

INTERVAL ESTIMATION AND HYPOTHESES TESTING

INTERVAL ESTIMATION AND HYPOTHESES TESTING INTERVAL ESTIMATION AND HYPOTHESES TESTING 1. IDEA An interval rather than a point estimate is often of interest. Confidence intervals are thus important in empirical work. To construct interval estimates,

More information

SPRING 2007 EXAM C SOLUTIONS

SPRING 2007 EXAM C SOLUTIONS SPRING 007 EXAM C SOLUTIONS Question #1 The data are already shifted (have had the policy limit and the deductible of 50 applied). The two 350 payments are censored. Thus the likelihood function is L =

More information

Marginal Screening and Post-Selection Inference

Marginal Screening and Post-Selection Inference Marginal Screening and Post-Selection Inference Ian McKeague August 13, 2017 Ian McKeague (Columbia University) Marginal Screening August 13, 2017 1 / 29 Outline 1 Background on Marginal Screening 2 2

More information

Gov 2001: Section 4. February 20, Gov 2001: Section 4 February 20, / 39

Gov 2001: Section 4. February 20, Gov 2001: Section 4 February 20, / 39 Gov 2001: Section 4 February 20, 2013 Gov 2001: Section 4 February 20, 2013 1 / 39 Outline 1 The Likelihood Model with Covariates 2 Likelihood Ratio Test 3 The Central Limit Theorem and the MLE 4 What

More information

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution Ziheng Yang Department of Biology, University College, London An excess of nonsynonymous substitutions

More information

Parameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn

Parameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn Parameter estimation and forecasting Cristiano Porciani AIfA, Uni-Bonn Questions? C. Porciani Estimation & forecasting 2 Temperature fluctuations Variance at multipole l (angle ~180o/l) C. Porciani Estimation

More information

Understanding relationship between homologous sequences

Understanding relationship between homologous sequences Molecular Evolution Molecular Evolution How and when were genes and proteins created? How old is a gene? How can we calculate the age of a gene? How did the gene evolve to the present form? What selective

More information

Data Mining Chapter 4: Data Analysis and Uncertainty Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 4: Data Analysis and Uncertainty Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Data Mining Chapter 4: Data Analysis and Uncertainty Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Why uncertainty? Why should data mining care about uncertainty? We

More information

PHYLOGENY ESTIMATION AND HYPOTHESIS TESTING USING MAXIMUM LIKELIHOOD

PHYLOGENY ESTIMATION AND HYPOTHESIS TESTING USING MAXIMUM LIKELIHOOD Annu. Rev. Ecol. Syst. 1997. 28:437 66 Copyright c 1997 by Annual Reviews Inc. All rights reserved PHYLOGENY ESTIMATION AND HYPOTHESIS TESTING USING MAXIMUM LIKELIHOOD John P. Huelsenbeck Department of

More information

arxiv: v1 [math.st] 22 Jun 2018

arxiv: v1 [math.st] 22 Jun 2018 Hypothesis testing near singularities and boundaries arxiv:1806.08458v1 [math.st] Jun 018 Jonathan D. Mitchell, Elizabeth S. Allman, and John A. Rhodes Department of Mathematics & Statistics University

More information

Testing Restrictions and Comparing Models

Testing Restrictions and Comparing Models Econ. 513, Time Series Econometrics Fall 00 Chris Sims Testing Restrictions and Comparing Models 1. THE PROBLEM We consider here the problem of comparing two parametric models for the data X, defined by

More information

Parameter Estimation

Parameter Estimation Parameter Estimation Consider a sample of observations on a random variable Y. his generates random variables: (y 1, y 2,, y ). A random sample is a sample (y 1, y 2,, y ) where the random variables y

More information

BINF6201/8201. Molecular phylogenetic methods

BINF6201/8201. Molecular phylogenetic methods BINF60/80 Molecular phylogenetic methods 0-7-06 Phylogenetics Ø According to the evolutionary theory, all life forms on this planet are related to one another by descent. Ø Traditionally, phylogenetics

More information

Chapter 10: Inferences based on two samples

Chapter 10: Inferences based on two samples November 16 th, 2017 Overview Week 1 Week 2 Week 4 Week 7 Week 10 Week 12 Chapter 1: Descriptive statistics Chapter 6: Statistics and Sampling Distributions Chapter 7: Point Estimation Chapter 8: Confidence

More information

STT 843 Key to Homework 1 Spring 2018

STT 843 Key to Homework 1 Spring 2018 STT 843 Key to Homework Spring 208 Due date: Feb 4, 208 42 (a Because σ = 2, σ 22 = and ρ 2 = 05, we have σ 2 = ρ 2 σ σ22 = 2/2 Then, the mean and covariance of the bivariate normal is µ = ( 0 2 and Σ

More information

Review. December 4 th, Review

Review. December 4 th, Review December 4 th, 2017 Att. Final exam: Course evaluation Friday, 12/14/2018, 10:30am 12:30pm Gore Hall 115 Overview Week 2 Week 4 Week 7 Week 10 Week 12 Chapter 6: Statistics and Sampling Distributions Chapter

More information

Phylogenetic Tree Reconstruction

Phylogenetic Tree Reconstruction I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven

More information

Statistical Methods for Particle Physics Lecture 1: parameter estimation, statistical tests

Statistical Methods for Particle Physics Lecture 1: parameter estimation, statistical tests Statistical Methods for Particle Physics Lecture 1: parameter estimation, statistical tests http://benasque.org/2018tae/cgi-bin/talks/allprint.pl TAE 2018 Benasque, Spain 3-15 Sept 2018 Glen Cowan Physics

More information

Maximum Likelihood (ML) Estimation

Maximum Likelihood (ML) Estimation Econometrics 2 Fall 2004 Maximum Likelihood (ML) Estimation Heino Bohn Nielsen 1of32 Outline of the Lecture (1) Introduction. (2) ML estimation defined. (3) ExampleI:Binomialtrials. (4) Example II: Linear

More information

1 Mixed effect models and longitudinal data analysis

1 Mixed effect models and longitudinal data analysis 1 Mixed effect models and longitudinal data analysis Mixed effects models provide a flexible approach to any situation where data have a grouping structure which introduces some kind of correlation between

More information

Probabilistic modeling and molecular phylogeny

Probabilistic modeling and molecular phylogeny Probabilistic modeling and molecular phylogeny Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis Technical University of Denmark (DTU) What is a model? Mathematical

More information

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1 Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson

More information

Notes on Machine Learning for and

Notes on Machine Learning for and Notes on Machine Learning for 16.410 and 16.413 (Notes adapted from Tom Mitchell and Andrew Moore.) Choosing Hypotheses Generally want the most probable hypothesis given the training data Maximum a posteriori

More information

Phylogenetics: Building Phylogenetic Trees

Phylogenetics: Building Phylogenetic Trees 1 Phylogenetics: Building Phylogenetic Trees COMP 571 Luay Nakhleh, Rice University 2 Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary model should

More information

STAT 135 Lab 6 Duality of Hypothesis Testing and Confidence Intervals, GLRT, Pearson χ 2 Tests and Q-Q plots. March 8, 2015

STAT 135 Lab 6 Duality of Hypothesis Testing and Confidence Intervals, GLRT, Pearson χ 2 Tests and Q-Q plots. March 8, 2015 STAT 135 Lab 6 Duality of Hypothesis Testing and Confidence Intervals, GLRT, Pearson χ 2 Tests and Q-Q plots March 8, 2015 The duality between CI and hypothesis testing The duality between CI and hypothesis

More information

Unsupervised machine learning

Unsupervised machine learning Chapter 9 Unsupervised machine learning Unsupervised machine learning (a.k.a. cluster analysis) is a set of methods to assign objects into clusters under a predefined distance measure when class labels

More information