Week 8: Testing trees, Bootstraps, jackknifes, gene frequencies


1 Week 8: Testing trees, Bootstraps, jackknifes, gene frequencies. Genome 570, February 2016.
2 Normal distribution: curvature of log of height. [Figure: density and log(density) plotted against x.] The density is $\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}}$ and its logarithm is $(\text{constant stuff}) - \frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}$. Taking the logarithm of the height of the density curve of a normal distribution whose variance is $\sigma^2$, we see that it is a quadratic curve whose curvature is $-1/\sigma^2$.
3 The likelihood curve is nearly a normal distribution for large amounts of data. [Figure: likelihood curve plotted against θ, marking the value of θ for our data set, obtained from t(x), the "sufficient statistic".] If we have large amounts of data, the values of parameters we need to try are all very similar, and the shape of the distribution (which is nearly normal) will not be too different for these values.
11 Curvatures and covariances of ML estimates. ML estimates have covariances computable from curvatures of the expected log-likelihood: $\mathrm{Var}[\hat\theta] \approx -1 \Big/ \left(\frac{d^2 \log(L)}{d\theta^2}\right)$. The same is true when there are multiple parameters: $\mathrm{Var}[\hat\theta] \approx V \approx -C^{-1}$, where $C_{ij} = E\left(\frac{\partial^2 \log(L)}{\partial\theta_i\,\partial\theta_j}\right)$.
12 With large amounts of data, asymptotically: when the true value of θ is $\theta_0$, $\frac{\hat\theta - \theta_0}{\sqrt{v}} \sim N(0, 1)$. Since $1/v$ is the negative of the curvature of the log-likelihood, $\ln L(\theta_0) = \ln L(\hat\theta) - \frac{1}{2}\frac{(\theta_0 - \hat\theta)^2}{v}$, so that twice the difference of log-likelihoods is the square of a normal: $2\left(\ln L(\hat\theta) - \ln L(\theta_0)\right) \sim \chi^2_1$.
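13 Corresponding results for multiple parameters. With $C$ the matrix of curvatures of the expected log-likelihood (so that $-C$ is positive definite), $\ln L(\theta_0) \approx \ln L(\hat\theta) + \frac{1}{2}(\theta_0 - \hat\theta)^T C\, (\theta_0 - \hat\theta)$ and $-(\hat\theta - \theta_0)^T C\, (\hat\theta - \theta_0) \sim \chi^2_p$, so that the log-likelihood difference is $2\left(\ln L(\hat\theta) - \ln L(\theta_0)\right) \sim \chi^2_p$. When in the (true) null hypothesis $\theta_0$ we have q of the p parameters constrained: $2\left(\ln L(\hat\theta) - \ln L(\theta_0)\right) \sim \chi^2_q$.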
14 A log-likelihood curve. [Figure: likelihood curve in one parameter; vertical axis Ln (Likelihood), horizontal axis length of a branch in the tree.]
15 Its maximum likelihood estimate. [Figure: likelihood curve in one parameter with the maximum likelihood estimate (ML) marked; vertical axis Ln (Likelihood), horizontal axis length of a branch in the tree.]
16 The (approximate, asymptotic) confidence interval. [Figure: likelihood curve in one parameter with the maximum likelihood estimate (ML) and the confidence interval derived from it; vertical axis Ln (Likelihood), horizontal axis length of a branch in the tree.] The 95% confidence interval contains the branch lengths whose log-likelihood is below the maximum by no more than 1/2 the value of a chi-square with 1 d.f. significant at 95%.
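The interval construction on this slide can be sketched numerically. The example below is only an illustration under assumptions that are not in the slides: two sequences compared under a Jukes-Cantor model, with hypothetical counts (200 sites, 30 differences), and a simple grid scan over the branch length. The function names `jc_loglik` and `lr_interval` are made up for this sketch.

```python
import math
from statistics import NormalDist

def jc_loglik(t, n_sites, n_diff):
    """Log-likelihood of branch length t for two sequences under Jukes-Cantor."""
    p = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))  # probability that a site differs
    return n_diff * math.log(p) + (n_sites - n_diff) * math.log(1.0 - p)

def lr_interval(n_sites, n_diff, level=0.95, step=1e-4, t_max=2.0):
    """Grid-scan ML estimate and likelihood-ratio interval for the branch length."""
    # A chi-square with 1 d.f. is the square of a standard normal,
    # so its quantile is the squared two-sided normal quantile.
    chi2_1 = NormalDist().inv_cdf(0.5 + level / 2.0) ** 2
    drop = chi2_1 / 2.0  # allowed drop in log-likelihood below the peak
    grid = [step * i for i in range(1, int(t_max / step) + 1)]
    logliks = [jc_loglik(t, n_sites, n_diff) for t in grid]
    best = max(logliks)
    t_hat = grid[logliks.index(best)]
    inside = [t for t, ll in zip(grid, logliks) if ll >= best - drop]
    return t_hat, inside[0], inside[-1], drop

# Hypothetical data: 200 sites, 30 of which differ between the two sequences.
t_hat, lower, upper, drop = lr_interval(n_sites=200, n_diff=30)
print(f"ML estimate {t_hat:.4f}, 95% interval ({lower:.4f}, {upper:.4f}), drop {drop:.3f}")
```

The allowed drop comes out close to 1.92, which is half of the familiar 3.84 cutoff for a chi-square with one degree of freedom.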
17 Contours of a log-likelihood surface in two dimensions. [Figure: contour plot; axes are length of branch 1 and length of branch 2.]
18 Contours of a log-likelihood surface in two dimensions. [Figure: the same contour plot with the ML estimate marked.]
19 Log-likelihood-based confidence set for two variables. [Figure: contour plot; the shaded area is the joint confidence interval.] The height of this contour is less than at the peak by an amount equal to 1/2 the chi-square value with two degrees of freedom which is significant at the 95% level.
20 Confidence interval for one variable. [Figure: contour plot; axes are length of branch 1 and length of branch 2.] The height of this contour is less than at the peak by an amount equal to 1/2 the chi-square value with one degree of freedom which is significant at the 95% level.
21 Confidence interval for the other variable. [Figure: contour plot; axes are length of branch 1 and length of branch 2.] The height of this contour is less than at the peak by an amount equal to 1/2 the chi-square value with one degree of freedom which is significant at the 95% level.
22 Likelihood ratio interval for a parameter. [Figure: ln L plotted against the transition/transversion ratio.] Inferring the transition/transversion ratio for an F84 model with the 14-species primate mitochondria data set.
23 LRT of a molecular clock: how many parameters? [Figure: rooted tree with branch lengths $v_1, \dots, v_8$.] Constraints for a clock: $v_1 = v_2$; $v_4 = v_5$; $v_1 + v_6 = v_3$; $v_3 + v_7 = v_4 + v_8$. How does each equation constrain the branch lengths in the unrooted tree? What about the red equation?
24 Likelihood Ratio Test for a molecular clock. Using the 7-species mitochondrial DNA data set (the great apes plus Bovine and Mouse), we get with Ts/Tn = 30 and an F84 model: [Table: ln L for the No clock tree, the Clock tree, and their Difference; the values did not survive in this transcript.] Chi-square statistic: 83.35, with n − 2 = 5 degrees of freedom: highly significant.
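To see how extreme a statistic of 83.35 is with 5 degrees of freedom, the tail probability can be computed directly. The sketch below uses only the standard library; `chi2_sf` is a helper written for this illustration (not part of the slides), built from the standard recurrence that adds one term each time the degrees of freedom increase by two.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable with integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        q, k = math.exp(-half), 2              # exact tail for 2 d.f.
    else:
        q, k = math.erfc(math.sqrt(half)), 1   # exact tail for 1 d.f.
    while k < df:
        # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

n_species = 7
statistic = 83.35          # value quoted on the slide
df = n_species - 2         # clock constraints: n - 2
p_value = chi2_sf(statistic, df)
print(f"P = {p_value:.3g} with {df} degrees of freedom")
```

The same helper gives roughly 0.05 at the usual 95% cutoffs (for example 11.07 with 5 degrees of freedom), which is a quick sanity check on the recurrence.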
25 Model selection using the LRT. [Figure: nested models arranged by number of parameters, from F84 with T estimated (29 parameters), through F84 with T fixed and K2P with T estimated, down to Jukes-Cantor and K2P with T = 2 (25 parameters).] The problem with using likelihood ratio tests is the multiplicity of tests and the multiple routes to the same hypotheses.
26 The Akaike Information Criterion. Compare between hypotheses using $-2 \ln L + 2p$ (the same as reducing the log-likelihood by the number of parameters). [Table: Model, ln L, number of parameters and AIC for Jukes-Cantor, K2P (two values of R) and F84 (two values of R); the values did not survive in this transcript.]
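A minimal sketch of the comparison follows. The log-likelihoods are invented purely for illustration, since the table values are not available here; the parameter counts of 25 and 29 follow the model-selection slide, while 26 for K2P with R estimated is an assumption.

```python
def aic(ln_l, n_params):
    """Akaike Information Criterion: -2 ln L + 2p (smaller is better)."""
    return -2.0 * ln_l + 2.0 * n_params

# Hypothetical (ln L, number of parameters) pairs, NOT the values from the slide.
models = {
    "Jukes-Cantor": (-3000.0, 25),
    "K2P, R estimated": (-2900.0, 26),
    "F84, R estimated": (-2895.0, 29),
}

scores = {name: aic(ln_l, p) for name, (ln_l, p) in models.items()}
best = min(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name:20s} AIC = {score:.1f}")
print("preferred model:", best)
```

Unlike a chain of likelihood ratio tests, this ranks all candidate models at once, so the answer does not depend on the route taken through the nested hypotheses.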
27 Can we test trees using the LRT? [Figure: ln Likelihood plotted against the interior branch length t, with two peaks.] If so, how many degrees of freedom for the comparison of the two peaks? These are three-species clocklike trees (shown here in a profile log-likelihood plot, plotting the highest likelihood for each value of the interior branch length).
28 The bootstrap. [Figure: an example with mixed normal distributions, showing the (unknown) true distribution and (unknown) true value of θ, the empirical distribution of a sample of 150 data points with its estimate of θ, the bootstrap replicates (each 150 draws), and the distribution of estimates of parameters.] Draw from the empirical distribution 150 times if there are 150 data points. With replacement!
29 The bootstrap for phylogenies. [Figure: the original data matrix (sequences by sites) and its estimate of the tree; bootstrap sample #1 and bootstrap sample #2, each sampling the same number of sites, with replacement, and giving bootstrap estimate of the tree #1, #2 (and so on).] Drawing columns of the data matrix, with replacement.
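Drawing columns with replacement can be sketched in a few lines. The alignment below is a made-up toy example, and `bootstrap_alignment` is a name chosen for this sketch; tree estimation from each replicate is left out.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample the columns (sites) of an alignment with replacement.

    alignment maps sequence names to equal-length strings; the replicate
    has the same number of sites as the original.
    """
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}

# Toy data, invented for illustration only.
alignment = {
    "Human":   "ACGTACGTAC",
    "Chimp":   "ACGTACGTAT",
    "Gorilla": "ACGAACGTTT",
}
rng = random.Random(570)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]
print(replicates[0])
```

The same column indices are applied to every sequence, so each resampled site keeps its pattern across species intact.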
30 A partition defined by a branch in the first tree. [Figure, built up over the following slides: the bootstrap trees and a table of how many times each partition of species is found.]
31 Another partition from the first tree.
32 The third partition from that tree.
33 Partitions from the second tree.
34 Partitions from the third tree.
35 Partitions from the fourth tree.
36 Partitions from the fifth tree.
37 The table of partitions from all trees.
38 The majority-rule consensus tree.
44 Why will the MR consensus give a tree? Suppose that for each partition in a tree we construct a (fake) morphological character with 0 for one set in the partition, 1 for the other. Such a character is compatible with a tree if (and only if) the tree contains that partition. If two of these characters both occur in more than 50% of the trees, they must co-occur in at least one tree. Thus the characters that occur in more than 50% of the trees are all pairwise compatible. By the Pairwise Compatibility Theorem (remember that?) they must then be jointly compatible. So there must be a tree that contains them all.
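The counting step and the pairwise-compatibility argument can be checked on a small example. The five trees below are hypothetical (they are not the trees from the slides), each written as the set of its internal partitions; every partition is stored as the side that excludes a reference taxon, which makes two partitions compatible exactly when those sides are nested or disjoint.

```python
from collections import Counter
from itertools import combinations

TAXA = frozenset("ABCDE")
REFERENCE = "A"  # arbitrary taxon used to pick a canonical side

def canon(side):
    """Represent a partition by the side that does not contain REFERENCE."""
    side = frozenset(side)
    return TAXA - side if REFERENCE in side else side

def compatible(s, t):
    """Canonical partitions fit on one tree if they are nested or disjoint."""
    return s <= t or t <= s or not (s & t)

# Five hypothetical bootstrap trees on taxa A-E, as sets of partitions.
trees = [
    {canon("AB"), canon("DE")},
    {canon("AB"), canon("DE")},
    {canon("AC"), canon("DE")},
    {canon("AB"), canon("CD")},
    {canon("BC"), canon("DE")},
]

counts = Counter(p for tree in trees for p in tree)
majority = {p: c for p, c in counts.items() if c > len(trees) / 2}

for p, c in majority.items():
    print(sorted(TAXA - p), "|", sorted(p), "found in", c, "of", len(trees), "trees")
print(all(compatible(s, t) for s, t in combinations(majority, 2)))
```

Here AB|CDE appears in 3 trees and ABC|DE in 4, so both pass the 50% threshold, and they are compatible with each other as the argument predicts.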
45 The MR tree with 14-species primate mtDNA data. [Figure: majority-rule consensus tree of Bovine, Mouse, Squir Monk, Chimp, Human, Gorilla, Orang, Gibbon, Rhesus Mac, Jpn Macaq, Crab-E.Mac, BarbMacaq, Tarsier and Lemur.]
46 Potential problems with the bootstrap. 1. Sites may not evolve independently. 2. Sites may not come from a common distribution (but we can consider them sampled from a mixture of possible distributions). 3. If we do not know which branch is of interest at the outset, a "multiple-tests" problem means P values are overstated. 4. P values are biased (too conservative). 5. Bootstrapping does not correct biases in phylogeny methods.
47 Other resampling methods. Delete-half jackknife: sample a random 50% of the sites, without replacement. Delete-1/e jackknife (Farris et al. 1996) (too little deletion from a statistical viewpoint). Reweighting characters by choosing weights from an exponential distribution; in fact, reweighting them by any exchangeable weights having a coefficient of variation of 1. Parametric bootstrap: simulate data sets of this size assuming the estimate of the tree is the truth. Block-bootstrapping (to correct for correlation among adjacent sites) (Künsch, 1989): sample n/b blocks of b adjacent sites.
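Several of these schemes differ only in how site indices (or site weights) are drawn, which the sketch below tries to make concrete. The function names are invented for this illustration, and letting blocks start at any position (a moving-blocks scheme) is an assumption; the slide only says that n/b blocks of b adjacent sites are sampled.

```python
import math
import random

def delete_fraction_jackknife(n_sites, fraction, rng):
    """Keep a random subset of sites, without replacement."""
    keep = round(n_sites * (1.0 - fraction))
    return sorted(rng.sample(range(n_sites), keep))

def block_bootstrap(n_sites, block, rng):
    """Sample n/b blocks of b adjacent sites, with replacement."""
    n_blocks = n_sites // block
    starts = [rng.randrange(n_sites - block + 1) for _ in range(n_blocks)]
    return [start + offset for start in starts for offset in range(block)]

def exponential_weights(n_sites, rng):
    """One exponential weight per site (coefficient of variation 1)."""
    return [rng.expovariate(1.0) for _ in range(n_sites)]

rng = random.Random(8)
half = delete_fraction_jackknife(100, 0.5, rng)              # delete-half
one_over_e = delete_fraction_jackknife(100, 1.0 / math.e, rng)  # delete-1/e
blocks = block_bootstrap(100, 10, rng)
weights = exponential_weights(100, rng)
print(len(half), len(one_over_e), len(blocks), len(weights))
```

Each function returns something that can be fed to a tree-estimation step: a list of sites to keep, a list of sites with repeats, or per-site weights.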
48 With the delete-half jackknife. [Figure: consensus tree of Bovine, Mouse, Squir Monk, Chimp, Human, Gorilla, Orang, Gibbon, Rhesus Mac, Jpn Macaq, Crab-E.Mac, BarbMacaq, Tarsier and Lemur, with support values on the branches, one of them 59.]
49 Bootstrap versus jackknife in a simple case. Exact computation of the effects of deletion fraction for the jackknife. Of the n characters, $n_1$ support group 1 and $n_2$ support group 2 (suppose 1 and 2 are conflicting groups); after resampling $n(1-\delta)$ characters, $m_1$ and $m_2$ of them support the two groups. We can compute for various n's the probabilities of getting more evidence for group 1 than for group 2. A typical result is for $n_1 = 10$, $n_2 = 8$, $n = 100$: [Table: probabilities of $m_1 > m_2$ and of $m_1 = m_2$ for the bootstrap and for the jackknife with δ = 1/2 and δ = 1/e; the values did not survive in this transcript.]
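One way such probabilities could be computed exactly is sketched below, under the assumption that the bootstrap counts follow a multinomial distribution and the jackknife counts a multivariate hypergeometric distribution. This is an illustration of the idea, not the computation behind the original table, and the function names are invented here.

```python
from math import comb, e

def bootstrap_probs(n1, n2, n):
    """P(m1 > m2), P(m1 = m2), P(m1 < m2) when n characters are drawn with replacement."""
    p1, p2 = n1 / n, n2 / n
    p0 = 1.0 - p1 - p2
    more = tie = less = 0.0
    for m1 in range(n + 1):
        for m2 in range(n - m1 + 1):
            pr = (comb(n, m1) * comb(n - m1, m2)
                  * p1 ** m1 * p2 ** m2 * p0 ** (n - m1 - m2))
            if m1 > m2:
                more += pr
            elif m1 == m2:
                tie += pr
            else:
                less += pr
    return more, tie, less

def jackknife_probs(n1, n2, n, delta):
    """Same probabilities when n(1 - delta) characters are kept without replacement."""
    kept = round(n * (1.0 - delta))
    rest = n - n1 - n2
    total = comb(n, kept)
    more = tie = less = 0.0
    for m1 in range(n1 + 1):
        for m2 in range(n2 + 1):
            others = kept - m1 - m2
            if not 0 <= others <= rest:
                continue
            pr = comb(n1, m1) * comb(n2, m2) * comb(rest, others) / total
            if m1 > m2:
                more += pr
            elif m1 == m2:
                tie += pr
            else:
                less += pr
    return more, tie, less

boot = bootstrap_probs(10, 8, 100)
jack_half = jackknife_probs(10, 8, 100, 0.5)
jack_e = jackknife_probs(10, 8, 100, 1.0 / e)
for label, probs in (("bootstrap", boot), ("delta = 1/2", jack_half), ("delta = 1/e", jack_e)):
    print(f"{label:12s} P(m1>m2) = {probs[0]:.4f}  P(m1=m2) = {probs[1]:.4f}")
```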
50 Probability of a character being omitted from a bootstrap. [Table: N against $(1 - 1/N)^N$; the values did not survive in this transcript.]
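The table can be regenerated from the formula. The values of N below are chosen for illustration and need not match those on the original slide.

```python
import math

def prob_omitted(n):
    """Probability that a given character is never drawn in n draws with replacement."""
    return (1.0 - 1.0 / n) ** n

for n in (1, 2, 5, 10, 20, 50, 100, 1000):
    print(f"N = {n:5d}   (1 - 1/N)^N = {prob_omitted(n):.5f}")
print(f"limit 1/e       = {math.exp(-1.0):.5f}")
```

The probability approaches 1/e as N grows, which is the motivation for the delete-1/e jackknife mentioned among the other resampling methods.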
51 A toy example to examine bias of P values. [Figure: the true value of the mean, the distribution of individual values of x, the true distribution of sample means, and estimated distributions of sample means; the axis is divided at 0 into "Topology" II and "Topology" I.] Assuming a normal distribution, trying to infer whether the mean is above 0, when the mean is unknown and the variance known to be 1.
52 Bias in the P values. [Figure: P values from the estimate of the "phylogeny", with the true mean marked and the axis divided at 0 into topology II and topology I.] Note that the true P is more extreme than the average of the P's.
53 How much bias in the P values? [Figure: average P plotted against true P.]
54 Bias in the P values with different priors. [Figure: probability of correct topology plotted against P for expectation of µ, with curves for different values of $n\sigma^2$.]
55 The parametric bootstrap. [Figure: original data, then estimate of tree, then computer simulation of data sets #1, #2, #3, ..., #100, then estimation of a tree from each, giving $T_1, T_2, T_3, \dots, T_{100}$.]
56 The parametric bootstrap with the primates data. [Figure: tree of Bovine, Lemur, Tarsier, Squirrel Monkey, Mouse, Jp Macaque, Barbary Mac, Crab Eating Mac, Rhesus Mac, Gorilla, Chimp, Human, Orang and Gibbon, with support values including 98, 82, 95 and 96.]
57 Goldman's test using simulation (related to the "parametric bootstrap"). [Figure: from the data, the no-clock tree T has log-likelihood $\ell$ and the clock tree $T_c$ has log-likelihood $\ell_c$, giving the statistic $2(\ell - \ell_c)$. Many data sets are then simulated, clocklike and nonclocklike trees are estimated from each data set, and $2(\ell - \ell_c)$ is computed for each.]
58 An outcome of Brownian motion on a 5-species tree. [Figure.]
62 Brownian motion along a tree. [Figure: tree with root value $x_0$, interior nodes $x_8$ to $x_{12}$, tips $x_1$ to $x_7$ and branch lengths $v_1$ to $v_{12}$; each branch is labelled with the change along it and its variance, for example $x_9 - x_0$ with $v_9$ and $x_{12} - x_0$ with $v_{12}$.]
67 Distribution of tips on a tree under Brownian Motion. [Figure: a branch of length $v_3$ from the root, followed by branches of lengths $v_1$ and $v_2$ leading to tips 1 and 2.] Tip 1 is the sum of two independent changes, each of which is drawn from a normal distribution (with mean 0 and variances $v_3$ and $v_1$), so it is normally distributed with mean 0 and variance $v_3 + v_1$. Similarly for tip 2 (variance is $v_3 + v_2$). They share branch 3, and the change there affects both random variables. So they are not independent or uncorrelated. Variance is the expectation of the square (of deviation from the mean), and covariance is the expectation of the product of those deviations, for the two variables. In fact the covariance of the values at tip 1 and tip 2 is the variance of the shared term that is the same in both of them, so it is $v_3$.
68 Covariances of species on the tree.
\[
\begin{pmatrix}
v_1+v_8+v_9 & v_8+v_9 & v_9 & 0 & 0 & 0 & 0 \\
v_8+v_9 & v_2+v_8+v_9 & v_9 & 0 & 0 & 0 & 0 \\
v_9 & v_9 & v_3+v_9 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & v_4+v_{12} & v_{12} & v_{12} & v_{12} \\
0 & 0 & 0 & v_{12} & v_5+v_{11}+v_{12} & v_{11}+v_{12} & v_{11}+v_{12} \\
0 & 0 & 0 & v_{12} & v_{11}+v_{12} & v_6+v_{10}+v_{11}+v_{12} & v_{10}+v_{11}+v_{12} \\
0 & 0 & 0 & v_{12} & v_{11}+v_{12} & v_{10}+v_{11}+v_{12} & v_7+v_{10}+v_{11}+v_{12}
\end{pmatrix}
\]
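The matrix follows from one rule: the covariance of two tips is the total length of the branches shared by their paths from the root. The sketch below encodes the root-to-tip paths implied by the matrix and uses arbitrary, hypothetical branch lengths; `covariance` is a helper named for this illustration.

```python
import math

# Branches on the path from the root to each tip, as implied by the matrix.
paths = {
    1: {1, 8, 9},
    2: {2, 8, 9},
    3: {3, 9},
    4: {4, 12},
    5: {5, 11, 12},
    6: {6, 10, 11, 12},
    7: {7, 10, 11, 12},
}

# Hypothetical branch lengths v_1 .. v_12, chosen only for illustration.
v = {branch: 0.1 * branch for branch in range(1, 13)}

def covariance(i, j):
    """Covariance of tips i and j: total length of the branches they share."""
    return sum(v[b] for b in paths[i] & paths[j])

cov = [[covariance(i, j) for j in range(1, 8)] for i in range(1, 8)]
for row in cov:
    print("  ".join(f"{value:4.1f}" for value in row))
```

Tips on opposite sides of the root share no branches, which produces the two zero blocks.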
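69 Covariances are of the form
\[
\begin{pmatrix}
a & b & c & 0 & 0 & 0 & 0 \\
b & d & c & 0 & 0 & 0 & 0 \\
c & c & e & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & f & g & g & g \\
0 & 0 & 0 & g & h & i & i \\
0 & 0 & 0 & g & i & j & k \\
0 & 0 & 0 & g & i & k & l
\end{pmatrix}
\]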
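70 Likelihood under Brownian motion with two species. [Figure: tips $x_1$ and $x_2$ on branches of lengths $v_1$ and $v_2$ from the root $x_0$.] The normal density is $f(x; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$, so the likelihood is $L = \prod_{i=1}^{p} \frac{1}{(2\pi)\sqrt{v_1 v_2}} \exp\left(-\frac{1}{2}\left[\frac{(x_{1i}-x_{0i})^2}{v_1} + \frac{(x_{2i}-x_{0i})^2}{v_2}\right]\right)$.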
71 What happens if we estimate means and branch lengths? Do we get the right answer if we estimate for each coordinate (each character) the value at the root and the branch lengths $v_1$ and $v_2$? Actually no. Below, we will do this by finding values of these that maximize the likelihood, and show that the likelihood becomes infinite if either $v_1$ or $v_2$ approaches zero. Even if we constrain there to be a clock, so $v_1 = v_2$, and look only at their sum $v_1 + v_2$, this turns out to be half as big as the truth, even with an infinite number of characters. Why? The problem seems to be that we are estimating too many parameters. There is one parameter (the root value) for each character. So the ratio of data to parameters does not rise to infinity as we increase the number of characters. In circumstances like this, likelihood methods can misbehave.
72 The solution: don't infer ancestors; use REML. We can eliminate these problems as follows: 1. Do not infer the states of the interior nodes. 2. Use only the relative positions of the tips. This eliminates the starting state at the root. It is REML, a variant of ML that loses almost no statistical power.
73 Minimizing for each character i: $Q = \frac{(x_{1i}-x_{0i})^2}{v_1} + \frac{(x_{2i}-x_{0i})^2}{v_2}$, so $\frac{dQ}{dx_{0i}} = -\frac{2(x_{1i}-x_{0i})}{v_1} - \frac{2(x_{2i}-x_{0i})}{v_2} = 0$, and then $\hat{x}_{0i} = \frac{\frac{1}{v_1}x_{1i} + \frac{1}{v_2}x_{2i}}{\frac{1}{v_1} + \frac{1}{v_2}}$. So we have a maximum likelihood estimate of the starting value $x_{0i}$ for each character. The result is that $Q = \frac{(x_{1i}-x_{2i})^2}{v_1+v_2}$.
74 Likelihood after estimating initial coordinates. Substituting in our estimates of $x_{0i}$, we end up with $L = \frac{1}{(2\pi)^p (v_1 v_2)^{p/2}} \exp\left(-\frac{1}{2}\sum_{i=1}^{p}\frac{(x_{1i}-x_{2i})^2}{v_1+v_2}\right)$, and this finally turns into $\ln L = -p\ln(2\pi) - \frac{1}{2}\,p\ln(v_1 v_2) - \frac{1}{2}\sum_{i=1}^{p}\frac{(x_{1i}-x_{2i})^2}{v_1+v_2}$. This actually goes to infinity as either $v_1$ or $v_2$ goes to zero! This is related to the problem that Edwards and Cavalli-Sforza had with their maximum likelihood method.
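The blow-up can be seen numerically by holding $v_2$ fixed and shrinking $v_1$. The data values below are invented for illustration, and `profile_lnl` is simply the log-likelihood above written as a function.

```python
import math

def profile_lnl(x1, x2, v1, v2):
    """ln L after the root values x_0i have been replaced by their ML estimates."""
    p = len(x1)
    sum_sq = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return (-p * math.log(2.0 * math.pi)
            - 0.5 * p * math.log(v1 * v2)
            - 0.5 * sum_sq / (v1 + v2))

# Hypothetical character values for two species (p = 4 characters).
x1 = [0.3, -1.2, 0.8, 2.1]
x2 = [1.0, -0.4, -0.5, 1.5]

values = [profile_lnl(x1, x2, v1, 1.0) for v1 in (1e-2, 1e-4, 1e-8, 1e-16)]
for v1, lnl in zip((1e-2, 1e-4, 1e-8, 1e-16), values):
    print(f"v1 = {v1:g}   ln L = {lnl:.3f}")
```

The term $-\frac{1}{2}p\ln(v_1 v_2)$ grows without bound as $v_1$ shrinks, while the sum-of-squares term stays bounded because $v_1 + v_2$ stays near $v_2$.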
75 If there is a clock... If instead we constrain $v_1 = v_2$ because we assume a clock, then writing $D^2 = \sum_{i=1}^{p}(x_{1i}-x_{2i})^2$ we have $\ln L = K - p\ln(v_1+v_2) - \frac{D^2}{2(v_1+v_2)}$, which leads to $\hat{v}_1 = \hat{v}_2 = D^2/(4p)$ (which is half as big as it should be!). The number of parameters being estimated is p + 1, which rises as we consider more characters. The fact that the ratio of data to parameters does not rise without limit is the reason why likelihood misbehaves in this case.
76 The difference between ML and REML. Information we use for ML inference: [Figure: the positions of the four species on the phenotype scale.] Information we use for REML inference: [Figure: the same positions, known only up to a common unknown offset x, such as 2.0+x, 3.0+x, 4.0+x.] Does it matter that we don't know x? It makes it unnecessary to estimate the starting value $x_0$, and that eliminates p parameters. It means that the ratio of data to parameters does then rise as we add characters.
77 Using only differences between populations (REML). We assume that we have observed only the differences $x_{1i} - x_{2i}$, and not the actual locations on the phenotype scale. Then $L = \prod_{i=1}^{p} \frac{1}{\sqrt{2\pi}\sqrt{v_1+v_2}} \exp\left(-\frac{1}{2}\frac{(x_{1i}-x_{2i})^2}{v_1+v_2}\right)$ and $\ln L = K - \frac{p}{2}\ln(v_1+v_2) - \frac{1}{2(v_1+v_2)}\sum_{i=1}^{p}(x_{1i}-x_{2i})^2$.
78 Likelihood with two species using REML. With $D^2 = \sum_{i=1}^{p}(x_{1i}-x_{2i})^2$, $\ln L = K - \frac{p}{2}\ln(v_1+v_2) - \frac{D^2}{2(v_1+v_2)}$. Writing $v_T = v_1 + v_2$, $\ln L = K - \frac{p}{2}\ln(v_T) - \frac{D^2}{2v_T}$, so $\hat{v}_T = D^2/p$. The number of parameters being estimated is 1 (it is the sum $v_1 + v_2$). The number of parameters does not rise as we consider more characters.
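A small simulation makes the factor of two visible. This is a sketch under assumptions not in the slides: the branch lengths (0.5 each), the distribution of the root values and the number of characters are chosen arbitrarily, and the function names are invented for the illustration.

```python
import math
import random

def sum_sq_diff(x1, x2):
    """D^2: sum over characters of the squared difference between the two tips."""
    return sum((a - b) ** 2 for a, b in zip(x1, x2))

def ml_clock_total(x1, x2):
    """ML estimate of v1 + v2 under a clock when root values are also estimated."""
    return sum_sq_diff(x1, x2) / (2.0 * len(x1))   # v1 = v2 = D^2 / (4p)

def reml_total(x1, x2):
    """REML estimate of v_T = v1 + v2, using only the differences."""
    return sum_sq_diff(x1, x2) / len(x1)           # v_T = D^2 / p

rng = random.Random(2016)
p, v1, v2 = 20000, 0.5, 0.5                        # hypothetical true values
x0 = [rng.gauss(0.0, 3.0) for _ in range(p)]       # arbitrary root value per character
x1 = [root + rng.gauss(0.0, math.sqrt(v1)) for root in x0]
x2 = [root + rng.gauss(0.0, math.sqrt(v2)) for root in x0]

ml_estimate = ml_clock_total(x1, x2)
reml_estimate = reml_total(x1, x2)
print(f"true v1 + v2 = {v1 + v2}, ML = {ml_estimate:.3f}, REML = {reml_estimate:.3f}")
```

Adding more characters does not help the ML estimate, since every new character brings its own root value to estimate; the REML estimate settles near the true sum.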
79 Pruning a tree in the Brownian motion case. [Figure: a four-species tree with tips $x_1, \dots, x_4$ and branch lengths $v_1, \dots, v_6$ is split into a two-species tree with tips $x_1$ and $x_2$ (branches $v_1$, $v_2$) and a smaller tree in which those two tips are replaced by a single tip $x_{12}$ whose branch is lengthened from $v_5$ to $v_5 + \delta$.] Here $x_{12} = \frac{v_2 x_1 + v_1 x_2}{v_1 + v_2}$ and $\delta = \frac{v_1 v_2}{v_1 + v_2}$. The likelihood for the tree is the product of the likelihoods for these two trees. By repeatedly applying this we can decompose the tree into n − 1 independent two-species trees. Getting their likelihoods is easy.
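The pruning step can be sketched as a small function applied repeatedly. The tree shape used below, ((1,2),(3,4)) with $v_5$ above the first pair and $v_6$ above the second, and all numerical values are assumptions made for this illustration; `prune_pair` and `contrast_lnl` are names invented here.

```python
import math

def prune_pair(x1, v1, x2, v2):
    """Replace two sister tips by one: return contrast, its variance, x12 and delta."""
    contrast = x1 - x2
    contrast_var = v1 + v2
    x12 = (x1 / v1 + x2 / v2) / (1.0 / v1 + 1.0 / v2)  # weighted average of the tips
    delta = v1 * v2 / (v1 + v2)                        # extra length for the branch below
    return contrast, contrast_var, x12, delta

def contrast_lnl(contrast, variance):
    """Log density of a contrast that is normal with mean 0 and the given variance."""
    return -0.5 * math.log(2.0 * math.pi * variance) - 0.5 * contrast ** 2 / variance

# Hypothetical values for one character on the assumed tree ((1,2),(3,4)).
x = {1: 1.0, 2: 3.0, 3: -0.5, 4: 0.5}
v = {1: 1.0, 2: 1.0, 3: 0.5, 4: 1.5, 5: 0.7, 6: 0.4}

c12, var12, x12, d12 = prune_pair(x[1], v[1], x[2], v[2])
c34, var34, x34, d34 = prune_pair(x[3], v[3], x[4], v[4])
# The two pruned tips now sit on branches of length v5 + delta and v6 + delta.
c_top, var_top, _, _ = prune_pair(x12, v[5] + d12, x34, v[6] + d34)

lnl = sum(contrast_lnl(c, s) for c, s in ((c12, var12), (c34, var34), (c_top, var_top)))
print(f"three contrasts for four species, ln L = {lnl:.4f}")
```

With four species this yields n − 1 = 3 independent contrasts, and the log-likelihood is the sum of their three normal log densities.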
More informationConstructing Evolutionary/Phylogenetic Trees
Constructing Evolutionary/Phylogenetic Trees 2 broad categories: istancebased methods Ultrametric Additive: UPGMA Transformed istance NeighborJoining Characterbased Maximum Parsimony Maximum Likelihood
More informationPolitical Science 236 Hypothesis Testing: Review and Bootstrapping
Political Science 236 Hypothesis Testing: Review and Bootstrapping Rocío Titiunik Fall 2007 1 Hypothesis Testing Definition 1.1 Hypothesis. A hypothesis is a statement about a population parameter The
More information1/24/2008. Review of Statistical Inference. C.1 A Sample of Data. C.2 An Econometric Model. C.4 Estimating the Population Variance and Other Moments
/4/008 Review of Statistical Inference Prepared by Vera Tabakova, East Carolina University C. A Sample of Data C. An Econometric Model C.3 Estimating the Mean of a Population C.4 Estimating the Population
More informationEVOLUTIONARY DISTANCES
EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:
More informationTopic 19 Extensions on the Likelihood Ratio
Topic 19 Extensions on the Likelihood Ratio TwoSided Tests 1 / 12 Outline Overview Normal Observations Power Analysis 2 / 12 Overview The likelihood ratio test is a popular choice for composite hypothesis
More informationParameter Estimation and Fitting to Data
Parameter Estimation and Fitting to Data Parameter estimation Maximum likelihood Least squares Goodnessoffit Examples Elton S. Smith, Jefferson Lab 1 Parameter estimation Properties of estimators 3 An
More informationMathematical statistics
October 1 st, 2018 Lecture 11: Sufficient statistic Where are we? Week 1 Week 2 Week 4 Week 7 Week 10 Week 14 Probability reviews Chapter 6: Statistics and Sampling Distributions Chapter 7: Point Estimation
More informationPOPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics
POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics  in deriving a phylogeny our goal is simply to reconstruct the historical relationships between a group of taxa.  before we review the
More informationInstitute of Actuaries of India
Institute of Actuaries of India Subject CT3 Probability & Mathematical Statistics May 2011 Examinations INDICATIVE SOLUTION Introduction The indicative solution has been written by the Examiners with the
More information9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)
I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by
More informationMaximum Likelihood Until recently the newest method. Popularized by Joseph Felsenstein, Seattle, Washington.
Maximum Likelihood This presentation is based almost entirely on Peter G. Fosters  "The Idiot s Guide to the Zen of Likelihood in a Nutshell in Seven Days for Dummies, Unleashed. http://www.bioinf.org/molsys/data/idiots.pdf
More informationAdditive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.
Additive distances Let T be a tree on leaf set S and let w : E R + be an edgeweighting of T, and assume T has no nodes of degree two. Let D ij = e P ij w(e), where P ij is the path in T from i to j. Then
More informationStatistical Data Analysis Stat 3: pvalues, parameter estimation
Statistical Data Analysis Stat 3: pvalues, parameter estimation London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515 Glen Cowan Physics Department Royal Holloway,
More informationLab 9: Maximum Likelihood and Modeltest
Integrative Biology 200A University of California, Berkeley "PRINCIPLES OF PHYLOGENETICS" Spring 2010 Updated by Nick Matzke Lab 9: Maximum Likelihood and Modeltest In this lab we re going to use PAUP*
More informationNormal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,
Likelihood Let P (D H) be the probability an experiment produces data D, given hypothesis H. Usually H is regarded as fixed and D variable. Before the experiment, the data D are unknown, and the probability
More informationGoodness of Fit Goodness of fit  2 classes
Goodness of Fit Goodness of fit  2 classes A B 78 22 Do these data correspond reasonably to the proportions 3:1? We previously discussed options for testing p A = 0.75! Exact pvalue Exact confidence
More informationHypothesis Testing with the Bootstrap. Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods
Hypothesis Testing with the Bootstrap Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods Bootstrap Hypothesis Testing A bootstrap hypothesis test starts with a test statistic
More informationThis does not cover everything on the final. Look at the posted practice problems for other topics.
Class 7: Review Problems for Final Exam 8.5 Spring 7 This does not cover everything on the final. Look at the posted practice problems for other topics. To save time in class: set up, but do not carry
More informationMaster s Written Examination
Master s Written Examination Option: Statistics and Probability Spring 016 Full points may be obtained for correct answers to eight questions. Each numbered question which may have several parts is worth
More informationGeneralized Linear Models (1/29/13)
STA613/CBB540: Statistical methods in computational biology Generalized Linear Models (1/29/13) Lecturer: Barbara Engelhardt Scribe: Yangxiaolu Cao When processing discrete data, two commonly used probability
More informationMaximum Likelihood Tree Estimation. Carrie Tribble IB Feb 2018
Maximum Likelihood Tree Estimation Carrie Tribble IB 200 9 Feb 2018 Outline 1. Tree building process under maximum likelihood 2. Key differences between maximum likelihood and parsimony 3. Some fancy extras
More informationMathematical statistics
October 18 th, 2018 Lecture 16: Midterm review Countdown to midterm exam: 7 days Week 1 Chapter 1: Probability review Week 2 Week 4 Week 7 Chapter 6: Statistics Chapter 7: Point Estimation Chapter 8:
More informationStatistics  Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation
Statistics  Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence
More information"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky
MOLECULAR PHYLOGENY "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky EVOLUTION  theory that groups of organisms change over time so that descendeants differ structurally
More informationEvolutionary Models. Evolutionary Models
Edit Operators In standard pairwise alignment, what are the allowed edit operators that transform one sequence into the other? Describe how each of these edit operations are represented on a sequence alignment
More informationFirst Year Examination Department of Statistics, University of Florida
First Year Examination Department of Statistics, University of Florida August 20, 2009, 8:00 am  2:00 noon Instructions:. You have four hours to answer questions in this examination. 2. You must show
More informationLecture 4. Models of DNA and protein change. Likelihood methods
Lecture 4. Models of DNA and protein change. Likelihood methods Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 4. Models of DNA and protein change. Likelihood methods p.1/36
More informationST495: Survival Analysis: Hypothesis testing and confidence intervals
ST495: Survival Analysis: Hypothesis testing and confidence intervals Eric B. Laber Department of Statistics, North Carolina State University April 3, 2014 I remember that one fateful day when Coach took
More informationTree of Life iological Sequence nalysis Chapter http://tolweb.org/tree/ Phylogenetic Prediction ll organisms on Earth have a common ancestor. ll species are related. The relationship is called a phylogeny
More information"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley
"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley B.D. Mishler Jan. 22, 2009. Trees I. Summary of previous lecture: Hennigian
More informationT.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS
ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS In our work on hypothesis testing, we used the value of a sample statistic to challenge an accepted value of a population parameter. We focused only
More information"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2011 University of California, Berkeley
"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2011 University of California, Berkeley B.D. Mishler Feb. 1, 2011. Qualitative character evolution (cont.)  comparing
More informationLecture 3. G. Cowan. Lecture 3 page 1. Lectures on Statistical Data Analysis
Lecture 3 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics,
More informationBTRY 4830/6830: Quantitative Genomics and Genetics
BTRY 4830/6830: Quantitative Genomics and Genetics Lecture 23: Alternative tests in GWAS / (Brief) Introduction to Bayesian Inference Jason Mezey jgm45@cornell.edu Nov. 13, 2014 (Th) 8:409:55 Announcements
More informationTesting Hypothesis. Maura Mezzetti. Department of Economics and Finance Università Tor Vergata
Maura Department of Economics and Finance Università Tor Vergata Hypothesis Testing Outline It is a mistake to confound strangeness with mystery Sherlock Holmes A Study in Scarlet Outline 1 The Power Function
More informationSome of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!
Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Paul has many great tools for teaching phylogenetics at his web site: http://hydrodictyon.eeb.uconn.edu/people/plewis
More informationRecall the Basics of Hypothesis Testing
Recall the Basics of Hypothesis Testing The level of significance α, (size of test) is defined as the probability of X falling in w (rejecting H 0 ) when H 0 is true: P(X w H 0 ) = α. H 0 TRUE H 1 TRUE
More informationHypothesis Testing  Frequentist
Frequentist Hypothesis Testing  Frequentist Compare two hypotheses to see which one better explains the data. Or, alternatively, what is the best way to separate events into two classes, those originating
More informationPOLI 8501 Introduction to Maximum Likelihood Estimation
POLI 8501 Introduction to Maximum Likelihood Estimation Maximum Likelihood Intuition Consider a model that looks like this: Y i N(µ, σ 2 ) So: E(Y ) = µ V ar(y ) = σ 2 Suppose you have some data on Y,
More informationChapter 6. Estimation of Confidence Intervals for Nodal Maximum Power Consumption per Customer
Chapter 6 Estimation of Confidence Intervals for Nodal Maximum Power Consumption per Customer The aim of this chapter is to calculate confidence intervals for the maximum power consumption per customer
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Nonparametric Inference, Polynomial Regression Week 9, Lecture 2
MA 575 Linear Models: Cedric E. Ginestet, Boston University Nonparametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and
More informationStatistical Distribution Assumptions of General Linear Models
Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions
More informationPractice Problems Section Problems
Practice Problems Section 443 44 45 46 47 48 410 Supplemental Problems 41 to 49 413, 14, 15, 17, 19, 0 43, 34, 36, 38 447, 49, 5, 54, 55 459, 60, 63 466, 68, 69, 70, 74 479, 81, 84 485,
More informationSome New Aspects of DoseResponse Models with Applications to Multistage Models Having Parameters on the Boundary
Some New Aspects of DoseResponse Models with Applications to Multistage Models Having Parameters on the Boundary Bimal Sinha Department of Mathematics & Statistics University of Maryland, Baltimore County,
More informationInferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution
Today s topics Inferring phylogeny Introduction! Distance methods! Parsimony method!"#$%&'(!)* +,.'/01!23454(6!7!2845*0&4'9#6!:&454(6 ;?@AB=C?DEF Overview of phylogenetic inferences Methodology Methods
More informationChapter 4: Factor Analysis
Chapter 4: Factor Analysis In many studies, we may not be able to measure directly the variables of interest. We can merely collect data on other variables which may be related to the variables of interest.
More informationUser s Manual for. Continuous. (copyright M. Pagel) Mark Pagel School of Animal and Microbial Sciences University of Reading Reading RG6 6AJ UK
User s Manual for Continuous (copyright M. Pagel) Mark Pagel School of Animal and Microbial Sciences University of Reading Reading RG6 6AJ UK email: m.pagel@rdg.ac.uk (www.ams.reading.ac.uk/zoology/pagel/)
More informationStatistics and Data Analysis
Statistics and Data Analysis The Crash Course Physics 226, Fall 2013 "There are three kinds of lies: lies, damned lies, and statistics. Mark Twain, allegedly after Benjamin Disraeli Statistics and Data
More informationThe Surprising Conditional Adventures of the Bootstrap
The Surprising Conditional Adventures of the Bootstrap G. Alastair Young Department of Mathematics Imperial College London Inaugural Lecture, 13 March 2006 Acknowledgements Early influences: Eric Renshaw,
More informationAdvanced Quantitative Methods: maximum likelihood
Advanced Quantitative Methods: Maximum Likelihood University College Dublin 4 March 2014 1 2 3 4 5 6 Outline 1 2 3 4 5 6 of straight lines y = 1 2 x + 2 dy dx = 1 2 of curves y = x 2 4x + 5 of curves y
More informationINTERVAL ESTIMATION AND HYPOTHESES TESTING
INTERVAL ESTIMATION AND HYPOTHESES TESTING 1. IDEA An interval rather than a point estimate is often of interest. Confidence intervals are thus important in empirical work. To construct interval estimates,
More informationSPRING 2007 EXAM C SOLUTIONS
SPRING 007 EXAM C SOLUTIONS Question #1 The data are already shifted (have had the policy limit and the deductible of 50 applied). The two 350 payments are censored. Thus the likelihood function is L =
More informationMarginal Screening and PostSelection Inference
Marginal Screening and PostSelection Inference Ian McKeague August 13, 2017 Ian McKeague (Columbia University) Marginal Screening August 13, 2017 1 / 29 Outline 1 Background on Marginal Screening 2 2
More informationGov 2001: Section 4. February 20, Gov 2001: Section 4 February 20, / 39
Gov 2001: Section 4 February 20, 2013 Gov 2001: Section 4 February 20, 2013 1 / 39 Outline 1 The Likelihood Model with Covariates 2 Likelihood Ratio Test 3 The Central Limit Theorem and the MLE 4 What
More informationLikelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution
Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution Ziheng Yang Department of Biology, University College, London An excess of nonsynonymous substitutions
More informationParameter estimation and forecasting. Cristiano Porciani AIfA, UniBonn
Parameter estimation and forecasting Cristiano Porciani AIfA, UniBonn Questions? C. Porciani Estimation & forecasting 2 Temperature fluctuations Variance at multipole l (angle ~180o/l) C. Porciani Estimation
More informationUnderstanding relationship between homologous sequences
Molecular Evolution Molecular Evolution How and when were genes and proteins created? How old is a gene? How can we calculate the age of a gene? How did the gene evolve to the present form? What selective
More informationData Mining Chapter 4: Data Analysis and Uncertainty Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University
Data Mining Chapter 4: Data Analysis and Uncertainty Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Why uncertainty? Why should data mining care about uncertainty? We
More informationPHYLOGENY ESTIMATION AND HYPOTHESIS TESTING USING MAXIMUM LIKELIHOOD
Annu. Rev. Ecol. Syst. 1997. 28:437 66 Copyright c 1997 by Annual Reviews Inc. All rights reserved PHYLOGENY ESTIMATION AND HYPOTHESIS TESTING USING MAXIMUM LIKELIHOOD John P. Huelsenbeck Department of
More informationarxiv: v1 [math.st] 22 Jun 2018
Hypothesis testing near singularities and boundaries arxiv:1806.08458v1 [math.st] Jun 018 Jonathan D. Mitchell, Elizabeth S. Allman, and John A. Rhodes Department of Mathematics & Statistics University
More informationTesting Restrictions and Comparing Models
Econ. 513, Time Series Econometrics Fall 00 Chris Sims Testing Restrictions and Comparing Models 1. THE PROBLEM We consider here the problem of comparing two parametric models for the data X, defined by
More informationParameter Estimation
Parameter Estimation Consider a sample of observations on a random variable Y. his generates random variables: (y 1, y 2,, y ). A random sample is a sample (y 1, y 2,, y ) where the random variables y
More informationBINF6201/8201. Molecular phylogenetic methods
BINF60/80 Molecular phylogenetic methods 0706 Phylogenetics Ø According to the evolutionary theory, all life forms on this planet are related to one another by descent. Ø Traditionally, phylogenetics
More informationChapter 10: Inferences based on two samples
November 16 th, 2017 Overview Week 1 Week 2 Week 4 Week 7 Week 10 Week 12 Chapter 1: Descriptive statistics Chapter 6: Statistics and Sampling Distributions Chapter 7: Point Estimation Chapter 8: Confidence
More informationSTT 843 Key to Homework 1 Spring 2018
STT 843 Key to Homework Spring 208 Due date: Feb 4, 208 42 (a Because σ = 2, σ 22 = and ρ 2 = 05, we have σ 2 = ρ 2 σ σ22 = 2/2 Then, the mean and covariance of the bivariate normal is µ = ( 0 2 and Σ
More informationReview. December 4 th, Review
December 4 th, 2017 Att. Final exam: Course evaluation Friday, 12/14/2018, 10:30am 12:30pm Gore Hall 115 Overview Week 2 Week 4 Week 7 Week 10 Week 12 Chapter 6: Statistics and Sampling Distributions Chapter
More informationPhylogenetic Tree Reconstruction
I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven
More informationStatistical Methods for Particle Physics Lecture 1: parameter estimation, statistical tests
Statistical Methods for Particle Physics Lecture 1: parameter estimation, statistical tests http://benasque.org/2018tae/cgibin/talks/allprint.pl TAE 2018 Benasque, Spain 315 Sept 2018 Glen Cowan Physics
More informationMaximum Likelihood (ML) Estimation
Econometrics 2 Fall 2004 Maximum Likelihood (ML) Estimation Heino Bohn Nielsen 1of32 Outline of the Lecture (1) Introduction. (2) ML estimation defined. (3) ExampleI:Binomialtrials. (4) Example II: Linear
More information1 Mixed effect models and longitudinal data analysis
1 Mixed effect models and longitudinal data analysis Mixed effects models provide a flexible approach to any situation where data have a grouping structure which introduces some kind of correlation between
More informationProbabilistic modeling and molecular phylogeny
Probabilistic modeling and molecular phylogeny Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis Technical University of Denmark (DTU) What is a model? Mathematical
More informationParametric Modelling of Overdispersed Count Data. Part III / MMath (Applied Statistics) 1
Parametric Modelling of Overdispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson
More informationNotes on Machine Learning for and
Notes on Machine Learning for 16.410 and 16.413 (Notes adapted from Tom Mitchell and Andrew Moore.) Choosing Hypotheses Generally want the most probable hypothesis given the training data Maximum a posteriori
More informationPhylogenetics: Building Phylogenetic Trees
1 Phylogenetics: Building Phylogenetic Trees COMP 571 Luay Nakhleh, Rice University 2 Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary model should
More informationSTAT 135 Lab 6 Duality of Hypothesis Testing and Confidence Intervals, GLRT, Pearson χ 2 Tests and QQ plots. March 8, 2015
STAT 135 Lab 6 Duality of Hypothesis Testing and Confidence Intervals, GLRT, Pearson χ 2 Tests and QQ plots March 8, 2015 The duality between CI and hypothesis testing The duality between CI and hypothesis
More informationUnsupervised machine learning
Chapter 9 Unsupervised machine learning Unsupervised machine learning (a.k.a. cluster analysis) is a set of methods to assign objects into clusters under a predefined distance measure when class labels
More information