Exam Dates: February 19 and March 1. If those dates don't work: ad hoc in Heidelberg


1 Exam Dates
February 19 and March 1
If those dates don't work: ad hoc in Heidelberg

2 Plan for next lectures
Lecture 11: More on Likelihood Models & Parallel Computing in Phylogenetics
Lecture 12 (Andre): Discrete Operations on Trees; Alexis will not be there
Lecture 13 (Pavlos & Alexis): Population genetics I
Lecture 14 (Pavlos & Alexis): Population genetics II
Lecture 15 (Alexis): Wrap-up & exam preparation

3 Likelihood-based Phylogenetic Inference so far
Search Techniques
The Maximum Likelihood Model
Some simple DNA models: Jukes-Cantor, General Time Reversible (GTR) Model

4 Outline for today
More on Likelihood Models: Computing the likelihood on a tree; Rate heterogeneity among sites; Model Selection; Protein Substitution Models
Optimizing and Parallelizing Likelihood Calculations: Parallelization; Load Balance; Saving RAM
Course evaluation

5 Maximum Likelihood
[Figure: alignment of Seq1, Seq2, Seq3, Seq4 with length m]

6 Maximum Likelihood
[Figure: the alignment plus a 4x4 substitution model over the states A, C, G, T]

7 Maximum Likelihood
The substitution model is commonly denoted as the Q matrix: it gives the transition probabilities for an infinitesimal time dt; for time t: P(t) = e^{Qt}
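To make P(t) = e^{Qt} concrete, here is a minimal sketch in pure Python. It assumes the Jukes-Cantor model with Q scaled to one expected substitution per unit time, and it approximates the matrix exponential with a truncated Taylor series plus scaling and squaring. The function names and the choice of method are illustrative assumptions; likelihood programs usually rely on an eigendecomposition of Q instead.

```python
import math

def jc_rate_matrix():
    """Jukes-Cantor Q: equal off-diagonal rates, rows sum to zero (assumed scaling)."""
    off = 1.0 / 3.0
    return [[-1.0 if i == j else off for j in range(4)] for i in range(4)]

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transition_matrix(q, t, squarings=10, terms=12):
    """Approximate P(t) = exp(Q t): Taylor series of exp(Q t / 2^s), then square s times."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = mat_mul(term, a)
        term = [[x / k for x in row] for row in term]   # term is now a^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

p = transition_matrix(jc_rate_matrix(), 0.1)
```

For Jukes-Cantor the result can be checked against the closed form: the diagonal entries equal 1/4 + 3/4 * exp(-4t/3), and each row of P(t) sums to 1.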

8 Maximum Likelihood
Prior probabilities (empirical base frequencies): πa, πc, πg, πt

9 Maximum Likelihood
[Figure: unrooted tree connecting Seq 1, Seq 2, Seq 3, Seq 4 with branch lengths b1, b2, b3, b4, b5]

10 Maximum Likelihood
[Figure: the same tree with a virtual root (vr) placed on a branch]

11 Maximum Likelihood
[Figure: at the inner nodes, vectors of length m with entries P(A), P(C), P(G), P(T) per alignment site]

12 Maximum Likelihood
Computing these vectors is floating-point and memory intensive

13-17 Post-order Traversal
[Figure: animation of a post-order traversal toward the virtual root, starting at the tips (example tip sequence: AGCC); completed vectors are marked with :-)]

18 What happens when we compute this inner vector?
[Figure: at alignment position c, the inner vector L^(k) with entries P(A), P(C), P(G), P(T) is computed from the child vectors L^(i) and L^(j), using the transition matrices P(b_i) and P(b_j) for the branch lengths b_i and b_j]
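The computation of one inner vector entry can be sketched as follows, assuming Jukes-Cantor transition probabilities and two tip children whose observed bases are encoded as 0/1 vectors. All names and the branch lengths 0.1 and 0.2 are hypothetical and chosen only for illustration.

```python
import math

def jc_p(t):
    """Jukes-Cantor transition probability matrix for branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def inner_vector(l_i, p_i, l_j, p_j):
    """For one site: L^(k)[a] = (sum_x P_i[a][x] L^(i)[x]) * (sum_y P_j[a][y] L^(j)[y])."""
    return [
        sum(p_i[a][x] * l_i[x] for x in range(4))
        * sum(p_j[a][y] * l_j[y] for y in range(4))
        for a in range(4)
    ]

# Tip vectors: probability 1 for the observed base, 0 otherwise (order A, C, G, T).
TIP = {"A": [1.0, 0.0, 0.0, 0.0], "C": [0.0, 1.0, 0.0, 0.0],
       "G": [0.0, 0.0, 1.0, 0.0], "T": [0.0, 0.0, 0.0, 1.0]}

l_k = inner_vector(TIP["A"], jc_p(0.1), TIP["C"], jc_p(0.2))
```

Here l_k[0] is the probability of observing A and C at the two tips given that the inner node is in state A, which is P_AA(0.1) * P_AC(0.2).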

19-26 Post-order Traversal
[Figure: the animation continues; inner vectors are completed one by one (marked with :-)) until the virtual root is reached]
Overall likelihood: sum over logs of per-site likelihoods
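A complete post-order traversal on a small tree might look like the sketch below. It assumes the Jukes-Cantor model with equal base frequencies and a hypothetical nested-tuple tree encoding: an inner node is a pair of (child, branch length) entries and a tip is its sequence string. The tip sequence AGCC is taken from the slides; the other sequences and all branch lengths are made up.

```python
import math

BASES = "ACGT"

def jc_p(t):
    """Jukes-Cantor transition probability matrix for branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def clv(node, site):
    """Post-order recursion: conditional likelihood vector of `node` at column `site`."""
    if isinstance(node, str):                       # tip: observed sequence
        return [1.0 if base == node[site] else 0.0 for base in BASES]
    (left, b_left), (right, b_right) = node         # inner node: two children
    l_left, l_right = clv(left, site), clv(right, site)
    p_left, p_right = jc_p(b_left), jc_p(b_right)
    return [
        sum(p_left[a][x] * l_left[x] for x in range(4))
        * sum(p_right[a][y] * l_right[y] for y in range(4))
        for a in range(4)
    ]

def log_likelihood(root, n_sites, freqs=(0.25, 0.25, 0.25, 0.25)):
    """Sum over the logs of the per-site likelihoods at the virtual root."""
    total = 0.0
    for site in range(n_sites):
        l_root = clv(root, site)
        total += math.log(sum(f * l for f, l in zip(freqs, l_root)))
    return total

cherry_12 = (("AGCC", 0.1), ("AGCT", 0.1))
cherry_34 = (("ACCC", 0.1), ("ACCT", 0.2))
root = ((cherry_12, 0.05), (cherry_34, 0.05))       # virtual root in the middle branch
lnl = log_likelihood(root, 4)
```

Because Jukes-Cantor is time-reversible, sliding the virtual root along its branch (for example to branch lengths 0.1 and 0.0) leaves the score unchanged.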

27-30 Post-order Traversal
[Figure: tree labels LS3, LS4, LS5 and x; the branches b1 and b4 are replaced by b1' and b4']
b4' := b1 + b4
b1' := 0

31 Maximum Likelihood
[Figure: alignment, substitution model, prior probabilities (empirical base frequencies πa, πc, πg, πt), and the tree of Seq 1 to Seq 4]
Optimize branch lengths

32-40 Branch Length Optimization
[Figure: animation that visits the branches of the tree one after another, beginning at the starting branch]
Essentially we place the virtual root into the branch that is being optimized
Iterate until convergence
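Optimizing a single branch length can be illustrated on the simplest possible case: two sequences under Jukes-Cantor, where the virtual root sits on the only branch. The sketch below applies Newton-Raphson to the log likelihood as a function of the branch length t. The function name, the starting value, and the site counts are assumptions made for illustration.

```python
import math

def optimize_branch_length(n_same, n_diff, t=0.1, tol=1e-10, max_iter=50):
    """Newton-Raphson on lnL(t) = n_same*log(s(t)) + n_diff*log(d(t)) + const (JC)."""
    for _ in range(max_iter):
        e = math.exp(-4.0 * t / 3.0)
        s, d = 0.25 + 0.75 * e, 0.25 - 0.25 * e       # P(same base), P(specific other base)
        ds, dd = -e, e / 3.0                           # first derivatives w.r.t. t
        d2s, d2d = 4.0 * e / 3.0, -4.0 * e / 9.0       # second derivatives w.r.t. t
        first = n_same * ds / s + n_diff * dd / d
        second = (n_same * (d2s / s - (ds / s) ** 2)
                  + n_diff * (d2d / d - (dd / d) ** 2))
        t_new = t - first / second
        if t_new <= 0.0:                               # keep the branch length positive
            t_new = t / 2.0
        if abs(t_new - t) < tol:
            return t_new
        t = t_new
    return t

# Hypothetical data: 90 identical and 10 differing alignment columns.
t_hat = optimize_branch_length(90, 10)
```

For this two-sequence case the optimum is also known in closed form, t = -3/4 * ln(1 - 4p/3) with p the fraction of differing sites, which gives a convenient check. On a real tree the same iteration is applied to one branch after another until the overall score converges.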

41 Maximum Likelihood
[Figure: alignment, substitution model, prior probabilities (empirical base frequencies πa, πc, πg, πt), and the tree of Seq 1 to Seq 4]
Optimize model parameters

42 Maximum Likelihood
Optimize model parameters
We don't need to re-estimate model parameters for every tree topology as long as the tree is reasonable, i.e., non-random

43 Numerical Optimization Procedures
See chapters 9 & 10 of: "Numerical Recipes in C: The Art of Scientific Computing"

44 Basic Operations Maximum Likelihood
Compute Conditional Likelihood Array at an inner node
Compute Likelihood at Virtual Root
Optimize a Branch Length for a given Branch
Optimize all Branch Lengths
Optimize other Model Parameters

45 Basic Operations Maximum Likelihood
Bayesian programs only require two of these operations

46 Rate Heterogeneity among Sites
Gene:
ACGGGGGGGGGGGTTTTCCCCC
ATGGGGGGGGGGGTTTCCCCCC
ACCGGGGGGGGGGTTTTGCCCC
AGGGGGGGGGGGCTTTTCCCCC
Among-site rate heterogeneity is a biological phenomenon: different sites/columns evolve at different speeds
We need to accommodate this in our models

47 Γ-Distribution

48 Γ-Distribution
Small α: high rate heterogeneity
Large α: low rate heterogeneity

49 Discrete Γ-Distribution
[Figure: the distribution is split into four categories with rates r0, r1, r2, r3]

50 An Abstract View of Γ
[Figure: an alignment of length 7 is evaluated once per rate category]
rate 0: P(t) = e^{Q r0 t}
rate 1: P(t) = e^{Q r1 t}
rate 2: P(t) = e^{Q r2 t}
rate 3: P(t) = e^{Q r3 t}
This is the integral of the likelihood, which we approximate via discretization
Log likelihood at site i: LnL(i) = log(¼ * (L0 + L1 + L2 + L3))
All Γ rates have equal probability
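The averaging over the four rate categories can be sketched for a single alignment column of two sequences under Jukes-Cantor. The four rates below are illustrative values with mean 1; in practice they would be derived from the shape parameter α, which this sketch does not do. Function names are hypothetical.

```python
import math

def jc_prob(x, y, t):
    """Jukes-Cantor probability of base x changing into base y over branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e

def site_log_likelihood_gamma(x, y, t, rates):
    """LnL(i) = log(mean of the per-rate likelihoods); every rate has equal probability."""
    per_rate = [0.25 * jc_prob(x, y, r * t) for r in rates]   # 0.25 = base frequency
    return math.log(sum(per_rate) / len(per_rate))

RATES = [0.0334, 0.2519, 0.8203, 2.8944]   # illustrative category rates, mean 1

lnl_same = site_log_likelihood_gamma("A", "A", 0.3, RATES)
lnl_diff = site_log_likelihood_gamma("A", "G", 0.3, RATES)
```

Each category scales the branch length by its rate, so the likelihood has to be evaluated once per category, which is where the additional memory and floating-point cost comes from.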

51 An Abstract View of Γ
4 times higher memory consumption

52 An Abstract View of Γ
4 times more floating point operations

53 Γ Model of Rate Heterogeneity with 4 discrete rates
[Figure: the shape parameter α determines the four rates r0, r1, r2, r3]

54 Another Famous Computational Evolutionary Biologist
Ziheng Yang, the guy who has written this nice book
Z. Yang: "Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods", Journal of Molecular Evolution 39 (3)
[Photo: at LMU Munich during student protests]

55 An Abstract View of CAT
[Figure: alignment of length 7]

56 CAT: ML-based Optimization of Individual Rates per Site
r0, r1, ..., rm

57 CAT Step 1: ML-based Optimization of Individual Rates per Site
r0, r1, ..., rm
A. Stamatakis: "Phylogenetic models of rate heterogeneity: a high performance computing perspective", Proceedings of IPDPS

58 CAT Step 2: Order List of ri according to site log likelihoods li
([r0, -1.2], [r1, -5.6], [r2, -2.5], [r3, -0.3])
Sort according to the log likelihood contributions of the sites:
([r1, -5.6], [r2, -2.5], [r0, -1.2], [r3, -0.3])

59 CAT Step 3: Categorize; Example with 2 Rate Categories (default: 25)
([r1, -5.6], [r2, -2.5], [r0, -1.2], [r3, -0.3])
Select the 2 rates with the highest likelihood contributions:
rc0 := r1, category[1] := 0
rc1 := r2, category[2] := 1
The index refers to the alignment column position
Assign the remaining positions 0 and 3 to the rate category (0 or 1) that minimizes (δ(r1, r0), δ(r2, r0)) and (δ(r1, r3), δ(r2, r3)), respectively
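Steps 2 and 3 can be sketched as below. The site log likelihoods are the ones from the example; the per-site rates are made up, and δ is assumed to be the absolute difference between two rates, which the slides do not specify. The function name is hypothetical.

```python
def categorize(rates, site_lnl, n_categories):
    """Sort sites by log likelihood, take the first n_categories rates as category
    rates, then assign every other site to the category with the closest rate."""
    order = sorted(range(len(rates)), key=lambda i: site_lnl[i])   # as in Step 2
    chosen = order[:n_categories]
    category_rates = [rates[i] for i in chosen]
    category = [None] * len(rates)
    for cat, site in enumerate(chosen):
        category[site] = cat
    for site in order[n_categories:]:
        # delta assumed to be |rate of category - rate of site|
        category[site] = min(range(n_categories),
                             key=lambda c: abs(category_rates[c] - rates[site]))
    return category_rates, category

site_lnl = [-1.2, -5.6, -2.5, -0.3]     # values from the example
rates = [0.5, 2.0, 1.0, 0.2]            # hypothetical per-site rates r0..r3
category_rates, category = categorize(rates, site_lnl, 2)
```

With these made-up rates, positions 1 and 2 define the categories 0 and 1, and positions 0 and 3 both end up in category 1 because their rates are closer to r2 than to r1.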

60 ML-Loops
CAT-LOOP:
    for (i = 0; i < n; i++) {
        cat = category[i];   /* rate category of site i */
        rc = rate[cat];      /* a single rate per site */
        p[i] = f(q[i], pq, r[i], pr, rc);
    }
Γ-LOOP:
    for (i = 0; i < n; i++) {
        /* four rates per site: four results per vector entry */
        p[i].g0 = f(q[i], pq, r[i], pr, r0);
        p[i].g1 = f(q[i], pq, r[i], pr, r1);
        p[i].g2 = f(q[i], pq, r[i], pr, r2);
        p[i].g3 = f(q[i], pq, r[i], pr, r3);
    }

61 715 Sequences under HKY85+Γ
[Figure: panels "Log Likelihood Score under Γ" and "Execution Time"]
C.E. Robertson et al. (2005): "Phylogenetic diversity and ecology of environmental Archaea", Current Opinion in Microbiology

62 Model Selection
How do we select a model? How do we avoid over-parameterizing a model?
Typically, a model with more parameters yields better likelihood scores
A better likelihood alone is not a criterion for selecting a more complex model: we may over-parameterize the model
Key question: Does a more complex model yield a different tree?
Model selection criteria:
Likelihood-ratio test: can only be applied to nested models!
More flexible criteria that do not require nested models:
AIC: Akaike Information Criterion
BIC: Bayesian Information Criterion
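As a small worked example of the two criteria, the sketch below uses the standard definitions AIC = 2k - 2 lnL and BIC = k ln(n) - 2 lnL, where k is the number of free parameters and n the number of alignment sites. The log likelihood scores and parameter counts of the three candidate models are made up for illustration.

```python
import math

def aic(lnl, k):
    """Akaike Information Criterion: lower is better."""
    return 2 * k - 2 * lnl

def bic(lnl, k, n):
    """Bayesian Information Criterion: lower is better, penalty grows with ln(n)."""
    return k * math.log(n) - 2 * lnl

# Hypothetical candidates: (log likelihood, number of free parameters).
candidates = {"JC": (-1050.0, 5), "GTR": (-1040.0, 13), "GTR+G": (-1039.5, 14)}
n_sites = 1000

best_aic = min(candidates, key=lambda m: aic(*candidates[m]))
best_bic = min(candidates, key=lambda m: bic(*candidates[m], n_sites))
```

With these numbers the most complex model has the best likelihood but wins under neither criterion: AIC prefers GTR and BIC, with its stronger penalty, prefers JC.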

63 Protein Substitution Models
The GTR Q matrix for protein data has 189 free parameters instead of just 5 (DNA)
Estimating 189 rate parameters is difficult, time-consuming, and may lead to over-parameterizing the model
Instead, empirical models such as JTT, LG, WAG, MTMAM, etc. are used
The Q matrices are obtained by jointly optimizing model parameters on a large collection of reference alignments
The models differ with respect to:
the amount of data used to obtain them
the type of data on which the models have been optimized, e.g., dedicated models for HIV, FLU, Mammals
the numerical optimization methods used
Examples of general models:
LG: Le & Gascuel: "An Improved General Amino Acid Replacement Matrix"
WAG: Whelan & Goldman: "A General Empirical Model of Protein Evolution Derived from Multiple Protein Families Using a Maximum-Likelihood Approach"
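The parameter counts follow from the size of the symmetric rate matrix: for s states there are s(s-1)/2 exchangeability rates, and one of them is fixed because only relative rates matter. A short sketch of that arithmetic (the function name is hypothetical):

```python
def gtr_free_rates(n_states):
    """Free rate parameters of a time-reversible Q matrix: s(s-1)/2 rates, one fixed."""
    return n_states * (n_states - 1) // 2 - 1

dna_rates = gtr_free_rates(4)        # 6 rates, 1 fixed
protein_rates = gtr_free_rates(20)   # 190 rates, 1 fixed
```

This gives 5 for DNA and 189 for amino acids, matching the counts above.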

64 Floating Point Numbers Machine numbers are an imperfect mapping of the infinite real numbers to a finite number of machine values!

65 Floating Point Arithmetics: The Root of All Evil
Computational science mostly relies on floating-point intensive codes
How do we verify these codes? We stand on shaky grounds
Scientists using those codes assume that there are no bugs
Double precision arithmetics is required for certain applications
Who knows what de-normalized floating point numbers are?
Please have a look at: J. Björndalen, O. Anshus: "Trusting floating point benchmarks - are your benchmarks really data-independent?", Applied Parallel Computing. State of the Art in Scientific Computing 2010, Springer
and at my micro-benchmark at:

66 Post-order Traversal
Values get smaller and smaller as we approach the virtual root

67 Post-order Traversal
Values get smaller and smaller as we approach the virtual root
We need to apply numerical scaling techniques to avoid underflow!
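One common way to avoid underflow is to multiply a vector by a large constant whenever all of its entries fall below a threshold, count how often this happened, and undo the scaling in log space at the end. The sketch below assumes a threshold of 2^-256 and simulates 2000 shrinking steps with a made-up factor of 0.001 per step; both numbers and all names are illustrative assumptions.

```python
import math

SCALE_EXP = 256
THRESHOLD = 2.0 ** -SCALE_EXP     # assumed scaling threshold
SCALE = 2.0 ** SCALE_EXP          # power of two, so the multiplication is exact

def rescale(vector, count):
    """Scale the vector up when all entries are tiny and remember how often."""
    if max(vector) < THRESHOLD:
        return [v * SCALE for v in vector], count + 1
    return vector, count

vector, scalings = [1.0] * 4, 0
naive = [1.0] * 4
for _ in range(2000):             # stand-in for moving toward the virtual root
    vector = [v * 1e-3 for v in vector]
    naive = [v * 1e-3 for v in naive]
    vector, scalings = rescale(vector, scalings)

# Undo the scaling in log space: log(true value) = log(scaled) - count * log(2^256).
log_value = math.log(vector[0]) - scalings * SCALE_EXP * math.log(2.0)
```

The unscaled product collapses to exactly 0.0 in double precision, so its logarithm is undefined, while the scaled version still yields the correct log value of 2000 * log(0.001).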

68 Optimization
Use SSE3 & AVX vector intrinsics
Special implementations for: TIP-TIP, TIP-INNER, INNER-INNER
I will spare you the details
But: avoid redundant computations

69-73 Loop Level Parallelism
[Figure: animation on the tree with the vectors P, Q, and R near the virtual root]
P[i] = f(Q[i], R[i])
This operation uses 95% of total execution time!

74 Parallel Post-order Traversal
We only need to synchronize at the virtual root: MPI_Reduce() to calculate Σ log(li)

75 Parallel Post-order Traversal
Overall Score = Σ log(li) + Σ log(li), i.e., the sum of the partial sums computed by the individual processors
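The reduction at the virtual root can be simulated without MPI. In the sketch below every worker sums the logs of its own sites, assigned cyclically, and a plain Python sum stands in for MPI_Reduce() with a sum operation. The per-site likelihoods and the number of workers are made up, and nothing here runs in parallel.

```python
import math

def partial_log_likelihood(site_likelihoods, rank, n_workers):
    """Worker `rank` handles sites rank, rank + n_workers, ... (cyclic distribution)."""
    return sum(math.log(l) for l in site_likelihoods[rank::n_workers])

def reduce_sum(values):
    """Stand-in for MPI_Reduce with a sum operation; no real communication happens."""
    return sum(values)

site_likelihoods = [0.02, 0.5, 0.01, 0.3, 0.25, 0.04, 0.6]   # hypothetical l_i
n_workers = 3
partials = [partial_log_likelihood(site_likelihoods, r, n_workers)
            for r in range(n_workers)]
overall = reduce_sum(partials)
```

Since sites are independent given the tree and the model, the partial sums can be computed without any communication until this single reduction.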

76 Parallelization Outline

77 Loop-Level Parallelism on Distributed Memory Machines

78 Classic Fork-Join with Busy-Wait
[Figure: timeline of the phases listed below; worker threads busy-wait between phases]
Broadcast Traversal; Trigger; Compute all vectors in Traversal descriptor; Reduce Σ log(li); Barrier
Broadcast Traversal; Trigger; Compute all vectors in Traversal descriptor; Reduce Σ log(li); Barrier
Broadcast new α params.; Trigger; Compute all vectors in Traversal descriptor; Reduce Σ log(li); Barrier

79 Synchronizations in RAxML with Pthreads
RAxML Pthreads, for a run time of about 10 seconds on 16 cores/threads:
404 taxa, 7429 sites: 194,000 Barriers
1481 taxa, 1241 sites: 739,000 Barriers
A paper on the performance of alternative Pthreads barrier implementations: S.A. Berger, A. Stamatakis: "Assessment of Barrier Implementations for Fine-Grain Parallel Regions on Current Multi-core Architectures", IEEE Cluster 2010.

80-81 Classic Fork-Join with Busy-Wait
[Figure: the fork-join timeline from slide 78, annotated with a tree traversal over the nodes A, B, C, D and an example parameter broadcast α := 0.2]
For good parallel performance: the broadcast must be fast!

82 Parallel Performance Problems They all start with partitioned datasets!

83-85 What is a partitioned dataset?
A multi-gene or whole-genome alignment that is split into partitions, e.g., Gene 0, Gene 1, Gene 2, Gene 3, Gene 4
We may also partition by 1st, 2nd, and 3rd codon position

86-87 What is a partitioned dataset?
Each partition has its own model parameters: Gene 0: α0, GTR0; Gene 1: α1, GTR1; Gene 2: α2, GTR2; Gene 3: α3, GTR3; Gene 4: α4, GTR4
Joint branch length estimate

88-90 Assigning Protein Models?
Each of Gene 0 to Gene 4 can be assigned WAG, LG, JTT, ...
Joint branch length estimate
There exist about 20 empirical protein models: 20^5 possible assignments!
Because partitions are linked via branch lengths, this is NP-hard
Proof: joint work with Kassian
Heuristics: a Master's thesis in our lab

91-93 What is a partitioned dataset?
Gene 0: α0, GTR0; Gene 1: α1, GTR1; Gene 2: α2, GTR2; Gene 3: α3, GTR3; Gene 4: α4, GTR4
Per-partition branch length estimate
Same underlying tree topology!
A per-partition branch length estimate increases the number of parameters in the likelihood model by 4 * (2n - 3), where n is the number of taxa

94 Parallel Performance Problems
They all start with partitioned datasets!
How do we distribute partitions to processors?
How do we calculate parameter changes?
How much time does our broadcast take?
Goal: keep all processors busy all the time; minimize communication and synchronization!

95 Data Distribution
[Figure: partitioned alignment of Orangutan, Gorilla, Chimp, Homo Sapiens; two CPUs with caches attached to a shared memory]

96-97 Data Distribution
[Figure: the same alignment; two CPUs with caches, each with its own distributed memory]
Partitioned data distribution is not that trivial!

98-99 Data Distribution I
[Figure: the alignment distributed over two CPUs with distributed memory]
Works well when we have more partitions than processors
May lead to load imbalance: not all processors obtain an equal number of sites!

100-103 Data Distribution II
[Figure: the alignment distributed over two CPUs with distributed memory; both CPUs hold e^{Q1 t} and e^{Q2 t}]
Works well when we have more processors than partitions
However, we will need to compute P(t) = e^{Qt} for each partition at each processor!
Performance impact depends on the number of states in the data, i.e., the dimension of Q
How do we distribute partitions to processors?
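The trade-off between the two schemes can be sketched by counting, per processor, the number of sites and the number of distinct partitions it touches (each touched partition needs its own P(t) = e^{Qt}). The sketch assumes that the first scheme hands out whole partitions round-robin and that the second distributes sites cyclically; partition lengths and function names are hypothetical.

```python
def whole_partitions(partition_lengths, n_procs):
    """Assign whole partitions round-robin: few P(t) per processor, uneven site counts."""
    sites, parts = [0] * n_procs, [0] * n_procs
    for part, length in enumerate(partition_lengths):
        proc = part % n_procs
        sites[proc] += length
        parts[proc] += 1
    return sites, parts

def cyclic_sites(partition_lengths, n_procs):
    """Assign sites cyclically: even site counts, but many partitions per processor."""
    sites = [0] * n_procs
    touched = [set() for _ in range(n_procs)]
    global_site = 0
    for part, length in enumerate(partition_lengths):
        for _ in range(length):
            proc = global_site % n_procs
            sites[proc] += 1
            touched[proc].add(part)
            global_site += 1
    return sites, [len(t) for t in touched]

lengths = [8, 2, 3]                         # hypothetical partition lengths in sites
by_partition = whole_partitions(lengths, 2)
by_site = cyclic_sites(lengths, 2)
```

With these lengths the per-partition scheme yields 11 versus 2 sites on the two processors, while the cyclic scheme yields 7 versus 6 sites but forces both processors to handle all three partitions.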

104-106 Load Balance I
[Figure: partitions G0, G1, G2, G3 assigned to processors P0 and P1]
Find the partition-to-processor assignment such that the maximum number of sites per processor is minimized: this is NP-hard

107 Load Balance I
The multiprocessor job scheduling problem in phylogenetics
Problem when #partitions >> #cores
Tested per-site (cyclic/modulo) data distribution versus per-partition data distribution
We used the Longest Processing Time (LPT) heuristic for assigning partitions to processors
25 taxa, 220,000 sites, 100 genes:
GAMMA model: naïve: 613 secs, LPT: 550 secs
CAT model: naïve: 298 secs, LPT: 127 secs
Larger protein dataset under the Γ model of rate heterogeneity: 10-fold performance improvement!
J. Zhang, A. Stamatakis: "The Multi-Processor Scheduling Problem in Phylogenetics", 11th IEEE HICOMB workshop (in conjunction with IPDPS 2012).

108 LPT heuristic for multi-processor scheduling
Sort jobs (partitions) by processing length (partition length) in decreasing order
Remove a job (partition) from the sorted list and assign it to the processor with the earliest end time (the smallest sum of partition lengths)
Repeat until the sorted list is empty
Upper bound: (4/3 - 1/(3p)) * OPT, where p is the number of processors
Graham, R. L.: "Bounds on Multiprocessing Timing Anomalies". SIAM Journal on Applied Mathematics 17 (2)
Remark: LPT works surprisingly well (see our paper on the phylogenetic problem where we also tested other heuristics)
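The steps above can be sketched with a min-heap that always yields the currently least loaded processor. This is a minimal illustration with made-up partition lengths, not the implementation used in the cited paper; the function name is hypothetical.

```python
import heapq

def lpt_schedule(partition_lengths, n_procs):
    """Longest Processing Time first: largest partition goes to the least loaded processor."""
    loads = [(0, proc) for proc in range(n_procs)]     # (sum of lengths, processor id)
    heapq.heapify(loads)
    assignment = {proc: [] for proc in range(n_procs)}
    by_length = sorted(enumerate(partition_lengths), key=lambda item: -item[1])
    for part, length in by_length:
        load, proc = heapq.heappop(loads)              # processor with earliest end time
        assignment[proc].append(part)
        heapq.heappush(loads, (load + length, proc))
    makespan = max(load for load, _ in loads)          # maximum number of sites per processor
    return assignment, makespan

lengths = [7, 5, 4, 3, 3, 2]                           # hypothetical partition lengths
assignment, makespan = lpt_schedule(lengths, 2)
```

For this input both processors end up with 12 sites, which is optimal because the 24 sites cannot be split more evenly over two processors.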

109 Partitioned Branch Lengths & other parameters

110 Load-Balance II Zoom

111 Synchronization Points
Assume 10 branches
Each branch requires 10 Newton-Raphson iterations
Each NR iteration requires a synchronization via a reduction operation
One branch/partition at a time: 100 sync. points, less work (only one partition) per sync. point
All branches concurrently: 10 sync. points, more work per sync. point
Branches will need distinct numbers of operations
Add a convergence state bit vector
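The effect of the convergence state bit vector on the number of synchronization points can be sketched by simple counting. The sketch assumes that each optimization needs a known number of Newton-Raphson iterations and that one reduction per round serves all optimizations that have not yet converged; the iteration counts and function names are hypothetical.

```python
def sync_points_one_at_a_time(iterations):
    """Every iteration of every optimization costs its own reduction."""
    return sum(iterations)

def sync_points_concurrent(iterations):
    """One reduction per round for all unconverged entries, tracked by a bit vector."""
    remaining = list(iterations)
    converged = [n == 0 for n in remaining]       # convergence state bit vector
    syncs = 0
    while not all(converged):
        for i, done in enumerate(converged):
            if not done:
                remaining[i] -= 1                 # one Newton-Raphson step
                converged[i] = remaining[i] == 0
        syncs += 1                                # single reduction for this round
    return syncs

ten_branches = [10] * 10                          # 10 branches, 10 iterations each
uneven = [4, 3]                                   # hypothetical uneven iteration counts
```

For ten optimizations with ten iterations each this gives 100 versus 10 synchronization points; with uneven counts such as 4 and 3 iterations it gives 7 versus 4, because converged entries simply sit out the remaining rounds.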

112-117 Synchronization Points
[Figure: animation on a two-partition alignment (Org1: AC | GT, Org2: AC | TT) with a convergence bit vector, e.g., 00]
In this example: 4 instead of 7 sync points!

118 Load Balance II A. Stamatakis, M. Ott: "Load Balance in the Phylogenetic Likelihood Kernel". Proceedings of ICPP 2009, Vienna, Austria, September 2009.

119 Classic Fork-Join with Busy-Wait
[Figure: the fork-join timeline with the tree traversal over the nodes A, B, C, D and the parameter broadcast α := 0.2]
For good parallel performance: the broadcast must be fast!
Remember: 10 secs, 16 cores: approx. 500,000 times.
What happens if we have 1000 partitions and propose 1000 new alpha parameters?

120 Alternative MPI parallelization
[Figure: execution timelines of P0 and P1 with MPI_Reduce() and MPI_Bcast() calls]

121 Alternative MPI parallelization
I think this is the way we will have to do it in the future.

122 ExaML
New code implementing this new parallelization scheme
A. Stamatakis, A. J. Aberer: "Novel Parallelization Schemes for Large-Scale Likelihood-based Phylogenetic Inference", accepted for publication at IPDPS 2013, Boston, USA
Up to 3 times faster than RAxML-Light (2012) on large, partitioned datasets
Tested with up to 1536 cores on our cluster at HITS
Future developments: 20,000,000 CPU hours on SuperMUC for
Improving scalability
Implementing fault tolerance
Executing 1KITE tree inferences
Further details: ask Andre Aberer


More information

BIOINFORMATICS TRIAL EXAMINATION MASTERS KT-OR

BIOINFORMATICS TRIAL EXAMINATION MASTERS KT-OR BIOINFORMATICS KT Maastricht University Faculty of Humanities and Science Knowledge Engineering Study TRIAL EXAMINATION MASTERS KT-OR Examiner: R.L. Westra Date: March 30, 2007 Time: 13:30 15:30 Place:

More information

Probabilistic modeling and molecular phylogeny

Probabilistic modeling and molecular phylogeny Probabilistic modeling and molecular phylogeny Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis Technical University of Denmark (DTU) What is a model? Mathematical

More information

Today s Lecture: HMMs

Today s Lecture: HMMs Today s Lecture: HMMs Definitions Examples Probability calculations WDAG Dynamic programming algorithms: Forward Viterbi Parameter estimation Viterbi training 1 Hidden Markov Models Probability models

More information

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2016 University of California, Berkeley. Parsimony & Likelihood [draft]

Integrative Biology 200 PRINCIPLES OF PHYLOGENETICS Spring 2016 University of California, Berkeley. Parsimony & Likelihood [draft] Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2016 University of California, Berkeley K.W. Will Parsimony & Likelihood [draft] 1. Hennig and Parsimony: Hennig was not concerned with parsimony

More information

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Paul has many great tools for teaching phylogenetics at his web site: http://hydrodictyon.eeb.uconn.edu/people/plewis

More information

INFORMATION-THEORETIC BOUNDS OF EVOLUTIONARY PROCESSES MODELED AS A PROTEIN COMMUNICATION SYSTEM. Liuling Gong, Nidhal Bouaynaya and Dan Schonfeld

INFORMATION-THEORETIC BOUNDS OF EVOLUTIONARY PROCESSES MODELED AS A PROTEIN COMMUNICATION SYSTEM. Liuling Gong, Nidhal Bouaynaya and Dan Schonfeld INFORMATION-THEORETIC BOUNDS OF EVOLUTIONARY PROCESSES MODELED AS A PROTEIN COMMUNICATION SYSTEM Liuling Gong, Nidhal Bouaynaya and Dan Schonfeld University of Illinois at Chicago, Dept. of Electrical

More information

Reconstruire le passé biologique modèles, méthodes, performances, limites

Reconstruire le passé biologique modèles, méthodes, performances, limites Reconstruire le passé biologique modèles, méthodes, performances, limites Olivier Gascuel Centre de Bioinformatique, Biostatistique et Biologie Intégrative C3BI USR 3756 Institut Pasteur & CNRS Reconstruire

More information

Tree of Life iological Sequence nalysis Chapter http://tolweb.org/tree/ Phylogenetic Prediction ll organisms on Earth have a common ancestor. ll species are related. The relationship is called a phylogeny

More information

Lecture 11 Friday, October 21, 2011

Lecture 11 Friday, October 21, 2011 Lecture 11 Friday, October 21, 2011 Phylogenetic tree (phylogeny) Darwin and classification: In the Origin, Darwin said that descent from a common ancestral species could explain why the Linnaean system

More information

Phylogenetic Assumptions

Phylogenetic Assumptions Substitution Models and the Phylogenetic Assumptions Vivek Jayaswal Lars S. Jermiin COMMONWEALTH OF AUSTRALIA Copyright htregulation WARNING This material has been reproduced and communicated to you by

More information

Notation. Bounds on Speedup. Parallel Processing. CS575 Parallel Processing

Notation. Bounds on Speedup. Parallel Processing. CS575 Parallel Processing Parallel Processing CS575 Parallel Processing Lecture five: Efficiency Wim Bohm, Colorado State University Some material from Speedup vs Efficiency in Parallel Systems - Eager, Zahorjan and Lazowska IEEE

More information

Evolutionary Models. Evolutionary Models

Evolutionary Models. Evolutionary Models Edit Operators In standard pairwise alignment, what are the allowed edit operators that transform one sequence into the other? Describe how each of these edit operations are represented on a sequence alignment

More information

Parallel PIPS-SBB Multi-level parallelism for 2-stage SMIPS. Lluís-Miquel Munguia, Geoffrey M. Oxberry, Deepak Rajan, Yuji Shinano

Parallel PIPS-SBB Multi-level parallelism for 2-stage SMIPS. Lluís-Miquel Munguia, Geoffrey M. Oxberry, Deepak Rajan, Yuji Shinano Parallel PIPS-SBB Multi-level parallelism for 2-stage SMIPS Lluís-Miquel Munguia, Geoffrey M. Oxberry, Deepak Rajan, Yuji Shinano ... Our contribution PIPS-PSBB*: Multi-level parallelism for Stochastic

More information

Biol 206/306 Advanced Biostatistics Lab 12 Bayesian Inference

Biol 206/306 Advanced Biostatistics Lab 12 Bayesian Inference Biol 206/306 Advanced Biostatistics Lab 12 Bayesian Inference By Philip J. Bergmann 0. Laboratory Objectives 1. Learn what Bayes Theorem and Bayesian Inference are 2. Reinforce the properties of Bayesian

More information

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5. Five Sami Khuri Department of Computer Science San José State University San José, California, USA sami.khuri@sjsu.edu v Distance Methods v Character Methods v Molecular Clock v UPGMA v Maximum Parsimony

More information

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree) I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by

More information

Phylogeny: building the tree of life

Phylogeny: building the tree of life Phylogeny: building the tree of life Dr. Fayyaz ul Amir Afsar Minhas Department of Computer and Information Sciences Pakistan Institute of Engineering & Applied Sciences PO Nilore, Islamabad, Pakistan

More information

Lecture Notes: Markov chains

Lecture Notes: Markov chains Computational Genomics and Molecular Biology, Fall 5 Lecture Notes: Markov chains Dannie Durand At the beginning of the semester, we introduced two simple scoring functions for pairwise alignments: a similarity

More information

CSE 241 Class 1. Jeremy Buhler. August 24,

CSE 241 Class 1. Jeremy Buhler. August 24, CSE 41 Class 1 Jeremy Buhler August 4, 015 Before class, write URL on board: http://classes.engineering.wustl.edu/cse41/. Also: Jeremy Buhler, Office: Jolley 506, 314-935-6180 1 Welcome and Introduction

More information

Using R for Iterative and Incremental Processing

Using R for Iterative and Incremental Processing Using R for Iterative and Incremental Processing Shivaram Venkataraman, Indrajit Roy, Alvin AuYoung, Robert Schreiber UC Berkeley and HP Labs UC BERKELEY Big Data, Complex Algorithms PageRank (Dominant

More information

Biol 206/306 Advanced Biostatistics Lab 12 Bayesian Inference Fall 2016

Biol 206/306 Advanced Biostatistics Lab 12 Bayesian Inference Fall 2016 Biol 206/306 Advanced Biostatistics Lab 12 Bayesian Inference Fall 2016 By Philip J. Bergmann 0. Laboratory Objectives 1. Learn what Bayes Theorem and Bayesian Inference are 2. Reinforce the properties

More information

Phylogenetic Inference using RevBayes

Phylogenetic Inference using RevBayes Phylogenetic Inference using RevBayes Model section using Bayes factors Sebastian Höhna 1 Overview This tutorial demonstrates some general principles of Bayesian model comparison, which is based on estimating

More information

Molecular Evolution and Phylogenetic Tree Reconstruction

Molecular Evolution and Phylogenetic Tree Reconstruction 1 4 Molecular Evolution and Phylogenetic Tree Reconstruction 3 2 5 1 4 2 3 5 Orthology, Paralogy, Inparalogs, Outparalogs Phylogenetic Trees Nodes: species Edges: time of independent evolution Edge length

More information

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make

More information

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor Biological Networks:,, and via Relative Description Length By: Tamir Tuller & Benny Chor Presented by: Noga Grebla Content of the presentation Presenting the goals of the research Reviewing basic terms

More information

How should we go about modeling this? Model parameters? Time Substitution rate Can we observe time or subst. rate? What can we observe?

How should we go about modeling this? Model parameters? Time Substitution rate Can we observe time or subst. rate? What can we observe? How should we go about modeling this? gorilla GAAGTCCTTGAGAAATAAACTGCACACACTGG orangutan GGACTCCTTGAGAAATAAACTGCACACACTGG Model parameters? Time Substitution rate Can we observe time or subst. rate? What

More information

Lab 9: Maximum Likelihood and Modeltest

Lab 9: Maximum Likelihood and Modeltest Integrative Biology 200A University of California, Berkeley "PRINCIPLES OF PHYLOGENETICS" Spring 2010 Updated by Nick Matzke Lab 9: Maximum Likelihood and Modeltest In this lab we re going to use PAUP*

More information

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between

More information

What is Phylogenetics

What is Phylogenetics What is Phylogenetics Phylogenetics is the area of research concerned with finding the genetic connections and relationships between species. The basic idea is to compare specific characters (features)

More information

Improvements for Implicit Linear Equation Solvers

Improvements for Implicit Linear Equation Solvers Improvements for Implicit Linear Equation Solvers Roger Grimes, Bob Lucas, Clement Weisbecker Livermore Software Technology Corporation Abstract Solving large sparse linear systems of equations is often

More information

Phylogenetic inference: from sequences to trees

Phylogenetic inference: from sequences to trees W ESTFÄLISCHE W ESTFÄLISCHE W ILHELMS -U NIVERSITÄT NIVERSITÄT WILHELMS-U ÜNSTER MM ÜNSTER VOLUTIONARY FUNCTIONAL UNCTIONAL GENOMICS ENOMICS EVOLUTIONARY Bioinformatics 1 Phylogenetic inference: from sequences

More information

Effects of Gap Open and Gap Extension Penalties

Effects of Gap Open and Gap Extension Penalties Brigham Young University BYU ScholarsArchive All Faculty Publications 200-10-01 Effects of Gap Open and Gap Extension Penalties Hyrum Carroll hyrumcarroll@gmail.com Mark J. Clement clement@cs.byu.edu See

More information

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline Phylogenetics Todd Vision iology 522 March 26, 2007 pplications of phylogenetics Studying organismal or biogeographic history Systematics ating events in the fossil record onservation biology Studying

More information

Workshop III: Evolutionary Genomics

Workshop III: Evolutionary Genomics Identifying Species Trees from Gene Trees Elizabeth S. Allman University of Alaska IPAM Los Angeles, CA November 17, 2011 Workshop III: Evolutionary Genomics Collaborators The work in today s talk is joint

More information

Dense Arithmetic over Finite Fields with CUMODP

Dense Arithmetic over Finite Fields with CUMODP Dense Arithmetic over Finite Fields with CUMODP Sardar Anisul Haque 1 Xin Li 2 Farnam Mansouri 1 Marc Moreno Maza 1 Wei Pan 3 Ning Xie 1 1 University of Western Ontario, Canada 2 Universidad Carlos III,

More information

7. Tests for selection

7. Tests for selection Sequence analysis and genomics 7. Tests for selection Dr. Katja Nowick Group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute for Brain Research www. nowicklab.info

More information

Welcome to CSE21! Lecture B Miles Jones MWF 9-9:50pm PCYN 109. Lecture D Russell (Impagliazzo) MWF 4-4:50am Center 101

Welcome to CSE21! Lecture B Miles Jones MWF 9-9:50pm PCYN 109. Lecture D Russell (Impagliazzo) MWF 4-4:50am Center 101 Welcome to CSE21! Lecture B Miles Jones MWF 9-9:50pm PCYN 109 Lecture D Russell (Impagliazzo) MWF 4-4:50am Center 101 http://cseweb.ucsd.edu/classes/sp16/cse21-bd/ March 30, 2016 Sorting (or Ordering)

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary information S1 (box). Supplementary Methods description. Prokaryotic Genome Database Archaeal and bacterial genome sequences were downloaded from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/)

More information

Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition

Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition David D. Pollock* and William J. Bruno* *Theoretical Biology and Biophysics, Los Alamos National

More information

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Bioinformatics II Probability and Statistics Universität Zürich and ETH Zürich Spring Semester 2009 Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Dr Fraser Daly adapted from

More information

Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis

Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis 10 December 2012 - Corrections - Exercise 1 Non-vertebrate chordates generally possess 2 homologs, vertebrates 3 or more gene copies; a Drosophila

More information

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010 BLAST Database Searching BME 110: CompBio Tools Todd Lowe April 8, 2010 Admin Reading: Read chapter 7, and the NCBI Blast Guide and tutorial http://www.ncbi.nlm.nih.gov/blast/why.shtml Read Chapter 8 for

More information

Lecture 4. Models of DNA and protein change. Likelihood methods

Lecture 4. Models of DNA and protein change. Likelihood methods Lecture 4. Models of DNA and protein change. Likelihood methods Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 4. Models of DNA and protein change. Likelihood methods p.1/36

More information

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200 Spring 2018 University of California, Berkeley

PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION Integrative Biology 200 Spring 2018 University of California, Berkeley "PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200 Spring 2018 University of California, Berkeley D.D. Ackerly Feb. 26, 2018 Maximum Likelihood Principles, and Applications to

More information

Structure Learning: the good, the bad, the ugly

Structure Learning: the good, the bad, the ugly Readings: K&F: 15.1, 15.2, 15.3, 15.4, 15.5 Structure Learning: the good, the bad, the ugly Graphical Models 10708 Carlos Guestrin Carnegie Mellon University September 29 th, 2006 1 Understanding the uniform

More information

A New Dominant Point-Based Parallel Algorithm for Multiple Longest Common Subsequence Problem

A New Dominant Point-Based Parallel Algorithm for Multiple Longest Common Subsequence Problem A New Dominant Point-Based Parallel Algorithm for Multiple Longest Common Subsequence Problem Dmitry Korkin This work introduces a new parallel algorithm for computing a multiple longest common subsequence

More information

Scheduling Parallel Jobs with Linear Speedup

Scheduling Parallel Jobs with Linear Speedup Scheduling Parallel Jobs with Linear Speedup Alexander Grigoriev and Marc Uetz Maastricht University, Quantitative Economics, P.O.Box 616, 6200 MD Maastricht, The Netherlands. Email: {a.grigoriev, m.uetz}@ke.unimaas.nl

More information

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models 02-710 Computational Genomics Systems biology Putting it together: Data integration using graphical models High throughput data So far in this class we discussed several different types of high throughput

More information

Energy-efficient Mapping of Big Data Workflows under Deadline Constraints

Energy-efficient Mapping of Big Data Workflows under Deadline Constraints Energy-efficient Mapping of Big Data Workflows under Deadline Constraints Presenter: Tong Shu Authors: Tong Shu and Prof. Chase Q. Wu Big Data Center Department of Computer Science New Jersey Institute

More information

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution Ziheng Yang Department of Biology, University College, London An excess of nonsynonymous substitutions

More information

O 3 O 4 O 5. q 3. q 4. Transition

O 3 O 4 O 5. q 3. q 4. Transition Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in

More information

Plan: Evolutionary trees, characters. Perfect phylogeny Methods: NJ, parsimony, max likelihood, Quartet method

Plan: Evolutionary trees, characters. Perfect phylogeny Methods: NJ, parsimony, max likelihood, Quartet method Phylogeny 1 Plan: Phylogeny is an important subject. We have 2.5 hours. So I will teach all the concepts via one example of a chain letter evolution. The concepts we will discuss include: Evolutionary

More information

Parallel Program Performance Analysis

Parallel Program Performance Analysis Parallel Program Performance Analysis Chris Kauffman CS 499: Spring 2016 GMU Logistics Today Final details of HW2 interviews HW2 timings HW2 Questions Parallel Performance Theory Special Office Hours Mon

More information

Model Order Reduction via Matlab Parallel Computing Toolbox. Istanbul Technical University

Model Order Reduction via Matlab Parallel Computing Toolbox. Istanbul Technical University Model Order Reduction via Matlab Parallel Computing Toolbox E. Fatih Yetkin & Hasan Dağ Istanbul Technical University Computational Science & Engineering Department September 21, 2009 E. Fatih Yetkin (Istanbul

More information

C.DARWIN ( )

C.DARWIN ( ) C.DARWIN (1809-1882) LAMARCK Each evolutionary lineage has evolved, transforming itself, from a ancestor appeared by spontaneous generation DARWIN All organisms are historically interconnected. Their relationships

More information

Mixture Models in Phylogenetic Inference. Mark Pagel and Andrew Meade Reading University.

Mixture Models in Phylogenetic Inference. Mark Pagel and Andrew Meade Reading University. Mixture Models in Phylogenetic Inference Mark Pagel and Andrew Meade Reading University m.pagel@rdg.ac.uk Mixture models in phylogenetic inference!some background statistics relevant to phylogenetic inference!mixture

More information

Inferring Molecular Phylogeny

Inferring Molecular Phylogeny Dr. Walter Salzburger he tree of life, ustav Klimt (1907) Inferring Molecular Phylogeny Inferring Molecular Phylogeny 55 Maximum Parsimony (MP): objections long branches I!! B D long branch attraction

More information

Extending Parallel Scalability of LAMMPS and Multiscale Reactive Molecular Simulations

Extending Parallel Scalability of LAMMPS and Multiscale Reactive Molecular Simulations September 18, 2012 Extending Parallel Scalability of LAMMPS and Multiscale Reactive Molecular Simulations Yuxing Peng, Chris Knight, Philip Blood, Lonnie Crosby, and Gregory A. Voth Outline Proton Solvation

More information

CS 700: Quantitative Methods & Experimental Design in Computer Science

CS 700: Quantitative Methods & Experimental Design in Computer Science CS 700: Quantitative Methods & Experimental Design in Computer Science Sanjeev Setia Dept of Computer Science George Mason University Logistics Grade: 35% project, 25% Homework assignments 20% midterm,

More information

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks! Paul has many great tools for teaching phylogenetics at his web site: http://hydrodictyon.eeb.uconn.edu/people/plewis

More information

Consensus methods. Strict consensus methods

Consensus methods. Strict consensus methods Consensus methods A consensus tree is a summary of the agreement among a set of fundamental trees There are many consensus methods that differ in: 1. the kind of agreement 2. the level of agreement Consensus

More information

Data Structures. Outline. Introduction. Andres Mendez-Vazquez. December 3, Data Manipulation Examples

Data Structures. Outline. Introduction. Andres Mendez-Vazquez. December 3, Data Manipulation Examples Data Structures Introduction Andres Mendez-Vazquez December 3, 2015 1 / 53 Outline 1 What the Course is About? Data Manipulation Examples 2 What is a Good Algorithm? Sorting Example A Naive Algorithm Counting

More information

1 Overview. 2 Adapting to computing system evolution. 11 th European LS-DYNA Conference 2017, Salzburg, Austria

1 Overview. 2 Adapting to computing system evolution. 11 th European LS-DYNA Conference 2017, Salzburg, Austria 1 Overview Improving LSTC s Multifrontal Linear Solver Roger Grimes 3, Robert Lucas 3, Nick Meng 2, Francois-Henry Rouet 3, Clement Weisbecker 3, and Ting-Ting Zhu 1 1 Cray Incorporated 2 Intel Corporation

More information