Molecular evolution 2. Please sit in row K or forward

RBFD: cat, mouse, parasite
Toxoplasma gondii cyst in a mouse brain. Credit: Jitinder P. Dubey
http://phenomena.nationalgeographic.com/2013/04/26/mind-bending-parasite-permanently-quells-cat-fear-in-mice/
Image sources:
https://commons.wikimedia.org/wiki/file:kittyply_edit1.jpg
https://commons.wikimedia.org/wiki/file:мышь_2.jpg
http://commons.wikimedia.org/wiki/file:phylogenetic_tree_of_life.png

Topics for the next few days
  The HIV genome in context
  Phylogenetic trees of hosts vs. pathogens: introducing the HIV/SIV case
  Phylogenetic reconstruction methods
  Constructing a distance matrix between sequences
  What is a sequence alignment?
  The Jukes-Cantor correction
  The neighbor-joining algorithm to make a tree from distances

The HIV genome in context
Variation in protein-coding gene count:

  Genome             Protein-coding genes
  Human              ~20,000
  E. coli            ~4000
  Mimivirus          ~1000
  M. genitalium      ~500
  HIV                9
  LINE transposon    1
  SINE transposon    0

[Figure: genome regions compared: HIV (entire genome), E. coli (~25 thousand bp on the E. coli chromosome), and human (~1 million bp on human chromosome 4).]
Image source: https://commons.wikimedia.org/wiki/file:hiv-genome.png

HIV and SIV
Image sources:
https://www.flickr.com/photos/23993953@n04/13079389505
http://www.evoanth.net/2015/05/23/what-does-a-chimp-look-for-in-a-tool/
http://pin.primate.wisc.edu/factsheets/entry/sooty_mangabey
https://www.flickr.com/photos/berniedup/7692054594

Question: If we made a tree for the HIV/SIV sequences infecting these primates, how would it compare with this host tree?
[Figure: host tree with tips Gabon talapoin, sooty mangabey, drill, chimp, and human.]
Phylogeny from Perelman et al., "A molecular phylogeny of living primates," 2011.

The data

  Strain and host        Sequence
  HIV1_human_a           TTTTTTGGGTTTGGC...
  HIV1_human_b           TTTTTTGGGGTTTGGC...
  HIV1_human_c           TTTTTTGGCTCTGGC...
  HIV1_human_d           TTTTTTGGGGGCTGGT...
  HIV1_human_e           TTTTTTGGGGTCTGGC...
  SIV_chimp_a            TTTTTTGGGCGCCCC...
  SIV_chimp_b            TTTTTTGGGGGGCTGGC...
  HIV2_human_a           TTTTTTGGGTGGGCTCC...
  HIV2_human_b           TTTTTTGGGTTGGCCCT...
  HIV2_human_c           TTTTTTGGGTTTGGCCCT...
  SIV_sootyMangabey_a    TTTTTTGGTTTGGCCCT...
  SIV_sootyMangabey_b    TTTTTTGGTTTGGTCCTT...
  SIV_drill              TTTTTTGGGTCTCCCT...
  SIV_gabonTalapoin      TTTTTTGGGGTCTTTTT...
  HTLV-1                 TGCGCTGGCCCTTCCT...

Representing nucleic acid molecules on a computer
UUUUUUGGGGUCUGGCCUUCCUCGGG
By convention, we represent a molecule as a single string going 5' to 3'.

Representing nucleic acid molecules on a computer
5' TTTTTTGGGGTCTGGCCTTCCTCGGG 3'
3' TCCCTTCTGCCGGGGTGTTCCCTT 5'
Either of these is OK; the two strings are called reverse complements:
TTTTTTGGGGTCTGGCCTTCCTCGGG
or
TTCCCTTGTGGGGCCGTCTTCCCT
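The reverse-complement relationship is easy to compute from a 5'-to-3' string. The following is a minimal sketch, not part of the original slides; the function name `reverse_complement` and the example sequence are illustrative assumptions, and it handles only uppercase A, C, G, T.

```python
# Map each base to its Watson-Crick partner (assumes uppercase DNA only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the other strand of `seq`, also written 5' to 3'."""
    # Complement every base, then reverse so the result reads 5' to 3'.
    return seq.translate(COMPLEMENT)[::-1]

example = "GATTACA"  # hypothetical sequence, not from the slides
print(reverse_complement(example))  # TGTAATC
```

Applying the function twice returns the original string, which is why either strand is an acceptable representation of the same molecule.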

Trees and distances: often substitutions accumulate (roughly) proportional to time
[Figure: tree with tips A, B, C, D drawn against a time axis.]

What is a sequence alignment?
Our goal: to obtain distances between sequences by estimating the number of substitutions.
Unaligned sequences:
GTCGGT
GTCCGCT
After the alignment process:
G-T--CGGT
GTCCGCT

Distances from alignments: estimating the number of substitutions
Partial alignment from the gag gene:
SIV_deBrazzaMonkey: TTTCTGGGTT
HIV2_human_a: TGT------GCGGT
Ignore sites with a gap character. How many substitutions occurred between these two sequences since their last common ancestor?
Number of sequence differences at non-gap sites: 6
Does this mean there were 6 substitutions?
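Counting differences at non-gap sites can be sketched in a few lines. This is an illustrative sketch only: the function name `count_differences` is an assumption, and the two aligned rows are made-up placeholders rather than the gag alignment from the slide.

```python
def count_differences(row1: str, row2: str, gap: str = "-"):
    """Return (differences, compared_sites) for two aligned rows.

    Columns where either row has a gap character are ignored.
    """
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    differences = 0
    compared = 0
    for a, b in zip(row1, row2):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a != b:
            differences += 1
    return differences, compared

# Hypothetical aligned rows (not from the slides).
diffs, sites = count_differences("ACGT-ACGTTGA", "ACCTTAC--TGG")
print(diffs, sites)  # 2 9
```

Dividing the first number by the second gives the proportion of differing sites, which is the quantity the Jukes-Cantor correction takes as input.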

Number of substitutions vs. number of observed differences
[Figure: example showing substitutions accumulating along lineages.]
The number of observed differences (3) is less than the true number of substitutions (5).

Correcting for multiple hits with the probabilistic Jukes-Cantor model
Model a single nucleotide position over a series of discrete time steps.
[Figure: nucleotide A can change to C, G, or T, each with probability $\alpha$ per time step.]
$$P_A(0) = 1$$
$$P_A(1) = 1 - 3\alpha$$

Two ways we can have nucleotide n at time t+1:
1. Nucleotide n is present at time t and stays the same at time t+1.
2. A nucleotide other than n is present at time t and changes to n at time t+1.
Write an expression for $P_n(t+1)$ in terms of $P_n(t)$ and $\alpha$:
$$P_n(t+1) = (1 - 3\alpha)P_n(t) + \alpha\,[1 - P_n(t)]$$

Expression for the change in probability of nucleotide n, arising over one time step, from time t to time t+1:
$$\Delta P_n(t) = P_n(t+1) - P_n(t)$$
$$\Delta P_n(t) = P_n(t) - 3\alpha P_n(t) + \alpha - \alpha P_n(t) - P_n(t)$$
$$\Delta P_n(t) = -4\alpha P_n(t) + \alpha$$

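Differential equation (Math 45):
$$\frac{dP_n(t)}{dt} = -4\alpha P_n(t) + \alpha$$
$$P_n(t) = \frac{1}{4} + \left(P_n(0) - \frac{1}{4}\right)e^{-4\alpha t}$$
[Figure: plot of $P_n(t)$ over time, starting from $P_n(0)$.]
Probability of j at time t given j at time 0:
$$P_{jj}(t) = \frac{1}{4} + \frac{3}{4}e^{-4\alpha t}$$
Probability of k at time t given j at time 0:
$$P_{jk}(t) = \frac{1}{4} - \frac{1}{4}e^{-4\alpha t}$$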
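As a sanity check, the discrete-time recurrence P_n(t+1) = (1 - 3α)P_n(t) + α[1 - P_n(t)] can be iterated numerically and compared with the continuous-time solution P_n(t) = 1/4 + (P_n(0) - 1/4)e^{-4αt}. The sketch below is illustrative only; the function names and the value of α are assumptions, and the two curves agree closely only when α is small.

```python
import math

def iterate_recurrence(p0: float, alpha: float, steps: int) -> float:
    """Apply the one-step update `steps` times, starting from P_n(0) = p0."""
    p = p0
    for _ in range(steps):
        p = (1 - 3 * alpha) * p + alpha * (1 - p)
    return p

def closed_form(p0: float, alpha: float, t: float) -> float:
    """Continuous-time solution of dP/dt = -4*alpha*P + alpha."""
    return 0.25 + (p0 - 0.25) * math.exp(-4 * alpha * t)

alpha = 0.001  # assumed small per-step substitution probability
for t in (0, 100, 1000, 10000):
    discrete = iterate_recurrence(1.0, alpha, t)
    continuous = closed_form(1.0, alpha, t)
    print(t, round(discrete, 4), round(continuous, 4))
```

Both versions start at 1 and decay toward 0.25, the value reached once the starting nucleotide has been forgotten.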

Worksheet (Rip it off from the back of your packet) Name:
$$P_{jj}(t) = \frac{1}{4} + \frac{3}{4}e^{-4\alpha t} \qquad P_{jk}(t) = \frac{1}{4} - \frac{1}{4}e^{-4\alpha t}$$
1. If the nucleotide is C at time 0, i.e. $P_C(0) = 1$, what is the probability it is C after a long time? 0.25
2. If the nucleotide is C at time 0, i.e. $P_C(0) = 1$, what is the probability it is G after a long time? 0.25
3. What are the equilibrium nucleotide frequencies we would expect as a result of this process? In other words, if we had a long sequence, and this process was happening at every position, what frequency of As, Cs, Gs and Ts would we expect after a long time? 0.25 each

Consider a single nucleotide position
[Figure: an ancestor with two descendants, strain 1 and strain 2.]
The probability that strain 1 and strain 2 have the same nucleotide at this position:
$$I(t) = P_A^2(t) + P_C^2(t) + P_G^2(t) + P_T^2(t)$$
Express I(t) in terms of $\alpha$ and t:
$$I(t) = \left(\frac{1}{4} + \frac{3}{4}e^{-4\alpha t}\right)^2 + 3\left(\frac{1}{4} - \frac{1}{4}e^{-4\alpha t}\right)^2$$
*Note that this would still be the same if we had imagined the ancestral nucleotide was C, G or T.

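$$I(t) = \left(\frac{1}{4} + \frac{3}{4}e^{-4\alpha t}\right)^2 + 3\left(\frac{1}{4} - \frac{1}{4}e^{-4\alpha t}\right)^2$$
$$I(t) = \frac{1}{16} + \frac{6}{16}e^{-4\alpha t} + \frac{9}{16}e^{-8\alpha t} + \frac{3}{16} - \frac{6}{16}e^{-4\alpha t} + \frac{3}{16}e^{-8\alpha t}$$
$$I(t) = \frac{4}{16} + \frac{12}{16}e^{-8\alpha t}$$
$$I(t) = \frac{1}{4} + \frac{3}{4}e^{-8\alpha t}$$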
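The simplification of I(t) to 1/4 + (3/4)e^{-8αt} can be spot-checked numerically. This is a small sketch under assumed names (`p_same`, `p_diff`, `identity_sum`, `identity_closed`) and arbitrary test values of α and t; it checks the algebra, not the model.

```python
import math

def p_same(alpha: float, t: float) -> float:
    """Probability a lineage still shows the ancestral nucleotide at time t."""
    return 0.25 + 0.75 * math.exp(-4 * alpha * t)

def p_diff(alpha: float, t: float) -> float:
    """Probability a lineage shows one particular other nucleotide at time t."""
    return 0.25 - 0.25 * math.exp(-4 * alpha * t)

def identity_sum(alpha: float, t: float) -> float:
    """I(t) as the sum of squares over the four possible nucleotides."""
    return p_same(alpha, t) ** 2 + 3 * p_diff(alpha, t) ** 2

def identity_closed(alpha: float, t: float) -> float:
    """Simplified form: I(t) = 1/4 + (3/4) * exp(-8 * alpha * t)."""
    return 0.25 + 0.75 * math.exp(-8 * alpha * t)

for alpha, t in [(0.01, 0), (0.01, 5), (0.02, 50), (0.001, 1000)]:
    print(alpha, t, math.isclose(identity_sum(alpha, t), identity_closed(alpha, t)))
```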

1. Probability that two nucleotides are different (can measure from alignments):
$$p = 1 - I(t) = \frac{3}{4} - \frac{3}{4}e^{-8\alpha t} = \frac{3}{4}\left(1 - e^{-8\alpha t}\right)$$
2. Probability of a substitution per site per unit time: $3\alpha$
Expected number of substitutions per site in one lineage: $3\alpha t$
Expected number of substitutions per site separating the two strains: $K = 6\alpha t$

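$$\frac{4}{3}p = 1 - e^{-8\alpha t}$$
$$e^{-8\alpha t} = 1 - \frac{4}{3}p$$
$$8\alpha t = -\ln\left(1 - \frac{4}{3}p\right)$$
$$K = 6\alpha t = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$$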

Using the Jukes-Cantor correction
Partial alignment from the gag gene:
SIV_deBrazzaMonkey: TTTCTGGGTT
HIV2_human_a: TGT------GCGGT
Proportion of sites that are different (consider only non-gap sites):
$$p = \frac{6}{13} = 0.462$$
Estimated substitutions per site separating the two strains:
$$K = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right) = 0.717$$
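The correction K = -(3/4) ln(1 - (4/3)p) is a one-line computation once p is known. The sketch below reproduces the slide's example with p = 6/13; the function name `jukes_cantor_distance` and the input check are assumptions added for illustration.

```python
import math

def jukes_cantor_distance(p: float) -> float:
    """Estimated substitutions per site, K = -(3/4) * ln(1 - (4/3) * p).

    `p` is the proportion of differing non-gap sites. The formula is only
    defined for 0 <= p < 0.75, since the logarithm diverges at p = 0.75.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)

p = 6 / 13                                   # 6 differences over 13 non-gap sites
print(round(p, 3))                           # 0.462
print(round(jukes_cantor_distance(p), 3))    # 0.717
```

The corrected distance (0.717) is noticeably larger than the raw proportion (0.462), reflecting the substitutions hidden by multiple hits at the same site.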

Hand in your worksheet please! (and be sure you put your full name on it)