Lecture 8 Multiple Alignment and Phylogeny

Similar documents
Phylogenetic analyses. Kirsi Kostamo


CREATING PHYLOGENETIC TREES FROM DNA SEQUENCES

BMI/CS 776 Lecture #20 Alignment of whole genomes. Colin Dewey (with slides adapted from those by Mark Craven)

BINF6201/8201. Molecular phylogenetic methods

Introduction to Bioinformatics Introduction to Bioinformatics

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

Lecture 14: Multiple Sequence Alignment (Gene Finding, Conserved Elements) Scribe: John Ekins

Multiple Sequence Alignment, Gunnar Klau, December 9, 2005, 17:

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment

Copyright 2000 N. AYDIN. All rights reserved. 1

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

Phylogenetic inference

Evolutionary Tree Analysis. Overview

08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)

Phylogeny and Evolution. Gina Cannarozzi ETH Zurich Institute of Computational Science

Theory of Evolution Charles Darwin

Phylogeny: building the tree of life

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Introduction to Bioinformatics Online Course: IBT

Phylogenetic Tree Reconstruction

Dr. Amira A. AL-Hosary

Algorithms in Bioinformatics

Phylogenetic trees 07/10/13

EVOLUTIONARY DISTANCES

Cladistics and Bioinformatics Questions 2013

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

Comparative Genomics II

Constructing Evolutionary/Phylogenetic Trees

Introduction to Bioinformatics

Pairwise & Multiple sequence alignments

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Multiple Sequence Alignment

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

Evolutionary Models. Evolutionary Models

Inferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

GENETICS - CLUTCH CH.22 EVOLUTIONARY GENETICS.

Theory of Evolution. Charles Darwin

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

A (short) introduction to phylogenetics

Practical Bioinformatics

Quantifying sequence similarity

Effects of Gap Open and Gap Extension Penalties

5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT

Multiple Whole Genome Alignment

Phylogeny: traditional and Bayesian approaches

Thanks to Paul Lewis, Jeff Thorne, and Joe Felsenstein for the use of slides

STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization)

Sequence Alignment (chapter 6)

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

Q1) Explain how background selection and genetic hitchhiking could explain the positive correlation between genetic diversity and recombination rate.

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees

Alignment Algorithms. Alignment Algorithms

Cluster Analysis of Gene Expression Microarray Data. BIOL 495S/ CS 490B/ MATH 490B/ STAT 490B Introduction to Bioinformatics April 8, 2002

Comparative Bioinformatics Midterm II Fall 2004

Seuqence Analysis '17--lecture 10. Trees types of trees Newick notation UPGMA Fitch Margoliash Distance vs Parsimony

Bioinformatics. Scoring Matrices. David Gilbert Bioinformatics Research Centre

Session 5: Phylogenomics

Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

Homology Modeling. Roberto Lins EPFL - summer semester 2005

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Gel Electrophoresis. 10/28/0310/21/2003 CAP/CGS 5991 Lecture 10Lecture 9 1

Comparative genomics: Overview & Tools + MUMmer algorithm

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

C3020 Molecular Evolution. Exercises #3: Phylogenetics

7. Tests for selection

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment

USING BLAST TO IDENTIFY PROTEINS THAT ARE EVOLUTIONARILY RELATED ACROSS SPECIES

Using Bioinformatics to Study Evolutionary Relationships Instructions

What is Phylogenetics

Research Proposal. Title: Multiple Sequence Alignment used to investigate the co-evolving positions in OxyR Protein family.

CONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018

Tools and Algorithms in Bioinformatics

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches

Phylogenetics - Orthology, phylogenetic experimental design and phylogeny reconstruction. Lesser Tenrec (Echinops telfairi)

Processes of Evolution

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

Background: comparative genomics. Sequence similarity. Homologs. Similarity vs homology (2) Similarity vs homology. Sequence Alignment (chapter 6)

I519 Introduction to Bioinformatics, Genome Comparison. Yuzhen Ye School of Informatics & Computing, IUB

Michael Yaffe Lecture #5 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D

Evolutionary trees. Describe the relationship between objects, e.g. species or genes

Bioinformatics 1 -- lecture 9. Phylogenetic trees Distance-based tree building Parsimony

Motivating the need for optimal sequence alignments...

Graph Alignment and Biological Networks

Sequence Alignments. Dynamic programming approaches, scoring, and significance. Lucy Skrabanek ICB, WMC January 31, 2013

DNA Phylogeny. Signals and Systems in Biology Kushal EE, IIT Delhi

Algorithms for phylogeny construction

Comparing whole genomes

Symmetric Tree, ClustalW. Divergence x 0.5 Divergence x 1 Divergence x 2. Alignment length

EECS730: Introduction to Bioinformatics

Multiple Alignment. Anders Gorm Pedersen Molecular Evolution Group Center for Biological Sequence Analysis

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

Transcription:

Introduction to Bioinformatics for Medical Research Gideon Greenspan gdg@cs.technion.ac.il Lecture 8 Multiple Alignment and Phylogeny

Multiple Alignment & Phylogeny Multiple Alignment Scoring Complexity Feasible multiple alignment ClustalW Phylogenetic Trees Tree reconstruction Phylodendron 2

Multiple Alignment Like pairwise alignment n input sequences instead of 2 Add indels to make same length Local and global alignments Score columns in alignment Positive score for agreement Negative for disagreement or indels Seek an alignment to maximize score 3

Alignment Example Same allele in 3 or 4 GTCGTAGTCGGCTCGAC GTCTAGCGAGCGTGAT GCGAAGAGGCGAGC GCCGTCGCGTCGTAAC 1 23 2 30 3 6 4 4 Lone allele GTCGTAGTCG-GC-TCGAC GTC-TAG-CGAGCGT-GAT GC-GAAG-AG-GCG-AG-C GCCGTCG-CG-TCGTA-AC 1 15 2 6 3 39 4 16 4

Motivation Determine consensus sequence Ignore variations, e.g. for genome sequencing Identify variations, e.g. for SNP mapping Build sequence families Orthologous genes, protein groups Build phylogenetic trees Infer evolutionary relationship 5

Scoring (1) Sum of pairwise alignment scores For n sequences, there are n (n-1)/2 pairs GTCGTAGTCG-GC-TCGAC GTC-TAG-CGAGCGT-GAT GC-GAAG-AG-GCG-AG-C GCCGTCG-CG-TCGTA-AC 6

Scoring (2) Sum of column scores C T G C C T C - C - G A C T C A Minus number of different values Example: 1, 2, 2, 3 Minus number not in consensus Example: 0, 1, 2, 2 Maximum profile probability Example: 1.0, 0.1, 0.06, 0.02 7

Complexity (1) Pairwise A B alignment table Cell (i,j) = score of best alignment between first i elements of A and first j elements of B Complexity: length of A length of B 3-way A B C alignment table Cell (i,j,k) = score of best alignment between first i elements of A, first j of B, first k of C Complexity: length A length B length C 8

Complexity (2) n-way S 1 S 2 S n-1 S n alignment table Cell (x 1,,x n ) = best alignment score between first x 1 elements of S 1,, x n elements of S n Complexity: length S 1 length S n Example: protein family alignment 100 proteins, 1000 amino acids each Complexity: 10 300 table cells Calculation time: beyond the big crunch! 9

Feasible Approach Based on pairwise alignment scores Build n by n table of pairwise scores Align similar sequences first After alignment, consider as single sequence Continue aligning with further sequences Complexity is n 2 l 2 100 proteins of 1000 amino acids: ~ 5,000 alignments of 10 6 cells fi ~ 5 10 9 operations 10

Example (1) 1 GTCGTAGTCGGCTCGAC 2 GTCTAGCGAGCGTGAT 3 GCGAAGAGGCGAGC 4 GCCGTCGCGTCGTAAC 2 3 4 1 7 4 5 2 2 2 3 2 11

Example (2) 1 GTCGTAGTCG-GC-TCGAC 2 GTC-TAG-CGAGCGT-GAT 3 GCGAAGAGGCGAGC 4 GCCGTCGCGTCGTAAC 12 3 3-3 4-1 2 12

Example (3) 1 GTCGTAGTCG-GC-TCGAC 2 GTC-TAG-CGAGCGT-GAT 3 GC-GAAGAGGCG-AGC 4 GCCGTCGCGTCGTAAC 1 GTCGTA-GTCG-GC-TCGAC 2 GTC-TA-G-CGAGCGT-GAT 3 G-C-GAAGA-G-GCG-AG-C 4 G-CCGTCGC-G-TCGTAA-C 13

ClustalW Input Fast alignment? Scoring matrix Input sequences Alignment format Fast alignment options Gap scoring Phylogenetic trees 14

ClustalW Output (1) Input sequences Pairwise alignment scores Building alignment Final score 15

ClustalW Output (2) Sequence names Sequence positions Match strength in decreasing order: * :. 16

Phylogenetic Trees Represent closeness between many entities In our case, genomic or protein sequences chimp Distance representation monkey Observed entity human Unobserved commonality 17

Rooting Trees A tree can be hung from a root Adds directional information Requires addition of outgroup We know this is furthest pig monkey human chimp So we hang the tree from where it joins 18

Phylogeny and Evolution Evolutionary Time Common Ancestor Speciation Number of mutations 19

Tree Reconstruction Build tree based on organism sequences Distance-based methods Use pairwise alignment scores to build tree Ignores sequences after initial alignments Character-based methods Learn a tree with intermediate sequences that minimizes total number of mutations Slower but generally better results 20

Distance-based Example (1) 1 2 3 2 3 4 1 2 7 4 2 5 2 2 3 4 21

Distance-based Example (2) 12 3 3-3 4-1 2 1 2 3 4 22

Distance-based Example (3) 12 34-4 3 4 1 2 23

Newick Tree Format (CFTR_SHEEP:0.01457, (CFTR_HUMAN:0.16153, (CFTR_MOUSE:0.70599, (CFTR_RABIT:2.76042, (CFTR_SQUAC:1.27192, CFTR_XENLA:0.28818) :3.42183) :0.77076) :0.65873) :0.73937, CFTR_BOVIN:0.00953); 24

Phylodendron Input Graphical style Newick tree description Orientation Tree size 25