Biosequence Alignment. Ying Xu (徐鹰), Department of Biochemistry, University of Georgia; College of Computer Science, Jilin University


Bio Sequences
Sequences could be DNA, RNA, or protein sequences.
DNA sequence (consisting of 4 letters: A, C, G, T), e.g. ccgtacgtacgtagagtgctagtctagtcgtagcgccgtagtcgatcgtgtg
RNA sequence (consisting of 4 letters: A, C, G, U)
Protein sequence (consisting of 20 letters: A, C, D, ..., Y)

Central Dogma of Biology

Sequence Homology
Genes that have evolved from a common ancestor generally have sequence-level similarity, i.e., similar sequences.
Similar sequences tend to have similar biological functions.
Through sequence comparison, one can infer whether two sequences may have the same or related functions.

Sequence Homology
Through multiple sequence alignment, one can possibly derive the functional sites of a sequence.
In biology, only useful things tend to be preserved, so positions conserved across sequences are likely to matter for function.


Bio Sequence Comparison
DNA sequence alignment: aligning two DNA sequences to maximize their similarity.
Example 1: AACG and AACG
  AACG
  AACG
Example 2: AAGG and AACG (1 mismatch)
  AAGG
  AACG
Example 3: AACGGTATGC and ATCGGGTTGC (2 gaps and 1 mismatch)
  AACG-GTATGC
  ATCGGGT-TGC

Bio Sequence Comparison
Best alignment: align two sequences using the smallest number of mismatches and gaps.
Score: each matching position: +2; each mismatch or gap: -1
  AACG
  AACG          score = 8
  AAGG
  AACG          score = 5
  AACG-GTATGC
  ATCGGGT-TGC   score = 13
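
The scoring rule above can be checked with a few lines of Python. This is only a minimal sketch: the function name `score_alignment` and its parameters are illustrative choices, and it assumes the two strings are already aligned, with `-` marking a gap.

```python
def score_alignment(top: str, bottom: str, match: int = 2, penalty: int = -1) -> int:
    """Score two equal-length aligned strings; '-' marks a gap."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain two gaps")
        # Identical letters score +2; a mismatch or a gap costs 1.
        total += match if a == b else penalty
    return total


print(score_alignment("AACG", "AACG"))                # 8
print(score_alignment("AAGG", "AACG"))                # 5
print(score_alignment("AACG-GTATGC", "ATCGGGT-TGC"))  # 13
```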

Bio Sequence Comparison
Protein sequence alignment: measuring protein sequence similarity is more complex than measuring DNA sequence similarity.
DNA sequence alignment: match or mismatch/gap.
Protein sequence alignment: degree of similarity.
There are twenty types of amino acids; each pair of amino acids has a similarity score, which varies from pair to pair.
Example: (A, A) = 4; (R, R) = 5; (A, R) = -1; (C, A) = 0

Bio Sequence Comparison
BLOSUM matrix (the values shown are those of BLOSUM62; lower triangle, since the matrix is symmetric)
A  4
R -1  5
N -2  0  6
D -2 -2  1  6
C  0 -3 -3 -3  9
Q -1  1  0  0 -3  5
E -1  0  0  2 -4  2  5
G  0 -2  0 -1 -3 -2 -2  6
H -2  0  1 -1 -3  0  0 -2  8
I -1 -3 -3 -3 -1 -3 -3 -4 -3  4
L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4
K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5
M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5
F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7
S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4
T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11
Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7
V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V

Bio Sequence Comparison
Aligning protein sequences (gap penalty = 5): FDSKTHRGHR and FESYWTHGHR
  FDSK-THRGHR
  :.:  :: :::
  FESYWTH-GHR
  Score: 6+2+4-2-5+5+8-5+6+8+5 = 32
  FDSKTHRGHR-
  -FESYWTHGHR
  Score: -5-3+0+0-2-2-1-2-2+0-5 = -22
Amino acids with similar physicochemical properties have higher similarity scores.
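
Scoring a protein alignment works the same way as for DNA, except that each aligned pair is looked up in the substitution matrix. The sketch below is illustrative only: `PAIR_SCORES`, `pair_score` and `score_protein_alignment` are hypothetical names, and the dictionary holds just the BLOSUM62 entries needed for the two alignments of this slide, not the full matrix.

```python
# Partial BLOSUM62 lookup: only the pairs that occur in the example alignments.
# A real program would load all 20 x 20 entries.
PAIR_SCORES = {
    ("F", "F"): 6, ("S", "S"): 4, ("T", "T"): 5, ("H", "H"): 8,
    ("G", "G"): 6, ("R", "R"): 5, ("D", "E"): 2, ("K", "Y"): -2,
    ("D", "F"): -3, ("E", "S"): 0, ("K", "S"): 0, ("T", "Y"): -2,
    ("H", "W"): -2, ("R", "T"): -1, ("G", "H"): -2, ("H", "R"): 0,
}


def pair_score(a: str, b: str) -> int:
    """Look up a pair in either order, since the matrix is symmetric."""
    key = (a, b) if (a, b) in PAIR_SCORES else (b, a)
    return PAIR_SCORES[key]  # KeyError if the pair is not in the partial table


def score_protein_alignment(top: str, bottom: str, gap_penalty: int = 5) -> int:
    """Sum substitution scores over aligned columns; each gap costs gap_penalty."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total -= gap_penalty
        else:
            total += pair_score(a, b)
    return total


good = score_protein_alignment("FDSK-THRGHR", "FESYWTH-GHR")
shifted = score_protein_alignment("FDSKTHRGHR-", "-FESYWTHGHR")
print(good, shifted)
```

The well-placed gaps give a much higher total than the shifted alignment, which is the point of the slide: the score reflects how plausible the pairing of residues is.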

Computing Sequence Alignment
Two sequences: AACG and AAGG
Step #1: calculating the alignment matrix
       A   A   C   G
  A    2   1   0  -1
  A    1   4   3   2
  G    0   3   3   5
  G   -1   2   2   5
Resulting alignment:
  AAGG
  AACG
Rules:
1. Initialization: fill the first row and column with matching scores plus gap penalties.
2. Fill each empty cell from the scores of its left, upper, and upper-left neighbors: the left or upper neighbor plus the gap penalty, or the upper-left neighbor plus the matching score of the current cell.
3. Choose the candidate giving the highest score.
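
The fill step can be sketched in Python as follows. This is one possible formulation, not necessarily the one used on the slide: it assumes match = +2 and mismatch = gap = -1, and it adds an extra boundary row and column (index 0) holding pure gap penalties, a common textbook convention. The name `fill_matrix` is illustrative.

```python
def fill_matrix(rows: str, cols: str, match: int = 2, mismatch: int = -1, gap: int = -1):
    """Return the (len(rows)+1) x (len(cols)+1) global-alignment score table."""
    n, m = len(rows), len(cols)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    # Boundary: aligning a prefix against nothing costs one gap per letter.
    for i in range(1, n + 1):
        table[i][0] = i * gap
    for j in range(1, m + 1):
        table[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if rows[i - 1] == cols[j - 1] else mismatch
            table[i][j] = max(
                table[i - 1][j - 1] + s,   # upper-left neighbor: pair the two letters
                table[i - 1][j] + gap,     # upper neighbor: gap in the column sequence
                table[i][j - 1] + gap,     # left neighbor: gap in the row sequence
            )
    return table


table = fill_matrix("AAGG", "AACG")
for label, row in zip("AAGG", table[1:]):
    print(label, row[1:])  # drop the boundary column when printing
```

The bottom-right cell holds the score of the best alignment, 5 for this pair.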

Computing Sequence Alignment
Step #2: tracing back to recover the alignment
       A   A   C   G
  A    2   1   0  -1
  A    1   4   3   2
  G    0   3   3   5
  G   -1   2   2   5
Rules:
1. Start from the lower-right corner.
2. Trace back to the left, upper, or upper-left neighbor that gives the current cell's score.
3. Keep doing this until it cannot continue.
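
A traceback along these rules might look like the sketch below. It rebuilds the score table first so that the example is self-contained, again assuming match = +2, mismatch = gap = -1 and an extra boundary row and column of gap penalties; the function name `align` is illustrative. When several neighbors explain a cell's score, this version prefers the upper-left one, which is an arbitrary tie-breaking choice.

```python
def align(rows: str, cols: str, match: int = 2, mismatch: int = -1, gap: int = -1):
    """Global alignment: return (score, aligned_rows, aligned_cols)."""
    n, m = len(rows), len(cols)

    def s(i: int, j: int) -> int:
        return match if rows[i - 1] == cols[j - 1] else mismatch

    # Step 1: fill the table (boundary row/column = accumulated gap penalties).
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = i * gap
    for j in range(1, m + 1):
        table[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            table[i][j] = max(table[i - 1][j - 1] + s(i, j),
                              table[i - 1][j] + gap,
                              table[i][j - 1] + gap)

    # Step 2: walk back from the lower-right corner to the upper-left corner.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and table[i][j] == table[i - 1][j - 1] + s(i, j):
            top.append(rows[i - 1]); bottom.append(cols[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and table[i][j] == table[i - 1][j] + gap:
            top.append(rows[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(cols[j - 1])
            j -= 1
    # The path was collected backwards, so reverse it.
    return table[n][m], "".join(reversed(top)), "".join(reversed(bottom))


score, a, b = align("AAGG", "AACG")
print(score)
print(a)
print(b)
```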


Sequence Alignment Algorithm
Algorithmically, the sequence alignment problem can be solved using a dynamic programming method.
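
One common way to write the dynamic programming recurrence for global alignment is given below. The symbols are introduced here for illustration and are not taken from the slides: F(i, j) is the best score for aligning the first i letters of sequence a with the first j letters of sequence b, s(x, y) is the score for pairing letters x and y, and g is the gap score. With the DNA scoring used earlier, s(x, y) = +2 if x = y and -1 otherwise, and g = -1.

```latex
\[
F(i,0) = i\,g, \qquad F(0,j) = j\,g,
\]
\[
F(i,j) = \max
\begin{cases}
F(i-1,\,j-1) + s(a_i, b_j) & \text{(pair } a_i \text{ with } b_j\text{)}\\
F(i-1,\,j) + g             & \text{(pair } a_i \text{ with a gap)}\\
F(i,\,j-1) + g             & \text{(pair } b_j \text{ with a gap)}
\end{cases}
\]
```

The optimal alignment score is F(n, m), where n and m are the lengths of the two sequences. Each cell depends only on three already computed neighbors, so the table can be filled in O(nm) time.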

Dynamic Programming

Trace Back for Solution Recovery

Interpreting Sequence Alignments
Does a higher sequence alignment score always mean a better sequence alignment?
Sequence alignment scores depend not only on the quality of an alignment but also on sequence length and composition.
So we need to get rid of the background information to derive the true quality of a sequence alignment.

Interpreting Sequence Alignments
Query sequence: AAAA
Database #1: AATTAATACATTAATATAATAAAATTACTGA
Database #2: CGGTAGTACGTAGTGTTTAGTAGCTATGAA
Which of these two sequences will have a better chance to have a good match with the query sequence after randomly ...

Interpreting Sequence Alignments: E-value
One way to assess the true quality of a particular alignment is to derive the background alignment-score distribution of similar sequences with the same letter composition.
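
A rough way to see what a background score distribution means is to shuffle a sequence many times, which keeps its letter composition, and ask how often the shuffled versions score at least as well as the original. The sketch below does this empirically. It is an assumption-laden illustration, not the analytical E-value that BLAST reports: it uses a simple local alignment score (Smith-Waterman style) with match = +2 and mismatch = gap = -1, and the names `local_score` and `empirical_p_value`, the trial count, and the seed are all illustrative.

```python
import random


def local_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score between a and b, keeping two table rows."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0]
        for j, ch_b in enumerate(b, start=1):
            s = match if ch_a == ch_b else mismatch
            # A local alignment may restart anywhere, hence the floor at 0.
            cell = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best


def empirical_p_value(query: str, target: str, trials: int = 200, seed: int = 0):
    """Return (observed score, fraction of shuffles scoring at least as high)."""
    rng = random.Random(seed)
    observed = local_score(query, target)
    letters = list(target)
    hits = 0
    for _ in range(trials):
        rng.shuffle(letters)  # same letters, random order
        if local_score(query, "".join(letters)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (trials + 1)


for name, seq in [("#1", "AATTAATACATTAATATAATAAAATTACTGA"),
                  ("#2", "CGGTAGTACGTAGTGTTTAGTAGCTATGAA")]:
    score, p = empirical_p_value("AAAA", seq)
    print(name, score, round(p, 3))
```

A high raw score against an A-rich sequence is less surprising than the same score against a sequence with few A's, and the shuffled background makes that difference visible.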

Sequence Alignment Programs

Homology Search by BLAST


Take Home Message
Sequence comparison provides a powerful tool for derivation of homologous genes, and hence functional and structural information.
60% of the genes in a newly sequenced genome have homologues among well-annotated genes.
Conserved sequence segments across multiple homologous genes suggest functional sites.