Multiple Sequence Alignment. Progressive Alignment Iterative Pairwise Guide Tree ClustalW Co-linearity Multiple Sequence Alignment Editors

Similar documents
Multiple Sequence Alignment

Clustering Methods without Given Number of Clusters

Exercise set 1 Solution

NCAAPMT Calculus Challenge Challenge #3 Due: October 26, 2011

Advanced Digital Signal Processing. Stationary/nonstationary signals. Time-Frequency Analysis... Some nonstationary signals. Time-Frequency Analysis

Codes Correcting Two Deletions

Evolutionary Algorithms Based Fixed Order Robust Controller Design and Robustness Performance Analysis

Predicting the Performance of Teams of Bounded Rational Decision-makers Using a Markov Chain Model

EE 4443/5329. LAB 3: Control of Industrial Systems. Simulation and Hardware Control (PID Design) The Inverted Pendulum. (ECP Systems-Model: 505)

The Dynamics of Learning Vector Quantization

Confusion matrices. True / False positives / negatives. INF 4300 Classification III Anne Solberg The agenda today: E.g., testing for cancer

7.2 INVERSE TRANSFORMS AND TRANSFORMS OF DERIVATIVES 281

III.9. THE HYSTERESIS CYCLE OF FERROELECTRIC SUBSTANCES

Factor Analysis with Poisson Output

ON THE APPROXIMATION ERROR IN HIGH DIMENSIONAL MODEL REPRESENTATION. Xiaoqun Wang

A statistical analysis of the Evolutionary Trace method. Heidi Spratt

New bounds for Morse clusters

Random vs. Deterministic Deployment of Sensors in the Presence of Failures and Placement Errors

Learning Multiplicative Interactions

University of Washington Department of Chemistry Chemistry 453 Winter Quarter 2015

One Class of Splitting Iterative Schemes

Lecture 9: Shor s Algorithm

Preemptive scheduling on a small number of hierarchical machines

arxiv: v1 [math.mg] 25 Aug 2011

Week 3 Statistics for bioinformatics and escience

CHAPTER 10 CHEMICAL BONDING II: MOLECULAR GEOMETRY AND HYBRIDIZATION OF ATOMIC ORBITALS

Supplementary information

Digital Control System

Multi-dimensional Fuzzy Euler Approximation

Beta Burr XII OR Five Parameter Beta Lomax Distribution: Remarks and Characterizations

Chapter Landscape of an Optimization Problem. Local Search. Coping With NP-Hardness. Gradient Descent: Vertex Cover

EXTENDED STABILITY MARGINS ON CONTROLLER DESIGN FOR NONLINEAR INPUT DELAY SYSTEMS. Otto J. Roesch, Hubert Roth, Asif Iqbal

Application of NMR Techniques to the Structural Determination of Caryophyllene Oxide (Unknown #92)

Design By Emulation (Indirect Method)

SMALL-SIGNAL STABILITY ASSESSMENT OF THE EUROPEAN POWER SYSTEM BASED ON ADVANCED NEURAL NETWORK METHOD

Sampling and the Discrete Fourier Transform

Gain and Phase Margins Based Delay Dependent Stability Analysis of Two- Area LFC System with Communication Delays

State Space: Observer Design Lecture 11

A SIMPLE NASH-MOSER IMPLICIT FUNCTION THEOREM IN WEIGHTED BANACH SPACES. Sanghyun Cho

Social Studies 201 Notes for March 18, 2005

Suggested Answers To Exercises. estimates variability in a sampling distribution of random means. About 68% of means fall

THE IDENTIFICATION OF THE OPERATING REGIMES OF THE CONTROLLERS BY THE HELP OF THE PHASE TRAJECTORY

arxiv: v2 [nucl-th] 3 May 2018

Lecture 17: Analytic Functions and Integrals (See Chapter 14 in Boas)

The machines in the exercise work as follows:

A FUNCTIONAL BAYESIAN METHOD FOR THE SOLUTION OF INVERSE PROBLEMS WITH SPATIO-TEMPORAL PARAMETERS AUTHORS: CORRESPONDENCE: ABSTRACT

Online supplementary information

A Simplified Methodology for the Synthesis of Adaptive Flight Control Systems

Hylleraas wavefunction for He. dv 2. ,! r 2. )dv 1. in the trial function. A simple trial function that does include r 12. is ) f (r 12.

Innovational Approach to the Mathematics Teaching at the Technical Universities

On the Isomorphism of Fractional Factorial Designs 1

Control Systems Analysis and Design by the Root-Locus Method

PHYSICS LAB Experiment 5 Fall 2004 FRICTION

A A Game Theoretic Model for the Formation of Navigable Small-World Networks the Tradeoff between Distance and Reciprocity

Improved multi-level pedestrian behavior prediction based on matching with classified motion patterns

Moment of Inertia of an Equilateral Triangle with Pivot at one Vertex

Approximating discrete probability distributions with Bayesian networks

Sequence analysis and comparison

12th International Congress on the Deterioration and Conservation of Stone Columbia University, New York, 2012

Nonlinear Single-Particle Dynamics in High Energy Accelerators

Copyright 2000 N. AYDIN. All rights reserved. 1

AN ADAPTIVE SIGNAL SEARCH ALGORITHM IN GPS RECEIVER

Lecture 23 Date:

Balanced Network Flows

The Use of MDL to Select among Computational Models of Cognition

online learning Unit Workbook 4 RLC Transients

Yoram Gat. Technical report No. 548, March Abstract. A classier is said to have good generalization ability if it performs on

A Bluffer s Guide to... Sphericity

Module: 8 Lecture: 1

An estimation approach for autotuning of event-based PI control systems

Behavioral thermal modeling for quad-core microprocessors

Rarefaction Example. Consider this dataset: Original matrix:

Advanced D-Partitioning Analysis and its Comparison with the Kharitonov s Theorem Assessment

Counting Parameterized Border Arrays for a Binary Alphabet

Alternate Dispersion Measures in Replicated Factorial Experiments

3.1 The Revised Simplex Algorithm. 3 Computational considerations. Thus, we work with the following tableau. Basic observations = CARRY. ... m.

RELIABILITY OF REPAIRABLE k out of n: F SYSTEM HAVING DISCRETE REPAIR AND FAILURE TIMES DISTRIBUTIONS

A Constraint Propagation Algorithm for Determining the Stability Margin. The paper addresses the stability margin assessment for linear systems

Connectivity in large mobile ad-hoc networks

Optimal Coordination of Samples in Business Surveys

APPLICATION OF THE SINGLE IMPACT MICROINDENTATION FOR NON- DESTRUCTIVE TESTING OF THE FRACTURE TOUGHNESS OF NONMETALLIC AND POLYMERIC MATERIALS

Symmetry Lecture 9. 1 Gellmann-Nishijima relation

The simplex method is strongly polynomial for deterministic Markov decision processes

Given the following circuit with unknown initial capacitor voltage v(0): X(s) Immediately, we know that the transfer function H(s) is

Advanced Algorithms and Models for Computational Biology -- a machine learning approach. Some important dates in history (billions of years ago)

Second order conditions for k means clustering: partitions vs. centroids

On Optimal Non-Overlapping Segmentation and Solutions of Three-Dimensional Linear Programming Problems through the Super Convergent Line Series

Performance-driven Design of Engine Control Tasks

3.2. The Ramsey model - linearization, backward integration, relaxation (complete)

Fair Game Review. Chapter 7 A B C D E Name Date. Complete the number sentence with <, >, or =

Improving the Efficiency of a Digital Filtering Scheme for Diabatic Initialization

2/1/2016. Species Abundance Curves Plot of rank abundance (x-axis) vs abundance or P i (yaxis).

MODELLING OF FRICTIONAL SOIL DAMPING IN FINITE ELEMENT ANALYSIS

A Text-based HMM of Foreign Affair Sentiment

V = 4 3 πr3. d dt V = d ( 4 dv dt. = 4 3 π d dt r3 dv π 3r2 dv. dt = 4πr 2 dr

SOME RESULTS ON INFINITE POWER TOWERS

c 2017 SIAM. Published by SIAM under the terms

Avoiding Forbidden Submatrices by Row Deletions

A Luenberger Soil Quality Indicator

Social Studies 201 Notes for November 14, 2003

Transcription:

Yverdon Le Bain Introduction to Bioinformatic Introduction to Bioinformatic ami Khuri Department of Computer cience an Joé tate Univerity an Joé, California, UA khuri@c.ju.edu www.c.ju.edu/faculty/khuri @2010 ami Khuri Multiple equence Alignment Progreive Alignment Iterative Pairwie Guide Tree ClutalW Colinearity Multiple equence Alignment Editor 4.1

Yverdon Le Bain Introduction to Bioinformatic * 20 * 40 * 60 * 80 Wombat : AAAGTTAATGAGTGGTTATCCAGAAGTAGTGACATTTTAGCCTCTGATAACTCCAACGGTAGGAGCCATGAGCAGAGCGCAGA : 83 Opoum : AAAGTTAATGAGTGGTTATTCAGAAGTAATGACGTTTTAGCCCCAGATTACTCAAGTGTTAGGAGCCATGAACAGAATGCAGA : 83 Armadillo : AAAGTTAACGAGTGGTTTTCCAGAGGTGATGACATATTAACTTCTGATGACTCACACGATAGGGGGTCTGAATTAAATGCAGA : 83 loth : AAAGTTAATGAGTGGTTTTCCAGAAGTGATGACATACTAACTTCTGATGACTCACACAATGGGGGGTCTGAATCAAATGCAGA : 83 Dugong : AAAGTTAATGAGTGGTTTTTCAGAAGTGATGGCCTGGATGACTTGCATGATAAGGGGTCTGAGTCAAATGCAGA : 74 Hyrax : AAAGTTAATGAGTGGTTTTCCAGAAGTGACAACCTAAGTGATTCACCTAGTGAGGGGTCTGAATTAAATGGAAA : 74 Aardvark : AAAGTTAATGAGTGGTTTTCCAGAAGTGATGGCCTGGATGGCTCACATGATGAAGGGTCTGAATCAAATGCAGA : 74 Tenrec : AAGGTTAACGAGTGGTTTTCCAAAAGCCACGGCCTGGGTGACTCTCGCGATGGGCGGCCTGAGTCAGGCGCAGA : 74 Rhinocero : AAAGTTAATGAGTGGTTTTCCAGAAGTGATGAAATATTAACTTCTGATGACTCACATGATGGGGGGCCTGAATCAAATACTGA : 83 Pig : AAAGTTAATGAGTGGTTTTCTAGAAGCGATGAAATGTTAACTTCTGACGACTCACAGGACAGGAGGTCTGAATCAAATACTGG : 83 Hedgehog : AAAGTGAATGAATGGCTTTCCAGAAGTGATGAACTGTTAACTTCTGATGACTCATATGATAAGGGATCTAAATCAAAAACTGA : 83 Human : AAAGTTAATGAGTGGTTTTCCAGAAGTGATGAACTGTTAGGTTCTGATGACTCACATGATGGGGAGTCTGAATCAAATGCCAA : 83 Rat : AAAGTGAATGAGTGGTTTTCCAGAACTGGTGAAATGTTAACTTCTGACAATGCATCTGACAGGAGGCCTGCGTCAAATGCAGA : 83 Hare : AAAGTTAACGAGTGGTTCTCCAGAAGTAATGAAATGTTAACTCCTGATGACTCACTTGACCGGCGGTCTGAATCAAATGCCAA : 83 AAaGTtAAtGAgTGGtTtTccAgAagt atga T gatgactca gat g gg ctga t aaatgc ga * 100 * 120 * 140 * Wombat : GGTGCCTAGTGCCTTAGAAGATGGGCATCCAGATACCGCAGAGGGAAATTCTAGCGTTTCTGAGAAGACTGAC : 156 Opoum : GGCAACCAATGCTTTAGAATATGGGCATGTAGAGACAGATGGAAATTCTAGCATTTCTGAAAAGACTGAT : 153 Armadillo : AGTAGCTGGTGCATTGAAAGTTTCAAAAGAAGTAGATGAATATTCTAGTTTTTCAGAGAAGATAGAC : 150 loth : AGTAGTTGGTGCATTGAAAGTTCCAAATGAAGTAGATGGATATTCTGGTTCTTCAGAGAAGATAGAC : 150 Dugong : AGTAGCTGGTGCTTTAGAAGTTCCAGAAGAAGTACATGGATATTCTAGTTCTTCAGAGAAAATAGAC : 141 Hyrax : AGTGGCTGGTCCAGTAAAACTTCCAGGTGAAGTACATAGATATTCTAGTTTTCCAGAGAACATAGAT : 141 Aardvark : AATAGGTGGTGCATTAGAAGTTTCAAATGAAGTACATAGTTACTCTGGTTCTTCAGAGAAAATAGAC : 141 Tenrec : CGTAGCTGTAGCCTTCGAAGTTCCAGACGAAGCATGTGAATCTTATAGTTCTCCAGAGAAAACAGAC : 141 Rhinocero : AGTAGCTGGTGCAGTAGAAGTTCAAAATGAAGTAGATGGATATTCTGGTTCTTCAGAGAAAATAGGC : 150 Pig : GGTAGCTGGTGCAGCAGAGGTTCCAAATGAAGCAGATGGACATTTGGGTTCTTCAGAGAAAATAGAC : 150 Hedgehog : AGTAACTGTAACAACAGAAGTTCCAAATGCAATAGATAGRTTTTTTGGTTCTTCAGAGAAAATAAAC : 150 Human : AGTAGCTGATGTATTGGACGTTCTAAATGAGGTAGATGAATATTCTGGTTCTTCAGAGAAAATAGAC : 150 Rat : AGCTGCTGTTGTGTTAGAAGTTTCAAATGAAGTGGATGGATGTTTCAGTTCTTCAAAGAAAATAGAC : 150 Hare : AGTGGCTGGTGCATTAGAAGTCCCAAAGGAGGTAGATGGATATTCTGGTTCTACAGAGAAAATAGAC : 150 gt gctg tgc t gaagtt ca a gaag a atggatatt t Gtt TtCagAgAA Atagac Part of the alignment of the DA equence of the BRCA1 gene From Bioinformatic and Molecular Evolution by Paul Higg and Terea Attwood Aligning BRCA1 equence * * * * * Wombat : KVEWLRDILADGRHEQAEVPALEDGHPDTAEGVEKTD : 52 Opoum : KVEWLFRDVLAPDYVRHEQAEATALEYGHVETDGIEKTD : 51 Armadillo : KVEWFRGDDILTDDHDRGELAEVAGALKVKEVDEYFEKID : 50 loth : KVEWFRDDILTDDHGGEAEVVGALKVPEVDGYGEKID : 50 Dugong : KVEWFFRDGLDDLHDKGEAEVAGALEVPEEVHGYEKID : 47 Hyrax : KVEWFRDLDPEGELGKVAGPVKLPGEVHRYFPEID : 47 Aardvark : KVEWFRDGLDGHDEGEAEIGGALEVEVHYGEKID : 47 Tenrec : KVEWFKHGLGDRDGRPEGADVAVAFEVPDEACEYPEKTD : 47 Rhinocero : KVEWFRDEILTDDHDGGPETEVAGAVEVQEVDGYGEKIG : 50 Pig : KVEWFRDEMLTDDQDRRETGVAGAAEVPEADGHLGEKID : 50 Hedgehog : KVEWLRDELLTDDYDKGKKTEVTVTTEVPAIDXFFGEKI : 50 Human : KVEWFRDELLGDDHDGEEAKVADVLDVLEVDEYGEKID : 50 Rat : KVEWFRTGEMLTDADRRPAAEAAVVLEVEVDGCFKKID : 50 Hare : KVEWFREMLTPDDLDRREAKVAGALEVPKEVDGYGTEKID : 50 KVEWf4 6 d e n e eki Alignment of BRCA1 protein equence for the ame region on the gene From Bioinformatic and Molecular Evolution by Paul Higg and Terea Attwood 4.2

Yverdon Le Bain Introduction to Bioinformatic What i Multiple Alignment Mot imple extenion of pairwie alignment Given: et of equence Match matrix Gap penaltie Find: Alignment of equence uch that an optimal core i achieved. Ue of Multiple Alignment A good alignment i critical for further analyi Determine the relationhip between a group of equence Determine the conerved region Evolutionary Analyi Determine the phylogenetic relationhip and evolution tructural Analyi Determine the overall tructure of the protein 4.3

4.4 Yverdon Le Bain Introduction to Bioinformatic Alignment Difficultie We cannot ue equence comparion algorithm (dynamic programming) and jut add more equence. With each equence, another dimenion i added that we need to earch for the optimal alignment. Add 1 equence Hyperlattice Computation A V Ā V V A A V A = A V A V V A V A A V A V A V max A V k=3 2 k 1=7

Yverdon Le Bain Introduction to Bioinformatic MA: Exact v. Heuritic The exact algorithm travere the entire earch pace find overall meaure of alignment quality and trie to maximize thi quality. The operation i computationally intenive. The larget computer can only optimally align a few equence (78). Therefore, we have to ue heuritic; i.e., fater algorithm, if we want to align many equence. Heuritic Algorithm Baed on a progreive pairwie alignment approach ClutalW (Cluter Alignment) PileUp (GCG) MACAW Build a global alignment baed on local alignment Build local multiple alignment Baed on Hidden Markov Model Baed on Genetic algorithm. 4.5

Yverdon Le Bain Introduction to Bioinformatic Progreive trategie for MA A common trategy to the MA problem i to progreively align pair of equence. A tarting pair of equence i elected and aligned Each ubequent equence i aligned to the previou alignment. Progreive alignment i a greedy algorithm. Iterative Pairwie Alignment The greedy algorithm: align ome pair while not done pick an unaligned tring near ome aligned one() align with the previouly aligned group There are many variant to the algorithm. 4.6

Yverdon Le Bain Introduction to Bioinformatic ClutalW: Package for MA ClutalW [the W i from Weighted] i a oftware package for the MA problem. Different weight are given to equence and parameter in different part of the alignment to and create an alignment that make ene biologically. calable Gap Penaltie for protein profile alignment A gap opening next to a conerved hydrophobic reidue can be penalized more heavily than a gap opening next to a hydrophilic reidue. A gap opening very cloe to another gap can be penalized more heavily than an iolated gap. 1 2 3 4 1 2 3 4 1 2 4 3 4 tep of ClutalW All Pairwie Alignment imilarity Matrix 9 4 4 7 4 Cluter Analyi Multiple Alignment tep: 1. Aligning 1 and 3 2. Aligning 2 and 4 3. Aligning ( 1, 3 ) with ( 2, 4 ). Dendrogram 1 3 Ditance 2 4 4.7

Yverdon Le Bain Introduction to Bioinformatic ClutalW: An Example * = identity : = trongly conerved. = weakly conerved By uing the ame five equence and aligning them with CLUTALW, we get the illutrated reult. 4.8

Yverdon Le Bain Introduction to Bioinformatic Red Blood Cell Hemoglobin tructure 4 ubunit: 2 alpha and 2 beta (red and gold) Each ubunit hold a heme group Each heme group contain an iron atom, which i reponible for the binding of oxygen 4.9

Yverdon Le Bain Introduction to Bioinformatic Practical Conideration When to ue Clutal? Can be ued to align any group of protein or nucleic acid equence that are related to each other over their entire length. Clutal i optimized to align et of equence that are entirely colinear, i.e. equence that have the ame protein domain, in the ame order. When ot To Ue Clutal equence do not hare common ancetry. equence are partially related. equence include hort non overlapping fragment. 4.10

Yverdon Le Bain Introduction to Bioinformatic tructural Alignment What you really want to do i align region of imilar function. Thee are the area that are evolutionarily conerved. (Fold, domain, diulfide bond) Problem The computer doe not know anything about the tructure or function of the protein. olution Ue computer alignment a a firt tep, then manually adjut the alignment to account for region of tructural imilarity. MA Editor Once the multiple alignment i produced, it may be neceary to edit the equence manually to obtain a more reaonable or expected alignment. ome of the conideration for an editor: the ue of color to aid in the viual repreentation of the alignment, the capability of recognizing the alignment format, the ability of uing the moue to add, delete, or move equence, thu allowing for an adequate window interface. 4.11

Yverdon Le Bain Introduction to Bioinformatic Divide and Conquer Alignment Ue the divide and conquer technique to perform multiple equence alignment. MA with the Divide & Conquer Method by J. toye Gene 211(2), GC45GC56, 1998. (GeneCOMBI). http://bibierv.techfak.unibielefeld.de/dca/algorithm/ Appropriate Approache 4.12