Yverdon Le Bain Introduction to Bioinformatic Introduction to Bioinformatic ami Khuri Department of Computer cience an Joé tate Univerity an Joé, California, UA khuri@c.ju.edu www.c.ju.edu/faculty/khuri @2010 ami Khuri Multiple equence Alignment Progreive Alignment Iterative Pairwie Guide Tree ClutalW Colinearity Multiple equence Alignment Editor 4.1
Yverdon Le Bain Introduction to Bioinformatic * 20 * 40 * 60 * 80 Wombat : AAAGTTAATGAGTGGTTATCCAGAAGTAGTGACATTTTAGCCTCTGATAACTCCAACGGTAGGAGCCATGAGCAGAGCGCAGA : 83 Opoum : AAAGTTAATGAGTGGTTATTCAGAAGTAATGACGTTTTAGCCCCAGATTACTCAAGTGTTAGGAGCCATGAACAGAATGCAGA : 83 Armadillo : AAAGTTAACGAGTGGTTTTCCAGAGGTGATGACATATTAACTTCTGATGACTCACACGATAGGGGGTCTGAATTAAATGCAGA : 83 loth : AAAGTTAATGAGTGGTTTTCCAGAAGTGATGACATACTAACTTCTGATGACTCACACAATGGGGGGTCTGAATCAAATGCAGA : 83 Dugong : AAAGTTAATGAGTGGTTTTTCAGAAGTGATGGCCTGGATGACTTGCATGATAAGGGGTCTGAGTCAAATGCAGA : 74 Hyrax : AAAGTTAATGAGTGGTTTTCCAGAAGTGACAACCTAAGTGATTCACCTAGTGAGGGGTCTGAATTAAATGGAAA : 74 Aardvark : AAAGTTAATGAGTGGTTTTCCAGAAGTGATGGCCTGGATGGCTCACATGATGAAGGGTCTGAATCAAATGCAGA : 74 Tenrec : AAGGTTAACGAGTGGTTTTCCAAAAGCCACGGCCTGGGTGACTCTCGCGATGGGCGGCCTGAGTCAGGCGCAGA : 74 Rhinocero : AAAGTTAATGAGTGGTTTTCCAGAAGTGATGAAATATTAACTTCTGATGACTCACATGATGGGGGGCCTGAATCAAATACTGA : 83 Pig : AAAGTTAATGAGTGGTTTTCTAGAAGCGATGAAATGTTAACTTCTGACGACTCACAGGACAGGAGGTCTGAATCAAATACTGG : 83 Hedgehog : AAAGTGAATGAATGGCTTTCCAGAAGTGATGAACTGTTAACTTCTGATGACTCATATGATAAGGGATCTAAATCAAAAACTGA : 83 Human : AAAGTTAATGAGTGGTTTTCCAGAAGTGATGAACTGTTAGGTTCTGATGACTCACATGATGGGGAGTCTGAATCAAATGCCAA : 83 Rat : AAAGTGAATGAGTGGTTTTCCAGAACTGGTGAAATGTTAACTTCTGACAATGCATCTGACAGGAGGCCTGCGTCAAATGCAGA : 83 Hare : AAAGTTAACGAGTGGTTCTCCAGAAGTAATGAAATGTTAACTCCTGATGACTCACTTGACCGGCGGTCTGAATCAAATGCCAA : 83 AAaGTtAAtGAgTGGtTtTccAgAagt atga T gatgactca gat g gg ctga t aaatgc ga * 100 * 120 * 140 * Wombat : GGTGCCTAGTGCCTTAGAAGATGGGCATCCAGATACCGCAGAGGGAAATTCTAGCGTTTCTGAGAAGACTGAC : 156 Opoum : GGCAACCAATGCTTTAGAATATGGGCATGTAGAGACAGATGGAAATTCTAGCATTTCTGAAAAGACTGAT : 153 Armadillo : AGTAGCTGGTGCATTGAAAGTTTCAAAAGAAGTAGATGAATATTCTAGTTTTTCAGAGAAGATAGAC : 150 loth : AGTAGTTGGTGCATTGAAAGTTCCAAATGAAGTAGATGGATATTCTGGTTCTTCAGAGAAGATAGAC : 150 Dugong : AGTAGCTGGTGCTTTAGAAGTTCCAGAAGAAGTACATGGATATTCTAGTTCTTCAGAGAAAATAGAC : 141 Hyrax : AGTGGCTGGTCCAGTAAAACTTCCAGGTGAAGTACATAGATATTCTAGTTTTCCAGAGAACATAGAT : 141 Aardvark : AATAGGTGGTGCATTAGAAGTTTCAAATGAAGTACATAGTTACTCTGGTTCTTCAGAGAAAATAGAC : 141 Tenrec : CGTAGCTGTAGCCTTCGAAGTTCCAGACGAAGCATGTGAATCTTATAGTTCTCCAGAGAAAACAGAC : 141 Rhinocero : AGTAGCTGGTGCAGTAGAAGTTCAAAATGAAGTAGATGGATATTCTGGTTCTTCAGAGAAAATAGGC : 150 Pig : GGTAGCTGGTGCAGCAGAGGTTCCAAATGAAGCAGATGGACATTTGGGTTCTTCAGAGAAAATAGAC : 150 Hedgehog : AGTAACTGTAACAACAGAAGTTCCAAATGCAATAGATAGRTTTTTTGGTTCTTCAGAGAAAATAAAC : 150 Human : AGTAGCTGATGTATTGGACGTTCTAAATGAGGTAGATGAATATTCTGGTTCTTCAGAGAAAATAGAC : 150 Rat : AGCTGCTGTTGTGTTAGAAGTTTCAAATGAAGTGGATGGATGTTTCAGTTCTTCAAAGAAAATAGAC : 150 Hare : AGTGGCTGGTGCATTAGAAGTCCCAAAGGAGGTAGATGGATATTCTGGTTCTACAGAGAAAATAGAC : 150 gt gctg tgc t gaagtt ca a gaag a atggatatt t Gtt TtCagAgAA Atagac Part of the alignment of the DA equence of the BRCA1 gene From Bioinformatic and Molecular Evolution by Paul Higg and Terea Attwood Aligning BRCA1 equence * * * * * Wombat : KVEWLRDILADGRHEQAEVPALEDGHPDTAEGVEKTD : 52 Opoum : KVEWLFRDVLAPDYVRHEQAEATALEYGHVETDGIEKTD : 51 Armadillo : KVEWFRGDDILTDDHDRGELAEVAGALKVKEVDEYFEKID : 50 loth : KVEWFRDDILTDDHGGEAEVVGALKVPEVDGYGEKID : 50 Dugong : KVEWFFRDGLDDLHDKGEAEVAGALEVPEEVHGYEKID : 47 Hyrax : KVEWFRDLDPEGELGKVAGPVKLPGEVHRYFPEID : 47 Aardvark : KVEWFRDGLDGHDEGEAEIGGALEVEVHYGEKID : 47 Tenrec : KVEWFKHGLGDRDGRPEGADVAVAFEVPDEACEYPEKTD : 47 Rhinocero : KVEWFRDEILTDDHDGGPETEVAGAVEVQEVDGYGEKIG : 50 Pig : KVEWFRDEMLTDDQDRRETGVAGAAEVPEADGHLGEKID : 50 Hedgehog : KVEWLRDELLTDDYDKGKKTEVTVTTEVPAIDXFFGEKI : 50 Human : KVEWFRDELLGDDHDGEEAKVADVLDVLEVDEYGEKID : 50 Rat : KVEWFRTGEMLTDADRRPAAEAAVVLEVEVDGCFKKID : 50 Hare : KVEWFREMLTPDDLDRREAKVAGALEVPKEVDGYGTEKID : 50 KVEWf4 6 d e n e eki Alignment of BRCA1 protein equence for the ame region on the gene From Bioinformatic and Molecular Evolution by Paul Higg and Terea Attwood 4.2
Yverdon Le Bain Introduction to Bioinformatic What i Multiple Alignment Mot imple extenion of pairwie alignment Given: et of equence Match matrix Gap penaltie Find: Alignment of equence uch that an optimal core i achieved. Ue of Multiple Alignment A good alignment i critical for further analyi Determine the relationhip between a group of equence Determine the conerved region Evolutionary Analyi Determine the phylogenetic relationhip and evolution tructural Analyi Determine the overall tructure of the protein 4.3
4.4 Yverdon Le Bain Introduction to Bioinformatic Alignment Difficultie We cannot ue equence comparion algorithm (dynamic programming) and jut add more equence. With each equence, another dimenion i added that we need to earch for the optimal alignment. Add 1 equence Hyperlattice Computation A V Ā V V A A V A = A V A V V A V A A V A V A V max A V k=3 2 k 1=7
Yverdon Le Bain Introduction to Bioinformatic MA: Exact v. Heuritic The exact algorithm travere the entire earch pace find overall meaure of alignment quality and trie to maximize thi quality. The operation i computationally intenive. The larget computer can only optimally align a few equence (78). Therefore, we have to ue heuritic; i.e., fater algorithm, if we want to align many equence. Heuritic Algorithm Baed on a progreive pairwie alignment approach ClutalW (Cluter Alignment) PileUp (GCG) MACAW Build a global alignment baed on local alignment Build local multiple alignment Baed on Hidden Markov Model Baed on Genetic algorithm. 4.5
Yverdon Le Bain Introduction to Bioinformatic Progreive trategie for MA A common trategy to the MA problem i to progreively align pair of equence. A tarting pair of equence i elected and aligned Each ubequent equence i aligned to the previou alignment. Progreive alignment i a greedy algorithm. Iterative Pairwie Alignment The greedy algorithm: align ome pair while not done pick an unaligned tring near ome aligned one() align with the previouly aligned group There are many variant to the algorithm. 4.6
Yverdon Le Bain Introduction to Bioinformatic ClutalW: Package for MA ClutalW [the W i from Weighted] i a oftware package for the MA problem. Different weight are given to equence and parameter in different part of the alignment to and create an alignment that make ene biologically. calable Gap Penaltie for protein profile alignment A gap opening next to a conerved hydrophobic reidue can be penalized more heavily than a gap opening next to a hydrophilic reidue. A gap opening very cloe to another gap can be penalized more heavily than an iolated gap. 1 2 3 4 1 2 3 4 1 2 4 3 4 tep of ClutalW All Pairwie Alignment imilarity Matrix 9 4 4 7 4 Cluter Analyi Multiple Alignment tep: 1. Aligning 1 and 3 2. Aligning 2 and 4 3. Aligning ( 1, 3 ) with ( 2, 4 ). Dendrogram 1 3 Ditance 2 4 4.7
Yverdon Le Bain Introduction to Bioinformatic ClutalW: An Example * = identity : = trongly conerved. = weakly conerved By uing the ame five equence and aligning them with CLUTALW, we get the illutrated reult. 4.8
Yverdon Le Bain Introduction to Bioinformatic Red Blood Cell Hemoglobin tructure 4 ubunit: 2 alpha and 2 beta (red and gold) Each ubunit hold a heme group Each heme group contain an iron atom, which i reponible for the binding of oxygen 4.9
Yverdon Le Bain Introduction to Bioinformatic Practical Conideration When to ue Clutal? Can be ued to align any group of protein or nucleic acid equence that are related to each other over their entire length. Clutal i optimized to align et of equence that are entirely colinear, i.e. equence that have the ame protein domain, in the ame order. When ot To Ue Clutal equence do not hare common ancetry. equence are partially related. equence include hort non overlapping fragment. 4.10
Yverdon Le Bain Introduction to Bioinformatic tructural Alignment What you really want to do i align region of imilar function. Thee are the area that are evolutionarily conerved. (Fold, domain, diulfide bond) Problem The computer doe not know anything about the tructure or function of the protein. olution Ue computer alignment a a firt tep, then manually adjut the alignment to account for region of tructural imilarity. MA Editor Once the multiple alignment i produced, it may be neceary to edit the equence manually to obtain a more reaonable or expected alignment. ome of the conideration for an editor: the ue of color to aid in the viual repreentation of the alignment, the capability of recognizing the alignment format, the ability of uing the moue to add, delete, or move equence, thu allowing for an adequate window interface. 4.11
Yverdon Le Bain Introduction to Bioinformatic Divide and Conquer Alignment Ue the divide and conquer technique to perform multiple equence alignment. MA with the Divide & Conquer Method by J. toye Gene 211(2), GC45GC56, 1998. (GeneCOMBI). http://bibierv.techfak.unibielefeld.de/dca/algorithm/ Appropriate Approache 4.12