SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS. Prokaryotes and Eukaryotes. DNA and RNA
- Jacob Cooper
Transcription
SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS

Prokaryotes and Eukaryotes

DNA and RNA
Double helix structure

Codons

Codons are triplets of bases from the RNA sequence. Each triplet defines an amino acid or a stop signal.

The standard genetic code

TTT Phe    TCT Ser    TAT Tyr     TGT Cys
TTC Phe    TCC Ser    TAC Tyr     TGC Cys
TTA Leu    TCA Ser    TAA STOP    TGA STOP
TTG Leu    TCG Ser    TAG STOP    TGG Trp
CTT Leu    CCT Pro    CAT His     CGT Arg
CTC Leu    CCC Pro    CAC His     CGC Arg
CTA Leu    CCA Pro    CAA Gln     CGA Arg
CTG Leu    CCG Pro    CAG Gln     CGG Arg
ATT Ile    ACT Thr    AAT Asn     AGT Ser
ATC Ile    ACC Thr    AAC Asn     AGC Ser
ATA Ile    ACA Thr    AAA Lys     AGA Arg
ATG Met    ACG Thr    AAG Lys     AGG Arg
GTT Val    GCT Ala    GAT Asp     GGT Gly
GTC Val    GCC Ala    GAC Asp     GGC Gly
GTA Val    GCA Ala    GAA Glu     GGA Gly
GTG Val    GCG Ala    GAG Glu     GGG Gly

Nucleotides and amino acids

The four nucleotides in DNA (RNA):
A adenine
G guanine
C cytosine
T thymine (U uracil)

The twenty amino acids in proteins:
A alanine          C cysteine        D aspartic acid    E glutamic acid
F phenylalanine    G glycine         H histidine        I isoleucine
K lysine           L leucine         M methionine       N asparagine
P proline          Q glutamine       R arginine         S serine
T threonine        V valine          W tryptophan       Y tyrosine
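The codon table above can be read as a lookup from triplets to amino acids. The following is a minimal sketch of that lookup, not part of the original slides. The names `CODON_TABLE` and `translate` are hypothetical, the table is written in the DNA alphabet (T instead of U) as on the slide, and stopping at the first STOP codon is an assumption made for illustration.

```python
from itertools import product

# Bases in the order used by the table above (first, second, third position).
BASES = "TCAG"
# One-letter amino-acid codes in the same order; "*" marks a STOP codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Hypothetical name: maps each of the 64 codons to its one-letter code.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate a DNA (or RNA) string codon by codon, stopping at STOP."""
    seq = seq.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":  # assumption: translation ends at the first STOP
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GCTGAACGATTCGTTACT"))
```

The reading frame is assumed to start at the first base; trailing bases that do not fill a complete codon are ignored.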
Sequences

DNA sequence: GCTGAACGATTCGTTACT
Amino-acid sequence: MAAPSRTTLMPPPFRLQLRLLILPILLLLRHDAVHAEPYS

Goals of sequence alignment

Given two (nucleotide or amino acid) sequences, we want to:
- measure their similarity
- determine the correspondences between elements of the sequences
- observe patterns of sequence conservation and variability of sequences over time

Definition of sequence alignment

Given an alphabet A, a string is a finite sequence of letters from A.
Example: GCTGAACG (DNA alphabet)

Sequence alignment is the assignment of letter-letter correspondences between two or more strings from a given alphabet.

Changes in alphabets

- exchange of a single letter for another (point mutation)
- insertion of a single letter
- deletion of a single letter

Pairwise sequence alignment is the process of transforming one sequence into another by repeated application of these three operations on single letters.
Example alignment

Without gaps:
G C T G A A C G
C T A T A A T C

With gaps:
[Two gapped alignments of the same two sequences; the gap positions are not preserved in this transcript.]

Different alignment notations

[Figure: the alignment of GCTGAACG and CTATAATC shown in different notations.]

Optimal alignment

To decide which alignment is the best of all possibilities, we need:
1. A way to systematically examine all possible alignments
2. A score for each possible alignment, which reflects the similarity of the two sequences

The optimal alignment will then be the one(s) with the highest score. Note that there may be more than one optimal alignment.

Dotplot

[Figure: dotplot of DOROTHYCROWFOOTHODGKIN (horizontal) against DOROTHYHODGKIN (vertical); a letter marks each cell where the row and column characters match.]
Dotplot

[Figure: dotplot for the sequence ABRACADABRACADABRA.]

[Figure: dotplot of the amino acid sequence of SLIT protein of Drosophila melanogaster (fruit fly). Web tool: Dotlet.]

Dotplot: filtering

To avoid very short stretches or many small gaps along stretches of matches, one may use the filtering parameters window and threshold.

- A dot will appear in a cell of the dotplot if that cell is in the center of a stretch of characters of length window such that the number of matches is at least threshold.
- Another option is to give the cell a color (or grey value), such that the higher the number of matches in the window, the more intense the color becomes.

Dotplot with filtering

[Figure: two panels, w = 1, t = 1 and w = 11, t = 5. Dotplot with window w and threshold t of the amino acid sequence of the protein pancreatic ribonuclease of the horse.]
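The window and threshold filter described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the method used by Dotlet: the function names `dotplot` and `render` are hypothetical, the window is taken along the diagonal through the cell, the window length is assumed to be odd, and positions that fall outside either sequence are counted as mismatches.

```python
def dotplot(s, t, window=1, threshold=1):
    """Return the set of cells (i, j) that receive a dot.

    s labels the rows and t the columns. A cell gets a dot when the diagonal
    stretch of length `window` centred on it has at least `threshold` matches.
    """
    half = window // 2  # assumption: window is odd
    dots = set()
    for i in range(len(s)):
        for j in range(len(t)):
            matches = 0
            for k in range(-half, half + 1):
                a, b = i + k, j + k
                # Positions outside either sequence count as mismatches.
                if 0 <= a < len(s) and 0 <= b < len(t) and s[a] == t[b]:
                    matches += 1
            if matches >= threshold:
                dots.add((i, j))
    return dots


def render(s, t, dots):
    """Draw the dotplot as text, with '*' for a dot and '.' otherwise."""
    lines = ["  " + " ".join(t)]
    for i, letter in enumerate(s):
        row = " ".join("*" if (i, j) in dots else "." for j in range(len(t)))
        lines.append(letter + " " + row)
    return "\n".join(lines)


rows, cols = "DOROTHYHODGKIN", "DOROTHYCROWFOOTHODGKIN"
print(render(rows, cols, dotplot(rows, cols)))                         # w = 1, t = 1
print(render(rows, cols, dotplot(rows, cols, window=3, threshold=3)))  # filtered
```

With `window=1` and `threshold=1` every matching pair of letters gets a dot; raising both removes isolated matches and leaves the diagonal stretches.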
Measures of sequence similarity

Functions that associate a numeric value with a pair of sequences:
1. similarity measure: higher value → greater similarity
2. distance function: larger distance → smaller similarity (a distance function is a dissimilarity measure)

Hamming distance

For two strings of equal length, their Hamming distance is the number of character positions in which they differ.

s : A G T C
t : C G T A
Hamming distance = 2

s : A G C A C A C A
t : A C A C A C T A
Hamming distance = 6

Disadvantage: a shift of just one position leads to a large Hamming distance.

Edit distance

Distance can be based on the number of edit operations required to change one string to the other. Here an edit operation is a deletion, insertion or alteration of a single character in either sequence.

(a, a)   match (no change from s to t)
(a, -)   deletion of character a (in s); indicated by - in t
(a, b)   replacement of a (in s) by b (in t), where a ≠ b
(-, b)   insertion of character b (in s); indicated by - in s

For the DNA alphabet: a ∈ {A, C, T, G}, b ∈ {A, C, T, G}

Alignments with gaps

Input:
s : A G C A C A C A
t : A C A C A C T A

Alignment 1:
s : A G C A C A C - A
t : A - C A C A C T A

Alignment 2:
[A second gapped alignment of s and t; the gap positions are not preserved in this transcript.]
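As a concrete version of the Hamming distance defined above, here is a minimal sketch. The function name `hamming_distance` is hypothetical, and raising an error for strings of unequal length is a design choice made for illustration. The last call shows the disadvantage mentioned above: shifting a sequence by one position makes every position differ.

```python
def hamming_distance(s, t):
    """Number of positions at which two equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(a != b for a, b in zip(s, t))


print(hamming_distance("AGTC", "CGTA"))          # first example above
print(hamming_distance("AGCACACA", "ACACACTA"))  # second example above
print(hamming_distance("ACGTACGT", "CGTACGTA"))  # same sequence shifted by one
```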
Protocol of edit operations

Alignment 1:
s : A G C A C A C - A
t : A - C A C A C T A

Match (A, A)
Delete (G, -)
Match (C, C)
Match (A, A)
Match (C, C)
Match (A, A)
Match (C, C)
Insert (-, T)
Match (A, A)

Unit cost model

Assign a cost or weight w to each operation. For example:
match: w(a, a) = 0
replacement: w(a, b) = 1 for a ≠ b
deletion/insertion: w(a, -) = w(-, b) = 1

This scheme is known as the Levenshtein distance, also called the unit cost model.

Edit distance

Given a cost function w for single operations:
1. The cost of an alignment of two sequences s and t is the sum of the costs of all the edit operations needed to transform s to t.
2. An optimal alignment of s and t is an alignment which has minimal cost among all possible alignments.
3. The edit distance of s and t is the cost of an optimal alignment of s and t under a cost function w. We denote it by d_w(s, t).

Edit distance: examples

Alignment 1: cost = 2
Alignment 2: cost = 4

Alignment 1 is optimal under the unit cost model, so the edit distance is d_w(s, t) = 2.
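The protocol of edit operations and the cost of an alignment can both be read off column by column. The sketch below assumes an alignment is given as two gapped strings of equal length with "-" as the gap symbol; the names `edit_protocol`, `unit_cost` and `alignment_cost` are hypothetical and chosen only for this illustration.

```python
GAP = "-"  # assumed gap symbol


def edit_protocol(s_aligned, t_aligned):
    """List the edit operation for each column of a gapped alignment."""
    if len(s_aligned) != len(t_aligned):
        raise ValueError("aligned strings must have equal length")
    ops = []
    for a, b in zip(s_aligned, t_aligned):
        if a == GAP and b == GAP:
            raise ValueError("a column cannot consist of two gaps")
        if a == GAP:
            ops.append(("Insert", a, b))
        elif b == GAP:
            ops.append(("Delete", a, b))
        elif a == b:
            ops.append(("Match", a, b))
        else:
            ops.append(("Replace", a, b))
    return ops


def unit_cost(a, b):
    """Unit cost model: 0 for a match, 1 for any other operation."""
    return 0 if a == b else 1


def alignment_cost(s_aligned, t_aligned, w=unit_cost):
    """Sum of the costs of all operations in the alignment."""
    return sum(w(a, b) for _, a, b in edit_protocol(s_aligned, t_aligned))


for op in edit_protocol("AGCACAC-A", "A-CACACTA"):
    print(op)
print(alignment_cost("AGCACAC-A", "A-CACACTA"))
```

Passing a different function for `w` gives the cost under another scoring scheme without changing the rest of the code.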
Scoring functions

Some changes in nucleotide or amino acid sequences are more likely than others, so assign variable weights to different edit operations. This leads to the concept of scoring functions or substitution matrices.

A substitution matrix is a square array of values which indicate the scores associated with possible transitions (replacements, insertions, deletions).

One uses either similarity scores or dissimilarity scores (such as edit distance). Substitutions that are more likely get a higher similarity score or, equivalently, a smaller dissimilarity score.

Similarity scoring schemes for DNA sequences

Percent Identity substitution matrix:
[Tables: substitution matrices over A, T, G, C for 99% identity and for 50% identity; the values are not preserved in this transcript.]

Dotplots and sequence alignment

[Figure: dotplot of DOROTHYCROWFOOTHODGKIN (horizontal) against DOROTHYHODGKIN (vertical).]

Any path through this dotplot from upper left to lower right, moving at each cell only East, South or Southeast, corresponds to an alignment.

D O R O T H Y C R O W F O O T H O D G K I N
D O R O T H Y - - - - - - - - H O D G K I N

Dynamic programming: optimal substructure property

[Figure: an optimal path from S to T through the point M (solid line), and an alternative path from S to M (dotted line).]

If M is a point on an optimal path π[S → T] (solid line), then π[S → M] and π[M → T] are also optimal paths. The cost of the dotted path from S to M cannot be smaller than the cost of the solid path from S to M.
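The correspondence between dotplot paths and alignments stated above can be made concrete. In the sketch below a path is given as a list of moves, with s labelling the rows and t the columns; the function name `path_to_alignment` and the move labels "E", "S" and "SE" are assumptions made for this illustration.

```python
def path_to_alignment(s, t, moves):
    """Convert a path through the dotplot into a gapped alignment.

    s labels the rows (vertical), t the columns (horizontal).
    "SE" pairs a character of t with a character of s, "E" consumes a
    character of t only, and "S" consumes a character of s only.
    """
    i = j = 0
    top, bottom = [], []
    for move in moves:
        if move == "SE":
            top.append(t[j]); bottom.append(s[i])
            i += 1; j += 1
        elif move == "E":
            top.append(t[j]); bottom.append("-")
            j += 1
        elif move == "S":
            top.append("-"); bottom.append(s[i])
            i += 1
        else:
            raise ValueError("unknown move: " + repr(move))
    if i != len(s) or j != len(t):
        raise ValueError("path must end in the lower right corner")
    return "".join(top), "".join(bottom)


# Seven diagonal steps, eight steps East, seven diagonal steps.
moves = ["SE"] * 7 + ["E"] * 8 + ["SE"] * 7
top, bottom = path_to_alignment("DOROTHYHODGKIN", "DOROTHYCROWFOOTHODGKIN", moves)
print(top)
print(bottom)
```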
Edit distance: recursive computation

Create a matrix D with elements D(i, j), i = 1, 2, ..., n and j = 1, 2, ..., m, such that D(i, j) is the minimal edit distance between the sequences that consist of the first i characters of s and the first j characters of t. Here i is the row index, running from top to bottom, and j is the column index, running from left to right.

Then D(n, m) will be the minimal edit distance between the full sequences s and t.

For initialization, we need to add an extra row D(0, j), j = 0, 1, 2, ..., m, and column D(i, 0), i = 0, 1, 2, ..., n, to the matrix:

$$
D = \begin{pmatrix}
D_{00} & D_{01} & \cdots & D_{0m} \\
D_{10} & D_{11} & \cdots & D_{1m} \\
\vdots & \vdots & \ddots & \vdots \\
D_{n0} & D_{n1} & \cdots & D_{nm}
\end{pmatrix}
$$

Steps in the dotplot matrix

Each step in the matrix which arrives in cell (i, j) can be of three types:
- East (previous cell was (i, j-1))
- South (previous cell was (i-1, j))
- SouthEast (previous cell was (i-1, j-1))

edit operation                                        step in matrix          cost
substitution of a_i by b_j                            (i-1, j-1) → (i, j)     w(a_i, b_j)
deletion of a_i from s                                (i-1, j) → (i, j)       w(a_i, -)
deletion of b_j from t (insertion of b_j into s)      (i, j-1) → (i, j)       w(-, b_j)

Recursion

Three paths arrive at cell (i, j): the optimal paths from (0, 0) to (i-1, j-1), (i-1, j), or (i, j-1), followed by:

step                    cost
(i-1, j-1) → (i, j)     D(i-1, j-1) + w(a_i, b_j)
(i-1, j) → (i, j)       D(i-1, j) + w(a_i, -)
(i, j-1) → (i, j)       D(i, j-1) + w(-, b_j)

The minimum of these is the cost D(i, j) of the optimal path from (0, 0) to (i, j):

$$
D(i, j) = \min\{\, D(i-1, j-1) + w(a_i, b_j),\; D(i-1, j) + w(a_i, -),\; D(i, j-1) + w(-, b_j) \,\}
$$
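The recursion above can be written down almost literally as a memoized function. This is a top-down sketch for illustration only, with hypothetical names `edit_distance` and `unit_cost` and "-" assumed as the gap symbol; it is suitable for short sequences, since Python's recursion depth limits how long the inputs can be.

```python
from functools import lru_cache

GAP = "-"  # assumed gap symbol


def unit_cost(a, b):
    """Unit cost model: 0 for a match, 1 for replacement, insertion, deletion."""
    return 0 if a == b else 1


def edit_distance(s, t, w=unit_cost):
    """Evaluate D(n, m) top-down using the recursion for D(i, j)."""

    @lru_cache(maxsize=None)
    def D(i, j):
        if i == 0 and j == 0:
            return 0
        candidates = []
        if i > 0 and j > 0:  # SouthEast step: substitution or match
            candidates.append(D(i - 1, j - 1) + w(s[i - 1], t[j - 1]))
        if i > 0:            # South step: a_i aligned with a gap
            candidates.append(D(i - 1, j) + w(s[i - 1], GAP))
        if j > 0:            # East step: b_j aligned with a gap
            candidates.append(D(i, j - 1) + w(GAP, t[j - 1]))
        return min(candidates)

    return D(len(s), len(t))


print(edit_distance("AGCACACA", "ACACACTA"))
```

On the first row and column only one candidate exists, which reproduces the initialization of D(0, j) and D(i, 0) as sums of gap costs.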
Initialization

On the top row and left column of the matrix we have no North or West neighbours, respectively. So here we have to initialize values:

$$
D(i, 0) = \sum_{k=1}^{i} w(a_k, -), \qquad D(0, j) = \sum_{k=1}^{j} w(-, b_k)
$$

which impose the gap penalty on unmatched characters at the beginning of either sequence.

In practice one often uses a constant gap penalty: w(a_k, -) = w(-, b_k) = g

Retrieving the optimal path(s)

For each cell (i, j), store a pointer (an arrow) to one of the three cells (i-1, j-1), (i-1, j) or (i, j-1) that provided the minimal value. This cell is called the predecessor of (i, j).

If there are more cells that provided the minimal value (remember that optimal paths need not be unique), we store a pointer to each of these cells.

Needleman-Wunsch algorithm

INPUT: two sequences s = a_1 a_2 ... a_n and t = b_1 b_2 ... b_m; cost function w with gap penalty g
OUTPUT: matrix D containing the minimal edit distance between the sequences s and t

for i = 0 to n do
    D(i, 0) ← g · i
end for
for j = 0 to m do
    D(0, j) ← g · j
end for
for i = 1 to n do
    for j = 1 to m do
        Match ← D(i-1, j-1) + w(a_i, b_j)
        Delete ← D(i-1, j) + g
        Insert ← D(i, j-1) + g
        D(i, j) ← min(Match, Insert, Delete)
    end for
end for

Example

Alignment of sequences s = GGAATGG and t = ATG with scoring scheme:
w(a, a) = 0 (match)
w(a, b) = 4 for a ≠ b (mismatch)
w(a, -) = w(-, b) = 5 (gap insertion)
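The pseudocode above translates directly into a short function. The sketch below follows the same steps with 0-based Python indexing; the names `needleman_wunsch` and `mismatch_cost` are hypothetical, and the example call uses the scoring scheme stated above (match 0, mismatch 4, gap 5).

```python
def needleman_wunsch(s, t, w, g):
    """Fill the matrix D for sequences s (rows) and t (columns).

    w(a, b) is the cost of aligning character a with character b,
    g is the constant gap penalty.
    """
    n, m = len(s), len(t)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = g * i          # left column: i gaps
    for j in range(m + 1):
        D[0][j] = g * j          # top row: j gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = D[i - 1][j - 1] + w(s[i - 1], t[j - 1])
            delete = D[i - 1][j] + g
            insert = D[i][j - 1] + g
            D[i][j] = min(match, insert, delete)
    return D


def mismatch_cost(a, b):
    """Scoring scheme of the example: match 0, mismatch 4."""
    return 0 if a == b else 4


D = needleman_wunsch("GGAATGG", "ATG", mismatch_cost, 5)
for row in D:
    print(row)
print("edit distance:", D[-1][-1])
```

Only the matrix is computed here; recovering the alignments themselves requires the pointers described under "Retrieving the optimal path(s)".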
Example (continued)

Matrix D(i, j) after initialization and the first diagonal step:

         A    T    G
    0    5   10   15
G   5    4
G  10
A  15
A  20
T  25
G  30
G  35

NB: to be consistent with the definition of the matrix D(i, j), sequence s is plotted vertically, sequence t horizontally.

Matrix after termination, including pointers:

         A    T    G
    0    5   10   15
G   5    4    9   10
G  10    9    8    9
A  15   10   13   12
A  20   15   14   17
T  25   20   15   18
G  30   25   20   15
G  35   30   25   20

[Figure: red arrows indicate the trace-back paths of the optimal alignments.]

Two cells where the trace-back path branches → four optimal alignments with equal score:

G G A A T G G      G G A A T G G
- - - A T G -      - - A - T G -

G G A A T G G      G G A A T G G
- - - A T - G      - - A - T - G

Sequence logos

Graphical display of a multiple alignment, with colored stacks of letters representing nucleotides or amino acids at successive positions. The height of a letter at a certain position increases with increasing frequency of that nucleotide or amino acid at that position.

[Figure: sequence logo of human exon-intron splice boundaries.]
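Returning to the trace-back in the example above: instead of storing pointers explicitly, one can recompute at each cell which predecessors provided the minimal value and follow every one of them. The sketch below enumerates all optimal alignments this way. The names `fill_matrix` and `all_optimal_alignments` are hypothetical, and recomputing the predecessors during trace-back is a choice made for brevity, not necessarily how the slides' figure was produced.

```python
def fill_matrix(s, t, w, g):
    """Fill D(i, j) with constant gap penalty g (s on rows, t on columns)."""
    n, m = len(s), len(t)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = g * i
    for j in range(m + 1):
        D[0][j] = g * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j - 1] + w(s[i - 1], t[j - 1]),
                          D[i - 1][j] + g,
                          D[i][j - 1] + g)
    return D


def all_optimal_alignments(s, t, w, g):
    """Follow every predecessor that provided the minimum, back to (0, 0)."""
    D = fill_matrix(s, t, w, g)
    results = []

    def walk(i, j, top, bottom):
        if i == 0 and j == 0:
            # Columns were collected from the end, so reverse them.
            results.append((top[::-1], bottom[::-1]))
            return
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + w(s[i - 1], t[j - 1]):
            walk(i - 1, j - 1, top + s[i - 1], bottom + t[j - 1])
        if i > 0 and D[i][j] == D[i - 1][j] + g:
            walk(i - 1, j, top + s[i - 1], bottom + "-")
        if j > 0 and D[i][j] == D[i][j - 1] + g:
            walk(i, j - 1, top + "-", bottom + t[j - 1])

    walk(len(s), len(t), "", "")
    return results


alignments = all_optimal_alignments("GGAATGG", "ATG",
                                    lambda a, b: 0 if a == b else 4, 5)
for top, bottom in alignments:
    print(top)
    print(bottom)
    print()
```

The number of optimal alignments can grow quickly for longer sequences, so in practice one often reports a single trace-back path.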
Sequence comparison: Score matrices Genome 559: Introduction to Statistical and omputational Genomics Prof James H Thomas Informal inductive proof of best alignment path onsider the last step in the best
More informationCapacity of DNA Data Embedding Under. Substitution Mutations
Capacity of DNA Data Embedding Under Substitution Mutations Félix Balado arxiv:.3457v [cs.it] 8 Jan Abstract A number of methods have been proposed over the last decade for encoding information using deoxyribonucleic
More informationIn this article, we investigate the possible existence of errordetection/correction
EYEWIRE BY DIEGO LUIS GONZALEZ, SIMONE GIANNERINI, AND RODOLFO ROSA In this article, we investigate the possible existence of errordetection/correction mechanisms in the genetic machinery by means of a
More informationSequence comparison: Score matrices. Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas
Sequence comparison: Score matrices Genome 559: Introduction to Statistical and omputational Genomics Prof James H Thomas FYI - informal inductive proof of best alignment path onsider the last step in
More informationSymmetry Studies. Marlos A. G. Viana
Symmetry Studies Marlos A. G. Viana aaa aac aag aat caa cac cag cat aca acc acg act cca ccc ccg cct aga agc agg agt cga cgc cgg cgt ata atc atg att cta ctc ctg ctt gaa gac gag gat taa tac tag tat gca gcc
More informationC CH 3 N C COOH. Write the structural formulas of all of the dipeptides that they could form with each other.
hapter 25 Biochemistry oncept heck 25.1 Two common amino acids are 3 2 N alanine 3 2 N threonine Write the structural formulas of all of the dipeptides that they could form with each other. The carboxyl
More informationSlide 1 / 54. Gene Expression in Eukaryotic cells
Slide 1 / 54 Gene Expression in Eukaryotic cells Slide 2 / 54 Central Dogma DNA is the the genetic material of the eukaryotic cell. Watson & Crick worked out the structure of DNA as a double helix. According
More informationStudies Leading to the Development of a Highly Selective. Colorimetric and Fluorescent Chemosensor for Lysine
Supporting Information for Studies Leading to the Development of a Highly Selective Colorimetric and Fluorescent Chemosensor for Lysine Ying Zhou, a Jiyeon Won, c Jin Yong Lee, c * and Juyoung Yoon a,
More informationPROTEIN SECONDARY STRUCTURE PREDICTION: AN APPLICATION OF CHOU-FASMAN ALGORITHM IN A HYPOTHETICAL PROTEIN OF SARS VIRUS
Int. J. LifeSc. Bt & Pharm. Res. 2012 Kaladhar, 2012 Research Paper ISSN 2250-3137 www.ijlbpr.com Vol.1, Issue. 1, January 2012 2012 IJLBPR. All Rights Reserved PROTEIN SECONDARY STRUCTURE PREDICTION:
More informationEvolutionary Change in Nucleotide Sequences. Lecture 3
Evolutionary Change in Nucleotide Sequences Lecture 3 1 So far, we described the evolutionary process as a series of gene substitutions in which new alleles, each arising as a mutation ti in a single individual,
More informationAtTIL-P91V. AtTIL-P92V. AtTIL-P95V. AtTIL-P98V YFP-HPR
Online Resource 1. Primers used to generate constructs AtTIL-P91V, AtTIL-P92V, AtTIL-P95V and AtTIL-P98V and YFP(HPR) using overlapping PCR. pentr/d- TOPO-AtTIL was used as template to generate the constructs
More informationUNIVERSITY OF CAMBRIDGE INTERNATIONAL EXAMINATIONS General Certifi cate of Education Advanced Subsidiary Level and Advanced Level
*1166350738* UNIVERSITY OF CAMBRIDGE INTERNATIONAL EXAMINATIONS General Certifi cate of Education Advanced Subsidiary Level and Advanced Level CEMISTRY 9701/43 Paper 4 Structured Questions October/November
More informationCodon-model based inference of selection pressure. (a very brief review prior to the PAML lab)
Codon-model based inference of selection pressure (a very brief review prior to the PAML lab) an index of selection pressure rate ratio mode example dn/ds < 1 purifying (negative) selection histones dn/ds
More informationFrom DNA to protein, i.e. the central dogma
From DNA to protein, i.e. the central dogma DNA RNA Protein Biochemistry, chapters1 5 and Chapters 29 31. Chapters 2 5 and 29 31 will be covered more in detail in other lectures. ph, chapter 1, will be
More informationAn Analytical Model of Gene Evolution with 9 Mutation Parameters: An Application to the Amino Acids Coded by the Common Circular Code
Bulletin of Mathematical Biology (2007) 69: 677 698 DOI 10.1007/s11538-006-9147-z ORIGINAL ARTICLE An Analytical Model of Gene Evolution with 9 Mutation Parameters: An Application to the Amino Acids Coded
More informationThe Mathematics of Phylogenomics
arxiv:math/0409132v2 [math.st] 27 Sep 2005 The Mathematics of Phylogenomics Lior Pachter and Bernd Sturmfels Department of Mathematics, UC Berkeley [lpachter,bernd]@math.berkeley.edu February 1, 2008 The
More informationThe degeneracy of the genetic code and Hadamard matrices. Sergey V. Petoukhov
The degeneracy of the genetic code and Hadamard matrices Sergey V. Petoukhov Department of Biomechanics, Mechanical Engineering Research Institute of the Russian Academy of Sciences petoukhov@hotmail.com,
More informationThe Journal of Animal & Plant Sciences, 28(5): 2018, Page: Sadia et al., ISSN:
The Journal of Animal & Plant Sciences, 28(5): 2018, Page: 1532-1536 Sadia et al., ISSN: 1018-7081 Short Communication BIOINFORMATICS ANALYSIS OF CODON USAGE BIAS AND RNA SECONDARY STRUCTURES FOR SALT
More informationUsing algebraic geometry for phylogenetic reconstruction
Using algebraic geometry for phylogenetic reconstruction Marta Casanellas i Rius (joint work with Jesús Fernández-Sánchez) Departament de Matemàtica Aplicada I Universitat Politècnica de Catalunya IMA
More informationInsects act as vectors for a number of important diseases of
pubs.acs.org/synthbio Novel Synthetic Medea Selfish Genetic Elements Drive Population Replacement in Drosophila; a Theoretical Exploration of Medea- Dependent Population Suppression Omar S. Abari,,# Chun-Hong
More informationMotif Finding Algorithms. Sudarsan Padhy IIIT Bhubaneswar
Motif Finding Algorithms Sudarsan Padhy IIIT Bhubaneswar Outline Gene Regulation Regulatory Motifs The Motif Finding Problem Brute Force Motif Finding Consensus and Pattern Branching: Greedy Motif Search
More informationSolutions In each case, the chirality center has the R configuration
CAPTER 25 669 Solutions 25.1. In each case, the chirality center has the R configuration. C C 2 2 C 3 C(C 3 ) 2 D-Alanine D-Valine 25.2. 2 2 S 2 d) 2 25.3. Pro,, Trp, Tyr, and is, Trp, Tyr, and is Arg,
More information