Advanced topics in bioinformatics

1 Feinberg Graduate School of the Weizmann Institute of Science Advanced topics in bioinformatics Shmuel Pietrokovski & Eitan Rubin Spring 2003 Course WWW site:

2 Lecture 1, 5/3/2003: Substitution matrices, theory and schemes

3 Sequence alignment
ATCAGAGTC
TTCAGTC (shown at several shifts on the slide)
TTCA--GTC
We wish to identify which regions of the two sequences are most similar to each other. The sequences are shifted relative to one another and gaps are introduced, to cover all possible alignments. The shifts and gaps provide the steps by which one sequence can be converted into the other.
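The idea of shifting one sequence along the other can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function names are hypothetical, the score is a plain count of identical positions, and gaps are not searched for; the gapped alignment from the slide is just scored as given.

```python
def best_ungapped_shift(a, b):
    """Slide b along a and return (offset, identities) for the first best offset."""
    best = (0, -1)
    for offset in range(-(len(b) - 1), len(a)):
        identities = sum(
            1
            for j, ch in enumerate(b)
            if 0 <= offset + j < len(a) and a[offset + j] == ch
        )
        if identities > best[1]:
            best = (offset, identities)
    return best


def count_identities(x, y):
    """Count identical, non-gap columns of two equal-length aligned strings."""
    return sum(1 for a, b in zip(x, y) if a == b and a != "-")


ungapped = best_ungapped_shift("ATCAGAGTC", "TTCAGTC")
gapped = count_identities("ATCAGAGTC", "TTCA--GTC")
print(ungapped, gapped)
```

For these two sequences no ungapped shift gives more than 4 identities, while the gapped alignment from the slide gives 6, which is why gaps have to be considered.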

4 Alignment scoring schemes: substitution matrices
Unitary substitution matrix: two scores are used, one for matches and one for mismatches. In practice such matrices are used for nucleotide alphabets.
[Table: 4 x 4 unitary matrix over A, C, G, T; the score values did not survive transcription]
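A unitary matrix is easy to write down in code. The sketch below assumes a match score of +1 and a mismatch score of -1; these values are an assumption for illustration, since the scores on the slide are not available, and the function names are hypothetical.

```python
def unitary_matrix(alphabet="ACGT", match=1, mismatch=-1):
    """Build a unitary substitution matrix as a dict keyed by residue pairs."""
    return {
        (a, b): (match if a == b else mismatch)
        for a in alphabet
        for b in alphabet
    }


def score_ungapped(x, y, matrix):
    """Sum the substitution scores over the aligned columns of x and y."""
    return sum(matrix[(a, b)] for a, b in zip(x, y))


matrix = unitary_matrix()
print(score_ungapped("ATCAGAG", "TTCAGTC", matrix))  # 4 matches, 3 mismatches
```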

5 Alignment scoring schemes: substitution matrices
Protein sequences have 20 types of residues (amino acids, aa) with complex relations by size, charge, genetic code, and chemistry. Unitary aa substitution matrices are outperformed by matrices that have different scores for the 210 possible aa pairs. These matrices are calculated by scoring the relations between different aa according to some of their features, and/or according to which substitutions occur in correct alignments and how probable those substitutions are by chance.

6 Sequence alignment
BLOSUM62 amino acid substitution matrix in 1/2 bit units
[Table: lower-triangular matrix with rows and columns A R N D C Q E G H I L K M F P S T W Y V X; only the first entries survive transcription: A:A = 4, R:A = -1, R:R = 5]

7 Alignment scoring schemes: substitution matrices
Every substitution matrix is either explicitly calculated from target frequencies of aligned residue pairs ($q_{ij}$) and the frequencies of the residues ($p_i$), or these target and residue frequencies are implicit and can be back-calculated from the substitution scores ($s_{ij}$).
The ratio of a target frequency to the frequency with which the pair occurs by chance, $q_{ij}/(p_i p_j)$, compares the probability of an event under two alternative hypotheses. This is called a likelihood, or odds, ratio. For independent positions such ratios are multiplied, or equivalently their logarithms are added.
Log-odds score: $s_{ij} = \ln\left(q_{ij}/(p_i p_j)\right) / \lambda$ ($\lambda$ determines the base of the logarithm).
Source: Altschul, J Mol Biol 219:555 (1991).
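As a small numerical illustration of the log-odds formula, the sketch below computes $s_{ij}$ for one pair. The function name and the example frequencies are made up for illustration; choosing $\lambda = \ln 2 / 2$ expresses the score in half-bit units.

```python
import math


def log_odds(q_ij, p_i, p_j, lam):
    """Log-odds score s_ij = ln(q_ij / (p_i * p_j)) / lam."""
    return math.log(q_ij / (p_i * p_j)) / lam


HALF_BIT_LAMBDA = math.log(2) / 2  # makes one score unit equal to half a bit

# A pair seen four times more often in true alignments than expected by chance
# has an odds ratio of 4, i.e. 2 bits, i.e. 4 half-bit units.
print(log_odds(0.04, 0.1, 0.1, HALF_BIT_LAMBDA))
```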

8 Sequence alignment
BLOSUM62 amino acid substitution matrix in 1/2 bit units
[Table: the BLOSUM62 matrix with a worked example for the pairs A:A and A:R under the column headings $P_i P_j$, $Q_{ij}$, $Q_{ij}/(P_i P_j)$ and $2\log_2(Q_{ij}/(P_i P_j))$; the numerical values did not survive transcription]
See ftp://ncbi.nlm.nih.gov/repository/blocks/unix/blosum/readme and ftp://ncbi.nlm.nih.gov/repository/blocks/unix/blosum/blosum/

9 Alignment scoring schemes: substitution matrices
Substitution matrices are characterized by their average score per residue pair:
$H = \sum_{i,j} q_{ij} s_{ij} = \sum_{i,j} q_{ij} \log_2\left(q_{ij}/(p_i p_j)\right)$
H is the information, in bit units, per aligned residue pair. It depends on the target frequencies ($q_{ij}$), calculated from what we think are correct alignments, and on the alignments that would occur by chance ($p_i p_j$). It is termed the relative entropy of the matrix. H measures the information provided by the matrix to distinguish correct alignments from chance ones. Well-made matrices with lower values will identify more distant sequence relationships that produce weaker alignments.
Source: Altschul, J Mol Biol 219:555 (1991).
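The relative entropy can be computed directly from a table of target frequencies. The sketch below uses a made-up two-letter alphabet; the function name and the toy frequencies are assumptions for illustration only. The residue frequencies $p_i$ are taken as the marginals of $q_{ij}$.

```python
import math


def relative_entropy(q):
    """H = sum_ij q_ij * log2(q_ij / (p_i * p_j)), with q keyed by ordered pairs."""
    p = {}
    for (i, _j), freq in q.items():
        p[i] = p.get(i, 0.0) + freq  # marginal frequency of residue i
    return sum(
        freq * math.log2(freq / (p[i] * p[j]))
        for (i, j), freq in q.items()
        if freq > 0
    )


# Toy target frequencies over ordered pairs; they sum to 1.
conserved = {("A", "A"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1, ("B", "B"): 0.4}
random_like = {pair: 0.25 for pair in conserved}

print(relative_entropy(conserved))    # positive: the matrix carries information
print(relative_entropy(random_like))  # zero: pairs occur exactly as by chance
```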

10 Alignment scoring schemes: substitution matrices
Substitution matrices differ by the models and data used for their calculation. Each is suitable for identifying alignments of sequences with different evolutionary distances. Nevertheless, longer alignments are needed to identify the relationship between more distant sequences.
The scale of the substitution matrix (base of the log) is arbitrary. However, matrices must be in the same scale to be compared to each other, and gap penalties are specific to the matrix and scale used. Typical penalties for local alignment with the BLOSUM62 matrix in half-bit units are 12 for opening a gap and 2 for extending it.

11 Amino acid (aa) substitution matrices can be calculated empirically, by examining which substitutions occur in correct alignments and comparing them with a model for random protein sequences. These matrices can also be derived by scoring the relations of aa to each other according to some of their features, such as size, charge, hydrophobicity and genetic code.

12 Residue-features matrices
[Figure: the amino acids arranged in the order F S Y C L P H W I T Q R M A N G V D K E; the layout did not survive transcription]

13 Residue-features matrices
[Figure: Venn diagram grouping the amino acids by the properties Small, Tiny, Polar, Charged, Positive, Negative, Hydrophobic, Aliphatic and Aromatic; the set memberships did not survive transcription]

14 Residue-features matrices
Hydrophobicity aa substitution matrix
[Table: lower-triangular matrix with rows and columns A R N D C Q E G H I L K M F P S T W Y V X; only the first entries survive transcription: A:A = 10, R:A = 5, R:R = 10]

15 Residue-features matrices
Genetic code aa substitution matrix: minimal number of base changes
[Table: lower-triangular matrix with rows and columns A R N D C Q E G H I L K M F P S T W Y V X; only the first entries survive transcription: A:A = 0, R:A = 2, R:R = 0]
Universal genetic code (* marks a stop codon):
TTT F  TCT S  TAT Y  TGT C
TTC F  TCC S  TAC Y  TGC C
TTA L  TCA S  TAA *  TGA *
TTG L  TCG S  TAG *  TGG W
CTT L  CCT P  CAT H  CGT R
CTC L  CCC P  CAC H  CGC R
CTA L  CCA P  CAA Q  CGA R
CTG L  CCG P  CAG Q  CGG R
ATT I  ACT T  AAT N  AGT S
ATC I  ACC T  AAC N  AGC S
ATA I  ACA T  AAA K  AGA R
ATG M  ACG T  AAG K  AGG R
GTT V  GCT A  GAT D  GGT G
GTC V  GCC A  GAC D  GGC G
GTA V  GCA A  GAA E  GGA G
GTG V  GCG A  GAG E  GGG G
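The entries of such a genetic code matrix can be derived from the codon table. The following sketch, with hypothetical function names, builds the universal genetic code from the usual TCAG-ordered string and takes, for each pair of amino acids, the smallest number of differing bases over all pairs of their codons.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for codons in TCAG order (TTT, TTC, TTA, TTG, TCT, ...); * is stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(aa):
    """Return all codons that encode the given amino acid."""
    return [codon for codon, residue in CODON_TABLE.items() if residue == aa]


def min_base_changes(aa1, aa2):
    """Minimal number of base changes needed to turn a codon of aa1 into one of aa2."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in codons_for(aa1)
        for c2 in codons_for(aa2)
    )


print(min_base_changes("A", "A"), min_base_changes("A", "R"))
```

The two printed values, 0 and 2, agree with the A:A and R:A entries that survive from the slide.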

17 Residue-features matrices
A selection of substitution matrices based on amino acid features:
Mutation values for the interconversion of amino acid pairs (Fitch, 1966)
Genetic code matrix (Benner et al., 1994)
Residue replaceability matrix (Cserzo et al., 1994)
Structure-Genetic matrix (Feng et al., 1985)
Hydrophobicity scoring matrix (George et al., 1990)
Chemical distance (Grantham, 1974)
Chemical similarity scores (McLachlan, 1972)
Base-substitution-protein-stability matrix (Miyazawa-Jernigan, 1993)
Hydrophobicity scoring matrix (Riek et al., 1995)
WAC matrix constructed from amino acid comparative profiles (Wei et al., 1997)
Source: AAindex database at

18 The Dayhoff, or PAM, matrices
PAM matrices are based on an explicit model of mutations during evolution. Amino acid (aa) changes at each site are assumed to be independent of previous changes at that site, of changes at other sites and of the position of the site. This model allows the extrapolation of substitutions observed over short evolutionary distances to longer ones.
The input data are groups of protein sequences at least 85% identical to each other (protein families). Amino acid substitutions within each group probably result from single mutation events and do not significantly change the protein's function, and are thus termed accepted mutations. Sequences within each family are organized into a phylogenetic tree.

19 The Dayhoff, or PAM, matrices
[Figure: phylogenetic tree. Observed sequences ACGH, DBGH, ADIJ and CBIJ; inferred ancestors ABGH and ABIJ. Changes on the branches: B=>C, A=>D, B=>D, A=>C, and I<=>G, J<=>H between the two ancestors.]
Phylogenetic trees allow counting the aligned aa pairs ($A_{ij}$) that correspond to actual mutation events. This solves the dependence problem of different sequence alignments within one family.
Matrix of accepted point mutations for this tree (empty cells omitted):
    A  B  C  D  G  H  I  J
A         1  1
B         1  1
C   1  1
D   1  1
G                     1
H                        1
I               1
J                  1
Source: Dayhoff et al., A model of evolutionary change in proteins, in Atlas of Protein Sequence and Structure, Suppl 3.
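Counting accepted mutations along the branches of a tree can be sketched as follows. The branch list is one reading of the tree on this slide (two inferred ancestors, ABGH and ABIJ, each with two observed descendants) and should be treated as an assumption; the function name is hypothetical. Each differing position on a branch is counted once, without direction.

```python
from collections import Counter

# Branches of the example tree as (ancestor, descendant) pairs; the last entry
# joins the two inferred ancestors, where the direction of change is unknown.
BRANCHES = [
    ("ABGH", "ACGH"),
    ("ABGH", "DBGH"),
    ("ABIJ", "ADIJ"),
    ("ABIJ", "CBIJ"),
    ("ABGH", "ABIJ"),
]


def accepted_mutation_counts(branches):
    """Count unordered aa pairs that differ between the two ends of each branch."""
    counts = Counter()
    for ancestor, descendant in branches:
        for a, b in zip(ancestor, descendant):
            if a != b:
                counts[tuple(sorted((a, b)))] += 1
    return counts


for pair, n in sorted(accepted_mutation_counts(BRANCHES).items()):
    print(pair, n)
```

Counting along branches, rather than over all pairs of observed sequences, is what keeps each mutation event from being counted several times.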

20 The Dayhoff, or PAM, matrices
We need to know the probability that each aa will change within a given evolutionary distance. This number is termed the relative mutability of the aa ($m_j$).
[Table: worked example for the sequence alignment ADA / ADB, listing for the amino acids A, B and D the number of changes, the number of occurrences and the relative mutability; the values did not survive transcription]
To account for exposure to mutation, these numbers are multiplied by the total number of mutations per 100 positions in each family. This scales the data from different families to the same evolutionary distance: 1 percent accepted mutation (PAM).
Source: Dayhoff et al., A model of evolutionary change in proteins, in Atlas of Protein Sequence and Structure, Suppl 3.
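For the two-sequence example ADA / ADB, the unscaled ratio of changes to occurrences can be worked out as below. This is a simplified sketch: the function name is hypothetical, and the scaling by the number of mutations per 100 positions in the family is left out, so the results are only the raw ratios.

```python
from collections import Counter
from fractions import Fraction


def raw_mutabilities(seq1, seq2):
    """Return {aa: changes / occurrences} for a pairwise ungapped alignment."""
    occurrences = Counter(seq1) + Counter(seq2)
    changes = Counter()
    for a, b in zip(seq1, seq2):
        if a != b:
            changes[a] += 1  # both residues of a differing column
            changes[b] += 1  # are counted as having changed
    return {aa: Fraction(changes[aa], occurrences[aa]) for aa in occurrences}


for aa, m in sorted(raw_mutabilities("ADA", "ADB").items()):
    print(aa, m)
```

Here A changes in one of its three occurrences, B in its single occurrence, and D in neither of its two.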

21 The Dayhoff, or PAM, matrices
Combining the $A_{ij}$ data of accepted point mutations and the relative mutabilities, $m_j$, gives the probability, $M_{ij}$, that an aa j will change into an aa i after a given evolutionary distance:
$M_{ij} = \lambda m_j A_{ij} / \sum_i A_{ij}$ and $M_{jj} = 1 - \lambda m_j$
$\lambda$ is a proportionality constant chosen to keep the evolutionary distance at 1 PAM. The PAM1 matrix gives the probabilities for 1 aa change per 100 aa (1%).
To get PAM matrices for larger distances, PAM1 is multiplied by itself. Raising PAM1 to the 250th power gives the PAM250 matrix, which has the probabilities for aa substitutions expected after 250 accepted mutations per 100 residues (250%). This corresponds to a sequence divergence of ~80%. Note that a position can mutate back to an aa previously found in it.
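The construction and extrapolation can be illustrated with a toy two-letter alphabet. Everything numerical here (the counts, the mutabilities, the value of lambda) is invented for illustration, and the function names are hypothetical; columns of the matrix correspond to the original residue j and rows to the replacement i, following the formula above.

```python
def mutation_matrix(A, m, lam):
    """M[i][j] = lam * m[j] * A[i][j] / sum_i A[i][j] for i != j; M[j][j] = 1 - lam * m[j]."""
    n = len(A)
    M = [[0.0] * n for _ in range(n)]
    for j in range(n):
        column_total = sum(A[i][j] for i in range(n) if i != j)
        for i in range(n):
            if i == j:
                M[i][j] = 1.0 - lam * m[j]
            else:
                M[i][j] = lam * m[j] * A[i][j] / column_total
    return M


def mat_mul(X, Y):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(X)
    return [
        [sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def mat_power(M, power):
    """Multiply M by itself until it has been raised to the given power (>= 1)."""
    result = M
    for _ in range(power - 1):
        result = mat_mul(result, M)
    return result


toy_pam1 = mutation_matrix(A=[[0, 1], [1, 0]], m=[1.0, 1.0], lam=0.01)
toy_pam250 = mat_power(toy_pam1, 250)
print(toy_pam1)
print(toy_pam250)
```

In this toy case a residue still has slightly more than a 50% chance of being unchanged after 250 steps, because positions can mutate back; the real 20-letter matrices behave analogously but with much lower retained identity.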

22 The Dayhoff, or PAM, matrices
To get odds values for the PAM matrices, the probabilities are divided by the frequency of the changing residue, $f_i$:
$R_{ij} = M_{ij} / f_i$
After taking the $\log_{10}$ of the odds ratios, the values for the $R_{ij}$ and $R_{ji}$ changes are averaged and multiplied by 10. The resulting matrices are termed mutation data matrices (MDM). Their values are log-odds values in approximately third-bit units ($10\log_{10} x \approx 3\log_2 x$). H, the relative entropy of the PAM250 MDM matrix, is 0.35 bits and its expected score per aligned pair is [value missing from the transcript].
There is a series of PAM matrices, each suitable for aligning sequences with a different amount of divergence. For sequence searches this means that you must know ahead what type of relationships you want to optimize the search for. This is true for all types of substitution matrices.
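The conversion from probabilities to MDM scores can be sketched for a single pair. The toy matrix and frequencies below are invented for illustration and the function name is hypothetical; the indexing follows the formula $R_{ij} = M_{ij} / f_i$.

```python
import math


def mdm_score(M, f, i, j):
    """10 times the average of log10(R_ij) and log10(R_ji), with R_ij = M[i][j] / f[i]."""
    r_ij = M[i][j] / f[i]
    r_ji = M[j][i] / f[j]
    return 10 * (math.log10(r_ij) + math.log10(r_ji)) / 2


# Toy two-letter transition matrix (columns sum to 1) and residue frequencies.
M = [[0.6, 0.2],
     [0.4, 0.8]]
f = [1 / 3, 2 / 3]

print(mdm_score(M, f, 0, 1))  # negative: this substitution is rarer than chance
print(mdm_score(M, f, 0, 0))  # positive: conservation is more likely than chance
print(10 * math.log10(2))     # one bit is about 3.01 MDM units, hence "third bits"
```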

23 The Dayhoff, or PAM, matrices
The Dayhoff matrices were used extensively and successfully for more than 15 years. Dayhoff and her coworkers introduced two key concepts for calculating aa substitution matrices: i) substitution frequencies can be based on estimated mutation data, ii) sequence alignments can be effectively scored by log-odds values.
PAM matrices also have several weaknesses. The assumption of independence of mutation events necessitates the use of closely related sequences. Thus, the estimated mutation rates need to be extrapolated to longer evolutionary distances. This amplifies inaccuracies. One likely cause of inaccuracy is that aa changes between closely related sequences are mainly due to single nucleotide changes, while more of the changes in distant sequences typically come from multinucleotide changes. Protein positions are also not equally mutable, and mutagenesis hot spots and cold spots exist in most protein families.

24 Blocks substitution matrices (BLOSUM)
Blocks are ungapped local multiple-sequence alignments. They can be automatically found from groups of related protein sequences (families). Blocks represent the most conserved sequence regions of protein families (motifs).
BLOSUM matrices are based on the changes within sequence motifs of protein families. Common evolutionary origin (homology) of the motifs is implicit. Relations between all motifs that differ by at least some threshold are considered equally.
seq1  CFTKGTQV
seq2  CLAEGTRI
seq3  CMNYSTRV
seq4  CHPADTKV
seq5  CLTADARI
seq6  CISKFSHI
seq7  CVTGDALV
seq8  CLTGDALV
seq9  CVTGDALV
seq10 ALAYDEPI

25 Blocks substitution matrices (BLOSUM)
First column of the block (seq1 to seq10): C C C C C C C C C A
Observed aa pairs, comparing each sequence with all those before it:
1CC + 2CC + 3CC + 4CC + 5CC + 6CC + 7CC + 8CC + 9AC = 36CC + 9AC

26 Blocks substitution matrices (BLOSUM)
Observed aa pairs in the column C C C C C C C C C A: 36CC + 9AC
The column defines 45 aa pairs ($10 \times 9 / 2$).
Observed frequencies of pairs in the column are $q_{CC} = 36/45 = 0.8$ and $q_{AC} = 9/45 = 0.2$.
The observed frequencies of single aa in the pairs are $p_i = q_{ii} + \sum_{j \neq i} q_{ij}/2$:
$p_C = 0.8 + 0.2/2 = 0.9$ and $p_A = 0.2/2 = 0.1$
The expected frequencies of aa pairs are $e_{ii} = p_i p_i = p_i^2$ and, for $i \neq j$, $e_{ij} = p_i p_j + p_j p_i = 2 p_i p_j$:
$e_{CC} = 0.9 \times 0.9 = 0.81$, $e_{AC} = 2(0.9 \times 0.1) = 0.18$, and $e_{AA} = 0.1 \times 0.1 = 0.01$
Source: Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 89 (1992).
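The arithmetic for this single column can be checked with a short script. The function names are hypothetical; the formulas are the ones given on the slide.

```python
from collections import Counter
from itertools import combinations


def pair_counts(column):
    """Count unordered residue pairs over all pairs of sequences in a column."""
    return Counter(tuple(sorted(pair)) for pair in combinations(column, 2))


def observed_frequencies(counts):
    """q_ij: each pair count divided by the total number of pairs."""
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def residue_frequencies(q):
    """p_i = q_ii + sum over j != i of q_ij / 2."""
    p = Counter()
    for (a, b), freq in q.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2
    return dict(p)


def expected_frequency(p, a, b):
    """e_ii = p_i^2 and e_ij = 2 * p_i * p_j for i != j."""
    return p[a] * p[b] if a == b else 2 * p[a] * p[b]


counts = pair_counts("CCCCCCCCCA")
q = observed_frequencies(counts)
p = residue_frequencies(q)
print(counts, q, p)
print(expected_frequency(p, "C", "C"), expected_frequency(p, "A", "C"),
      expected_frequency(p, "A", "A"))
```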

27 Blocks substitution matrices (BLOSUM)
The observed and expected frequencies of aa pairs in the column can be used to calculate log-odds scores for the pairs: $s_{ij} = \log(q_{ij}/e_{ij})$.
Observed and expected frequencies are counted cumulatively over all columns of the Blocks database to calculate log-odds scores of amino acid substitutions in conserved protein motifs.
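Accumulating counts over all columns and turning them into scores can be sketched for the single example block from slide 24. This is an illustration only: the function names are hypothetical, a real matrix would pool every block in the database and apply sequence clustering, and the half-bit scaling ($2\log_2$) is an assumption taken from the BLOSUM62 slide heading.

```python
import math
from collections import Counter
from itertools import combinations

BLOCK = [
    "CFTKGTQV", "CLAEGTRI", "CMNYSTRV", "CHPADTKV", "CLTADARI",
    "CISKFSHI", "CVTGDALV", "CLTGDALV", "CVTGDALV", "ALAYDEPI",
]


def block_pair_counts(block):
    """Accumulate unordered residue-pair counts over every column of the block."""
    counts = Counter()
    for column in zip(*block):
        counts.update(tuple(sorted(pair)) for pair in combinations(column, 2))
    return counts


def log_odds_scores(counts):
    """Return {pair: 2 * log2(q_ij / e_ij)} for every observed pair."""
    total = sum(counts.values())
    q = {pair: n / total for pair, n in counts.items()}
    p = Counter()
    for (a, b), freq in q.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2
    scores = {}
    for (a, b), freq in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = 2 * math.log2(freq / expected)
    return scores


counts = block_pair_counts(BLOCK)
scores = log_odds_scores(counts)
print(sum(counts.values()), round(scores[("C", "C")], 2))
```

Pairs that never occur in the data get no score in this sketch; handling them would need additional assumptions that the slides do not cover.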

28 Blocks substitution matrices (BLOSUM)
To reduce multiple contributions of the most closely related family members to the aa pair frequencies, sequences are clustered in each block. Clustering is done by percent identity, e.g. all sequences within a block that are at least 80% identical to each other are clustered together. The aa in each cluster are weighted by 1/c, where c is the number of sequences in the cluster.
Column example:
Seq1 A
Seq2 A
Seq3 S
Seq4 C
Seq5 C
Cluster1 (Seq1 to Seq3): 1/3 A, 1/3 A, 1/3 S
Cluster2 (Seq4 and Seq5): 1/2 C, 1/2 C
Observed aa pairs: 1/6 CA + 1/6 CA + 1/6 CS + 1/6 CA + 1/6 CA + 1/6 CS
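The weighted counting in this example can be reproduced as follows. The function name is hypothetical, and the sketch assumes, as in the example, that pairs are only formed between different clusters, each weighted by the product of the two 1/c weights.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations


def weighted_pair_counts(clusters):
    """Count residue pairs between clusters, each weighted by 1 / (c1 * c2)."""
    counts = defaultdict(Fraction)
    for c1, c2 in combinations(clusters, 2):
        weight = Fraction(1, len(c1) * len(c2))
        for a in c1:
            for b in c2:
                counts[tuple(sorted((a, b)))] += weight
    return dict(counts)


# One column: cluster 1 holds the residues A, A, S and cluster 2 holds C, C.
column_clusters = [["A", "A", "S"], ["C", "C"]]
print(weighted_pair_counts(column_clusters))
```

The six terms of weight 1/6 add up to 2/3 for the pair C-A and 1/3 for C-S, so the two clusters together contribute exactly one pair.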

29 Blocks substitution matrices (BLOSUM)
Reducing the clustering percent can cause some blocks to be entirely clustered and thus eliminate their contribution of pairs. Reduced clustering percent also decreases the contribution of closely related sequences and lowers the information (H) of the resulting matrix. Matrices made from more distant sequences (lower % clustering and information) are more suitable for identifying distant sequence relationships.
[Figure: plot with the axis label "Information - H"; source: Henikoff & Henikoff, PNAS USA 89]

30 Blocks substitution matrices (BLOSUM)
The construction procedure of the Blocks database from sequences of protein families employs an aa substitution matrix. To make the BLOSUM matrices, the database was first made with a unitary substitution matrix. The constructed blocks were then used to make a BLOSUM matrix, which was in turn used to reconstruct the database. After three such iterations the Blocks database and BLOSUM matrices converged.
[Figure: flow diagram. Start with a unitary matrix and iterate until convergence: Sequence groups -> Make database -> Blocks DB -> Make matrix -> Substitution matrix -> Make database]
Using a PAM matrix or just parts of the data also resulted in very similar matrices. Currently the Blocks database is significantly larger and more diverse than the version used to construct the BLOSUM matrices. Nevertheless, the database yields the same matrices.

31 Blocks substitution matrices (BLOSUM)
Performance of substitution matrices depends on the type of alignments used and the evolutionary distance between the aligned sequences. Alignment types can be global, local or ungapped local. Performance in the first two types also depends on the gap model and penalties.
BLOSUM matrices were found to be very good for identifying long-diverged sequences and for ungapped local alignments (BLAST algorithm). This probably reflects the data used for their construction.

32 More details, sources and things to do for next lecture
Sources:
Altschul, Amino acid substitution matrices from an information theoretic perspective, J Mol Biol 219:555 (1991).
Henikoff, Scores for sequence searches and alignments, Curr Opin Struct Biol 6 (1996).
Dayhoff et al., A model of evolutionary change in proteins, in Atlas of Protein Sequence and Structure, Suppl 3, NBRF (1978).
Henikoff & Henikoff, Amino acid substitution matrices from protein blocks, Proc. Natl. Acad. Sci. USA 89 (1992).
Assignment: Read the source articles for this lecture. List the similarities and differences between the approaches for calculating the PAM and BLOSUM matrices. For example: types of data, underlying assumptions, and dealing with the lack of independence (dependence) of the sequence data.

33 More details, sources and things to do for next lecture
For those who are not acquainted with information theory or want to be certain they know the basics of it: An information theory primer for molecular biologists, http://

34 Next lecture: Dynamic programming


evoglow - express N kit Cat. No.: product information broad host range vectors - gram negative bacteria evoglow - express N kit broad host range vectors - gram negative bacteria product information Cat. No.: 2.1.020 evocatal GmbH 2 Content: Product Overview... 4 evoglow express N kit... 4 The evoglow Fluorescent

More information

CSE 549: Computational Biology. Substitution Matrices

CSE 549: Computational Biology. Substitution Matrices CSE 9: Computational Biology Substitution Matrices How should we score alignments So far, we ve looked at arbitrary schemes for scoring mutations. How can we assign scores in a more meaningful way? Are

More information

AtTIL-P91V. AtTIL-P92V. AtTIL-P95V. AtTIL-P98V YFP-HPR

AtTIL-P91V. AtTIL-P92V. AtTIL-P95V. AtTIL-P98V YFP-HPR Online Resource 1. Primers used to generate constructs AtTIL-P91V, AtTIL-P92V, AtTIL-P95V and AtTIL-P98V and YFP(HPR) using overlapping PCR. pentr/d- TOPO-AtTIL was used as template to generate the constructs

More information

Near-instant surface-selective fluorogenic protein quantification using sulfonated

Near-instant surface-selective fluorogenic protein quantification using sulfonated Electronic Supplementary Material (ESI) for rganic & Biomolecular Chemistry. This journal is The Royal Society of Chemistry 2014 Supplemental nline Materials for ear-instant surface-selective fluorogenic

More information

Using algebraic geometry for phylogenetic reconstruction

Using algebraic geometry for phylogenetic reconstruction Using algebraic geometry for phylogenetic reconstruction Marta Casanellas i Rius (joint work with Jesús Fernández-Sánchez) Departament de Matemàtica Aplicada I Universitat Politècnica de Catalunya IMA

More information

Supplementary Information

Supplementary Information Supplementary Information Arginine-rhamnosylation as new strategy to activate translation elongation factor P Jürgen Lassak 1,2,*, Eva Keilhauer 3, Max Fürst 1,2, Kristin Wuichet 4, Julia Gödeke 5, Agata

More information

Timing molecular motion and production with a synthetic transcriptional clock

Timing molecular motion and production with a synthetic transcriptional clock Timing molecular motion and production with a synthetic transcriptional clock Elisa Franco,1, Eike Friedrichs 2, Jongmin Kim 3, Ralf Jungmann 2, Richard Murray 1, Erik Winfree 3,4,5, and Friedrich C. Simmel

More information

Evolutionary Analysis of Viral Genomes

Evolutionary Analysis of Viral Genomes University of Oxford, Department of Zoology Evolutionary Biology Group Department of Zoology University of Oxford South Parks Road Oxford OX1 3PS, U.K. Fax: +44 1865 271249 Evolutionary Analysis of Viral

More information

Scoring Matrices. Shifra Ben-Dor Irit Orr

Scoring Matrices. Shifra Ben-Dor Irit Orr Scoring Matrices Shifra Ben-Dor Irit Orr Scoring matrices Sequence alignment and database searching programs compare sequences to each other as a series of characters. All algorithms (programs) for comparison

More information

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Bioinformatics II Probability and Statistics Universität Zürich and ETH Zürich Spring Semester 2009 Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Dr Fraser Daly adapted from

More information

Substitution matrices

Substitution matrices Introduction to Bioinformatics Substitution matrices Jacques van Helden Jacques.van-Helden@univ-amu.fr Université d Aix-Marseille, France Lab. Technological Advances for Genomics and Clinics (TAGC, INSERM

More information

THE MATHEMATICAL STRUCTURE OF THE GENETIC CODE: A TOOL FOR INQUIRING ON THE ORIGIN OF LIFE

THE MATHEMATICAL STRUCTURE OF THE GENETIC CODE: A TOOL FOR INQUIRING ON THE ORIGIN OF LIFE STATISTICA, anno LXIX, n. 2 3, 2009 THE MATHEMATICAL STRUCTURE OF THE GENETIC CODE: A TOOL FOR INQUIRING ON THE ORIGIN OF LIFE Diego Luis Gonzalez CNR-IMM, Bologna Section, Via Gobetti 101, I-40129, Bologna,

More information

Pathways and Controls of N 2 O Production in Nitritation Anammox Biomass

Pathways and Controls of N 2 O Production in Nitritation Anammox Biomass Supporting Information for Pathways and Controls of N 2 O Production in Nitritation Anammox Biomass Chun Ma, Marlene Mark Jensen, Barth F. Smets, Bo Thamdrup, Department of Biology, University of Southern

More information

part 4: phenomenological load and biological inference. phenomenological load review types of models. Gαβ = 8π Tαβ. Newton.

part 4: phenomenological load and biological inference. phenomenological load review types of models. Gαβ = 8π Tαβ. Newton. 2017-07-29 part 4: and biological inference review types of models phenomenological Newton F= Gm1m2 r2 mechanistic Einstein Gαβ = 8π Tαβ 1 molecular evolution is process and pattern process pattern MutSel

More information

Symmetry Studies. Marlos A. G. Viana

Symmetry Studies. Marlos A. G. Viana Symmetry Studies Marlos A. G. Viana aaa aac aag aat caa cac cag cat aca acc acg act cca ccc ccg cct aga agc agg agt cga cgc cgg cgt ata atc atg att cta ctc ctg ctt gaa gac gag gat taa tac tag tat gca gcc

More information

Quantifying sequence similarity

Quantifying sequence similarity Quantifying sequence similarity Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, February 16 th 2016 After this lecture, you can define homology, similarity, and identity

More information

Insects act as vectors for a number of important diseases of

Insects act as vectors for a number of important diseases of pubs.acs.org/synthbio Novel Synthetic Medea Selfish Genetic Elements Drive Population Replacement in Drosophila; a Theoretical Exploration of Medea- Dependent Population Suppression Omar S. Abari,,# Chun-Hong

More information

Sequence Alignments. Dynamic programming approaches, scoring, and significance. Lucy Skrabanek ICB, WMC January 31, 2013

Sequence Alignments. Dynamic programming approaches, scoring, and significance. Lucy Skrabanek ICB, WMC January 31, 2013 Sequence Alignments Dynamic programming approaches, scoring, and significance Lucy Skrabanek ICB, WMC January 31, 213 Sequence alignment Compare two (or more) sequences to: Find regions of conservation

More information

FliZ Is a Posttranslational Activator of FlhD 4 C 2 -Dependent Flagellar Gene Expression

FliZ Is a Posttranslational Activator of FlhD 4 C 2 -Dependent Flagellar Gene Expression JOURNAL OF BACTERIOLOGY, July 2008, p. 4979 4988 Vol. 190, No. 14 0021-9193/08/$08.00 0 doi:10.1128/jb.01996-07 Copyright 2008, American Society for Microbiology. All Rights Reserved. FliZ Is a Posttranslational

More information

Biosynthesis of Bacterial Glycogen: Primary Structure of Salmonella typhimurium ADPglucose Synthetase as Deduced from the

Biosynthesis of Bacterial Glycogen: Primary Structure of Salmonella typhimurium ADPglucose Synthetase as Deduced from the JOURNAL OF BACTERIOLOGY, Sept. 1987, p. 4355-4360 0021-9193/87/094355-06$02.00/0 Copyright X) 1987, American Society for Microbiology Vol. 169, No. 9 Biosynthesis of Bacterial Glycogen: Primary Structure

More information

Supporting Information. An Electric Single-Molecule Hybridisation Detector for short DNA Fragments

Supporting Information. An Electric Single-Molecule Hybridisation Detector for short DNA Fragments Supporting Information An Electric Single-Molecule Hybridisation Detector for short DNA Fragments A.Y.Y. Loh, 1 C.H. Burgess, 2 D.A. Tanase, 1 G. Ferrari, 3 M.A. Maclachlan, 2 A.E.G. Cass, 1 T. Albrecht*

More information

Chain-like assembly of gold nanoparticles on artificial DNA templates via Click Chemistry

Chain-like assembly of gold nanoparticles on artificial DNA templates via Click Chemistry Electronic Supporting Information: Chain-like assembly of gold nanoparticles on artificial DNA templates via Click Chemistry Monika Fischler, Alla Sologubenko, Joachim Mayer, Guido Clever, Glenn Burley,

More information

It is the author's version of the article accepted for publication in the journal "Biosystems" on 03/10/2015.

It is the author's version of the article accepted for publication in the journal Biosystems on 03/10/2015. It is the author's version of the article accepted for publication in the journal "Biosystems" on 03/10/2015. The system-resonance approach in modeling genetic structures Sergey V. Petoukhov Institute

More information

Lecture 1, 31/10/2001: Introduction to sequence alignment. The Needleman-Wunsch algorithm for global sequence alignment: description and properties

Lecture 1, 31/10/2001: Introduction to sequence alignment. The Needleman-Wunsch algorithm for global sequence alignment: description and properties Lecture 1, 31/10/2001: Introduction to sequence alignment The Needleman-Wunsch algorithm for global sequence alignment: description and properties 1 Computational sequence-analysis The major goal of computational

More information

Sequence Alignment: Scoring Schemes. COMP 571 Luay Nakhleh, Rice University

Sequence Alignment: Scoring Schemes. COMP 571 Luay Nakhleh, Rice University Sequence Alignment: Scoring Schemes COMP 571 Luay Nakhleh, Rice University Scoring Schemes Recall that an alignment score is aimed at providing a scale to measure the degree of similarity (or difference)

More information

Introduction to protein alignments

Introduction to protein alignments Introduction to protein alignments Comparative Analysis of Proteins Experimental evidence from one or more proteins can be used to infer function of related protein(s). Gene A Gene X Protein A compare

More information

Identification of a Locus Involved in the Utilization of Iron by Haemophilus influenzae

Identification of a Locus Involved in the Utilization of Iron by Haemophilus influenzae INFECrION AND IMMUNITY, OCt. 1994, p. 4515-4525 0019-9567/94/$04.00+0 Copyright 1994, American Society for Microbiology Vol. 62, No. 10 Identification of a Locus Involved in the Utilization of Iron by

More information

DNA sequence analysis of the imp UV protection and mutation operon of the plasmid TP110: identification of a third gene

DNA sequence analysis of the imp UV protection and mutation operon of the plasmid TP110: identification of a third gene QD) 1990 Oxford University Press Nucleic Acids Research, Vol. 18, No. 17 5045 DNA sequence analysis of the imp UV protection and mutation operon of the plasmid TP110: identification of a third gene David

More information

Motif Finding Algorithms. Sudarsan Padhy IIIT Bhubaneswar

Motif Finding Algorithms. Sudarsan Padhy IIIT Bhubaneswar Motif Finding Algorithms Sudarsan Padhy IIIT Bhubaneswar Outline Gene Regulation Regulatory Motifs The Motif Finding Problem Brute Force Motif Finding Consensus and Pattern Branching: Greedy Motif Search

More information

Glucosylglycerate phosphorylase, a novel enzyme specificity involved in compatible solute metabolism

Glucosylglycerate phosphorylase, a novel enzyme specificity involved in compatible solute metabolism Supplementary Information for Glucosylglycerate phosphorylase, a novel enzyme specificity involved in compatible solute metabolism Jorick Franceus, Denise Pinel, Tom Desmet Corresponding author: Tom Desmet,

More information

How DNA barcoding can be more effective in microalgae. identification: a case of cryptic diversity revelation in Scenedesmus

How DNA barcoding can be more effective in microalgae. identification: a case of cryptic diversity revelation in Scenedesmus How DNA barcoding can be more effective in microalgae identification: a case of cryptic diversity revelation in Scenedesmus (Chlorophyceae) Shanmei Zou, Cong Fei, Chun Wang, Zhan Gao, Yachao Bao, Meilin

More information

Supplemental Figure 1. Differences in amino acid composition between the paralogous copies Os MADS17 and Os MADS6.

Supplemental Figure 1. Differences in amino acid composition between the paralogous copies Os MADS17 and Os MADS6. Supplemental Data. Reinheimer and Kellogg (2009). Evolution of AGL6-like MADSbox genes in grasses (Poaceae): ovule expression is ancient and palea expression is new Supplemental Figure 1. Differences in

More information

Evidence for RNA editing in mitochondria of all major groups of

Evidence for RNA editing in mitochondria of all major groups of Proc. Natl. Acad. Sci. USA Vol. 91, pp. 629-633, January 1994 Plant Biology Evidence for RNA editing in mitochondria of all major groups of land plants except the Bryophyta RUDOLF HIESEL, BRUNO COMBETTES*,

More information

Evidence for Evolution: Change Over Time (Make Up Assignment)

Evidence for Evolution: Change Over Time (Make Up Assignment) Lesson 7.2 Evidence for Evolution: Change Over Time (Make Up Assignment) Name Date Period Key Terms Adaptive radiation Molecular Record Vestigial organ Homologous structure Strata Divergent evolution Evolution

More information

Codon-model based inference of selection pressure. (a very brief review prior to the PAML lab)

Codon-model based inference of selection pressure. (a very brief review prior to the PAML lab) Codon-model based inference of selection pressure (a very brief review prior to the PAML lab) an index of selection pressure rate ratio mode example dn/ds < 1 purifying (negative) selection histones dn/ds

More information

Lecture IV A. Shannon s theory of noisy channels and molecular codes

Lecture IV A. Shannon s theory of noisy channels and molecular codes Lecture IV A Shannon s theory of noisy channels and molecular codes Noisy molecular codes: Rate-Distortion theory S Mapping M Channel/Code = mapping between two molecular spaces. Two functionals determine

More information

Supplementary information. Porphyrin-Assisted Docking of a Thermophage Portal Protein into Lipid Bilayers: Nanopore Engineering and Characterization.

Supplementary information. Porphyrin-Assisted Docking of a Thermophage Portal Protein into Lipid Bilayers: Nanopore Engineering and Characterization. Supplementary information Porphyrin-Assisted Docking of a Thermophage Portal Protein into Lipid Bilayers: Nanopore Engineering and Characterization. Benjamin Cressiot #, Sandra J. Greive #, Wei Si ^#,

More information

Supplementary Figure 1. Schematic of split-merger microfluidic device used to add transposase to template drops for fragmentation.

Supplementary Figure 1. Schematic of split-merger microfluidic device used to add transposase to template drops for fragmentation. Supplementary Figure 1. Schematic of split-merger microfluidic device used to add transposase to template drops for fragmentation. Inlets are labelled in blue, outlets are labelled in red, and static channels

More information

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot

More information

Local Alignment Statistics

Local Alignment Statistics Local Alignment Statistics Stephen Altschul National Center for Biotechnology Information National Library of Medicine National Institutes of Health Bethesda, MD Central Issues in Biological Sequence Comparison

More information

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Substitution score matrices, PAM, BLOSUM Needleman-Wunsch algorithm (Global) Smith-Waterman algorithm (Local) BLAST (local, heuristic) E-value

More information

Sequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment

Sequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment Sequence Analysis 17: lecture 5 Substitution matrices Multiple sequence alignment Substitution matrices Used to score aligned positions, usually of amino acids. Expressed as the log-likelihood ratio of

More information

HADAMARD MATRICES AND QUINT MATRICES IN MATRIX PRESENTATIONS OF MOLECULAR GENETIC SYSTEMS

HADAMARD MATRICES AND QUINT MATRICES IN MATRIX PRESENTATIONS OF MOLECULAR GENETIC SYSTEMS Symmetry: Culture and Science Vol. 16, No. 3, 247-266, 2005 HADAMARD MATRICES AND QUINT MATRICES IN MATRIX PRESENTATIONS OF MOLECULAR GENETIC SYSTEMS Sergey V. Petoukhov Address: Department of Biomechanics,

More information

160, and 220 bases, respectively, shorter than pbr322/hag93. (data not shown). The DNA sequence of approximately 100 bases of each

160, and 220 bases, respectively, shorter than pbr322/hag93. (data not shown). The DNA sequence of approximately 100 bases of each JOURNAL OF BACTEROLOGY, JUlY 1988, p. 3305-3309 0021-9193/88/073305-05$02.00/0 Copyright 1988, American Society for Microbiology Vol. 170, No. 7 Construction of a Minimum-Size Functional Flagellin of Escherichia

More information

Characterization of Multiple-Antimicrobial-Resistant Salmonella Serovars Isolated from Retail Meats

Characterization of Multiple-Antimicrobial-Resistant Salmonella Serovars Isolated from Retail Meats APPLIED AND ENVIRONMENTAL MICROBIOLOGY, Jan. 2004, p. 1 7 Vol. 70, No. 1 0099-2240/04/$08.00 0 DOI: 10.1128/AEM.70.1.1 7.2004 Copyright 2004, American Society for Microbiology. All Rights Reserved. Characterization

More information

Chemical Biology on Genomic DNA: minimizing PCR bias. Electronic Supplementary Information (ESI) for Chemical Communications

Chemical Biology on Genomic DNA: minimizing PCR bias. Electronic Supplementary Information (ESI) for Chemical Communications Electronic Supplementary Material (ESI) for ChemComm. This journal is The Royal Society of Chemistry 2014 Chemical Biology on Genomic DA: minimizing PCR bias Gordon R. McInroy, Eun-Ang Raiber, & Shankar

More information

NEW DNA CYCLIC CODES OVER RINGS

NEW DNA CYCLIC CODES OVER RINGS NEW DNA CYCLIC CODES OVER RINGS NABIL BENNENNI, KENZA GUENDA AND SIHEM MESNAGER arxiv:1505.06263v1 [cs.it] 23 May 2015 Abstract. This paper is dealing with DNA cyclic codes which play an important role

More information

codon substitution models and the analysis of natural selection pressure

codon substitution models and the analysis of natural selection pressure 2015-07-20 codon substitution models and the analysis of natural selection pressure Joseph P. Bielawski Department of Biology Department of Mathematics & Statistics Dalhousie University introduction morphological

More information

Metabolic evidence for biogeographic isolation of the extremophilic bacterium Salinibacter ruber.

Metabolic evidence for biogeographic isolation of the extremophilic bacterium Salinibacter ruber. Supplementary information: Metabolic evidence for biogeographic isolation of the extremophilic bacterium Salinibacter ruber. Ramon Rosselló-Mora 1, Marianna Lucio², Arantxa Peña 3, Jocelyn Brito-Echeverría

More information

Supplemental Figure 1. Phenotype of ProRGA:RGAd17 plants under long day

Supplemental Figure 1. Phenotype of ProRGA:RGAd17 plants under long day Supplemental Figure 1. Phenotype of ProRGA:RGAd17 plants under long day conditions. Photo was taken when the wild type plant started to bolt. Scale bar represents 1 cm. Supplemental Figure 2. Flowering

More information

BIOL 502 Population Genetics Spring 2017

BIOL 502 Population Genetics Spring 2017 BIOL 502 Population Genetics Spring 2017 Lecture 1 Genomic Variation Arun Sethuraman California State University San Marcos Table of contents 1. What is Population Genetics? 2. Vocabulary Recap 3. Relevance

More information

Estimating Phred scores of Illumina base calls by logistic regression and sparse modeling

Estimating Phred scores of Illumina base calls by logistic regression and sparse modeling Zhang et al. BMC Bioinformatics (2017) 18:335 DOI 10.1186/s12859-017-1743-4 RESEARCH ARTICLE Open Access Estimating Phred scores of Illumina base calls by logistic regression and sparse modeling Sheng

More information

ydci GTC TGT TTG AAC GCG GGC GAC TGG GCG CGC AAT TAA CGG TGT GTA GGC TGG AGC TGC TTC

ydci GTC TGT TTG AAC GCG GGC GAC TGG GCG CGC AAT TAA CGG TGT GTA GGC TGG AGC TGC TTC Table S1. DNA primers used in this study. Name ydci P1ydcIkd3 Sequence GTC TGT TTG AAC GCG GGC GAC TGG GCG CGC AAT TAA CGG TGT GTA GGC TGG AGC TGC TTC Kd3ydcIp2 lacz fusion YdcIendP1 YdcItrgP2 GAC AGC

More information