Practical Bioinformatics

Similar documents
SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS. Prokaryotes and Eukaryotes. DNA and RNA

High throughput near infrared screening discovers DNA-templated silver clusters with peak fluorescence beyond 950 nm

SUPPORTING INFORMATION FOR. SEquence-Enabled Reassembly of β-lactamase (SEER-LAC): a Sensitive Method for the Detection of Double-Stranded DNA

Crick s early Hypothesis Revisited

Advanced topics in bioinformatics

Characterization of Pathogenic Genes through Condensed Matrix Method, Case Study through Bacterial Zeta Toxin

Supplemental data. Pommerrenig et al. (2011). Plant Cell /tpc

Supplementary Information for

Clay Carter. Department of Biology. QuickTime and a TIFF (Uncompressed) decompressor are needed to see this picture.

SSR ( ) Vol. 48 No ( Microsatellite marker) ( Simple sequence repeat,ssr),

Number-controlled spatial arrangement of gold nanoparticles with

SUPPLEMENTARY DATA - 1 -

NSCI Basic Properties of Life and The Biochemistry of Life on Earth

Electronic supplementary material

Supporting Information

Nature Structural & Molecular Biology: doi: /nsmb Supplementary Figure 1

Table S1. Primers and PCR conditions used in this paper Primers Sequence (5 3 ) Thermal conditions Reference Rhizobacteria 27F 1492R

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008

TM1 TM2 TM3 TM4 TM5 TM6 TM bp

Supplementary Information

3. Evolution makes sense of homologies. 3. Evolution makes sense of homologies. 3. Evolution makes sense of homologies

SUPPLEMENTARY INFORMATION

Supplemental Figure 1.

Regulatory Sequence Analysis. Sequence models (Bernoulli and Markov models)

Building a Multifunctional Aptamer-Based DNA Nanoassembly for Targeted Cancer Therapy

Supporting Information for. Initial Biochemical and Functional Evaluation of Murine Calprotectin Reveals Ca(II)-

Evolvable Neural Networks for Time Series Prediction with Adaptive Learning Interval

SUPPLEMENTARY INFORMATION

Protein Threading. Combinatorial optimization approach. Stefan Balev.

The Trigram and other Fundamental Philosophies

Evolutionary dynamics of abundant stop codon readthrough in Anopheles and Drosophila

The role of the FliD C-terminal domain in pentamer formation and

Sex-Linked Inheritance in Macaque Monkeys: Implications for Effective Population Size and Dispersal to Sulawesi

Codon Distribution in Error-Detecting Circular Codes

Modelling and Analysis in Bioinformatics. Lecture 1: Genomic k-mer Statistics

Re- engineering cellular physiology by rewiring high- level global regulatory genes

Why do more divergent sequences produce smaller nonsynonymous/synonymous

The 3 Genomic Numbers Discovery: How Our Genome Single-Stranded DNA Sequence Is Self-Designed as a Numerical Whole

evoglow - express N kit distributed by Cat.#: FP product information broad host range vectors - gram negative bacteria

evoglow - express N kit Cat. No.: product information broad host range vectors - gram negative bacteria

Introduction to Molecular Phylogeny

Supplemental Table 1. Primers used for cloning and PCR amplification in this study

ChemiScreen CaS Calcium Sensor Receptor Stable Cell Line

Near-instant surface-selective fluorogenic protein quantification using sulfonated

part 3: analysis of natural selection pressure

Timing molecular motion and production with a synthetic transcriptional clock

Encoding of Amino Acids and Proteins from a Communications and Information Theoretic Perspective

Supplementary Information

AtTIL-P91V. AtTIL-P92V. AtTIL-P95V. AtTIL-P98V YFP-HPR

Chain-like assembly of gold nanoparticles on artificial DNA templates via Click Chemistry

Using algebraic geometry for phylogenetic reconstruction

Supporting Information. An Electric Single-Molecule Hybridisation Detector for short DNA Fragments

Pathways and Controls of N 2 O Production in Nitritation Anammox Biomass

Insects act as vectors for a number of important diseases of

Biosynthesis of Bacterial Glycogen: Primary Structure of Salmonella typhimurium ADPglucose Synthetase as Deduced from the

Symmetry Studies. Marlos A. G. Viana

THE MATHEMATICAL STRUCTURE OF THE GENETIC CODE: A TOOL FOR INQUIRING ON THE ORIGIN OF LIFE

FliZ Is a Posttranslational Activator of FlhD 4 C 2 -Dependent Flagellar Gene Expression

Identification of a Locus Involved in the Utilization of Iron by Haemophilus influenzae

part 4: phenomenological load and biological inference. phenomenological load review types of models. Gαβ = 8π Tαβ. Newton.

It is the author's version of the article accepted for publication in the journal "Biosystems" on 03/10/2015.

Motif Finding Algorithms. Sudarsan Padhy IIIT Bhubaneswar

Glucosylglycerate phosphorylase, a novel enzyme specificity involved in compatible solute metabolism

Evidence for RNA editing in mitochondria of all major groups of

Supplemental Figure 1. Differences in amino acid composition between the paralogous copies Os MADS17 and Os MADS6.

DNA sequence analysis of the imp UV protection and mutation operon of the plasmid TP110: identification of a third gene

Evidence for Evolution: Change Over Time (Make Up Assignment)

NEW DNA CYCLIC CODES OVER RINGS

Evolutionary Analysis of Viral Genomes

supplementary information

ydci GTC TGT TTG AAC GCG GGC GAC TGG GCG CGC AAT TAA CGG TGT GTA GGC TGG AGC TGC TTC

Characterization of Multiple-Antimicrobial-Resistant Salmonella Serovars Isolated from Retail Meats

160, and 220 bases, respectively, shorter than pbr322/hag93. (data not shown). The DNA sequence of approximately 100 bases of each

Codon-model based inference of selection pressure. (a very brief review prior to the PAML lab)

Chemical Biology on Genomic DNA: minimizing PCR bias. Electronic Supplementary Information (ESI) for Chemical Communications

A functional homologue of goosecoid in Drosophila

Supporting Information

Metabolic evidence for biogeographic isolation of the extremophilic bacterium Salinibacter ruber.

codon substitution models and the analysis of natural selection pressure

Finding Regulatory Motifs in DNA Sequences

Dissertation. presented by: Hung-wei Sung Diploma: Master of Life Science born intaipei, Taiwan Oral examination:

Supplemental data. Vos et al. (2008). The plant TPX2 protein regulates pro-spindle assembly before nuclear envelope breakdown.

How DNA barcoding can be more effective in microalgae. identification: a case of cryptic diversity revelation in Scenedesmus

Supplementary Figure 1. Schematic of split-merger microfluidic device used to add transposase to template drops for fragmentation.

Diversity of Chlamydia trachomatis Major Outer Membrane

HADAMARD MATRICES AND QUINT MATRICES IN MATRIX PRESENTATIONS OF MOLECULAR GENETIC SYSTEMS

Supplementary information. Porphyrin-Assisted Docking of a Thermophage Portal Protein into Lipid Bilayers: Nanopore Engineering and Characterization.

Electronic Supporting Information for

Gene manipulation in Bacillus thuringiensis : Biopesticide Development

Estimating Phred scores of Illumina base calls by logistic regression and sparse modeling

Appendix B Protein-Signaling Networks from Single-cell Fluctuations and Information Theory Profiling B.1. Introduction

MicroGenomics. Universal replication biases in bacteria

Supplemental Figure 1. Phenotype of ProRGA:RGAd17 plants under long day

Aoife McLysaght Dept. of Genetics Trinity College Dublin

Objective: You will be able to justify the claim that organisms share many conserved core processes and features.

Table S1. DNA oligonucleo3des used for 3D pol mutagenesis. Fingers Domain Entry Channel. Fidelity

Lecture IV A. Shannon s theory of noisy channels and molecular codes

An Analytical Model of Gene Evolution with 9 Mutation Parameters: An Application to the Amino Acids Coded by the Common Circular Code

Supporting Material. Protein Signaling Networks from Single Cell Fluctuations and Information Theory Profiling

BIOL 502 Population Genetics Spring 2017

Transcription:

5/2/2017

Dictionaries d i c t i o n a r y = { A : T, T : A, G : C, C : G } d i c t i o n a r y [ G ] d i c t i o n a r y [ N ] = N d i c t i o n a r y. h a s k e y ( C )

Dictionaries g e n e t i c C o d e = { TTT : F, TTC : F, TTA : L, TTG : L, CTT : L, CTC : L, CTA : L, CTG : L, ATT : I, ATC : I, ATA : I, ATG : M, GTT : V, GTC : V, GTA : V, GTG : V, TCT : S, TCC : S, TCA : S, TCG : S, CCT : P, CCC : P, CCA : P, CCG : P, ACT : T, ACC : T, ACA : T, ACG : T, GCT : A, GCC : A, GCA : A, GCG : A, TAT : Y, TAC : Y, TAA :, TAG :, CAT : H, CAC : H, CAA : Q, CAG : Q, AAT : N, AAC : N, AAA : K, AAG : K, GAT : D, GAC : D, GAA : E, GAG : E, TGT : C, TGC : C, TGA :, TGG : W, CGT : R, CGC : R, CGA : R, CGG : R, AGT : S, AGC : S, AGA : R, AGG : R, GGT : G, GGC : G, GGA : G, GGG : G }

Exercise: Transforming sequences 1 Write a function to return the antisense strand of a DNA sequence in 3 5 orientation. 2 Write a function to return the complement of a DNA sequence in 5 3 orientation. 3 Write a function to translate a DNA sequence

Why compare sequences?

Why compare sequences? To find genes with a common ancestor To infer conserved molecular mechanism and biological function To find short functional motifs To find repetitive elements within a sequence To predict cross-hybridizing sequences (e.g., in RNAi or CRISPR design) To find genomic origin of imperfectly sequenced or spliced fragments (e.g., in deep sequencing experiments) To predict nucleotide secondary structure

Nomenclature Homologs heritable elements with a common evolutionary origin.

Nomenclature Homologs heritable elements with a common evolutionary origin. Orthologs homologs arising from speciation. Paralogs homologs arising from duplication and divergence within a single genome.

Nomenclature Homologs heritable elements with a common evolutionary origin. Orthologs homologs arising from speciation. Paralogs homologs arising from duplication and divergence within a single genome. Xenologs homologs arising from horizontal transfer. Onologs homologs arising from whole genome duplication.

Types of alignments Global Alignment Each letter of each sequence is aligned to a letter or a gap (e.g., Needleman-Wunsch) Local Alignment An optimal pair of subsequences is taken from the two sequences and globally aligned (e.g., Smith-Waterman)

Dotplots 1 Unbiased view of all ungapped alignments of two sequences 2 Noise can be filtered by applying a smoothing window to the diagonals.

Exercise: Scoring an ungapped alignment s ={ A : { A : 1. 0, T : 1.0, G : 1.0, C : 1.0}, T : { A : 1.0, T : 1. 0, G : 1.0, C : 1.0}, G : { A : 1.0, T : 1.0, G : 1. 0, C : 1.0}, C : { A : 1.0, T : 1.0, G : 1.0, C : 1.0}}

Exercise: Scoring an ungapped alignment s ={ A : { A : 1. 0, T : 1.0, G : 1.0, C : 1.0}, T : { A : 1.0, T : 1. 0, G : 1.0, C : 1.0}, G : { A : 1.0, T : 1.0, G : 1. 0, C : 1.0}, C : { A : 1.0, T : 1.0, G : 1.0, C : 1.0}} S(x, y) = N s(x i, y i ) i

Exercise: Scoring an ungapped alignment s ={ A : { A : 1. 0, T : 1.0, G : 1.0, C : 1.0}, T : { A : 1.0, T : 1. 0, G : 1.0, C : 1.0}, G : { A : 1.0, T : 1.0, G : 1. 0, C : 1.0}, C : { A : 1.0, T : 1.0, G : 1.0, C : 1.0}} S(x, y) = N s(x i, y i ) i 1 Given two equal length sequences and a scoring matrix, return the alignment score for a full length, ungapped alignment.

Exercise: Scoring an ungapped alignment s ={ A : { A : 1. 0, T : 1.0, G : 1.0, C : 1.0}, T : { A : 1.0, T : 1. 0, G : 1.0, C : 1.0}, G : { A : 1.0, T : 1.0, G : 1. 0, C : 1.0}, C : { A : 1.0, T : 1.0, G : 1.0, C : 1.0}} S(x, y) = N s(x i, y i ) i 1 Given two equal length sequences and a scoring matrix, return the alignment score for a full length, ungapped alignment. 2 Given two sequences and a scoring matrix, find the offset that yields the best scoring ungapped alignment.

Exercise: Scoring a gapped alignment 1 Given two equal length gapped sequences (where - represents a gap) and a scoring matrix, calculate an alignment score with a -1 penalty for each base aligned to a gap.

Exercise: Scoring a gapped alignment 1 Given two equal length gapped sequences (where - represents a gap) and a scoring matrix, calculate an alignment score with a -1 penalty for each base aligned to a gap. 2 Write a new scoring function with separate penalties for opening a zero length gap (e.g., G = -11) and extending an open gap by one base (e.g., E = -1). gaps S gapped (x, y) = S(x, y) + (G + E len(i)) i