Bioinformatics. Part 8. Sequence Analysis An introduction. Mahdi Vasighi


1 Bioinformatics Sequence Analysis An introduction Part 8 Mahdi Vasighi

2 Sequence analysis Some of the earliest problems in genomics concerned how to measure the similarity of DNA and protein sequences, whether within a genome, across the genomes of different individuals, or across the genomes of different species. Why is sequence similarity important? Similar sequence = similar information = same ancestor. Similar sequence = similar structure = similar function.

3 Homology of sequences Homology between protein or DNA sequences is defined in terms of shared ancestry. Two segments of DNA can have shared ancestry because of either a speciation event (orthologs) or a duplication event (paralogs).

4 Mutation Mutations are changes in a genomic sequence: the DNA sequence of a cell's genome or the DNA or RNA sequence of a virus. Causes: environmental factors (radiation, chemicals); mistakes in replication or repair.

5 Mutation Classification of mutation types: By effect on structure Small scale mutations Large scale mutations By impact on protein sequence

6 Mutation Classification By effect on structure Small scale mutations Point mutations: often caused by chemicals or malfunction of DNA replication, they exchange a single nucleotide for another. Transition: AGGCGTATTGCATCGTTAAACGCGC → AGATGTATTGCGTCGCTAAACGCGC Most common is the transition, which exchanges a purine for a purine (A ↔ G) or a pyrimidine for a pyrimidine (C ↔ T).

7 Mutation Classification By effect on structure Small scale mutations Point mutations: often caused by chemicals or malfunction of DNA replication, they exchange a single nucleotide for another. Transversion: AGGCGTATTGCATCGTTAAACGCGC → AGTGGTATTGCCTCGATAAACGCGC Less common is the transversion, which exchanges a purine for a pyrimidine or a pyrimidine for a purine (C/T ↔ A/G).
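The transition/transversion rule from the two slides above can be checked mechanically. The following is a minimal Python sketch, not part of the original slides or of the MATLAB toolbox; the function names `classify_substitution` and `point_mutations` are illustrative, and it assumes plain A/C/G/T sequences of equal length.

```python
# Purines and pyrimidines as defined on the slides.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Return 'transition', 'transversion' or 'none' for one base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in PURINES | PYRIMIDINES or alt not in PURINES | PYRIMIDINES:
        raise ValueError("only A, C, G and T are supported in this sketch")
    if ref == alt:
        return "none"
    # Same chemical class on both sides means a transition.
    if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def point_mutations(original, mutated):
    """List (1-based position, old base, new base, type) for each difference."""
    if len(original) != len(mutated):
        raise ValueError("sequences must have the same length")
    return [
        (i + 1, a, b, classify_substitution(a, b))
        for i, (a, b) in enumerate(zip(original, mutated))
        if a != b
    ]


original = "AGGCGTATTGCATCGTTAAACGCGC"
transition_example = "AGATGTATTGCGTCGCTAAACGCGC"
transversion_example = "AGTGGTATTGCCTCGATAAACGCGC"

for position, old, new, kind in point_mutations(original, transition_example):
    print(position, old, "->", new, kind)
```

Running the same loop on `transversion_example` reports the same four positions, each classified as a transversion.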

8 Mutation Classification By effect on structure Small scale mutations Point mutations Often caused by chemicals or malfunction of DNA replication, exchange a single nucleotide for another.

9 Mutation Classification by impact on protein sequence Small scale mutations Point mutations Point mutations that occur within the protein coding region of a gene, depending upon what the erroneous codon codes for: Silent mutation M R I A S L N Stop ATG CGT ATT GCA TCG TTA AAC TAA C... ATG CGT ATC GCA TCA TTG AAC TAA C... M R I A S L N Stop New codon translated for the same amino acid

10 Mutation Classification by impact on protein sequence Small scale mutations Point mutations Point mutations that occur within the protein coding region of a gene, depending upon what the erroneous codon codes for: Missense mutations M R I A S L N Stop ATG CGT ATT GCA TCG TTA AAC TAA C... ATG CGT ACT GCA TTG TTA AAC TAA C... M R T A L L N Stop New codon translated for different amino acids.

11 Mutation Classification by impact on protein sequence Small scale mutations Point mutations Point mutations that occur within the protein coding region of a gene, depending upon what the erroneous codon codes for: Nonsense mutations M R I A S L N Stop ATG CGT ATT GCA TCG TTA AAC TAA C... ATG CGT ATT GCA TCG TAA AAC TAA C... M R I A S Stop The new codon is translated as a stop and can truncate the protein.

12 Mutation Classification by impact on protein sequence Small scale mutations Point mutations Point mutations that occur within the protein coding region of a gene, depending upon what the erroneous codon codes for: Conservative mutations M R I A S L N Stop ATG CGT ATT GCA TCG TTA AAC TAA C... Non-polar (Hydrophobic) ATG CGT ATT GCA TCG TTC AAC TAA C... M R I A S F N Stop A change in a DNA or RNA sequence that leads to the replacement of one amino acid with a biochemically similar one.
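The four categories above (silent, missense, nonsense, conservative) follow directly from translating the old and new codon. The sketch below is a Python illustration, not the slides' own code. It builds the standard genetic code and classifies single codon changes. The amino acid grouping in `GROUPS` is a coarse, assumed definition of "biochemically similar", and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# Assumed, simplified physico-chemical classes (an illustration only).
GROUPS = {
    "nonpolar": set("GAVLIMFWP"),
    "polar": set("STCYNQ"),
    "acidic": set("DE"),
    "basic": set("KRH"),
}


def group_of(amino_acid):
    """Return the name of the class an amino acid belongs to."""
    for name, members in GROUPS.items():
        if amino_acid in members:
            return name
    raise ValueError(f"unknown amino acid: {amino_acid}")


def translate(dna):
    """Translate a coding sequence codon by codon ('*' marks a stop)."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def classify_codon_change(ref, alt):
    """Classify the effect of replacing codon `ref` by codon `alt`."""
    old, new = CODON_TABLE[ref.upper()], CODON_TABLE[alt.upper()]
    if old == new:
        return "silent"
    if new == "*":
        return "nonsense"
    if old == "*":
        return "stop lost"
    if group_of(old) == group_of(new):
        return "conservative missense"
    return "missense"


print(translate("ATGCGTATTGCATCGTTAAACTAA"))
for ref, alt in [("ATT", "ATC"), ("ATT", "ACT"), ("TTA", "TAA"), ("TTA", "TTC")]:
    print(ref, "->", alt, classify_codon_change(ref, alt))
```

With a different grouping of amino acids, the boundary between "conservative" and ordinary missense changes shifts, so that label is only as good as the chosen classes.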

13 Mutation Classification By effect on structure Large scale mutations occur in chromosomal structure:

14 Mutation Classification By effect on structure Large scale mutations occur in chromosomal structure:

15 Mutation Classification By effect on structure Large scale mutations occur in chromosomal structure:

16 Mutation

17 Sequence Alignment Sequence comparison can be used: to establish evolutionary relationships among organisms; to identify functionally conserved sequences; to find structural similarity; to identify corresponding genes in organisms. Sequence alignment is the arrangement of two or more sequences (DNA, RNA or protein) by searching for a series of individual characters or character patterns that are in the same order in the sequences. ATGCGTATTGCATCGTTAAACTAA ATGCGTATTGCA---TTAAACTAA AT-CGT---GCATCGTTAAACTAA

18 Sequence Alignment ACGTCTAG / ACTCTAG-: 2 matches, 5 mismatches, 1 not aligned. ACGTCTAG / -ACTCTAG: 5 matches, 2 mismatches, 1 not aligned. ACGTCTAG / AC-TCTAG: 7 matches, 0 mismatches, 1 not aligned. This seemingly simple alignment operation is not as simple as it sounds! ...aactgagtttacgctcataga... T---CT-A--G How can we measure the distance between two strings or biological sequences? Edit distance.
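The match/mismatch counts above, and the edit distance mentioned at the end of the slide, can be reproduced with a short Python sketch. It is an illustration rather than the slides' own code. `column_counts` and `edit_distance` are hypothetical names, and the edit distance shown is the unit-cost (Levenshtein) variant, which is an assumption since the slide does not specify costs.

```python
def column_counts(row1, row2, gap="-"):
    """Count (matches, mismatches, gap columns) of two aligned rows."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = gaps = 0
    for a, b in zip(row1, row2):
        if a == gap or b == gap:
            gaps += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps


def edit_distance(s, t):
    """Unit-cost edit distance: substitutions, insertions and deletions."""
    previous = list(range(len(t) + 1))          # distance from "" to t[:j]
    for i, a in enumerate(s, start=1):
        current = [i]                           # distance from s[:i] to ""
        for j, b in enumerate(t, start=1):
            current.append(min(
                previous[j] + 1,                # delete a
                current[j - 1] + 1,             # insert b
                previous[j - 1] + (a != b),     # match or substitute
            ))
        previous = current
    return previous[-1]


for second_row in ("ACTCTAG-", "-ACTCTAG", "AC-TCTAG"):
    print(second_row, column_counts("ACGTCTAG", second_row))
print(edit_distance("ACGTCTAG", "ACTCTAG"))
```

The third placement of the gap gives the most matches, and the edit distance of 1 corresponds to that single deleted G.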

19 Sequence Alignment There are two types of sequence alignment: 1. Global alignment T T G C G T A T T G C A T C G T T G C C T T T T C C A T 2. Local alignment T A C G T T C G

20 Sequence Alignment Our approach is guided by biology: It is possible for evolutionarily related proteins and nucleic acids to display substitutions at particular positions Substitution (point mutation) Insertion of short segments Deletion of short segments Segmental duplication Inversion Translocation GTATTGCATCGTTAAA GTATTGCA---TTAAA Insertions and/or deletions are called indels. Comparing two genes, it is generally impossible to tell if an indel is an insertion in one gene, or a deletion in another, unless ancestry is known.

21 Pairwise Sequence Alignment Alignment of two sequences is performed using the following methods: 1. Dot matrix analysis 2. The dynamic programming (DP) algorithm 3. Word or k-tuple methods, such as used in BLAST and FASTA

22 Pairwise Sequence Alignment Dot matrix analysis A dot matrix analysis is primarily a method for comparing two sequences to look for possible alignment of characters between the sequences. The method is also used for: finding direct or inverted repeats in protein and DNA sequences, predicting regions in RNA that are self-complementary

23 Pairwise Sequence Alignment Dot matrix analysis A T G C C C A T A G T T G C A T A G Any region of similar sequence is revealed by a diagonal row of dots. Isolated dots not on the diagonal represent random matches that are probably not related to any significant alignment.

24 Pairwise Sequence Alignment Dot matrix analysis Detection of matching regions may be improved by filtering out random matches in a dot matrix by defining a window: T T G C A T A G A T G C C C A T A G Window size = 3, stringency = 2. A typical window size for DNA sequences is 15, and a suitable match requirement in this window is 10. For protein sequences, the matrix is often not filtered, but a window size of 3 and a match requirement of 2 will highlight matching regions.
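The window/stringency filter described above can be sketched in a few lines of Python. This illustrates the idea only and is not the implementation behind MATLAB's seqdotplot. The names `dot_matrix` and `render` are hypothetical, and placing a dot at the start position of each window is an assumed convention.

```python
def dot_matrix(seq1, seq2, window=1, stringency=1):
    """Return a grid of booleans: True where a dot is drawn.

    Cell [i][j] is True when at least `stringency` of the `window`
    positions starting at seq1[i] and seq2[j] are identical.
    """
    rows = len(seq1) - window + 1
    cols = len(seq2) - window + 1
    grid = []
    for i in range(rows):
        row = []
        for j in range(cols):
            hits = sum(seq1[i + k] == seq2[j + k] for k in range(window))
            row.append(hits >= stringency)
        grid.append(row)
    return grid


def render(grid):
    """Turn the boolean grid into text lines ('*' = dot, '.' = empty)."""
    return ["".join("*" if dot else "." for dot in row) for row in grid]


seq1 = "ATGCCCATAG"   # rows
seq2 = "TTGCATAG"     # columns

print("unfiltered:")
print("\n".join(render(dot_matrix(seq1, seq2))))
print("window = 3, stringency = 2:")
print("\n".join(render(dot_matrix(seq1, seq2, window=3, stringency=2))))
```

Raising the stringency toward the window size removes more isolated dots, at the price of also hiding diagonals that contain mismatches.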

25 Pairwise Sequence Alignment Dot matrix analysis Dot matrix of ATGCCCATAG against TTGCATAG. Resulting alignment:
ATGCCCATAG
TTGC--ATAG

26 Pairwise Sequence Alignment Dot matrix analysis Dot matrix of GCTAGTCG against ATGCTGATCG. Resulting alignment:
ATGCT-GATCG
--GCTAG-TCG

27 Pairwise Sequence Alignment Dot matrix analysis Seq 1 Seq 2 G C T A G T G C T G A T G C T - G A T G C T A G - T

28 Pairwise Sequence Alignment Dot matrix analysis Seq 2 G C T A G T Seq 1 G C T - G A T G C T - G A T G C T A G - T

29 Pairwise Sequence Alignment Dot matrix analysis Seq 1 Seq 2 G C T A G - T G C T - G A T G C T - G A T G C T A G - T

30 Pairwise Sequence Alignment Dot matrix analysis A T C G T G A T C G A T C G T G A T C G

31 Pairwise Sequence Alignment Dot matrix analysis Gene 1 Gene 1 Gene 2 Gene 2

32 Pairwise Sequence Alignment Dot matrix analysis MATLAB (matrix laboratory) is a high-level language and interactive environment that enables you to perform computationally intensive tasks faster than with traditional programming languages such as C, C++, and Fortran. Bioinformatics Toolbox offers an integrated software environment for genome and proteome analysis. In particular, it provides access to genomic and proteomic data formats, analysis techniques, and specialized visualizations for genomic and proteomic sequence and microarray analysis.

33 Pairwise Sequence Alignment Dot matrix analysis
getgenbank
Purpose: Retrieve sequence from GenBank database. The accession number is a unique identifier for a sequence record.
Syntax: Data = getgenbank('accessionnumber', 'PropertyName', PropertyValue...)
Example:
S = getgenbank('m10051')
S = getgenbank('m10051','sequence',true)
Output of S = getgenbank('m10051'):
S =
LocusName: 'HUMINSR'
LocusSequenceLength: '4723'
LocusNumberofStrands: ''
LocusTopology: 'linear'
LocusMoleculeType: 'mrna'
LocusGenBankDivision: 'PRI'
LocusModificationDate: '06-JAN-1995'
Definition: 'Human insulin receptor mrna, complete cds.'
Accession: 'M10051'
Version: 'M '
GI: '186439'
Keywords: 'insulin receptor; tyrosine kinase.'
Segment: []
Source: 'Homo sapiens (human)'
SourceOrganism: [3x65 char]
Reference: {[1x1 struct]}
Comment: [14x67 char]
Features: [51x74 char]
CDS: [ ]
Sequence: [1x4723 char]
SearchURL: [1x105 char]
RetrieveURL: [1x95 char]

34 Pairwise Sequence Alignment Dot matrix analysis
getembl
Purpose: Retrieve sequence information from EMBL database
Syntax: EMBLData = getembl(accessionnumber)
Example:
emblout = getembl('X00558')
emblout = getembl('X00558', 'ToFile', 'c:\project\rat_protein.txt')
Output of emblout = getembl('x00558'):
emblout =
Identification: [1x1 struct]
Accession: 'X00558'
SequenceVersion: 'X '
DateCreated: '13-JUN-1985 (Rel. 06, Created)'
DateUpdated: [1x46 char]
Description: 'Rat liver apolipoprotein A-I mrna (apoa-i)'
Keyword: [1x44 char]
OrganismSpecies: 'Rattus norvegicus (Norway rat)'
OrganismClassification: [3x75 char]
Organelle: ''
Reference: {[1x1 struct]}
DatabaseCrossReference: ''
Comments: ''
Assembly: ''
Feature: [22x75 char]
BaseCount: [1x1 struct]
Sequence: [1x877 char]
RetrieveURL: [1x64 char]
getpdb
Purpose: Retrieve protein structure data from Protein Data Bank (PDB) database
Syntax: PDBStruct = getpdb(pdbid)
Example:
pdbstruct = getpdb('5CYT')
Output of pdbstruct = getpdb('5cyt'):
pdbstruct =
Header: [1x1 struct]
Title: 'REFINEMENT OF MYOGLOBIN AND CYTOCHROME C'
Compound: [4x23 char]
Source: [4x38 char]
Keywords: 'ELECTRON TRANSPORT (HEME PROTEIN)'
ExperimentData: 'X-RAY DIFFRACTION'
Authors: 'T.TAKANO'
RevisionDate: [1x2 struct]
Superseded: [1x1 struct]
Journal: [1x1 struct]
Remark1: [1x1 struct]
Remark2: [1x1 struct]
Remark3: [1x1 struct]
Remark4: [2x59 char]
Remark100: [2x59 char]
Remark200: [49x59 char]
Remark280: [6x59 char]

35 Pairwise Sequence Alignment Dot matrix analysis
seqdotplot
Purpose: Create dot plot of two sequences
Syntax:
seqdotplot(seq1,seq2)
seqdotplot(seq1,seq2, Window, Number)
Window: an integer for the size of a window. Number: an integer for the number of characters within the window that match.
Example: Prion protein is a small protein found in high quantity in the brain of animals infected with mad-cow disease.
moufflon = getgenbank('ab060288','sequence',true);
takin = getgenbank('ab060290','sequence',true);
seqdotplot(moufflon,takin,11,7)

36 Pairwise Sequence Alignment Dot matrix analysis

37 Pairwise Sequence Alignment Dot matrix analysis

38 Pairwise Sequence Alignment Dot matrix analysis
nt2aa
Purpose: Convert nucleotide sequence to amino acid sequence
Syntax: SeqAA = nt2aa(seqnt)
Example:
S = getgenbank('m10051')
p = nt2aa(S.Sequence([2699:2885, 3673:3818]))
>> S.CDS
ans =
location: 'join( , )'
gene: 'INS'
product: 'insulin'
codon_start: '1'
indices: [ ]
protein_id: 'AAA '
db_xref: 'GI:386829'
note: ''
translation: [1x110 char]
text: [9x58 char]

39 Pairwise Sequence Alignment Dot matrix analysis
nt2int
Purpose: Convert nucleotide sequence from letter to integer representation
Syntax: SeqInt = nt2int(seqchar)
Example: s = nt2int('actgctagc')
randseq
Purpose: Generate random sequence from finite alphabet
Syntax: Seq = randseq(seqlength)
Example:
S = randseq(20)
S = randseq(20,'alphabet','amino')
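For readers without the toolbox, the two conversions above are easy to mimic. The Python sketch below is an analogue, not the MATLAB functions themselves. It assumes the A=1, C=2, G=3, T=4 coding for nucleotides and draws every letter with equal probability; `nt_to_int`, `random_sequence` and the `ALPHABETS` table are illustrative names.

```python
import random

# Assumed integer coding for the four unambiguous nucleotides.
NT_CODES = {"A": 1, "C": 2, "G": 3, "T": 4}

# Alphabets for random sequences (20 standard amino acids for 'amino').
ALPHABETS = {
    "dna": "ACGT",
    "rna": "ACGU",
    "amino": "ARNDCQEGHILKMFPSTWYV",
}


def nt_to_int(sequence):
    """Map each nucleotide letter to its integer code (case-insensitive)."""
    return [NT_CODES[base] for base in sequence.upper()]


def random_sequence(length, alphabet="dna", rng=None):
    """Draw `length` letters uniformly at random from the chosen alphabet."""
    rng = rng or random.Random()
    letters = ALPHABETS[alphabet]
    return "".join(rng.choice(letters) for _ in range(length))


print(nt_to_int("actgctagc"))
print(random_sequence(20))
print(random_sequence(20, alphabet="amino"))
```

Passing a seeded `random.Random` instance as `rng` makes the random sequences reproducible, which is convenient when a random sequence serves as a background control for a dot plot.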

40 Pairwise Sequence Alignment Dot matrix analysis molviewer Purpose: Display and manipulate 3-D molecule structure Syntax: molviewer molviewer(pdbid) Example molviewer molviewer('5cyt')

41 Pairwise Sequence Alignment Dot matrix analysis -GCGC-ATGGATTGAGCGA TGCGCCATTGAT-GACC-A GCGCATGGATTGAGCGA TGCGCC----ATTGATGACCA-- WHICH-ONE-IS-BETTER?---
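Deciding which alignment is better requires a scoring scheme. The Python sketch below scores the first alignment on the slide under an assumed scheme of +1 per match, -1 per mismatch and -2 per gap column. These values are illustrative, not taken from the slides, and `alignment_score` is a hypothetical name.

```python
def alignment_score(row1, row2, match=1, mismatch=-1, gap=-2):
    """Sum column scores of two aligned rows of equal length."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    score = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            score += gap          # a column containing a gap
        elif a == b:
            score += match        # identical characters
        else:
            score += mismatch     # different characters
    return score


row1 = "-GCGC-ATGGATTGAGCGA"
row2 = "TGCGCCATTGAT-GACC-A"

print(alignment_score(row1, row2))            # default scheme
print(alignment_score(row1, row2, gap=-1))    # cheaper gaps
```

The ranking of two candidate alignments can change with the gap penalty, which is why the scheme has to be fixed before alignments are compared.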

42 Find the coding sequence of hemoglobin subunit beta for human, chimpanzee and rat. Analyze them using the MATLAB dot plot tool (seqdotplot).

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot

More information

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT 3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT.03.239 25.09.2012 SEQUENCE ANALYSIS IS IMPORTANT FOR... Prediction of function Gene finding the process of identifying the regions of genomic DNA that encode

More information

Introduction to Bioinformatics

Introduction to Bioinformatics Introduction to Bioinformatics Lecture : p he biological problem p lobal alignment p Local alignment p Multiple alignment 6 Background: comparative genomics p Basic question in biology: what properties

More information

SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS. Prokaryotes and Eukaryotes. DNA and RNA

SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS. Prokaryotes and Eukaryotes. DNA and RNA SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS 1 Prokaryotes and Eukaryotes 2 DNA and RNA 3 4 Double helix structure Codons Codons are triplets of bases from the RNA sequence. Each triplet defines an amino-acid.

More information

Practical Bioinformatics

Practical Bioinformatics 5/2/2017 Dictionaries d i c t i o n a r y = { A : T, T : A, G : C, C : G } d i c t i o n a r y [ G ] d i c t i o n a r y [ N ] = N d i c t i o n a r y. h a s k e y ( C ) Dictionaries g e n e t i c C o

More information

Sequence Alignment (chapter 6)

Sequence Alignment (chapter 6) Sequence lignment (chapter 6) he biological problem lobal alignment Local alignment Multiple alignment Introduction to bioinformatics, utumn 6 Background: comparative genomics Basic question in biology:

More information

Comparing whole genomes

Comparing whole genomes BioNumerics Tutorial: Comparing whole genomes 1 Aim The Chromosome Comparison window in BioNumerics has been designed for large-scale comparison of sequences of unlimited length. In this tutorial you will

More information

Sequence analysis and Genomics

Sequence analysis and Genomics Sequence analysis and Genomics October 12 th November 23 rd 2 PM 5 PM Prof. Peter Stadler Dr. Katja Nowick Katja: group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute

More information

Bioinformatics Exercises

Bioinformatics Exercises Bioinformatics Exercises AP Biology Teachers Workshop Susan Cates, Ph.D. Evolution of Species Phylogenetic Trees show the relatedness of organisms Common Ancestor (Root of the tree) 1 Rooted vs. Unrooted

More information

Background: comparative genomics. Sequence similarity. Homologs. Similarity vs homology (2) Similarity vs homology. Sequence Alignment (chapter 6)

Background: comparative genomics. Sequence similarity. Homologs. Similarity vs homology (2) Similarity vs homology. Sequence Alignment (chapter 6) Sequence lignment (chapter ) he biological problem lobal alignment Local alignment Multiple alignment Background: comparative genomics Basic question in biology: what properties are shared among organisms?

More information

Biochemistry 324 Bioinformatics. Pairwise sequence alignment

Biochemistry 324 Bioinformatics. Pairwise sequence alignment Biochemistry 324 Bioinformatics Pairwise sequence alignment How do we compare genes/proteins? When we have sequenced a genome, we try and identify the function of unknown genes by finding a similar gene

More information

Bioinformatics. Dept. of Computational Biology & Bioinformatics

Bioinformatics. Dept. of Computational Biology & Bioinformatics Bioinformatics Dept. of Computational Biology & Bioinformatics 3 Bioinformatics - play with sequences & structures Dept. of Computational Biology & Bioinformatics 4 ORGANIZATION OF LIFE ROLE OF BIOINFORMATICS

More information

UNIT 5. Protein Synthesis 11/22/16

UNIT 5. Protein Synthesis 11/22/16 UNIT 5 Protein Synthesis IV. Transcription (8.4) A. RNA carries DNA s instruction 1. Francis Crick defined the central dogma of molecular biology a. Replication copies DNA b. Transcription converts DNA

More information

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University Sequence Alignment: A General Overview COMP 571 - Fall 2010 Luay Nakhleh, Rice University Life through Evolution All living organisms are related to each other through evolution This means: any pair of

More information

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

Bio 1B Lecture Outline (please print and bring along) Fall, 2007 Bio 1B Lecture Outline (please print and bring along) Fall, 2007 B.D. Mishler, Dept. of Integrative Biology 2-6810, bmishler@berkeley.edu Evolution lecture #5 -- Molecular genetics and molecular evolution

More information

B I O I N F O R M A T I C S

B I O I N F O R M A T I C S B I O I N F O R M A T I C S Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be K Van Steen CH4 : 1 CHAPTER 4: SEQUENCE COMPARISON

More information

RELATIONSHIPS BETWEEN GENES/PROTEINS HOMOLOGUES

RELATIONSHIPS BETWEEN GENES/PROTEINS HOMOLOGUES Molecular Biology-2018 1 Definitions: RELATIONSHIPS BETWEEN GENES/PROTEINS HOMOLOGUES Heterologues: Genes or proteins that possess different sequences and activities. Homologues: Genes or proteins that

More information

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment Introduction to Bioinformatics online course : IBT Jonathan Kayondo Learning Objectives Understand

More information

CONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018

CONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018 CONCEPT OF SEQUENCE COMPARISON Natapol Pornputtapong 18 January 2018 SEQUENCE ANALYSIS - A ROSETTA STONE OF LIFE Sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of

More information

BIOINFORMATICS: An Introduction

BIOINFORMATICS: An Introduction BIOINFORMATICS: An Introduction What is Bioinformatics? The term was first coined in 1988 by Dr. Hwa Lim The original definition was : a collective term for data compilation, organisation, analysis and

More information

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) Bioinformática Sequence Alignment Pairwise Sequence Alignment Universidade da Beira Interior (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) 1 16/3/29 & 23/3/29 27/4/29 Outline

More information

(Lys), resulting in translation of a polypeptide without the Lys amino acid. resulting in translation of a polypeptide without the Lys amino acid.

(Lys), resulting in translation of a polypeptide without the Lys amino acid. resulting in translation of a polypeptide without the Lys amino acid. 1. A change that makes a polypeptide defective has been discovered in its amino acid sequence. The normal and defective amino acid sequences are shown below. Researchers are attempting to reproduce the

More information

An Introduction to Sequence Similarity ( Homology ) Searching

An Introduction to Sequence Similarity ( Homology ) Searching An Introduction to Sequence Similarity ( Homology ) Searching Gary D. Stormo 1 UNIT 3.1 1 Washington University, School of Medicine, St. Louis, Missouri ABSTRACT Homologous sequences usually have the same,

More information

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

CHAPTERS 24-25: Evidence for Evolution and Phylogeny CHAPTERS 24-25: Evidence for Evolution and Phylogeny 1. For each of the following, indicate how it is used as evidence of evolution by natural selection or shown as an evolutionary trend: a. Paleontology

More information

Algorithms in Bioinformatics

Algorithms in Bioinformatics Algorithms in Bioinformatics Sami Khuri Department of omputer Science San José State University San José, alifornia, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Pairwise Sequence Alignment Homology

More information

Sequence analysis and comparison

Sequence analysis and comparison The aim with sequence identification: Sequence analysis and comparison Marjolein Thunnissen Lund September 2012 Is there any known protein sequence that is homologous to mine? Are there any other species

More information

BIOINFORMATICS LAB AP BIOLOGY

BIOINFORMATICS LAB AP BIOLOGY BIOINFORMATICS LAB AP BIOLOGY Bioinformatics is the science of collecting and analyzing complex biological data. Bioinformatics combines computer science, statistics and biology to allow scientists to

More information

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison 10-810: Advanced Algorithms and Models for Computational Biology microrna and Whole Genome Comparison Central Dogma: 90s Transcription factors DNA transcription mrna translation Proteins Central Dogma:

More information

Bioinformatics Chapter 1. Introduction

Bioinformatics Chapter 1. Introduction Bioinformatics Chapter 1. Introduction Outline! Biological Data in Digital Symbol Sequences! Genomes Diversity, Size, and Structure! Proteins and Proteomes! On the Information Content of Biological Sequences!

More information

Motivating the need for optimal sequence alignments...

Motivating the need for optimal sequence alignments... 1 Motivating the need for optimal sequence alignments... 2 3 Note that this actually combines two objectives of optimal sequence alignments: (i) use the score of the alignment o infer homology; (ii) use

More information

What is the central dogma of biology?

What is the central dogma of biology? Bellringer What is the central dogma of biology? A. RNA DNA Protein B. DNA Protein Gene C. DNA Gene RNA D. DNA RNA Protein Review of DNA processes Replication (7.1) Transcription(7.2) Translation(7.3)

More information

Sequencing alignment Ameer Effat M. Elfarash

Sequencing alignment Ameer Effat M. Elfarash Sequencing alignment Ameer Effat M. Elfarash Dept. of Genetics Fac. of Agriculture, Assiut Univ. aelfarash@aun.edu.eg Why perform a multiple sequence alignment? MSAs are at the heart of comparative genomics

More information

GENERAL BIOLOGY LABORATORY EXERCISE Amino Acid Sequence Analysis of Cytochrome C in Bacteria and Eukarya Using Bioinformatics

GENERAL BIOLOGY LABORATORY EXERCISE Amino Acid Sequence Analysis of Cytochrome C in Bacteria and Eukarya Using Bioinformatics GENERAL BIOLOGY LABORATORY EXERCISE Amino Acid Sequence Analysis of Cytochrome C in Bacteria and Eukarya Using Bioinformatics INTRODUCTION: All life forms undergo metabolic processes to obtain energy.

More information

BLAST. Varieties of BLAST

BLAST. Varieties of BLAST BLAST Basic Local Alignment Search Tool (1990) Altschul, Gish, Miller, Myers, & Lipman Uses short-cuts or heuristics to improve search speed Like speed-reading, does not examine every nucleotide of database

More information

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010 BLAST Database Searching BME 110: CompBio Tools Todd Lowe April 8, 2010 Admin Reading: Read chapter 7, and the NCBI Blast Guide and tutorial http://www.ncbi.nlm.nih.gov/blast/why.shtml Read Chapter 8 for

More information

Evolutionary Analysis of Viral Genomes

Evolutionary Analysis of Viral Genomes University of Oxford, Department of Zoology Evolutionary Biology Group Department of Zoology University of Oxford South Parks Road Oxford OX1 3PS, U.K. Fax: +44 1865 271249 Evolutionary Analysis of Viral

More information

Sequencing alignment Ameer Effat M. Elfarash

Sequencing alignment Ameer Effat M. Elfarash Sequencing alignment Ameer Effat M. Elfarash Dept. of Genetics Fac. of Agriculture, Assiut Univ. amir_effat@yahoo.com Why perform a multiple sequence alignment? MSAs are at the heart of comparative genomics

More information

Videos. Bozeman, transcription and translation: https://youtu.be/h3b9arupxzg Crashcourse: Transcription and Translation - https://youtu.

Videos. Bozeman, transcription and translation: https://youtu.be/h3b9arupxzg Crashcourse: Transcription and Translation - https://youtu. Translation Translation Videos Bozeman, transcription and translation: https://youtu.be/h3b9arupxzg Crashcourse: Transcription and Translation - https://youtu.be/itsb2sqr-r0 Translation Translation The

More information

Sequence Alignment Techniques and Their Uses

Sequence Alignment Techniques and Their Uses Sequence Alignment Techniques and Their Uses Sarah Fiorentino Since rapid sequencing technology and whole genomes sequencing, the amount of sequence information has grown exponentially. With all of this

More information

Biol478/ August

Biol478/ August Biol478/595 29 August # Day Inst. Topic Hwk Reading August 1 M 25 MG Introduction 2 W 27 MG Sequences and Evolution Handouts 3 F 29 MG Sequences and Evolution September M 1 Labor Day 4 W 3 MG Database

More information

Pairwise Sequence Alignment

Pairwise Sequence Alignment Introduction to Bioinformatics Pairwise Sequence Alignment Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr Outline Introduction to sequence alignment pair wise sequence alignment The Dot Matrix Scoring

More information

Molecular Evolution and DNA systematics

Molecular Evolution and DNA systematics Biology 4505 - Biogeography & Systematics Dr. Carr Molecular Evolution and DNA systematics Ultimately, the source of all organismal variation that we have examined in this course is the genome, written

More information

MATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME

MATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME MATHEMATICAL MODELING AND THE HUMAN GENOME Hilary S. Booth Australian National University, Australia Keywords: Human genome, DNA, bioinformatics, sequence analysis, evolution. Contents 1. Introduction:

More information

Comparative genomics: Overview & Tools + MUMmer algorithm

Comparative genomics: Overview & Tools + MUMmer algorithm Comparative genomics: Overview & Tools + MUMmer algorithm Urmila Kulkarni-Kale Bioinformatics Centre University of Pune, Pune 411 007. urmila@bioinfo.ernet.in Genome sequence: Fact file 1995: The first

More information

Cellular Neuroanatomy I The Prototypical Neuron: Soma. Reading: BCP Chapter 2

Cellular Neuroanatomy I The Prototypical Neuron: Soma. Reading: BCP Chapter 2 Cellular Neuroanatomy I The Prototypical Neuron: Soma Reading: BCP Chapter 2 Functional Unit of the Nervous System The functional unit of the nervous system is the neuron. Neurons are cells specialized

More information

Single alignment: Substitution Matrix. 16 march 2017

Single alignment: Substitution Matrix. 16 march 2017 Single alignment: Substitution Matrix 16 march 2017 BLOSUM Matrix BLOSUM Matrix [2] (Blocks Amino Acid Substitution Matrices ) It is based on the amino acids substitutions observed in ~2000 conserved block

More information

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between

More information

Homology and Information Gathering and Domain Annotation for Proteins

Homology and Information Gathering and Domain Annotation for Proteins Homology and Information Gathering and Domain Annotation for Proteins Outline Homology Information Gathering for Proteins Domain Annotation for Proteins Examples and exercises The concept of homology The

More information

Protein Synthesis. Unit 6 Goal: Students will be able to describe the processes of transcription and translation.

Protein Synthesis. Unit 6 Goal: Students will be able to describe the processes of transcription and translation. Protein Synthesis Unit 6 Goal: Students will be able to describe the processes of transcription and translation. Protein Synthesis: Protein synthesis uses the information in genes to make proteins. 2 Steps

More information

Multiple Choice Review- Eukaryotic Gene Expression

Multiple Choice Review- Eukaryotic Gene Expression Multiple Choice Review- Eukaryotic Gene Expression 1. Which of the following is the Central Dogma of cell biology? a. DNA Nucleic Acid Protein Amino Acid b. Prokaryote Bacteria - Eukaryote c. Atom Molecule

More information

Computational Biology: Basics & Interesting Problems

Computational Biology: Basics & Interesting Problems Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information

More information

Genomes and Their Evolution

Genomes and Their Evolution Chapter 21 Genomes and Their Evolution PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley with contributions from

More information

1. Contains the sugar ribose instead of deoxyribose. 2. Single-stranded instead of double stranded. 3. Contains uracil in place of thymine.

1. Contains the sugar ribose instead of deoxyribose. 2. Single-stranded instead of double stranded. 3. Contains uracil in place of thymine. Protein Synthesis & Mutations RNA 1. Contains the sugar ribose instead of deoxyribose. 2. Single-stranded instead of double stranded. 3. Contains uracil in place of thymine. RNA Contains: 1. Adenine 2.

More information

Chapters 12&13 Notes: DNA, RNA & Protein Synthesis

Chapters 12&13 Notes: DNA, RNA & Protein Synthesis Chapters 12&13 Notes: DNA, RNA & Protein Synthesis Name Period Words to Know: nucleotides, DNA, complementary base pairing, replication, genes, proteins, mrna, rrna, trna, transcription, translation, codon,

More information

Research Proposal. Title: Multiple Sequence Alignment used to investigate the co-evolving positions in OxyR Protein family.

Name: Minjal Pancholi Howard University Washington, DC. June 19, 2009 Research

USING BLAST TO IDENTIFY PROTEINS THAT ARE EVOLUTIONARILY RELATED ACROSS SPECIES

HOW CAN BIOINFORMATICS BE USED AS A TOOL TO DETERMINE EVOLUTIONARY RELATIONSHIPS AND TO BETTER UNDERSTAND PROTEIN HERITAGE?

mRNA Codon Table Mutant Dinosaur Name: Period:

Mutant Dinosaur Name: Period: Intro Your dinosaur is born with a new genetic mutation. Your job is to map out the genes that are influenced by the mutation and to discover how the new dinosaurs interact

Collected Works of Charles Dickens

A Random Dickens Quote If there were no bad people, there would be no good lawyers. Original Sentence It was a dark and stormy night; the night was dark except at sunny

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task.

Chapter 12 (Strikberger) Molecular Phylogenies and Evolution METHODS FOR DETERMINING PHYLOGENY In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task. Modern

Computational Biology

Lecture 6 31 October 2004 1 Overview Scoring matrices (Thanks to Shannon McWeeney) BLAST algorithm Start sequence alignment 2 1 What is a homologous sequence? A homologous sequence,

08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega

What does basic BLAST do (e.g. what is input sequence and how does BLAST look for matches?) Susan Parrish McDaniel College Multiple Sequence Alignments

Sequences, Structures, and Gene Regulatory Networks

Learning Outcomes After this class, you will Understand gene expression and protein structure in more detail Appreciate why biologists like to align

MULTIPLE SEQUENCE ALIGNMENT FOR CONSTRUCTION OF PHYLOGENETIC TREE

Manmeet Kaur 1, Navneet Kaur Bawa 2 1 M-tech research scholar (CSE Dept) ACET, Manawala,Asr 2 Associate Professor (CSE Dept) ACET, Manawala,Asr

Introduction to protein alignments

Comparative Analysis of Proteins Experimental evidence from one or more proteins can be used to infer function of related protein(s). Gene A Gene X Protein A compare

Tools and Algorithms in Bioinformatics

GCBA815, Fall 2013 Week3: Blast Algorithm, theory and practice Babu Guda, Ph.D. Department of Genetics, Cell Biology & Anatomy Bioinformatics and Systems Biology

Comparative genomics. João Carlos Setubal IQ-USP October 2012 11/5/2012 J. C. Setubal

Comparative genomics João Carlos Setubal IQ-USP October 2012 11/5/2012 J. C. Setubal 1 Comparative genomics There are currently (Oct/2012) 2,230 completed sequenced microbial genomes publicly available

Genomics and bioinformatics summary. Finding genes -- computer searches

Genomics and bioinformatics summary 1. Gene finding: computer searches, cDNAs, ESTs, 2. Microarrays 3. Use BLAST to find homologous sequences 4. Multiple sequence alignments (MSAs) 5. Trees quantify sequence

1. In most cases, genes code for and it is that

Name Chapter 10 Reading Guide From DNA to Protein: Gene Expression Concept 10.1 Genetics Shows That Genes Code for Proteins 1. In most cases, genes code for and it is that determine. 2. Describe what Garrod

Pairwise & Multiple sequence alignments

Urmila Kulkarni-Kale Bioinformatics Centre 411 007 urmila@bioinfo.ernet.in Basis for Sequence comparison Theory of evolution: gene sequences have evolved/derived

Study and Implementation of Various Techniques Involved in DNA and Protein Sequence Analysis

Kumud Joseph Kujur, Sumit Pal Singh, O.P. Vyas, Ruchir Bhatia, Varun Singh* Indian Institute of Information

Exploring Evolution & Bioinformatics

Chapter 6 Exploring Evolution & Bioinformatics Jane Goodall The human sequence (red) differs from the chimpanzee sequence (blue) in only one amino acid in a protein chain of 153 residues for myoglobin

20 Grundlagen der Bioinformatik, SS 08, D. Huson, May 27, Global and local alignment of two sequences using dynamic programming

20 Grundlagen der Bioinformatik, SS 08, D. Huson, May 27, 2008 4 Pairwise alignment We will discuss: 1. Strings 2. Dot matrix method for comparing sequences 3. Edit distance 4. Global and local alignment

Homology Modeling. Roberto Lins EPFL - summer semester 2005

Disclaimer: course material is mainly taken from: P.E. Bourne & H Weissig, Structural Bioinformatics; C.A. Orengo, D.T. Jones & J.M. Thornton,

The Complete Set Of Genetic Instructions In An Organism's Chromosomes Is Called The

What is a genome? A genome is an organism's complete set of genetic instructions. Single strands of DNA are coiled up

Molecular Population Genetics

The 10th CJK Bioinformatics Training Course in Jeju, Korea May, 2011 Yoshio Tateno National Institute of Genetics/POSTECH Top 10 species in INSDC (as of April, 2011) CONTENTS

Investigation 3: Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST

Introduction Bioinformatics is a powerful tool which can be used to determine evolutionary relationships and

Midterm Review Guide. Unit 1: Biochemistry: 1. Give the pH values for an acid and a base. 2. What do buffers do? 3. Define monomer and polymer.

Midterm Review Guide Name: Unit 1: Biochemistry: 1. Give the pH values for an acid and a base. 2. What do buffers do? 3. Define monomer and polymer. 4. Fill in the Organic Compounds chart: Elements Monomer

Introduction to Molecular and Cell Biology

Molecular biology seeks to understand the physical and chemical basis of life and helps us answer the following: What is the molecular basis of disease? What

Sequence alignment methods. Pairwise alignment. The universe of biological sequence analysis

The universe of biological sequence analysis Word/pattern recognition- Identification of restriction enzyme cleavage sites Sequence alignment methods PstI The universe of biological sequence analysis - prediction

Advanced topics in bioinformatics

Feinberg Graduate School of the Weizmann Institute of Science Advanced topics in bioinformatics Shmuel Pietrokovski & Eitan Rubin Spring 2003 Course WWW site: http://bioinformatics.weizmann.ac.il/courses/atib

Motifs and Logos. Six Introduction to Bioinformatics. Importance and Abundance of Motifs. Getting the CDS. From DNA to Protein 6.1.

Motifs and Logos Six Discovering Genomics, Proteomics, and Bioinformatics by A. Malcolm Campbell and Laurie J. Heyer Chapter 2 Genome Sequence Acquisition and Analysis Sami Khuri Department of Computer

2012 Univ Aguilera Lecture. Introduction to Molecular and Cell Biology

2012 Univ. 1301 Aguilera Lecture Introduction to Molecular and Cell Biology Molecular biology seeks to understand the physical and chemical basis of life and helps us answer the following: What is the

www.lessonplansinc.com Topic: Dinosaur Evolution Project Summary: Students pretend to evolve two dinosaurs using genetics and watch how the dinosaurs adapt to an environmental change. This is a very comprehensive

Biology Tutorial. Aarti Balasubramani Anusha Bharadwaj Massa Shoura Stefan Giovan

Viruses A T4 bacteriophage injecting DNA into a cell. Influenza A virus Electron micrograph of HIV. Cone-shaped cores are

NUCLEOTIDE SUBSTITUTIONS AND THE EVOLUTION OF DUPLICATE GENES

Conery, J.S. and Lynch, M. Nucleotide substitutions and evolution of duplicate genes. Pacific Symposium on Biocomputing 6:167-178 (2001). NUCLEOTIDE SUBSTITUTIONS AND THE EVOLUTION OF DUPLICATE GENES JOHN

Drosophila melanogaster and D. simulans, two fruit fly species that are nearly

Comparative Genomics: Human versus chimpanzee 1. Introduction The chimpanzee is the closest living relative to humans. The two species are nearly identical in DNA sequence (>98% identity), yet vastly different

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

MOLECULAR PHYLOGENY "Nothing in biology makes sense except in the light of evolution" Theodosius Dobzhansky EVOLUTION - theory that groups of organisms change over time so that descendants differ structurally

BME 5742 Biosystems Modeling and Control

Lecture 24 Unregulated Gene Expression Model Dr. Zvi Roth (FAU) 1 The genetic material inside a cell, encoded in its DNA, governs the response of a cell to various

Sequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment

Substitution matrices Used to score aligned positions, usually of amino acids. Expressed as the log-likelihood ratio of

Biology 2018 Final Review. Miller and Levine

bones blood cells elements All living things are made up of. cells If a cell of an organism contains a nucleus, the organism is a(n). eukaryote prokaryote plant

Review sheet for the material covered by exam III

WARNING: Like last time, I have tried to be complete, but I may have missed something. You are responsible for all the material discussed in class. This

CSE 397-497: Computational Issues in Molecular Biology. Lecture 6. Spring 2004

CSE 397-497: Computational Issues in Molecular Biology Lecture 6 Spring 2004 Topics for today Based on premise that algorithms we've studied are too slow: Faster method for global comparison when sequences

Homology and Information Gathering and Domain Annotation for Proteins

Homology and Information Gathering and Domain Annotation for Proteins Outline WHAT IS HOMOLOGY? HOW TO GATHER KNOWN PROTEIN INFORMATION? HOW TO ANNOTATE PROTEIN DOMAINS? EXAMPLES AND EXERCISES Homology

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)

Contents Alignment algorithms Needleman-Wunsch (global alignment) Smith-Waterman (local alignment) Heuristic algorithms FASTA BLAST

Application of Associative Matrices to Recognize DNA Sequences in Bioinformatics

1. Introduction. Jorge L. Ortiz Department of Electrical and Computer Engineering College of Engineering University of Puerto

In-Depth Assessment of Local Sequence Alignment

2012 International Conference on Environment Science and Engineering IPCBEE vol.32 (2012) (2012) IACSIT Press, Singapore In-Depth Assessment of Local Sequence Alignment Atoosa Ghahremani and Mahmood A.

Supplementary Information for

Evolutionary conservation of codon optimality reveals hidden signatures of co-translational folding Sebastian Pechmann & Judith Frydman Department of Biology and BioX, Stanford

Basic Local Alignment Search Tool

Alignments used to uncover homologies between sequences combined with phylogenetic studies o can determine orthologous and paralogous relationships Local Alignment uses

Understanding relationship between homologous sequences

Molecular Evolution How and when were genes and proteins created? How old is a gene? How can we calculate the age of a gene? How did the gene evolve to the present form? What selective