A Field Guide. GenBank Records. A Typical GenBank Record GenBank Record: Feature Table. The Flatfile Format. part 2. Header. Feature Table.

Size: px
Start display at page:

Download "A Field Guide. GenBank Records. A Typical GenBank Record GenBank Record: Feature Table. The Flatfile Format. part 2. Header. Feature Table."

Transcription

1 A Field Guide part 2 GenBank Records The Flatfile Format Header Feature Table Sequence A Typical GenBank Record GenBank Record: Feature Table LOCUS NM_ bp mrna linear ROD 28-OCT-2004 DEFINITION Mus musculus REV1-like(S. cerevisiae)(rev1l),mrna ACCESSION NM_ VERSION NM_ GI: KEYWORDS. 1

2 GenBank Record: Feature Table, con t. GenBank Record: sequence Indexing for Nucleotide UID Global Entrez Search: HFE 56!""#"#$%&& ' ()* +,- ' ##./ %"""4.""" ) $""01"$1#& $"".1"#4$"".1"/ ) 2

3 Entrez Nucleotide: HFE 137 records Smarter Query hfe[title] AND human[orgn] 42 records hfe[title] Not HFE Curated HFE splice variants (11 total) hfe[title] AND human[orgn] (con t) Primary data Preview/Index Gateway to Advanced Searches 3

4 Preview/Index Preview/Index: Properties, srcdb 2 2 Properties srcdb Preview/Index: Properties, srcdb Preview/Index: Properties, srcdb AND srcdb refseq[properties] AND srcdb ddbj/embl/genbank[properties] 4

5 Properties Search Field Properties Search Field: biomol #1 hfe 137 #2 hfe[title] AND human[orgn] 42 #3 #2 AND srcdb refseq[prop] 11 #4 #2 AND srcdb ddbj/embl/genbank[prop] 31 #5 #4 AND gbdiv pri[prop] 29 #4 #4 AND gbdiv est[prop] #1 hfe 116 #2 hfe[title] AND human[orgn] 42 #3 #2 AND biomol mrna[prop] 29 #4 #2 AND biomol genomic[prop] 13 9 :, 2 :, 2 More Queries 2;) 6 < +3+)8 3 4 srcdb refseq reviewed[prop] AND transcript[title] AND variant[title] More Queries 2;) 6 < +3+)8 3 4 srcdb refseq reviewed[prop] AND transcript[title] AND variant[title] 6 <9 ), 4 topoisomerase[gene name] AND archaea[organism] 9 $= 2[chromosome],: human[organism] AND gene omim [filter] 2 4 integral to plasma membrane [gene ontology] AND cancer[dis] 5

6 Other Entrez Databases >94 3 +, rat[organism] NOT 0[mrna count] 874 $ microsat[snp Class] AND 1[Map Weight] AND 2[Chromosome]) AND human[orgn] 9 9 ' ' >884 9 ) #$ Genethon[Map Name] AND human[organism] AND 12[chromosome] 8 4 )2 2$? bacteria[organism] AND kinase AND :002.00[resolution] 6 <9 Gen Biol: Gen Resources Genomic Biology 6

7 Map Viewer Genome Annotation Updates Gen Biol: Gen Resources Genome Projects: microb Genome Projects: microb #%6 ",2B $ 7 ;## 7

8 Gen Biol: Gen Resources Gen Biol: Gen Resources Gen Biol: Gen Resources ' 5 6 <9 8

9 Homologene No longer orthologs UniGene based orthologs paralogs Protein similarities first Guided by taxonomic tree frog Includes A chick orthologs A mouse and paralogs A mouse B chick B frog B Homologene Cluster MLH1 Cluster A-chain gene B-chain gene gene duplication early globin gene Rice Homolog ' 5 6 <9 9

10 List View Mouse MapViewer: Mouse ADAR MapViewer: Mouse ADAR, 28 Hits 10

11 Mouse MapViewer: Gene Filter MV Hs ADAR exon 3 UTR Maps & Options MapViewer ;;8 ;;,2, 2 '68!A A A A '!:, 9 9 A6+ 7 +) ;;A A ;; ;;9 ;; 9 C!96 ;;+5;; C1 +A;+5 C;D,A 5 + = SNP +)8+, 9 11

12 Maps & Options 8 +,:,+ 5,:,+,:, ' 5 6 <9 12

13 13

14 Basic Local Alignment Search Tool Outline Web BLAST pre-computed results how BLAST works words; scoring matrices; statistics specialized BLAST algorithms what s new, or important example oligo search BLAST Web Searches, 2006 BLAST Web Searches, 2005 $"""""1 $"""""1 14

15 4 +8 'E, ' >9 5!" E 15

16 Best hits # 3D structures CDD-Search $ %%&' $( &)' $)( ( &' * 8# 8$ 8# 8$ +( 92 E,)- #- 2)F G) $- 82) %- 6 2 B > )(587* B 9( 2* B 9( 2; * 16

17 -.(()/0 112 ATGCTGCTAGTCGATGACGTAGCTA ATGCTGCTAGT TGCTGCTAGTC GCTGCTAGTCG... - AIEKCYTGCTLAQEADDTA AIE IEK LEK, IDK, IQK, IER, IDR, etc EKC () KCY CYT (%()% ()/%*%0 A 4 R -1 5 N D C Q E G H I L K M F P S T W Y V X A R N D C Q E G H I L K M F P S T W Y V X 2!#3.45" 78978:!;11" 78;1< 78; ,=7* Nucleotide: one exact match ATGCTGCTAGTCGATGACGTAGCTA TAGTCGATGA Protein: two matches within 40 residues PHAIEKCYTGCTLAQEADDTA IDK EAD 17

18 !*" Query: IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESILEV YLS 15 HFL 18 YLT 12 HFV 15 YVS 12 HFS 14 2 YIT 10 HWL 13 etc NFL 13 2 DFL 12 HWV 10 (;)*## etc Query 1 IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESI 47 +E YA YL K F+ YLSL +SP+ +DVNVHP+K V HFL+++ I Sbjct 287 LEETYAKYLHKGASYFVYLSLNMSPEQLDVNVHPSKRIVHFLYDQEI 333 : ;)) )) 5 B Query 1 IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESI 47 +E YA YL K F+ YLSL +SP+ +DVNVHP+K V HFL+++ I Sbjct 287 LEETYAKYLHKGASYFVYLSLNMSPEQLDVNVHPSKRIVHFLYDQEI Query 1 IETVYAAYLPKNTHPFLYLSLEISPQNVDVNVHPTKHEVHFLHEESI-LEV 50 +E YA YL K F+YLSL +SP+ +DVNVHP+K VHFL+++ I + + Sbjct 287 LEETYAKYLHKGASYFVYLSLNMSPEQLDVNVHPSKRIVHFLYDQEIATSI ; (587* ;HH HH ))3) (2* 2%"2$"2 2" #0 2 * A G C T A G C T ; #;;% CAGGTAGCAAGCTTGCATGTCA raw score = 19-(6+7)* = 6* CACGTAGCAAGCTTG-GTGTCA I I B90 90B6$ 18

19 A 4 R -1 5 N >! "#!" D C Q E G H I L K M F P S T W Y V X A R N D C Q E G H I L K M F P S T W Y V X 2!#3.45" 2 Position Independent Matrices PAM Matrices (Percent Accepted Mutation)?(@ $ $.1 $.5AB) BLOSUM Matrices (BLOck SUbstitution Matrices) $?(@ ( $ 7*() $ #3.452 * Position-Specific Score Matrix Position-Specific Score Matrix :,;# # 0 J K K A R N D C Q E G H I L K M F P S T W Y V 435 K E S N K P A M A H R D I K S K N I M V K N D L

20 7*C Local Alignment Statistics )2)2N 8 6L ;λ8 6$ ;8M L) λ ) λ8m 2 (λ8; L*1$ 2 6 ( * More info: The Statistics of Sequence Similarity Scores 7 *!" 000 ) ; 7* An alignment BLAST cannot make: 1 GAATATATGAAGACCAAGATTGCAGTCCTGCTGGCCTGAACCACGCTATTCTTGCTGTTG 1 GAGTGTACGATGAGCCCGAGTGTAGCAGTGAAGATCTGGACCACGGTGTACTCGTTGTCG GTTACGGAACCGAGAATGGTAAAGACTACTGGATCATTAAGAACTCCTGGGGAGCCAGTT GCTATGGTGTTAAGGGTGGGAAGAAGTACTGGCTCGTCAAGAACAGCTGGGCTGAATCCT GGGGTGAACAAGGTTATTTCAGGCTTGCTCGTGGTAAAAAC GGGGAGACCAAGGCTACATCCTTATGTCCCGTGACAACAAC 9 *D(0 20

21 . Score Score 84 = bits bits (741), (741), Expect Expect O'E,8H = 7e-77 7e-77 Identities Identities = 147/ /331 (44%), (44%), Positives Positives = 206/ /331 (61%), (61%), Gaps Gaps = 8/331 8/331 (2%) (2%) Frame Frame = 'E,8$8(2*4 # Megablast Discontiguous Megablast PSI-BLAST PHI-BLAST.(9 E +?.( $? $ + $ $ F(@ Uses discontiguous word matches Better for cross-species comparisons WORD SIZE default minimum blastn 11 7 megablast

22 ?- W = 11, t = 16, coding: W = 11, t = 16, non-coding: W = 12, t = 16, coding: W = 12, t = 16, non-coding: W = 11, t = 18, coding: W = 11, t = 18, non-coding: W = 12, t = 18, coding: W = 12, t = 18, non-coding: W = 11, t = 21, coding: W = 11, t = 21, non-coding: W = 12, t = 21, coding: W = 12, t = 21, non-coding: W <OQ t +) 4 ' PE ) 3 -') $""$O#/(%*4KK";0?! 2".?-# 22

23 .().()?0.(7*000 7*9. Query: NM_ Drosophila melanogaster CG18582-PA (mbt) mrna, (3244 bp) /note= mushroom bodies tiny; synonyms: Pak2, STE20, dpak2 Database: nr (nt), Mammalia[orgn] MegaBLAST = poor hits Discontiguous megablast = numerous hits... 23

24 7*9? *9<64. 2!"#$ % &'( ")*+,-. /-' ( 01 24

25 F; F%E 78;'E,84A 'E,84A )# R$ New sp O52692 MTS1_STRCS Modification methylase ScaI (N-4 cytosin e-23 + New sp P29538 MTH1_HAEPA Modification methylase HpaI (Adenine-spe e-22 New sp P23192 MTM2_MORBO Modification methylase MboII (Adenine-sp e-20 + New sp P28638 YHDJ_ECOLI Hypothetical adenine-specific methylase yhd e-18 + New sp Q45971 MTC1_CAUCR Modification methylase CcrMI (Adenine-sp e-16 + New sp O30569 MTS1_RHIME Modification methylase SmeIP (Adenine-sp e-15 + New sp P20590 MTH1_HAEIN Modification methylase HinfI (Adenine-sp e-15 + New sp Q2YMK2 MTB1_BRUA2 Modification methylase BabI (Adenine-spe e-14 + Gene New sp Q04845 MTC1_CITFR Modification methylase CfrBI (N-4 cytosi e-11 + New sp P30774 MTX1_XANCC Modification methylase XcyI (N-4 cytosin e-10 + New sp P14243 MTC9_CITFR Modification methylase Cfr9I (N-4 cytosi e-10 + New sp P50178 MTL22_LACLC Modification methylase LlaDCHIB (Adenin e-10 New sp P09358 MTD22_STRPN Modification methylase DpnIIB (Adenine e-09 New sp P23941 MTB1_BACAM Modification methylase BamHI (N-4 cytosi e-09 New sp Q58893 MTM5_METJA Modification methylase MjaV (N-4 cytosin e-08 + New sp P14230 MTSM_SERMA Modification methylase SmaI (N-4 cytosin e-08 New sp P34721 MT1B_MORBO Modification methylase MboIB (Adenine-sp e-07 + New sp P43871 MTH3_HAEIN Modification methylase HindIII (Adenine e-07 +)2 New sp P18051 MTB2_BACAM Modification methylase BamHII (N-4 cytos e-07 + New sp O52513 MTS1_STRFI Modification methylase SfiI (N-4 cytosin e-06 New sp Q58392 MTM1_METJA Modification methylase MjaI (N-4 cytosin e-06 + New sp Q9S4X2 YUBD_ECOLI Putative methylase yubd e-06 New sp O68556 MTB1_BACSU Modification methylase BglI (N-4 cytosin e-05 + New sp O59647 MTMW_METWO Modification methylase MwoI (N-4 cytosin e-04 + New sp P14827 MTEC_ENTCL Modification methylase EcaI (Adenine-spe e-04 New sp P71366 T3MH_HAEIN Putative type III restriction-modificati New sp P40814 T3MO_SALTY Type III restriction-modification system New sp P08763 T3MO_BPP1 Type III restriction-modification system New sp Q9LAI2 MTB1_BACSQ )2 Modification methylase BslI (N-4 2 cytosin )2 2 2 sp P25076 CY11_SOLTU Cytochrome c1 heme protein, mitochondria sp Q83DD0 SYV_COXBU Valyl-tRNA synthetase (Valine--tRNA ligase)

26 G. #3.45 Last position-specific scoring matrix computed, weighted observed perc A R N D C Q E G H I L K M F P S T W Y V 265 P D D L V V D A 4 R I N D F 274 G C G Q E G H I L K M F P S T W Y V X G274, A R N D C Q E G H I L K M F P S T W Y V X G275 BLOSUM62 5( Basic Local Alignment Search Tool 23 &4?( S ( * S B 9':3 B A B!H!+)8 S S )! B 9883 B!H!+! S )! B A!!9! S B 68:3 S B 593 S 2 B 88:3 S 2) B 7,:3 S B ) 3 S 2 B ) S 3! B 3 26

27 Protein BLAST Databases New Nucleotide Databases Protein nr traditional GenBank records refseq = NP_, XP_ swissprot pdb pat env_nr nr = nr )F #9=F 8 8 ) 27

28 New Output View New Output View 7 ( +'6 ( ' (8 (8 Sorting Results Sorting Hits: by Score 7( +(( (9 * ((&9 28

29 Sorting Hits: by Query Start # ) = <T = < 2 2 ) ) 3 E = (G9 * = =,3 B#"""" ;3$""" ;2$""" ;#"""" #"""";3$""" 3 Example: Mapping Oligos Onto a Genome Map Oligos Onto Genome >forward CCATGGCGACCCTGGAAAAGC >reverse CAGCAGCGGCTGTGCCTGCGG U U U >CCATGGCGACCCTGGAAAAGCNNNNNNNNNNCAGCAGCGGCTGTGCCTGCGG forward primer reverse primer -W 7 e

30 Genome BLAST Results Primer Alignments reverse primer forward primer MapViewer MapViewer 30

31 Sequence View (sv) forward Service Addresses S'E,8 S9 5 info@ncbi.nlm.nih.gov SC matten@ncbi.nlm.nih.gov reverse 31

BLAST. Varieties of BLAST

BLAST. Varieties of BLAST BLAST Basic Local Alignment Search Tool (1990) Altschul, Gish, Miller, Myers, & Lipman Uses short-cuts or heuristics to improve search speed Like speed-reading, does not examine every nucleotide of database

More information

Basic Local Alignment Search Tool

Basic Local Alignment Search Tool Basic Local Alignment Search Tool Alignments used to uncover homologies between sequences combined with phylogenetic studies o can determine orthologous and paralogous relationships Local Alignment uses

More information

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010 BLAST Database Searching BME 110: CompBio Tools Todd Lowe April 8, 2010 Admin Reading: Read chapter 7, and the NCBI Blast Guide and tutorial http://www.ncbi.nlm.nih.gov/blast/why.shtml Read Chapter 8 for

More information

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Evaluation. Course Homepage.

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Evaluation. Course Homepage. CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools Giri Narasimhan ECS 389; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs06.html 1/12/06 CAP5510/CGS5166 1 Evaluation

More information

Tools and Algorithms in Bioinformatics

Tools and Algorithms in Bioinformatics Tools and Algorithms in Bioinformatics GCBA815, Fall 2013 Week3: Blast Algorithm, theory and practice Babu Guda, Ph.D. Department of Genetics, Cell Biology & Anatomy Bioinformatics and Systems Biology

More information

Protein function prediction based on sequence analysis

Protein function prediction based on sequence analysis Performing sequence searches Post-Blast analysis, Using profiles and pattern-matching Protein function prediction based on sequence analysis Slides from a lecture on MOL204 - Applied Bioinformatics 18-Oct-2005

More information

Sequences, Structures, and Gene Regulatory Networks

Sequences, Structures, and Gene Regulatory Networks Sequences, Structures, and Gene Regulatory Networks Learning Outcomes After this class, you will Understand gene expression and protein structure in more detail Appreciate why biologists like to align

More information

Tools and Algorithms in Bioinformatics

Tools and Algorithms in Bioinformatics Tools and Algorithms in Bioinformatics GCBA815, Fall 2015 Week-4 BLAST Algorithm Continued Multiple Sequence Alignment Babu Guda, Ph.D. Department of Genetics, Cell Biology & Anatomy Bioinformatics and

More information

- conserved in Eukaryotes. - proteins in the cluster have identifiable conserved domains. - human gene should be included in the cluster.

- conserved in Eukaryotes. - proteins in the cluster have identifiable conserved domains. - human gene should be included in the cluster. NCBI BLAST Services DELTA-BLAST BLAST (http://blast.ncbi.nlm.nih.gov/), Basic Local Alignment Search tool, is a suite of programs for finding similarities between biological sequences. DELTA-BLAST is a

More information

Bioinformatics for Biologists

Bioinformatics for Biologists Bioinformatics for Biologists Sequence Analysis: Part I. Pairwise alignment and database searching Fran Lewitter, Ph.D. Head, Biocomputing Whitehead Institute Bioinformatics Definitions The use of computational

More information

Sequence Database Search Techniques I: Blast and PatternHunter tools

Sequence Database Search Techniques I: Blast and PatternHunter tools Sequence Database Search Techniques I: Blast and PatternHunter tools Zhang Louxin National University of Singapore Outline. Database search 2. BLAST (and filtration technique) 3. PatternHunter (empowered

More information

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot

More information

Large-Scale Genomic Surveys

Large-Scale Genomic Surveys Bioinformatics Subtopics Fold Recognition Secondary Structure Prediction Docking & Drug Design Protein Geometry Protein Flexibility Homology Modeling Sequence Alignment Structure Classification Gene Prediction

More information

GEP Annotation Report

GEP Annotation Report GEP Annotation Report Note: For each gene described in this annotation report, you should also prepare the corresponding GFF, transcript and peptide sequence files as part of your submission. Student name:

More information

Practical search strategies

Practical search strategies Computational and Comparative Genomics Similarity Searching II Practical search strategies Bill Pearson wrp@virginia.edu 1 Protein Evolution and Sequence Similarity Similarity Searching I What is Homology

More information

Week 10: Homology Modelling (II) - HHpred

Week 10: Homology Modelling (II) - HHpred Week 10: Homology Modelling (II) - HHpred Course: Tools for Structural Biology Fabian Glaser BKU - Technion 1 2 Identify and align related structures by sequence methods is not an easy task All comparative

More information

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Substitution score matrices, PAM, BLOSUM Needleman-Wunsch algorithm (Global) Smith-Waterman algorithm (Local) BLAST (local, heuristic) E-value

More information

RELATIONSHIPS BETWEEN GENES/PROTEINS HOMOLOGUES

RELATIONSHIPS BETWEEN GENES/PROTEINS HOMOLOGUES Molecular Biology-2018 1 Definitions: RELATIONSHIPS BETWEEN GENES/PROTEINS HOMOLOGUES Heterologues: Genes or proteins that possess different sequences and activities. Homologues: Genes or proteins that

More information

Sequence Alignment Techniques and Their Uses

Sequence Alignment Techniques and Their Uses Sequence Alignment Techniques and Their Uses Sarah Fiorentino Since rapid sequencing technology and whole genomes sequencing, the amount of sequence information has grown exponentially. With all of this

More information

Genomics and bioinformatics summary. Finding genes -- computer searches

Genomics and bioinformatics summary. Finding genes -- computer searches Genomics and bioinformatics summary 1. Gene finding: computer searches, cdnas, ESTs, 2. Microarrays 3. Use BLAST to find homologous sequences 4. Multiple sequence alignments (MSAs) 5. Trees quantify sequence

More information

Bioinformatics and BLAST

Bioinformatics and BLAST Bioinformatics and BLAST Overview Recap of last time Similarity discussion Algorithms: Needleman-Wunsch Smith-Waterman BLAST Implementation issues and current research Recap from Last Time Genome consists

More information

Homology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB

Homology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB Homology Modeling (Comparative Structure Modeling) Aims of Structural Genomics High-throughput 3D structure determination and analysis To determine or predict the 3D structures of all the proteins encoded

More information

Introduction to protein alignments

Introduction to protein alignments Introduction to protein alignments Comparative Analysis of Proteins Experimental evidence from one or more proteins can be used to infer function of related protein(s). Gene A Gene X Protein A compare

More information

Sequence alignment methods. Pairwise alignment. The universe of biological sequence analysis

Sequence alignment methods. Pairwise alignment. The universe of biological sequence analysis he universe of biological sequence analysis Word/pattern recognition- Identification of restriction enzyme cleavage sites Sequence alignment methods PstI he universe of biological sequence analysis - prediction

More information

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) Bioinformática Sequence Alignment Pairwise Sequence Alignment Universidade da Beira Interior (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) 1 16/3/29 & 23/3/29 27/4/29 Outline

More information

Homology Modeling. Roberto Lins EPFL - summer semester 2005

Homology Modeling. Roberto Lins EPFL - summer semester 2005 Homology Modeling Roberto Lins EPFL - summer semester 2005 Disclaimer: course material is mainly taken from: P.E. Bourne & H Weissig, Structural Bioinformatics; C.A. Orengo, D.T. Jones & J.M. Thornton,

More information

A Browser for Pig Genome Data

A Browser for Pig Genome Data A Browser for Pig Genome Data Thomas Mailund January 2, 2004 This report briefly describe the blast and alignment data available at http://www.daimi.au.dk/ mailund/pig-genome/ hits.html. The report describes

More information

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting.

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting. Genome Annotation Bioinformatics and Computational Biology Genome Annotation Frank Oliver Glöckner 1 Genome Analysis Roadmap Genome sequencing Assembly Gene prediction Protein targeting trna prediction

More information

Hands-On Nine The PAX6 Gene and Protein

Hands-On Nine The PAX6 Gene and Protein Hands-On Nine The PAX6 Gene and Protein Main Purpose of Hands-On Activity: Using bioinformatics tools to examine the sequences, homology, and disease relevance of the Pax6: a master gene of eye formation.

More information

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748 CAP 5510: Introduction to Bioinformatics Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs07.html 1/23/07 CAP5510 1 Genomic Databases Entrez Portal at National

More information

Multiple sequence alignment

Multiple sequence alignment Multiple sequence alignment Multiple sequence alignment: today s goals to define what a multiple sequence alignment is and how it is generated; to describe profile HMMs to introduce databases of multiple

More information

Mathangi Thiagarajan Rice Genome Annotation Workshop May 23rd, 2007

Mathangi Thiagarajan Rice Genome Annotation Workshop May 23rd, 2007 -2 Transcript Alignment Assembly and Automated Gene Structure Improvements Using PASA-2 Mathangi Thiagarajan mathangi@jcvi.org Rice Genome Annotation Workshop May 23rd, 2007 About PASA PASA is an open

More information

DATA ACQUISITION FROM BIO-DATABASES AND BLAST. Natapol Pornputtapong 18 January 2018

DATA ACQUISITION FROM BIO-DATABASES AND BLAST. Natapol Pornputtapong 18 January 2018 DATA ACQUISITION FROM BIO-DATABASES AND BLAST Natapol Pornputtapong 18 January 2018 DATABASE Collections of data To share multi-user interface To prevent data loss To make sure to get the right things

More information

Practical considerations of working with sequencing data

Practical considerations of working with sequencing data Practical considerations of working with sequencing data File Types Fastq ->aligner -> reference(genome) coordinates Coordinate files SAM/BAM most complete, contains all of the info in fastq and more!

More information

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University Sequence Alignment: A General Overview COMP 571 - Fall 2010 Luay Nakhleh, Rice University Life through Evolution All living organisms are related to each other through evolution This means: any pair of

More information

Orthology Part I: concepts and implications Toni Gabaldón Centre for Genomic Regulation (CRG), Barcelona

Orthology Part I: concepts and implications Toni Gabaldón Centre for Genomic Regulation (CRG), Barcelona Orthology Part I: concepts and implications Toni Gabaldón Centre for Genomic Regulation (CRG), Barcelona (tgabaldon@crg.es) http://gabaldonlab.crg.es Homology the same organ in different animals under

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary information S1 (box). Supplementary Methods description. Prokaryotic Genome Database Archaeal and bacterial genome sequences were downloaded from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/)

More information

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

Bio 1B Lecture Outline (please print and bring along) Fall, 2007 Bio 1B Lecture Outline (please print and bring along) Fall, 2007 B.D. Mishler, Dept. of Integrative Biology 2-6810, bmishler@berkeley.edu Evolution lecture #5 -- Molecular genetics and molecular evolution

More information

Bioinformatics Exercises

Bioinformatics Exercises Bioinformatics Exercises AP Biology Teachers Workshop Susan Cates, Ph.D. Evolution of Species Phylogenetic Trees show the relatedness of organisms Common Ancestor (Root of the tree) 1 Rooted vs. Unrooted

More information

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I) CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I) Contents Alignment algorithms Needleman-Wunsch (global alignment) Smith-Waterman (local alignment) Heuristic algorithms FASTA BLAST

More information

G4120: Introduction to Computational Biology

G4120: Introduction to Computational Biology ICB Fall 2003 G4120: Introduction to Computational Biology Oliver Jovanovic, Ph.D. Columbia University Department of Microbiology Copyright 2003 Oliver Jovanovic, All Rights Reserved. Bioinformatics and

More information

BIOINFORMATICS LAB AP BIOLOGY

BIOINFORMATICS LAB AP BIOLOGY BIOINFORMATICS LAB AP BIOLOGY Bioinformatics is the science of collecting and analyzing complex biological data. Bioinformatics combines computer science, statistics and biology to allow scientists to

More information

EECS730: Introduction to Bioinformatics

EECS730: Introduction to Bioinformatics EECS730: Introduction to Bioinformatics Lecture 05: Index-based alignment algorithms Slides adapted from Dr. Shaojie Zhang (University of Central Florida) Real applications of alignment Database search

More information

Biochemistry 324 Bioinformatics. Pairwise sequence alignment

Biochemistry 324 Bioinformatics. Pairwise sequence alignment Biochemistry 324 Bioinformatics Pairwise sequence alignment How do we compare genes/proteins? When we have sequenced a genome, we try and identify the function of unknown genes by finding a similar gene

More information

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT 3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT.03.239 25.09.2012 SEQUENCE ANALYSIS IS IMPORTANT FOR... Prediction of function Gene finding the process of identifying the regions of genomic DNA that encode

More information

Algorithms in Bioinformatics

Algorithms in Bioinformatics Algorithms in Bioinformatics Sami Khuri Department of omputer Science San José State University San José, alifornia, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Pairwise Sequence Alignment Homology

More information

7.36/7.91 recitation CB Lecture #4

7.36/7.91 recitation CB Lecture #4 7.36/7.91 recitation 2-19-2014 CB Lecture #4 1 Announcements / Reminders Homework: - PS#1 due Feb. 20th at noon. - Late policy: ½ credit if received within 24 hrs of due date, otherwise no credit - Answer

More information

CONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018

CONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018 CONCEPT OF SEQUENCE COMPARISON Natapol Pornputtapong 18 January 2018 SEQUENCE ANALYSIS - A ROSETTA STONE OF LIFE Sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of

More information

DNA and protein databases. EMBL/GenBank/DDBJ database of nucleic acids

DNA and protein databases. EMBL/GenBank/DDBJ database of nucleic acids Database searches 1 DNA and protein databases EMBL/GenBank/DDBJ database of nucleic acids 2 DNA and protein databases EMBL/GenBank/DDBJ database of nucleic acids (cntd) 3 DNA and protein databases SWISS-PROT

More information

Network Alignment 858L

Network Alignment 858L Network Alignment 858L Terms & Questions A homologous h Interolog = B h Species 1 Species 2 Are there conserved pathways? What is the minimum set of pathways required for life? Can we compare networks

More information

CS612 - Algorithms in Bioinformatics

CS612 - Algorithms in Bioinformatics Fall 2017 Databases and Protein Structure Representation October 2, 2017 Molecular Biology as Information Science > 12, 000 genomes sequenced, mostly bacterial (2013) > 5x10 6 unique sequences available

More information

Tutorial 4 Substitution matrices and PSI-BLAST

Tutorial 4 Substitution matrices and PSI-BLAST Tutorial 4 Substitution matrices and PSI-BLAST 1 Agenda Substitution Matrices PAM - Point Accepted Mutations BLOSUM - Blocks Substitution Matrix PSI-BLAST Cool story of the day: Why should we care about

More information

Introduction to Bioinformatics

Introduction to Bioinformatics Introduction to Bioinformatics Jianlin Cheng, PhD Department of Computer Science Informatics Institute 2011 Topics Introduction Biological Sequence Alignment and Database Search Analysis of gene expression

More information

Small RNA in rice genome

Small RNA in rice genome Vol. 45 No. 5 SCIENCE IN CHINA (Series C) October 2002 Small RNA in rice genome WANG Kai ( 1, ZHU Xiaopeng ( 2, ZHONG Lan ( 1,3 & CHEN Runsheng ( 1,2 1. Beijing Genomics Institute/Center of Genomics and

More information

Orthology Part I concepts and implications Toni Gabaldón Centre for Genomic Regulation (CRG), Barcelona

Orthology Part I concepts and implications Toni Gabaldón Centre for Genomic Regulation (CRG), Barcelona Orthology Part I concepts and implications Toni Gabaldón Centre for Genomic Regulation (CRG), Barcelona Toni Gabaldón Contact: tgabaldon@crg.es Group website: http://gabaldonlab.crg.es Science blog: http://treevolution.blogspot.com

More information

Example of Function Prediction

Example of Function Prediction Find similar genes Example of Function Prediction Suggesting functions of newly identified genes It was known that mutations of NF1 are associated with inherited disease neurofibromatosis 1; but little

More information

Annotation of Drosophila grimashawi Contig12

Annotation of Drosophila grimashawi Contig12 Annotation of Drosophila grimashawi Contig12 Marshall Strother April 27, 2009 Contents 1 Overview 3 2 Genes 3 2.1 Genscan Feature 12.4............................................. 3 2.1.1 Genome Browser:

More information

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison 10-810: Advanced Algorithms and Models for Computational Biology microrna and Whole Genome Comparison Central Dogma: 90s Transcription factors DNA transcription mrna translation Proteins Central Dogma:

More information

GENE ONTOLOGY (GO) Wilver Martínez Martínez Giovanny Silva Rincón

GENE ONTOLOGY (GO) Wilver Martínez Martínez Giovanny Silva Rincón GENE ONTOLOGY (GO) Wilver Martínez Martínez Giovanny Silva Rincón What is GO? The Gene Ontology (GO) project is a collaborative effort to address the need for consistent descriptions of gene products in

More information

Mitochondrial Genome Annotation

Mitochondrial Genome Annotation Protein Genes 1,2 1 Institute of Bioinformatics University of Leipzig 2 Department of Bioinformatics Lebanese University TBI Bled 2015 Outline Introduction Mitochondrial DNA Problem Tools Training Annotation

More information

Similarity searching summary (2)

Similarity searching summary (2) Similarity searching / sequence alignment summary Biol4230 Thurs, February 22, 2016 Bill Pearson wrp@virginia.edu 4-2818 Pinn 6-057 What have we covered? Homology excess similiarity but no excess similarity

More information

Scoring Matrices. Shifra Ben-Dor Irit Orr

Scoring Matrices. Shifra Ben-Dor Irit Orr Scoring Matrices Shifra Ben-Dor Irit Orr Scoring matrices Sequence alignment and database searching programs compare sequences to each other as a series of characters. All algorithms (programs) for comparison

More information

Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p

Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p.110-114 Arrangement of information in DNA----- requirements for RNA Common arrangement of protein-coding genes in prokaryotes=

More information

Computational Biology: Basics & Interesting Problems

Computational Biology: Basics & Interesting Problems Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information

More information

BMI/CS 776 Lecture #20 Alignment of whole genomes. Colin Dewey (with slides adapted from those by Mark Craven)

BMI/CS 776 Lecture #20 Alignment of whole genomes. Colin Dewey (with slides adapted from those by Mark Craven) BMI/CS 776 Lecture #20 Alignment of whole genomes Colin Dewey (with slides adapted from those by Mark Craven) 2007.03.29 1 Multiple whole genome alignment Input set of whole genome sequences genomes diverged

More information

Alignment principles and homology searching using (PSI-)BLAST. Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU)

Alignment principles and homology searching using (PSI-)BLAST. Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) Alignment principles and homology searching using (PSI-)BLAST Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) http://ibivu.cs.vu.nl Bioinformatics Nothing in Biology makes sense except in

More information

Gene function annotation

Gene function annotation Gene function annotation Paul D. Thomas, Ph.D. University of Southern California What is function annotation? The formal answer to the question: what does this gene do? The association between: a description

More information

Exercise 5. Sequence Profiles & BLAST

Exercise 5. Sequence Profiles & BLAST Exercise 5 Sequence Profiles & BLAST 1 Substitution Matrix (BLOSUM62) Likelihood to substitute one amino acid with another Figure taken from https://en.wikipedia.org/wiki/blosum 2 Substitution Matrix (BLOSUM62)

More information

Ensembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are:

Ensembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are: Comparative genomics and proteomics Species available Ensembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are: Vertebrates: human, chimpanzee, mouse, rat,

More information

Genome Annotation. Qi Sun Bioinformatics Facility Cornell University

Genome Annotation. Qi Sun Bioinformatics Facility Cornell University Genome Annotation Qi Sun Bioinformatics Facility Cornell University Some basic bioinformatics tools BLAST PSI-BLAST - Position-Specific Scoring Matrix HMM - Hidden Markov Model NCBI BLAST How does BLAST

More information

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson Grundlagen der Bioinformatik, SS 10, D. Huson, April 12, 2010 1 1 Introduction Grundlagen der Bioinformatik Summer semester 2010 Lecturer: Prof. Daniel Huson Office hours: Thursdays 17-18h (Sand 14, C310a)

More information

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Bioinformatics II Probability and Statistics Universität Zürich and ETH Zürich Spring Semester 2009 Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Dr Fraser Daly adapted from

More information

Introduction to Bioinformatics

Introduction to Bioinformatics Introduction to Bioinformatics Lecture : p he biological problem p lobal alignment p Local alignment p Multiple alignment 6 Background: comparative genomics p Basic question in biology: what properties

More information

Chapter 5. Proteomics and the analysis of protein sequence Ⅱ

Chapter 5. Proteomics and the analysis of protein sequence Ⅱ Proteomics Chapter 5. Proteomics and the analysis of protein sequence Ⅱ 1 Pairwise similarity searching (1) Figure 5.5: manual alignment One of the amino acids in the top sequence has no equivalent and

More information

RGP finder: prediction of Genomic Islands

RGP finder: prediction of Genomic Islands Training courses on MicroScope platform RGP finder: prediction of Genomic Islands Dynamics of bacterial genomes Gene gain Horizontal gene transfer Gene loss Deletion of one or several genes Duplication

More information

Domain-based computational approaches to understand the molecular basis of diseases

Domain-based computational approaches to understand the molecular basis of diseases Domain-based computational approaches to understand the molecular basis of diseases Dr. Maricel G. Kann Assistant Professor Dept of Biological Sciences UMBC http://bioinf.umbc.edu Research at Kann s Lab.

More information

BLAST: Target frequencies and information content Dannie Durand

BLAST: Target frequencies and information content Dannie Durand Computational Genomics and Molecular Biology, Fall 2016 1 BLAST: Target frequencies and information content Dannie Durand BLAST has two components: a fast heuristic for searching for similar sequences

More information

Computational Biology

Computational Biology Computational Biology Lecture 6 31 October 2004 1 Overview Scoring matrices (Thanks to Shannon McWeeney) BLAST algorithm Start sequence alignment 2 1 What is a homologous sequence? A homologous sequence,

More information

HMMs and biological sequence analysis

HMMs and biological sequence analysis HMMs and biological sequence analysis Hidden Markov Model A Markov chain is a sequence of random variables X 1, X 2, X 3,... That has the property that the value of the current state depends only on the

More information

Heuristic Methods. Heuristic methods for alignment Sequence databases Multiple alignment Gene and protein prediction

Heuristic Methods. Heuristic methods for alignment Sequence databases Multiple alignment Gene and protein prediction Heuristic methods for alignment Sequence databases Multiple alignment Gene and protein prediction Armstrong, 2010 Heuristic Methods! FASTA! BLAST! Gapped BLAST! PSI-BLAST Armstrong, 2010 1 Assumptions

More information

BSC 4934: QʼBIC Capstone Workshop" Giri Narasimhan. ECS 254A; Phone: x3748

BSC 4934: QʼBIC Capstone Workshop Giri Narasimhan. ECS 254A; Phone: x3748 BSC 4934: QʼBIC Capstone Workshop" Giri Narasimhan ECS 254A; Phone: x3748 giri@cs.fiu.edu http://www.cs.fiu.edu/~giri/teach/bsc4934_su10.html July 2010 7/12/10 Q'BIC Bioinformatics 1 Overview of Course"

More information

Reassessing Domain Architecture Evolution of Metazoan Proteins: Major Impact of Gene Prediction Errors

Reassessing Domain Architecture Evolution of Metazoan Proteins: Major Impact of Gene Prediction Errors Genes 2011, 2, 449-501; doi:10.3390/genes2030449 Article OPEN ACCESS genes ISSN 2073-4425 www.mdpi.com/journal/genes Reassessing Domain Architecture Evolution of Metazoan Proteins: Major Impact of Gene

More information

SoyBase, the USDA-ARS Soybean Genetics and Genomics Database

SoyBase, the USDA-ARS Soybean Genetics and Genomics Database SoyBase, the USDA-ARS Soybean Genetics and Genomics Database David Grant Victoria Carollo Blake Steven B. Cannon Kevin Feeley Rex T. Nelson Nathan Weeks SoyBase Site Map and Navigation Video Tutorials:

More information

MegAlign Pro Pairwise Alignment Tutorials

MegAlign Pro Pairwise Alignment Tutorials MegAlign Pro Pairwise Alignment Tutorials All demo data for the following tutorials can be found in the MegAlignProAlignments.zip archive here. Tutorial 1: Multiple versus pairwise alignments 1. Extract

More information

Christian Sigrist. November 14 Protein Bioinformatics: Sequence-Structure-Function 2018 Basel

Christian Sigrist. November 14 Protein Bioinformatics: Sequence-Structure-Function 2018 Basel Christian Sigrist General Definition on Conserved Regions Conserved regions in proteins can be classified into 5 different groups: Domains: specific combination of secondary structures organized into a

More information

Bioinformatics. Dept. of Computational Biology & Bioinformatics

Bioinformatics. Dept. of Computational Biology & Bioinformatics Bioinformatics Dept. of Computational Biology & Bioinformatics 3 Bioinformatics - play with sequences & structures Dept. of Computational Biology & Bioinformatics 4 ORGANIZATION OF LIFE ROLE OF BIOINFORMATICS

More information

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between

More information

Algorithms in Computational Biology (236522) spring 2008 Lecture #1

Algorithms in Computational Biology (236522) spring 2008 Lecture #1 Algorithms in Computational Biology (236522) spring 2008 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours: 15:30-16:30/by appointment TA: Ilan Gronau, Taub 700, tel 4894 Office hours:??

More information

Comparative Bioinformatics Midterm II Fall 2004

Comparative Bioinformatics Midterm II Fall 2004 Comparative Bioinformatics Midterm II Fall 2004 Objective Answer, part I: For each of the following, select the single best answer or completion of the phrase. (3 points each) 1. Deinococcus radiodurans

More information

Bioinformatics. Part 8. Sequence Analysis An introduction. Mahdi Vasighi

Bioinformatics. Part 8. Sequence Analysis An introduction. Mahdi Vasighi Bioinformatics Sequence Analysis An introduction Part 8 Mahdi Vasighi Sequence analysis Some of the earliest problems in genomics concerned how to measure similarity of DNA and protein sequences, either

More information

Bioinformatics. Can Keşmir Theoretical Biology/Bioinformatics, UU

Bioinformatics. Can Keşmir Theoretical Biology/Bioinformatics, UU Bioinformatics Can Keşmir Theoretical Biology/Bioinformatics, UU 2013 i c Utrecht University, 2013 Ebook publically available at: http://theory.bio.uu.nl/bpa/bioinf2013.pdf ii Contents 1 Introduction to

More information

Sequence analysis and comparison

Sequence analysis and comparison The aim with sequence identification: Sequence analysis and comparison Marjolein Thunnissen Lund September 2012 Is there any known protein sequence that is homologous to mine? Are there any other species

More information

BLAST: Basic Local Alignment Search Tool

BLAST: Basic Local Alignment Search Tool .. CSC 448 Bioinformatics Algorithms Alexander Dekhtyar.. (Rapid) Local Sequence Alignment BLAST BLAST: Basic Local Alignment Search Tool BLAST is a family of rapid approximate local alignment algorithms[2].

More information

Pairwise & Multiple sequence alignments

Pairwise & Multiple sequence alignments Pairwise & Multiple sequence alignments Urmila Kulkarni-Kale Bioinformatics Centre 411 007 urmila@bioinfo.ernet.in Basis for Sequence comparison Theory of evolution: gene sequences have evolved/derived

More information

Whole Genome Alignments and Synteny Maps

Whole Genome Alignments and Synteny Maps Whole Genome Alignments and Synteny Maps IINTRODUCTION It was not until closely related organism genomes have been sequenced that people start to think about aligning genomes and chromosomes instead of

More information

Motifs, Profiles and Domains. Michael Tress Protein Design Group Centro Nacional de Biotecnología, CSIC

Motifs, Profiles and Domains. Michael Tress Protein Design Group Centro Nacional de Biotecnología, CSIC Motifs, Profiles and Domains Michael Tress Protein Design Group Centro Nacional de Biotecnología, CSIC Comparing Two Proteins Sequence Alignment Determining the pattern of evolution and identifying conserved

More information

Heuristic Alignment and Searching

Heuristic Alignment and Searching 3/28/2012 Types of alignments Global Alignment Each letter of each sequence is aligned to a letter or a gap (e.g., Needleman-Wunsch). Local Alignment An optimal pair of subsequences is taken from the two

More information

Comparing whole genomes

Comparing whole genomes BioNumerics Tutorial: Comparing whole genomes 1 Aim The Chromosome Comparison window in BioNumerics has been designed for large-scale comparison of sequences of unlimited length. In this tutorial you will

More information

Quantifying sequence similarity

Quantifying sequence similarity Quantifying sequence similarity Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, February 16 th 2016 After this lecture, you can define homology, similarity, and identity

More information

Supplementary text for the section Interactions conserved across species: can one select the conserved interactions?

Supplementary text for the section Interactions conserved across species: can one select the conserved interactions? 1 Supporting Information: What Evidence is There for the Homology of Protein-Protein Interactions? Anna C. F. Lewis, Nick S. Jones, Mason A. Porter, Charlotte M. Deane Supplementary text for the section

More information