Advanced practical course in genome bioinformatics DAY 6: Functional annotation. Petri Törönen Earlier version Patrik Koskinen


1 Advanced practical course in genome bioinformatics DAY 6: Functional annotation Petri Törönen Earlier version Patrik Koskinen

2 Genome project roadmap After experimental design and preparations, a genome project can be roughly split into the following steps: 1. Sequencing 2. (De novo) assembly, scaffolding 3. RNA-sequencing and mapping 4. Gene prediction 5. Manual & functional annotation (You are here!) 6. Submission and publication of the genome in a biodatabase 7. Further downstream analysis

3 Three sections Background Methods Demonstration of tools

4 Outline of Background Goal How function can be defined Description Gene Ontology What can be used to predict function What I am omitting here Why these two matter

5 GOAL You have an unknown protein sequence It should be functionally annotated In wet lab (precise, slow) In silico (less precise, faster) With thousands of sequences, wet lab is not an option Collective manual in-silico annotation of sequences (hundreds of scientists) Combined use of a few automated in-silico methods

6 GOAL Protein sequence: MAVQISKKRKFVADGIFKAELNEFLTRELAEDGYSGVEVRVTPTRTEIIILATRTQNVLG EKGRRIRELTAVVQKRFGFPEGSVELYAEKVATRGLCAIAQAESLRYKLLGGLAVRRACY GVLRFIMESGAKGCEVVVSGKLRGQRAKSMKFVDGLMIHSGDPVNYYVDTAVRHVLLRQG VLGIKVKIMLPWDPTGKIGPKKPLPDHVSIVEPKDEILPTTPISEQKGGKPEPPAMPQPV PTA Is this informative?

7 GOAL Is this better? (Various annotation types) 40S ribosomal protein S3 Involved in translation as a component of the 40S small ribosomal subunit (PubMed: ). Has endonuclease activity and plays a role in repair of damaged DNA (PubMed: ) GO: structural constituent of ribosome GO: damaged DNA binding

8 Examples (Good, Bad, Ugly) tartrate-resistant acid phosphatase type 5 precursor [Homo sapiens] >NP_ tartrate-resistant acid phosphatase type 5 precursor [Homo sapiens] hypothetical protein PANDA_021498, partial [Ailuropoda melanoleuca] GenBank: EFB >EFB hypothetical protein PANDA_021498, partial [Ailuropoda melanoleuca]

9 How function is defined How can we describe a function for a gene?

10 How function is defined Functional description as human readable text Linking gene to Key Words (Uniprot) Linking gene to Gene Ontology classes Linking gene to Enzyme categories Linking gene to Signalling Pathways or Biochemical Pathways (KEGG) Linking Domain to functional activity Focus on description and on Gene Ontology

11 Human readable descriptions tartrate-resistant acid phosphatase type 5 precursor [Homo sapiens] Summary: This gene encodes an iron containing glycoprotein which catalyzes the conversion of orthophosphoric monoester to alcohol and orthophosphate. It is the most basic of the acid phosphatases and is the only form not inhibited by L(+)- tartrate. [provided by RefSeq, Aug 2008].

12 Gene Ontology (GO) GO is currently a popular standard in gene annotation GO consists of classes that represent gene function Genes in the same process are grouped into the same class Easy summary for genes with similar function Easier to predict than text descriptions

13 Gene Ontology (GO) 3 sub-parts: Biological Process, Molecular Function, Cellular Component Molecular Function => chemical activity Biological Process => biology, cellular process Cellular Component => location of the gene product in the cell Hierarchical structure Categories with very precise function Categories with less precise function Categories with very broad function
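The hierarchical structure means that a gene annotated to a precise class also belongs, implicitly, to every broader class above it. A minimal sketch of that idea follows; the `PARENTS` table is a simplified toy slice written for illustration, not the real ontology, and `with_ancestors` is a hypothetical helper name.

```python
# Toy, simplified slice of a GO-like hierarchy (child -> list of parents).
# The real ontology is a much larger directed acyclic graph.
PARENTS = {
    "acid phosphatase activity": ["phosphatase activity"],
    "phosphatase activity": ["hydrolase activity"],
    "hydrolase activity": ["catalytic activity"],
    "catalytic activity": ["molecular_function"],
    "molecular_function": [],
}


def with_ancestors(term, parents=PARENTS):
    """Return the term together with every broader class above it."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        # Walk upwards; a class may have several parents in a DAG.
        stack.extend(parents.get(current, []))
    return seen


if __name__ == "__main__":
    for name in sorted(with_ancestors("acid phosphatase activity")):
        print(name)
```

Going up the hierarchy trades precision for safety: a prediction of a broad class is more likely to be correct but says less about the gene.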

14 Gene Ontology tartrate-resistant acid phosphatase type 5 precursor [Homo sapiens]

15 Advantages of GO Cross species comparison Already used by several databases Comprehensive GO covers all biological and chemical processes Many terms per gene product Simplify querying Uses restricted vocabulary developed by curators and annotators Use of evidence code How reliable is the given information
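Evidence codes make it possible to separate annotations backed by experiments from those inferred automatically. The sketch below assumes annotations arrive as simple `(gene, go_term, evidence_code)` tuples; the gene names are invented, and `split_by_evidence` is a hypothetical helper. The codes in `EXPERIMENTAL_CODES` are the standard GO experimental evidence codes, while IEA marks electronic annotation.

```python
# Standard GO evidence codes for experimentally supported annotations.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def split_by_evidence(annotations):
    """Split (gene, go_term, evidence_code) tuples into
    experimentally supported annotations and all the rest."""
    experimental, other = [], []
    for gene, term, code in annotations:
        if code in EXPERIMENTAL_CODES:
            experimental.append((gene, term, code))
        else:
            other.append((gene, term, code))
    return experimental, other


if __name__ == "__main__":
    # Invented example records, for illustration only.
    records = [
        ("geneA", "damaged DNA binding", "IDA"),
        ("geneB", "structural constituent of ribosome", "IEA"),
    ]
    trusted, inferred = split_by_evidence(records)
    print(len(trusted), "experimental;", len(inferred), "other")
```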

16 Disadvantages of GO Class gives only an approximate description of the gene Misannotations / Unreliable annotations This problem occurs Pathways are sometimes better representations KEGG database

17 Methods outline Stupid method What can be used to predict function Current in-silico methods Method Comparisons

18 Stupid method Run BLAST search Take the first sequence hit

19 Stupid Functional annotation Traditional way to go: Annotation of the nearest neighbour is copied to the query sequence Threshold in search space (e.g. BLAST E-value 1e-5)
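The nearest-neighbour approach described above can be sketched in a few lines. This assumes the BLAST hits have already been parsed into `(subject_id, description, evalue)` tuples; the function name `naive_annotation` and the example hits are invented for illustration.

```python
def naive_annotation(hits, max_evalue=1e-5):
    """Copy the description of the best hit below the E-value threshold.

    hits: list of (subject_id, description, evalue) tuples, in any order.
    Returns None when no hit passes the threshold.
    """
    passing = [hit for hit in hits if hit[2] <= max_evalue]
    if not passing:
        return None
    best = min(passing, key=lambda hit: hit[2])  # smallest E-value wins
    return best[1]


if __name__ == "__main__":
    # Invented hits: the best one carries an uninformative description.
    example_hits = [
        ("hit1", "Putative uncharacterized protein", 1e-28),
        ("hit2", "Plasmolipin", 1e-20),
    ]
    print(naive_annotation(example_hits))
```

With hits like these the method returns the uninformative description of the top hit and ignores the more useful one further down the list.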

20 Example BLAST results from Melitaea cinxia (the Glanville fritillary butterfly)
Sequences producing significant alignments: Score (bits) E-Value
tr Q7QF43 Q7QF43_ANOGA AGAP PA OS=Anopheles gambiae GN=AGA e-35
tr B4JK90 B4JK90_DROGR GH12613 OS=Drosophila grimshawi GN=GH e-31
tr Q29H35 Q29H35_DROPS GA13573 OS=Drosophila pseudoobscura pseud e-30
tr B4N1J8 B4N1J8_DROWI GK16321 OS=Drosophila willistoni GN=GK e-29
tr B4PXZ6 B4PXZ6_DROYA GE17857 OS=Drosophila yakuba GN=GE17857 P e-29
tr Q9VZ71 Q9VZ71_DROME CG15211, isoform A OS=Drosophila melanoga e-29
tr B4R7M7 B4R7M7_DROSI GD16987 OS=Drosophila simulans GN=GD e-29
tr B4IDT9 B4IDT9_DROSE GM11433 OS=Drosophila sechellia GN=GM e-29
tr B3NVE7 B3NVE7_DROER GG18369 OS=Drosophila erecta GN=GG18369 P e-29
tr B3MQD3 B3MQD3_DROAN GF20425 OS=Drosophila ananassae GN=GF e-29
tr B4L7Y1 B4L7Y1_DROMO GI11027 OS=Drosophila mojavensis GN=GI e-28
tr Q16VM8 Q16VM8_AEDAE Putative uncharacterized protein (Fragmen e-28
tr B0WJ18 B0WJ18_CULQU Putative uncharacterized protein OS=Culex e-27
tr C1C2H5 C1C2H5_9MAXI Plasmolipin OS=Caligus clemensi GN=PLLP P e-20
tr Q1DGM2 Q1DGM2_AEDAE Putative uncharacterized protein (Fragmen e-19
tr C3ZW39 C3ZW39_BRAFL Putative uncharacterized protein OS=Branc e-11
tr A3KQ86 A3KQ86_DANRE Novel protein similar to vertebrate trans e-10
tr C3YLB7 C3YLB7_BRAFL Putative uncharacterized protein OS=Branc e-10
sp Q8CJ61 CKLF4_MOUSE CKLF-like MARVEL transmembrane domain-cont e-09
tr A4IFB9 A4IFB9_BOVIN CMTM4 protein OS=Bos taurus GN=CMTM4 PE= e-09
sp P47987 PLLP_RAT Plasmolipin OS=Rattus norvegicus GN=Pllp PE= e-08
sp Q9DCU2 PLLP_MOUSE Plasmolipin OS=Mus musculus GN=Pllp PE=2 SV=1 66 1e-08
tr Q4SI17 Q4SI17_TETNG Chromosome 5 SCAF14581, whole genome shot e-08
tr B1H2E6 B1H2E6_XENTR LOC protein OS=Xenopus tropicali e-08
tr A7YYE5 A7YYE5_DANRE CKLF-like MARVEL transmembrane domain con e-08
tr C4Q9A0 C4Q9A0_SCHMA Marvel-containing potential lipid raft-as e-08
tr Q6DGM6 Q6DGM6_DANRE CKLF-like MARVEL transmembrane domain con e-08
tr C3ZW41 C3ZW41_BRAFL Putative uncharacterized protein (Fragmen e-08
tr C1BJ60 C1BJ60_OSMMO Plasmolipin OS=Osmerus mordax GN=PLLP PE= e-07
sp Q8IZR5 CKLF4_HUMAN CKLF-like MARVEL transmembrane domain-cont e-07
Is this an informative annotation to adopt?

21 Example BLAST results from Melitaea cinxia (the Glanville fritillary butterfly) Same hit list as on slide 20. Groups among the hits: Plasmolipins; Chemokine-like factor superfamily members

22 Why is the traditional method stupid? Blind copying of the nearest neighbour's annotation creates errors in database annotations

23 Errors in public databases Schnoes AM, Brown SD, Dodevski I, Babbitt PC, Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies. PLoS Computational Biology 5(12)

24 Errors in databases So not all annotations in the databases are correct What can be done?

25 Rational Functional annotation Collect some features for analyzed sequence Compare these features to features in known sequences Estimate the function based on the similarity with many sequences

26 What can be used to predict function Where can computer programs and researchers get information about a sequence?

27 Function Prediction: What can we use to predict function This lecture discusses sequence-based features! Sequence homology (BLAST result list) Phylogenetic tree of sequences Protein domains (PFAM domains) Short sequence patterns (motifs) Sequence features

28 Function Prediction: What I am omitting Literature search Gene expression (in tissues, developmental stages...) Protein-protein interactions Genetic interactions These are also important sources, but only if they are available.

29 Why do the two previous groups matter? Assume you want to predict genes that activate neuronal growth in the mammalian fetus Assume you want to predict genes that bind oxygen

30 Sequence Homology Methods Do a BLAST search with a query sequence Collect GO classes for the genes in the BLAST result list Give a weight to each BLAST hit, often -log(E-value) Combine the scores from the genes that belong to the same GO class Report the top / significant GO classes
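The steps above can be sketched as follows. This is a minimal illustration, not the scoring used by any particular tool: it assumes hits are `(subject_id, evalue)` pairs, that GO classes of known sequences are available in a dictionary, and that scores are combined by simple summation. The names `score_go_classes` and `go_annotations` are hypothetical.

```python
import math
from collections import defaultdict


def score_go_classes(hits, go_annotations, max_evalue=1e-5):
    """Sum -log10(E-value) weights per GO class over a BLAST hit list.

    hits: list of (subject_id, evalue) pairs.
    go_annotations: dict mapping subject_id -> list of GO classes.
    Returns (go_class, score) pairs, best first.
    """
    scores = defaultdict(float)
    for subject_id, evalue in hits:
        if evalue > max_evalue:
            continue  # ignore weak hits
        # Guard against E-values reported as 0.0 for very strong hits.
        weight = -math.log10(max(evalue, 1e-300))
        for go_class in go_annotations.get(subject_id, ()):
            scores[go_class] += weight
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Invented hits and annotations, for illustration only.
    hits = [("a", 1e-20), ("b", 1e-10), ("c", 1e-8)]
    go_annotations = {"a": ["class X"], "b": ["class Y"], "c": ["class Y"]}
    for go_class, score in score_go_classes(hits, go_annotations):
        print(go_class, round(score, 1))
```

Unlike the nearest-neighbour approach, unannotated hits simply contribute nothing here, so informative hits further down the list can still shape the prediction.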

31 BLAST example re-visited Same hit list as on slide 20. Groups among the hits: Plasmolipins; Chemokine-like factor superfamily members

32 Sequence Homology Methods Simple method Can fail to detect some similarities Programs: BLAST2GO, GOTCHA, ARGOT2, PFP, PANNZER

33 Phylogenetic tree methods Create the pair-wise distances for the set of genes Do a hierarchical clustering of the genes Map the known GO functions to the cluster tree Look for unknown genes in a cluster with many genes from the same GO class Report the top / significant GO classes

34 Phylogenetic tree methods These should outperform sequence homology methods Require a set of related genes Often much heavier calculations Programs: SIFTER
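A much simplified sketch of the clustering idea is given below. It is not the algorithm used by SIFTER or any other tool: it assumes pairwise distances are already computed, replaces the full tree with single-linkage clusters obtained at one distance cutoff, and predicts the GO classes shared by at least half of the annotated genes in the query's cluster. All function and parameter names are hypothetical.

```python
from collections import Counter


def single_linkage_clusters(genes, distances, cutoff):
    """Cut a single-linkage clustering at `cutoff`.

    distances: dict mapping frozenset({gene1, gene2}) -> distance.
    Genes connected by a chain of distances <= cutoff share a cluster.
    """
    parent = {gene: gene for gene in genes}

    def find(gene):
        # Union-find lookup with path halving.
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for pair, distance in distances.items():
        if distance <= cutoff:
            first, second = tuple(pair)
            parent[find(first)] = find(second)

    clusters = {}
    for gene in genes:
        clusters.setdefault(find(gene), set()).add(gene)
    return list(clusters.values())


def predict_from_cluster(query, clusters, known_go, min_fraction=0.5):
    """Report GO classes carried by at least `min_fraction` of the
    annotated genes that share a cluster with the query."""
    for cluster in clusters:
        if query not in cluster:
            continue
        annotated = [gene for gene in cluster if gene in known_go]
        if not annotated:
            return []
        counts = Counter(t for gene in annotated for t in set(known_go[gene]))
        return sorted(t for t, n in counts.items()
                      if n / len(annotated) >= min_fraction)
    return []
```

The cutoff and the voting fraction are arbitrary choices here; methods built on a real phylogeny use the tree topology and a statistical model instead.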

35 Prediction with Protein domains Look up which protein domains the query protein contains (PFAM) Map the functions that are linked to those domains to your query sequence (PFAM2GO) Programs: InterProScan + PFAM2GO Drawbacks: The mapping is the same in plants, mammals and bacteria Many domains to specific function

36 Prediction with Protein domains Benefits: Can create an annotation from separate domains Similar sequences do not have to be in the database Programs: InterProScan Drawbacks: The mapping is the same in plants, mammals and bacteria Many domains to specific function
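The domain-based approach amounts to a table lookup. The sketch below assumes the domains of the query have already been detected and that a domain-to-GO table is available as a dictionary; the domain labels and GO class names are placeholders invented for illustration, not entries from the real PFAM2GO file.

```python
def go_from_domains(domains, domain2go):
    """Collect the GO classes linked to each detected domain.

    domains: iterable of domain identifiers found in the query protein.
    domain2go: dict mapping domain identifier -> list of GO classes.
    """
    predicted = set()
    for domain in domains:
        # Domains without a mapping contribute nothing.
        predicted.update(domain2go.get(domain, ()))
    return sorted(predicted)


if __name__ == "__main__":
    # Placeholder mapping, not taken from the real PFAM2GO table.
    toy_domain2go = {
        "DOMAIN_A": ["hydrolase activity"],
        "DOMAIN_B": ["DNA binding", "nucleus"],
    }
    print(go_from_domains(["DOMAIN_A", "DOMAIN_B", "DOMAIN_C"], toy_domain2go))
```

Because the lookup ignores the source organism, the same classes come back for a plant, a mammal or a bacterium, which is the drawback noted on the slides.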

37 Our contribution: PANNZER Use BLAST result list Add Taxonomic information Score GO classes using a score that takes the frequency of GO class in seq. DB into account Method is used to predict: GO Classes Description line

38 Our contribution: PANNZER Benefits: Takes the species taxonomy into account Improved use of statistics Drawbacks: Uses (currently) only sequence similarity
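To illustrate why the background frequency of a GO class matters, the sketch below scores each class by the log-ratio of its frequency among the BLAST hits to its frequency in the whole sequence database. This is only an illustration of the general idea and is not PANNZER's actual scoring function; the function name, the pseudocount and the example frequencies are all assumptions.

```python
import math
from collections import Counter


def frequency_aware_scores(hit_go_sets, background_freq, pseudocount=1e-6):
    """Score GO classes by log2(observed frequency / background frequency).

    hit_go_sets: list with one collection of GO classes per BLAST hit.
    background_freq: dict mapping GO class -> fraction of database
        sequences annotated with that class.
    """
    n_hits = len(hit_go_sets)
    counts = Counter(t for terms in hit_go_sets for t in set(terms))
    scores = {}
    for term, count in counts.items():
        observed = count / n_hits
        # Unknown classes get a small pseudocount instead of zero.
        expected = background_freq.get(term, pseudocount)
        scores[term] = math.log2(observed / expected)
    return scores


if __name__ == "__main__":
    # Invented data: a broad class is common everywhere, a specific one is rare.
    hits = [{"broad", "specific"}, {"broad", "specific"}, {"broad"}, {"broad"}]
    background = {"broad": 0.5, "specific": 0.01}
    print(frequency_aware_scores(hits, background))
```

In this toy case the specific class scores higher even though it appears in fewer hits, because it is rare in the database as a whole.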

39 Method comparisons Many annotation methods available All methods claim to be best available What methods are really the best?

40 Critical Assessment of Function Annotations (CAFA) Select a set of unknown genes Ask research groups to predict GO terms After the deadline, start collecting new annotations for the genes Next, evaluate the methods

41 Critical Assessment of Function Annotations (CAFA) Radivojac et al. A large-scale evaluation of computational protein function prediction. Nature Methods Jan 27. doi: /nmeth.2340.
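Evaluations of this kind compare the predicted GO terms with the annotations that accumulate after the deadline. One common summary is the maximum F-measure over score thresholds; the sketch below is a simplified version that assumes prediction scores lie between 0 and 1 and ignores the propagation of terms up the GO hierarchy. The name `f_max` and the data layout are assumptions made for illustration.

```python
def f_max(predictions, truth, thresholds=None):
    """Maximum F-measure over score thresholds.

    predictions: {protein: {go_term: score between 0 and 1}}
    truth: {protein: set of true go_terms}
    Precision is averaged over proteins with at least one prediction
    at the threshold; recall is averaged over all proteins in `truth`.
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]
    best = 0.0
    for threshold in thresholds:
        precisions, recalls = [], []
        for protein, true_terms in truth.items():
            scored = predictions.get(protein, {})
            predicted = {t for t, s in scored.items() if s >= threshold}
            correct = len(predicted & true_terms)
            if predicted:
                precisions.append(correct / len(predicted))
            recalls.append(correct / len(true_terms) if true_terms else 0.0)
        if not precisions:
            continue  # nothing predicted at this threshold
        precision = sum(precisions) / len(precisions)
        recall = sum(recalls) / len(recalls)
        if precision + recall > 0:
            f_measure = 2 * precision * recall / (precision + recall)
            best = max(best, f_measure)
    return best
```

A method that predicts only a few confident terms gains precision but loses recall, so sweeping the threshold lets methods with differently calibrated scores be compared on equal terms.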

42 CAFA 1 Most successful methods JonesUCL Argot 2 Pannzer

43 Method demonstrations InterProScan Argot Pannzer

44 Demo sequences 5 sequences 3 from eukaryotes, 2 from prokaryotes All are annotated as unknown in the DB They can still be annotated based on the sequence Sequences are here: /Exercise_material_day7.txt

45 InterProScan InterProScan is a metaserver that looks for many sequence features Many of these features can be used to annotate sequences You have to give one sequence at a time Results include a detailed description of domain functions

46 InterProScan

47 Results

48 Argot2 Tool that processes BLAST output One of the best methods in CAFA 1

49 Argot2 Results: uences.php?js=13038

50 PANNZER2 Processes results similar to BLAST Predicts text and Gene Ontology classes Significantly faster than the other tools

51 Conclusion These methods are increasingly needed Some methods exist Unfortunately no clear evaluation Remember: These are predictions. No certain info until they are tested in the wet lab

Annotation Error in Public Databases ALEXANDRA SCHNOES UNIVERSITY OF CALIFORNIA, SAN FRANCISCO OCTOBER 25, 2010

Annotation Error in Public Databases ALEXANDRA SCHNOES UNIVERSITY OF CALIFORNIA, SAN FRANCISCO OCTOBER 25, 2010 Annotation Error in Public Databases ALEXANDRA SCHNOES UNIVERSITY OF CALIFORNIA, SAN FRANCISCO OCTOBER 25, 2010 1 New genomes (and metagenomes) sequenced every day... 2 3 3 3 3 3 3 3 3 3 Computational

More information

Lecture 2. The Blast2GO annotation framework

Lecture 2. The Blast2GO annotation framework Lecture 2 The Blast2GO annotation framework Annotation steps Modulation of annotation intensity Export/Import Functions Sequence Selection Additional Tools Functional assignment Annotation Transference

More information

Towards a Comprehensive Annotation of Structured RNAs in Drosophila

Towards a Comprehensive Annotation of Structured RNAs in Drosophila Towards a Comprehensive Annotation of Structured RNAs in Drosophila Rebecca Kirsch 31st TBI Winterseminar, Bled 20/02/2016 Studying Non-Coding RNAs in Drosophila Why Drosophila? especially for novel molecules

More information

CSCE555 Bioinformatics. Protein Function Annotation

CSCE555 Bioinformatics. Protein Function Annotation CSCE555 Bioinformatics Protein Function Annotation Why we need to do function annotation? Fig from: Network-based prediction of protein function. Molecular Systems Biology 3:88. 2007 What s function? The

More information

BMD645. Integration of Omics

BMD645. Integration of Omics BMD645 Integration of Omics Shu-Jen Chen, Chang Gung University Dec. 11, 2009 1 Traditional Biology vs. Systems Biology Traditional biology : Single genes or proteins Systems biology: Simultaneously study

More information

- conserved in Eukaryotes. - proteins in the cluster have identifiable conserved domains. - human gene should be included in the cluster.

- conserved in Eukaryotes. - proteins in the cluster have identifiable conserved domains. - human gene should be included in the cluster. NCBI BLAST Services DELTA-BLAST BLAST (http://blast.ncbi.nlm.nih.gov/), Basic Local Alignment Search tool, is a suite of programs for finding similarities between biological sequences. DELTA-BLAST is a

More information

-max_target_seqs: maximum number of targets to report

-max_target_seqs: maximum number of targets to report Review of exercise 1 tblastn -num_threads 2 -db contig -query DH10B.fasta -out blastout.xls -evalue 1e-10 -outfmt "6 qseqid sseqid qstart qend sstart send length nident pident evalue" Other options: -max_target_seqs:

More information

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting.

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting. Genome Annotation Bioinformatics and Computational Biology Genome Annotation Frank Oliver Glöckner 1 Genome Analysis Roadmap Genome sequencing Assembly Gene prediction Protein targeting trna prediction

More information

Bioinformatics. Dept. of Computational Biology & Bioinformatics

Bioinformatics. Dept. of Computational Biology & Bioinformatics Bioinformatics Dept. of Computational Biology & Bioinformatics 3 Bioinformatics - play with sequences & structures Dept. of Computational Biology & Bioinformatics 4 ORGANIZATION OF LIFE ROLE OF BIOINFORMATICS

More information

BIOINFORMATICS LAB AP BIOLOGY

BIOINFORMATICS LAB AP BIOLOGY BIOINFORMATICS LAB AP BIOLOGY Bioinformatics is the science of collecting and analyzing complex biological data. Bioinformatics combines computer science, statistics and biology to allow scientists to

More information

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010 BLAST Database Searching BME 110: CompBio Tools Todd Lowe April 8, 2010 Admin Reading: Read chapter 7, and the NCBI Blast Guide and tutorial http://www.ncbi.nlm.nih.gov/blast/why.shtml Read Chapter 8 for

More information

CS612 - Algorithms in Bioinformatics

CS612 - Algorithms in Bioinformatics Fall 2017 Databases and Protein Structure Representation October 2, 2017 Molecular Biology as Information Science > 12, 000 genomes sequenced, mostly bacterial (2013) > 5x10 6 unique sequences available

More information

BLAST. Varieties of BLAST

BLAST. Varieties of BLAST BLAST Basic Local Alignment Search Tool (1990) Altschul, Gish, Miller, Myers, & Lipman Uses short-cuts or heuristics to improve search speed Like speed-reading, does not examine every nucleotide of database

More information

CISC 636 Computational Biology & Bioinformatics (Fall 2016)

CISC 636 Computational Biology & Bioinformatics (Fall 2016) CISC 636 Computational Biology & Bioinformatics (Fall 2016) Predicting Protein-Protein Interactions CISC636, F16, Lec22, Liao 1 Background Proteins do not function as isolated entities. Protein-Protein

More information

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor Biological Networks:,, and via Relative Description Length By: Tamir Tuller & Benny Chor Presented by: Noga Grebla Content of the presentation Presenting the goals of the research Reviewing basic terms

More information

Chapter 5. Proteomics and the analysis of protein sequence Ⅱ

Chapter 5. Proteomics and the analysis of protein sequence Ⅱ Proteomics Chapter 5. Proteomics and the analysis of protein sequence Ⅱ 1 Pairwise similarity searching (1) Figure 5.5: manual alignment One of the amino acids in the top sequence has no equivalent and

More information

Ensembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are:

Ensembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are: Comparative genomics and proteomics Species available Ensembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are: Vertebrates: human, chimpanzee, mouse, rat,

More information

Introduction to Bioinformatics

Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational Biology Introduction to Bioinformatics Rui Kuang Department of Computer Science and Engineering University of Minnesota kuang@cs.umn.edu History of Bioinformatics

More information

Predicting Protein Functions and Domain Interactions from Protein Interactions

Predicting Protein Functions and Domain Interactions from Protein Interactions Predicting Protein Functions and Domain Interactions from Protein Interactions Fengzhu Sun, PhD Center for Computational and Experimental Genomics University of Southern California Outline High-throughput

More information

Bioinformatics Chapter 1. Introduction

Bioinformatics Chapter 1. Introduction Bioinformatics Chapter 1. Introduction Outline! Biological Data in Digital Symbol Sequences! Genomes Diversity, Size, and Structure! Proteins and Proteomes! On the Information Content of Biological Sequences!

More information

EBI web resources II: Ensembl and InterPro

EBI web resources II: Ensembl and InterPro EBI web resources II: Ensembl and InterPro Yanbin Yin http://www.ebi.ac.uk/training/online/course/ 1 Homework 3 Go to http://www.ebi.ac.uk/interpro/training.htmland finish the second online training course

More information

MiGA: The Microbial Genome Atlas

MiGA: The Microbial Genome Atlas December 12 th 2017 MiGA: The Microbial Genome Atlas Jim Cole Center for Microbial Ecology Dept. of Plant, Soil & Microbial Sciences Michigan State University East Lansing, Michigan U.S.A. Where I m From

More information

Introduction to Evolutionary Concepts

Introduction to Evolutionary Concepts Introduction to Evolutionary Concepts and VMD/MultiSeq - Part I Zaida (Zan) Luthey-Schulten Dept. Chemistry, Beckman Institute, Biophysics, Institute of Genomics Biology, & Physics NIH Workshop 2009 VMD/MultiSeq

More information

DATA ACQUISITION FROM BIO-DATABASES AND BLAST. Natapol Pornputtapong 18 January 2018

DATA ACQUISITION FROM BIO-DATABASES AND BLAST. Natapol Pornputtapong 18 January 2018 DATA ACQUISITION FROM BIO-DATABASES AND BLAST Natapol Pornputtapong 18 January 2018 DATABASE Collections of data To share multi-user interface To prevent data loss To make sure to get the right things

More information

Gene Ontology and overrepresentation analysis

Gene Ontology and overrepresentation analysis Gene Ontology and overrepresentation analysis Kjell Petersen J Express Microarray analysis course Oslo December 2009 Presentation adapted from Endre Anderssen and Vidar Beisvåg NMC Trondheim Overview How

More information

Prediction of protein function from sequence analysis

Prediction of protein function from sequence analysis Prediction of protein function from sequence analysis Rita Casadio BIOCOMPUTING GROUP University of Bologna, Italy The omic era Genome Sequencing Projects: Archaea: 74 species In Progress:52 Bacteria:

More information

Genomics and bioinformatics summary. Finding genes -- computer searches

Genomics and bioinformatics summary. Finding genes -- computer searches Genomics and bioinformatics summary 1. Gene finding: computer searches, cdnas, ESTs, 2. Microarrays 3. Use BLAST to find homologous sequences 4. Multiple sequence alignments (MSAs) 5. Trees quantify sequence

More information

Comparative Bioinformatics Midterm II Fall 2004

Comparative Bioinformatics Midterm II Fall 2004 Comparative Bioinformatics Midterm II Fall 2004 Objective Answer, part I: For each of the following, select the single best answer or completion of the phrase. (3 points each) 1. Deinococcus radiodurans

More information

Homology and Information Gathering and Domain Annotation for Proteins

Homology and Information Gathering and Domain Annotation for Proteins Homology and Information Gathering and Domain Annotation for Proteins Outline Homology Information Gathering for Proteins Domain Annotation for Proteins Examples and exercises The concept of homology The

More information

Hands-On Nine The PAX6 Gene and Protein

Hands-On Nine The PAX6 Gene and Protein Hands-On Nine The PAX6 Gene and Protein Main Purpose of Hands-On Activity: Using bioinformatics tools to examine the sequences, homology, and disease relevance of the Pax6: a master gene of eye formation.

More information

Gene function annotation

Gene function annotation Gene function annotation Paul D. Thomas, Ph.D. University of Southern California What is function annotation? The formal answer to the question: what does this gene do? The association between: a description

More information

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Bioinformatics tools for phylogeny and visualization. Yanbin Yin Bioinformatics tools for phylogeny and visualization Yanbin Yin 1 Homework assignment 5 1. Take the MAFFT alignment http://cys.bios.niu.edu/yyin/teach/pbb/purdue.cellwall.list.lignin.f a.aln as input and

More information

Computational Structural Bioinformatics

Computational Structural Bioinformatics Computational Structural Bioinformatics ECS129 Instructor: Patrice Koehl http://koehllab.genomecenter.ucdavis.edu/teaching/ecs129 koehl@cs.ucdavis.edu Learning curve Math / CS Biology/ Chemistry Pre-requisite

More information

Microbiome: 16S rrna Sequencing 3/30/2018

Microbiome: 16S rrna Sequencing 3/30/2018 Microbiome: 16S rrna Sequencing 3/30/2018 Skills from Previous Lectures Central Dogma of Biology Lecture 3: Genetics and Genomics Lecture 4: Microarrays Lecture 12: ChIP-Seq Phylogenetics Lecture 13: Phylogenetics

More information

Computational approaches for functional genomics

Computational approaches for functional genomics Computational approaches for functional genomics Kalin Vetsigian October 31, 2001 The rapidly increasing number of completely sequenced genomes have stimulated the development of new methods for finding

More information

Taxonomy. Content. How to determine & classify a species. Phylogeny and evolution

Taxonomy. Content. How to determine & classify a species. Phylogeny and evolution Taxonomy Content Why Taxonomy? How to determine & classify a species Domains versus Kingdoms Phylogeny and evolution Why Taxonomy? Classification Arrangement in groups or taxa (taxon = group) Nomenclature

More information

Chapter 26 Phylogeny and the Tree of Life

Chapter 26 Phylogeny and the Tree of Life Chapter 26 Phylogeny and the Tree of Life Chapter focus Shifting from the process of how evolution works to the pattern evolution produces over time. Phylogeny Phylon = tribe, geny = genesis or origin

More information

Ontology Alignment in the Presence of a Domain Ontology

Ontology Alignment in the Presence of a Domain Ontology Ontology Alignment in the Presence of a Domain Ontology Finding Protein Homology by Andrew August Carbonetto B.Sc., McGill University, 2005 A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS

More information

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD Department of Computer Science University of Missouri 2008 Free for Academic

More information

Bioinformatics: Investigating Molecular/Biochemical Evidence for Evolution
