Protein function prediction based on sequence analysis


Performing sequence searches, post-Blast analysis, using profiles and pattern-matching

Slides from a lecture on MOL204 - Applied Bioinformatics, 18-Oct-2005
Rein Aasland, Department of Molecular Biology, University of Bergen
Please do not distribute without the author's consent!
MOL204 Applied Bioinformatics, Lecture 8

Sequence vs structure & function
Chothia & Lesk, (1986) EMBO J. 5:
Devos et al., (2000) Proteins: Structure, Function, and Genetics 41:

Complete genome sequences give new meaning to database searches
Sequence similarity searches are one of our most powerful means for protein function prediction and genome annotation.

Sequence similarities can be used as a basis for evaluating hypotheses of homology
Sequences are HOMOLOGOUS if they have a common ancestor. Homologous genes/proteins diverge during evolution.
Similar sequences are ANALOGOUS if they do not have a common ancestor. Analogous sequences can CONVERGE during evolution and become more similar.

Sequence similarities can be used as a basis for evaluating hypotheses of homology
Two genes are HOMOLOGS if they have a common ancestor.
Two genes in different species are ORTHOLOGS if they have evolved from a common ancestor by speciation.
Two genes in one species are PARALOGS if they have evolved from a common ancestor by duplication.
NOTE 1: genes are either homologous or not!
NOTE 2: two genes can be considered partially homologous if they share one homologous domain.

The homologous sequence space
[Figure: a protein (super)family arranged by sequence similarity, with orthologs in different species and paralogs in one species.]
It is not always possible to distinguish between orthologs, paralogs and just distant homologs.

Reasons for performing Database Searches
Find a particular sequence: very close homologues, trivial database searches.
Reveal clues to function: identify functional modules.
FIRST: use SMART, Pfam, CDD and InterPro to search for known globular modules.
THEN: use Blast database searches to search for distant relatives, which may reveal additional (unknown) globular domains.
It is often difficult to distinguish true positives (TP) from false positives (FP). A hit in a database search, even if apparently significant, may be a false positive; it is a hypothesis of homology!

Sequence Alignments and Database Searches
Questions relating to database searches:
What is the architecture of my protein?
Does it contain globular domains belonging to families with known function?
Is the sequence similarity strong enough to allow for a precise prediction of function?
How can I trust the similarities I find?
In many cases, only structural comparison can prove homology.

Different types of protein
Globular, trans-membrane, random coil and coiled coil proteins require different types of bioinformatical analysis.

Globular Domains (topic from lecture 4)
Secondary structure elements: helices, strands, loops.
Structural motifs: primary organisation of secondary structure elements.
Folds: basic structural elements, one or more motifs.
Domains: elaborated folds, as found in real proteins.

Globular domains have hydrophobic cores
Conserved motifs often correspond to core secondary structure elements (figure labels: b14, α8).

Sequence Alignments and Database Searches: overview
Search for known domains: database searches by comparison to databases of multiple alignments of domains. SMART (EMBL, Heidelberg), Pfam (St. Louis, Stockholm, Cambridge, Jouy), InterPro (EBI, Cambridge), CDD (NCBI, US).
Scoring and statistical significance: scoring matrices, gap penalties, E- and P-values.
Different methods: Blast, Fasta, Smith-Waterman, Psi-Blast.
Check for repeats: dot plots (and Pfam, Smart etc.).
Interpretation: visual inspection, reciprocal searches.
Multiple alignment: Clustal_X, T-Coffee, Muscle.

Database Searches: Scoring matrices
[Table: PAM250 Dayhoff matrix, lower triangle, for the amino acids A R N D C Q E G H I L K M F P S T W Y V.]
Dayhoff matrices were built in 1978 and are based on groups of sequences (+85% identity), assuming an evolutionary model where every mutation is independent and the molecular clock is constant.
PAM 1 := Percent Accepted Mutations

Database Searches: Other scoring matrices
BLOSUM series (Henikoff, 1992): BLOSUM62 (for about 62% identities) is one of the most commonly used matrices.
Gonnet series (Gonnet, 1992): similar to PAM matrices. Often superior, but less frequently used.
Each matrix requires optimised gap penalties.
Advice: try searches with several matrices and different gap penalties.
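Substitution matrices such as PAM and BLOSUM are log-odds tables: each entry compares how often two residues are seen aligned in related sequences with how often they would pair by chance. The Python sketch below illustrates that calculation only. The frequencies are made-up numbers, not values from any published matrix, and the half-bit scaling (factor 2 on a base-2 logarithm) is the convention used for BLOSUM62.

```python
import math


def log_odds_score(q_ab: float, p_a: float, p_b: float, scale: float = 2.0) -> int:
    """Rounded log-odds score for aligning residues a and b.

    q_ab : observed frequency of the aligned pair in trusted alignments
    p_a, p_b : background frequencies of the two residues
    scale : 2.0 gives half-bit units
    """
    return round(scale * math.log2(q_ab / (p_a * p_b)))


# Hypothetical frequencies, chosen only to make the arithmetic easy to follow.
print(log_odds_score(0.020, 0.10, 0.10))  # pair seen twice as often as chance: positive score
print(log_odds_score(0.005, 0.10, 0.10))  # pair seen half as often as chance: negative score
```

A positive score marks a substitution that is tolerated more often than chance predicts, which is why gap penalties have to be tuned to the scale of the matrix in use.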

Database Searches: Heuristic methods
FASTA: uses word search in look-up tables, followed by Smith-Waterman alignment of the best hits.
BLAST: uses word search in look-up tables and gapped extension, followed by Smith-Waterman alignment of the best hits. A more powerful implementation, with a fast server at NCBI.
Both methods are fast and sensitive, but do not formally guarantee the best alignments.

Database Searches: Statistical significance
E-value (for the score of a match): the number of matches with at least this score that can be expected with the same query in a database of random sequences of the same size.
P-value (for the score of a match): the probability that a match with at least this score will appear with the same query in a database of random sequences of the same size.
The current version of Blast uses E-values.

Tentative recommendations (E-value range and interpretation):
Smaller than e-100: exact matches (same gene, same species).
Between e-50 and e-100: nearly identical genes.
Between e-10 and e-50: interesting, closely related sequences.
Between 1 and e-5: CAN be real homologues.
Greater than 1: most likely not relevant.

Database Searches: Blastp (Blast at NCBI)
[Screenshot of the search form, annotated:]
Paste in your sequence here.
Limit search by species.
Filter removes low complexity regions (GGGSGGGS...).
Increase the E value for higher sensitivity and shorter sequences.
Use the smallest database needed for your purpose.
Try different matrices and gap costs.
Increase list size for large protein families.
Check here if you want PSI-BLAST.
Limit hits to an E-value range for large families.

Database Searches: Blastp
[Screenshot of the intermediate page, showing the request ID, the sequence length and the result from CDD.]
Press the format button to continue, but you may choose to alter the settings!
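The E-value and P-value definitions and the tentative ranges above can be sketched in Python as follows. Two relations used here are not on the slides and are taken from standard Karlin-Altschul statistics: P = 1 - exp(-E), and E = m * n * 2^(-S') for a bit score S', query length m and database length n. The function names and the handling of the interval between e-5 and e-10, which the slide leaves unassigned, are assumptions made for this illustration.

```python
import math


def p_value_from_e(e_value: float) -> float:
    """Probability of at least one chance match this good: P = 1 - exp(-E)."""
    return -math.expm1(-e_value)


def e_value_from_bit_score(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance matches: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


def interpret(e_value: float) -> str:
    """Map an E-value onto the tentative ranges given on the slide."""
    if e_value < 1e-100:
        return "exact match (same gene, same species)"
    if e_value <= 1e-50:
        return "nearly identical gene"
    if e_value <= 1e-10:
        return "interesting, closely related sequence"
    if e_value < 1e-5:
        return "between the ranges listed on the slide"
    if e_value <= 1:
        return "CAN be a real homologue"
    return "most likely not relevant"


for e in (1e-155, 2e-14, 2.4, 38):
    print(f"E = {e:g}: P = {p_value_from_e(e):.3g}, {interpret(e)}")
```

For small E the two measures are almost equal, which is why reporting E-values alone loses little; they diverge only once E approaches 1.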

Database Searches: Blastp
[Screenshot of the formatting page, annotated:]
Choose style of output and type of alignment.
Check to format for PSI-Blast.
If needed, restrict to a range of E-values.

Other Database Search Sites
The Blast family:
Blastn: DNA-DNA
Blastp: Protein-Protein
Blastx: DNA-Protein, good for new cDNAs!
Tblastn: Protein-DNA, if the gene is not predicted
Tblastx: 6-frame x 6-frame

ExPASy Blast at Swiss Bioinformatics Centre (SIB): quick because less used; UniProt!
WU-Blast2 at EBI and at Washington U.: easier to try different options.
FASTA3 at EBI: the current best FASTA implementation.
BIC, the Bioccelerator: Smith-Waterman on a special chip.
ParAlign (in Oslo!): novel fast heuristic method.

Other Database Search Sites
[Screenshot of a search site that uses UNIPROT.]

Choice of databases
SwissProt: the best quality and best annotated database, but also rather incomplete.
TREMBL: translation of the EMBL DNA database, ~equivalent to GenPept.
REFSEQ: a reference database, one entry per object. The ULTIMATE database in theory!
PDB.

Quality of databases
Sequencing errors.
Gene prediction errors: very common for large complete genomes (c.f. NURF P301, TOUTATIS).
Annotation errors: primary databases are not corrected unless the authors agree (c.f. FSH).
Redundancy: even the non-redundant databases are significantly redundant.
Organism-specific databases.
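The Blast family listed above amounts to a small lookup on the type of the query and the type of the database. A minimal Python sketch, in which the function name and the `compare_as_protein` flag are hypothetical names chosen for this example:

```python
# (query type, database type) -> program, as listed for the Blast family.
BLAST_PROGRAMS = {
    ("dna", "dna"): "blastn",
    ("protein", "protein"): "blastp",
    ("dna", "protein"): "blastx",      # query translated in six frames
    ("protein", "dna"): "tblastn",     # database translated in six frames
}


def choose_blast(query_type: str, db_type: str, compare_as_protein: bool = False) -> str:
    """Pick a Blast program for the given sequence types.

    compare_as_protein only matters for DNA vs DNA: it selects tblastx,
    which translates both query and database (6-frame x 6-frame).
    """
    key = (query_type.lower(), db_type.lower())
    if key not in BLAST_PROGRAMS:
        raise ValueError(f"unknown sequence types: {key}")
    if key == ("dna", "dna") and compare_as_protein:
        return "tblastx"
    return BLAST_PROGRAMS[key]


print(choose_blast("dna", "protein"))   # a new cDNA against a protein database
print(choose_blast("protein", "dna"))   # a protein against unannotated genomic DNA
```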

A case: Search for yeast SET domains
[Table: results for different search settings. Column headings: Blast Default; Blast Blosum62 no filter; Blast Blosum45 no filter (>gap); Bic Default; Bic (>gap). E-values listed: 1e-155, 1e-155, 1e-126, 7e-171, 3e-163 and 2e-14, 2e-14, 2e-13, 2e-15, 4e. Hit counts listed: 8, 8, 41, 34.]

A case: Search for yeast SET domains
Always perform reciprocal searches
[Figure: reciprocal searches between the yeast SET domain proteins, annotated with E-values (including 1e-153).]

A case: Search for yeast SET domains
Check that alignments are sensible

>gi ref NP_ transcription factor containing a SET domain; Set2p Length = 733
Score = 27.3 bits (59), Expect = 2.4
Identities = 39/152 (25%), Positives = 64/152 (41%), Gaps = 32/152 (21%)

Query: 37 CSN--WESSRSADIEVRKSSNERDFGVFAADSCVKGELIQEYLGKIDFQKNYQTDPNNDY 94
C N ++ + A I + K GV A + I EY G D DY
Sbjct: 109 CQNQRFQKKQYAPIAIFKTKH-KGYGVRAEQDIEANQFIYEYKGEVIEEMEFR-DRLIDY 166

Query: 95 RLMGTTKPKVLFHPHWPL-----YIDSRETGGLTRYIRRSCEPNVELVTVRPLDEKPRGD
H ID+ G L R+ SC PN +
Sbjct: DQRHFKHFYFMMLQNGEFIDATIKGSLARFCNHSCSPNAYV

Query: 150 NDCRVKFVLR----AIRDIRKGEEISVEWQWD 177
N VK LR A R I KGEEI+ ++ D
Sbjct: 208 NKWVVKDKLRMGIFAQRKILKGEEITFDYNVD

A case: Search for yeast SET domains
Check that alignments are sensible
A false hit (E=38)

>gi ref NP_ Cdc123p Length = 360
Score = 23.5 bits (49), Expect = 38
Identities = 23/97 (23%), Positives = 42/97 (42%), Gaps = 10/97 (10%)

Query: 10 KAITISEYKDKYVKMFIDNHYDDDWVVCSNWESSRSADIEVRKSSNERDFGVFAADSCVK 69
K+I + K D + E+SRS E + + D+ + D
Sbjct: 37 KSIVLKSLPKKFIQ-----YLEQDGIKLPQEENSRSVYTEEIIRNEDNDYSDWEDDEDTA 91

Query: 70 GELIQEYLGKIDFQKNYQ--TDPNNDYRLMGTTKPKV 104
E +QE IDF + +Q D N+ +G PK+
Sbjct: 92 TEFVQEVEPLIDFPELHQKLKDALNE---LGAVAPKL 125
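Two of the checks in this case study are easy to automate: counting identities and gaps in an aligned pair, and keeping only hits that are confirmed by a reciprocal search. The Python sketch below assumes the best hits have already been parsed into dictionaries; the gene names and the short aligned strings are hypothetical examples, not data from the lecture.

```python
def alignment_stats(query: str, subject: str) -> dict:
    """Count identities and gapped columns in two aligned, equal-length strings."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    identities = sum(q == s and q != "-" for q, s in zip(query, subject))
    gaps = sum(q == "-" or s == "-" for q, s in zip(query, subject))
    return {"identities": identities, "gaps": gaps, "length": len(query)}


def reciprocal_best_hits(forward: dict, reverse: dict) -> list:
    """Keep (query, hit) pairs whose hit finds the query again as its own best hit.

    forward maps each query to its best hit in the other set,
    reverse maps each of those hits back to its best hit in the first set.
    """
    return sorted((q, s) for q, s in forward.items() if reverse.get(s) == q)


# Hypothetical aligned fragment: 3 identities, 1 gapped column, 5 columns.
print(alignment_stats("AC-DE", "ACGDF"))

# Hypothetical best hits: geneB is not confirmed by the reverse search.
forward = {"geneA": "geneX", "geneB": "geneY"}
reverse = {"geneX": "geneA", "geneY": "geneC"}
print(reciprocal_best_hits(forward, reverse))
```

A hit that fails the reciprocal test is not necessarily a false positive, but like the Cdc123p match above it should be treated as an untested hypothesis of homology.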

Database Searches: Using low sequence complexity filter (SEG) [p254]
Many proteins contain extensive regions with a low sequence complexity.
Example: C-terminal third of FSH_DROME

HLMQPAGPQQ QQQQQQQQPF GHQQQQQQQQ QQQQQQQQQH MDYVTELLSK GAENVGGMNG
NHLLNFNLDM AAAYQQKHPQ QQQQQAHNNG FNVADFGMAG FDGLNMTAAS FLDLEPSLQQ
QQMQQMQLQQ QHHQQQQQQT HQQQQQHQQQ HHQQQQQQLT QQQLQQQQQQ QQQQQHLQQQ
QHQQQHHQAA NKLLIIPKPI ESMMPSPPDK QQLQQHQKVL PPQQSPSDMK LHPNAAAAAA
VASAQAKLVQ TFKANEQNLK NASSWSSLAS ANSPQSHTSS SSSSSKAKPA MDSFQQFRNK
AKERDRLKLL EAAEKEKKNQ KEAAEKEQQR KHHKSSSSSL TSAAVAQAAA IAAATAAAAV
TLGAAAAAAL ASSASNPSGG SSSGGAGSTS QQAITGDRDR DRDRERERER SGSGGGQSGN
GNNSSNSANS NGPGSAGSGG SGGGGGSGPA SAGGPNSGGG GTANSNSGGG GGGGGPALLN
AGSNSNSGVG SGGAASSNSN SSVGGIVGSG GPGSNSQGSS GGGGGGPASG GGMGSGAIDY
GQQVAVLTQV AANAQAQHVA AAVAAQAILA ASPLGAMESG RKSVHDAQPQ ISRVEDIKAS

You can use Blast2sequences to see what is filtered out.

Using multiple alignments as basis for similarity searches: PSI-Blast

PSI-BLAST: Position-specific iterated BLAST
A very sensitive method for finding distant homologues.
Hits after a BLAST search are selected for alignment and subsequent profile search. Conserved positions are emphasised.
The procedure can be repeated until convergence is achieved, i.e. no new matches.

PSI-BLAST
All 6 yeast SET domains are easily identified after 2 rounds.

PSI-BLAST: Similarity between SET domains and Methyltransferases
The similarity between SET domains and a group of plant methyltransferases was found after 9 rounds of PSI-BLAST (Rea et al. 2000).
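The idea behind a low-complexity filter can be illustrated with a sliding-window Shannon entropy: windows dominated by one or two residue types, such as the poly-Q stretches in FSH_DROME, have low entropy and are masked. This Python sketch is a simplified stand-in, not the SEG algorithm itself; the window length and threshold are assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of a window."""
    n = len(window)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(window).values())


def mask_low_complexity(seq: str, window: int = 12, threshold: float = 2.2) -> str:
    """Replace every residue covered by a low-entropy window with 'X'."""
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = "X"
    return "".join(masked)


# A mixed-composition segment from the example above, followed by a poly-Q run.
example = "MDYVTELLSKGAENVGGMNG" + "Q" * 20
print(mask_low_complexity(example))
```

Because whole windows are masked, a few residues flanking the repeat are also replaced, which is one reason to inspect what a filter removed before trusting a filtered search.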

SET domains ARE similar to Rubisco MTase

PSI-BLAST
PSI-BLAST involves the manual inclusion of new candidate relatives.
A powerful program that must be used with great care!!!

Using multiple alignments as basis for similarity searches: Profiles and HMMs
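A profile turns a multiple alignment into position-specific scores, so that conserved columns weigh more heavily than variable ones. The Python sketch below builds a very simple log-odds profile from an ungapped alignment and scans a sequence with it. It is an illustration under stated assumptions, not what PSI-BLAST or an HMM package does internally: the background frequency is uniform (1/20), a flat pseudocount replaces proper sequence weighting, and the three-sequence alignment is a hypothetical toy example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment: list, pseudocount: float = 1.0) -> list:
    """One dict of log-odds scores per alignment column (uniform background)."""
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score(profile: list, segment: str) -> float:
    """Sum of the position-specific scores for a segment of profile width."""
    return sum(column[aa] for column, aa in zip(profile, segment))


def best_window(profile: list, seq: str) -> tuple:
    """Return (score, start) of the best-scoring window in seq."""
    width = len(profile)
    return max((score(profile, seq[i:i + width]), i)
               for i in range(len(seq) - width + 1))


# Hypothetical aligned motif instances and a hypothetical target sequence.
profile = build_profile(["KGEEI", "KGEEI", "RGEEV"])
print(best_window(profile, "MSTKGEEITW"))
```

In an iterated search, the sequences found by one scan are added to the alignment and the profile is rebuilt, which is where a single wrongly included sequence can pull later rounds off target.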

Domain families in SMART
Bork et al., EMBL, Heidelberg
Collection of carefully aligned domains (665).

Domain families in Pfam
Eddy, Bateman, Sonnhammer et al., St. Louis, Cambridge, Stockholm
Collection of carefully aligned domains (7503 = Pfam-A) plus the automatically aligned ProDom families (= Pfam-B).
~73% of sequences contain at least one hit with Pfam-A (or B).
The implementations at St. Louis and at the Sanger Centre are different, and both have nice features!

Domain families in CDD
NCBI team
Pfam + SMART + NCBI + COG (11088).
Search by Reverse Position Specific BLAST.
Automatically done in Blastp.

Domain families and more in InterPro
Apweiler et al., EBI
Domains, motifs, families, superfamilies (11007 entries, 2573 domains, 8166 families).
Well integrated with many other databases, and well mapped onto GO (Gene Ontology).
Problem: a collection of data from various sources of varying quality, i.e. check where the data come from!

Dot plots to reveal repeats

Different types of protein
Globular, trans-membrane, random coil and coiled coil proteins require different types of bioinformatical analysis.

Prediction of TM regions
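A dot plot compares a sequence with itself and marks every pair of positions where a short window matches; internal repeats then show up as lines parallel to the main diagonal. A minimal Python sketch with exact window matching follows. Real dot plot programs usually score windows with a substitution matrix instead, and the repeat-containing sequence here is a hypothetical toy example.

```python
from collections import Counter


def dot_plot(seq_a: str, seq_b: str, window: int = 3) -> set:
    """Set of (i, j) positions where the windows starting at i and j are identical."""
    hits = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            if seq_a[i:i + window] == seq_b[j:j + window]:
                hits.add((i, j))
    return hits


def repeat_offsets(hits: set) -> Counter:
    """Count off-diagonal hits by their distance from the main diagonal."""
    return Counter(j - i for i, j in hits if j > i)


def render(hits: set, n_rows: int, n_cols: int) -> str:
    """Draw the plot as text, one row per position in the first sequence."""
    return "\n".join(
        "".join("*" if (i, j) in hits else "." for j in range(n_cols))
        for i in range(n_rows)
    )


# Hypothetical sequence made of two copies of an 8-residue unit.
seq = "GSTKGEEI" * 2
hits = dot_plot(seq, seq)
print(render(hits, len(seq) - 2, len(seq) - 2))
print(repeat_offsets(hits).most_common(1))  # most frequent diagonal offset
```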

Prediction of TM regions
TMpred (ISREC, Geneva): Human EGF Receptor.
TMpred (ISREC, Geneva): Human Rhodopsin (helices A B C D E F G).

Prediction of coiled-coils
Coils (ISREC, Geneva): Human EEA1.

GlobPlot: a method for predicting structure and unstructure
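The principle behind transmembrane prediction can be sketched with a sliding-window hydropathy profile: stretches of about 20 mostly hydrophobic residues are candidates for membrane-spanning helices. The Python example below uses the classic Kyte-Doolittle scale with a 19-residue window and a cut-off of 1.6; this is a different and much simpler technique than the one TMpred uses, swapped in here for illustration, and the test sequence is hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list:
    """Mean hydropathy of every window of the given length along the sequence."""
    return [
        sum(KYTE_DOOLITTLE[aa] for aa in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def candidate_tm_windows(seq: str, window: int = 19, threshold: float = 1.6) -> list:
    """Start positions of windows whose mean hydropathy reaches the threshold."""
    return [i for i, h in enumerate(hydropathy_profile(seq, window)) if h >= threshold]


# Hypothetical protein: polar ends around one 19-residue hydrophobic stretch.
seq = "MKDEQRSNGT" + "LLIVALLVIAGLLVLAVIL" + "RKDEQSNGTK"
print(candidate_tm_windows(seq))
```

A receptor with one membrane-spanning segment would give a single cluster of candidate windows, and a seven-helix protein such as rhodopsin would give seven, matching the helices A to G labelled in the TMpred output above.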


More information

In-Depth Assessment of Local Sequence Alignment

In-Depth Assessment of Local Sequence Alignment 2012 International Conference on Environment Science and Engieering IPCBEE vol.3 2(2012) (2012)IACSIT Press, Singapoore In-Depth Assessment of Local Sequence Alignment Atoosa Ghahremani and Mahmood A.

More information

Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program)

Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program) Syllabus of BIOINF 528 (2017 Fall, Bioinformatics Program) Course Name: Structural Bioinformatics Course Description: Instructor: This course introduces fundamental concepts and methods for structural

More information

Amino Acid Structures from Klug & Cummings. 10/7/2003 CAP/CGS 5991: Lecture 7 1

Amino Acid Structures from Klug & Cummings. 10/7/2003 CAP/CGS 5991: Lecture 7 1 Amino Acid Structures from Klug & Cummings 10/7/2003 CAP/CGS 5991: Lecture 7 1 Amino Acid Structures from Klug & Cummings 10/7/2003 CAP/CGS 5991: Lecture 7 2 Amino Acid Structures from Klug & Cummings

More information

Pairwise & Multiple sequence alignments

Pairwise & Multiple sequence alignments Pairwise & Multiple sequence alignments Urmila Kulkarni-Kale Bioinformatics Centre 411 007 urmila@bioinfo.ernet.in Basis for Sequence comparison Theory of evolution: gene sequences have evolved/derived

More information

Biochemistry 324 Bioinformatics. Pairwise sequence alignment

Biochemistry 324 Bioinformatics. Pairwise sequence alignment Biochemistry 324 Bioinformatics Pairwise sequence alignment How do we compare genes/proteins? When we have sequenced a genome, we try and identify the function of unknown genes by finding a similar gene

More information

Sequence alignment methods. Pairwise alignment. The universe of biological sequence analysis

Sequence alignment methods. Pairwise alignment. The universe of biological sequence analysis he universe of biological sequence analysis Word/pattern recognition- Identification of restriction enzyme cleavage sites Sequence alignment methods PstI he universe of biological sequence analysis - prediction

More information

Example of Function Prediction

Example of Function Prediction Find similar genes Example of Function Prediction Suggesting functions of newly identified genes It was known that mutations of NF1 are associated with inherited disease neurofibromatosis 1; but little

More information

A profile-based protein sequence alignment algorithm for a domain clustering database

A profile-based protein sequence alignment algorithm for a domain clustering database A profile-based protein sequence alignment algorithm for a domain clustering database Lin Xu,2 Fa Zhang and Zhiyong Liu 3, Key Laboratory of Computer System and architecture, the Institute of Computing

More information

Practical search strategies

Practical search strategies Computational and Comparative Genomics Similarity Searching II Practical search strategies Bill Pearson wrp@virginia.edu 1 Protein Evolution and Sequence Similarity Similarity Searching I What is Homology

More information

Sequence analysis and Genomics

Sequence analysis and Genomics Sequence analysis and Genomics October 12 th November 23 rd 2 PM 5 PM Prof. Peter Stadler Dr. Katja Nowick Katja: group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute

More information

Introductory course on Multiple Sequence Alignment Part I: Theoretical foundations

Introductory course on Multiple Sequence Alignment Part I: Theoretical foundations Sequence Analysis and Structure Prediction Service Centro Nacional de Biotecnología CSIC 8-10 May, 2013 Introductory course on Multiple Sequence Alignment Part I: Theoretical foundations Course Notes Instructor:

More information

DATA ACQUISITION FROM BIO-DATABASES AND BLAST. Natapol Pornputtapong 18 January 2018

DATA ACQUISITION FROM BIO-DATABASES AND BLAST. Natapol Pornputtapong 18 January 2018 DATA ACQUISITION FROM BIO-DATABASES AND BLAST Natapol Pornputtapong 18 January 2018 DATABASE Collections of data To share multi-user interface To prevent data loss To make sure to get the right things

More information

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder HMM applications Applications of HMMs Gene finding Pairwise alignment (pair HMMs) Characterizing protein families (profile HMMs) Predicting membrane proteins, and membrane protein topology Gene finding

More information

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD Department of Computer Science University of Missouri 2008 Free for Academic

More information

Sequence Alignments. Dynamic programming approaches, scoring, and significance. Lucy Skrabanek ICB, WMC January 31, 2013

Sequence Alignments. Dynamic programming approaches, scoring, and significance. Lucy Skrabanek ICB, WMC January 31, 2013 Sequence Alignments Dynamic programming approaches, scoring, and significance Lucy Skrabanek ICB, WMC January 31, 213 Sequence alignment Compare two (or more) sequences to: Find regions of conservation

More information

Bioinformatics. Scoring Matrices. David Gilbert Bioinformatics Research Centre

Bioinformatics. Scoring Matrices. David Gilbert Bioinformatics Research Centre Bioinformatics Scoring Matrices David Gilbert Bioinformatics Research Centre www.brc.dcs.gla.ac.uk Department of Computing Science, University of Glasgow Learning Objectives To explain the requirement

More information

An Introduction to Sequence Similarity ( Homology ) Searching

An Introduction to Sequence Similarity ( Homology ) Searching An Introduction to Sequence Similarity ( Homology ) Searching Gary D. Stormo 1 UNIT 3.1 1 Washington University, School of Medicine, St. Louis, Missouri ABSTRACT Homologous sequences usually have the same,

More information

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Bioinformatics II Probability and Statistics Universität Zürich and ETH Zürich Spring Semester 2009 Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Dr Fraser Daly adapted from

More information

Ch. 9 Multiple Sequence Alignment (MSA)

Ch. 9 Multiple Sequence Alignment (MSA) Ch. 9 Multiple Sequence Alignment (MSA) - gather seqs. to make MSA - doing MSA with ClustalW - doing MSA with Tcoffee - comparing seqs. that cannot align Introduction - from pairwise alignment to MSA -

More information

Bioinformatics Exercises

Bioinformatics Exercises Bioinformatics Exercises AP Biology Teachers Workshop Susan Cates, Ph.D. Evolution of Species Phylogenetic Trees show the relatedness of organisms Common Ancestor (Root of the tree) 1 Rooted vs. Unrooted

More information

Overview Multiple Sequence Alignment

Overview Multiple Sequence Alignment Overview Multiple Sequence Alignment Inge Jonassen Bioinformatics group Dept. of Informatics, UoB Inge.Jonassen@ii.uib.no Definition/examples Use of alignments The alignment problem scoring alignments

More information

Single alignment: Substitution Matrix. 16 march 2017

Single alignment: Substitution Matrix. 16 march 2017 Single alignment: Substitution Matrix 16 march 2017 BLOSUM Matrix BLOSUM Matrix [2] (Blocks Amino Acid Substitution Matrices ) It is based on the amino acids substitutions observed in ~2000 conserved block

More information

Tutorial 4 Substitution matrices and PSI-BLAST

Tutorial 4 Substitution matrices and PSI-BLAST Tutorial 4 Substitution matrices and PSI-BLAST 1 Agenda Substitution Matrices PAM - Point Accepted Mutations BLOSUM - Blocks Substitution Matrix PSI-BLAST Cool story of the day: Why should we care about

More information

G4120: Introduction to Computational Biology

G4120: Introduction to Computational Biology ICB Fall 2003 G4120: Introduction to Computational Biology Oliver Jovanovic, Ph.D. Columbia University Department of Microbiology Copyright 2003 Oliver Jovanovic, All Rights Reserved. Bioinformatics and

More information

Motivating the need for optimal sequence alignments...

Motivating the need for optimal sequence alignments... 1 Motivating the need for optimal sequence alignments... 2 3 Note that this actually combines two objectives of optimal sequence alignments: (i) use the score of the alignment o infer homology; (ii) use

More information

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD William and Nancy Thompson Missouri Distinguished Professor Department

More information

Bioinformatics. Proteins II. - Pattern, Profile, & Structure Database Searching. Robert Latek, Ph.D. Bioinformatics, Biocomputing

Bioinformatics. Proteins II. - Pattern, Profile, & Structure Database Searching. Robert Latek, Ph.D. Bioinformatics, Biocomputing Bioinformatics Proteins II. - Pattern, Profile, & Structure Database Searching Robert Latek, Ph.D. Bioinformatics, Biocomputing WIBR Bioinformatics Course, Whitehead Institute, 2002 1 Proteins I.-III.

More information

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) Bioinformática Sequence Alignment Pairwise Sequence Alignment Universidade da Beira Interior (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) 1 16/3/29 & 23/3/29 27/4/29 Outline

More information

Lecture 2. The Blast2GO annotation framework

Lecture 2. The Blast2GO annotation framework Lecture 2 The Blast2GO annotation framework Annotation steps Modulation of annotation intensity Export/Import Functions Sequence Selection Additional Tools Functional assignment Annotation Transference

More information

EBI web resources II: Ensembl and InterPro

EBI web resources II: Ensembl and InterPro EBI web resources II: Ensembl and InterPro Yanbin Yin Fall 2015 h.p://www.ebi.ac.uk/training/online/course/ 1 Homework 3 Go to h.p://www.ebi.ac.uk/interpro/training.html and finish the second online training

More information
