Protein function prediction based on sequence analysis
Performing sequence searches, post-Blast analysis, using profiles and pattern-matching

Slides from a lecture on MOL204 - Applied Bioinformatics (Lecture 8), 18-Oct-2005
Rein Aasland, Department of Molecular Biology, University of Bergen
Please do not distribute without the author's consent!

Sequence vs structure
Chothia & Lesk (1986) EMBO J. 5:

Sequence vs structure & function
Devos et al. (2000) Proteins: Structure, Function, and Genetics 41:

Complete genome sequences give new meaning to database searches
- Sequence similarity searches are one of our most powerful means for protein function prediction and genome annotation

Sequence similarities can be used as a basis for evaluating hypotheses of homology
- Sequences are HOMOLOGOUS if they have a common ancestor
- Homologous genes/proteins diverge during evolution
- Similar sequences are ANALOGOUS if they do not have a common ancestor
- Analogous sequences can CONVERGE during evolution and become more similar

Sequence similarities can be used as a basis for evaluating hypotheses of homology
- Two genes are HOMOLOGS if they have a common ancestor
- Two genes in different species are ORTHOLOGS if they have evolved from a common ancestor by speciation
- Two genes in one species are PARALOGS if they have evolved from a common ancestor by duplication
- NOTE 1: genes are either homologous or not!
- NOTE 2: two genes can be considered partially homologous if they share one homologous domain.

The homologous sequence space
[Figure labels: a protein (super)family; orthologs in different species; paralogs in one species; sequence similarity]
- It is not always possible to distinguish between orthologs, paralogs and just distant homologs

Reasons for performing database searches
- Find a particular sequence: very close homologues; trivial database searches
- Reveal clues to function: identify functional modules
  FIRST: use SMART, Pfam, CDD and InterPro to search for known globular modules
  THEN: use Blast database searches to search for distant relatives, which may reveal additional (unknown) globular domains
- It is often difficult to distinguish true positives (TP) from false positives (FP)

Sequence alignments and database searches
- A hit in a database search, even if apparently significant, may be a false positive; it is a hypothesis of homology!
- Questions relating to database searches:
  What is the architecture of my protein?
  Does it contain globular domains belonging to families with known function?
  Is the sequence similarity strong enough to allow for a precise prediction of function?
  How can I trust the similarities I find?
- In many cases, only structural comparison can prove homology.

Different types of protein
- Globular
- Trans-membrane
- Random coil
- Coiled coil
These require different types of bioinformatical analysis.

Globular domains (topic from lecture 4)
- Secondary structure elements: helices, strands, loops
- Structural motifs: primary organisation of secondary structure elements
- Folds: basic structural elements, one or more motifs
- Domains: elaborated folds, as found in real proteins

Globular domains have hydrophobic cores
- Conserved motifs often correspond to core secondary structure elements
[Figure labels: b14, α8]

Search for known domains
Database searches by comparison to databases of multiple alignments of domains:
- SMART (EMBL, Heidelberg)
- Pfam (St. Louis, Stockholm, Cambridge, Jouy)
- InterPro (EBI, Cambridge)
- CDD (NCBI, US)

Database searches
- Scoring and statistical significance: scoring matrices, gap penalties, E- and P-values
- Different methods: Blast, Fasta, Smith-Waterman, Psi-Blast
- Check for repeats: dot plots (and Pfam, Smart etc.)
- Interpretation: visual inspection, reciprocal searches
- Multiple alignment: Clustal_X, T-Coffee, Muscle

Database searches: Scoring matrices
[Figure: PAM250 Dayhoff matrix over the residues A R N D C Q E G H I L K M F P S T W Y V; the matrix values are not recoverable from the transcript]
- Dayhoff matrices were built in 1978 and based on groups of sequences (at least 85% identity)
- They assume an evolutionary model where every mutation is independent and the molecular clock is constant
- PAM 1 := Percent Accepted Mutations

Database searches: Other scoring matrices
- BLOSUM series (Henikoff, 1992): BLOSUM62 (for about 62% identities) is one of the most commonly used matrices
- Gonnet series (Gonnet, 1992): similar to PAM matrices; often superior, but less frequently used
- Each matrix requires optimised gap penalties
- Advice: try searches with several matrices and different gap penalties.
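
The evolutionary model on the scoring-matrices slide (independent mutations, constant clock) is what allows a PAM1 substitution matrix to be extrapolated to larger distances by repeated matrix multiplication: PAM250 corresponds to PAM1 multiplied by itself 250 times. The sketch below illustrates only that step. It uses a hypothetical two-letter alphabet and made-up probabilities (`PAM1_TOY`), not Dayhoff's data, and the helper names are illustrative.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(matrix, power):
    """Raise a square matrix to a non-negative integer power by squaring."""
    n = len(matrix)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = matrix
    while power > 0:
        if power & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        power >>= 1
    return result


def log_odds(prob, background):
    """Log-odds score (10 * log10 scaling, an assumed convention here)."""
    return 10.0 * math.log10(prob / background)


# Hypothetical 2-letter alphabet: each residue changes with probability 1%
# per PAM unit. Real PAM1 is a 20 x 20 matrix estimated from alignments.
PAM1_TOY = [[0.99, 0.01],
            [0.01, 0.99]]

pam250_toy = mat_pow(PAM1_TOY, 250)
# With equal background frequencies (0.5 each, an assumption of this toy):
match_score = log_odds(pam250_toy[0][0], 0.5)
mismatch_score = log_odds(pam250_toy[0][1], 0.5)
print(pam250_toy[0], match_score, mismatch_score)
```

At 250 PAM units the toy matrix is close to its equilibrium, so the match and mismatch scores are both near zero; a 20-letter alphabet retains much more signal at the same distance.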

Database searches: Heuristic methods
- FASTA: uses word search in look-up tables, followed by Smith-Waterman alignment of the best hits
- BLAST: uses word search in look-up tables and gapped extension, followed by Smith-Waterman alignment of the best hits. More powerful implementation, and a fast server at NCBI.
- Both methods are fast and sensitive, but do not formally guarantee the best alignments.

Database searches: Statistical significance
- E-value (for the score of a match): the number of matches with at least this score that can be expected with the same query in a database of random sequences of the same size.
- P-value (for the score of a match): the probability that a match with at least this score will appear with the same query in a database of random sequences of the same size.
- The current version of Blast uses E-values.

Tentative recommendations:
E-value range             Interpretation
Smaller than e-100        exact matches (same gene, same species)
Between e-50 and e-100    nearly identical genes
Between e-10 and e-50     interesting closely related sequences
Between 1 and e-5         CAN be real homologues
Greater than 1            most likely not relevant

Database searches: Blastp (Blast at NCBI)
Annotations on the Blastp screenshots:
- Limit search by species
- Paste in your sequence here
- Filter removes low complexity regions (GGGSGGGS )
- Increase E value for higher sensitivity and shorter sequences
- Use the smallest database needed for your purpose
- Try different matrices and gap costs
- Increase list size for large protein families
- sequence length
- Check here if you want PSI-BLAST
- Result from CDD
- Limit hits to an E-value range for large families
- request ID
- Press the format button to continue, but you may choose to alter settings!
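
The E-value and P-value defined on the statistical significance slide are linked: if chance matches are treated as Poisson-distributed, P = 1 - exp(-E), and an E-value can be estimated from a bit score S' as E = m * n * 2^(-S') for query length m and database length n. The following is a minimal sketch of those two relations plus the tentative E-value table. The function names are hypothetical, real BLAST uses length-corrected (effective) values of m and n, and the range between e-5 and e-10 is returned as unclassified because the slide does not cover it.

```python
import math


def p_from_e(e_value):
    """Probability of at least one chance match: P = 1 - exp(-E)."""
    return -math.expm1(-e_value)


def e_from_bit_score(bit_score, query_len, db_len):
    """Expected number of chance matches: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


def interpret(e_value):
    """Map an E-value onto the slide's tentative recommendations."""
    if e_value > 1:
        return "most likely not relevant"
    if e_value >= 1e-5:
        return "can be a real homologue"
    if e_value > 1e-10:
        return "not covered by the slide's table"
    if e_value > 1e-50:
        return "interesting closely related sequence"
    if e_value >= 1e-100:
        return "nearly identical gene"
    return "exact match (same gene, same species)"


for e in (1e-155, 2e-14, 2.4, 38):
    print(e, p_from_e(e), interpret(e))
```

For small E the two measures are nearly equal, which is why the distinction only matters for weak hits such as E = 2.4.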

Database searches: Blastp
Annotations on the formatting page:
- choose style of output
- type of alignment
- Check to format for PSI-Blast
- If needed, restrict to a range of E-values

The Blast family:
- Blastn: DNA-DNA
- Blastp: Protein-Protein
- Blastx: DNA-Protein (good for new cDNAs!)
- Tblastn: Protein-DNA (if the gene is not predicted)
- Tblastx: 6-frame x 6-frame

Other database search sites
- ExPASy Blast
- WU-Blast2 at EBI and at Washington U.: easier to try different options
- FASTA3 at the Swiss Institute of Bioinformatics (SIB): quick because less used; uses UniProt
- FASTA3 at EBI: the current best FASTA implementation
- BIC, the Bioccelerator: Smith-Waterman on a special chip
- ParAlign (in Oslo!): novel fast heuristic method

Choice of databases
- SwissProt: the best quality and best annotated database, but also rather incomplete
- TREMBL: translation of the EMBL DNA database, ~equivalent to GenPept
- REFSEQ: a reference database, one entry per object. The ULTIMATE database in theory!
- PDB
- Organism-specific databases

Quality of databases
- Sequencing errors
- Gene prediction errors: very common for large complete genomes, c.f. NURF P301, TOUTATIS
- Annotation errors: primary databases are not corrected unless authors agree, c.f. FSH
- Redundancy: even the non-redundant databases are significantly redundant
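
The translated members of the Blast family listed above (Blastx, Tblastn, Tblastx) compare protein sequences against DNA translated in all six reading frames. As an illustration of what "6-frame" means, here is a small sketch using the standard genetic code; the function names are hypothetical, and unknown or ambiguous codons are simply written as `X`.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper- or lower-case DNA string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Return the six translations keyed '+1'..'+3' and '-1'..'-3'."""
    frames = {}
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for name, protein in six_frames("ATGGCCTAA").items():
    print(name, protein)
```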

A case: Search for yeast SET domains
E-values of the first two hits under different search settings:

              Blast      Blast       Blast              Bic        Bic
              Default    Blosum62    Blosum45           Default    (>gap)
                         no filter   no filter (>gap)
              1e-155     1e-155      1e-126             7e-171     3e-163
              2e-14      2e-14       2e-13              2e-15      4e

Number of hits (one count was lost in transcription): 8, 8, 41, 34

A case: Search for yeast SET domains
Always perform reciprocal searches
[Figure: reciprocal search results; E-values as transcribed: 2e, e-53, 1e-153]

A case: Search for yeast SET domains
Check that alignments are sensible
(The match lines between Query and Sbjct have lost their original column spacing.)

>gi ref NP_ transcription factor containing a SET domain; Set2p
Length = 733
Score = 27.3 bits (59), Expect = 2.4
Identities = 39/152 (25%), Positives = 64/152 (41%), Gaps = 32/152 (21%)

  Query: 37  CSN--WESSRSADIEVRKSSNERDFGVFAADSCVKGELIQEYLGKIDFQKNYQTDPNNDY 94
             C N ++ + A I + K GV A + I EY G D DY
  Sbjct: 109 CQNQRFQKKQYAPIAIFKTKH-KGYGVRAEQDIEANQFIYEYKGEVIEEMEFR-DRLIDY 166

  Query: 95  RLMGTTKPKVLFHPHWPL-----YIDSRETGGLTRYIRRSCEPNVELVTVRPLDEKPRGD
             H ID+ G L R+ SC PN +
  Sbjct:     DQRHFKHFYFMMLQNGEFIDATIKGSLARFCNHSCSPNAYV

  Query: 150 NDCRVKFVLR----AIRDIRKGEEISVEWQWD 177
             N VK LR A R I KGEEI+ ++ D
  Sbjct: 208 NKWVVKDKLRMGIFAQRKILKGEEITFDYNVD

A case: Search for yeast SET domains
A false hit (E=38)

>gi ref NP_ Cdc123p
Length = 360
Score = 23.5 bits (49), Expect = 38
Identities = 23/97 (23%), Positives = 42/97 (42%), Gaps = 10/97 (10%)

  Query: 10  KAITISEYKDKYVKMFIDNHYDDDWVVCSNWESSRSADIEVRKSSNERDFGVFAADSCVK 69
             K+I + K D + E+SRS E + + D+ + D
  Sbjct: 37  KSIVLKSLPKKFIQ-----YLEQDGIKLPQEENSRSVYTEEIIRNEDNDYSDWEDDEDTA 91

  Query: 70  GELIQEYLGKIDFQKNYQ--TDPNNDYRLMGTTKPKV 104
             E +QE IDF + +Q D N+ +G PK+
  Sbjct: 92  TEFVQEVEPLIDFPELHQKLKDALNE---LGAVAPKL 125
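
The advice to always perform reciprocal searches can be automated in a simple form: a hit is kept only if searching back with the hit as the query returns the original sequence as its best match. The sketch below assumes hit lists have already been parsed into `(subject, e_value)` tuples; the sequence names and E-values are invented for illustration, and the function names are hypothetical.

```python
def best_hit(hits):
    """Return the subject with the lowest E-value, or None if there are no hits."""
    if not hits:
        return None
    return min(hits, key=lambda hit: hit[1])[0]


def reciprocal_best_hits(forward, reverse):
    """Pairs (query, subject) where each is the other's best hit.

    forward maps each query to its hits in the target database;
    reverse maps each target sequence to its hits in the query database.
    """
    pairs = []
    for query, hits in forward.items():
        subject = best_hit(hits)
        if subject is not None and best_hit(reverse.get(subject, [])) == query:
            pairs.append((query, subject))
    return sorted(pairs)


# Invented example data: A1 and B1 confirm each other, A2 and B2 do not.
forward = {"A1": [("B1", 1e-50), ("B2", 1e-5)], "A2": [("B2", 2.4)]}
reverse = {"B1": [("A1", 1e-48)], "B2": [("A1", 1e-4), ("A2", 3.0)]}
print(reciprocal_best_hits(forward, reverse))
```

A reciprocal check of this kind supports, but does not prove, a hypothesis of homology; the alignments themselves still need visual inspection.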

Database searches: Using low sequence complexity filter (SEG) [p254]
- Many proteins contain extensive regions with a low sequence complexity
- You can use Blast2sequences to see what is filtered out

Example: C-ter third of FSH_DROME
  HLMQPAGPQQ QQQQQQQQPF GHQQQQQQQQ QQQQQQQQQH MDYVTELLSK GAENVGGMNG
  NHLLNFNLDM AAAYQQKHPQ QQQQQAHNNG FNVADFGMAG FDGLNMTAAS FLDLEPSLQQ
  QQMQQMQLQQ QHHQQQQQQT HQQQQQHQQQ HHQQQQQQLT QQQLQQQQQQ QQQQQHLQQQ
  QHQQQHHQAA NKLLIIPKPI ESMMPSPPDK QQLQQHQKVL PPQQSPSDMK LHPNAAAAAA
  VASAQAKLVQ TFKANEQNLK NASSWSSLAS ANSPQSHTSS SSSSSKAKPA MDSFQQFRNK
  AKERDRLKLL EAAEKEKKNQ KEAAEKEQQR KHHKSSSSSL TSAAVAQAAA IAAATAAAAV
  TLGAAAAAAL ASSASNPSGG SSSGGAGSTS QQAITGDRDR DRDRERERER SGSGGGQSGN
  GNNSSNSANS NGPGSAGSGG SGGGGGSGPA SAGGPNSGGG GTANSNSGGG GGGGGPALLN
  AGSNSNSGVG SGGAASSNSN SSVGGIVGSG GPGSNSQGSS GGGGGGPASG GGMGSGAIDY
  GQQVAVLTQV AANAQAQHVA AAVAAQAILA ASPLGAMESG RKSVHDAQPQ ISRVEDIKAS

Using multiple alignments as basis for similarity searches: PSI-Blast

PSI-BLAST
- Position-specific iterated BLAST
- A very sensitive method for finding distant homologues
- Hits after a BLAST search are selected for alignment and subsequent profile-search
- Conserved positions are emphasised
- The procedure can be repeated until convergence is achieved; i.e. no new matches

PSI-BLAST
- All 6 yeast SET domains are easily identified after 2 rounds

PSI-BLAST: Similarity between SET domains and Methyltransferases
- The similarity between SET domains and a group of plant Methyltransferases was found after 9 rounds of PSI-BLAST (Rea et al. 2000)
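
To make the idea behind the low-complexity filter on the SEG slide concrete, the sketch below masks windows whose amino-acid composition has low Shannon entropy. This is a simplified stand-in and not the SEG algorithm itself, which uses a two-stage procedure with a different complexity measure. The window length of 12 and threshold of 2.2 bits are illustrative values, and the function names are hypothetical.

```python
import math
from collections import Counter


def window_entropy(window):
    """Shannon entropy (bits) of the residue composition of a window."""
    total = len(window)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(window).values())


def mask_low_complexity(seq, window=12, threshold=2.2):
    """Replace every residue that lies in a low-entropy window with 'X'."""
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if window_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = "X"
    return "".join(masked)


# First 60 residues of the FSH_DROME fragment quoted on the slide.
fragment = ("HLMQPAGPQQQQQQQQQQPFGHQQQQQQQQQQQQQQQQQH"
            "MDYVTELLSKGAENVGGMNG")
print(mask_low_complexity(fragment))
```

Because whole windows are masked, a few ordinary residues next to a poly-Q run are masked as well; stricter boundary refinement is one of the things a real filter adds.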

SET domains ARE similar to Rubisco MTase

PSI-BLAST
- PSI-BLAST involves the manual inclusion of new candidate relatives
- A powerful program that must be used with great care!!!

Using multiple alignments as basis for similarity searches: Profiles and HMMs
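
Both PSI-BLAST and profile methods replace a single query sequence with position-specific scores derived from a multiple alignment, so that conserved positions weigh more. The sketch below builds such a score table from a tiny invented, ungapped alignment and scores candidate segments against it. It assumes uniform background frequencies and a flat pseudocount; PSI-BLAST itself uses sequence weighting and substitution-matrix-based pseudocounts, so this shows the principle only, with hypothetical function names.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Per-column log2-odds scores from equal-length aligned sequences."""
    background = 1.0 / len(AMINO_ACIDS)  # uniform background (assumption)
    pssm = []
    for col in range(len(alignment[0])):
        column = [seq[col] for seq in alignment if seq[col] in AMINO_ACIDS]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((column.count(aa) + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm, segment):
    """Sum of position-specific scores; segment must use standard residues."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


# Invented alignment: column 1 (G) and column 3 (E) are fully conserved.
toy_alignment = ["GEEIT", "GEEIS", "GEELT", "GDEIT"]
pssm = build_pssm(toy_alignment)
print(score(pssm, "GEEIT"), score(pssm, "AAAAA"))
```

In an iterated search, sequences scoring above a threshold would be added to the alignment and the table rebuilt, which is where the slide's warning about manually checking new candidate relatives applies.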

Domain families in SMART
Bork et al., EMBL, Heidelberg
- Collection of carefully aligned domains (665)

Domain families in Pfam
Eddy, Bateman, Sonnhammer et al.; St. Louis, Cambridge, Stockholm
- Collection of carefully aligned domains (7503 = Pfam-A) + the automatically aligned ProDom families (= Pfam-B)
- ~73% of sequences contain at least one hit with Pfam-A (or B)
- Implementations at St. Louis and Sanger Centre are different, and both have nice features! - including

Domain families in CDD
NCBI team
- Pfam + SMART + NCBI + COG (11088)
- Search by Reverse Position-Specific BLAST (RPS-BLAST)
- Automatically done in Blastp

Domain families in CDD
NCBI team

Domain families and more in InterPro
Apweiler et al., EBI
- Domains, motifs, families, superfamilies (11007 entries, 2573 domains, 8166 families)
- Well integrated with many other databases, and well mapped onto GO (Gene Ontology)
- Problem: collection of data from various sources of varying quality. I.e. check where data comes from!

Dot plots to reveal repeats

Different types of protein
- Globular
- Trans-membrane
- Random coil
- Coiled coil
These require different types of bioinformatical analysis.
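
The "Dot plots to reveal repeats" slide survives only as a title, so here is a minimal sketch of the technique: a sequence is compared against itself word by word, and internal repeats show up as diagonals parallel to the main diagonal. The word length, the exact-match criterion and the example sequence are illustrative assumptions; real dot-plot programs usually score windows with a substitution matrix instead.

```python
def dot_plot(seq_a, seq_b, window=3):
    """Return text rows with '*' where words of length `window` are identical."""
    rows = []
    for i in range(len(seq_a) - window + 1):
        row = ""
        for j in range(len(seq_b) - window + 1):
            row += "*" if seq_a[i:i + window] == seq_b[j:j + window] else "."
        rows.append(row)
    return rows


# Invented sequence containing the same five residues twice in a row.
repeat_protein = "MKTAYMKTAY"
for row in dot_plot(repeat_protein, repeat_protein):
    print(row)
```

In the printed plot the main diagonal is the trivial self-match, and the second diagonal offset by five positions marks the repeat.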

Prediction of TM regions
- TMpred (ISREC, Geneva): Human EGF Receptor
- TMpred (ISREC, Geneva): Human Rhodopsin (plot labels: A B C D E F G)

Prediction of coiled-coils
- Coils (ISREC, Geneva): Human EEA1

GlobPlot: a method for predicting structure and unstructure
More informationHMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder
HMM applications Applications of HMMs Gene finding Pairwise alignment (pair HMMs) Characterizing protein families (profile HMMs) Predicting membrane proteins, and membrane protein topology Gene finding
More informationStatistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences
Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD Department of Computer Science University of Missouri 2008 Free for Academic
More informationSequence Alignments. Dynamic programming approaches, scoring, and significance. Lucy Skrabanek ICB, WMC January 31, 2013
Sequence Alignments Dynamic programming approaches, scoring, and significance Lucy Skrabanek ICB, WMC January 31, 213 Sequence alignment Compare two (or more) sequences to: Find regions of conservation
More informationBioinformatics. Scoring Matrices. David Gilbert Bioinformatics Research Centre
Bioinformatics Scoring Matrices David Gilbert Bioinformatics Research Centre www.brc.dcs.gla.ac.uk Department of Computing Science, University of Glasgow Learning Objectives To explain the requirement
More informationAn Introduction to Sequence Similarity ( Homology ) Searching
An Introduction to Sequence Similarity ( Homology ) Searching Gary D. Stormo 1 UNIT 3.1 1 Washington University, School of Medicine, St. Louis, Missouri ABSTRACT Homologous sequences usually have the same,
More informationLecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)
Bioinformatics II Probability and Statistics Universität Zürich and ETH Zürich Spring Semester 2009 Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Dr Fraser Daly adapted from
More informationCh. 9 Multiple Sequence Alignment (MSA)
Ch. 9 Multiple Sequence Alignment (MSA) - gather seqs. to make MSA - doing MSA with ClustalW - doing MSA with Tcoffee - comparing seqs. that cannot align Introduction - from pairwise alignment to MSA -
More informationBioinformatics Exercises
Bioinformatics Exercises AP Biology Teachers Workshop Susan Cates, Ph.D. Evolution of Species Phylogenetic Trees show the relatedness of organisms Common Ancestor (Root of the tree) 1 Rooted vs. Unrooted
More informationOverview Multiple Sequence Alignment
Overview Multiple Sequence Alignment Inge Jonassen Bioinformatics group Dept. of Informatics, UoB Inge.Jonassen@ii.uib.no Definition/examples Use of alignments The alignment problem scoring alignments
More informationSingle alignment: Substitution Matrix. 16 march 2017
Single alignment: Substitution Matrix 16 march 2017 BLOSUM Matrix BLOSUM Matrix [2] (Blocks Amino Acid Substitution Matrices ) It is based on the amino acids substitutions observed in ~2000 conserved block
More informationTutorial 4 Substitution matrices and PSI-BLAST
Tutorial 4 Substitution matrices and PSI-BLAST 1 Agenda Substitution Matrices PAM - Point Accepted Mutations BLOSUM - Blocks Substitution Matrix PSI-BLAST Cool story of the day: Why should we care about
More informationG4120: Introduction to Computational Biology
ICB Fall 2003 G4120: Introduction to Computational Biology Oliver Jovanovic, Ph.D. Columbia University Department of Microbiology Copyright 2003 Oliver Jovanovic, All Rights Reserved. Bioinformatics and
More informationMotivating the need for optimal sequence alignments...
1 Motivating the need for optimal sequence alignments... 2 3 Note that this actually combines two objectives of optimal sequence alignments: (i) use the score of the alignment o infer homology; (ii) use
More informationStatistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences
Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD William and Nancy Thompson Missouri Distinguished Professor Department
More informationBioinformatics. Proteins II. - Pattern, Profile, & Structure Database Searching. Robert Latek, Ph.D. Bioinformatics, Biocomputing
Bioinformatics Proteins II. - Pattern, Profile, & Structure Database Searching Robert Latek, Ph.D. Bioinformatics, Biocomputing WIBR Bioinformatics Course, Whitehead Institute, 2002 1 Proteins I.-III.
More informationSara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)
Bioinformática Sequence Alignment Pairwise Sequence Alignment Universidade da Beira Interior (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) 1 16/3/29 & 23/3/29 27/4/29 Outline
More informationLecture 2. The Blast2GO annotation framework
Lecture 2 The Blast2GO annotation framework Annotation steps Modulation of annotation intensity Export/Import Functions Sequence Selection Additional Tools Functional assignment Annotation Transference
More informationEBI web resources II: Ensembl and InterPro
EBI web resources II: Ensembl and InterPro Yanbin Yin Fall 2015 h.p://www.ebi.ac.uk/training/online/course/ 1 Homework 3 Go to h.p://www.ebi.ac.uk/interpro/training.html and finish the second online training