In Silico Identification and Characterization of Effector Catalogs
Chapter 25

Ronnie de Jonge

Melvin D. Bolton and Bart P.H.J. Thomma (eds.), Plant Fungal Pathogens: Methods and Protocols, Methods in Molecular Biology, vol. 835, DOI / _25, Springer Science+Business Media, LLC

Abstract

Many characterized fungal effector proteins are small secreted proteins. Effectors are defined as those proteins that alter host cell structure and/or function, thereby facilitating pathogen infection. The identification of effectors by molecular and cell biology techniques is a difficult task. However, with the availability of whole-genome sequences, these proteins can now be predicted in silico. Here, we describe in detail how to identify and characterize effectors from a defined fungal proteome using in silico techniques.

Key words: Secretome, Effector, Pathogen, Host, Interaction, PHI-base, SignalP, InterProScan, GO Terms, WoLF PSORT

1. Introduction

Whole-genome sequencing has become a popular tool for the study of microbe-host interactions. Genome sequences are available for many fungi, including plant pathogens, symbiotic fungi, and saprophytic fungi, but also for opportunistic mammalian fungal pathogens. Moreover, sequencing new species and additional strains of particular species has become much faster and cheaper with the introduction of next generation sequencing (NGS). Current genome sequencing projects focus on high-throughput methods, because these offer speed, accuracy, and a low price per base pair. Available NGS techniques have been reviewed recently by Metzker (1).

Sequence assembly and subsequent gene model prediction are the next steps in a genome sequencing project. Various tools are available for both, but precise procedures and methods for these tools are not included in this chapter. Genome assembly methods and algorithms have been reviewed extensively by Miller et al. (2). Furthermore, the SEQanswers wiki (wiki/software) and the SEQanswers forum contain many links to, and tips on, programs for NGS sequence assembly. Prediction of genes in fungal genomes (or those of any other eukaryote) can be performed using a variety of approaches, which were recently reviewed by Martinez et al. (3).

To characterize effector catalogs, the genome is first annotated by assigning putative functions to as many genes as possible. Subsequently, the set of secreted proteins, or secretome, is defined, and ultimately the putative effector catalog is identified and characterized.

2. Methods

2.1. Genome Annotation

Gene annotation describes methods to deduce (putative) functions from gene sequences. Various methods for large-scale annotation exist, including blast analyses against the nonredundant (nr), UniProt, or Swissprot sequence databases, and the use of Hidden Markov Models (HMMs) such as those deposited in the Pfam database (4). At present, various pipelines are available for automated annotation of a large set of protein sequences such as a fungal proteome. InterProScan (IPS (5)) and Blast2GO (B2G) are regularly used for whole-genome annotation (6, 7).

IPS Is Installed Locally on a 64-bit Linux Server

For IPS the following procedure is used:

(a) Information and the download repository can be found through:

(b) The initial installation requires the /data/ section, the precompiled binaries (32-bit and 64-bit Linux are supported), and IPS itself (Perl architecture). Decompress all files (according to the instructions) using % gunzip -c filex.tar.gz | tar xvf - and follow the installation instructions in the Installing_InterProScan.txt document (present in the IPS Perl package).

(c) IPS has been developed in Perl 5 and requires that various Perl modules are installed beforehand. A list of required modules can be found in the installation manual; for convenience, install them through CPAN (manual for the Perl CPAN shell: perlcpan.htm).

(d) The IPS installation is basically a configuration process. Run % perl Config.pl from within the iprscan main directory and answer the questions displayed. The chosen options are not permanent; they can be modified later in the configuration files or by rerunning the Config.pl script.
Testing the IPS Installation

(a) The IPS package comes with a set of test sequences, located in the FASTA-formatted file test.seq. Run a test analysis from the ./iprscan/bin/ directory using the syntax: % ./iprscan -cli -i ../test.seq -iprlookup -goterms. Each run produces an output directory containing all the individual result files and a file summarizing all the data (importable into, e.g., Excel).

Running an IPS Analysis

(a) To retrieve as much information as possible, run the IPS analyses using all available modules. The modules typically used are HMMPfam, HMMPanther, BlastProDom, FPrintScan, HMMSmart, HMMPIR, HMMTigr, ProfileScan, HAMAP, patternscan, SuperFamily, and Gene3D.

(b) The syntax is: % ./bin/iprscan -cli -i ./inputseqs.fasta. If started with the -iprlookup -goterms options, IPS tries to retrieve the corresponding InterPro entry and GO term for each hit (useful for further analysis). For problems related to computational load, see Note 1.

(c) The output data can be analyzed using Excel.

Running a B2G Analysis

B2G (6, 7) can also be used for automated annotation. As B2G is written in Java, it can be used on multiple platforms (such as Windows, Linux, and Mac OS). The software is user-friendly, owing to its graphical interface and intuitive applications. We typically use it for annotation, GO-term assignment, and GO-term enrichment analyses. We use the following procedure (largely adapted from the B2G tutorial):

(a) Run the B2G suite from the web start, available at: Start the software by choosing the amount of memory to allocate (this depends on the amount available in the machine running the analyses) and clicking the relevant link (e.g., the 1,500 or 2,048 MB web start), or by manually changing the link setting (see website).

(b) After installation and initialization, protein FASTA files can be loaded through {(File), (Load Fasta File)}. Take care to choose the right format (protein FASTA) when opening your data file.

(c) The first step in the analysis is to blast your data against a database {(Blast), (Run Blast Step)}. Various databases can be used, including nr, Swissprot, and Refseq, but also custom databases, provided that they are available and formatted locally using the Blast package (8). Various options can be changed when running the Blast analyses, including the number of Blast hits that should be recorded (default is 20), the expect-value (default is 1.0E-03; we use 1.0E-06), the blast algorithm (default is BlastP), and the blast mode, which depends on whether you run the analyses locally (WWW-blast) or over the NCBI web service (QBlast@NCBI). The latter is advantageous since no local database maintenance is required.
(d) A run of approximately 10,000 proteins (typical for most fungal genomes) takes around 24 h using this approach. If preferred, IPS results can be imported or, alternatively, IPS can be run from within B2G (see Note 2).

(e) Next, GO terms can be mapped to your data. To this end, go to {(Mapping), (Run GO-Mapping step)}.

(f) Finally, merge the data into the annotation by selecting {(Annotation), (Run Annotation Step)}.

(g) Data can be exported to, e.g., Excel. B2G contains useful tools to extract statistical information from the various analysis steps (see Note 3).

Secretome Prediction

Introduction

Subcellular localization of protein sequences can be determined using various approaches, including detection of targeting signals (such as signal peptides, ER retention signals, and nuclear localization signals), but also by a comparative approach (deriving the most probable site of activity from homology information). The software programs that are commonly used, and that are described in this section, are SignalP 3.0, Phobius, and WoLF PSORT (9-13). SignalP 3.0 offers two distinct methods for detecting N-terminal signal peptides, i.e., a neural network (NN) and an HMM. Phobius combines the HMM-based SignalP 3.0 algorithm with the transmembrane domain predictor TMHMM2 to discriminate between intracellular, plasma-membrane bound, and extracellular proteins. A completely different strategy, based on feature selection and the k nearest neighbors (kNN) classifier, is used by WoLF PSORT, a recent extension of the well-known and broadly used programs PSORT and PSORT-II. In addition to these programs, a number of alternatives are discussed in this section, including Sigcleave, SigPred, Protein Prowler, and SecretomeP. A comprehensive review of the methods available for the computational prediction of subcellular localization has been published in a previous volume of Methods in Molecular Biology (14). In this section, the most common tools are briefly explained, and a method for genome-scale analysis is proposed.

SignalP 3.0

SignalP 3.0 predicts the presence and location of signal peptide cleavage sites in amino acid sequences (10, 13). The SignalP web server comes with the following set of options:

- Organism group (eukaryotes, Gram-negative bacteria, or Gram-positive bacteria).
- Method (NN, HMM, or both).
- Output format (standard (with graphics), full, or short).
- Truncation (the default setting is a cutoff at 70 amino acids).
To run a signal peptide prediction for the complete proteome, the short, no-graphics option is most easily applied. Both the NN and the HMM prediction method can be used; however, for genome-scale analysis the NN method is preferred, as its accuracy has been shown to be higher than that of the HMM method (13). With these options, it is possible to load your proteome in subsets of 2000 protein sequences. Using a simple text editor such as Notepad or WordPad, the complete proteome can be divided over multiple files, each containing a maximum of 2000 protein sequences.

Alternatively, the SignalP 3.0 package can be installed locally on a personal computer or a computer cluster, depending on necessity and availability. A download is available on the SignalP website and can be obtained only after signing the academic license agreement. Installation instructions and a manual page are found on the same page. However, as the average proteome size of fungi is only around 10,000-15,000 proteins, roughly 5-8 individual web server runs suffice to obtain full results. Running these analyses through the web server is therefore favored for smaller laboratories that run them once or for only a few fungal genomes.

The nongraphical output of SignalP consists of two defined sets of results, i.e., one table for the neural network (SignalP-NN) predictions and one table for the hidden Markov model (SignalP-HMM) predictions (see Table 1, which shows typical SignalP 3.0 nongraphical output for Cladosporium fulvum Ecp6, an extracellular fungal protein involved in inhibition of chitin-induced plant immunity (15)). The NN algorithm uses two features of a typical signal peptide, i.e., the presence or absence of a signal peptide cleavage site (depicted by the C-score) and the likelihood that a certain amino acid is part of a signal peptide (depicted by the S-score).
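Returning to the division of the proteome into files of at most 2000 sequences: instead of a text editor, a short script can do the split. The following is a minimal sketch in Python, not part of the original protocol. The function names, the output file naming (`subset_001.fasta`, ...), and the default limit argument are illustrative assumptions; only the limit of 2000 sequences per file comes from the text.

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) tuples from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new record starts; emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(path, out_dir, max_records=2000):
    """Write consecutive subsets of at most max_records sequences.

    Returns the list of files written (hypothetical naming scheme).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written, batch = [], []

    def flush():
        out_path = out_dir / f"subset_{len(written) + 1:03d}.fasta"
        with open(out_path, "w") as out:
            for header, sequence in batch:
                out.write(f">{header}\n{sequence}\n")
        written.append(out_path)
        batch.clear()

    for record in read_fasta(path):
        batch.append(record)
        if len(batch) == max_records:
            flush()
    if batch:  # remaining sequences that do not fill a whole subset
        flush()
    return written
```

A call such as `split_fasta("proteome.fasta", "signalp_input")` (both names are placeholders) would produce one file per web server run. The same function could be reused with a different `max_records` for other servers that restrict the number of sequences per submission.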
The Y-score is derived from both the C-score and the S-score and aims to increase the accuracy of the cleavage site prediction. The S-mean score is obtained by averaging the S-score over the signal peptide, up to the cleavage site derived from the Y-score. With the release of SignalP 3.0 the D-score was introduced, which averages the Y-max and S-mean scores. The D-score (minimum 0.5 for secretory proteins) is used to discriminate between secretory and nonsecretory proteins. This parameter can be varied between 0.4 and 0.6 without major effects on sensitivity or specificity. Emanuelsson et al. (13) reported that within this range (D-score > 0.4 to D-score > 0.6) sensitivity decreases from 98.8 to 95.1% (a difference of 3.7 percentage points) and the rate of false positives (Fp) decreases from 1.4 to 0.4% (a difference of 1 percentage point). Similar scores are reported for the SignalP-HMM based predictions, albeit significantly lower than for SignalP-NN.

Table 1. Typical SignalP 3.0 nongraphical output

#SignalP-NN euk predictions
# Name            Cmax  pos  ?    Ymax  pos  ?    Smax  pos  ?    Smean  ?    D  ?
C. fulvum Ecp6               Y               Y               Y           Y       Y

#SignalP-HMM euk predictions
# Name            !    Cmax  pos  ?    Sprob  ?
C. fulvum Ecp6    S               Y           Y

Phobius

Phobius is an HMM that combines transmembrane topology, signal peptide, and signal peptide cleavage site predictions. It has been developed by the same authors that built the SignalP (9, 13) and TMHMM (16) programs, in an attempt to address the issue of overlapping predictions between these two programs (11). The Phobius web server (17) offers only one set of options: the output setup, i.e., short, long without graphics, or long with graphics. Use the short output option for whole-genome analysis. The web server runs fast, and a complete fungal proteome can be uploaded and run at once (no size restrictions are currently in place). A typical output consists of rows describing the sequence ID, the number of transmembrane domains (TMs), the presence or absence of a signal peptide (SP), and the protein topology in tabular format (Table 2). The output can easily be exported to Excel and analyzed further.

WoLF PSORT

WoLF PSORT (12) is a recent extension of the well-established PSORT-II program (18), but it also uses some PSORT (19) and iPSORT (20) features. WoLF PSORT has been specifically built and trained using various eukaryotic protein sets (including fungal, plant, and animal sequences). The program can predict 12 different compartments or destinations for a protein sequence.
It uses information regarding signal peptide sequence, amino acid preference, and homology to other proteins with known subcellular localization. The various features are ranked and summed using a k nearest neighbors (kNN) classifier. At the web server, only 250 proteins (file size around 200 Kb) can be uploaded. The (only) input option selects the type of organism from which the sequence was derived, being animal, plant, or fungi. A typical output (shown in Table 3) consists of a single line per protein sequence describing, in tabular format, the protein identifier, a details link, and the predicted subcellular localization followed by the kNN score for that localization. If the predictor is rather uncertain, multiple localizations are shown, each with its calculated kNN score.

Table 2. Typical short Phobius output

Sequence ID       TM   SP   Prediction
C. fulvum Ecp6    0    Y    n5-13c18/19o*

Table 3. Typical short WoLF PSORT output

k used for knn is: 27
C. fulvum Ecp6    Details    extr: 27.0

Besides the web server, a stand-alone package, which is not restricted in the number of input sequences, can be obtained and installed on a UNIX system. For genome-scale analysis the web server cannot be used because of the size restriction; therefore, we run the stand-alone program under Linux. Setting up the system is rather straightforward and consists of the following steps (for more detailed information we refer to the readme and installation documentation contained in the installation package):

(a) Download the gzipped tarball from the server web site (wolfpsort.org/).

(b) Uncompress the package using, e.g., gunzip.

(c) Copy the binaries for the appropriate platform (either sparc or i-386; i-386 is standard for most computers) from the bin directory of the package to the common bin directory of your distribution. For this step, administrator rights are required (use sudo for the move and fill in the password upon request).

(d) The installation is now done; however, if the more detailed HTML table output is preferred or required, an additional installation step should be performed, i.e., go to the folder ./bin/psortmodifiedforwolffiles and run psortmodifiedforWoLF with the -t all.seq option (% ./psortmodifiedforWoLF -t all.seq).

(e) The installation directory can now be copied to any preferred location as long as the subdirectory structure is preserved.

(f) The software can be run using the following two commands, depending on the output format of choice.
For a simple text-based result, run:

    % ./bin/runwolfpsortsummaryonly.pl fungi < ./bin/testquery.fasta

For a more elaborate report, containing HTML links to the PSORT-II and iPSORT output, run:

    % ./bin/runwolfpsorthtmltables.pl fungi testout/queryname < ./bin/testquery.fasta
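Before the text-based results are combined with the SignalP and Phobius output, they need to be read into a table. The sketch below is a hedged illustration in Python, not part of the WoLF PSORT package. It assumes, and this is an assumption rather than documented behavior, that each result line consists of an identifier without whitespace followed by comma-separated pairs of a localization code and a score (for example `extr 27` or, as in Table 3, `extr: 27.0`), and that header lines start with `#` or with `k used for`. Adjust the pattern if the output of your installation differs.

```python
import re

# One "localization score" pair; the colon after the code is optional (assumed layout).
PAIR = re.compile(r"([A-Za-z_]+):?\s+(\d+(?:\.\d+)?)")


def parse_wolfpsort_summary(lines):
    """Return {protein_id: [(localization, score), ...]} in the order reported."""
    results = {}
    for line in lines:
        line = line.strip()
        # Skip empty lines and the header that reports the k used for kNN.
        if not line or line.startswith("#") or line.lower().startswith("k used for"):
            continue
        fields = line.split(None, 1)
        protein_id = fields[0]
        rest = fields[1] if len(fields) > 1 else ""
        results[protein_id] = [(loc, float(score)) for loc, score in PAIR.findall(rest)]
    return results


def top_localization(results, protein_id):
    """Return the first (highest-ranked) localization code, or None if absent."""
    pairs = results.get(protein_id, [])
    return pairs[0][0] if pairs else None
```

With the output redirected to a file, `parse_wolfpsort_summary(open("wolfpsort.txt"))` (a placeholder file name) gives a dictionary from which the top-ranked compartment of every protein can be looked up.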
Typically, we run the simple text-based analysis and export the results to Excel, as for the Phobius and SignalP results.

Alternative Methods for the Prediction of Subcellular Localization

Numerous methods exist for the prediction of subcellular localization of protein sequences. The most commonly used programs are described above, yet many more useful tools are available. The types of information used are amino acid content, sequence similarity (homology based), signal peptide prediction, domain signatures, and nonsequence-based methods. The different predictors that apply these methods have been extensively reviewed by Nakai and Horton (14). A recent paper by Casadio et al. (21) reviewed some of the latest results from comparisons between various predictors. The prime predictors based on their review are TargetP (an extension of SignalP for multiple compartments (13)), Protein Prowler (22), LocTree (23), BaCelLo (24), and WoLF PSORT (12).

Removal of False Positives

In order to minimize the number of falsely identified secreted proteins, a number of methods are employed. Plasma membrane bound proteins are removed by both Phobius and WoLF PSORT. Previously, Klee and Sosa (25) demonstrated that WoLF PSORT was the best method for discriminating secreted from plasma membrane bound proteins. WoLF PSORT also includes some feature-based methods to identify nuclear proteins (by nuclear localization signals) and ER retention motifs.

Defining the Secretome

In order to define the definitive secretome, an overlap approach is used. The data gathered with Phobius, SignalP 3.0, and WoLF PSORT are combined, and a protein is classified as secreted only if it meets all of the following criteria: it is predicted to be extracellular by WoLF PSORT, it has a signal peptide and a signal peptide cleavage site according to SignalP 3.0 with a minimal D-score of 0.4, and it is predicted to have no internal transmembrane helix (TM = 0 by Phobius).
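The overlap rule can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the implementation used for the published analyses: the three dictionaries are hypothetical data structures that would have to be filled from the exported SignalP, Phobius, and WoLF PSORT results, the function name is invented, and the localization code `extr` is taken from Table 3.

```python
def define_secretome(wolfpsort, signalp, phobius, d_cutoff=0.4):
    """Return the sorted identifiers of proteins that pass all three criteria.

    wolfpsort: {protein_id: top-ranked localization code, e.g. "extr"}
    signalp:   {protein_id: (D-score, True if a signal peptide is predicted)}
    phobius:   {protein_id: number of predicted transmembrane helices}
    """
    secreted = []
    # Only proteins present in all three result sets can satisfy the overlap rule.
    for protein_id in sorted(set(wolfpsort) & set(signalp) & set(phobius)):
        d_score, has_signal_peptide = signalp[protein_id]
        if (
            wolfpsort[protein_id] == "extr"      # extracellular by WoLF PSORT
            and has_signal_peptide               # signal peptide by SignalP 3.0
            and d_score >= d_cutoff              # minimal D-score of 0.4
            and phobius[protein_id] == 0         # TM = 0 by Phobius
        ):
            secreted.append(protein_id)
    return secreted


if __name__ == "__main__":
    # Invented values for three hypothetical proteins, for illustration only.
    wolfpsort = {"protA": "extr", "protB": "extr", "protC": "nucl"}
    signalp = {"protA": (0.85, True), "protB": (0.62, True), "protC": (0.10, False)}
    phobius = {"protA": 0, "protB": 2, "protC": 0}
    print(define_secretome(wolfpsort, signalp, phobius))  # prints ['protA']
```

In this toy input, protB is rejected because Phobius predicts two transmembrane helices and protC because it fails both the WoLF PSORT and the SignalP criteria. Keeping the D-score cutoff as a parameter makes it easy to explore the 0.4 to 0.6 range discussed for SignalP 3.0.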
This comparative approach has been applied to the Verticillium dahliae and Verticillium albo-atrum genomes by Klosterman et al. (26), and similar methods have been used for Postia placenta and Phanerochaete chrysosporium by van den Wymelenberg et al. (27), and for Candida albicans by Lee et al. (28). By the analysis of unpublished datasets we found that, in general, a high accuracy is obtained when using this comparative approach.

Alternatively, the secretome can be defined by successively adding up all proteins that are predicted to be secreted by any program (or by multiple programs). This method is in part deployed within the fungal secretome database (FSD (29)): there, the programs are run sequentially, but positively scoring proteins are removed before the next step. Typically this method yields high sensitivity but reduced specificity. Similar results were described recently by Lum and Min (30), who describe another database for fungal protein localization predictions based on the same principles as presented in this manuscript.

Effector Identification

In this section a number of methods are described for the characterization and categorization of the secretome, in order to define the set of proteins that may act as effector molecules. The first steps include annotation and categorization. Annotation by Blast, IPS, and B2G has been performed for the complete proteome, and these annotation details can be retrieved for the secreted proteins. Categorization is performed by analysis of all forms of annotation.

Proteins for which neither domains nor informative BlastP hits are observed (thus proteins for which no function can be inferred) are defined as hypothetical proteins. This group is further subdivided into conserved hypothetical proteins, which have only noninformative BlastP hits to other hypothetical proteins (e.g., hypothetical proteins from other fungi), and nonconserved hypothetical proteins, which show no homology to any protein in the nr database. The conserved hypothetical proteins can be subdivided further based on a number of classifications, i.e., the level of homology and the breadth of the observed homology along the tree of life.

Besides hypothetical proteins, we cluster secreted proteins in multiple enzymatic categories, dictated by the carbohydrate-active enzyme database, or CAZy (31). Further divisions are made based on specific enzymatic groups (noncarbohydrate acting, such as phosphatases and proteases) and carbohydrate binding capacity; the remaining proteins are (for now) grouped under miscellaneous proteins.

For the next step we compare the secreted protein set to the pathogen-host interaction database (PHI-base (32)) using stand-alone BlastP.
To this end, the protein FASTA file containing the PHI-base proteins was downloaded and formatted locally using the formatdb program, which is part of the Blast package (8). The formatted PHI-base database can then be used to annotate the secretome using BlastP analyses (P-value < ).

Also, using intrinsic properties of the secretome proteins we can predict and categorize an additional set of potential effector molecules. Generally, it has been observed that effector molecules are small in size (typically less than 300 amino acids) and rich in cysteine residues (33). These features can be used to annotate the secretome and define a set of putative small secreted proteins.

3. Notes

1. Running IPS analyses for a complete proteome (>10,000 proteins) requires a significant amount of memory and processor capacity. The data can be chopped into smaller subsets that are run one after another (e.g., by chaining the commands with && in Linux) to prevent overloading (and subsequent crashes). If problems occur with either memory or processor overload, it can be useful to check and alter the settings in the IPS configuration file related to the chunk size. IPS uses a parallelization procedure to cope effectively with bulk requests. This procedure chops the input file into smaller sets, which are subsequently analyzed in parallel. The size of these sets, also known as chunks, is defined by the chunk size parameter. Increasing the chunk size limits the number of parallel jobs and thereby reduces the processor and memory footprint. For a 64-bit server (8 cores, 12 GB of memory) a rather large chunk size of 500-1,000 is advisable.

2. IPS is included in the B2G program, and full genome annotation is performed using web-based service access to the IPS repository hosted at the European Bioinformatics Institute (EBI). An IPS analysis can be run from {(Annotation), (InterProScan), (Run InterProScan (online))}. It is also possible to import the data from a previous IPS run (e.g., when the analysis was performed on a stand-alone server) by choosing the (Import InterProScan Results (xml)) option. Remember to use the right output format (in fact the default) for the IPS run, which is set with % -format (raw | xml | txt | ebixml | html).

3. After each analysis step (Blast, Mapping, Annotation, and InterProScan), statistics can be generated in B2G and visualized by choosing the appropriate statistics from the drop-down menu under {(Statistics)}.

Acknowledgments

This research was supported by a Vidi grant of the Research Council for Earth and Life Sciences (ALW) of the Netherlands Organization for Scientific Research (NWO), by the European Research Area Network (ERA-NET) Plant Genomics, and by the Centre for BioSystems Genomics (CBSG), which is part of the Netherlands Genomics Initiative and NWO.

References

1. Metzker ML (2010) Sequencing technologies - the next generation. Nat. Rev. Genet. 11.
2. Miller JR, Koren S and Sutton G (2010) Assembly algorithms for next-generation sequencing data. Genomics 95.
3. Martinez D, Grigoriev I and Salamov A (2010) Annotation of protein-coding genes in fungal genomes. Appl. Comput. Math. 9.
4. Finn RD, et al (2010) The Pfam protein families database. Nucl. Acid. Res. 38.
5. Zdobnov EM and Apweiler R (2001) InterProScan - an integration platform for the signature-recognition methods in InterPro. Bioinform. 17.
6. Conesa A, et al (2005) Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinform. 21.
7. Götz S, et al (2008) High-throughput functional annotation and data mining with the Blast2GO suite. Nucl. Acid. Res. 36.
8. Altschul SF, et al (1990) Basic Local Alignment Search Tool. J. Mol. Biol. 215.
9. Nielsen H, et al (1997) Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng. 10.
10. Bendtsen JD, et al (2004) Improved prediction of signal peptides: SignalP 3.0. J. Mol. Biol. 340.
11. Käll L, Krogh A and Sonnhammer ELL (2004) A combined transmembrane topology and signal peptide prediction method. J. Mol. Biol. 338.
12. Horton P, et al (2007) WoLF PSORT: protein localization predictor. Nucl. Acid. Res. 35.
13. Emanuelsson O, et al (2007) Locating proteins in the cell using TargetP, SignalP and related tools. Nat. Protocol. 2.
14. Nakai K and Horton P (2007) Computational prediction of subcellular localization. Method. Mol. Biol. 390.
15. de Jonge R, et al (2010) Conserved fungal LysM effector Ecp6 prevents chitin-triggered immunity in plants. Science 329.
16. Krogh A, et al (2001) Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305.
17. Käll L, Krogh A and Sonnhammer ELL (2007) Advantages of combined transmembrane topology and signal peptide prediction - the Phobius web server. Nucl. Acid. Res. 35.
18. Horton P and Nakai K (1999) PSORT: a program for detecting sorting signals in proteins and determining their subcellular localization. TIBS 24, 34-xx.
19. Nakai K and Kanehisa M (1992) A knowledge base for predicting protein localization sites in eukaryotic cells. Genomics 14.
20. Bannai H, et al (2002) Extensive feature detection of N-terminal protein sorting signals. Bioinform. 18.
21. Casadio R, Martelli PL and Pierleoni A (2008) The prediction of protein subcellular localization from sequence: a shortcut to functional genome annotation. Brief. Func. Genom. Proteom. 7.
22. Hawkins J and Boden M (2006) Detecting and sorting targeting peptides with neural networks and support vector machines. J. Bioinform. Comput. Biol. 4.
23. Nair R and Rost B (2005) Mimicking cellular sorting improves prediction of subcellular localization. J. Mol. Biol. 348.
24. Pierleoni A, et al (2006) BaCelLo: a balanced subcellular localization predictor. Bioinform. 22.
25. Klee EW and Sosa CP (2007) Computational classification of classically secreted proteins. Drug. Discov. Today 12.
26. Klosterman S, et al (2011) Comparative genomics yields insights into niche adaptation of plant vascular wilt pathogens. PLoS Pathog. 7.
27. van den Wymelenberg A, et al (2006) Computational analysis of the Phanerochaete chrysosporium v2.0 genome database and mass spectrometry identification of peptides in ligninolytic cultures reveal complex mixtures of secreted proteins. Fungal Genet. Biol. 43.
28. Lee SA, et al (2003) An analysis of the Candida albicans genome database for soluble secreted proteins using computer-based prediction algorithms. Yeast 20.
29. Choi J, et al (2010) Fungal secretome database: integrated platform for annotation of fungal secretomes. BMC Genomics 11.
30. Lum G and Min XJ (2011) FunSecKB: the fungal secretome knowledgebase. Database (Oxford) 2011.
31. Cantarel BL, et al (2009) The Carbohydrate-Active EnZymes database (CAZy): an expert resource for Glycogenomics. Nucl. Acid. Res. 37.
32. Winnenburg R, et al (2006) PHI-base: a new database for pathogen host interactions. Nucl. Acid. Res. 34.
33. Rep M (2005) Small proteins of plant-pathogenic fungi secreted during host colonization. FEMS Microbiol. Lett. 253, 19-27.
More informationLecture 2. The Blast2GO annotation framework
Lecture 2 The Blast2GO annotation framework Annotation steps Modulation of annotation intensity Export/Import Functions Sequence Selection Additional Tools Functional assignment Annotation Transference
More informationSUPPLEMENTARY INFORMATION
Supplementary information S3 (box) Methods Methods Genome weighting The currently available collection of archaeal and bacterial genomes has a highly biased distribution of isolates across taxa. For example,
More informationGenome Annotation Project Presentation
Halogeometricum borinquense Genome Annotation Project Presentation Loci Hbor_05620 & Hbor_05470 Presented by: Mohammad Reza Najaf Tomaraei Hbor_05620 Basic Information DNA Coordinates: 527,512 528,261
More informationTutorial. Getting started. Sample to Insight. March 31, 2016
Getting started March 31, 2016 Sample to Insight CLC bio, a QIAGEN Company Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 www.clcbio.com support-clcbio@qiagen.com Getting started
More informationBioinformatics methods COMPUTATIONAL WORKFLOW
Bioinformatics methods COMPUTATIONAL WORKFLOW RAW READ PROCESSING: 1. FastQC on raw reads 2. Kraken on raw reads to ID and remove contaminants 3. SortmeRNA to filter out rrna 4. Trimmomatic to filter by
More informationBioinformatics tools for phylogeny and visualization. Yanbin Yin
Bioinformatics tools for phylogeny and visualization Yanbin Yin 1 Homework assignment 5 1. Take the MAFFT alignment http://cys.bios.niu.edu/yyin/teach/pbb/purdue.cellwall.list.lignin.f a.aln as input and
More informationCISC 636 Computational Biology & Bioinformatics (Fall 2016)
CISC 636 Computational Biology & Bioinformatics (Fall 2016) Predicting Protein-Protein Interactions CISC636, F16, Lec22, Liao 1 Background Proteins do not function as isolated entities. Protein-Protein
More informationSCOP. all-β class. all-α class, 3 different folds. T4 endonuclease V. 4-helical cytokines. Globin-like
SCOP all-β class 4-helical cytokines T4 endonuclease V all-α class, 3 different folds Globin-like TIM-barrel fold α/β class Profilin-like fold α+β class http://scop.mrc-lmb.cam.ac.uk/scop CATH Class, Architecture,
More informationImproved Prediction of Signal Peptides: SignalP 3.0
doi:10.1016/j.jmb.2004.05.028 J. Mol. Biol. (2004) 340, 783 795 Improved Prediction of Signal Peptides: SignalP 3.0 Jannick Dyrløv Bendtsen 1, Henrik Nielsen 1, Gunnar von Heijne 2 and Søren Brunak 1 *
More informationAnalysis of N-terminal Acetylation data with Kernel-Based Clustering
Analysis of N-terminal Acetylation data with Kernel-Based Clustering Ying Liu Department of Computational Biology, School of Medicine University of Pittsburgh yil43@pitt.edu 1 Introduction N-terminal acetylation
More informationHomology and Information Gathering and Domain Annotation for Proteins
Homology and Information Gathering and Domain Annotation for Proteins Outline Homology Information Gathering for Proteins Domain Annotation for Proteins Examples and exercises The concept of homology The
More informationDiscriminative Motif Finding for Predicting Protein Subcellular Localization
IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 1 Discriminative Motif Finding for Predicting Protein Subcellular Localization Tien-ho Lin, Robert F. Murphy, Senior Member, IEEE, and
More informationBiology 559R: Introduction to Phylogenetic Comparative Methods Topics for this week:
Biology 559R: Introduction to Phylogenetic Comparative Methods Topics for this week: Course general information About the course Course objectives Comparative methods: An overview R as language: uses and
More informationEBI web resources II: Ensembl and InterPro
EBI web resources II: Ensembl and InterPro Yanbin Yin http://www.ebi.ac.uk/training/online/course/ 1 Homework 3 Go to http://www.ebi.ac.uk/interpro/training.htmland finish the second online training course
More informationA profile-based protein sequence alignment algorithm for a domain clustering database
A profile-based protein sequence alignment algorithm for a domain clustering database Lin Xu,2 Fa Zhang and Zhiyong Liu 3, Key Laboratory of Computer System and architecture, the Institute of Computing
More informationSome Problems from Enzyme Families
Some Problems from Enzyme Families Greg Butler Department of Computer Science Concordia University, Montreal www.cs.concordia.ca/~faculty/gregb gregb@cs.concordia.ca Abstract I will discuss some problems
More informationComprehensive genome analysis of 203 genomes provides structural genomics with new insights into protein family space
Published online February 15, 26 166 18 Nucleic Acids Research, 26, Vol. 34, No. 3 doi:1.193/nar/gkj494 Comprehensive genome analysis of 23 genomes provides structural genomics with new insights into protein
More informationCS612 - Algorithms in Bioinformatics
Fall 2017 Databases and Protein Structure Representation October 2, 2017 Molecular Biology as Information Science > 12, 000 genomes sequenced, mostly bacterial (2013) > 5x10 6 unique sequences available
More informationProtein structure alignments
Protein structure alignments Proteins that fold in the same way, i.e. have the same fold are often homologs. Structure evolves slower than sequence Sequence is less conserved than structure If BLAST gives
More informationHMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder
HMM applications Applications of HMMs Gene finding Pairwise alignment (pair HMMs) Characterizing protein families (profile HMMs) Predicting membrane proteins, and membrane protein topology Gene finding
More informationShibiao Wan and Man-Wai Mak December 2013 Back to HybridGO-Loc Server
Shibiao Wan and Man-Wai Mak December 2013 Back to HybridGO-Loc Server Contents 1 Functions of HybridGO-Loc Server 2 1.1 Webserver Interface....................................... 2 1.2 Inputing Protein
More informationPrediction of signal peptides and signal anchors by a hidden Markov model
In J. Glasgow et al., eds., Proc. Sixth Int. Conf. on Intelligent Systems for Molecular Biology, 122-13. AAAI Press, 1998. 1 Prediction of signal peptides and signal anchors by a hidden Markov model Henrik
More informationSignal peptides and protein localization prediction
Downloaded from orbit.dtu.dk on: Jun 30, 2018 Signal peptides and protein localization prediction Nielsen, Henrik Published in: Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics Publication
More informationStatistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences
Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD Department of Computer Science University of Missouri 2008 Free for Academic
More informationIntegration of functional genomics data
Integration of functional genomics data Laboratoire Bordelais de Recherche en Informatique (UMR) Centre de Bioinformatique de Bordeaux (Plateforme) Rennes Oct. 2006 1 Observations and motivations Genomics
More informationRGP finder: prediction of Genomic Islands
Training courses on MicroScope platform RGP finder: prediction of Genomic Islands Dynamics of bacterial genomes Gene gain Horizontal gene transfer Gene loss Deletion of one or several genes Duplication
More informationSUB-CELLULAR LOCALIZATION PREDICTION USING MACHINE LEARNING APPROACH
SUB-CELLULAR LOCALIZATION PREDICTION USING MACHINE LEARNING APPROACH Ashutosh Kumar Singh 1, S S Sahu 2, Ankita Mishra 3 1,2,3 Birla Institute of Technology, Mesra, Ranchi Email: 1 ashutosh.4kumar.4singh@gmail.com,
More informationTutorial: Structural Analysis of a Protein-Protein Complex
Molecular Modeling Section (MMS) Department of Pharmaceutical and Pharmacological Sciences University of Padova Via Marzolo 5-35131 Padova (IT) @contact: stefano.moro@unipd.it Tutorial: Structural Analysis
More informationGrundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson
Grundlagen der Bioinformatik, SS 10, D. Huson, April 12, 2010 1 1 Introduction Grundlagen der Bioinformatik Summer semester 2010 Lecturer: Prof. Daniel Huson Office hours: Thursdays 17-18h (Sand 14, C310a)
More informationProMass Deconvolution User Training. Novatia LLC January, 2013
ProMass Deconvolution User Training Novatia LLC January, 2013 Overview General info about ProMass Features Basics of how ProMass Deconvolution works Example Spectra Manual Deconvolution with ProMass Deconvolution
More informationSupporting online material
Supporting online material Materials and Methods Target proteins All predicted ORFs in the E. coli genome (1) were downloaded from the Colibri data base (2) (http://genolist.pasteur.fr/colibri/). 737 proteins
More informationTMSEG Michael Bernhofer, Jonas Reeb pp1_tmseg
title: short title: TMSEG Michael Bernhofer, Jonas Reeb pp1_tmseg lecture: Protein Prediction 1 (for Computational Biology) Protein structure TUM summer semester 09.06.2016 1 Last time 2 3 Yet another
More informationReaxys Pipeline Pilot Components Installation and User Guide
1 1 Reaxys Pipeline Pilot components for Pipeline Pilot 9.5 Reaxys Pipeline Pilot Components Installation and User Guide Version 1.0 2 Introduction The Reaxys and Reaxys Medicinal Chemistry Application
More informationUsing AdaBoost for the prediction of subcellular location of prokaryotic and eukaryotic proteins
Mol Divers (2008) 12:41 45 DOI 10.1007/s11030-008-9073-0 FULL LENGTH PAPER Using AdaBoost for the prediction of subcellular location of prokaryotic and eukaryotic proteins Bing Niu Yu-Huan Jin Kai-Yan
More informationComputational methods for predicting protein-protein interactions
Computational methods for predicting protein-protein interactions Tomi Peltola T-61.6070 Special course in bioinformatics I 3.4.2008 Outline Biological background Protein-protein interactions Computational
More informationHOWTO, example workflow and data files. (Version )
HOWTO, example workflow and data files. (Version 20 09 2017) 1 Introduction: SugarQb is a collection of software tools (Nodes) which enable the automated identification of intact glycopeptides from HCD
More informationEXAMPLE-BASED CLASSIFICATION OF PROTEIN SUBCELLULAR LOCATIONS USING PENTA-GRAM FEATURES
EXAMPLE-BASED CLASSIFICATION OF PROTEIN SUBCELLULAR LOCATIONS USING PENTA-GRAM FEATURES Jinsuk Kim 1, Ho-Eun Park 2, Mi-Nyeong Hwang 1, Hyeon S. Son 2,3 * 1 Information Technology Department, Korea Institute
More informationHydroCalc Proteome: a tool to identify distinct characteristics of effector proteins
HydroCalc Proteome: a tool to identify distinct characteristics of effector proteins G.J. da Silva 1,2, R.G.T.M. da Silva 1,2, V.A. Silva 1,2, E. C. Caritá 1, A.L. Fachin 1 and M. Marins 1 1 Unidade de
More informationWe have: We will: Assembled six genomes Made predictions of most likely gene locations. Add a layers of biological meaning to the sequences
Recap We have: Assembled six genomes Made predictions of most likely gene locations We will: Add a layers of biological meaning to the sequences Start with Biology This will motivate the choices we make
More informationSTRUCTURAL BIOINFORMATICS I. Fall 2015
STRUCTURAL BIOINFORMATICS I Fall 2015 Info Course Number - Classification: Biology 5411 Class Schedule: Monday 5:30-7:50 PM, SERC Room 456 (4 th floor) Instructors: Vincenzo Carnevale - SERC, Room 704C;
More informationPGA: A Program for Genome Annotation by Comparative Analysis of. Maximum Likelihood Phylogenies of Genes and Species
PGA: A Program for Genome Annotation by Comparative Analysis of Maximum Likelihood Phylogenies of Genes and Species Paulo Bandiera-Paiva 1 and Marcelo R.S. Briones 2 1 Departmento de Informática em Saúde
More informationTUTORIAL EXERCISES WITH ANSWERS
TUTORIAL EXERCISES WITH ANSWERS Tutorial 1 Settings 1. What is the exact monoisotopic mass difference for peptides carrying a 13 C (and NO additional 15 N) labelled C-terminal lysine residue? a. 6.020129
More informationFUSION OF CONDITIONAL RANDOM FIELD AND SIGNALP FOR PROTEIN CLEAVAGE SITE PREDICTION
FUSION OF CONDITIONAL RANDOM FIELD AND SIGNALP FOR PROTEIN CLEAVAGE SITE PREDICTION Man-Wai Mak and Wei Wang Dept. of Electronic and Information Engineering The Hong Kong Polytechnic University, Hong Kong
More informationST-Links. SpatialKit. Version 3.0.x. For ArcMap. ArcMap Extension for Directly Connecting to Spatial Databases. ST-Links Corporation.
ST-Links SpatialKit For ArcMap Version 3.0.x ArcMap Extension for Directly Connecting to Spatial Databases ST-Links Corporation www.st-links.com 2012 Contents Introduction... 3 Installation... 3 Database
More informationIn-Depth Assessment of Local Sequence Alignment
2012 International Conference on Environment Science and Engieering IPCBEE vol.3 2(2012) (2012)IACSIT Press, Singapoore In-Depth Assessment of Local Sequence Alignment Atoosa Ghahremani and Mahmood A.
More informationStatistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences
Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD William and Nancy Thompson Missouri Distinguished Professor Department
More informationNetworks & pathways. Hedi Peterson MTAT Bioinformatics
Networks & pathways Hedi Peterson (peterson@quretec.com) MTAT.03.239 Bioinformatics 03.11.2010 Networks are graphs Nodes Edges Edges Directed, undirected, weighted Nodes Genes Proteins Metabolites Enzymes
More informationProtein Structure Prediction Using Neural Networks
Protein Structure Prediction Using Neural Networks Martha Mercaldi Kasia Wilamowska Literature Review December 16, 2003 The Protein Folding Problem Evolution of Neural Networks Neural networks originally
More informationLarge Scale Evaluation of Chemical Structure Recognition 4 th Text Mining Symposium in Life Sciences October 10, Dr.
Large Scale Evaluation of Chemical Structure Recognition 4 th Text Mining Symposium in Life Sciences October 10, 2006 Dr. Overview Brief introduction Chemical Structure Recognition (chemocr) Manual conversion
More informationGIS Software. Evolution of GIS Software
GIS Software The geoprocessing engines of GIS Major functions Collect, store, mange, query, analyze and present Key terms Program collections of instructions to manipulate data Package integrated collection
More informationCSCE555 Bioinformatics. Protein Function Annotation
CSCE555 Bioinformatics Protein Function Annotation Why we need to do function annotation? Fig from: Network-based prediction of protein function. Molecular Systems Biology 3:88. 2007 What s function? The
More informationDATA ACQUISITION FROM BIO-DATABASES AND BLAST. Natapol Pornputtapong 18 January 2018
DATA ACQUISITION FROM BIO-DATABASES AND BLAST Natapol Pornputtapong 18 January 2018 DATABASE Collections of data To share multi-user interface To prevent data loss To make sure to get the right things
More informationBioinformatics Exercises
Bioinformatics Exercises AP Biology Teachers Workshop Susan Cates, Ph.D. Evolution of Species Phylogenetic Trees show the relatedness of organisms Common Ancestor (Root of the tree) 1 Rooted vs. Unrooted
More informationHomology. and. Information Gathering and Domain Annotation for Proteins
Homology and Information Gathering and Domain Annotation for Proteins Outline WHAT IS HOMOLOGY? HOW TO GATHER KNOWN PROTEIN INFORMATION? HOW TO ANNOTATE PROTEIN DOMAINS? EXAMPLES AND EXERCISES Homology
More informationINTERACTIVE CLUSTERING FOR EXPLORATION OF GENOMIC DATA
INTERACTIVE CLUSTERING FOR EXPLORATION OF GENOMIC DATA XIUFENG WAN xw6@cs.msstate.edu Department of Computer Science Box 9637 JOHN A. BOYLE jab@ra.msstate.edu Department of Biochemistry and Molecular Biology
More informationMetabolite Identification and Characterization by Mining Mass Spectrometry Data with SAS and Python
PharmaSUG 2018 - Paper AD34 Metabolite Identification and Characterization by Mining Mass Spectrometry Data with SAS and Python Kristen Cardinal, Colorado Springs, Colorado, United States Hao Sun, Sun
More informationThe File Geodatabase API. Craig Gillgrass Lance Shipman
The File Geodatabase API Craig Gillgrass Lance Shipman Schedule Cell phones and pagers Please complete the session survey we take your feedback very seriously! Overview File Geodatabase API - Introduction
More informationBMD645. Integration of Omics
BMD645 Integration of Omics Shu-Jen Chen, Chang Gung University Dec. 11, 2009 1 Traditional Biology vs. Systems Biology Traditional biology : Single genes or proteins Systems biology: Simultaneously study
More informationIntroduction to Evolutionary Concepts
Introduction to Evolutionary Concepts and VMD/MultiSeq - Part I Zaida (Zan) Luthey-Schulten Dept. Chemistry, Beckman Institute, Biophysics, Institute of Genomics Biology, & Physics NIH Workshop 2009 VMD/MultiSeq
More informationA Brief Introduction To. GRTensor. On MAPLE Platform. A write-up for the presentation delivered on the same topic as a part of the course PHYS 601
A Brief Introduction To GRTensor On MAPLE Platform A write-up for the presentation delivered on the same topic as a part of the course PHYS 601 March 2012 BY: ARSHDEEP SINGH BHATIA arshdeepsb@gmail.com
More informationGeodatabase An Overview
Federal GIS Conference February 9 10, 2015 Washington, DC Geodatabase An Overview Ralph Denkenberger - esri Session Path The Geodatabase - What is it? - Why use it? - What types are there? Inside the Geodatabase
More informationNINE CHOICE SERIAL REACTION TIME TASK
instrumentation and software for research NINE CHOICE SERIAL REACTION TIME TASK MED-STATE NOTATION PROCEDURE SOF-700RA-8 USER S MANUAL DOC-025 Rev. 1.3 Copyright 2013 All Rights Reserved MED Associates
More information1-D Predictions. Prediction of local features: Secondary structure & surface exposure
1-D Predictions Prediction of local features: Secondary structure & surface exposure 1 Learning Objectives After today s session you should be able to: Explain the meaning and usage of the following local
More informationTutorial 1: Setting up your Skyline document
Tutorial 1: Setting up your Skyline document Caution! For using Skyline the number formats of your computer have to be set to English (United States). Open the Control Panel Clock, Language, and Region
More informationSynteny Portal Documentation
Synteny Portal Documentation Synteny Portal is a web application portal for visualizing, browsing, searching and building synteny blocks. Synteny Portal provides four main web applications: SynCircos,
More informationMathangi Thiagarajan Rice Genome Annotation Workshop May 23rd, 2007
-2 Transcript Alignment Assembly and Automated Gene Structure Improvements Using PASA-2 Mathangi Thiagarajan mathangi@jcvi.org Rice Genome Annotation Workshop May 23rd, 2007 About PASA PASA is an open
More informationLarge-Scale Genomic Surveys
Bioinformatics Subtopics Fold Recognition Secondary Structure Prediction Docking & Drug Design Protein Geometry Protein Flexibility Homology Modeling Sequence Alignment Structure Classification Gene Prediction
More informationSupplementary text for the section Interactions conserved across species: can one select the conserved interactions?
1 Supporting Information: What Evidence is There for the Homology of Protein-Protein Interactions? Anna C. F. Lewis, Nick S. Jones, Mason A. Porter, Charlotte M. Deane Supplementary text for the section
More informationInnovation. The Push and Pull at ESRI. September Kevin Daugherty Cadastral/Land Records Industry Solutions Manager
Innovation The Push and Pull at ESRI September 2004 Kevin Daugherty Cadastral/Land Records Industry Solutions Manager The Push and The Pull The Push is the information technology that drives research and
More informationSupervised Ensembles of Prediction Methods for Subcellular Localization
In Proc. of the 6th Asia-Pacific Bioinformatics Conference (APBC 2008), Kyoto, Japan, pp. 29-38 1 Supervised Ensembles of Prediction Methods for Subcellular Localization Johannes Aßfalg, Jing Gong, Hans-Peter
More informationThe human transmembrane proteome
Dobson et al. Biology Direct (2015) 10:31 DOI 10.1186/s13062-015-0061-x RESEARCH Open Access The human transmembrane proteome László Dobson, István Reményi and Gábor E. Tusnády * Abstract Background: Transmembrane
More informationCross Discipline Analysis made possible with Data Pipelining. J.R. Tozer SciTegic
Cross Discipline Analysis made possible with Data Pipelining J.R. Tozer SciTegic System Genesis Pipelining tool created to automate data processing in cheminformatics Modular system built with generic
More informationEasySDM: A Spatial Data Mining Platform
EasySDM: A Spatial Data Mining Platform (User Manual) Authors: Amine Abdaoui and Mohamed Ala Al Chikha, Students at the National Computing Engineering School. Algiers. June 2013. 1. Overview EasySDM is
More informationInvestigation 3: Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST
Investigation 3: Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST Introduction Bioinformatics is a powerful tool which can be used to determine evolutionary relationships and
More informationFuncNet a distributed platform for high-throughput protein function analysis. Andrew Clegg University College London. funcnet.eu
FuncNet a distributed platform for high-throughput protein function analysis Andrew Clegg University College London Outline of talk Introduction and background Working with FuncNet APIs and extensions
More information08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega
BLAST Multiple Sequence Alignments: Clustal Omega What does basic BLAST do (e.g. what is input sequence and how does BLAST look for matches?) Susan Parrish McDaniel College Multiple Sequence Alignments
More informationKarsten Vennemann, Seattle. QGIS Workshop CUGOS Spring Fling 2015
Karsten Vennemann, Seattle 2015 a very capable and flexible Desktop GIS QGIS QGIS Karsten Workshop Vennemann, Seattle slide 2 of 13 QGIS - Desktop GIS originally a GIS viewing environment QGIS for the
More informationMeiothermus ruber Genome Analysis Project
Augustana College Augustana Digital Commons Meiothermus ruber Genome Analysis Project Biology 2018 Predicted ortholog pairs between E. coli and M. ruber are b3456 and mrub_2379, b3457 and mrub_2378, b3456
More informationSVM Kernel Optimization: An Example in Yeast Protein Subcellular Localization Prediction
SVM Kernel Optimization: An Example in Yeast Protein Subcellular Localization Prediction Ṭaráz E. Buck Computational Biology Program tebuck@andrew.cmu.edu Bin Zhang School of Public Policy and Management
More informationChapter 5. Proteomics and the analysis of protein sequence Ⅱ
Proteomics Chapter 5. Proteomics and the analysis of protein sequence Ⅱ 1 Pairwise similarity searching (1) Figure 5.5: manual alignment One of the amino acids in the top sequence has no equivalent and
More informationLast updated: Copyright
Last updated: 2012-08-20 Copyright 2004-2012 plabel (v2.4) User s Manual by Bioinformatics Group, Institute of Computing Technology, Chinese Academy of Sciences Tel: 86-10-62601016 Email: zhangkun01@ict.ac.cn,
More informationX!TandemPipeline (Myosine Anabolisée) validating, filtering and grouping MSMS identifications
X!TandemPipeline 3.3.3 (Myosine Anabolisée) validating, filtering and grouping MSMS identifications Olivier Langella and Benoit Valot langella@moulon.inra.fr; valot@moulon.inra.fr PAPPSO - http://pappso.inra.fr/
More information