In Silico Identification and Characterization of Effector Catalogs

Chapter 25

Ronnie de Jonge

Melvin D. Bolton and Bart P.H.J. Thomma (eds.), Plant Fungal Pathogens: Methods and Protocols, Methods in Molecular Biology, vol. 835, DOI / _25, Springer Science+Business Media, LLC

Abstract

Many characterized fungal effector proteins are small secreted proteins. Effectors are defined as those proteins that alter host cell structure and/or function, thereby facilitating pathogen infection. The identification of effectors by molecular and cell biology techniques is a difficult task. However, with the availability of whole-genome sequences, these proteins can now be predicted in silico. Here, we describe in detail how to identify and characterize effectors from a defined fungal proteome using in silico techniques.

Key words: Secretome, Effector, Pathogen, Host, Interaction, PHI-base, SignalP, InterProScan, GO Terms, WoLF PSORT

1. Introduction

Whole-genome sequencing has become a popular tool for the study of microbe-host interactions. Genome sequences are available for many fungi, including plant pathogens, symbiotic fungi, and saprophytic fungi, but also for opportunistic mammalian fungal pathogens. Moreover, sequencing new species and additional strains of particular species has become much faster and cheaper with the introduction of next generation sequencing (NGS). Current genome sequencing projects focus on high-throughput methods, as these combine speed, accuracy, and a low price per base pair. Available NGS techniques have been reviewed recently by Metzker (1).

Sequence assembly and subsequent gene model prediction are the next steps in a genome sequencing project. Various tools are available for sequence assembly and gene model prediction, but precise procedures and methods for these tools are not included in this chapter. Genome assembly methods and algorithms have been reviewed extensively by Miller et al. (2). Furthermore, the SEQanswers wiki ( wiki/software ) and SEQanswers forum ( ) contain many links to, and tips on, programs for NGS sequence assembly. Prediction of genes in fungal genomes (or those of any other eukaryote) can be performed using a variety of approaches, which were recently reviewed by Martinez et al. (3).

To characterize effector catalogs, first the genome is annotated by assigning putative functions to as many genes as possible. Subsequently, the set of secreted proteins, or secretome, is defined, and ultimately the putative effector catalog is identified and characterized.

2. Methods

2.1. Genome Annotation

Gene annotation describes methods to deduce (putative) functions from gene sequences. Various methods for large-scale annotation exist, including blast analyses against the nonredundant (nr), the Uniprot, or the Swissprot sequence database, and the use of Hidden Markov Models (HMMs) such as those deposited in the Pfam database (4). At present, various pipelines are available for automated annotation of a large set of protein sequences such as a fungal proteome. InterProScan (IPS; (5)) and Blast2GO (B2G) are regularly used for whole-genome annotation (6, 7).

IPS Is Installed Locally on a 64-bit Linux Server

For IPS the following procedure is used:
(a) Information and the download repository can be found through:
(b) The initial installation requires the /data/ section, the precompiled binaries (32-bit and 64-bit Linux are supported), and IPS itself (Perl architecture). Decompress all files (according to the instructions) using % gunzip -c filex.tar.gz | tar xvf - and follow the installation instructions in the Installing_InterProScan.txt document (present in the IPS Perl package).
(c) IPS has been developed in Perl5 and requires that various Perl modules are installed beforehand. A list of required modules can be found in the installation manual, and for convenience installation should be done through CPAN (manual for Perl CPAN Shell: perlcpan.htm ).
(d) The IPS installation is basically a configuration process. Run % perl Config.pl from within the iprscan main directory and answer the questions displayed. Options are not permanent; they can later be modified in the configuration files or by rerunning the Config.pl script.

Testing the IPS Installation

(a) The IPS package comes with a set of test sequences, located in a FASTA-formatted file. Run a test analysis from the ./iprscan/bin/ directory using the syntax: % ./iprscan -cli -i ../test.seq -iprlookup -goterms. Each run produces an output directory containing all the individual files and a file summarizing all the data (importable into, e.g., Excel).

Running an IPS Analysis

(a) To identify as much information as possible, run the IPS analyses using all available modules. The modules typically used are HMMPfam, HMMPanther, BlastProDom, FPrintScan, HMMSmart, HMMPIR, HMMTigr, ProfileScan, HAMAP, patternscan, SuperFamily, and Gene3D.
(b) The syntax is: % ./bin/iprscan -cli -i ./inputseqs.fasta. If initialized with the -iprlookup -goterms options, IPS tries to retrieve the corresponding InterPro entry and GO term (useful for further analysis). For problems related to computational size, see Note 1.
(c) Data output can be analyzed using Excel.

Running a B2G Analysis

B2G (6, 7) can also be used for automated annotation. As B2G is written in Java, it can be used on multiple platforms (such as Windows OS, Linux, and Mac OS). The software is user-friendly, owing to its graphical interface and intuitive applications. We typically use it for annotation, GO-term assignment, and GO-term enrichment analyses. We use the following procedure (largely adapted from the B2G tutorial; ):
(a) Run the B2G suite from the web start, available at: Start the software by choosing an appropriate amount of memory (depending on the amount available in the machine running the analyses) and clicking the relevant link (e.g., the 1,500 or 2,048 MB web start), or by manually changing the link setting (see website).
(b) After installation and initialization, protein fasta files can be loaded by {(File), (Load Fasta File)}. Take care to choose the right format (protein fasta formatted) when opening your data file.
(c) The first step in the analysis is blasting your data against a database {(Blast), (Run Blast Step)}. Various databases are possible, including nr, Swissprot, and Refseq, but custom databases can also be used if they are available and formatted locally using the Blast package (8). Various options can be changed when running the Blast analyses, including the number of Blast hits that should be recorded (default is 20), the expect-value (default is 1.0E-03, we use 1.0E-06), the blast algorithm (default is BlastP), and the blast mode, depending on whether you are running the analyses locally (WWW-blast) or over the NCBI web service (QBlast@NCBI). The latter is advantageous since no local database maintenance is required.

(d) A run for approximately 10,000 proteins (typical for most fungal genomes) takes around 24 h using this approach. If preferred, IPS results can be imported or, alternatively, IPS can be run from within B2G (see Note 2).
(e) Next, GO terms can be mapped to your data. To this end, go to {(Mapping), (Run GO-Mapping step)}.
(f) Finally, merge the data into the annotation by selecting {(Annotation), (Run Annotation Step)}.
(g) Data can be exported to, e.g., Excel. B2G contains useful tools to extract statistical information from the various analysis steps (see Note 3).

2.2. Secretome Prediction

Introduction

Subcellular localization of protein sequences can be determined using various approaches, including detection of targeting signals (such as signal peptides, ER retention signals, and nuclear localization signals), but also by a comparative approach (deriving the most probable site of activity from homology information). The software programs that are commonly used, and that are described in this section, are SignalP 3.0, Phobius, and WoLF PSORT (9-13). SignalP 3.0 detects N-terminal signal peptides in proteins entering the secretory pathway and offers two distinct prediction methods, i.e., a neural network (NN) and an HMM. Phobius uses the HMM-based SignalP 3.0 algorithm in combination with the transmembrane domain predictor TMHMM2 to discriminate between intracellular, plasma membrane-bound, and extracellular proteins. A completely different strategy, based on feature selection and the k-nearest neighbors (kNN) classifier, is used by WoLF PSORT, a recent extension of the well-known and broadly used programs PSORT and PSORT-II. In addition to these programs, a number of alternatives are discussed in this section, including Sigcleave, SigPred, Protein Prowler, and SecretomeP. A comprehensive review of the methods available for the computational prediction of subcellular localization has been published in a previous volume of Methods in Molecular Biology (14). In this section, the most common tools are briefly explained, and a method for genome-scale analysis is proposed.

SignalP 3.0

SignalP 3.0 predicts the presence and location of signal peptide cleavage sites in amino acid sequences (10, 13). The SignalP web server ( ) comes with the following set of options:
- Organism group (Eukaryotes, Gram-negative, and Gram-positive bacteria).
- Method (NN, HMM, or both).
- Output format (standard (with graphics), full, or short).
- Truncation (default setting is a cutoff at 70 amino acids).

To run a signal peptide prediction for the complete proteome, the short, no-graphics option is most easily applied. Both the NN and HMM prediction methods can be used; however, for genome-scale analysis the NN method is preferred, as its accuracy has been shown to be higher than that of the HMM method (13). With these options, it is possible to load your proteome in subsets of 2,000 protein sequences. Using a simple text editor such as Notepad or WordPad, the complete proteome can be divided over multiple files, each containing a maximum of 2,000 protein sequences. Alternatively, the SignalP 3.0 package can be installed locally on your personal computer or computer cluster, depending on necessity and availability. A download is available on the SignalP website, which can be obtained only after signing the academic license agreement ( ). Installation instructions and a manual page are found on the same page. However, as the average proteome size of fungi is only around 10,000-15,000 proteins, only 5-7 individual web server runs are needed to obtain full results. Running these analyses through the web server is therefore favored for smaller laboratories that run them once or for only a few fungal genomes.

The nongraphical output of SignalP consists of two defined sets of results, i.e., one table for the neural network (SignalP-NN) predictions and one table for the hidden Markov model (SignalP-HMM) predictions (see Table 1 for an example using data for Cladosporium fulvum Ecp6, an extracellular fungal protein involved in inhibition of chitin-induced plant immunity (15)). The NN algorithm uses two features of a typical signal peptide, i.e., the presence/absence of a signal peptide cleavage site (depicted by the C-score) and the likelihood of a certain amino acid being part of a signal peptide (depicted by the S-score). The Y-score is derived from both the C-score and the S-score and aims to increase the accuracy of the cleavage site prediction. The S-mean score is derived by averaging the S-score over the signal peptide up to the cleavage site derived from the Y-score. With the release of SignalP 3.0 the D-score has been introduced, which averages the Y-max and S-mean scores. The D-score (minimum 0.5 for secretory proteins) is used to discriminate between secretory and nonsecretory proteins. This parameter can be varied between 0.4 and 0.6 without major effects on either sensitivity or specificity. Emanuelsson et al. (13) reported that within this range (D-score > 0.4 to D-score > 0.6) sensitivity decreases from 98.8 to 95.1% (a difference of 3.7 percentage points) and the rate of false positives (Fp) decreases from 1.4 to 0.4% (a difference of 1 percentage point). Similar scores are reported for the SignalP-HMM-based predictions, albeit significantly lower than for SignalP-NN.

Table 1
Typical SignalP 3.0 nongraphical output

# SignalP-NN euk predictions
# Name            Cmax  pos  ?   Ymax  pos  ?   Smax  pos  ?   Smean  ?   D  ?
C. fulvum Ecp6               Y              Y              Y          Y      Y

# SignalP-HMM euk predictions
# Name            !   Cmax  pos  ?   Sprob  ?
C. fulvum Ecp6    S              Y          Y

Phobius

Phobius is an HMM that combines transmembrane topology, signal peptide, and signal peptide cleavage site predictions. It has been developed by the same authors who built the SignalP (9, 13) and TMHMM (16) programs, in an attempt to address the issue of overlapping predictions by these two programs (11). The Phobius web server ( ; (17)) offers only one set of options: the output setup, i.e., short, long without graphics, or long with graphics. Use the short output option for whole-genome analysis. The web server runs fast, and a complete fungal proteome can be uploaded and run at once (no size restrictions are currently in place). A typical output consists of rows describing the sequence ID, the number of transmembrane domains (TMs), the presence or absence of a signal peptide (SP), and the protein topology in tabular format (Table 2). The output can easily be exported to Excel and analyzed further.

WoLF PSORT

WoLF PSORT (12) is a recent extension of the well-established PSORT-II program (18), but it also uses some PSORT (19) and iPSORT (20) features. WoLF PSORT has been specifically built and trained using various eukaryotic protein sets (including fungal, plant, and animal sequences). The program can predict 12 different compartments or destinations for a protein sequence.
It uses information regarding signal peptide sequence, amino acid preference, and homology to other proteins with known subcellular localization. The various features are ranked and summed using a kNN classifier. At the web server (available at ), only 250 proteins (file size around 200 Kb) can be uploaded. The (only) input option selects the type of organism from which the sequence was derived, being animal, plant, or fungi. A typical output (shown in Table 3) consists of a single line per protein sequence, describing in tabular format the protein identifier, a details link, and the predicted subcellular localization followed by the kNN score belonging to this predicted localization. If the predictor is rather uncertain, multiple localizations are shown, each with its calculated kNN score.

Table 2
Typical short Phobius output

Sequence ID       TM   SP   Prediction
C. fulvum Ecp6    0    Y    n5-13c18/19o*

Table 3
Typical short WoLF PSORT output

k used for kNN is: 27
C. fulvum Ecp6    Details    extr: 27.0

Besides the web server, a stand-alone package, not restricted in the number of input sequences, can be obtained and installed on a UNIX system. For genome-scale analysis the web server cannot be used because of the size restriction; therefore, we run the stand-alone program under Linux. Setting up the system is rather straightforward and consists of the following steps (for more detailed information we refer to the readme and installation documentation contained in the installation package):
(a) Download the gzipped tarball from the server web site ( wolfpsort.org/ ).
(b) Uncompress the package using, e.g., gunzip.
(c) Copy the binaries for the appropriate platform (either sparc or i-386; i-386 is standard for most computers) from the bin directory to the common /bin/ directory of your distribution (typically ./bin/ ). For this step, administrator rights are required (% sudo mv./bin/bin/; fill in the password upon request).
(d) The installation is now done; however, if the more detailed HTML table output is preferred or required, an additional installation step should be performed, i.e., go to the folder ./bin/psortmodifiedforwolffiles and run psortmodifiedforWoLF with the -t all.seq option (% ./psortmodifiedforWoLF -t all.seq).
(e) The installation directory can now be copied to any preferred location as long as the subdirectory structure is preserved.
(f) The software can be run using one of the following two commands, depending on the output format of choice. Run % ./bin/runwolfpsortsummaryonly.pl fungi < ./bin/testquery.fasta for a simple text-based result. Run % ./bin/runwolfpsorthtmltables.pl fungi testout/queryname < ./bin/testquery.fasta for a more elaborate report, containing HTML links to the PSORT-II and iPSORT output.
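Both the SignalP web server (subsets of 2,000 sequences) and the WoLF PSORT web server (250 proteins) limit the input size, so the proteome often has to be divided over several files. As an alternative to doing this by hand in a text editor, the following is a minimal Python sketch. The function names, the output file naming scheme, and the example paths are hypothetical and are not part of any of the packages described here.

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) tuples from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None and line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def _write_batch(batch, out_dir, stem, index):
    """Write one subset to <stem>.partNNN.fasta (hypothetical naming scheme)."""
    out_path = out_dir / f"{stem}.part{index:03d}.fasta"
    with open(out_path, "w") as handle:
        for header, sequence in batch:
            handle.write(f">{header}\n{sequence}\n")
    return out_path


def split_fasta(path, out_dir, max_records=2000):
    """Split a FASTA file into files of at most max_records sequences each.

    Returns the list of written paths, in input order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(path).stem
    outputs, batch = [], []
    for record in read_fasta(path):
        batch.append(record)
        if len(batch) == max_records:
            outputs.append(_write_batch(batch, out_dir, stem, len(outputs) + 1))
            batch = []
    if batch:  # remaining sequences that do not fill a complete subset
        outputs.append(_write_batch(batch, out_dir, stem, len(outputs) + 1))
    return outputs
```

For example, split_fasta("proteome.fasta", "signalp_input", max_records=2000) would prepare subsets for SignalP, and max_records=250 would match the WoLF PSORT web server limit. Each sequence is written on a single line, which standard FASTA readers accept.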

Typically, we run the simple text-based results and export these to Excel, as we do for the Phobius and SignalP results.

Alternative Methods for the Prediction of Subcellular Localization

Numerous methods exist for the prediction of subcellular localization of protein sequences. The most commonly used programs are described above, yet many more useful tools are available. The types of information used are amino acid content, sequence similarity (homology based), signal peptide prediction, domain signatures, and nonsequence-based methods. The different predictors that apply these methods have been extensively reviewed by Nakai and Horton (14). A recent paper by Casadio et al. (21) reviewed some of the latest results from comparisons between various predictors. The prime predictors based on their review are TargetP (an extension of SignalP for multiple compartments) (13), Protein Prowler (22), LocTree (23), BaCelLo (24), and WoLF PSORT (12).

Removal of False Positives

In order to minimize the number of falsely identified secreted proteins, a number of methods are employed. Plasma membrane-bound proteins are removed by both Phobius and WoLF PSORT. Previously, Klee and Sosa (25) demonstrated that WoLF PSORT was the best method for discriminating secreted from plasma membrane-bound proteins. Also, WoLF PSORT includes some feature-based methods to identify nuclear proteins (by nuclear localization signals) and ER retention motifs.

Defining the Secretome

In order to define the definitive secretome, an overlap approach is used. The data gathered before using Phobius, SignalP 3.0, and WoLF PSORT are combined, and only proteins that satisfy all of the following criteria are classified as secreted proteins: they are predicted to be extracellular by WoLF PSORT, they have a signal peptide and signal peptide cleavage site according to SignalP 3.0 with a minimal D-score of 0.4, and they are predicted to have no internal transmembrane helix (TM = 0 by Phobius).
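The overlap rule described above can be sketched in a few lines of Python. This is an illustration only: the Prediction container, its field names, and the example values are hypothetical, and the sketch assumes that the three result tables have already been parsed into one record per protein. It also assumes that a D-score equal to the cutoff counts as passing.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Prediction:
    """Results for one protein, collected from the three predictors.

    Hypothetical container: the field names are illustrative only.
    """
    signalp_sp: bool        # SignalP 3.0 predicts a signal peptide with cleavage site
    signalp_d_score: float  # SignalP 3.0 D-score
    phobius_tm: int         # number of transmembrane helices predicted by Phobius
    wolfpsort_site: str     # top-ranked WoLF PSORT localization, e.g. "extr"


def is_secreted(p: Prediction, min_d_score: float = 0.4) -> bool:
    """Overlap rule: every predictor must support secretion."""
    return (
        p.wolfpsort_site == "extr"
        and p.signalp_sp
        and p.signalp_d_score >= min_d_score
        and p.phobius_tm == 0
    )


def define_secretome(predictions: Dict[str, Prediction],
                     min_d_score: float = 0.4) -> List[str]:
    """Return the sorted identifiers of proteins that pass the overlap rule."""
    return sorted(pid for pid, p in predictions.items()
                  if is_secreted(p, min_d_score))


# Made-up example values, not real predictor output.
example = {
    "protA": Prediction(True, 0.85, 0, "extr"),   # passes every criterion
    "protB": Prediction(True, 0.90, 2, "plas"),   # membrane-bound, not extracellular
    "protC": Prediction(True, 0.35, 0, "extr"),   # D-score below the cutoff
    "protD": Prediction(False, 0.10, 0, "cyto"),  # no signal peptide
}
print(define_secretome(example))  # prints ['protA']
```

Keeping the D-score cutoff as a parameter makes it easy to see how the secretome size responds when the threshold is varied within the 0.4 to 0.6 range discussed for SignalP 3.0.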
This comparative approach has been applied to the Verticillium dahliae and Verticillium albo-atrum genomes by Klosterman et al. (26), and similar methods have been used for Postia placenta and Phanerochaete chrysosporium by van den Wymelenberg et al. (27), and for Candida albicans by Lee et al. (28). In our analyses of unpublished datasets, this comparative approach generally yielded high accuracy. Alternatively, the secretome can be defined by adding up all proteins that are predicted to be secreted by any program (or by multiple programs). This method is in part deployed within the fungal secretome database (FSD, (29)), where the programs are run sequentially but positively scoring proteins are removed before the next step. Typically, this method yields high sensitivity but reduced specificity. Similar results were described recently by Lum and Min (30), who describe another database for fungal protein localization predictions based on the same principles as presented in this manuscript.

2.3. Effector Identification

In this section a number of methods are described for the characterization and categorization of the secretome, in order to define the set of proteins that may act as effector molecules. The first steps include annotation and categorization. Annotation by Blast, IPS, and B2G has been performed for the complete proteome, and these annotation details can be obtained for the secreted proteins. Categorization is performed by analysis of all forms of annotation. Proteins for which neither domains nor informative BlastP hits are observed (thus proteins for which no function can be inferred) are defined as hypothetical proteins. This group is further subdivided into proteins with only noninformative BlastP hits to other hypothetical proteins, e.g., hypothetical proteins from other fungi (conserved hypothetical proteins), and proteins with no observed homology in the nr database (nonconserved hypothetical proteins). Further subdivision of the conserved hypothetical proteins can be based on a number of classifications, i.e., the level of homology and the breadth of observed homology across the tree of life. Besides hypothetical proteins, we cluster secreted proteins in multiple enzymatic categories, dictated by the carbohydrate-active enzyme database, or CAZy ( ; (31)). Further divisions are made based on specific enzymatic groups (noncarbohydrate acting, such as phosphatases and proteases) and carbohydrate-binding capacity, and the remaining proteins are (for now) grouped under miscellaneous proteins. For the next step we compare the secreted protein set to the pathogen-host interaction database (PHI-base; ; (32)) using stand-alone BlastP.
To this end, the protein fasta file containing the PHI-base proteins was downloaded and formatted locally using the formatdb algorithm, which is part of the Blast package (8). The formatted PHI-base database can then be used to annotate the secretome using BlastP analyses ( P-value < ). Also, using intrinsic properties of the secretome proteins, we can predict and categorize an additional set of potential effector molecules. Generally, it has been observed that effector molecules are small in size (typically less than 300 amino acids) and rich in cysteine residues (33). These features can be used to annotate the secretome and define a set of putative small secreted proteins.

3. Notes

1. Running IPS analyses for a complete proteome (>10,000 proteins) requires a significant amount of memory and processor computing capacity. Data can be chopped into smaller bits and run sequentially, e.g., by chaining the commands with the && operator in Linux, to prevent overloading (and subsequent crashes). If problems occur with either memory or processor overload, it can be useful to check and alter the settings in the IPS configuration file related to the chunk size. IPS uses a parallelization procedure to effectively cope with bulk requests. This procedure chops the input file into smaller sets, which are subsequently analyzed in parallel. The size of these sets, also known as chunks, is defined by the chunk size parameter. Increasing the chunk size will limit the number of parallel jobs and subsequently reduce the processor and memory footprint. For a 64-bit server (8 cores, 12 GB of memory) a rather large chunk size of 500-1,000 is advisable.

2. IPS is included in the B2G program, and full genome annotation is performed using web-based service access to the IPS repository hosted at the European Bioinformatics Institute (EBI). An IPS analysis can be run from {(Annotation), (InterProScan), (Run InterProScan (online))}. It is also possible to import the data from a previous IPS run (e.g., when the analysis was performed on a stand-alone server) by choosing the (Import InterProScan Results (xml)) option. Remember to use the right output format (the default, in fact) for the IPS run: % -format (raw | xml | txt | ebixml | html).

3. After each analysis step (Blast, Mapping, Annotation, and InterProScan), statistics can be generated in B2G and visualized by choosing the appropriate statistics from the drop-down menu under {(Statistics)}.

Acknowledgments

This research was supported by a Vidi grant of the Research Council for Earth and Life Sciences (ALW) of the Netherlands Organization for Scientific Research (NWO), by the European Research Area Network (ERA-NET) Plant Genomics, and by the Centre for BioSystems Genomics (CBSG), which is part of the Netherlands Genomics Initiative and NWO.

References

1. Metzker ML (2010) Sequencing technologies: the next generation. Nat. Rev. Genet. 11.
2. Miller JR, Koren S and Sutton G (2010) Assembly algorithms for next-generation sequencing data. Genomics 95.
3. Martinez D, Grigoriev I and Salamov A (2010) Annotation of protein-coding genes in fungal genomes. Appl. Comput. Math. 9.
4. Finn RD, et al (2010) The Pfam protein families database. Nucl. Acid. Res. 38.
5. Zdobnov EM and Apweiler R (2001) InterProScan: an integration platform for the signature-recognition methods in InterPro. Bioinform. 17.
6. Conesa A, et al (2005) Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinform. 21.
7. Götz S, et al (2008) High-throughput functional annotation and data mining with the Blast2GO suite. Nucl. Acid. Res. 36.
8. Altschul SF, et al (1990) Basic Local Alignment Search Tool. J. Mol. Biol. 215.
9. Nielsen H, et al (1997) Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng. 10.
10. Bendtsen JD, et al (2004) Improved prediction of signal peptides: SignalP 3.0. J. Mol. Biol. 340.
11. Käll L, Krogh A and Sonnhammer ELL (2004) A combined transmembrane topology and signal peptide prediction method. J. Mol. Biol. 338.
12. Horton P, et al (2007) WoLF PSORT: protein localization predictor. Nucl. Acid. Res. 35.
13. Emanuelsson O, et al (2007) Locating proteins in the cell using TargetP, SignalP and related tools. Nat. Protocol. 2.
14. Nakai K and Horton P (2007) Computational prediction of subcellular localization. Method. in Mol. Biol. 390.
15. de Jonge R, et al (2010) Conserved fungal LysM effector Ecp6 prevents chitin-triggered immunity in plants. Science 329.
16. Krogh A, et al (2001) Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305.
17. Käll L, Krogh A and Sonnhammer ELL (2007) Advantages of combined transmembrane topology and signal peptide prediction: the Phobius web server. Nucl. Acid. Res. 35.
18. Horton P and Nakai K (1999) Psort: a program for detecting sorting signals in proteins and determining their subcellular localization. TIBS 24, 34 xx.
19. Nakai K and Kanehisa M (1992) A knowledge base for predicting protein localization sites in eukaryotic cells. Genomics 14.
20. Bannai H, et al (2002) Extensive feature detection of N-terminal protein sorting signals. Bioinform. 18.
21. Casadio R, Martelli PL and Pierleoni A (2008) The prediction of protein subcellular localization from sequence: a shortcut to functional genome annotation. Brief. Func. Genom. Proteom. 7.
22. Hawkins J and Boden M (2006) Detecting and sorting targeting peptides with neural networks and support vector machines. J. Bioinform. Comput. Biol. 4.
23. Nair R and Rost B (2005) Mimicking cellular sorting improves prediction of subcellular localization. J. Mol. Biol. 348.
24. Pierleoni A, et al (2006) BaCelLo: a balanced subcellular localization predictor. Bioinform. 22.
25. Klee EW and Sosa CP (2007) Computational classification of classically secreted proteins. Drug. Discov. Today 12.
26. Klosterman S, et al (2011) Comparative genomics yields insights into niche adaptation of plant vascular wilt pathogens. PLoS Pathog. 7, e
27. van den Wymelenberg A, et al (2006) Computational analysis of the Phanerochaete chrysosporium v2.0 genome database and mass spectrometry identification of peptides in ligninolytic cultures reveal complex mixtures of secreted proteins. Fungal Genet. Biol. 43.
28. Lee SA, et al (2003) An analysis of the Candida albicans genome database for soluble secreted proteins using computer-based prediction algorithms. Yeast 20.
29. Choi J, et al (2010) Fungal secretome database: integrated platform for annotation of fungal secretomes. BMC Genomics 11.
30. Lum G and Min XJ (2011) FunSecKB: the fungal secretome knowledgebase. Database (Oxford) 2011, bar
31. Cantarel BL, et al (2009) The Carbohydrate-Active EnZymes database (CAZy): an expert resource for Glycogenomics. Nucl. Acid. Res. 37.
32. Winnenburg R, et al (2006) PHI-base: a new database for pathogen host interactions. Nucl. Acid. Res. 34.
33. Rep M (2005) Small proteins of plant-pathogenic fungi secreted during host colonization. FEMS Microbiol. Lett. 253, 19-27.
