Using Phylogenomics to Predict Novel Fungal Pathogenicity Genes

Using Phylogenomics to Predict Novel Fungal Pathogenicity Genes. David DeCaprio, Ying Li, Hung Nguyen. (Sequenced Ascomycete genomes courtesy of the Broad Institute.)

Phylogenomics: Combining whole-genome sequences and phylogenetic information to make inferences about gene function. Phenotypic differences between related organisms can be explained in terms of genomic differences.

Plant Pathogens: Magnaporthe grisea (first sequenced pathogen; rice); Fusarium graminearum (wheat blight; worst US pathogen); Stagonospora nodorum (another wheat blight). Other Ascomycete fungi: Neurospora crassa (model organism); Aspergillus nidulans (model organism); Chaetomium globosum (human pathogen).

Dean analysis (Nature 2005): Identified putative pathogenicity genes in Magnaporthe grisea. Found a significant expansion of the pth11-related gene family in M. grisea vs. N. crassa. The pth11 class is required for pathogenicity (DeZwaan 1999). BUT N. crassa has RIP (repeat-induced point mutation), so all of its gene families are small.

Testing the Hypothesis: Examine the pth11-related family in 3 pathogenic species vs. 3 non-pathogenic species. If this family is pathogenicity related, expect to see expansion in the pathogens and no expansion in the non-pathogens.

Purpose of Phylogenetic Analysis: Ralph Dean's M. grisea expansions may not actually be expansions when compared against more data sets and against data sets with larger families. If pth11 is required for pathogenicity (appressorium development), other plant pathogens may also contain expansions that are as yet unidentified.

Procedure: (1) Obtain gene families for the six organisms. (2) Pick out the pth11-related family. (3) Align all genes in the family using ClustalW. (4) Build a phylogenetic tree for the family. (5) Run bootstrap analysis on the tree. (6) Examine pathogenic expansions.

Find Gene Families: Obtain a match score for each pair of proteins across all genes in all organisms. Run an all-to-all comparison (excluding self-matches) of all proteins in all 6 organisms using NCBI-BLAST with default parameters and an E-value cutoff of 1e-10. Combine all hits between a pair of proteins to obtain the score for that pair. Select the highest-scoring non-overlapping hits per pair, and keep the pair only if coverage > 60% of the shorter protein AND average identity > 30%.
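The pair filter described above can be sketched as follows. This is a minimal illustration, not the authors' code: the `Hit` record, the greedy selection of non-overlapping hits, the use of query coordinates to measure coverage, and the length-weighted identity average are all assumptions made for the example.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    """One local BLAST alignment between a protein pair (hypothetical record)."""
    q_start: int     # 1-based inclusive start on the query protein
    q_end: int       # 1-based inclusive end on the query protein
    identity: float  # percent identity of this local alignment
    score: float     # alignment score used for ranking

def select_non_overlapping(hits):
    """Greedily keep the highest-scoring hits whose query ranges do not overlap."""
    kept = []
    for h in sorted(hits, key=lambda h: h.score, reverse=True):
        if all(h.q_end < k.q_start or h.q_start > k.q_end for k in kept):
            kept.append(h)
    return kept

def pair_passes(hits, query_len, subject_len, min_cov=0.60, min_ident=30.0):
    """Apply the coverage and identity thresholds from the slide to one pair."""
    kept = select_non_overlapping(hits)
    if not kept:
        return False
    aligned = sum(h.q_end - h.q_start + 1 for h in kept)
    # Coverage is measured against the shorter of the two proteins.
    coverage = aligned / min(query_len, subject_len)
    # Assumption: identity is averaged with each hit weighted by its length.
    avg_ident = sum(h.identity * (h.q_end - h.q_start + 1) for h in kept) / aligned
    return coverage > min_cov and avg_ident > min_ident
```

Pairs that pass this filter become the edges used when grouping proteins into families.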

Find Gene Families (cont.): COG single-linkage clustering. Each protein starts in its own cluster. Merge any two clusters that have a BLAST hit between any of their proteins. Each resulting cluster is a gene family.
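Single-linkage clustering of this kind is equivalent to finding connected components in the graph of accepted BLAST hits. A minimal sketch using a union-find structure is shown here; the function name and input shapes are illustrative assumptions, not the pipeline actually used.

```python
def single_linkage_families(proteins, hit_pairs):
    """Group proteins into families: two proteins share a family if a chain
    of accepted BLAST hits connects them (single linkage)."""
    parent = {p: p for p in proteins}  # each protein starts in its own cluster

    def find(p):
        # Follow parent links to the cluster representative, shortening the path.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in hit_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    families = {}
    for p in proteins:
        families.setdefault(find(p), set()).add(p)
    # Largest families first, members sorted for stable output.
    return sorted((sorted(m) for m in families.values()),
                  key=lambda f: (-len(f), f))
```

Because a single hit is enough to merge two clusters, one spurious hit can fuse unrelated families, which is why the thresholds applied before clustering matter.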

Family Analysis: The three plant-pathogenic species (S. nodorum, M. grisea, F. graminearum) showed the most expansions across all families. 316 genes in the PTH11 family: F. graminearum: 89; S. nodorum: 87; M. grisea: 49; A. nidulans: 42; C. globosum: 28; N. crassa: 21.

Procedure (Phylogenetic Analysis): Aligned all 316 sequences in the pth11-related family using ClustalW. Used Phylip to generate a phylogenetic tree (parsimony method). Compared the portions of the tree that match Ralph Dean's M. grisea vs. N. crassa tree. Chose a subset of the family to examine further to test the pathogenic expansion hypothesis.

Phylogenetic Analysis (background): Tree-building algorithms: distance and neighbor joining; maximum parsimony; maximum likelihood. Sequence order: the input sequence order may influence the tree found. Solution: randomize (jumble) the input order.

Distance and Neighbor Joining: Iteratively join the closest (distance-wise) nodes. Distance between two sequences = % of sites that differ between them in an alignment. Distance between a sequence and a joined node = average distance between the sequence and the members of that node.
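The joining rule as stated on this slide can be sketched in a few lines. Note the assumption: this implements only the simplified average-distance rule described above (average linkage), not the full neighbor-joining criterion, which also corrects each pair's distance by the nodes' average distance to all other nodes. Function names and the nested-tuple tree representation are illustrative.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of aligned, gap-free sites at which two sequences differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def average_linkage_tree(seqs):
    """seqs maps name -> aligned sequence. Returns the tree as nested tuples."""
    dist = {frozenset((a, b)): p_distance(seqs[a], seqs[b])
            for a, b in combinations(seqs, 2)}
    clusters = [(name, [name]) for name in seqs]  # (subtree, leaf names)

    def cluster_distance(c1, c2):
        # Average of all leaf-to-leaf distances between the two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1[1] for b in c2[1])
        return total / (len(c1[1]) * len(c2[1]))

    while len(clusters) > 1:
        # Join the closest pair of clusters; ties go to the first pair found.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]
```

The toy input in the usage below is made up for illustration; calling `average_linkage_tree({"A": "ATGGGTCTC", "B": "ATGAGTCTC", "C": "TTGACTGTC", "D": "TTGACTGAC"})` groups A with B and C with D.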

Maximum Parsimony: Character-based method. Chooses the tree that minimizes the number of mutational events (substitutions). Computationally inexpensive to run compared with maximum likelihood.
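To make "number of mutational events" concrete, the sketch below scores a fixed tree with Fitch's small-parsimony algorithm; a parsimony search would compute this score for many candidate trees and keep the lowest. The tree representation (nested 2-tuples of leaf names) and function names are assumptions for illustration, and this is not a reproduction of Phylip's implementation.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, substitutions) for one alignment column.
    tree is a leaf name (str) or a 2-tuple of subtrees; states maps leaf -> character."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra substitution needed.
        return common, left_cost + right_cost
    # Children disagree: one substitution is required on this branch pair.
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree, alignment):
    """Minimum number of substitutions needed to explain the alignment on this tree."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )
```

Comparing `parsimony_score` across alternative topologies for the same alignment shows which arrangement parsimony prefers.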

Maximum Likelihood: Best accounts for variation in sequences. The likelihood L of a tree is the probability of observing the data given the tree: L = P(data | tree). Search all possible trees for the one with the highest probability P(data | tree). Extremely computationally intensive.
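As a small worked instance of L = P(data | tree), consider the simplest possible tree: two sequences joined by a single branch. The sketch below assumes the Jukes-Cantor substitution model (the slides do not name a model) and finds the branch length that maximizes the likelihood by a plain grid search; all names are illustrative.

```python
import math

def jc69_log_likelihood(seq_a, seq_b, t):
    """Log-likelihood of two aligned DNA sequences separated by a branch of
    length t (expected substitutions per site) under the Jukes-Cantor model."""
    decay = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * decay   # probability a site shows the same base
    p_diff = 0.25 - 0.25 * decay   # probability of one specific different base
    log_l = 0.0
    for x, y in zip(seq_a, seq_b):
        # 0.25 is the equilibrium frequency of the base at one end of the branch.
        log_l += math.log(0.25 * (p_same if x == y else p_diff))
    return log_l

def ml_branch_length(seq_a, seq_b, grid=None):
    """Grid-search the branch length with the highest likelihood."""
    grid = grid or [i / 1000 for i in range(1, 3001)]
    return max(grid, key=lambda t: jc69_log_likelihood(seq_a, seq_b, t))
```

For a full tree the same per-site probabilities are combined over every branch and every possible ancestral state, and the search also ranges over topologies, which is where the computational cost noted on the slide comes from.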

Bootstrap Analysis: Statistical technique to measure the level of confidence in a previously generated tree. Resample the alignment multiple times and generate a tree for each resample. The consensus tree (each branch is chosen if it appears in a majority of resamplings) gives a confidence value for each branch of the tree.
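The resampling step can be sketched as below: alignment columns are drawn with replacement to build each replicate. To keep the example self-contained, the "tree" built per replicate is reduced to a stand-in question (is `partner` the nearest sequence to `target`?); a real analysis would rerun the full tree-building method on every replicate and tally branches in a consensus tree. All names here are hypothetical.

```python
import random

def resample_columns(alignment, rng):
    """Build one bootstrap replicate by sampling alignment columns with replacement."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}

def nearest_neighbor(alignment, target):
    """Name of the sequence with the fewest differences from target.
    Ties go to whichever name comes first in the alignment dict."""
    def diffs(other):
        return sum(x != y for x, y in zip(alignment[target], alignment[other]))
    return min((n for n in alignment if n != target), key=diffs)

def bootstrap_support(alignment, target, partner, n_replicates=100, seed=0):
    """Fraction of replicates in which partner is the nearest sequence to target."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(
        nearest_neighbor(resample_columns(alignment, rng), target) == partner
        for _ in range(n_replicates)
    )
    return hits / n_replicates
```

A grouping that survives in nearly every replicate is supported by many sites across the alignment rather than by a few columns.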

Our Initial Tree: Generated using maximum parsimony. We did not have enough time for verification: maximum likelihood was too computationally intensive, and bootstrap analysis was computationally intensive and required too many iterations. The entire tree was difficult to analyze due to its size and complexity.

Initial results: Part of our tree matched a significant portion of Ralph Dean's tree exactly. The exact match is significant because we aligned the genes together with genes from 5 other organisms, 2 of which are also pathogenic. High confidence in the M. grisea expansion.

Figure: tree from Dean (2005), courtesy of Ralph A. Dean, used with permission. Source: Supplementary Figure S1 in Dean R, et al. The genome sequence of the rice blast fungus Magnaporthe grisea. Nature 434, 980-986 (21 April 2005). Figure: portion of our initial tree (parsimony).

Narrowing the scope: Choose a subtree relevant to our hypothesis. Confirm the structure built by our original tree. Examine the age of the expansions. We chose a subtree of 33 members: it contains the exact match to Ralph Dean's tree, and it contains nearby S. nodorum and F. graminearum expansions.

Subtree Analysis: Realigned the sequences. Built new trees using maximum parsimony (jumbled and ordered input), maximum likelihood (ordered input), and distance/neighbor joining. Ran bootstrap analysis (parsimony with jumbled and ordered input, and neighbor joining). Removed genes placed as outgroups in the new tree.

Subtree Analysis: Expansion was consistent with pathogenicity. High level of confidence in the expansions: they are present in trees from different algorithms and supported by bootstrap trials. 30 genes in the resulting subtree: F. graminearum: 5; S. nodorum: 13; M. grisea: 8; A. nidulans: 2; C. globosum: 1; N. crassa: 1.

CFEM Domain

Domains: CFEM domain: present in 23 of 316 sequences in the larger family; present in 18 contiguous sequences in our tree. PFamB_10167: present in 29 genes of our subtree; we were unable to test the larger family due to time constraints.

Species-Specific Expansions: F. graminearum, M. grisea, S. nodorum

Results: Confirmed the expansion in M. grisea reported by Ralph Dean. This gene family expansion was consistent with pathogenicity across 6 organisms. The expansions are species-specific: each plant pathogen developed its expansions independently. Predicted 13 putative pathogenicity genes in S. nodorum and 4 in F. graminearum.

Weaknesses in our Approach: The initial tree was unfiltered: the 316 genes were selected using BLAST only. Improvement: filter out genes that contain neither the CFEM protein domain nor the PFamB_10167 domain. We chose our subtree somewhat arbitrarily, because that is where we located putative F. graminearum and S. nodorum expansions. Improvement: given time and a filtered family, analyze all expansions across all genes in the family to estimate significance.

Weaknesses (cont.): We did not capture all of Dean's original pth11 family in our family.

References
Dean R, et al. The genome sequence of the rice blast fungus Magnaporthe grisea. Nature. 2005 Apr 21;434(7036):980-6.
DeZwaan TM, et al. Magnaporthe grisea pth11p is a novel plasma membrane protein that mediates appressorium differentiation in response to inductive substrate cues. Plant Cell. 1999 Oct;11(10):2013-30.
Kulkarni RD, et al. Novel G-protein-coupled receptor-like proteins in the plant pathogenic fungus Magnaporthe grisea. Genome Biol. 2005;6(3):R24.
Hu G, et al. A phylogenomic approach to reconstructing the diversification of serine proteases in fungi. J Evol Biol. 2004 Nov;17(6):1204-14.
PHYLIP (U. Washington) http://evolution.genetics.washington.edu/phylip.html
CLUSTALW (EMBL-EBI) http://www.ebi.ac.uk/clustalw/
BLAST (NCBI)
Broad Institute (MIT) http://www.broad.mit.edu
PFAM (Sanger) http://www.sanger.ac.uk/software/pfam