Comparing Genomes
- Homologies and Families
- Sequence Alignments
Comparing genomes:
- Allows us to achieve a greater understanding of vertebrate evolution
- Tells us what is common and what is unique between different species at the genome level
- The function of human genes and other regions may be revealed by studying their counterparts in other organisms
- Helps identify both coding and non-coding genes and regulatory elements
Sequence edits: deletion, mutation
ACTGACATGTACCA
AC----CATGCACCA
Rearrangements: inversion, translocation, duplication
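The two kinds of sequence edit named above (deletion and mutation) can be counted directly from a gapped pairwise alignment. The sketch below is only an illustration: the function name `summarize_alignment` is made up, and the second sequence in the usage example is a hypothetical, length-matched variant of the one on the slide, not data from the source.

```python
def summarize_alignment(reference, query, gap="-"):
    """Count matches, mismatches (mutations) and gap columns (indels)
    in two already-aligned sequences of equal length."""
    if len(reference) != len(query):
        raise ValueError("aligned sequences must have equal length")
    counts = {"match": 0, "mismatch": 0, "gap": 0}
    for r, q in zip(reference, query):
        if r == gap or q == gap:
            counts["gap"] += 1        # column created by an insertion or deletion
        elif r == q:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1   # point mutation
    return counts


# Hypothetical example: a 4-base deletion plus one substitution.
print(summarize_alignment("ACTGACATGTACCA", "AC----ATGCACCA"))
```

Rearrangements (inversion, translocation, duplication) cannot be read column by column like this; they need an alignment method that tolerates changes in order and orientation.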
Comparative genomics predicts one long transcript.
Uses all the species
Prediction pipeline:
- Begins with BLAST and sequence clustering
- Compares gene relationships to species relationships
Proteins (all species) ---> BLAST ---> group similar proteins ---> alignments ---> phylogenetic trees ---> reconcile gene & species trees ---> extract ortholog & paralog relationships
(1) Load the longest translation of each gene from all species used in Ensembl.
(2) Run WUBLASTp+SW of every gene against every other (both self and non-self species) in a genome-wide manner.
(3) Build a graph of gene relations based on Best Reciprocal Hits (BRH) and Blast Score Ratio (BSR) values.
(4) Extract the connected components (= single linkage clusters), each cluster representing a gene family.
(5) For each cluster, build a multiple alignment based on the protein sequences using MUSCLE.
(6) For each aligned cluster, build a phylogenetic tree using PHYML. An unrooted tree is obtained at this stage.
(7) Reconcile each gene tree with the species tree to call duplication events on internal nodes and root the tree (TreeBeSt).
(8) From each gene tree, infer pairwise gene relations of orthology and paralogy types.
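Steps (3) and (4) above can be sketched in a few lines of Python. This is a minimal illustration, not the Ensembl implementation: the function names, the input layout (a dict of BLAST scores keyed by gene pair), the BSR definition (pair score divided by the larger of the two self-scores) and the 0.33 threshold are all assumptions made for the example.

```python
from collections import defaultdict


def build_edges(scores, species, bsr_threshold=0.33):
    """Link two genes if they are best reciprocal hits (BRH) or if their
    Blast Score Ratio (BSR, assumed definition) reaches the threshold."""
    best = {}  # (query gene, target species) -> (score, best hit)
    for (query, target), score in scores.items():
        if query == target:
            continue
        key = (query, species[target])
        if key not in best or score > best[key][0]:
            best[key] = (score, target)

    edges = set()
    for (query, target), score in scores.items():
        if query == target:
            continue
        reciprocal = (best.get((query, species[target]), (None, None))[1] == target
                      and best.get((target, species[query]), (None, None))[1] == query)
        self_max = max(scores.get((query, query), 0), scores.get((target, target), 0))
        bsr = score / self_max if self_max else 0.0
        if reciprocal or bsr >= bsr_threshold:
            edges.add(frozenset((query, target)))
    return edges


def connected_components(nodes, edges):
    """Single linkage clustering: each connected component is one gene family."""
    adjacency = defaultdict(set)
    for edge in edges:
        a, b = tuple(edge)
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, clusters = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            gene = stack.pop()
            if gene in seen:
                continue
            seen.add(gene)
            component.add(gene)
            stack.extend(adjacency[gene] - seen)
        clusters.append(sorted(component))
    return clusters


# Hypothetical toy data: two human and two mouse genes.
species = {"hA": "human", "hB": "human", "mA": "mouse", "mB": "mouse"}
scores = {("hA", "hA"): 100, ("hB", "hB"): 100, ("mA", "mA"): 100, ("mB", "mB"): 100,
          ("hA", "mA"): 90, ("mA", "hA"): 90, ("hB", "mB"): 80, ("mB", "hB"): 80,
          ("hA", "mB"): 10, ("mB", "hA"): 10, ("hB", "mA"): 12, ("mA", "hB"): 12}
families = connected_components(species, build_edges(scores, species))
print(families)
```

Single linkage is deliberately permissive: one strong edge is enough to merge two groups, which is why the later tree-building steps are needed to sort out the relationships inside each family.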
Species: Anopheles gambiae, Aedes aegypti, Drosophila melanogaster, Dasypus novemcinctus, Loxodonta africana, Echinops telfairi, Tupaia belangeri, Homo sapiens, Pan troglodytes, Macaca mulatta, Otolemur garnettii, Mus musculus, Rattus norvegicus, Spermophilus tridecemlineatus, Cavia porcellus, Oryctolagus cuniculus, Erinaceus europaeus, Myotis lucifugus, Canis familiaris, Felis catus, Bos taurus, Monodelphis domestica, Ornithorhynchus anatinus, Gallus gallus, Xenopus tropicalis, Gasterosteus aculeatus, Oryzias latipes, Takifugu rubripes, Tetraodon nigroviridis, Danio rerio, Ciona intestinalis, Ciona savignyi, Caenorhabditis elegans, Saccharomyces cerevisiae
GeneView page
GeneTreeView
Orthologs: any pairwise gene relation where the ancestor node is a speciation event
Paralogs: any pairwise gene relation where the ancestor node is a duplication event
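The two definitions above translate directly into a walk over a reconciled gene tree: for every pair of genes, look at the node where their lineages join and read off its event type. The sketch below assumes a hypothetical tree encoding (a leaf is a gene name, an internal node is a tuple `(event, left, right)`); it is not how Ensembl or TreeBeSt store trees.

```python
from itertools import product


def leaves(node):
    """Return all gene names below a node."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def pairwise_relations(node, relations=None):
    """Label each gene pair by the event at its most recent common ancestor."""
    if relations is None:
        relations = {}
    if isinstance(node, str):
        return relations
    event, left, right = node
    label = "ortholog" if event == "speciation" else "paralog"
    # Genes on opposite sides of this node have this node as their ancestor node.
    for a, b in product(leaves(left), leaves(right)):
        relations[frozenset((a, b))] = label
    pairwise_relations(left, relations)
    pairwise_relations(right, relations)
    return relations


# Hypothetical tree: an ancient duplication followed by a human/mouse speciation.
tree = ("duplication",
        ("speciation", "human_A", "mouse_A"),
        ("speciation", "human_B", "mouse_B"))
relations = pairwise_relations(tree)
for pair, label in sorted((sorted(p), l) for p, l in relations.items()):
    print(pair, label)
```

Note that human_A and mouse_B come out as paralogs even though they sit in different species, because their lineages join at the duplication node.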
ortholog_one2one
ortholog_one2many
ortholog_many2many
apparent_ortholog_one2one
within_species_paralog
between_species_paralog
What is 1 to 1? What is 1 to many?
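One way to see the difference is to count partners. The sketch below assumes a list of orthologous gene pairs between two species is already available and labels each pair by how many orthologs each gene has in the other species. The function name and the purely count-based rule are simplifications for illustration; a tree-based pipeline may assign these types differently.

```python
from collections import Counter


def classify_orthologs(pairs):
    """Label each (gene_in_species_1, gene_in_species_2) ortholog pair as
    one2one, one2many or many2many from the number of partners per gene."""
    partners_1 = Counter(a for a, _ in pairs)
    partners_2 = Counter(b for _, b in pairs)
    labels = {}
    for a, b in pairs:
        if partners_1[a] == 1 and partners_2[b] == 1:
            labels[(a, b)] = "ortholog_one2one"
        elif partners_1[a] == 1 or partners_2[b] == 1:
            labels[(a, b)] = "ortholog_one2many"
        else:
            labels[(a, b)] = "ortholog_many2many"
    return labels


# Hypothetical gene names.
pairs = [("H1", "M1"),                      # no duplication on either side
         ("H2", "M2a"), ("H2", "M2b"),      # duplication in mouse only
         ("H3a", "M3a"), ("H3a", "M3b"),    # duplications in both lineages
         ("H3b", "M3a"), ("H3b", "M3b")]
labels = classify_orthologs(pairs)
print(labels)
```

In words: 1 to 1 means neither gene has been duplicated since the two species split; 1 to many means a duplication happened in one lineage only.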
How: Cluster proteins for every isoform (transcript) in every species.
Why:
- Predict a function for novel genes/proteins
- Understand gene relationships
More than 1,800,000 proteins clustered:
- All Ensembl protein predictions from all species supported: 895,070 protein predictions
- All metazoan (animal) proteins in UniProt: 96,030 UniProtKB/Swiss-Prot and 892,0208 UniProtKB/TrEMBL
BLASTP all-versus-all comparison
Markov clustering
For each cluster:
- Calculation of multiple sequence alignments with ClustalW
- Assignment of a consensus description
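The Markov clustering step can be illustrated with a toy version of the expansion/inflation loop. This is a bare-bones sketch in pure Python, assuming a small symmetric similarity matrix as input; the function names, the inflation value of 2.0 and the fixed iteration count are illustrative choices, and production clustering would use a dedicated MCL program on the real BLASTP scores.

```python
def normalize_columns(matrix):
    """Scale every column to sum to 1 (column-stochastic matrix)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(matrix):
    """Expansion: square the matrix (one more step of random-walk flow)."""
    n = len(matrix)
    return [[sum(matrix[i][k] * matrix[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(matrix, power):
    """Inflation: raise entries to a power and renormalize, which
    strengthens strong flows and weakens weak ones."""
    return normalize_columns([[x ** power for x in row] for row in matrix])


def markov_cluster(similarity, inflation=2.0, iterations=30, eps=1e-6):
    n = len(similarity)
    m = [[float(similarity[i][j]) for j in range(n)] for i in range(n)]
    for i in range(n):
        m[i][i] = max(m[i][i], 1.0)          # add self-loops
    m = normalize_columns(m)
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    # Each non-empty row lists the members attracted to that node.
    clusters = {frozenset(j for j in range(n) if m[i][j] > eps) for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)


# Hypothetical similarity graph: proteins 0-2 hit each other, as do proteins 3-4.
similarity = [[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 1, 0]]
print(markov_cluster(similarity))
```

Unlike single linkage clustering, the inflation step lets Markov clustering cut weak links between otherwise dense groups, which helps keep multi-domain proteins from chaining unrelated families together.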
Link to FamilyView
JalView multiple alignments
- Ensembl family members within human
- Ensembl family members in other species
Comparing Genomes
- Homologies and Families
- Sequence alignments
Why align genome sequences:
- To identify homologous regions
- To spot problematic gene predictions
- Conserved regions could be functional
- To define syntenic regions (long regions of DNA sequence where order and orientation are highly conserved)
- Should find all highly similar regions between two sequences
- Should allow for segments without similarity, rearrangements, etc.
Issues:
- Heavy process
- Scalability, as more and more genomes are sequenced
- Time constraint
Enredo
- Defines orthology map (co-linear regions)
- Supports segmental duplications
Pecan
- Consistency-based multiple aligner
- Optimized to cope with long DNA sequences
Ortheus
- Ancestral sequence reconstructor
- Infers the history of insertions and deletions
- Use all coding exons
- Get sets of best reciprocal hits
- Create orthology maps
- Build multiple global alignments
In the Detailed View panel:
Choose Compara pairwise alignments
Anchors
- 500,000 anchors for mammals (more than 1 anchor per 10 kb)
- Supports segmental duplications
- Covers 90% of the human protein-coding genes (Hsap-Mmus-Rnor-Cfam-Btau)
[Figure: a human chromosome and its orthologues on mouse chromosomes]
Syntenic blocks
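Syntenic regions were described earlier as long regions where order and orientation are highly conserved. A minimal sketch of that idea: walk along one genome in gene order and keep extending a block while the orthologue in the other genome stays on the same chromosome and moves by exactly one position in a consistent direction. The input format, the function name and the strict adjacency rule are assumptions for illustration; real synteny methods tolerate gaps and small local rearrangements.

```python
def syntenic_blocks(orthologs, min_genes=2):
    """Group genes into blocks of conserved order and orientation.

    `orthologs` lists, in order along the first genome, tuples of
    (gene_name, chromosome_in_second_genome, rank_on_that_chromosome).
    """
    blocks, current, direction = [], [], 0
    for gene in orthologs:
        _, chrom, rank = gene
        if current:
            _, prev_chrom, prev_rank = current[-1]
            step = rank - prev_rank
            # Same chromosome, adjacent gene, and same direction as before.
            if chrom == prev_chrom and step in (1, -1) and direction in (0, step):
                current.append(gene)
                direction = step
                continue
            if len(current) >= min_genes:
                blocks.append([g[0] for g in current])
        current, direction = [gene], 0
    if len(current) >= min_genes:
        blocks.append([g[0] for g in current])
    return blocks


# Hypothetical human genes with the position of their mouse orthologues.
orthologs = [("g1", "chr4", 10), ("g2", "chr4", 11), ("g3", "chr4", 12),
             ("g4", "chr11", 5), ("g5", "chr11", 4), ("g6", "chr11", 3),
             ("g7", "chr2", 7)]
print(syntenic_blocks(orthologs))
```

The second block runs in decreasing order on the mouse chromosome, which corresponds to an inverted but otherwise conserved segment.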
- View homology in pages such as GeneView, ProtView, SyntenyView, GeneTreeView, or BioMart
- View protein family information in FamilyView
- View alignments in ContigView, GeneSeqAlign View, or through BioMart
BIOMART