Supporting Online Material for

The Xist RNA Gene Evolved in Eutherians by Pseudogenization of a Protein-Coding Gene

Laurent Duret,* Corinne Chureau, Sylvie Samain, Jean Weissenbach, Philip Avner

*To whom correspondence should be addressed. duret@biomserv.univ-lyon1.fr

This PDF file includes:
Materials and Methods
SOM Text
Figs. S1 and S2
Tables S1 and S2
References and Notes

Published 16 June 2006, Science 312, 1653 (2006). DOI: /science

Supplementary Material

The Xist RNA gene evolved in eutherians by pseudogenization of a protein-coding gene

Laurent Duret, Corinne Chureau, Sylvie Samain, Jean Weissenbach, Philip Avner

Methods

Sequencing. Using standard protocols, we screened a BAC library from Monodelphis domestica (LB3 from BAC/PAC Resources of the Children's Hospital Oakland Research Institute). We used probes derived from eutherian Xist sequences, from chicken XicHR genes and from opossum genomic sequences available through the Ensembl web site (1). We isolated one BAC containing the Rasl11c gene and the 5' end of Lnx3 (EMBL accession number AM230660). We amplified and sequenced the Lnx3 cDNA from opossum tissue samples of testis (male) and liver (male and female) (EMBL accession number AM230659).

Searching for homologous genes. Homologs of protein-coding genes were searched with BLASTP (2) against the Ensembl protein annotations of complete vertebrate genomes (Ensembl release 34 (1)). This data set includes 12 species: six eutherian mammals (Bos taurus, Canis familiaris, Homo sapiens, Pan troglodytes, Mus musculus, Rattus norvegicus), one marsupial (the opossum Monodelphis domestica), one bird (Gallus gallus), one amphibian (Xenopus tropicalis), and three teleost fishes (Danio rerio, Fugu rubripes, Tetraodon nigroviridis). For many of these genomes, sequencing, assembly and annotation are not yet complete. Thus, with BLASTP, one may miss homologs that have not been annotated yet, or that are present in genomic sequences that have not been incorporated in the assembly. We therefore used TBLASTN to search for homologs of the chicken Lnx3 protein gene in these 12 genomes, in the draft assembly of the elephant genome (Loxodonta africana, build BROADE1), and in the shotgun sequences of the opossum (41,343,661 sequence reads; 36.5 Gb) and of the monotreme platypus (Ornithorhynchus anatinus) (27,935,986 sequence reads; 21.3 Gb) (available at the NCBI trace archive: ftp://ftp.ncbi.nih.gov/pub/tracedb/). All these data sets were also used to search for homologs of non-coding RNA genes (Xist, Jpx and Ftx) with BLASTN.

The protein-coding genes from XicHR belong to multigenic families. To distinguish between orthologs and paralogs, we computed phylogenetic trees (neighbor-joining method, with Poisson correction) for each gene family. We selected for phylogenetic analyses all homologs having a BLASTP score greater than or equal to the score of the closest non-vertebrate homolog (tunicate, insect or nematode). Duplicates and sequences that were too short (due to incomplete gene prediction) were removed from the data set. In all cases, groups of orthologs of the chicken XicHR genes were supported by strong bootstrap values (> 90%).

In eutherians we found paralogs of the XicHR genes, resulting from ancient duplications predating the divergence between fishes and tetrapods (Table S1). However, we did not find any ortholog of the XicHR genes, which indicates that they have been lost from the genome of eutherians or are too diverged to be recognizable by BLAST. In eutherians, besides Xist, the Xic region contains two RNA genes (Jpx and Ftx) and two protein-coding genes (Tsx and Cnbp2) (3) (Fig. 2a).
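The homolog selection rule and the Poisson correction mentioned in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the tuple layout of a hit and the function names `select_homologs` and `poisson_distance` are hypothetical, and the distance uses the standard Poisson correction d = -ln(1 - p), where p is the proportion of differing sites between two aligned protein sequences.

```python
import math


def select_homologs(hits):
    """Keep hits scoring at least as high as the best non-vertebrate hit.

    hits: list of (name, is_vertebrate, blastp_score) tuples.
    This layout is an assumption made for the example.
    """
    outgroup_scores = [score for _, is_vert, score in hits if not is_vert]
    if not outgroup_scores:
        # No non-vertebrate homolog found: nothing to threshold against.
        return list(hits)
    threshold = max(outgroup_scores)  # closest non-vertebrate homolog
    return [hit for hit in hits if hit[2] >= threshold]


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance between two aligned protein sequences.

    Columns containing a gap ('-') in either sequence are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned (same length)")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 1.0:
        return math.inf  # distance is undefined when every site differs
    return -math.log(1.0 - p)
```

A matrix of such pairwise distances is what a neighbor-joining program would take as input; the tree-building step itself is not shown here.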
With BLAST, we failed to detect any homolog of the Tsx, Jpx and Ftx genes in non-eutherian vertebrates. However, genomic alignments with SIM demonstrated that Tsx is a truncated ortholog of Fip1l2 (see main text and below). Jpx and Ftx are found in the same genomic interval and in the same orientation as UspL and Wave4 (Fig. 2a). This suggests that, like Xist, they may derive from protein-coding genes. Cnbp2 is a retrotransposed gene deriving from the autosomal Cnbp gene (3). We identified orthologs of Cnbp2 in human, mouse and cow that are located at the same position in the Xic locus. Phylogenetic analyses indicate that Cnbp exists in all vertebrates but that Cnbp2 is specific to eutherians.

Genomic alignments. Repeated elements were masked from genomic sequences with RepeatMasker (Smit, A. F. A. and Green, P., unpublished), using taxon-specific transposable element data sets from Repbase Update (4). Pairwise local alignments between genomic sequences were then computed with SIM (5) using default parameters (match = 1, mismatch = -1, gap opening penalty = 6, gap extension penalty = 0.2). The SIM local alignment software (5) is based on the Smith and Waterman algorithm, which is slower but more sensitive than BLAST. For each pair of sequences, we searched for the 300 best local alignments (SIM parameter k=300). The value of this parameter was tuned to ensure that no significant match would be excluded by SIM. Hence, a large majority of these local alignments correspond to non-significant similarities that occur by chance between any unrelated sequences. Then, to reduce the number of such random matches, we selected among these 300 local alignments the best combination of hits occurring in the same order and orientation in both sequences (using LALNVIEW (6)). By imposing this constraint of conserved order and orientation, we increased the specificity of the homology searches. Typically, fewer than 10% of the 300 best local alignments are retained after this filtering.

The comparison of human and chicken XicHR sequences revealed 22 alignments. Eight of them overlap known exons in chicken, among which five also correspond to exons in human (NB: we considered that there was an overlap if at least 33% of the length of the exon is covered by the alignment). Some weak similarities may occur by chance between unrelated sequences. To compute the probability that such random sequence matches overlap known exons, we performed simulations. For each species, the position of each of the 22 alignments was randomly chosen along the genomic sequence (excluding the positions masked by RepeatMasker), and we counted the number of alignments overlapping exons. We performed 10^8 simulations to obtain the distribution of the number of random matches overlapping exons in each species, and inferred the probability of observing by chance 8 overlaps in chicken or 5 overlaps in human. There are two Xist exons that show similarity with two Lnx3 exons. The probability that random matches overlap two exons in both species is the product of the probabilities of overlapping two exons in each species, divided by two (to account for the fact that they were found in the same order in both species).

Supplementary discussion

New function by loss of function?

Does the loss of protein-coding capacity of Lnx3 in eutherians have any link with chromosome inactivation? Lnx3 is conserved in all vertebrate classes and, like its paralogs Lnx1 and Lnx2, encodes a protein containing one RING-type E3 ubiquitin ligase domain, one NPXY binding motif and four PDZ protein-interaction domains. Lnx1 and Lnx2 are thought to regulate Notch (and/or ErbB2) signalling by targeting Numb (ErbB2) for degradation through the proteasome pathway (7, 8). Whilst the exons conserved in Xist correspond to two of the PDZ motifs, these Xist exons contain frameshift mutations (Fig. 3). Unravelling further the precise biological function of Lnx3 in non-eutherian species, although beyond the scope of the present work, may be useful in understanding the evolution of X inactivation.

Pseudogenization of protein-coding genes flanking Xist

An intriguing issue is the coincidence of the loss of protein-coding function in Lnx3 (to become Xist) with the loss of function of four other protein-coding genes in the XicHR: Fip1l2, Rasl11c, UspL and Wave4. One possible explanation for this simultaneous loss of protein-coding function is that the expression of Xist, which has a heterochromatinization activity in cis, might be incompatible with the correct regulation of neighbouring genes. Thus, the activity of Xist might have precluded the proper expression of flanking genes, thereby leading them to pseudogenization. This model implies that the advantages conferred by the expression of Xist were strong enough to counterbalance the deleterious effects of silencing four neighbouring genes (which are well conserved in all other vertebrates).

An alternative explanation would be that these five XicHR genes participate in a single process, separate from sex chromosome inactivation, that is no longer required in eutherians. Presently, little is known about the precise function of these genes. Fip1l2 is homologous to Fip1, a subunit of the cleavage and polyadenylation specificity factor (9). Rasl11c belongs to the Ras family of small GTPases (10), and UspL to a large family of ubiquitin-specific proteases (11). Wave proteins are involved in the regulation of actin polymerization (12). It is interesting to note that Lnx proteins have a ubiquitin ligase activity, whereas Usp proteins are deubiquitylating enzymes. Thus one might imagine that UspL regulates the activity of Lnx3 by removing polyubiquitin from its target proteins, rescuing them from degradation by the proteasome. This model is very speculative, and there is at present no evidence to suggest that the XicHR genes are involved in the same function.

Finally, it is also possible that the pseudogenization of the XicHR genes is the direct consequence of the invasion of the Xic locus by transposable elements. Such high transposition activity might result from intragenomic conflicts at imprinted loci (13).

References:
1. E. Birney et al., Nucleic Acids Res. 34, D556 (2006).
2. S. F. Altschul et al., Nucleic Acids Res. 25, 3389 (1997).
3. C. Chureau et al., Genome Res. 12, 894 (2002).
4. J. Jurka, Trends Genet. 16, 418 (2000).
5. X. Huang, W. Miller, Adv. Appl. Math. 12, 337 (1991).
6. L. Duret, E. Gasteiger, G. Perriere, Comput. Appl. Biosci. 12, 507 (1996).
7. D. S. Rice, G. M. Northcutt, C. Kurschner, Mol. Cell. Neurosci. 18, 525 (2001).
8. P. Young et al., Mol. Cell. Neurosci. 30, 238 (2005).
9. I. Kaufmann, G. Martin, A. Friedlein, H. Langen, W. Keller, EMBO J. 23, 616 (2004).
10. R. Louro et al., Biochem. Biophys. Res. Commun. 316, 618 (2004).
11. V. Quesada et al., Biochem. Biophys. Res. Commun. 314, 54 (2004).
12. T. E. Stradal et al., Trends Cell Biol. 14, 303 (2004).
13. J. F. Wilkins, Trends Genet. 21, 356 (2005).

Supplementary figures and tables.

[Figure S1 image not reproduced. Labels in panel (a), chicken versus human: Chic1, Lnx3, Uspl, Xpct, Fip1l2, Rasl, Wave4, Chic1, Tsx, Xist, Jpx, Ftx, Xpct, (pseudo). Labels in panel (b), chicken versus dog: Lnx3, Rasl11c, Xist. Legend: protein-coding exon; non-coding RNA exon; repeat sequence (SINE, LINE, ...).]

Figure S1: Comparison of the chicken XicHR with the human and dog Xic region. Genomic sequences were first analyzed with RepeatMasker to identify and mask repeated elements and then aligned with SIM. The best combination of local alignments in consistent order and orientation is displayed (see Methods). (a) Global view of the whole chicken/human alignment. (b) Zoom on the Lnx3/Xist and Rasl11c region in the chicken/dog alignment. Positions are indicated in bp.

[Figure S2 image not reproduced. The tree shows the Lnx1, Lnx2 and Lnx3 clades, with sequences from mouse, rat, cow, dog, chimpanzee, human, opossum, chicken, Xenopus, Fugu and Tetraodon.]

Figure S2: Phylogenetic tree of the Lnx gene family. The protein alignment comprised 478 sites (after exclusion of unreliable parts of the alignment). The tree was computed with PhyML (1), using the JTT model, with four categories of substitution rate, an estimated gamma distribution parameter and an estimated proportion of invariable sites. The scale of branch lengths is indicated (number of substitutions per site). Bootstrap values larger than 50% (after 500 replicates) are displayed. The opossum Lnx3 protein has been evolving three times faster than its chicken ortholog, which is suggestive of functional changes. Comparison with the chicken gene shows no frameshifts or nonsense mutations, and reveals a low ratio of non-synonymous over synonymous substitutions (0.15). This shows that the opossum Lnx3, although rapidly evolving, is subject to purifying selection.

Ensembl or EMBL gene identifiers:
Cow (Bos taurus): ENSBTAG (Lnx1), ENSBTAG (Lnx2).
Dog (Canis familiaris): ENSCAFG (Lnx1), ENSCAFG (Lnx2).
Fugu (Fugu rubripes): SINFRUG (Lnx1), SINFRUG (Lnx2), SINFRUG (Lnx3).
Chicken (Gallus gallus): ENSGALG (Lnx1), ENSGALG (Lnx2), ENSGALG (Lnx3).
Human (Homo sapiens): ENSG (Lnx1), ENSG (Lnx2).
Mouse (Mus musculus): ENSMUSG (Lnx1), ENSMUSG (Lnx2).
Chimpanzee (Pan troglodytes): ENSPTRG (Lnx1), ENSPTRG (Lnx2).
Rat (Rattus norvegicus): ENSRNOG (Lnx1), ENSRNOG (Lnx2).
Tetraodon (Tetraodon nigroviridis): GSTENG (Lnx1), GSTENG (Lnx2).
Xenopus (Xenopus tropicalis): ENSXETG (Lnx1), ENSXETG (Lnx3).
Opossum (Monodelphis domestica): ENSMODG (Lnx1), ENSMODG (Lnx2), AM (Lnx3).

(1) S. Guindon, O. Gascuel, Syst. Biol. 52, 696 (Oct. 2003).
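For readers less familiar with the selection test invoked in the Figure S2 legend, the standard reading of the non-synonymous over synonymous ratio is summarized below. The symbols \omega, d_N and d_S are conventional notation introduced here for illustration and do not appear in the original legend; only the value 0.15 comes from the text.

```latex
\omega = \frac{d_N}{d_S}, \qquad
\begin{cases}
\omega < 1 & \text{purifying selection (non-synonymous changes are removed)}\\
\omega \approx 1 & \text{neutral evolution}\\
\omega > 1 & \text{positive selection}
\end{cases}
\qquad\text{here } \omega \approx 0.15 < 1 .
```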

Table S1: Orthologs and paralogs of XicHR protein genes in vertebrates. Chicken protein genes located in the Xic homologous region were compared against all Ensembl protein predictions with BLASTP. Ensembl gene identifiers of homologous genes are indicated in the table. Groups of orthologous genes were determined by phylogenetic analyses, and are surrounded by a rectangle in the table. Most of the genes are found in four linkage groups. The symbol <> indicates that genes are linked on the same scaffold in the genome assembly. The symbol # indicates a gap in the genome assembly.
(a) Gene symbols correspond to the human gene nomenclature, except for those marked (b), which do not exist in human and were named on the basis of their phylogenetic relationship with their closest homologs.
(c) The gene was absent from Ensembl annotations, but was found in the genomic contig with TBLASTN.
(d) The human Chic1 gene is missing from Ensembl annotations but is present in the genomic sequence and described in EMBL (accession number AL358796) and Uniprot (accession number Q5JSZ4).
(e) Recent paralog resulting from a duplication in the rodent lineage.
(f) Recent paralog resulting from a duplication in the primate lineage.
(g) EMBL accession number.

It should be noted that in chicken, the XicHR region is located on an autosome (chromosome 4), and hence cannot be directly involved in a process similar to the X-inactivation of eutherians. Interestingly, most of the paralogs of the chicken XicHR genes are also found in close linkage. The four paralogons (located respectively on chromosomes X, 4, 6 and 13 in human) most probably result from the two whole genome duplications that occurred early in the evolution of vertebrates (Dehal, P. & Boore, J. L. Two rounds of whole genome duplication in the ancestral vertebrate. PLoS Biol 3, e314 (2005)).

TABLE S1

Columns: Species; Chrom. number; Cdx family; Chic family; Fip1 family; Lnx family; Rasl11 family; Usp family; Wave family; Xpct family

Linkage group 1 (Xic, XicHR):
Gene symbol (a): Cdx4, Chic1, Fip1l2 (b), Lnx3 (b), Rasl11c (b), UspL (b), Wave4 (b), Xpct (SLC16A2)
Zebrafish: ENSDARG <> Hit TBLASTN (c) <> <> Hit TBLASTN (c) # # ENSDARG # ENSDARG # ENSDARG
Fugu: SINFRUG <> SINFRUG <> <> SINFRUG # SINFRUG <> SINFRUG # # SINFRUG
Tetraodon: # GSTENG
Xenopus: ENSXETG <> ENSXETG <> Hit TBLASTN (c) <> ENSXETG <> ENSXETG <> ENSXETG <> ENSXETG <> ENSXETG
Chicken 4: ENSGALG <> ENSGALG <> ENSGALG <> ENSGALG <> ENSGALG <> ENSGALG <> ENSGALG <> ENSGALG
Opossum: ENSMODG <> ENSMODG # # AM230659, AM (g) <> AM (g) # # ENSMODG # ENSMODG
Dog X: ENSCAFG <> Hit TBLASTN (c) <> <> <> <> <> <> ENSCAFG
Mouse X: ENSMUSG <> ENSMUSG <> <> <> <> <> <> ENSMUSG
Human X: ENSG <> AL (d) <> <> <> <> <> <> ENSG

Linkage group 2:
Gene symbol (a): Cdx2, Lnx2, Rasl11a, Usp12, Wave3 (Wasf3)
Zebrafish: # ENSDARG <> ENSDARG
Fugu: # SINFRUG <> <> SINFRUG <> SINFRUG
Tetraodon: # GSTENG <> <> GSTENG <> GSTENG
Xenopus: # ENSXETG <> ENSXETG <> ENSXETG
Chicken 1: # ENSGALG <> ENSGALG <> ENSGALG <> ENSGALG
Opossum: ENSMODG <> <> ENSMODG <> ENSMODG <> ENSMODG <> ENSMODG
Dog 25: ENSCAFG <> <> ENSCAFG <> ENSCAFG <> ENSCAFG <> ENSCAFG
Mouse 5: ENSMUSG <> <> ENSMUSG <> ENSMUSG <> ENSMUSG <> ENSMUSG
Human 13: ENSG <> <> ENSG <> ENSG <> ENSG <> ENSG
Mouse 7: ENSMUSG (e)

Linkage group 3:
Gene symbol (a): Wave1 (Wasf1), SLC16A10
Zebrafish: # ENSDARG
Fugu: SINFRUG # SINFRUG
Tetraodon:
Xenopus: ENSXETG <> ENSXETG
Chicken 3: ENSGALG <> ENSGALG
Opossum:
Dog 12: ENSCAFG <> ENSCAFG
Mouse 10: ENSMUSG <> ENSMUSG
Human 6: ENSG <> ENSG

Linkage group 4:
Gene symbol (a): Chic2, Fip1l1 (Fip1), Lnx1 (Lnx), Rasl11b, Usp46
Zebrafish: ENSDARG <> ENSDARG # # ENSDARG #
Fugu: SINFRUG # SINFRUG # SINFRUG # SINFRUG <> SINFRUG
Tetraodon: # GSTENG # GSTENG <> GSTENG <> GSTENG
Xenopus: ENSXETG <> ENSXETG <> ENSXETG <> ENSXETG #
Chicken 4: ENSGALG <> ENSGALG <> ENSGALG <> ENSGALG <> ENSGALG
Opossum: ENSMODG <> ENSMODG <> ENSMODG <> ENSMODG <> ENSMODG
Dog 13: ENSCAFG <> <> ENSCAFG <> <> ENSCAFG
Mouse 5: ENSMUSG <> ENSMUSG <> ENSMUSG <> ENSMUSG <> ENSMUSG
Human 4: ENSG <> ENSG <> ENSG <> ENSG <> ENSG

Linkage group 5:
Gene symbol (a): Wave2 (Wasf2)
Zebrafish:
Fugu: SINFRUG
Tetraodon:
Xenopus: ENSXETG
Chicken:
Opossum:
Dog 2: ENSCAFG
Mouse 4: ENSMUSG
Human 1: ENSG
Human X: ENSG (f)

Linkage group 6:
Gene symbol (a): Cdx1
Zebrafish:
Fugu:
Tetraodon:
Xenopus:
Chicken 13: ENSGALG
Opossum:
Dog:
Mouse 18: ENSMUSG
Human 5: ENSG

Table S2: List of alignments identified between chicken XicHR and eutherian Xic genomic sequences.

Columns: Species; Position in chicken (Start, End); Length; Score; %identity; Chicken annotation; Eutherian annotation

Dog; non-coding region
Dog; non-coding region
Human; non-coding region
Cow; non-coding region
Mouse; non-coding region
Human; Fip1l exon1; Tsx exon 4
Mouse; Fip1l exon1; Tsx exon 4
Human; Fip1l exon2; Tsx exon 5
Mouse; Fip1l exon2; Tsx exon 5
Cow; non-coding region
Mouse; non-coding region
Human; Fip1l exon3; Tsx exon 6
Mouse; non-coding region
Human; Fip1l exon5
Human; Fip1l exon8
Human; non-coding region
Human; non-coding region
Human; Fip1l exon12
Cow; non-coding region
Mouse; non-coding region
Cow; non-coding region
Mouse; non-coding region
Human; Lnxl exon9; Xist exon h5/m6
Cow; non-coding region
Cow; non-coding region
Mouse; non-coding region
Cow; non-coding region
Mouse; non-coding region
Dog; Lnxl exon3; Xist exon h4/m4
Human; Lnxl exon3; Xist exon h4/m4
Mouse; non-coding region
Human; non-coding region; within Xist exon 1
Dog; non-coding region; Xist (by similarity)
Dog; non-coding region; Xist (by similarity)
Human; non-coding region
Human; non-coding region
Dog; non-coding region
Human; non-coding region
Human; non-coding region
Dog; non-coding region
Human; non-coding region
Mouse; non-coding region
Dog; Rasl exon4
Cow; Rasl exon4
Mouse; non-coding region
Dog; Rasl exon3
Cow; Rasl exon3
Dog; Rasl exon2
Cow; Rasl exon2
Cow; Rasl exon1
Dog; Rasl exon1
Human; non-coding region
Human; non-coding region
Cow; non-coding region
Dog; non-coding region
Dog; non-coding region
Cow; non-coding region
Cow; non-coding region
Cow; non-coding region
Human; non-coding region
Cow; non-coding region
Mouse; non-coding region
Dog; non-coding region
Cow; non-coding region
Mouse; non-coding region
Mouse; non-coding region
Cow; non-coding region
Human; non-coding region
Dog; non-coding region
Cow; non-coding region
Mouse; non-coding region
Dog; non-coding region
Dog; non-coding region
Mouse; non-coding region
Human; non-coding region
Dog; non-coding region
Mouse; non-coding region
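The exon annotations in a table of this kind depend on the overlap rule given in Methods (an alignment is counted when it covers at least 33% of an exon's length), and the significance of the observed overlaps was assessed there by random placement of the alignments. Both steps can be sketched as follows. This is a simplified illustration rather than the authors' code: the function names, the 1-based inclusive coordinates, the use of "at least the observed count" as the tail probability, and the omission of RepeatMasker-masked positions are all assumptions of the example, and far fewer replicates are used than the number reported in Methods.

```python
import random


def covers_exon(alignment, exon, min_fraction=0.33):
    """True if the alignment covers at least min_fraction of the exon length.

    alignment, exon: (start, end) tuples, 1-based inclusive coordinates.
    """
    a_start, a_end = alignment
    e_start, e_end = exon
    overlap = min(a_end, e_end) - max(a_start, e_start) + 1
    return overlap >= min_fraction * (e_end - e_start + 1)


def count_overlapping(alignments, exons):
    """Number of alignments that overlap at least one exon."""
    return sum(any(covers_exon(a, e) for e in exons) for a in alignments)


def simulate_p_value(aln_lengths, exons, seq_length, observed,
                     n_sim=10_000, seed=0):
    """Estimate P(at least `observed` alignments overlap exons by chance).

    Each alignment keeps its length but is placed uniformly at random
    along a sequence of seq_length positions (masking is ignored here).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        placed = []
        for length in aln_lengths:
            start = rng.randint(1, seq_length - length + 1)
            placed.append((start, start + length - 1))
        if count_overlapping(placed, exons) >= observed:
            hits += 1
    return hits / n_sim


def joint_probability(p_species_a, p_species_b):
    """Product of the two per-species probabilities, divided by two for
    the same-order constraint, following the rule stated in Methods."""
    return p_species_a * p_species_b / 2
```

A call such as `simulate_p_value(lengths, chicken_exons, seq_length, observed=8)` would give the chicken estimate, where `lengths`, `chicken_exons` and `seq_length` stand for data that are not reproduced in this transcription.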


More information

Chapter # EVOLUTION AND ORIGIN OF NEUROFIBROMIN, THE PRODUCT OF THE NEUROFIBROMATOSIS TYPE 1 (NF1) TUMOR-SUPRESSOR GENE

Chapter # EVOLUTION AND ORIGIN OF NEUROFIBROMIN, THE PRODUCT OF THE NEUROFIBROMATOSIS TYPE 1 (NF1) TUMOR-SUPRESSOR GENE 142 Part 5 Chapter # EVOLUTION AND ORIGIN OF NEUROFIBROMIN, THE PRODUCT OF THE NEUROFIBROMATOSIS TYPE 1 (NF1) TUMOR-SUPRESSOR GENE Golovnina K. *1, Blinov A. 1, Chang L.-S. 2 1 Institute of Cytology and

More information

Letter to the Editor. Temperature Hypotheses. David P. Mindell, Alec Knight,? Christine Baer,$ and Christopher J. Huddlestons

Letter to the Editor. Temperature Hypotheses. David P. Mindell, Alec Knight,? Christine Baer,$ and Christopher J. Huddlestons Letter to the Editor Slow Rates of Molecular Evolution Temperature Hypotheses in Birds and the Metabolic Rate and Body David P. Mindell, Alec Knight,? Christine Baer,$ and Christopher J. Huddlestons *Department

More information

Exploring evolution of brain genes involved in microcephaly through phylogeny and synteny analysis

Exploring evolution of brain genes involved in microcephaly through phylogeny and synteny analysis Rauf and Mir Theoretical Biology and Medical Modelling 2013, 10:61 RESEARCH Open Access Exploring evolution of brain genes involved in microcephaly through phylogeny and synteny analysis Sobiah Rauf and

More information

Genomes and Their Evolution

Genomes and Their Evolution Chapter 21 Genomes and Their Evolution PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley with contributions from

More information

mosaic: Supplementary material

mosaic: Supplementary material mosaic: Supplementary material Stefan R. Maetschke, Karin S. Kassahn, Jasmyn A. Dunn, Siew P. Han, Eva Z. Curley, Katryn J. Stacey, Mark A. Ragan January 14, 2010 1 1 Analysis of toll-like receptors TLR1

More information

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

Bio 1B Lecture Outline (please print and bring along) Fall, 2007 Bio 1B Lecture Outline (please print and bring along) Fall, 2007 B.D. Mishler, Dept. of Integrative Biology 2-6810, bmishler@berkeley.edu Evolution lecture #5 -- Molecular genetics and molecular evolution

More information

Comparative Genomics. Chapter for Human Genetics - Principles and Approaches - 4 th Edition

Comparative Genomics. Chapter for Human Genetics - Principles and Approaches - 4 th Edition Chapter for Human Genetics - Principles and Approaches - 4 th Edition Editors: Friedrich Vogel, Arno Motulsky, Stylianos Antonarakis, and Michael Speicher Comparative Genomics Ross C. Hardison Affiliations:

More information

BMI/CS 776 Lecture #20 Alignment of whole genomes. Colin Dewey (with slides adapted from those by Mark Craven)

BMI/CS 776 Lecture #20 Alignment of whole genomes. Colin Dewey (with slides adapted from those by Mark Craven) BMI/CS 776 Lecture #20 Alignment of whole genomes Colin Dewey (with slides adapted from those by Mark Craven) 2007.03.29 1 Multiple whole genome alignment Input set of whole genome sequences genomes diverged

More information

Phylogeny 9/8/2014. Evolutionary Relationships. Data Supporting Phylogeny. Chapter 26

Phylogeny 9/8/2014. Evolutionary Relationships. Data Supporting Phylogeny. Chapter 26 Phylogeny Chapter 26 Taxonomy Taxonomy: ordered division of organisms into categories based on a set of characteristics used to assess similarities and differences Carolus Linnaeus developed binomial nomenclature,

More information

Genome Sequencing & DNA Sequence Analysis

Genome Sequencing & DNA Sequence Analysis 7.91 / 7.36 / BE.490 Lecture #1 Feb. 24, 2004 Genome Sequencing & DNA Sequence Analysis Chris Burge What is a Genome? A genome is NOT a bag of proteins What s in the Human Genome? Outline of Unit II: DNA/RNA

More information

Understanding relationship between homologous sequences

Understanding relationship between homologous sequences Molecular Evolution Molecular Evolution How and when were genes and proteins created? How old is a gene? How can we calculate the age of a gene? How did the gene evolve to the present form? What selective

More information

Transcription Regulation and Gene Expression in Eukaryotes FS08 Pharmacenter/Biocenter Auditorium 1 Wednesdays 16h15-18h00.

Transcription Regulation and Gene Expression in Eukaryotes FS08 Pharmacenter/Biocenter Auditorium 1 Wednesdays 16h15-18h00. Transcription Regulation and Gene Expression in Eukaryotes FS08 Pharmacenter/Biocenter Auditorium 1 Wednesdays 16h15-18h00. Promoters and Enhancers Systematic discovery of transcriptional regulatory motifs

More information

Molecular Coevolution of the Vertebrate Cytochrome c 1 and Rieske Iron Sulfur Protein in the Cytochrome bc 1 Complex

Molecular Coevolution of the Vertebrate Cytochrome c 1 and Rieske Iron Sulfur Protein in the Cytochrome bc 1 Complex Molecular Coevolution of the Vertebrate Cytochrome c 1 and Rieske Iron Sulfur Protein in the Cytochrome bc 1 Complex Kimberly Baer *, David McClellan Department of Integrative Biology, Brigham Young University,

More information

Exploring Evolution & Bioinformatics

Exploring Evolution & Bioinformatics Chapter 6 Exploring Evolution & Bioinformatics Jane Goodall The human sequence (red) differs from the chimpanzee sequence (blue) in only one amino acid in a protein chain of 153 residues for myoglobin

More information

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships Chapter 26: Phylogeny and the Tree of Life You Must Know The taxonomic categories and how they indicate relatedness. How systematics is used to develop phylogenetic trees. How to construct a phylogenetic

More information

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between

More information

8.2% of the Human Genome Is Constrained: Variation in Rates of Turnover across Functional Element Classes in the Human Lineage

8.2% of the Human Genome Is Constrained: Variation in Rates of Turnover across Functional Element Classes in the Human Lineage 8.2% of the Human Genome Is Constrained: Variation in Rates of Turnover across Functional Element Classes in the Human Lineage Chris M. Rands 1, Stephen Meader 1, Chris P. Ponting 1 *, Gerton Lunter 2

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 2009 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 200 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

BIOINFORMATICS LAB AP BIOLOGY

BIOINFORMATICS LAB AP BIOLOGY BIOINFORMATICS LAB AP BIOLOGY Bioinformatics is the science of collecting and analyzing complex biological data. Bioinformatics combines computer science, statistics and biology to allow scientists to

More information

Orthology Part I concepts and implications Toni Gabaldón Centre for Genomic Regulation (CRG), Barcelona

Orthology Part I concepts and implications Toni Gabaldón Centre for Genomic Regulation (CRG), Barcelona Orthology Part I concepts and implications Toni Gabaldón Centre for Genomic Regulation (CRG), Barcelona Toni Gabaldón Contact: tgabaldon@crg.es Group website: http://gabaldonlab.crg.es Science blog: http://treevolution.blogspot.com

More information

A subset of conserved mammalian long non-coding RNAs are fossils of ancestral protein-coding genes

A subset of conserved mammalian long non-coding RNAs are fossils of ancestral protein-coding genes Hezroni et al. Genome Biology (2017) 18:162 DOI 10.1186/s13059-017-1293-0 RESEARCH Open Access A subset of conserved mammalian long non-coding s are fossils of ancestral protein-coding genes Hadas Hezroni,

More information

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata.

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata. Supplementary Note S2 Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata. Phylogenetic trees reconstructed by a variety of methods from either single-copy orthologous loci (Class

More information

Camello, a novel family of Histone Acetyltransferases that acetylate histone H4 and is essential for zebrafish development

Camello, a novel family of Histone Acetyltransferases that acetylate histone H4 and is essential for zebrafish development Supplementary Information: Camello, a novel family of Histone Acetyltransferases that acetylate histone H4 and is essential for zebrafish development Krishanpal Karmodiya 1, Krishanpal Anamika 1,2, Vijaykumar

More information

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Bioinformatics tools for phylogeny and visualization. Yanbin Yin Bioinformatics tools for phylogeny and visualization Yanbin Yin 1 Homework assignment 5 1. Take the MAFFT alignment http://cys.bios.niu.edu/yyin/teach/pbb/purdue.cellwall.list.lignin.f a.aln as input and

More information

Synonymous Codon Substitution Matrices

Synonymous Codon Substitution Matrices Synonymous Codon Substitution Matrices Adrian Schneider, Gaston H. Gonnet, and Gina M. Cannarozzi Computational Biology Research Group, Institute for Computational Science, ETH Zürich, Universitätstrasse

More information

Sequence Database Search Techniques I: Blast and PatternHunter tools

Sequence Database Search Techniques I: Blast and PatternHunter tools Sequence Database Search Techniques I: Blast and PatternHunter tools Zhang Louxin National University of Singapore Outline. Database search 2. BLAST (and filtration technique) 3. PatternHunter (empowered

More information

Genome Annotation. Qi Sun Bioinformatics Facility Cornell University

Genome Annotation. Qi Sun Bioinformatics Facility Cornell University Genome Annotation Qi Sun Bioinformatics Facility Cornell University Some basic bioinformatics tools BLAST PSI-BLAST - Position-Specific Scoring Matrix HMM - Hidden Markov Model NCBI BLAST How does BLAST

More information

Homology Modeling. Roberto Lins EPFL - summer semester 2005

Homology Modeling. Roberto Lins EPFL - summer semester 2005 Homology Modeling Roberto Lins EPFL - summer semester 2005 Disclaimer: course material is mainly taken from: P.E. Bourne & H Weissig, Structural Bioinformatics; C.A. Orengo, D.T. Jones & J.M. Thornton,

More information

Pairwise & Multiple sequence alignments

Pairwise & Multiple sequence alignments Pairwise & Multiple sequence alignments Urmila Kulkarni-Kale Bioinformatics Centre 411 007 urmila@bioinfo.ernet.in Basis for Sequence comparison Theory of evolution: gene sequences have evolved/derived

More information

Homology and Information Gathering and Domain Annotation for Proteins

Homology and Information Gathering and Domain Annotation for Proteins Homology and Information Gathering and Domain Annotation for Proteins Outline Homology Information Gathering for Proteins Domain Annotation for Proteins Examples and exercises The concept of homology The

More information

8/23/2014. Phylogeny and the Tree of Life

8/23/2014. Phylogeny and the Tree of Life Phylogeny and the Tree of Life Chapter 26 Objectives Explain the following characteristics of the Linnaean system of classification: a. binomial nomenclature b. hierarchical classification List the major

More information

Genomics and bioinformatics summary. Finding genes -- computer searches

Genomics and bioinformatics summary. Finding genes -- computer searches Genomics and bioinformatics summary 1. Gene finding: computer searches, cdnas, ESTs, 2. Microarrays 3. Use BLAST to find homologous sequences 4. Multiple sequence alignments (MSAs) 5. Trees quantify sequence

More information

UoN, CAS, DBSC BIOL102 lecture notes by: Dr. Mustafa A. Mansi. The Phylogenetic Systematics (Phylogeny and Systematics)

UoN, CAS, DBSC BIOL102 lecture notes by: Dr. Mustafa A. Mansi. The Phylogenetic Systematics (Phylogeny and Systematics) - Phylogeny? - Systematics? The Phylogenetic Systematics (Phylogeny and Systematics) - Phylogenetic systematics? Connection between phylogeny and classification. - Phylogenetic systematics informs the

More information

Conserved noncoding elements (CNEs) represent 3.5% of

Conserved noncoding elements (CNEs) represent 3.5% of A family of conserved noncoding elements derived from an ancient transposable element Xiaohui Xie*, Michael Kamal*, and Eric S. Lander* *Broad Institute of Massachusetts Institute of Technology and Harvard

More information

A bioinformatics approach to the structural and functional analysis of the glycogen phosphorylase protein family

A bioinformatics approach to the structural and functional analysis of the glycogen phosphorylase protein family A bioinformatics approach to the structural and functional analysis of the glycogen phosphorylase protein family Jieming Shen 1,2 and Hugh B. Nicholas, Jr. 3 1 Bioengineering and Bioinformatics Summer

More information

Major Gene Families in Humans and Their Evolutionary History Prof. Yoshihito Niimura Prof. Masatoshi Nei

Major Gene Families in Humans and Their Evolutionary History Prof. Yoshihito Niimura Prof. Masatoshi Nei Major Gene Families in Humans Yoshihito Niimura Tokyo Medical and Dental University and Masatoshi Nei Pennsylvania State University 1 1. Multigene family Contents 2. Olfactory receptors (ORs) 3. OR genes

More information

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky MOLECULAR PHYLOGENY "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky EVOLUTION - theory that groups of organisms change over time so that descendeants differ structurally

More information

NIH Public Access Author Manuscript Immunogenetics. Author manuscript; available in PMC 2006 May 31.

NIH Public Access Author Manuscript Immunogenetics. Author manuscript; available in PMC 2006 May 31. NIH Public Access Author Manuscript Published in final edited form as: Immunogenetics. 2005 April ; 57(1-2): 151 157. Origin and evolution of the Ig-like domains present in mammalian leukocyte receptors:

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary information S3 (box) Methods Methods Genome weighting The currently available collection of archaeal and bacterial genomes has a highly biased distribution of isolates across taxa. For example,

More information

WHAT fraction of new mutations in the genome are

WHAT fraction of new mutations in the genome are Copyright Ó 2011 by the Genetics Society of America DOI: 10.1534/genetics.110.124073 Inference of Mutation Parameters and Selective Constraint in Mammalian Coding Sequences by Approximate Bayesian Computation

More information

Tools and Algorithms in Bioinformatics

Tools and Algorithms in Bioinformatics Tools and Algorithms in Bioinformatics GCBA815, Fall 2015 Week-4 BLAST Algorithm Continued Multiple Sequence Alignment Babu Guda, Ph.D. Department of Genetics, Cell Biology & Anatomy Bioinformatics and

More information

Inferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT

Inferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT Inferring phylogeny Constructing phylogenetic trees Tõnu Margus Contents What is phylogeny? How/why it is possible to infer it? Representing evolutionary relationships on trees What type questions questions

More information

Chapter 5. Proteomics and the analysis of protein sequence Ⅱ

Chapter 5. Proteomics and the analysis of protein sequence Ⅱ Proteomics Chapter 5. Proteomics and the analysis of protein sequence Ⅱ 1 Pairwise similarity searching (1) Figure 5.5: manual alignment One of the amino acids in the top sequence has no equivalent and

More information

Supplementary Information for Hurst et al.: Causes of trends of amino acid gain and loss

Supplementary Information for Hurst et al.: Causes of trends of amino acid gain and loss Supplementary Information for Hurst et al.: Causes of trends of amino acid gain and loss Methods Identification of orthologues, alignment and evolutionary distances A preliminary set of orthologues was

More information

The nonsynonymous/synonymous substitution rate ratio versus the radical/conservative replacement rate ratio in the evolution of mammalian genes

The nonsynonymous/synonymous substitution rate ratio versus the radical/conservative replacement rate ratio in the evolution of mammalian genes MBE Advance Access published July, 00 1 1 1 1 1 1 1 1 The nonsynonymous/synonymous substitution rate ratio versus the radical/conservative replacement rate ratio in the evolution of mammalian genes Kousuke

More information

Supplementary Information

Supplementary Information Supplementary Information Supplementary Figure 1. Schematic pipeline for single-cell genome assembly, cleaning and annotation. a. The assembly process was optimized to account for multiple cells putatively

More information

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task.

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task. Chapter 12 (Strikberger) Molecular Phylogenies and Evolution METHODS FOR DETERMINING PHYLOGENY In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task. Modern

More information

Visit to BPRC. Data is crucial! Case study: Evolution of AIRE protein 6/7/13

Visit to BPRC. Data is crucial! Case study: Evolution of AIRE protein 6/7/13 Visit to BPRC Adres: Lange Kleiweg 161, 2288 GJ Rijswijk Utrecht CS à Den Haag CS 9:44 Spoor 9a, arrival 10:22 Den Haag CS à Delft 10:28 Spoor 1, arrival 10:44 10:48 Delft Voorzijde à Bushalte TNO/Lange

More information

BIOINFORMATICS: An Introduction

BIOINFORMATICS: An Introduction BIOINFORMATICS: An Introduction What is Bioinformatics? The term was first coined in 1988 by Dr. Hwa Lim The original definition was : a collective term for data compilation, organisation, analysis and

More information

Comparative genomics: Overview & Tools + MUMmer algorithm

Comparative genomics: Overview & Tools + MUMmer algorithm Comparative genomics: Overview & Tools + MUMmer algorithm Urmila Kulkarni-Kale Bioinformatics Centre University of Pune, Pune 411 007. urmila@bioinfo.ernet.in Genome sequence: Fact file 1995: The first

More information

Session 5: Phylogenomics

Session 5: Phylogenomics Session 5: Phylogenomics B.- Phylogeny based orthology assignment REMINDER: Gene tree reconstruction is divided in three steps: homology search, multiple sequence alignment and model selection plus tree

More information

1 Introduction. Abstract

1 Introduction. Abstract CBS 530 Assignment No 2 SHUBHRA GUPTA shubhg@asu.edu 993755974 Review of the papers: Construction and Analysis of a Human-Chimpanzee Comparative Clone Map and Intra- and Interspecific Variation in Primate

More information

Application of new distance matrix to phylogenetic tree construction

Application of new distance matrix to phylogenetic tree construction Application of new distance matrix to phylogenetic tree construction P.V.Lakshmi Computer Science & Engg Dept GITAM Institute of Technology GITAM University Andhra Pradesh India Allam Appa Rao Jawaharlal

More information

DATA ACQUISITION FROM BIO-DATABASES AND BLAST. Natapol Pornputtapong 18 January 2018

DATA ACQUISITION FROM BIO-DATABASES AND BLAST. Natapol Pornputtapong 18 January 2018 DATA ACQUISITION FROM BIO-DATABASES AND BLAST Natapol Pornputtapong 18 January 2018 DATABASE Collections of data To share multi-user interface To prevent data loss To make sure to get the right things

More information

Gene Families part 2. Review: Gene Families /727 Lecture 8. Protein family. (Multi)gene family

Gene Families part 2. Review: Gene Families /727 Lecture 8. Protein family. (Multi)gene family Review: Gene Families Gene Families part 2 03 327/727 Lecture 8 What is a Case study: ian globin genes Gene trees and how they differ from species trees Homology, orthology, and paralogy Last tuesday 1

More information

Molecular Evolution & the Origin of Variation

Molecular Evolution & the Origin of Variation Molecular Evolution & the Origin of Variation What Is Molecular Evolution? Molecular evolution differs from phenotypic evolution in that mutations and genetic drift are much more important determinants

More information

Molecular Evolution & the Origin of Variation

Molecular Evolution & the Origin of Variation Molecular Evolution & the Origin of Variation What Is Molecular Evolution? Molecular evolution differs from phenotypic evolution in that mutations and genetic drift are much more important determinants

More information