Supporting Online Material for


www.sciencemag.org/cgi/content/full/312/5780/1653/dc1

Supporting Online Material for

The Xist RNA Gene Evolved in Eutherians by Pseudogenization of a Protein-Coding Gene

Laurent Duret,* Corinne Chureau, Sylvie Samain, Jean Weissenbach, Philip Avner

*To whom correspondence should be addressed. E-mail: duret@biomserv.univ-lyon1.fr

Published 16 June 2006, Science 312, 1653 (2006). DOI: 10.1126/science.1126316

This PDF file includes:
Materials and Methods
SOM Text
Figs. S1 and S2
Tables S1 and S2
References and Notes

Supplementary Material

The Xist RNA gene evolved in eutherians by pseudogenization of a protein-coding gene

Laurent Duret, Corinne Chureau, Sylvie Samain, Jean Weissenbach, Philip Avner

Methods

Sequencing. Using standard protocols, we screened a BAC library from Monodelphis domestica (LB3 from BAC/PAC Resources of the Children's Hospital Oakland Research Institute). We used probes derived from eutherian Xist sequences, from chicken XicHR genes and from opossum genomic sequences available through the Ensembl web site (1). We isolated one BAC containing the Rasl11c gene and the 5' end of Lnx3 (EMBL accession number AM230660). We amplified and sequenced the Lnx3 cDNA from opossum tissue samples of testis (male) and liver (male and female) (EMBL accession number AM230659).

Searching for homologous genes. Homologs of protein-coding genes were searched with BLASTP (2) against the Ensembl protein annotations of complete vertebrate genomes (Ensembl release 34 (1)). This data set includes 12 species: six eutherian mammals (Bos taurus, Canis familiaris, Homo sapiens, Pan troglodytes, Mus musculus, Rattus norvegicus), one marsupial (the opossum Monodelphis domestica), one bird (Gallus gallus), one amphibian (Xenopus tropicalis), and three teleost fishes (Danio rerio, Fugu rubripes, Tetraodon nigroviridis). For many of these genomes, sequencing, assembly and annotation are not yet complete. Thus, with BLASTP, one may miss homologs that have not been annotated yet, or that are present in genomic sequences that have not been incorporated in the assembly. We therefore used TBLASTN to search for homologs of the chicken Lnx3 protein gene in these 12 genomes, in the draft assembly of the elephant genome (Loxodonta africana, build BROADE1), in the opossum shotgun sequences (41,343,661 sequence reads; 36.5 Gb) and in sequence reads from the monotreme platypus (Ornithorhynchus anatinus) (27,935,986 sequence reads; 21.3 Gb) (available at the NCBI trace archive: ftp://ftp.ncbi.nih.gov/pub/tracedb/). All these data sets were also used to search for homologs of non-coding RNA genes (Xist, Jpx and Ftx) with BLASTN.

The protein-coding genes from XicHR belong to multigenic families. To distinguish between orthologs and paralogs, we computed phylogenetic trees (neighbor-joining method, with Poisson correction) for each gene family. We selected for phylogenetic analyses all homologs having a BLASTP score greater than or equal to the score of the closest non-vertebrate homolog (tunicate, insect or nematode). Duplicates and sequences that were too short (due to incomplete gene prediction) were removed from the data set. In all cases, groups of orthologs of the chicken XicHR genes were supported by strong bootstrap values (> 90%). In eutherians we found paralogs of the XicHR genes, resulting from ancient duplications predating the divergence between fishes and tetrapods (Table S1). However, we did not find any ortholog of the XicHR genes, which indicates that they have been lost from the genome of eutherians or are too diverged to be recognizable by BLAST.

In eutherians, besides Xist, the Xic region contains two RNA genes (Jpx and Ftx) and two protein-coding genes (Tsx and Cnbp2) (3) (Fig. 2a).
With BLAST, we failed to detect any homolog of the Tsx, Jpx and Ftx genes in non-eutherian vertebrates. However, genomic alignments with SIM demonstrated that Tsx is a truncated ortholog of Fip1l2 (see main text and below). Jpx and Ftx are found in the same genomic interval and in the same orientation as UspL and Wave4 (Fig. 2a). This suggests that, like Xist, they may derive from protein-coding genes. Cnbp2 is a retrotransposed gene deriving from the autosomal Cnbp gene (3). We identified orthologs of Cnbp2 in human, mouse and cow that are located at the same position in the Xic locus. Phylogenetic analyses indicate that Cnbp exists in all vertebrates but that Cnbp2 is specific to eutherians.

Genomic alignments. Repeated elements were masked from genomic sequences with RepeatMasker (Smit, A. F. A. and Green, P., unpublished), using taxon-specific transposable element data sets from Repbase Update (4). Pairwise local alignments between genomic sequences were then computed with SIM (5) using default parameters (match = 1, mismatch = -1, gap opening penalty = 6, gap extension penalty = 0.2). The SIM local alignment software (5) is based on the Smith & Waterman algorithm, which is slower but more sensitive than BLAST. For each pair of sequences, we searched the 300 best local alignments (SIM parameter k=300). The value of this parameter was tuned so as to be sure that no significant match would be excluded by SIM. Hence, a large majority of these local alignments correspond to non-significant similarities that occur by chance between any unrelated sequences. Then, to reduce the number of such random matches, we selected among these 300 local alignments the best combination of hits occurring in the same order and orientation in both sequences (using LALNVIEW (6)). By imposing this constraint of conserved order and orientation, we increased the specificity of the homology searches. Typically, fewer than 10% of the 300 best local alignments are retained after this filtering.

The comparison of human and chicken XicHR sequences revealed 22 alignments. Eight of them overlap known exons in chicken, among which five also correspond to exons in human (NB: we considered that there was an overlap if at least 33% of the length of the exon is covered by the alignment). Some weak similarities may occur by chance between unrelated sequences. To compute the probability that such random sequence matches overlap known exons, we performed simulations. For each species, the position of each of the 22 alignments was randomly chosen along the genomic sequence (excluding the positions masked by RepeatMasker), and we counted the number of alignments overlapping exons. We performed 10^8 simulations to get the distribution of the number of random matches overlapping exons in each species, and inferred the probability of observing by chance 8 overlaps in chicken or 5 overlaps in human. There are two Xist exons that show similarity with two Lnx3 exons. The probability that random matches overlap two exons in both species is the product of the probabilities of overlapping two exons in each species, divided by two (to take into account the fact that they were found in the same order in both species).

Supplementary discussion

New function by loss of function? Does the loss of the protein-coding capacity of Lnx3 in eutherians have any link with chromosome inactivation? Lnx3 is conserved in all vertebrate classes and, like its paralogs Lnx1 and Lnx2, encodes a protein containing one RING-type E3 ubiquitin ligase domain, one NPXY binding motif and four PDZ protein-interaction domains. Lnx1 and Lnx2 are thought to regulate Notch (and/or ErbB2) signalling by targeting Numb (ErbB2) for degradation through the proteasome pathway (7, 8). Whilst the exons conserved in Xist correspond to two of the PDZ motifs, these Xist exons contain frameshift mutations (Fig. 3). Further unravelling the precise biological function of Lnx3 in non-eutherian species, although beyond the scope of the present work, may be useful in understanding the evolution of X inactivation.

Pseudogenization of protein-coding genes flanking Xist. An intriguing issue is the coincidence of the loss of protein-coding function in Lnx3 (to become Xist) with the loss of function of four other protein genes in the XicHR: Fip1l2, Rasl11c, UspL and Wave4. One possible explanation for this simultaneous loss of protein-coding function is that the expression of Xist, which has a heterochromatinization activity in cis, might be incompatible with the correct regulation of neighbouring genes. Thus, the activity of Xist might have precluded the proper expression of flanking genes, thereby leading them to pseudogenization. This model implies that the advantages conferred by the expression of Xist were strong enough to counterbalance the deleterious effects of silencing four neighbouring genes (that are well conserved in all other vertebrates).

An alternative explanation would be that these five XicHR genes participate in a single process, separate from sex chromosome inactivation, that is no longer required in eutherians. Presently, little is known about the precise function of these genes. Fip1l2 is homologous to Fip1, a subunit of the cleavage and polyadenylation specific factor (9). Rasl11c belongs to the Ras family of small GTPases (10), and UspL to a large family of ubiquitin-specific proteases (11). Wave proteins are involved in the regulation of actin polymerization (12). It is interesting to note that Lnx proteins have a ubiquitin ligase activity, whereas Usp proteins are deubiquitylating enzymes. Thus one might imagine that UspL regulates the activity of Lnx3 by removing polyubiquitin from its target proteins, rescuing them from degradation by the proteasome. This model is very speculative, and there is at present no evidence to suggest that the XicHR genes are involved in the same function.

Finally, it is also possible that the pseudogenization of the XicHR genes is the direct consequence of the invasion of the Xic locus by transposable elements. Such high transposition activity might result from intragenomic conflicts at imprinted loci (13).

References:
1. E. Birney et al., Nucleic Acids Res. 34, D556 (2006).
2. S. F. Altschul et al., Nucleic Acids Res. 25, 3389 (1997).
3. C. Chureau et al., Genome Res. 12, 894 (2002).
4. J. Jurka, Trends Genet. 16, 418 (2000).
5. X. Huang, W. Miller, Advances in Applied Mathematics 12, 337 (1991).
6. L. Duret, E. Gasteiger, G. Perriere, Comput. Appl. Biosci. 12, 507 (1996).
7. D. S. Rice, G. M. Northcutt, C. Kurschner, Mol. Cell. Neurosci. 18, 525 (2001).
8. P. Young et al., Mol. Cell. Neurosci. 30, 238 (2005).
9. I. Kaufmann, G. Martin, A. Friedlein, H. Langen, W. Keller, EMBO J. 23, 616 (2004).
10. R. Louro et al., Biochem. Biophys. Res. Commun. 316, 618 (2004).
11. V. Quesada et al., Biochem. Biophys. Res. Commun. 314, 54 (2004).
12. T. E. Stradal et al., Trends Cell Biol. 14, 303 (2004).
13. J. F. Wilkins, Trends Genet. 21, 356 (2005).

Supplementary figures and tables.
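The exon-overlap randomization described under "Genomic alignments" can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the set-based representation of masked positions, the uniform choice of start positions among unmasked sites, and the use of P(count >= observed) as the reported probability are all assumptions, and the default number of simulations is far smaller than the 10^8 used in the study. Only the 33% coverage rule and the "product divided by two" rule are taken from the text.

```python
import random

def covered_fraction(aln, exon):
    """Fraction of the exon covered by the alignment (inclusive coordinates)."""
    overlap = min(aln[1], exon[1]) - max(aln[0], exon[0]) + 1
    return max(overlap, 0) / (exon[1] - exon[0] + 1)

def count_exon_overlaps(alignments, exons, min_fraction=0.33):
    """Count alignments covering at least min_fraction of some exon."""
    return sum(
        any(covered_fraction(aln, exon) >= min_fraction for exon in exons)
        for aln in alignments
    )

def overlap_probability(aln_lengths, exons, seq_length, masked, observed,
                        n_sim=10_000, seed=0):
    """Empirical probability that randomly placed alignments give at least
    `observed` exon overlaps (hypothetical helper, see assumptions above)."""
    rng = random.Random(seed)
    # Assumption: a start position is valid if it is unmasked and the
    # alignment fits inside the sequence (1-based coordinates).
    starts = {
        length: [p for p in range(1, seq_length - length + 2) if p not in masked]
        for length in set(aln_lengths)
    }
    hits = 0
    for _ in range(n_sim):
        placed = []
        for length in aln_lengths:
            start = rng.choice(starts[length])
            placed.append((start, start + length - 1))
        if count_exon_overlaps(placed, exons) >= observed:
            hits += 1
    return hits / n_sim

def joint_probability(p_species_a, p_species_b):
    """Product of the per-species probabilities, divided by two, as in the text."""
    return p_species_a * p_species_b / 2
```

For example, an alignment spanning positions 10 to 20 covers 6 of the 16 bases of an exon spanning 15 to 30 (37.5%), so it counts as an overlap under the 33% rule, whereas an alignment spanning 10 to 19 (31.25%) does not.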

[Figure S1 image. Panel (a) labels: Chic1, Lnx3, Uspl, Xpct, Fip1l2, Rasl, Wave4 (chicken); Chic1, Tsx, Xist, Jpx, Ftx, Xpct, (pseudo) (human). Panel (b) labels: Lnx3, Rasl11c (chicken); Xist (dog). Legend: protein-coding exon; non-coding RNA exon; repeat sequence (SINE, LINE, ...).]

Figure S1: Comparison of the chicken XicHR with the human and dog Xic region. Genomic sequences were first analyzed with RepeatMasker to identify and mask repeated elements and then aligned with SIM. The best combination of local alignments in consistent order and orientation is displayed (see Methods). (a) Global view of the whole chicken/human alignment. (b) Zoom on the Lnx3/Xist and Rasl11c region in the chicken/dog alignment. Positions are indicated in bp.
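The "best combination of local alignments in consistent order and orientation" shown in Figure S1 can be illustrated with a simple chaining routine. The sketch below is an assumption-laden stand-in, not LALNVIEW's actual procedure: it uses a quadratic dynamic program, assumes all hits have already been placed on the same strand, requires chained hits to be non-overlapping in both sequences, and uses a hypothetical `Hit` record.

```python
from typing import List, NamedTuple

class Hit(NamedTuple):
    """One local alignment (hypothetical record; inclusive coordinates)."""
    a_start: int
    a_end: int
    b_start: int
    b_end: int
    score: float

def best_collinear_chain(hits: List[Hit]) -> List[Hit]:
    """Highest-scoring subset of hits that appear in the same order in both sequences."""
    if not hits:
        return []
    # Sorting by start in sequence A guarantees every valid predecessor comes first.
    ordered = sorted(hits, key=lambda h: (h.a_start, h.b_start))
    best = [h.score for h in ordered]   # best chain score ending at each hit
    prev = [-1] * len(ordered)          # back-pointers for traceback
    for j, hj in enumerate(ordered):
        for i in range(j):
            hi = ordered[i]
            if hi.a_end < hj.a_start and hi.b_end < hj.b_start:
                if best[i] + hj.score > best[j]:
                    best[j] = best[i] + hj.score
                    prev[j] = i
    # Trace back from the best-scoring chain end.
    j = max(range(len(ordered)), key=best.__getitem__)
    chain = []
    while j != -1:
        chain.append(ordered[j])
        j = prev[j]
    return chain[::-1]
```

With four hits where the second lies out of order in sequence B, the routine keeps the other three and discards the inconsistent one, which is the filtering effect the Methods describe.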

[Figure S2 image: phylogenetic tree with three clades labelled Lnx1, Lnx2 and Lnx3; scale bar 0.1; bootstrap values shown at nodes; taxa: Mouse, Rat, Cow, Dog, Chimpanzee, Human, Opossum, Chicken, Xenopus, Fugu, Tetraodon.]

Figure S2: Phylogenetic tree of the Lnx gene family. The protein alignment comprised 478 sites (after exclusion of unreliable parts of the alignment). The tree was computed with Phyml (1), using the JTT model, with four categories of substitution rate, an estimated gamma distribution parameter and an estimated proportion of invariable sites. The scale of branch lengths is indicated (number of substitutions per site). Bootstrap values larger than 50% (after 500 replicates) are displayed.

The opossum Lnx3 protein has been evolving three times faster than its chicken ortholog, which is suggestive of functional changes. Comparison with the chicken gene shows no frameshifts or nonsense mutations, and reveals a low ratio of non-synonymous over synonymous substitutions (0.15). This shows that the opossum Lnx3, although rapidly evolving, is subject to purifying selection.

Ensembl or EMBL gene identifiers:
Cow (Bos taurus): ENSBTAG00000020658 (Lnx1), ENSBTAG00000015614 (Lnx2).
Dog (Canis familiaris): ENSCAFG00000002039 (Lnx1), ENSCAFG00000006782 (Lnx2).
Fugu (Fugu rubripes): SINFRUG00000132598 (Lnx1), SINFRUG00000149046 (Lnx2), SINFRUG00000131567 (Lnx3).
Chicken (Gallus gallus): ENSGALG00000013940 (Lnx1), ENSGALG00000017096 (Lnx2), ENSGALG00000007703 (Lnx3).
Human (Homo sapiens): ENSG00000072201 (Lnx1), ENSG00000139517 (Lnx2).
Mouse (Mus musculus): ENSMUSG00000029228 (Lnx1), ENSMUSG00000016520 (Lnx2).
Chimpanzee (Pan troglodytes): ENSPTRG00000016062 (Lnx1), ENSPTRG00000005733 (Lnx2).
Rat (Rattus norvegicus): ENSRNOG00000002272 (Lnx1), ENSRNOG00000000955 (Lnx2).
Tetraodon (Tetraodon nigroviridis): GSTENG00032770001 (Lnx1), GSTENG00023872001 (Lnx2).
Xenopus (Xenopus tropicalis): ENSXETG00000011680 (Lnx1), ENSXETG00000004587 (Lnx3).
Opossum (Monodelphis domestica): ENSMODG00000020659 (Lnx1), ENSMODG00000008989 (Lnx2), AM230659 (Lnx3).

(1) S. Guindon, O. Gascuel, Syst. Biol. 52, 696 (Oct. 2003).
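The bootstrap support values in Figure S2 come from 500 replicates. As a reminder of what one replicate involves, the sketch below resamples alignment columns with replacement, which is the standard nonparametric bootstrap for sequence alignments. It is only an illustration under stated assumptions: the function name and the dictionary representation of the alignment are hypothetical, and the tree inference that Phyml performs on each replicate is not shown.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: columns resampled with replacement.

    `alignment` maps a sequence name to its aligned sequence (equal lengths).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_sites = lengths.pop()
    # Draw the same column indices for every sequence so columns stay intact.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}
```

A support value for a clade is then the percentage of replicate trees in which that clade is recovered.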

Table S1: Orthologs and paralogs of XicHR protein genes in vertebrates. Chicken protein genes located in the Xic homologous region were compared against all Ensembl protein predictions with BLASTP. Ensembl gene identifiers of homologous genes are indicated in the table. Groups of orthologous genes were determined by phylogenetic analyses, and are surrounded by a rectangle in the table. Most genes are found in four linkage groups. The symbol <> indicates that genes are linked on the same scaffold in the genome assembly. The symbol # indicates a gap in the genome assembly. (a) Gene symbols correspond to the human gene nomenclature, except for (b), which do not exist in human and were named on the basis of their phylogenetic relationship with their closest homologs. (c) The gene was absent from Ensembl annotations, but was found in the genomic contig with TBLASTN. (d) The human Chic1 gene is missing from Ensembl annotations but is present in the genomic sequence and described in EMBL (accession number AL358796) and Uniprot (accession number Q5JSZ4). (e) Recent paralog resulting from a duplication in the rodent lineage. (f) Recent paralog resulting from a duplication in the primate lineage. (g) EMBL accession number.

It should be noted that in chicken, the XicHR region is located on an autosome (chromosome 4), and hence cannot be directly involved in a process similar to the X-inactivation of eutherians. Interestingly, most of the paralogs of the chicken XicHR genes are also found in close linkage. The four paralogons (located respectively on chromosomes X, 4, 6 and 13 in human) most probably result from the two whole genome duplications that occurred early in the evolution of vertebrates (1).

1. Dehal, P. & Boore, J. L. Two rounds of whole genome duplication in the ancestral vertebrate. PLoS Biol 3, e314 (2005).

TABLE S1

Columns: Species; Chrom. number; Cdx family; Chic family; Fip1 family; Lnx family; Rasl11 family; Usp family; Wave family; Xpct family

Linkage group 1 (Xic, XicHR):
Gene symbol (a): Cdx4 Chic1 Fip1l2 (b) Lnx3 (b) Rasl11c (b) UspL (b) Wave4 (b) Xpct (SLC16A2)
Zebrafish: ENSDARG00000036288 <> Hit TBLASTN (c) <> <> Hit TBLASTN (c) # # ENSDARG00000018710 # ENSDARG00000023940 # ENSDARG00000029680
Fugu: SINFRUG00000131564 <> SINFRUG00000131565 <> <> SINFRUG00000131567 # SINFRUG00000121898 <> SINFRUG00000121896 # # SINFRUG00000134073
Tetraodon: # GSTENG00031147001
Xenopus: ENSXETG00000004589 <> ENSXETG00000004588 <> Hit TBLASTN (c) <> ENSXETG00000004587 <> ENSXETG00000004586 <> ENSXETG00000004585 <> ENSXETG00000004583 <> ENSXETG00000004581
Chicken 4: ENSGALG00000007657 <> ENSGALG00000007660 <> ENSGALG00000007683 <> ENSGALG00000007703 <> ENSGALG00000007710 <> ENSGALG00000007726 <> ENSGALG00000007740 <> ENSGALG00000007748
Opossum: ENSMODG00000020838 <> ENSMODG00000020837 # # AM230659, AM230660 (g) <> AM230660 (g) # # ENSMODG00000020870 # ENSMODG00000011962
Dog X: ENSCAFG00000017189 <> Hit TBLASTN (c) <> <> <> <> <> <> ENSCAFG00000017195
Mouse X: ENSMUSG00000031326 <> ENSMUSG00000031327 <> <> <> <> <> <> ENSMUSG00000033965
Human X: ENSG00000131264 <> AL358796 (d) <> <> <> <> <> <> ENSG00000147

Linkage group 2:
Gene symbol (a): Cdx2 Lnx2 Rasl11a Usp12 Wave3 (Wasf3)
Zebrafish: # ENSDARG00000045343 <> ENSDARG00000002379
Fugu: # SINFRUG00000149046 <> <> SINFRUG00000149065 <> SINFRUG00000149079
Tetraodon: # GSTENG00023872001 <> <> GSTENG00023876001 <> GSTENG00023878001
Xenopus: # ENSXETG00000009805 <> ENSXETG00000009799 <> ENSXETG00000009796
Chicken 1: # ENSGALG00000017096 <> ENSGALG00000017099 <> ENSGALG00000017101 <> ENSGALG00000017103
Opossum: ENSMODG00000009021 <> <> ENSMODG00000008989 <> ENSMODG00000008941 <> ENSMODG00000008874 <> ENSMODG00000008795
Dog 25: ENSCAFG00000006742 <> <> ENSCAFG00000006782 <> ENSCAFG00000006798 <> ENSCAFG00000006816 <> ENSCAFG00000006841
Mouse 5: ENSMUSG00000029646 <> <> ENSMUSG00000016520 <> ENSMUSG00000029641 <> ENSMUSG00000029640 <> ENSMUSG00000029636
Human 13: ENSG00000165556 <> <> ENSG00000139517 <> ENSG00000122035 <> ENSG00000152484 <> ENSG00000132970
Mouse 7: ENSMUSG00000062639 (e)

Linkage group 3:
Gene symbol (a): Wave1 (Wasf1) SLC16A10
Zebrafish: # ENSDARG00000020984
Fugu: SINFRUG00000127650 # SINFRUG00000154491
Tetraodon:
Xenopus: ENSXETG00000021311 <> ENSXETG00000021337
Chicken 3: ENSGALG00000015066 <> ENSGALG00000015040
Opossum:
Dog 12: ENSCAFG00000003854 <> ENSCAFG00000003922
Mouse 10: ENSMUSG00000019831 <> ENSMUSG00000019838
Human 6: ENSG00000112290 <> ENSG00000112394

Linkage group 4:
Gene symbol (a): Chic2 Fip1l1 (Fip1) Lnx1 (Lnx) Rasl11b Usp46
Zebrafish: ENSDARG00000022690 <> ENSDARG00000011126 # # ENSDARG00000015611 #
Fugu: SINFRUG00000127299 # SINFRUG00000152840 # SINFRUG00000132598 # SINFRUG00000123609 <> SINFRUG00000123610
Tetraodon: # GSTENG00006769001 # GSTENG00032770001 <> GSTENG00032776001 <> GSTENG00032777001
Xenopus: ENSXETG00000011681 <> ENSXETG00000011679 <> ENSXETG00000011680 <> ENSXETG00000011677 #
Chicken 4: ENSGALG00000013931 <> ENSGALG00000013943 <> ENSGALG00000013940 <> ENSGALG00000013948 <> ENSGALG00000013961
Opossum: ENSMODG00000020667 <> ENSMODG00000020654 <> ENSMODG00000020659 <> ENSMODG00000020648 <> ENSMODG00000020647
Dog 13: ENSCAFG00000002050 <> <> ENSCAFG00000002039 <> <> ENSCAFG00000002014
Mouse 5: ENSMUSG00000029229 <> ENSMUSG00000029227 <> ENSMUSG00000029228 <> ENSMUSG00000049907 <> ENSMUSG00000054814
Human 4: ENSG00000109220 <> ENSG00000145216 <> ENSG00000072201 <> ENSG00000128045 <> ENSG00000109189

Linkage group 5:
Gene symbol (a): Wave2 (Wasf2)
Zebrafish:
Fugu: SINFRUG00000150668
Tetraodon:
Xenopus: ENSXETG00000020662
Chicken:
Opossum:
Dog 2: ENSCAFG00000012104
Mouse 4: ENSMUSG00000028868
Human 1: ENSG00000158195
Human X: ENSG00000188459 (f)

Linkage group 6:
Gene symbol (a): Cdx1
Zebrafish:
Fugu:
Tetraodon:
Xenopus:
Chicken 13: ENSGALG00000005679
Opossum:
Dog:
Mouse 18: ENSMUSG00000024619
Human 5: ENSG00000113722
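The ortholog groups in Table S1 rest on the neighbor-joining trees described in Methods, which use Poisson-corrected protein distances. The Poisson correction converts the observed proportion p of differing sites into an estimated number of substitutions per site, d = -ln(1 - p). The sketch below only illustrates that formula; the function name, the gap handling (columns with a gap in either sequence are skipped) and any input sequences are assumptions, not the authors' pipeline.

```python
import math

def poisson_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Poisson-corrected distance d = -ln(1 - p) between two aligned proteins."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: ignore columns where either sequence has a gap.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p == 0:
        return 0.0
    if p >= 1.0:
        return math.inf  # the correction is undefined when every site differs
    return -math.log(1.0 - p)
```

For instance, two sequences differing at half of their aligned sites (p = 0.5) are assigned a distance of ln 2, about 0.69 substitutions per site, which is larger than the raw proportion because multiple substitutions at the same site are accounted for.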

Table S2: List of alignments identified between chicken XicHR and eutherian Xic genomic sequences. Start and End give the position in the chicken sequence.

Species | Start | End | Length | Score | %identity | Chicken annotation | Eutherian annotation
Dog | 25794 | 25824 | 31 | 19.0 | 80.7 | non-coding region |
Dog | 26699 | 26727 | 29 | 19.0 | 82.8 | non-coding region |
Human | 26852 | 26940 | 90 | 25.2 | 72.9 | non-coding region |
Cow | 26858 | 26888 | 31 | 19.0 | 80.7 | non-coding region |
Mouse | 26945 | 26966 | 22 | 20.0 | 95.5 | non-coding region |
Human | 31997 | 32053 | 57 | 23.0 | 70.2 | Fip1l exon1 | Tsx exon 4
Mouse | 32020 | 32101 | 82 | 20.0 | 62.2 | Fip1l exon1 | Tsx exon 4
Human | 33363 | 33436 | 74 | 22.8 | 72.1 | Fip1l exon2 | Tsx exon 5
Mouse | 33369 | 33465 | | 30.4 | 69.1 | Fip1l exon2 | Tsx exon 5
Cow | 33809 | 33842 | 34 | 18.0 | 76.5 | non-coding region |
Mouse | 33846 | 33886 | 41 | 19.0 | 73.2 | non-coding region |
Human | 34478 | 34530 | 53 | 27.0 | 75.5 | Fip1l exon3 | Tsx exon 6
Mouse | 36210 | 36244 | 35 | 19.0 | 77.1 | non-coding region |
Human | 37853 | 37936 | 84 | 42.0 | 75.0 | Fip1l exon5 |
Human | 40106 | 40193 | 89 | 23.8 | 67.1 | Fip1l exon8 |
Human | 41609 | 41650 | 42 | 22.0 | 76.2 | non-coding region |
Human | 45601 | 45656 | 56 | 22.4 | 77.4 | non-coding region |
Human | 46813 | 47010 | 198 | 45.2 | 66.1 | Fip1l exon12 |
Cow | 49955 | 49974 | 20 | 18.0 | 95.0 | non-coding region |
Mouse | 51691 | 51731 | 41 | 21.0 | 75.6 | non-coding region |
Cow | 52642 | 52671 | 30 | 20.0 | 83.3 | non-coding region |
Mouse | 52644 | 52669 | 26 | 22.0 | 92.3 | non-coding region |
Human | 57607 | 57726 | 120 | 22.0 | 59.2 | Lnxl exon9 | Xist exon h5/m6
Cow | 60264 | 60317 | 54 | 20.4 | 76.5 | non-coding region |
Cow | 62080 | 62105 | 26 | 20.0 | 88.5 | non-coding region |
Mouse | 63516 | 63561 | 49 | 19.4 | 78.3 | non-coding region |
Cow | 64314 | 64335 | 22 | 18.0 | 90.9 | non-coding region |
Mouse | 67436 | 67459 | 24 | 22.0 | 95.8 | non-coding region |
Dog | 67628 | 67774 | 151 | 21.0 | 65.8 | Lnxl exon3 | Xist exon h4/m4
Human | 67668 | 67778 | 111 | 19.8 | 64.0 | Lnxl exon3 | Xist exon h4/m4
Mouse | 70429 | 70481 | 58 | 22.0 | 77.4 | non-coding region |
Human | 70687 | 70726 | 40 | 20.0 | 75.0 | non-coding region | within Xist exon 1
Dog | 72768 | 72793 | 26 | 18.0 | 84.6 | non-coding region | Xist (by similarity)
Dog | 73455 | 73474 | 20 | 18.0 | 95.0 | non-coding region | Xist (by similarity)
Human | 76134 | 76190 | 66 | 19.2 | 73.7 | non-coding region |
Human | 79298 | 79330 | 33 | 19.0 | 78.8 | non-coding region |
Dog | 79500 | 79525 | 26 | 18.0 | 84.6 | non-coding region |
Human | 92826 | 92851 | 26 | 20.0 | 88.5 | non-coding region |
Human | 93507 | 93541 | 35 | 21.0 | 80.0 | non-coding region |
Dog | 96444 | 96484 | 41 | 19.0 | 73.2 | non-coding region |
Human | 104928 | 104954 | 27 | 19.0 | 85.2 | non-coding region |
Mouse | 111611 | 111647 | 37 | 19.0 | 75.7 | non-coding region |
Dog | 119400 | 119705 | 306 | 44.0 | 60.5 | Rasl exon4 |
Cow | 119547 | 119705 | 159 | 29.8 | 62.8 | Rasl exon4 |
Mouse | 123097 | 123123 | 27 | 19.0 | 85.2 | non-coding region |
Dog | 123420 | 123466 | 48 | 22.8 | 80.9 | Rasl exon3 |
Cow | 123427 | 123459 | 33 | 19.0 | 78.8 | Rasl exon3 |
Dog | 125885 | 125964 | 80 | 34.0 | 71.3 | Rasl exon2 |
Cow | 125885 | 125964 | 80 | 38.0 | 73.8 | Rasl exon2 |
Cow | 127343 | 127448 | 106 | 44.0 | 70.8 | Rasl exon1 |
Dog | 127350 | 127449 | | 24.8 | 65.7 | Rasl exon1 |
Human | 130231 | 130273 | 43 | 20.6 | 82.9 | non-coding region |
Human | 139243 | 139285 | 43 | 19.0 | 72.1 | non-coding region |
Cow | 141754 | 141808 | 55 | 19.0 | 67.3 | non-coding region |
Dog | 141938 | 142018 | 82 | 20.8 | 66.7 | non-coding region |
Dog | 145137 | 145200 | 64 | 20.0 | 65.6 | non-coding region |
Cow | 145608 | 145661 | 54 | 18.4 | 74.5 | non-coding region |
Cow | 149893 | 149920 | 28 | 18.0 | 82.1 | non-coding region |
Cow | 150413 | 150451 | 39 | 19.0 | 74.4 | non-coding region |
Human | 157766 | 157792 | 27 | 21.0 | 88.9 | non-coding region |
Cow | 157779 | 157798 | 20 | 20.0 | .0 | non-coding region |
Mouse | 157881 | 157941 | 64 | 26.4 | 82.0 | non-coding region |
Dog | 157900 | 157945 | 46 | 22.0 | 73.9 | non-coding region |
Cow | 157919 | 157943 | 25 | 19.0 | 88.0 | non-coding region |
Mouse | 158379 | 158414 | 36 | 20.0 | 77.8 | non-coding region |
Mouse | 159129 | 159156 | 28 | 20.0 | 85.7 | non-coding region |
Cow | 160023 | 160047 | 25 | 19.0 | 88.0 | non-coding region |
Human | 160026 | 160048 | 23 | 21.0 | 95.7 | non-coding region |
Dog | 160748 | 160776 | 29 | 21.0 | 86.2 | non-coding region |
Cow | 160753 | 160785 | 33 | 19.0 | 78.8 | non-coding region |
Mouse | 170893 | 170936 | 48 | 19.2 | 79.6 | non-coding region |
Dog | 173849 | 173891 | 43 | 21.0 | 74.4 | non-coding region |
Dog | 176025 | 176062 | 38 | 20.0 | 76.3 | non-coding region |
Mouse | 176863 | 176889 | 27 | 19.0 | 85.2 | non-coding region |
Human | 178712 | 178744 | 33 | 19.0 | 78.8 | non-coding region |
Dog | 182558 | 182589 | 32 | 20.0 | 81.3 | non-coding region |
Mouse | 185789 | 185838 | 50 | 20.0 | 70.0 | non-coding region |
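The Score and %identity columns of Table S2 can be related through the SIM parameters given in Methods (match = 1, mismatch = -1). For an alignment without gaps, the score is simply the number of matches minus the number of mismatches. The sketch below checks this on three rows of the table; the function names are hypothetical, and the match and mismatch counts are inferred here from the listed length and identity rather than reported by the authors. Rows with fractional scores presumably include gap penalties and are not covered by this simple formula.

```python
def ungapped_score(matches: int, mismatches: int,
                   match: float = 1.0, mismatch: float = -1.0) -> float:
    """Score of a gap-free alignment under a simple match/mismatch scheme."""
    return matches * match + mismatches * mismatch

def percent_identity(matches: int, mismatches: int) -> float:
    """Percentage of identical positions in a gap-free alignment."""
    return 100.0 * matches / (matches + mismatches)

# Mouse 26945-26966: length 22, listed score 20.0, listed identity 95.5
print(ungapped_score(21, 1), round(percent_identity(21, 1), 1))
# Cow 49955-49974: length 20, listed score 18.0, listed identity 95.0
print(ungapped_score(19, 1), round(percent_identity(19, 1), 1))
# Human 37853-37936: length 84, listed score 42.0, listed identity 75.0
print(ungapped_score(63, 21), round(percent_identity(63, 21), 1))
```

The three printed pairs agree with the scores and identities listed for those rows, which is consistent with these alignments being gap-free.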