Nonneutral Evolution of the Transcribed Pseudogene Makorin1-p1 in Mice

Similar documents
Drosophila melanogaster and D. simulans, two fruit fly species that are nearly

Dr. Amira A. AL-Hosary

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Letter to the Editor. Department of Biology, Arizona State University

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

7. Tests for selection

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

C3020 Molecular Evolution. Exercises #3: Phylogenetics

Lecture Notes: BIOL2007 Molecular Evolution

Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information #

FUNDAMENTALS OF MOLECULAR EVOLUTION

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

Understanding relationship between homologous sequences

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

How Molecules Evolve. Advantages of Molecular Data for Tree Building. Advantages of Molecular Data for Tree Building

SEQUENCE DIVERGENCE,FUNCTIONAL CONSTRAINT, AND SELECTION IN PROTEIN EVOLUTION

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Efficiencies of maximum likelihood methods of phylogenetic inferences when different substitution models are used

Outline. Genome Evolution. Genome. Genome Architecture. Constraints on Genome Evolution. New Evolutionary Synthesis 11/8/16

KaKs Calculator: Calculating Ka and Ks Through Model Selection and Model Averaging

MATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME

Likelihood Ratio Tests for Detecting Positive Selection and Application to Primate Lysozyme Evolution

Concepts and Methods in Molecular Divergence Time Estimation

Phylogenetic inference

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata.

Divergence Pattern of Duplicate Genes in Protein-Protein Interactions Follows the Power Law

Letter to the Editor. Temperature Hypotheses. David P. Mindell, Alec Knight,? Christine Baer,$ and Christopher J. Huddlestons

Lecture 7 Mutation and genetic variation

Outline. Genome Evolution. Genome. Genome Architecture. Constraints on Genome Evolution. New Evolutionary Synthesis 11/1/18

Phylogenetic Tree Reconstruction

Genomes and Their Evolution

Lecture 27. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/26

Phylogenetics: Building Phylogenetic Trees

Temporal Trails of Natural Selection in Human Mitogenomes. Author. Published. Journal Title DOI. Copyright Statement.

8/23/2014. Phylogeny and the Tree of Life

Phylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University

Molecular Clocks. The Holy Grail. Rate Constancy? Protein Variability. Evidence for Rate Constancy in Hemoglobin. Given

UoN, CAS, DBSC BIOL102 lecture notes by: Dr. Mustafa A. Mansi. The Phylogenetic Systematics (Phylogeny and Systematics)

Constructing Evolutionary/Phylogenetic Trees

Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

EVOLUTIONARY DISTANCES

Computational Biology: Basics & Interesting Problems

Natural selection on the molecular level

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

Homework Assignment, Evolutionary Systems Biology, Spring Homework Part I: Phylogenetics:

Supplementary Information for Hurst et al.: Causes of trends of amino acid gain and loss

Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/22

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression)

Lecture 11 Friday, October 21, 2011

What Is Conservation?

Small RNA in rice genome

The Phylogenetic Handbook

Assessing an Unknown Evolutionary Process: Effect of Increasing Site- Specific Knowledge Through Taxon Addition

TE content correlates positively with genome size

Group activities: Making animal model of human behaviors e.g. Wine preference model in mice

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

MOLECULAR PHYLOGENY AND GENETIC DIVERSITY ANALYSIS. Masatoshi Nei"

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Phylogeny Estimation and Hypothesis Testing using Maximum Likelihood

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task.

Evolutionary Analysis of Viral Genomes

Graph Alignment and Biological Networks

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Multiple Sequence Alignment. Sequences

Computational Structural Bioinformatics

Evolution of the Sry gene within the African pygmy mice Nannomys

Nature Genetics: doi: /ng Supplementary Figure 1. Icm/Dot secretion system region I in 41 Legionella species.

Lecture 4. Models of DNA and protein change. Likelihood methods

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Fitness landscapes and seascapes

INTERACTIVE CLUSTERING FOR EXPLORATION OF GENOMIC DATA

EVOLUTIONARY DISTANCE MODEL BASED ON DIFFERENTIAL EQUATION AND MARKOV PROCESS

The Causes and Consequences of Variation in. Evolutionary Processes Acting on DNA Sequences

Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University

How should we go about modeling this? Model parameters? Time Substitution rate Can we observe time or subst. rate? What can we observe?

Processes of Evolution

Molecular Phylogenetics (part 1 of 2) Computational Biology Course João André Carriço

Maximum Likelihood Estimation on Large Phylogenies and Analysis of Adaptive Evolution in Human Influenza Virus A

Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p

Pyrobayes: an improved base caller for SNP discovery in pyrosequences

Lecture 20 DNA Repair and Genetic Recombination (Chapter 16 and Chapter 15 Genes X)

Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis

Computational approaches for functional genomics

Supplemental Data. Perea-Resa et al. Plant Cell. (2012) /tpc


1 ATGGGTCTC 2 ATGAGTCTC

The Phylo- HMM approach to problems in comparative genomics, with examples.

Evolution by duplication

Molecular Evolution & the Origin of Variation

Molecular Evolution & the Origin of Variation

Graduate Funding Information Center

Variance and Covariances of the Numbers of Synonymous and Nonsynonymous Substitutions per Site

A Statistical Test of Phylogenies Estimated from Sequence Data

Full file at CHAPTER 2 Genetics

The Eukaryotic Genome and Its Expression. The Eukaryotic Genome and Its Expression. A. The Eukaryotic Genome. Lecture Series 11

Early History up to Schedule. Proteins DNA & RNA Schwann and Schleiden Cell Theory Charles Darwin publishes Origin of Species

Massachusetts Institute of Technology Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

Inferring phylogeny. Today s topics. Milestones of molecular evolution studies Contributions to molecular evolution

Transcription:

Nonneutral Evolution of the Transcribed Pseudogene Makorin1-p1 in Mice Ondrej Podlaha and Jianzhi Zhang Department of Ecology and Evolutionary biology, University of Michigan, Ann Arbor Pseudogenes are nonfunctional relics of formerly functional genes and are thought to evolve neutrally. In some pseudogenes, however, the molecular evolutionary patterns are atypical of neutrally evolving sequences, exhibiting sequence conservation, codon-usage bias, and other features associated with functional genes. Makorin1-p1 is a transcribed pseudogene first identified in the mouse Mus musculus. The transcript of Makorin1-p1 can regulate the stability of the transcript of its paralogous functional gene Makorin1. Specifically, the half-life of Makorin1 mrna increases significantly in the presence of Makorin1-p1 transcript, and targeted deletion of Makorin1-p1 is lethal in mice. Here, we show that Makorin1-p1 originated after the separation of Mus and Rattus but before the divergence of M. musculus and M. pahari. The transcribed region of Makorin1-p1 exhibits rates of point and indel substitutions that are two to four times lower than those in the untranscribed region, suggesting that the transcribed region is under functional constraint and is not evolving neutrally. Although the transcript of Makorin1-p1 likely functions by its sequence similarity to Makorin1, we find no evidence of gene conversion between them, indicating that functional conservation alone is sufficient to maintain their coordinated evolution. A duplication-degeneration model is proposed to explain how Makorin1-p1 was co-opted into the regulatory system of Makorin1. There are over 10,000 pseudogenes in a typical mammalian genome, and it is plausible that many functional but untranslatable pseudogenes exist. Our results illustrate the potential of using evolutionary analysis to identify such pseudogenes from genome sequences. Introduction Pseudogenes are defined as DNA sequences of formerly functional genes rendered nonfunctional by severe mutations. Operationally, pseudogenes are usually identified by their disrupted open reading frames (ORFs), which are homologous to functional genes. Because of the presumable nonfunctionality, pseudogenes have been regarded as a paradigm of neutral evolution (Li, Gojobori, and Nei 1981). Several examples, however, challenge this traditional perspective. Pseudogenes such as Drosophila yakuba Adh-w (Jeffs and Ashburner 1991; Jeffs, Holmes, and Ashburner 1994) and D. melanogaster Lcp-w (Pritchard and Schaeffer 1997), a-esterase gene cluster (Robin et al. 2000), and Pglym87-w (Currie and Sullivan 1994) exhibit features of molecular evolution atypical of neutrally evolving sequences. Although seemingly deleterious mutations render these pseudogenes untranslatable, characteristics such as retention of codon-usage bias, excess of synonymous substitution rate over nonsynonymous rate, and rates of evolution only slightly higher than those of functional genes have been observed (Balakirev and Ayala 2003). Direct evidence for biological function of operationally defined pseudogenes is available in only a few cases. Nitric oxide synthase (NOS) is involved in intracellular signaling in the nervous system and is coexpressed with a NOS-related pseudogene in the snail Lymnaea stagnalis (Korneev, Park, and O Shea 1999). It was experimentally shown that the pseudogene transcript plays an important role in the regulation of the NOS protein synthesis by forming a stable RNA-RNA duplex with the transcript of the functional NOS gene. The formation of the duplex significantly suppresses the translation of NOS mrna and is possible because of antisense identity between the transcripts of the functional Key words: substitution rate, neutrality, pseudogene, Makorin, rodents, mice. E-mail: jianzhi@umich.edu. Mol. Biol. Evol. 21(12):2202 2209. 2004 doi:10.1093/molbev/msh230 Advance Access publication August 11, 2004 gene and pseudogene. In another dramatic example, targeted deletion of the pseudogene Makorin1-p1 in mice led to 80% mortality within 2 days of birth (Hirotsune et al. 2003). Affected individuals suffered from severe bone deformity and failure to thrive (Hirotsune et al. 2003). Although Makorin1-p1 (on chromosome 5) showed typical features of a truncated pseudogene derived from the functional gene Makorin1 (on chromosome 6), it was robustly transcribed. Sequence analysis of the approximately 700-bp pseudogene transcript revealed high similarity to the 59 end of Makorin1 mrna. Hirotsune et al. (2003) experimentally showed that the Makorin1-p1 transcript regulates the stability of Makorin1 mrna in trans. In the presence of Makorin1- p1 transcript, the half-life of Makorin1 mrna significantly increased. Although the exact molecular mechanism involved in this expression regulation is not clear, Hirotsune et al. (2003) proposed a plausible RNA-mediated model in which transcripts of Makorin1-p1 and Makorin1 directly compete for a destabilizing factor (e.g., RNase) that binds to the highly similar 59 sequence found in both transcripts (see also Lee [2003]). The pseudogene transcript effectively titrates out the destabilizing factor and protects the coding mrna from decay. This gene regulation model, as well as the high sequence similarity (95.4%) between the Makorin1-p1 transcript and the 59 portion of Makorin1 mrna, suggests that Makorin1-p1 and Makorin1 evolve in a coordinated fashion. Is it because of concerted evolution? Are there functional constraints on the transcribed and untranscribed portions of Makorin1-p1? To address these questions, we sequenced Makorin1-p1 in five additional Mus species. Our results show functional constraints on transcribed regions of Makorin1-p1, yet no evidence for concerted evolution. Materials and Methods PCR and Sequencing Hirotsune et al. (2003) discovered a transcript (;700 bp) of the pseudogene Makorin1-p1 (GenBank accession Molecular Biology and Evolution vol. 21 no. 12 Ó Society for Molecular Biology and Evolution 2004; all rights reserved.

Nonneutral Evolution of Pseudogene 2203 number AF 494488) capable of regulating the expression of its functional paralog Makorin1. We sequenced a continuous genomic region that spans extra 120 nucleotides 59 and 776 nucleotides 39 of the reported Makorin1- p1 transcript from five rodents: Mus macedonicus, M. caroli, M. cervicolor, M. spretus, and M. spicilegus. The high sequence divergence in the nontranscribed regions prevented us from amplifying additional flanking sequences. We also sequenced portions of Makorin1 that are homologous to the transcribed region of Makorin1-p1 from M. spretus and M. caroli. The primers used for PCR amplifications of Makorin1-p1 were based on M. musculus Makorin1-p1 (NT 039308 contig sequence), and the primer sequences are 59 of the first half: AGCAGCCACGTT- TATGTCCATC; 39 of the first half: ACCACCTCCATG- CAGATCCC and CTGCTTCTCCTCTTTCTCCTCCA; 59 of the second half: GCTGCCCAGAGGTCACAGC and CTTACTGCCCCTTCCTGCACT; and 39 of the second half: GGGAAGGTGAGGGTAGAGGAAC. PCR primers used in the amplification of Makorin1 were designed according to M. musculus Makorin1 (AF192785 ) and the primer sequences are 59 exon1: CCATTGCTGTGTGGGATAAA- CAGTAAT, 39 exon1: GCAACGTCCTACCTGCAGGT- GA; 59 exon2: GATTTCTGGTTCTTTGTCTTACAGATA, 39 exon2: AAGAGTTTATAAACAAATCCTTACCTGC; 59 exon3: TAACTGCTTACTTTTCTCTGTCAGATACG, 39 exon3: GTTTACAGGGTCCTCACAATACTTACTAC; 59 exon4: CCTTTACTACCATCTTAAACATTTTCAGC, 39 exon4- CACAAGCTGCTCTGAGCACTTACTT; 59 exon5: CTCTTGTCTCTTCCCTTCTTCCAGTC, 39 exon5: CGAAGATTGGGGAGGAGTCTCACT; and 59 exon6: AGCCTGTTCATTGTTTCCCAGGTC, 39 exon6: AGCC- CGGCTGCTCATACCTCA. PCR products were purified and cloned into pcr4topo vector (Invitrogen) and sequenced in both directions by an automated DNA sequencer. Sequence Analysis The nucleotide sequences of Makorin1-p1 from Mus musculus and the above five Mus species were aligned using ClustalX (Thompson et al. 1997) followed by manual adjustment. A well-supported phylogenetic tree of the six taxa used in our analysis was recently obtained using nuclear and mitochondrial DNA sequences (Lundrigan, Jansa, and Tucker 2002). We used this tree topology in our analysis of substitution rates. Insertion and deletion substitutions were manually counted using the parsimony principle based on the species tree. Nucleotide substitution rates were estimated using MEGA version 2.0 (Kumar et al. 2001). Likelihood ratio tests were used to examine nucleotide substitution rate variation among regions of the sequenced Makorin1-p1 pseudogene. Because likelihood ratio tests may give different results under different assumptions (Zhang 1999), we implemented JC69 (Jukes and Cantor 1969), HKY85 (Hasegawa, Kishino, and Yano 1985), TN93 (Tamura and Nei 1993), and general reversible (GTR [Yang 1997]) nucleotide substitution models. JC69 is the simplest of these four models, and it assumes equal probabilities for all types of nucleotide substitutions. A more complex HKY85 model allows for unequal base frequencies and incorporates different substitution rates for transitions and transversions. TN93 generalizes the HKY85 model to allow for different rates for transitions between purines and between pirimidines. The GTR model is the most complex and includes three parameters describing nucleotide frequencies and five parameters describing relative substitution rates among nucleotides. We also used a gamma distribution that describes rate variation among sites in GTR. PAML (Yang 1997) was used for the likelihood computation. To test whether there is concerted evolution between Makorin1-p1 and Makorin1, we made a gene tree using the neighbor-joining method (Saitou and Nei 1987) and examined sister relationships among genes. Sequences used in this phylogenetic reconstruction included Makorin1-p1 sequences of all six Mus species mentioned above, Makorin1 sequences from M. musculus, M. caroli, and M. spretus, and representatives of the Makorin gene family from mouse, rat, and human. The number of synonymous nucleotide substitution per synonymous site (d S ) was computed by the modified Nei-Gojobori method (Zhang, Rosenberg, and Nei 1998). Results Makorin1-p1 has only been identified previously in Mus musculus (Hirotsune et al. 2003). To determine its time of origin, we searched the rat and human genomes but did not find Makorin1-p1. This suggests either that Makorin1-p1 emerged in mice after they diverged from rats or that the gene deteriorated enough that it could not be recovered from the rat and human genome. To investigate the evolutionary rate and pattern of Makorin1-p1, we sequenced its genomic region that is known to be transcribed (;700 bp). In addition, we sequenced an adjacent region upstream of the transcript because the upstream sequence may contain necessary cis regulatory elements for transcription. For comparison, we also sequenced a downstream region of the transcript. Because of high sequence divergence, only 120 and 776 nucleotides were obtained for the two flanking regions, respectively. From 59 to 39, the three regions are referred to as regions A (120 bp), B (700 bp), and C (776 bp) (fig. 1). We were able to obtain these sequences from M. macedonicus (AY699801), M. caroli (AY699804), M. cervicolor (AY699803), M. spretus (AY699805), and M. spicilagus (AY699802). Amplification of Makorin1-p1 from a more distantly related species M. pahari (AY699800) was partially successful, as only region A1B of the pseudogene could be recovered. The fulllength sequence from M. musculus was downloaded from GenBank. In the following analyses, Mus pahari is excluded, and only the six full-length Makorin1-p1 sequences of M. macedonicus, M. caroli, M. cervicolor, M. spretus, M. spicilagus, and M. musculus are considered. We compared the pairwise distances among the six species for region A1B and for region C of the Makorin1- p1 sequences (fig. 2). Ten out of 15 pairwise distances are higher for region C than for region A1B, prompting us to further examine the rate heterogeneity in detail.

2204 Podlaha and Zhang FIG. 1. Nucleotide sequence alignment of Makorin1-p1 from six Mus species. Dots represent identical nucleotides to the Mus musculus sequence, and dashes indicate alignment gaps. The dotted line marks region A, solid line region B (transcribed portion of Makorin1-p1), and dashed line region C. For comparison, the Makorin1 sequence of M. musculus that is homologous to region B of Makorin1-p1 is also presented. The first 20 and last 22 nucleotides in this alignment are primer-encoded sequences and are not used in subsequent analyses. Applying the likelihood ratio test, we examined whether the nucleotide substitution rate varies among the three regions aforementioned. We assumed a particular model of nucleotide substitution for all the regions but allowed the substitution rate to vary among regions. The known species tree (fig. 3) for the six Mus species was used. To test several hypotheses of rate heterogeneity, we first implemented the JC69 model and treated the whole Makorin1-p1 sequence as a single region (A1B1C together). The logarithm of the likelihood (lnl) calculated for this hypothesis was 23023.95 (table 1). Next, we treated regions A and B separately from region C (A1B and C), under the assumption that the same JC69 model applies to the two regions and the relative branch lengths remain constant for the two regions. The lnl for this hypothesis was 23013.01, significantly higher than the previous one (P, 0.00001). The relative substitution rate for region C was 2.01 times that of region A1B. This suggests that region C evolves significantly more rapidly than region A1B. Last, we treated all three regions A, B, and C separately (A, B, C), obtaining lnl ¼ 23012.53. The relative substitution rates in regions A, B, and C were 1, 0.72, and 1.59, respectively. This result suggests that the transcribed region (B) evolves the slowest, and region C evolves 1.5 to 2 times faster than region A and B. However, the likelihood ratio test did not show this hypothesis to be significantly better than our second hypothesis, implying that regions A and B evolve at similar pace. Furthermore, we applied HKY85, TN93, and GTR nucleotide substitution models to our data set

Nonneutral Evolution of Pseudogene 2205 FIG. 2. Pairwise Jukes-Cantor distances of the six Mus species at the Makorin1-p1 locus. Distances were estimated with the completedeletion option. following a similar strategy to see whether our results are robust. The outcome of these analyses is summarized in table 1. We found nucleotide substitution rates, regardless of the substitution models assumed, to be no different between regions A and B. However, when compared with region C, A and B evolve significantly slower. This result remained unchanged even when a gamma distribution that describes among-site rate heterogeneity was used (table 1). To compare the Makorin1-p1 substitution rates with the neutral substitution rate in the Mus genome, we computed the synonymous nucleotide substitution rate (d S ) from 10 orthologous gene pairs of M. caroli and M. musculus using the modified Nei-Gojobori method (Zhang, Rosenberg, and Nei 1998). Because of the limited sequence data of M. caroli in GenBank, these are the only genes in GenBank that are suitable for this analysis. The 10 d S values vary from 0.025 to 0.126 (table 2), with an average of 0.049. The nucleotide distance between the same species pair for Makorin1-p1 was 0.032 6 0.006 and 0.107 6 0.012 for the A1B and C regions, respectively. Although the nucleotide substitution rate in region C FIG. 3. Numbers of indels in the evolution of Makorin1-p1 in six Mus species. On each tree branch is the parsimony-inferred number of indels for region A1B, followed by that for region C. The phylogeny follows Lundrigan, Jansa, and Tucker (2003). Table 1 Likelihood Ratio Tests of Equal Substitution Rates Among Different Regions of Makorin1-p1 Hypothesis a lnl b 2 (lnl 2 2 lnl 1 ) P-value Relative Rates c JC69 A1B1C 23023.95 A1B,C 23013.01 21.88 3.0 310 26 A1B: 1 C: 2.01 A,B,C 23012.53 0.96 0.33 A: 1 C: 1.59 B: 0.72 HKY85 A1B1C 22951.50 A1B,C 22940.62 21.76 3.0 310 26 A1B: 1 C: 2.01 A,B,C 22940.10 1.04 0.31 A: 1 C: 1.57 B: 0.71 TN93 A1B1C 22951.14 A1B,C 22940.10 22.08 3.0 310 26 A1B: 1 C: 2.12 A,B,C 22939.61 0.98 0.32 A: 1 C: 1.60 B: 0.71 GTR A1B1C 22948.13 A1B,C 22936.92 22.42 2.0 310 26 A1B: 1 C: 2.12 A,B,C 22936.45 0.94 0.33 A: 1 C: 1.61 B: 0.72 GTR-a d A1B1C 22900.11 A1B,C 22897.21 5.80 0.02 A1B: 1 C: 1.91 A,B,C 22896.81 0.80 0.37 A: 1 B: 0.67 C: 1.39 a Hypotheses of rate heterogeneity. A1B1C: whole pseudogene sequence evolves under the same rate; A1B,C: region A1B evolves under one rate and region C evolves under an independent rate; A,B,C: all three regions evolve under different rates. b Logarithm of likelihood score for various nucleotide substitution models. Best-fit hypothesis of rate heterogeneity was determined by likelihood ratio test. The p values were calculated using the 2(lnL 2 2 lnl 1 ) formula and v 2 distribution with 1 df. c Relative substitution rates for different regions. d GTR model with a gamma distribution of rate variation among sites. is significantly higher than the average synonymous substitution rate of the 10 genes concatenated (P, 0.01), there is one gene (Rnase6) that shows higher d S than the substitution rate of region C. If the 10 genes can be considered as representatives of the entire genome, the substitution rate in region C is at the 10% upper tail of the distribution from all genes in the genome. In other words, Makorin1-p1 is located in a region of relatively high mutation rate. Bustamante, Nielsen, and Hartl (2002) reported effects of GC-content differences between parental functional gene and new pseudogene loci on pseudogene substitution rates. In our case, however, Makorin1 and Makorin1-p1 loci have nearly identical GC content. It is also unlikely that region C is under positive selection, as it is not transcribed and has no known function. The nucleotide sequence alignment of Makorin1-p1 (fig. 1) shows the presence of 22 insertions/deletions (indels). To test whether indels are randomly distributed between region A1B and region C, we mapped the indels onto the species tree (fig. 3). We observed five and 17 indels for regions A1B and C, respectively. This translates into 0.006 and 0.025 indel substitutions per site for the two regions, respectively. Following Podlaha and

2206 Podlaha and Zhang Table 2 Numbers of Synonymous Substitutions per Synonymous Site (d s ) Between Orthologous Genes of M. caroli and M. musculus Calculated from Available Coding Sequences in GenBank Gene Name Length(nt) d s 6 SD Chromosome Accession Numbers b Ahr 2418 0.054 6 0.009 12 AF405564, NM_013464 Csprs 891 0.026 6 0.009 1 AJ401358, AJ401359 Irbp 780 0.046 6 0.013 14 AB033707, AB033711 Rnase 6 462 0.126 6 0.032 14 AY545651, NM_030098 SmcX 660 0.025 6 0.011 X AY260501, AF127245 Tcp1 393 0.053 6 0.021 17 AY057819, BC003809 Zfa 996 0.038 6 0.012 10 AY160001, NM_009540 Zfx 996 0.037 6 0.011 X AY160017, NM_011768 Zfy2 996 0.058 6 0.015 Y AY159977, NM_009571 Zp3 240 0.100 6 0.038 5 AY057754, NM_011776 avg. X a 1656 0.032 6 0.008 avg. Y a 996 0.058 6 0.015 avg. auto a 6180 0.052 6 0.006 avg. all a 8832 0.049 6 0.005 a Average for all genes located on either X, Y, or autosomes and average for all genes was calculated after concatenating all appropriate sequences. b Accession numbers for M. caroli and M. musculus sequences. Zhang (2003), we compared the two rates by a chi-square test and found them to be significantly different (v 2 ¼ 8.983, P, 0.003). On the other hand, we found no significant difference in indel substitution rate between regions A and B (P. 0.4). This finding, together with the result of a significantly lower nucleotide substitution rate in region A1B than in region C, strongly suggests the presence of selective constraints on region A1B, implying nonneutral evolution of Makorin1-p1. Although the exact molecular mechanism by which the Makorin1-p1 transcript regulates the stability of Makorin1 mrna is unknown, the suggested model (Hirotsune et al. 2003) implies that high sequence similarity between them is necessary for the functionality of the pseudogene. That is, Makorin1-p1 must evolve in concordance with Makorin1. To examine whether gene conversion and concerted evolution is responsible for this coordinated evolution, we reconstructed a gene phylogeny using Makorin1-p1 sequences from all six species and the Makorin1 sequence from M. musculus, M. caroli, and M. spretus (fig. 4). Included in the tree were also other members of the Makorin family from the human, mouse, and rat. Only regions homologous to Makorin1-p1 region B were used for tree-making. Concerted evolution will generate phylogenetic clustering of the functional gene and pseudogene from the same species. In other words, Makorin1-p1 and Makorin1 of M. musculus, for example, should form a monophyletic group, in exclusion of Makorin1-p1 of other species. This, however, was not observed. Makorin1-p1 of M. musculus is more closely related to pseudogenes of other Mus species than to Makorin1 of M. musculus, with high bootstrap support. The same result was obtained when the partial sequence of M. pahari Makorin1-p1 was included in the phylogenetic analysis. We noted, however, that the clade of Makorin1 from M. caroli, M. musculus, and M. spretus does not show the same phylogenetic relationships as the clade of Makorin1-p1 for the same taxa. Makorin1 shows a sister relationship of M. spretus to M. musculus, but Makorin1- p1 exhibits a sister relationship of M. spretus to M. caroli. This discrepancy is most likely caused by the limited number of nucleotides (610) used in this phylogenetic analysis. At any rate, our result suggests that gene conversion is not responsible for the presumably coordinated evolution between Makorin1-p1 and Makorin1. Functional conservation and purifying selection may be the sole reason for the high sequence similarity between them. Figure 4 shows that the branch leading to Makorin1- p1 of a Mus species is significantly longer than that to Makorin1 of the same species when the rat Makorin is used as an outgroup (P, 0.001; Tajima s [1993] test). This suggests that although the transcribed region of Makorin1-p1 is under purifying selection, the selection is weaker than that on Makorin1. Discussion The tree in figure 4 shows that Makorin1-p1 is a duplicate of Makorin1. Because of our limited taxon sampling, it is difficult to infer an accurate time of duplication. Our success in obtaining part of Makorin1-p1 in M. pahari suggests that the duplication event must have occurred before the divergence of M. musculus and M. pahari 8 to 10 MYA (Singh, Barbour, Berger 1998). Our finding of significantly lower rates of point and indel substitutions in transcribed regions than in untranscribed regions of Makorin1-p1 demonstrates selective constraints on the transcript of this pseudogene. These results are in concordance with Hirotsune et al. s (2003) finding that the Makorin1-p1 transcript is functionally important. Phylogenetic analysis does not support the hypothesis of gene conversion as the mechanism responsible for the coordinated evolution of Makorin1-p1 and Makorin1. Makorin1 belongs to the Makorin family of transcription factors. Putative orthologs of Makorin1 are known in human, mouse, wallaby, chicken, fruitfly, and nematode (Gray, Azama, and Whitmore 2000). Such wide phylogenetic distribution suggests its functional

Nonneutral Evolution of Pseudogene 2207 FIG. 4. Phylogenetic relationships of Makorin homologs from the mouse, rat, and human, including newly obtained Markorin1 and Markorin1- p1 sequences from Mus. The phylogeny was reconstructed using the DNA sequences homologous to Markorin1-p1 region B. The neighbor-joining method (Saitou and Nei 1987) with Kimura s (1980) distances was used. Bootstrap percentages derived from 1,000 replications are shown on interior branches. The GenBank accession numbers for the sequences are Makorin1 (H. sapiens), AF192784; Makorin1 (M. musculus), AF192785; Makorin1 (R. norvegicus), XM216131; Makorin2 (H. sapiens), AF302084; Makorin2 (R. norvegicus), XM238369; Makorin2 (M. musculus), AF277171; Makorin3 (M. musculus), NM011746; Makorin3 (H. sapiens), NM005664; Makorin3 (R. norvegicus), XM218735; Makorin4 (H. sapiens), NM030757; Makorin1 (M. spretus), AY699807; Makorin1 (M. caroli), AY699806; Makorin1-p1 (M. spretus), AY699805; Makorin1-p1 (M. spicilegus), AY699802; Makorin1-p1 (M. caroli), AY699804; Makorin1-p1 (M. macedonicus), AY699801; Makorin1-p1 (M. cervicolor), AY699803; and Makorin1-p1 (M. musculus), AF494488. importance. Hirotsune et al. s (2003) experiments showed a novel Makorin1 regulatory mechanism involving the transcript of the pseudogene Makorin1-p1. Makorin1-p1 knockout mice suffer from severe bone deformities, and most of them die in 2 days. Because Makorin1-p1 is most likely absent in rats, an interesting question is why rats do not suffer from the severe phenotypes that the knockout mice exhibit. How is Makorin1 expression regulated in rats and other species without Makorin1-p1? How and why was Makorin1-p1 co-opted into the regulatory mechanism of Makorin1? Based on experimental evidence, Hirotsune et al. (2003) proposed that Makorin1-p1 functions by titrating out a destabilizing factor that otherwise destabilizes the mrna of Makorin1. Assuming that this hypothesis is correct, we here propose a duplication-degeneration model (fig. 5) that explains the evolutionary origin of Makorin1-p1 s involvement in Makorin1 s expression regulation. Before the emergence of Makorin1-p1, the production of Makorin1 mrna was high enough to titrate out the destabilizing factor. Immediately after Makorin1-p1 was produced by duplication from Makorin1, Makorin1-p1 had the same sequence and expression level as Makorin1. The doubling of the amount of mrna may be unnecessary or even slightly deleterious. Thus, degenerate mutations that reduce the expression level of Makorin1 could be fixed. If this happened, Makorin1-p1 became indispensable, as its transcript would be needed to titrate out the destabilizing factor. Mutations that disrupted the ORF of Makorin1-p1 could still be fixed because the transcript could still titrate out the destabilizing factor. This would explain the conservation in both expression pattern and transcript sequence of the pseudogene. This model could be tested in the future by examining the expression levels of Makorin1 and Makorin1-p1 in mice and the expression level of Makorin1 in rats. The hypothesis predicts that the amount of Makorin1 expression in rats is higher than that in mice. The discovery that pseudogenes can regulate the expression of their functional homologs opens a new research area on gene regulation. An obvious question arising from this finding is how common this phenomenon is throughout the genome. The number of pseudogenes found in a genome varies greatly among organisms. From analyses of completely sequenced genomes, there seems to be no direct correlation between the number of pseudogenes and that of functional genes. For example, the ratio between these two numbers is 0.05 in the nematode Caenorhabditis elegans (;1,100 pseudogenes), 0.035 in the yeast Saccharomyces cerevisiae (;220), 0.008 in the fly Drosophila melanogaster (;110) (Harrison et al. 2003), 0.45 in mouse (;14,000 [Waterson et al. 2002]), and 0.66 in human (;20,000 [Torrents et al. 2003]). High rates of genomic DNA deletion in Drosophila and Caenorhabditis were thought to be the cause of very few detectable pseudogenes in these species (Petrov and Hartl 2000; Petrov 2001). However, the exact process

2208 Podlaha and Zhang FIG. 5. Duplication-degeneration model of the origin of the functional pseudogene Makorin1-p1. Hirotsune et al. (2003) proposed that Makorin1-p1 regulates Makorin1 expression by titrating out a destabilizing factor that otherwise destabilizes the mrna of Makorin1. Our evolutionary model is based on this hypothesis. Before the emergence of Makorin1-p1, the production of Makorin1 mrna was high enough to titrate out the destabilizing factor (A). Immediately after duplication of the ancestral Makorin1 gene, both copies had same expression levels (B). The doubling of the amount of mrna could have been unnecessary or even slightly deleterious. Thus, degenerate mutations that reduced the expression level of Makorin1 could be fixed. If this happened, Makorin1-p1 became indispensable, as its transcript would be needed to titrate out the destabilizing factor (C). Mutations disrupting the ORF of Makorin1-p1 could still be fixed because the transcript could still titrate out the destabilizing factor. This would explain the conservation in both expression pattern and transcript sequence of the pseudogene. explaining such discrepancies in relative pseudogene numbers between invertebrates and vertebrates is still unknown. Is it possible that many vertebrate pseudogenes are actively involved in gene regulation and contribute to genomic and organismal complexity? In this report, we provided evidence for evolutionary constraints on the transcribed portion of Makorin1-1p. Many examples of transcribed pseudogenes are known (e.g., Board, Coggan, and Woodcock 1992; Brandt et al. 1993; Furbass and Vanselow 1995; Bard et al. 1995), and it is possible that some of them also play physiological roles. Even retrotransposed genes, which are generally thought to become pseudogenes immediately after retrotransposition, have been shown to be able to evolve into new functional genes (Long et al. 2003). The findings of functional pseudogenes are relevant when pseudogenes are used for estimating rates and patterns of mutations (Li, Gojobori, and Nei 1981; Gojobori, Li, and Graur 1982; Li, Wu, and Luo 1984; Graur, Shuali, and Li 1989; Petrov and Hartl 1999). Care should be taken to ensure that such pseudogenes are indeed nonfunctional. Our approach of comparing point and indel substitution rates in transcribed and untranscribed regions of pseudogenes may be used to systematically look for functional pseudogenes, particularly when complete genome sequences of closely related species are available, and thereby expedite the discovery of more functional elements in genomes. Another question arising from the study of Makorin1- p1 is whether we can still call such sequences pseudogenes. The meaning of nonfunctionality in the pseudogene definition is quite ambiguous. From studies on pseudogene evolution mentioned above, functionality can be described on two levels: operational and biological. At the operational level, one can examine whether a particular gene contains frame-shifting indels, premature stop codons, and disrupted splice recognition sites. At the biological level, functionality can be described as the physiological role and fitness effect of the presence of the pseudogene. Of these two, biological function is much more difficult to examine. Decisions on whether to call a particular DNA sequence a gene or pseudogene can be made only after careful scrutiny and experimental investigations of its function. To strictly adhere to the pseudogene definition, only gene relics that lack biological function should be called pseudogenes. Once a biological function of a gene relic is found, depending on whether the functionality involves the DNA sequence or its transcript, such a relic should be called either a regulatory element or a gene, respectively. In the present case, Makorin1-p1 is probably better called an RNA gene, rather than a pseudogene. Supplementary Material Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under accession numbers AY699800 to AY699807. Acknowledgments We thank David Webb for providing helpful comments on an earlier draft. This work was supported by a start-up fund of the University of Michigan and NIH grant GM67030 to J.Z. Literature Cited Balakirev, E. S., and F. J. Ayala. 2003. Pseudogene: are they junk or functional DNA? Annu. Rev. Genet. 37:123 151. Bard, J. A., S. P. Nawoschik, B. F. O Dowd, S. R. George, T. A. Branchek, and R. L. Weinshank. 1995. The human serotonin 5-hydroxytryptamine1D receptor pseudogene is transcribed. Gene 153:295 296. Board, P. G., M. Coggan, and D. M. Woodcock. 1992. The human Pi class glutathione transferase sequence at 12q13-q14 is a reverse-transcribed pseudogene. Genomics 14: 470 473. Brandt, P., M. Unseld, U. Eckert-Ossenkopp, and A. Brennicke. 1993. An rps14 pseudogene is transcribed and edited in Arabidopsis mitochondria. Curr. Genet. 24:330 336. Bustamante, C. D., R. Nielsen, and D. L. Hartl. 2002. A maximum likelihood method for analyzing pseudogene evolution: implications for silent site evolution in humans and rodents. Mol. Biol. Evol. 19:110 117. Currie, P. D., and D. T. Sullivan. 1994. Structure, expression and duplication of genes which encode phosphoglyceromutase of Drosophila melanogaster. Genetics 138:353 363. Furbass, R., and J. Vanselow. 1995. An aromatase pseudogene is transcribed in the bovine placenta. Gene 154:287 291.

Nonneutral Evolution of Pseudogene 2209 Gojobori, T., W. H. Li, and D. Graur. 1982. Patterns of nucleotide substitution in pseudogenes and functional genes. J. Mol. Evol. 18:360 369. Graur, D., Y. Shuali, and W. H. Li. 1989. Deletions in processed pseudogenes accumulate faster in murids than in humans. J. Mol. Evol. 28:279 285. Gray, T. A., K. Azama, K. Whitmore, et al. 2001. Phylogenetic conservation of the Makorin-2 gene, encoding a multiple zincfinger protein, antisense to the RAF1 proto-oncogene. Genomics 77:119 126. Harrison, P. M., D. Milburn, Z. Zhang, P. Bertone, and M. Gerstein. 2003. Identification of pseudogenes in the Drosophila melanogaster genome. Nucleic Acids Res. 32:1033 1037. Hasegawa, M., H. Kishino, and T. Yano. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22:160 174. Hirotsune, S., N. Yoshida, A. Chen, L. Garret, F. Suglyama, S. Takahashi, K. Yagami, A Wynshaw-Boris, and A. Yoshiki. 2003. An expressed pseduogene regulates the messenger- RNA stability of its homologous coding gene. Nature 423: 91 100. Jeffs, P. S., and M. Ashburner. 1991. Processed pseudogenes in Drosophila. Proc. R. Soc. Lond. B Biol. Sci 244:151 159. Jeffs, P. S., E. C. Holmes, and M. Ashburner. 1994. The molecular evolution the alcohol dehydrogenase and alcohol dehydrogenase-related genes in the Drosophila melanogaster species subgroup. Mol. Biol. Evol. 11:287 304. Jukes, T. H., and C. R. Cantor. 1969. Evolution of protein molecules. Pp. 21 132 in H. N. Munro, ed. Mammalian protein metabolism. Academic Press, New York. Kimura, M. 1980. A simple method for estimating evolutionary rate of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111 120. Korneev, S. A., J. Park, and M. O Shea. 1999. Neuronal expression of neural nitric oxidase synthase (nnos) protein is suppressed by an antisense RNA transcribed from an NOS Pseudogene. J. Neurosci. 19:7711 7720. Kumar, S., K. Tamura, I. B. Jakobsen, and M. Nei. 2001. MEGA2: molecular evolutionary genetics analysis software. Bioinformatics 17:1244 1245. Lee, J. T. 2003. Complicity of gene and pseudogene. Nature 423:26 28. Li, W. H., T. Gojobori, and M. Nei. 1981. Pseudogenes as a paradigm of neutral evolution. Nature 292:237 239. Li, W. H., C. I. Wu, and C. C. Luo. 1984. Nonrandomness of point mutation as reflected innucleotide substitutions in pseudogenes and its evolutionary implications. J. Mol. Evol. 21:58 71. Long, M., E. Betran, K. Thornton, and W. Wang. 2003. The origin of new genes: glimpses from the young and old. Nat. Rev. Genet. 4:865 875. Lundrigan, B. L., S. A. Jansa, and P. K. Tucker. 2002. Phylogenetic relationships in the genus Mus, based on paternally, maternally and biparantally inherited characters. Syst. Biol. 51:410 431. Petrov, D. A., 2001. Evolution of genome size: new approaches to an old problem. Trends Genet. 17:23 28. Petrov, D. A., and D. L. Hartl. 1999. Patterns of nucleotide substitution in Drosophila and mammalian genomes. Proc. Natl. Acad. Sci. USA 96:1475 1479.. 2000. Pseudogene evolution and natural selection for a compact genome. J. Hered. 91:221 227. Podlaha, O., and J. Zhang. 2003. Positive selection on proteinlength in the evolution of a primate sperm ion channel. Proc. Natl. Acad. Sci. USA 100:12241 12246. Pritchard, J. K., and S. W. Schaeffer. 1997. Polymorphism and divergence at a Drosophila pseudogene locus. Genetics 147:199 208. Robin, G. C. Q., R. J. Russell, D. J. Cutler, and J. G. Oakeshott. 2000. The evolution of an Æ-esterase pseudogene inactivated in the Drosophila melanogaster lineage. Mol. Biol. Evol. 17:563 575. Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406 425. Singh, N., K. W. Barbour, and F. G. Berger. 1998. Evolution of transcription regulatory elements within the promoter of a mammalian gene. Mol. Biol. Evol. 15:312 325. Tajima, F. 1993. Simple methods for testing the molecular evolutionary clock hypothesis. Genetics 135:599 607. Tamura, K., and M. Nei. 1993. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol. 10: 512 526. Thompson, J. D., T. J. Gibson, F. Plewniak, F. Jeanmougin, and D. G. Higgins. 1997. The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 24:4876 4882. Torrents, D., M. Suyama, E. Zbonov, and P. Bork. 2003. A genome-wide survey of human pseudogenes. Genome Res. 13:2559 2567. Waterson, R. K., K. Lindblad-Toh, E. Birney, et al. 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420:520 562. Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. CABIOS 13:555 556. Zhang, J. 1999. Performance of likelihood ratio tests of evolutionary hypotheses under inadequate substitution models. Mol. Biol. Evol. 16:868 875. Zhang, J., H. F. Rosenberg, and M. Nei. 1998. Positive Darwinian selection after gene duplication in primate ribonuclease genes. Proc Natl Acad Sci USA 95:3708 3713. Dan Graur, Associate Editor Accepted August 2, 2004