FUNDAMENTALS OF MOLECULAR EVOLUTION
Second Edition

Dan Graur
TEL AVIV UNIVERSITY

Wen-Hsiung Li
UNIVERSITY OF CHICAGO

SINAUER ASSOCIATES, INC., Publishers
Sunderland, Massachusetts
Contents

Preface xiii
Introduction 1

CHAPTER 1 Genes, Genetic Codes, and Mutation 5
  NUCLEOTIDE SEQUENCES 5
  GENOMES AND DNA REPLICATION 8
  GENES AND GENE STRUCTURE 9
    Protein-coding genes 9
    RNA-specifying genes 12
    Posttranscriptional modifications of RNA 13
    Untranscribed genes 13
    Pseudogenes 14
  AMINO ACIDS 15
  PROTEINS 20
  TRANSLATION AND GENETIC CODES 22
  MUTATION 25
    Substitution mutations 26
    Recombination 29
    Deletions and insertions 32
    Inversions 35
    Mutation rates 35
    Spatial distribution of mutations 37
    Patterns of mutation 38
    Are mutations random? 38
  FURTHER READINGS 38

CHAPTER 2 Dynamics of Genes in Populations 39
  CHANGES IN ALLELE FREQUENCIES 40
  NATURAL SELECTION 41
    Codominance 43
    Dominance 44
    Overdominance and underdominance 45
  RANDOM GENETIC DRIFT 47
  EFFECTIVE POPULATION SIZE 52
  GENE SUBSTITUTION 53
    Fixation probability 54
    Fixation time 55
    Rate of gene substitution 57
  GENETIC POLYMORPHISM 57
    Gene diversity 57
    Nucleotide diversity 58
  THE DRIVING FORCES IN EVOLUTION 59
    The neo-Darwinian theory and the neutral mutation hypothesis 61
    Testing the neutral mutation hypothesis 63
  FURTHER READINGS 65

CHAPTER 3 Evolutionary Change in Nucleotide Sequences 67
  NUCLEOTIDE SUBSTITUTION IN A DNA SEQUENCE 67
    Jukes and Cantor's one-parameter model 68
    Kimura's two-parameter model 71
  NUMBER OF NUCLEOTIDE SUBSTITUTIONS BETWEEN TWO DNA SEQUENCES 74
    Number of substitutions between two noncoding sequences 75
    Substitution schemes with more than two parameters 77
    Violation of assumptions 79
    Number of substitutions between two protein-coding genes 79
    Indirect estimations of the number of nucleotide substitutions 85
  AMINO ACID REPLACEMENTS BETWEEN TWO PROTEINS 86
  ALIGNMENT OF NUCLEOTIDE AND AMINO ACID SEQUENCES 86
    Manual alignment by visual inspection 87
    The dot matrix method 87
    Distance and similarity methods 90
    Alignment algorithms 94
    Multiple alignments 97
  FURTHER READINGS 98

CHAPTER 4 Rates and Patterns of Nucleotide Substitution 99
  RATES OF NUCLEOTIDE SUBSTITUTION 100
    Coding regions 101
    Noncoding regions 105
    Similarity profiles 107
  CAUSES OF VARIATION IN SUBSTITUTION RATES 108
    Functional constraints 108
    Synonymous versus nonsynonymous rates 110
    Variation among different gene regions 111
    Variation among genes 113
    Acceleration of nucleotide substitution rates following partial loss of function 115
    Estimating the intensity of purifying selection 116
    Mutational input: Male-driven evolution 117
  POSITIVE SELECTION 119
    Detecting positive selection 119
    Parallelism and convergence 121
    Prevalence of positive selection 123
  PATTERNS OF SUBSTITUTION AND REPLACEMENT 123
    Pattern of spontaneous mutation 124
    Pattern of substitution in human mitochondrial DNA 127
    Patterns of amino acid replacement 128
    What protein properties are conserved in evolution? 130
  NONRANDOM USAGE OF SYNONYMOUS CODONS 132
    Measures of codon-usage bias 132
    Universal and species-specific patterns of codon usage 133
    Codon usage in unicellular organisms 134
    Codon usage in multicellular organisms 137
    Codon usage and population size 139
  MOLECULAR CLOCKS 139
  RELATIVE RATE TESTS 142
    Margoliash, Sarich, and Wilson's test 142
    Tajima's 1D method 144
    Tests involving comparisons of duplicate genes 145
  LOCAL CLOCKS 146
    Nearly equal rates in mice and rats 146
    Lower rates in humans than in African apes and monkeys 147
    Higher rates in rodents than in primates 148
  EVALUATION OF THE MOLECULAR CLOCK HYPOTHESIS 150
    Causes of variation in substitution rates among evolutionary lineages 151
    Are living fossils molecular fossils too? 153
    "Primitive" versus "advanced": A question of rates 153
    Phyletic gradualism versus punctuated equilibria at the molecular level 154
  RATES OF SUBSTITUTION IN ORGANELLE DNA 155
    Mammalian mitochondrial genes 157
    Plant nuclear, mitochondrial, and chloroplast DNAs 157
    Substitution and rearrangement rates 160
  RATES OF SUBSTITUTION IN RNA VIRUSES 160
    Estimation models 161
    Human immunodeficiency viruses 162
  FURTHER READINGS 163

CHAPTER 5 Molecular Phylogenetics 165
  IMPACTS OF MOLECULAR DATA ON PHYLOGENETIC STUDIES 165
  ADVANTAGES OF MOLECULAR DATA IN PHYLOGENETIC STUDIES 167
  TERMINOLOGY OF PHYLOGENETIC TREES 167
    Rooted and unrooted trees 169
    Scaled and unscaled trees 169
    The Newick format 170
    Number of possible phylogenetic trees 170
    True and inferred trees 173
    Gene trees and species trees 174
    Taxa and clades 176
  TYPES OF DATA 177
    Character data 177
    Assumptions about character evolution 178
    Polarity and taxonomic distribution of character states 180
    Distance data 180
  METHODS OF TREE RECONSTRUCTION 181
  DISTANCE MATRIX METHODS 182
    Unweighted pair-group method with arithmetic means (UPGMA) 183
    Transformed distance method 185
    Sattath and Tversky's neighbors-relations method 186
    Saitou and Nei's neighbor-joining method 189
  MAXIMUM PARSIMONY METHODS 189
    Weighted and unweighted parsimony 193
    Searching for the maximum parsimony tree 194
  MAXIMUM LIKELIHOOD METHODS 198
  ROOTING UNROOTED TREES 200
  ESTIMATING BRANCH LENGTHS 202
  ESTIMATING SPECIES DIVERGENCE TIMES 204
  TOPOLOGICAL COMPARISONS 206
    Penny and Hendy's topological distance 206
    Consensus trees 206
  ASSESSING TREE RELIABILITY 208
    The bootstrap 209
    Tests for two competing trees 211
  PROBLEMS ASSOCIATED WITH PHYLOGENETIC RECONSTRUCTION 212
    Strengths and weaknesses of different methods 214
    Minimizing error in phylogenetic analysis 216
  MOLECULAR PHYLOGENETIC EXAMPLES 217
    Phylogeny of humans and apes 217
    Cetartiodactyla and SINE phylogeny 225
    The origin of angiosperms 228
  MOLECULAR PHYLOGENETIC ARCHEOLOGY 230
    Phylogeny of the marsupial wolf 232
    Is the quagga extinct? 232
    The dusky seaside sparrow 234
  THE UNIVERSAL PHYLOGENY 237
    The first divergence events 238
    The cenancestor 243
    Endosymbiotic origin of mitochondria and chloroplasts 245
  FURTHER READINGS 247

CHAPTER 6 Gene Duplication, Exon Shuffling, and Concerted Evolution 249
  TYPES OF GENE DUPLICATION 250
  DOMAINS AND EXONS 250
  DOMAIN DUPLICATION AND GENE ELONGATION 255
    The ovomucoid gene 258
    Enhancement of function in the α2 allele of haptoglobin 258
    Origin of an antifreeze glycoprotein gene 260
    Prevalence of domain duplication 262
  FORMATION OF GENE FAMILIES AND THE ACQUISITION OF NEW FUNCTIONS 262
    RNA-specifying genes 265
    Isozymes 268
    Opsins 269
  DATING GENE DUPLICATIONS 271
  GENE LOSS 273
    Unprocessed pseudogenes 274
    Unitary pseudogenes 275
    Nonfunctionalization time 276
  THE GLOBIN SUPERFAMILY 278
  PREVALENCE OF GENE DUPLICATION, GENE LOSS, AND FUNCTIONAL DIVERGENCE 281
  EXON SHUFFLING 283
    Mosaic proteins 283
    Phase limitations on exon shuffling 286
    Exonization and pseudoexonization 289
    Different strategies of multidomain gene assembly 290
  THE INTRONS-EARLY VERSUS INTRONS-LATE HYPOTHESES 291
    Intron sliding 292
    The relative fraction of early and late introns 294
  ALTERNATIVE PATHWAYS FOR PRODUCING NEW FUNCTIONS 294
    Overlapping genes 294
    Alternative splicing 296
    Intron-encoded proteins and nested genes 299
    Functional convergence 299
    RNA editing 301
    Gene sharing 302
  MOLECULAR TINKERING 303
  CONCERTED EVOLUTION 304
  MECHANISMS OF CONCERTED EVOLUTION 308
    Gene conversion 308
    Unequal crossing over 309
    Relative roles of gene conversion and unequal crossing over 312
  DETECTION AND EXAMPLES OF CONCERTED EVOLUTION 313
    The Aγ- and Gγ-globin genes in the great apes 314
    The concerted evolution of genes and pseudogenes 315
  FACTORS AFFECTING THE RATE OF CONCERTED EVOLUTION 317
    Number of repeats 318
    Arrangement of repeats 318
    Structure of the repeat unit 318
    Functional requirement 319
    Populational processes 320
  EVOLUTIONARY IMPLICATIONS OF CONCERTED EVOLUTION 320
    Spread of advantageous mutations 320
    Retardation of paralogous gene divergence 321
    Generation of genic variation 321
  METHODOLOGICAL PITFALLS DUE TO CONCERTED EVOLUTION 322
  FURTHER READINGS 322

CHAPTER 7 Evolution by Transposition 323
  TRANSPOSITION AND RETROPOSITION 323
  TRANSPOSABLE ELEMENTS 325
    Insertion sequences 326
    Transposons 327
    Taxonomic, developmental, and target specificity of transposition 328
    Autonomy of transposition 329
  RETROELEMENTS 329
    Retroviruses 330
    Retroposons and retrotransposons 330
    Retrons 333
    Pararetroviruses 333
    Evolutionary origin of retroelements 334
  RETROSEQUENCES 336
    Retrogenes 336
    Semiprocessed retrogenes 338
    Retropseudogenes 338
    Sequence evolution of retropseudogenes 341
  LINEs AND SINEs 343
    SINEs derived from 7SL RNA 344
    SINEs derived from tRNAs 346
    Where there's a SINE, there's a LINE 347
    DNA-mediated transposable elements and transposable fossils 349
    Rate of SINE evolution 349
  GENETIC AND EVOLUTIONARY EFFECTS OF TRANSPOSITION 349
    Hybrid dysgenesis 354
    Transposition and speciation 357
    Evolutionary dynamics of transposable element copy number 358
  HORIZONTAL GENE TRANSFER 359
    Horizontal transfer of virogenes from baboons to cats 361
    Horizontal transfer of P elements between Drosophila species 363
    Promiscuous DNA 365
  FURTHER READINGS 366

CHAPTER 8 Genome Evolution 367
  C VALUES 368
  THE EVOLUTION OF GENOME SIZE IN PROKARYOTES 368
  THE MINIMAL GENOME 371
    The analytical approach 371
    The experimental approach 373
  GENOME MINIATURIZATION 374
    Genome size reduction following endosymbiosis 374
    Genome size reduction in parasites 375
  GENOME SIZE IN EUKARYOTES AND THE C VALUE PARADOX 375
  MECHANISMS FOR GLOBAL INCREASES IN GENOME SIZE 380
    Polyploidization 380
    Polysomy 382
    The yeast genome 382
    Polyploidy of the vertebrate genome 384
  MAINTENANCE OF NONGENIC DNA 384
    The hypotheses 386
    The evidence 387
    Why do similar species have different genome sizes? 388
  THE REPETITIVE STRUCTURE OF THE EUKARYOTIC GENOME 389
    Localized repeated sequences 390
    Dispersed repeated sequences 392
    Repetitive sequences as a cause of variation in genome size 394
  MECHANISMS FOR REGIONAL INCREASES IN GENOME SIZE 395
  GENE DISTRIBUTION 397
    How many genes are there, where are they, and do we need them? 397
    Gene number evolution 400
  CHROMOSOMAL EVOLUTION 402
    Chromosomes, plasmids, and episomes 402
    Evolution of chromosome number in prokaryotes 402
    Chromosome number variation in eukaryotes 403
  MECHANISMS FOR CHANGES IN GENE ORDER AND GENE DISTRIBUTION AMONG CHROMOSOMES 404
    Counting gene order rearrangement events 406
    Gene order rearrangements in bacteria 408
    Gene order rearrangements in eukaryotes 410
    Gene order as a phylogenetic character 411
  GC CONTENT IN BACTERIA 412
  CHIROCHORES 415
  COMPOSITIONAL ORGANIZATION OF THE VERTEBRATE GENOME 417
    The distribution of genes and other genetic elements among isochores 420
    Origin of isochores 422
  EMERGENCE OF NONUNIVERSAL GENETIC CODES 425
  FURTHER READINGS 427

APPENDIX I Spatial and Temporal Frameworks of the Evolutionary Process 429
APPENDIX II Basics of Probability 437
LITERATURE CITED 441
INDEX 467
TAXONOMIC INDEX 479