Similar documents
Intraspecific gene genealogies: trees grafting into networks

Taming the Beast Workshop

How robust are the predictions of the W-F Model?

Estimating Evolutionary Trees. Phylogenetic Methods

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

Concepts and Methods in Molecular Divergence Time Estimation

Inferring Speciation Times under an Episodic Molecular Clock

MtDNA profiles and associated haplogroups

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression)

An Evaluation of Different Partitioning Strategies for Bayesian Estimation of Species Divergence Times

Bustamante et al., Supplementary Nature Manuscript # 1 out of 9 Information #

Comparative Genomics II

The problem Lineage model Examples. The lineage model

Gene Genealogies Coalescence Theory. Annabelle Haudry Glasgow, July 2009

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata.

Molecular Clocks. The Holy Grail. Rate Constancy? Protein Variability. Evidence for Rate Constancy in Hemoglobin. Given

BINF6201/8201. Molecular phylogenetic methods

EVOLUTIONARY DISTANCES

8/23/2014. Phylogeny and the Tree of Life

Fine-Scale Phylogenetic Discordance across the House Mouse Genome

Dr. Amira A. AL-Hosary

Mathematical models in population genetics II

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

Phylogenetic Tree Reconstruction

Phylogenetic Networks, Trees, and Clusters

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

Non-Parametric Bayesian Population Dynamics Inference

Coalescent based demographic inference. Daniel Wegmann University of Fribourg

Consensus Methods. * You are only responsible for the first two

Molecular Markers, Natural History, and Evolution

Hidden Markov models in population genetics and evolutionary biology

Selection against A 2 (upper row, s = 0.004) and for A 2 (lower row, s = ) N = 25 N = 250 N = 2500

CSci 8980: Advanced Topics in Graphical Models Analysis of Genetic Variation

Evaluating Purifying Selection in the Mitochondrial DNA of Various Mammalian Species

Time dependency of foamy virus evolutionary rate estimates

Chapter 6 Linkage Disequilibrium & Gene Mapping (Recombination)

Bayesian Inference of Interactions and Associations

Motivating the need for optimal sequence alignments...

MOLECULAR SYSTEMATICS: A SYNTHESIS OF THE COMMON METHODS AND THE STATE OF KNOWLEDGE

Managing Uncertainty

Bayesian Phylogenetics

Frequency Spectra and Inference in Population Genetics

Estimating Recombination Rates. LRH selection test, and recombination

The E-M Algorithm in Genetics. Biostatistics 666 Lecture 8

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

Phylogeny 9/8/2014. Evolutionary Relationships. Data Supporting Phylogeny. Chapter 26

CONTENTS. P A R T I Genomes 1. P A R T II Gene Transcription and Regulation 109

Similarity Measures and Clustering In Genetics

A Nonparametric Bayesian Approach for Haplotype Reconstruction from Single and Multi-Population Data

Who was Bayes? Bayesian Phylogenetics. What is Bayes Theorem?

Learning ancestral genetic processes using nonparametric Bayesian models

A Phylogenetic Network Construction due to Constrained Recombination

Lecture 22: Signatures of Selection and Introduction to Linkage Disequilibrium. November 12, 2012

From Individual-based Population Models to Lineage-based Models of Phylogenies

Calculation of IBD probabilities

A comparison of two popular statistical methods for estimating the time to most recent common ancestor (TMRCA) from a sample of DNA sequences

C3020 Molecular Evolution. Exercises #3: Phylogenetics

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task.

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley

p(d g A,g B )p(g B ), g B

Phylogenetics. BIOL 7711 Computational Bioscience

Nature Genetics: doi: /ng Supplementary Figure 1. Icm/Dot secretion system region I in 41 Legionella species.

Molecular Phylogenetics (part 1 of 2) Computational Biology Course João André Carriço

Today's project. Test input data Six alignments (from six independent markers) of Curcuma species

Non-Parametric Bayes

A (short) introduction to phylogenetics

Systematics - Bio 615

Taxonomy. Content. How to determine & classify a species. Phylogeny and evolution

Use of Y Chromosome and Mitochondrial DNA Population Structure in Tracing Human Migrations

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

Calculation of IBD probabilities

Molecules consolidate the placental mammal tree.

Detecting selection from differentiation between populations: the FLK and hapflk approach.

Integer Programming in Computational Biology. D. Gusfield University of California, Davis Presented December 12, 2016.!

Computational Biology: Basics & Interesting Problems

Leber s Hereditary Optic Neuropathy

Phylogenetic inference

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

NJMerge: A generic technique for scaling phylogeny estimation methods and its application to species trees

Additive distances. w(e), where P ij is the path in T from i to j. Then the matrix [D ij ] is said to be additive.

CREATING PHYLOGENETIC TREES FROM DNA SEQUENCES

Phylogenetic Analysis

One-minute responses. Nice class{no complaints. Your explanations of ML were very clear. The phylogenetics portion made more sense to me today.

A novel method with improved power to detect recombination hotspots from polymorphism data reveals multiple hotspots in human genes

Package LBLGXE. R topics documented: July 20, Type Package

Microsatellite data analysis. Tomáš Fér & Filip Kolář

6 Introduction to Population Genetics

Supplementary Information

The problem of Y-STR rare haplotype

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016

Bayesian Phylogenetics:

Quartet Inference from SNP Data Under the Coalescent Model

molecular evolution and phylogenetics

Processes of Evolution

Supporting Information Text S1

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

Phylogeny and Speciation. Early Human Evolution and Migration. Mitochondrial Eve 2/15/17

Challenges when applying stochastic models to reconstruct the demographic history of populations.

Transcription:

that of Phylotree.org, mtdna tree Build 1756 (Supplementary TableS2). is resulted in 78 individuals allocated to the hg B4a1a1 and three individuals to hg Q. e control region (nps 57372 and nps 1602416526) was sequenced for 36 samples that could not be allocated to the known sub-clades of these two haplogroups, and 25 samples were further selected for complete mitochondrial genome sequencing. Using information from the full sequences, additional nucleotides were typed by Sanger sequencing to complete the haplogroup assignment. e 25 newly generated complete mtdna sequences were merged with published data (see Supplementary TableS4), and a Bayesian phylogenetic approach in BEAST 1.8.457 used to analyse two data sets comprising genomes belonging to hgs B4a1a1 (n 442) and M29/Q ( 111), respectively (Supplementary TableS4). e data sets were partitioned into the D-loop, rrna genes, and rst, second, and third, codon positions of the 13 protein-coding genes. Each data subset was assigned an independent model of nucleotide substitution, selected using the Bayesian information criterion in PartitionFinder 58. Four demographic models for the tree prior were compared: constant size, exponential growth, logistic growth, and Bayesian skyride coalescent59, together with two models of rate variation across lineages: strict clock and uncorrelated lognormal relaxed clock60. Marginal likelihoods were calculated using path sampling with 25 power posteriors, with samples drawn every 2M MCMC steps across a total of 50M steps. For the B4a1a1 analysis, the best combination was a strict clock with a logistic growth coalescent model. For the M29-Q analysis, the combination of strict clock and Skyride model is reported because the demographic model showed a clear change in population size. To calibrate the estimate of the timescale, a normal prior for the mutation rate was specied (mean 2.14 10 8 9 61 mutations/site/year, standard deviation 2.87 10 ). e posterior distributions of parameters, including the genealogy, were estimated using MCMC sampling. Samples were drawn every 5,000 steps over a total of 50 M MCMC steps. To check for convergence, each analysis was run in duplicate. Aer checking for acceptable MCMC mixing, the samples from the two runs were combined. Sucient sampling was checked by computing the eective sample sizes of all parameters, which were found to be greater than 200. To make a principal component analysis of the haplogroup B4a1a data, 442 complete Polynesian and Melanesian mtdna genomes used for the BEAST analysis (Supplementary TableS4) were combined with the additional complete or partially complete mtdna sequences from Melanesia54 ( 378), Hawaii55 ( 159), New Zealand Maori53 ( 20), and the remaining hg B4a1a1 haplotypes from Leeward Society Islands ( 55). Haplotypes were assigned to sub-clades of the hg B4a1a by the HaploGrep2 soware 62 and manual inspection of sequences. e resulting haplogroup frequencies were used to produce a population level PCA in R63 (Supplementary Fig.S8). Eighty-one male individuals from the Leeward Society Islands were genotyped for Y chromosome haplogroup specic SNPs by Sanger sequencing in a hierarchical manner, including new branch-dening SNPs from sub-clades O3i-B451 and O3i-B450. Unless otherwise stated, all nomenclature follows that of Karmin,.39 to avoid potential confusion. e Y chromosomes of nine individuals belonged to haplogroups typical of Europeans (G, J, and R) and were not subject to further analysis. In addition, the Y chromosomes of 49 Maori males sampled in New Zealand were genotyped (Supplementary TableS1A). ese samples were also hierarchically tested to a level of phylogenetic resolution equivalent to the main haplogroup level in the Leeward Society Islanders (Supplementary TableS3). e 72 Leeward Society Isles samples with non-european Y chromosomes, together with the 49 Maori samples, were further genotyped for 23 Y chromosome short tandem repeats (Y-STRs; Supplementary TableS3). Aer merging with comparative data from other sources 16,17,39,41,42 and excluding individuals with partial results, this produced a data set of 15 microsatellites and these were used it to construct phylogenetic networks of hgs C2a-M208 and K -M9 using the reduced median algorithm with the soware Network 4.6.1.1 soware 64 (Fluxus-Engineering). e same 15 microsatellites occurring on the C2a-M208 background were used to perform PCA in R63, in order to examine the relationship of eastern, western and outlier Polynesia populations for this key haplogroup within Polynesian Y chromosome diversity (Supplementary Fig.S10). Next, seven individuals belonging to hgs C2a1-P33 ( 4), K-M9 ( 1), O3i-B450 ( 1) and O6a-JST002611 ( 1) were selected for target-capture re-sequencing using the BigY service (Gene By Gene Ltd) (Supplementary TableS1A). e paired-end reads were mapped to the GRCh37 human reference sequence. e reconstruction and rooting of the phylogeny of the seven samples from the Leeward Society Islands used 56 sequences published in Karmin,.39 and 17 hg O individuals from the 1000 Genome Project65 (Supplementary TableS5). Aer ltering the data, the overlap between the data set was extracted and the re-mapping lter based on modeling the poorly mapped regions was applied, as described in Karmin,.39. is resulted in 6.2 Mbp of usable sequence of the non-recombining male-specic Y chromosome region, and sites with minimum 95% call rate were used in the analysis. A Bayesian phylogenetic approach in BEAST57 was used to analyse a nal data set comprising 7669 SNPs from these 80 individuals. To correct for ascertainment bias, we added constant sites corresponding to the nucleotide composition across the remainder of the chromosome. e four demographic models and other details of the settings used in the MCMC analyses matched those used for analyses of mtdna. e best-tting model was exponential growth, which had a log Bayes factor of 8.079 compared with the next-best model (Bayesian skyride). To calibrate the estimate of the timescale, a mutation rate of 8.71 10 10 mutations/site/year 66 was specied. A set of 713,014 SNPs was screened using the HumanOmniExpress-24 BeadChip array in 30 individuals from the Leeward Societies Isles (Supplementary TableS1A). Twenty-six samples yielded high genotyping success rates ( 5% missing genotypes), and 686,565 autosomal SNPs, with less than 5% missing data, were kept for further analyses. Inference of cryptic relationships between samples was performed using KING v. 1.467 and no rst-, second- or third-degree relatives were detected. A single sample clustered together with Europeans in the nestructure 37 run (see below), and was excluded from all population level analyses (Supplementary TableS1B).