Deciphering the Enigma of Undetected Species, Phylogenetic, and Functional Diversity Based on Good-Turing Theory

Metadata S1

Deciphering the Enigma of Undetected Species, Phylogenetic, and Functional Diversity Based on Good-Turing Theory

Anne Chao, Chun-Huo Chiu, Robert K. Colwell, Luiz Fernando S. Magnago, Robin L. Chazdon, and Nicholas J. Gotelli

Metadata S1: Description of Data Sets and Running Procedures for the R Code Good-Turing

Data S1: species frequency/abundance data for 425 tree species (Magnago et al. 2014)

As presented below, the first row in Data S1 shows the names of the two habitats ("Edge" and "Interior") in the illustrative example considered in the main text. Beginning with the second row, each row has three entries: the species name, the species' sample abundance in the Edge habitat, and its sample abundance in the Interior habitat. There are 425 tree species records collected from two habitats in 11 forest fragments in Espírito Santo State, in south-eastern Brazil (Magnago et al. 2014).

Layout: first row = habitat names; first column = species names; remaining columns = species abundances.

                              Edge   Interior
    Acanthiza_lineata Carpotroche_brasiliensis 011 321
    Astronium_concinnum        110         11
    Astronium_graveolens        36          7
    Spondias_macrocarpa         12          1
    ......
    Qualea_megalocarpa           3          3
    Vochysia_angelica            2          3

The R code Good-Turing for computing all the estimators discussed in the main text is available on GitHub (https://github.com/annechao). For readers without an R background, we also provide the online software GoodTuring, available from https://chao.shinyapps.io/goodturing/, to facilitate all computations.

There are six main functions in the R code:

(1) Function Richness(data) for estimating species richness in each individual assemblage given a specified species-by-assemblage abundance matrix data. (Species names are optional.)

(2) Function Shared_richness(data) for estimating shared species richness between two assemblages given a specified species-by-assemblage abundance matrix data. (Species names are optional.)

(3) Function PD(data, tree) for estimating Faith PD in each individual assemblage given a specified species-by-assemblage abundance matrix data and a specified phylogenetic tree (in Newick format) spanned by all observed species. (Species names are required in the abundance data and must match the names in the specified phylogenetic tree.)

(4) Function Shared_PD(data, tree) for estimating shared PD between two assemblages given a specified species-by-assemblage abundance matrix data and a specified phylogenetic tree (in Newick format) spanned by all observed species. (Species names are required in the abundance data and must match the names in the specified phylogenetic tree.)

(5) Function FAD(data, dis_matrix) for estimating FAD in each individual assemblage given a specified species-by-assemblage abundance matrix data and a specified pairwise distance matrix of all observed species. (Species names are required in the abundance data and must match the names in the specified distance matrix. The ordering of species in the abundance data must also be the same as in the distance matrix.)

(6) Function Shared_FAD(data, dis_matrix) for estimating shared FAD between two assemblages given a specified species-by-assemblage abundance matrix data and a specified pairwise distance matrix of all observed species. (Species names are required in the abundance data and must match the names in the specified distance matrix. The ordering of species in the abundance data must also be the same as in the distance matrix.)

Running procedures

First, copy and paste the R code Good-Turing, available from GitHub, into the R console.
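
As a minimal sketch of the Data S1 layout described above, the snippet below reads a few of the rows shown in the excerpt from inline text. The object name Brazil mirrors the name used in the function calls in this document; reading from inline text rather than from the actual Data S1 file, and the variable name data_s1_text, are assumptions made purely for illustration.

```r
# Illustrative only: a handful of rows in the Data S1 layout, typed inline.
# The real Data S1 file would be read from disk instead.
data_s1_text <- "Edge Interior
Astronium_concinnum 110 11
Astronium_graveolens 36 7
Spondias_macrocarpa 12 1
Qualea_megalocarpa 3 3
Vochysia_angelica 2 3"

# The header row has one fewer field than the data rows, so read.table()
# uses the first column (species names) as row names.
Brazil <- read.table(text = data_s1_text, header = TRUE)

dim(Brazil)      # number of species x number of assemblages
colSums(Brazil)  # sample size (number of individuals) per habitat
```

With a layout like this, species names end up as row names and each habitat is one numeric column, which matches the species-by-assemblage abundance matrix that the six functions expect.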
The following steps show how to run the R function Richness(data) to obtain the Chao1 species richness estimator and its 95% confidence interval. The input data consist of a species-by-assemblage abundance matrix (species names are optional). The number of assemblages can be any positive integer.

    # The package "knitr" must be installed and loaded before running the R function Richness()
    # Install the knitr package from CRAN
    install.packages("knitr")
    # Load knitr
    library(knitr)
    # Import Data S1 (species abundances) into an object named Brazil
    # Run R function Richness(data) to obtain the Chao1 richness estimate and 95% confidence interval via
    # log-transformation
    Richness(Brazil)
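
To make the quantity behind Richness() more concrete, here is a minimal sketch of a Chao1 point estimate with a log-transformed confidence interval for a single assemblage. It uses the commonly published Chao1 expressions based on the counts of singletons (f1) and doubletons (f2); the function name chao1_sketch and the toy abundance vector are hypothetical, and the authors' Richness() may handle special cases (for example f2 = 0) differently.

```r
# Sketch of a Chao1-type estimator for one assemblage (not the authors' code).
# x: vector of species abundances in a single sample.
chao1_sketch <- function(x, conf = 0.95) {
  x <- x[x > 0]
  n <- sum(x)          # sample size
  s_obs <- length(x)   # observed richness
  f1 <- sum(x == 1)    # number of singletons
  f2 <- sum(x == 2)    # number of doubletons
  k <- (n - 1) / n

  # Estimated number of undetected species.
  f0_hat <- if (f2 > 0) k * f1^2 / (2 * f2) else k * f1 * (f1 - 1) / 2
  est <- s_obs + f0_hat

  # Approximate variance and log-transformed interval.
  # This sketch only covers the case f1 > 0 and f2 > 0.
  if (f1 > 0 && f2 > 0) {
    r <- f1 / f2
    v <- f2 * (k * r^2 / 2 + k^2 * r^3 + k^2 * r^4 / 4)
    z <- qnorm(1 - (1 - conf) / 2)
    R <- exp(z * sqrt(log(1 + v / f0_hat^2)))
    # The log-transformation keeps the lower bound at or above s_obs.
    ci <- c(lower = s_obs + f0_hat / R, upper = s_obs + f0_hat * R)
  } else {
    ci <- c(lower = NA_real_, upper = NA_real_)
  }
  list(s_obs = s_obs, f1 = f1, f2 = f2, estimate = est, ci = ci)
}

# Toy abundance vector (made up): 7 observed species, 3 singletons, 2 doubletons.
res <- chao1_sketch(c(1, 1, 1, 2, 2, 5, 10))
res$estimate
res$ci
```

The sketch is independent of the authors' Richness() function, whose results are discussed next.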

For the notation and interpretation of the Richness() output, see the main text and Table 2(a) of Chao et al. 2017.

The following steps show how to run the R function Shared_richness(data) to obtain the Chao1-shared species richness estimator and its 95% confidence interval. The input data consist of a species-by-assemblage abundance matrix (species names are optional). The number of assemblages must be two.

    # The package "knitr" must be installed and loaded before running the R function Shared_richness()
    install.packages("knitr")
    library(knitr)
    # Import Data S1 (species abundances) into an object named Brazil
    # Run R function Shared_richness(data) to obtain the Chao1-shared richness estimate and 95% confidence
    # interval via log-transformation
    Shared_richness(Brazil)

For the notation and interpretation of the Shared_richness() output, see the main text and Table 2(b) of Chao et al. 2017.

Data S2: the phylogenetic tree (in Newick format) of 425 species

The data file consists of the phylogenetic tree (in Newick format) of the 425 observed species listed in Data S1. The tree was constructed using the software Phylomatic (Webb and Donoghue 2005).

The following steps show how to run the R function PD(data, tree) to obtain the Chao1-PD estimate and its 95% confidence interval. The input data must include species names and the corresponding species-by-assemblage abundance matrix. The number of assemblages can be any positive integer. Species names in the abundance data must match those in the specified Newick-format phylogenetic tree.
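
The name-matching requirement above can be checked before calling PD(). The following is a minimal sketch in base R, assuming simple, unquoted tip labels without spaces; the toy Newick string, the species names, and the object names abund and newick are all made up for illustration.

```r
# Hypothetical toy inputs; species names and the tree are made up.
newick <- "((sp_A:1,sp_B:1):1,sp_C:2);"
abund <- data.frame(Edge = c(3, 0, 5), Interior = c(1, 2, 0),
                    row.names = c("sp_A", "sp_B", "sp_D"))

# Pull tip labels out of the Newick string with a regular expression:
# a tip label is the run of label characters right after "(" or ",".
# This only handles simple, unquoted labels; with the "ape" package
# installed, read.tree(text = newick)$tip.label is the more robust route.
tip_labels <- regmatches(
  newick,
  gregexpr("(?<=[(,])[^(),:;]+", newick, perl = TRUE)
)[[1]]

missing_from_tree <- setdiff(rownames(abund), tip_labels)
missing_from_data <- setdiff(tip_labels, rownames(abund))

missing_from_tree  # species with abundances but no tip in the tree
missing_from_data  # tips with no abundance record
```

If either vector is non-empty, the names in the abundance data and the tree do not line up and would need to be reconciled first.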

    # The packages "ade4", "phytools", "ape" and "knitr" must be installed and loaded before running the R
    # function PD()
    install.packages(c("phytools", "ade4", "knitr", "ape"))
    library(phytools)
    library(ade4)
    library(ape)
    library(knitr)
    # Import abundance data and tree
    tree=read.newick("c:/users/stat_pc/desktop/datas2.txt")
    # Run R function PD(data, tree) to obtain the Chao1-PD estimate and 95% confidence interval via a
    # log-transformation
    PD(Brazil, tree)

For the notation and interpretation of the PD() output, see the main text and Table 3(a) of Chao et al. 2017.

The following steps show how to run the R function Shared_PD(data, tree) to obtain the Chao1-PD-shared estimate and its 95% confidence interval. The input data must include species names and the corresponding species-by-assemblage abundance matrix. The number of assemblages must be two. Species names in the abundance data must match those in the specified Newick-format phylogenetic tree.

    # The packages "ade4", "phytools", "ape" and "knitr" must be installed and loaded before running the
    # R function Shared_PD()
    install.packages(c("phytools", "ade4", "knitr", "ape"))
    library(phytools)
    library(ade4)
    library(ape)
    library(knitr)
    # Import abundance data and tree
    tree=read.newick("c:/users/stat_pc/desktop/datas2.txt")
    # Run R function Shared_PD(data, tree) to obtain the Chao1-PD-shared estimate and 95% confidence
    # interval via a log-transformation
    Shared_PD(Brazil, tree)

For the notation and interpretation of the Shared_PD() output, see the main text and Table 3(b) of Chao et al. 2017.

Data S3: Species pairwise Gower distance matrix of 425 species based on six species traits

All 425 observed species listed in Data S1 were described by a set of six functional traits. Five are categorical variables: fruit size (size categories), seed size (size categories), fruit type, fruit dispersal syndrome, and successional group. One is a quantitative variable: wood density (Magnago et al. 2014). Based on these six traits, the species distance matrix in Data S3 was calculated using the Gower mixed-variables coefficient of distance with equal weights for all traits.

The following steps show how to run the R function FAD(data, dis_matrix) to obtain the Chao1-FAD estimate and its 95% confidence interval. The input data must include species names and the corresponding species-by-assemblage abundance matrix; the input dis_matrix denotes a species pairwise distance matrix. The number of assemblages can be any positive integer. Species names in the abundance data must match the names in the specified distance matrix. Moreover, the ordering of species in the abundance data must be the same as in the specified distance matrix.

    # The package "knitr" must be installed and loaded before running the R function FAD()
    install.packages("knitr")
    library(knitr)
    # Import abundance data and distance matrix
    dis=read.table("c:/users/stat_pc/desktop/datas3.txt")
    # Run R function FAD(data, dis_matrix) to obtain the Chao1-FAD estimate and 95% confidence interval
    # via a log-transformation
    FAD(Brazil, as.matrix(dis))

For the notation and interpretation of the FAD() output, see the main text and Table 4(a) of Chao et al. 2017.

The following steps show how to run the R function Shared_FAD(data, dis_matrix) to obtain the Chao1-FAD-shared estimate and its 95% confidence interval. The input data must include species names and the corresponding species-by-assemblage abundance matrix; the input dis_matrix denotes a species pairwise distance matrix. The number of assemblages must be two. Species names in the abundance data must match the names in the specified distance matrix. Moreover, the ordering of species in the abundance data must be the same as in the specified distance matrix.
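
The matching and ordering requirements above can be verified, and the ordering repaired, before calling FAD() or Shared_FAD(). The sketch below uses base R only; the toy abundance table, the distance values, and the object names abund and dis are made up for illustration and are not taken from Data S1 or Data S3.

```r
# Hypothetical toy inputs; names and values are made up for illustration.
abund <- data.frame(Edge = c(3, 0, 5), Interior = c(1, 2, 0),
                    row.names = c("sp_A", "sp_B", "sp_C"))
dis <- matrix(c(0.0, 0.2, 0.7,
                0.2, 0.0, 0.5,
                0.7, 0.5, 0.0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("sp_C", "sp_A", "sp_B"),
                              c("sp_C", "sp_A", "sp_B")))

# 1. Same set of species in both objects?
same_species <- setequal(rownames(abund), rownames(dis))

# 2. Same ordering? Here it is not, so reorder the distance matrix
#    (rows and columns together) to follow the abundance data.
same_order <- identical(rownames(abund), rownames(dis))
if (same_species && !same_order) {
  dis <- dis[rownames(abund), rownames(abund)]
}

# A distance matrix should also be symmetric with a zero diagonal.
stopifnot(isSymmetric(dis), all(diag(dis) == 0))
```

Reordering by name rather than by position avoids silently pairing a species with the wrong row of distances.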

    # The package "knitr" must be installed and loaded before running the following function
    install.packages("knitr")
    library(knitr)
    # Import abundance data and distance matrix
    dis=read.table("c:/users/stat_pc/desktop/datas3.txt")
    # Run R function Shared_FAD(data, dis_matrix) to obtain the Chao1-FAD-shared estimate and 95%
    # confidence interval via a log-transformation based on 200 bootstrap replications
    Shared_FAD(Brazil, as.matrix(dis))

For the notation and interpretation of the Shared_FAD() output, see the main text and Table 4(b) of Chao et al. 2017.

NOTE:

(1) A standard approximation method (the delta method) is applied to obtain a variance estimate and the associated 95% confidence interval via a log-transformation for the estimators of species richness, shared species richness, PD, shared PD, and FAD.

(2) For the shared FAD estimator, a bootstrap method with 200 replications is applied to obtain a variance estimate and the associated 95% confidence interval via a log-transformation. Because the bootstrap resampling is random, the estimated 95% confidence interval will vary slightly from run to run, even when the same abundance and distance matrix data are imported.

References

Chao, A., C.-H. Chiu, R. K. Colwell, L. F. S. Magnago, R. L. Chazdon, and N. J. Gotelli. 2017. Deciphering the enigma of undetected species, phylogenetic, and functional diversity based on Good-Turing theory. Ecology.

Magnago, L. F. S., D. P. Edwards, F. A. Edwards, A. Magrach, S. V. Martins, and W. F. Laurance. 2014. Functional attributes change but functional richness is unchanged after fragmentation of Brazilian Atlantic forests. Journal of Ecology 102:475-485.

Webb, C. O., and M. J. Donoghue. 2005. Phylomatic: tree assembly for applied phylogenetics. Molecular Ecology Notes 5:181-183.