Ecography. Supplementary material


Ecography ECOG Dambros, C. S., Morais, J. W., Azevedo, R. A. and Gotelli, N. J. Isolation by distance, not rivers, control the distribution of termite species in the Amazonian rain forest. Ecography doi: / ecog Supplementary material

APPENDIX 1

DETAILED DESCRIPTION OF METHODS AND SUPPLEMENTARY RESULTS

Text A1 - Details on the development of a termite sampling protocol for ecological studies

We used the data collected at Ducke Forest, central Amazonia, Brazil, to establish a sampling design for ecological studies of termites that would maximize the power of statistical tests. The results were used to develop a sampling protocol to be applied in other parts of Amazonia.

At Ducke Forest, termites were sampled in 250 m long transects. These transects are used for Long Term Ecological Research (LTER) and are well suited for sampling termites and other groups and for comparing their distributions (Magnusson et al. 2005). The transects followed the elevation isocline to minimize variation in edaphic conditions within transects. The sampling protocol used within each transect was modified from Jones and Eggleton (2000). In each transect, 10 sections of 2 x 5 m were sampled, and termites were searched for in logs, branches, leaves, and soil for 20 minutes by 3 investigators. This yielded a total sampling effort of 300 sections.

Each transect was used as a sampling unit in regression analyses testing for the association of soil variables with termite species composition. Termite species composition was measured as the Bray-Curtis dissimilarity index between all pairs of transects. A Principal Coordinates Analysis (PCoA) was performed on the Bray-Curtis dissimilarity matrix, and the first PCoA axis was used as the response variable in regression models. The weakest association detected was that of termite species composition with soil bases (R^2 = 0.16; P = 0.03). The strongest association detected was with soil clay content (R^2 = 0.42; P < 0.01). Termite occurrence and species richness were associated only with ant predator density (Dambros et al. 2016).
To develop the sampling design, we considered the trade-off between the number of transects sampled and the number of sections surveyed within each transect. Assuming that a research project has the resources to sample a fixed number of sections, we determined the distribution of these sections that would maximize statistical power (i.e. the probability of detecting an association when it exists). For example, if one has the resources and time to sample 150 sections, which arrangement of sections maximizes statistical power: intensively sampling 15 transects with 10 sections each, or spreading the sections over 30 transects with 5 sections each?

Of course, the cost of sampling two sections in a single transect is not the same as the cost of sampling one section in each of two transects. Several other aspects would need to be included to truly establish the most cost-effective sampling design. The costs of sampling additional sections within a transect or additional transects probably differ among studies, depending on the logistics of reaching the sampling areas. However, our study considers LTER sites, where trails and transects were pre-established, so most of the logistic costs are not relevant here. In other words, in our study, sampling four sections in one transect costs about the same as sampling two transects with two sections each.

It is also important to note that our transects followed the elevation isocline (so they are not necessarily straight), and there is little environmental variation within each transect. Therefore, the similarity in termite species richness and composition between two sections within a transect is higher in our study than in other termite studies (e.g. Jones and Eggleton 2000; Davies et al. 2003). This is relevant because sections within a transect are highly redundant in our study and provide similar information, whereas in other studies sections can be complementary.
To test how reducing the number of transects and sections affects the association of termite species composition with soil bases and soil clay content, we rarefied the number of transects and the number of sections sampled per transect. We then re-ran all analyses on the reduced dataset 1000 times and determined whether the association would be detected at the chosen alpha level. The rarefaction of sections is similar to what was conducted by Jones and Eggleton (2000), but in our study we are not evaluating the number of species sampled, but the power of statistical tests commonly used in ecological studies.

For both soil bases and soil clay content, statistical power increased with the number of transects and the number of sections sampled (Fig. A1). However, the number of transects had a stronger influence on the power of statistical tests than the number of sections sampled per transect (Fig. A1). For example, sampling 150 sections spread over 25 transects (6 sections each) would lead to the detection of the association between soil bases and termite species composition ~40% of the time. In contrast, the association would be detected only ~20% of the time when sampling 150 sections spread over 15 transects (10 sections each). Statistical power is still relatively high when sampling only one section per transect if 30 transects (or more) are sampled. Our results support the sampling of fewer than 10 sections per transect in ecological studies of termites when multiple transects (>30) are sampled. However, the particularities of our sampling design should be taken into account when designing future studies.
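The transect part of the rarefaction described in Text A1 can be sketched in R as follows. This is a minimal illustration on simulated data, not the Ducke Forest analysis: the variables `clay` and `pcoa1`, the effect size, the helper `power.at`, and the alpha level of 0.05 are all assumptions made for the example, and only transects (not sections within transects) are rarefied.

```r
# Hypothetical illustration with simulated data; not the Ducke Forest dataset
set.seed(1)

n.total <- 30                          # transects available (assumed)
clay  <- rnorm(n.total)                # simulated soil predictor
pcoa1 <- 0.6 * clay + rnorm(n.total)   # simulated composition axis

# Proportion of random subsets of k transects in which the association
# is detected at the given alpha level (0.05 is an assumption)
power.at <- function(k, n.rep = 1000, alpha = 0.05) {
  detected <- replicate(n.rep, {
    keep <- sample(n.total, k)                  # rarefy transects without replacement
    fit  <- lm(pcoa1[keep] ~ clay[keep])
    summary(fit)$coefficients[2, 4] < alpha     # p-value of the slope
  })
  mean(detected)
}

k.values <- c(10, 15, 20, 25, 30)
power <- sapply(k.values, power.at)
names(power) <- k.values
round(power, 2)
```

When k equals the number of available transects, every subset contains the same data, so the estimated power is either 0 or 1. Intermediate values of k give the proportion of subsets in which the association is detected.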

Text A2 - Detailed description of rarefaction procedure applied to individual transects with more than five sections.

To calculate the abundance of each species expected when sampling five sections in transects with more than five sections, we divided the species abundances by the number of sections sampled in a given transect. This measurement represents the density of termites of a particular species in a transect. For example, a species with an abundance of 10 colonies in a transect where 10 sections were sampled has a density of 1 colony per section. To obtain the abundance expected in five sections, we multiplied the species density in a given transect by five. The expected abundance for all species within a transect was measured as the sum of the expected abundances of the individual species.

To calculate the probability that a species occurs in a given transect when only five sections are sampled (the expected presence of that species in the transect), we derived the following formula:

\[ P_i = 1 - \frac{(N-n)!}{N!} \cdot \frac{(N-N_i)!}{(N-N_i-n)!} \]

where N represents the number of sections surveyed, N_i represents the number of sections where species i was present, and n represents the number of sections to be subsampled (in our case n = 5 for all transects). The code to run this calculation in R is

    1-(factorial(N-n)/factorial(N))*(factorial(N-Ni)/factorial(N-Ni-n))

Note that this formula gives the probability that a species is recorded when 5 sections are drawn in a single try (without replacement). This calculation differs from sequentially sampling one section, replacing it, and repeating the procedure until five sections are obtained.
In the latter case, the calculation would simply be

\[ P_i = 1 - \left(1 - \frac{N_i}{N}\right)^{n} \]

The estimated species richness per transect was calculated as the sum of the probabilities of occurrence of all species sampled in each transect, or

\[ S = \sum_i P_i \]

These formulas give the same results as randomly selecting five sections in each transect and recording the species abundances, species richness, and presence or absence of each species. To demonstrate this, we randomly selected only five sections in all transects (rarefaction), and used measures, such as termite abundance, obtained in five sections for analyses. The random selection of sections was repeated 999 times for each transect, and the mean abundance, mean species richness, and mean abundance per species were recorded. Note that for transects where only five sections were sampled, the results from rarefaction are identical to the observed values because there is only one possible combination of five sections that could be selected in a randomization. For transects with more than five sections, the resulting values represent the average values that would be obtained by sampling only five sections. This procedure should not change the type I error rate of our analyses, but should increase the statistical power of our models (compared to sampling only five sections in all transects) because the values obtained in transects with more than five sections represent a more precise measurement, closer to the true expected value.
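The occurrence probabilities described in Text A2 can be checked numerically. The sketch below rewrites the factorial expression with `choose()`, which is algebraically the same quantity, 1 - C(N - Ni, n) / C(N, n), and returns 1 rather than NaN when a species occupies so many sections that it cannot be missed. The function names `p.without` and `p.with` and the example values are hypothetical.

```r
# Probability of recording a species present in Ni of N sections
# when n sections are drawn at once (without replacement)
p.without <- function(N, Ni, n) 1 - choose(N - Ni, n) / choose(N, n)

# Same probability when sections are drawn one at a time with replacement
p.with <- function(N, Ni, n) 1 - (1 - Ni / N)^n

N <- 10; Ni <- 3; n <- 5            # hypothetical transect
p.exact <- p.without(N, Ni, n)      # 1 - 21/252
p.repl  <- p.with(N, Ni, n)         # 1 - 0.7^5

# Check against random subsampling of sections
set.seed(1)
occupied <- c(rep(TRUE, Ni), rep(FALSE, N - Ni))
p.sim <- mean(replicate(9999, any(sample(occupied, n))))

# Expected richness is the sum of occurrence probabilities across species
Ni.all <- c(1, 3, 10)               # hypothetical occupancy of three species
S.expected <- sum(p.without(N, Ni.all, n))

c(exact = p.exact, simulated = p.sim, with.replacement = p.repl)
```

The simulated proportion should approach the exact value without replacement, while the with-replacement value is lower because the same section can be drawn more than once.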

Text A3 - Detailed description for the construction of Moran Eigenvector Maps and associated weighting matrix, w.

In our study, two sampling designs were used. In each of 12 grids within the Amazonian forest, we sampled from five to 32 transects spaced regularly at intervals of 1 km. The transects were organized within regular grids, whereas the grids had an irregular distribution (Fig 1 in main text). We determined that transects within a grid should be much more connected than transects in distinct grids. The idea of our procedure was to represent, in a hierarchy, a local community within a grid and a metacommunity among grids. We established that 1) transects close to each other within a grid would be connected; and 2) the connectivity (e.g. dispersal probability) between two transects within a grid would be equal to the connectivity of a transect with all transects outside the grid combined.

The connectivity matrix between pairs of transects within a grid was created by connecting each transect to all its adjacent transects within a fixed radius (Moore neighborhood; 1 if connected, zero otherwise; Fig. 1b in manuscript). We then multiplied the within-grid connectivity matrix by 1/(1 + n_i), where n_i represents the number of neighbors to which a given transect is connected. We added 1 in the denominator because each transect was later connected to other transects outside the grid (Fig. A2). The connectivity between grids was determined by a Gabriel graph (Legendre and Legendre 2012) and was used to determine the connectivity between pairs of transects in distinct grids (1 if connected, zero otherwise). The matrix of connectivity between transects in distinct grids was then multiplied element-wise by 1/[(1 + n_i) g_j], where g_j represents the number of transects sampled in the grid where a given transect is located. Finally, we summed both matrices to obtain w.
If we had only two grids with 2 transects each, the connectivity between the transects within a grid would be 1/(1+1), or 0.5. The connectivity of two transects in distinct grids would be 1/[(1+1)2], or 0.25.

Moran Eigenvector Maps construction and selection

To create MEMs, we ran an eigen analysis on the final connectivity matrix w. The eigen analysis generated 197 vectors representing spatial autocorrelation from broad to fine spatial scales, which were determined from their associated eigenvalues (large and small eigenvalues represent broad and fine spatial autocorrelation, respectively; Dray et al. 2012). To reduce the number of vectors to be included in our models, we performed two further steps. First, we assessed the spatial autocorrelation of MEMs by calculating Moran's I, and selected only MEMs significantly correlated with the geographical distance separating transects (Dray et al. 2012). Second, we created a regression or RDA model, as appropriate, using only MEMs as predictor variables of termite abundance, species richness, and species composition (PCoA axis using the Bray-Curtis dissimilarity matrix). We then ran a forward stepwise selection of MEMs based on the adjusted R^2 of the model (Dray et al. 2012; Legendre and Gauthier 2014). This procedure was conducted independently for each response variable, and the final number of MEMs depended on the explanatory power of each MEM for a particular variable. Note that because MEMs are orthogonal and independent, including all MEMs in the analyses would explain 100% of the variation in any response variable, which would not be informative. After the forward selection, the selected MEMs were divided into two groups: broad-scale and fine-scale MEMs.
Finally, we applied a variance partitioning approach to separate the portion of variance in the response variable explained by 1) spatial autocorrelation in species distribution that could be a result of limited dispersal at fine scales; 2) spatial autocorrelation in species distribution that could be a result of limited dispersal at broad scales; 3) species association with environmental variables spatially structured at fine scales; 4) species association with environmental variables spatially structured at broad scales; 5) species association with non-spatially structured variables; and 6) residual variation.
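The two-grid example given in Text A3 can be reproduced with a few lines of R. This is a toy sketch under stated assumptions: the grid labels, the object names, and the assumption that the two grids are linked to each other are all hypothetical, and both grids have the same size, so the choice of which grid g_j refers to does not affect the result.

```r
grid <- c("A", "A", "B", "B")            # grid membership of four transects
same.grid <- outer(grid, grid, "==")

# Within-grid neighbours: in this toy, the two transects of a grid are adjacent
local <- same.grid * 1
diag(local) <- 0

n.i <- rowSums(local)                    # neighbours within the grid
g.j <- as.vector(table(grid)[grid])      # transects in each transect's grid

within  <- local / (1 + n.i)                  # row i scaled by 1/(1 + n_i)
between <- (!same.grid) / ((1 + n.i) * g.j)   # pairs in distinct, linked grids
w <- within + between
w
```

Each row of `w` sums to 1 here: 0.5 goes to the single neighbour within the grid, and the remaining 0.5 is split equally between the two transects of the other grid.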

Figure A1. Probability of detecting an association of termite species composition with soil bases and soil clay content when the number of transects and the number of sections within each transect are rarefied. Termite species composition was measured as the first axis of a Principal Coordinates Analysis (PCoA) on the Bray-Curtis pairwise dissimilarity matrix. Arrows represent two scenarios for spreading sections in transects with the same sampling effort (measured as the overall number of sections). Highest power is obtained by increasing the number of transects sampled. See main text for details on the measurements of soil bases and soil clay content.


Figure A2. Biplot based on a distance-based Redundancy Analysis (db-RDA) representing the association of termite species composition (response variable) and environmental variables (predictor variables) before (a-b) and after (c-d) the removal of spatial structure on termite data. Termite species composition was measured using the abundance balanced component of the Bray-Curtis dissimilarity index (Baselga 2013; summarized in PCoA axes in db-RDA analysis). Polygons represent clusters of transects delimited by the major rivers in Amazonia. Temp: mean annual temperature; Prec: mean annual precipitation.


Figure A3. Biplot based on a Non-metric Multidimensional Scaling analysis (NMDS) representing the association of termite species composition (response variable) and environmental variables (predictor variables) before (a-b) and after (c-d) the removal of spatial structure on termite data. Termite species composition was measured using the Bray-Curtis dissimilarity index (summarized in NMDS axes in NMDS analysis). Polygons in (a) and (c) represent clusters of transects delimited by the major rivers in Amazonia. Temp: mean annual temperature; Prec: mean annual precipitation.

References

Baselga, A. (2013) Separating the two components of abundance-based dissimilarity: balanced changes in abundance vs. abundance gradients. Methods in Ecology and Evolution, 4,

Dambros, C.S., Morais, J.W., Vasconcellos, A., Souza, J.L.P., Franklin, E. & Gotelli, N.J. (2016) Association of ant predators and edaphic conditions with termite diversity in an Amazonian rain forest. Biotropica.

Davies, R.G., Hernández, L.M., Eggleton, P., Didham, R.K., Fagan, L.L. & Winchester, N.N. (2003) Environmental and spatial influences upon species composition of a termite assemblage across neotropical forest islands. Journal of Tropical Ecology, 19,

Dray, S., Legendre, P. & Peres-Neto, P.R. (2006) Spatial modelling: a comprehensive framework for principal coordinate analysis of neighbour matrices (PCNM). Ecological Modelling, 196,

Jones, D.T. & Eggleton, P. (2000) Sampling termite assemblages in tropical forests: testing a rapid biodiversity assessment protocol. Journal of Applied Ecology, 37,

Magnusson, W.E., Lima, A.P., Luizão, R., Luizão, F., Costa, F.R., Castilho, C.V. de & Kinupp, V.F. (2005) RAPELD: a modification of the Gentry method for biodiversity surveys in long-term ecological research sites. Biota Neotropica, 5,

Appendix 2

Diversity and composition of termites in Amazonia

CSDambros

19 October, 2016

Abstract

This document describes the analyses conducted in the manuscript about termite distribution in Amazonia submitted to Ecography. Environmental data were previously extracted from rasters, and missing soil data were imputed as explained in the main text. All datasets used are publicly available on-line, and the R code provides links for their individual download.

Contents

1 Load required packages
2 Import data
3 Dealing with missing values
4 Select only predictors of interest
5 Create site x species matrix
    Calculate species overall abundance and species richness
    Calculate similarity matrices
MEMs construction
    Extract geographical coordinates for transects and grids
    Create connectivity matrix
        Transect-transect within grids
        Transect-transect among grids
        Merge local and regional dispersal into a single matrix
    Create Moran Eigenvectors Maps (MEMs)
    Selection of MEMs with spatial autocorrelation
Analyses (species richness and composition analyzed separately)
    Species richness
        Forward selection of MEMs
        Regression analysis - Species richness
            Simple regressions (without accounting for spatial autocorrelation)
            Regressions after the removal of spatial autocorrelation
        Variance partitioning
    7.2 Species composition
        Forward selection of MEMs
        RDA analysis - Species composition
            Simple RDA (without accounting for spatial autocorrelation)
            RDA after the removal of spatial autocorrelation
        Variance partitioning
        Biplot of RDA - Composition in biogeographic regions and environment
            Raw (not removing spatial autocorrelation)
            Residual (after removal of spatial autocorrelation)
Appendix: Additional Analyses
    Species composition using Non-Metric Multidimensional Scaling (NMDS)
        Run NMDS and extract scores
        Forward selection of MEMs
        Simple regression (without accounting for spatial autocorrelation)
        Regression after the removal of spatial autocorrelation
        Biplot of NMDS - Composition in biogeographic regions and environment
            Raw (not removing spatial autocorrelation)
            Residual (after removal of spatial autocorrelation)
    Species composition using only turnover component (nestedness removed)
        Calculate Simpson's dissimilarities (turnover component of Sorensen)
        Forward selection of MEMs
        RDA analysis
            Simple RDA (without accounting for spatial autocorrelation)
            RDA after the removal of spatial autocorrelation
        Variance partitioning
        Biplot of RDA using turnover - Composition in biogeographic regions and environment
            Raw (not removing spatial autocorrelation)
            Residual (after removal of spatial autocorrelation)
    Species composition using balanced abundances (turnover component of Bray-Curtis)
        Calculate balanced dissimilarities (turnover component of Bray)
        Forward selection of MEMs
        RDA analysis
            Simple RDA (without accounting for spatial autocorrelation)
            RDA after the removal of spatial autocorrelation
        Variance partitioning
        Biplot of RDA using turnover - Composition in biogeographic regions and environment
            Raw (not removing spatial autocorrelation)
            Residual (after removal of spatial autocorrelation)
    Species composition using RDA on Hellinger-transformed termite data (not db-RDA)
        Hellinger transformation
        Forward selection of MEMs
        RDA analysis
            Simple RDA (without accounting for spatial autocorrelation)
            RDA after the removal of spatial autocorrelation
        Variance partitioning
        Biplot of RDA using turnover - Composition in biogeographic regions and environment
            Raw (not removing spatial autocorrelation)
            Residual (after removal of spatial autocorrelation)

1 Load required packages

Some of the analyses run in our study were wrapped in functions, which are publicly available on-line. The following lines load the required libraries, and download and source the necessary functions.

    library(vegan)      # installed using install.packages("vegan")
    library(spdep)      # installed using install.packages("spdep")
    library(sp)         # installed using install.packages("sp")
    library(spacemakeR) # installed using:
    #install.packages("tripack")
    #install.packages("spacemakeR",repos="
    source("

2 Import data

As with the functions, we made our datasets publicly available on-line. They are downloaded using the next chunk of code. If you have problems running these lines, you might have an Internet connection problem (proxies can be a problem when trying to download things directly from R). Try to download the files to your directory using the provided links, and then read the files directly from your folder by changing the path to the file (e.g. use c://file_path instead of the URL).

Some files imported here are not original datasets. These data were transformed to facilitate analyses. For example, environmental data from rasters were extracted prior to these analyses and were incorporated into the main environmental dataframe. The original datasets, as well as the scripts with data transformations, are available upon request from the first author.

    # Termite data
    # Import termite record data
    isoptera<-read.csv("
    # Import termite species trait data
    isopterataxonid<-read.csv("
    # Import environmental data
    env<-read.csv("
    head(env)

[Output of head(env): first six transects (campusufam_1_1 and campusufam_1_10 to campusufam_1_14), all with GridID campusufam, Region0 I, and Region GuianaEast. Columns: PlotID, GridID, LONG, LAT, UTM_Easting, UTM_Northing, Region0, Region, Temp, Prec, lnp.input, lnbases.input, PC1.input, PC2.input, TreeCover, Clay.input, K.input, Mg.input, Ca.input, Clay, K, Mg, Ca, P, P.input. The soil columns Clay, K, Mg, Ca, and P are NA for these rows.]

3 Dealing with missing values

Some soil variables were not available for all transects. Removing these data from the analyses could prevent the detection of the association of other variables with termite species richness and composition. For example, removing data from the Jaú National Park, a distinct biogeographic region with extreme values for temperature and precipitation, could affect the association of termite species composition with rivers and climate. To overcome the problem of missing data, we performed data imputation, that is, we filled missing entries with values. For transects with missing values, we randomly selected values from other transects (see footnote 1). Grids with environmental data were sampled in all sampling regions of the study, so data imputation was spread across the study region. Moreover, 137 transects had all environmental data available, and climatic variables, tree cover, and biogeographic information were available for all transects, so data imputation was not necessary for these variables.

    # Impute missing soil data
    set.seed(102) # Guarantees the results will be exactly the same every time
    # For each column, replace missing values with a random sample from non-missing values
    for(i in 9:ncol(env)){
      # Create random sample from non-missing entries
      sample.env<-sample(env[!is.na(env[,i]),i],size = nrow(env),replace = TRUE)
      env[,i]<-ifelse(is.na(env[,i]),sample.env,env[,i]) # Replace NAs with random sample
    }
    head(env)

[Output of head(env): same rows and columns as in section 2, with the missing soil values now filled in.]

Footnote 1: This procedure will certainly add noise to the data and reduce the power of statistical tests (similar to the removal of sampling areas). The removal of areas with missing soil data provided similar results.

4 Select only predictors of interest

To facilitate analyses, we create a new dataframe with only those variables used as predictor variables in the statistical models. This makes the following code simpler, especially when all variables are used in a regression model.

    # Select only variables to be used as predictors, combine and log transform variables
    predictors<-data.frame(env[,c("Temp","Prec","TreeCover")],
      lnp=log(env$P+1),lnbases=log(env$K+env$Mg+env$Ca+1),Clay=env$Clay)
    # Define the variable rivers (biogeographic region)
    # Level names must match the values stored in env$Region exactly
    rivers<-factor(env$Region,
      levels=c("GuianaWest","GuianaEast","Negro","Inambari","Rondonia"))
    # Standardize predictor variables
    predictors.std<-decostand(predictors,"standardize")
    # Visualize the first rows of predictors
    head(predictors)

[Output of head(predictors): columns Temp, Prec, TreeCover, lnp, lnbases, Clay.]

5 Create site x species matrix

The termite data are provided in the long format, that is, each occurrence is a row in the original spreadsheet. Species, sampling location, etc., are attributes (columns) in this spreadsheet. To run our analyses, we started by transforming the termite data into a short table, where each row represents a sampling location and each column represents a species. Note that the long format contains more information and is preferable for storage.

Before starting, the variables PlotID and TaxonID in table isoptera were reordered to be in the same order as these variables in the environmental table (env) and species info table (isopterataxonid). The number of subsamples sampled within each transect was also recorded.

    # Make sure transects in termite dataset are in the same order as
    # in environmental dataset
    isoptera$PlotID<-factor(isoptera$PlotID,levels=env$PlotID)
    # Make sure species in termite dataset are in the same order as in species dataset
    isoptera$TaxonID<-factor(isoptera$TaxonID,levels=isopterataxonid$TaxonID)
    attach(isoptera)
    # Create transect X species matrix
    termite.plot.obs<-tapply(isoptera$n,list(PlotID,TaxonID),sum)
    termite.plot.obs[is.na(termite.plot.obs)]<-0 # NAs are true zeros
    effort.plot<-tapply(n_subplots,PlotID,mean) # Number of sections sampled per transect
    detach(isoptera)

The number of subsamples in each transect varied, as can be checked with the following code.

    range(effort.plot)
    [1] 5 12

To avoid comparing transects with different sampling effort, transects with more than 5 sections were rarefied. In other words, 5 sections were randomly selected in each transect (without replacement), and the average abundance of individual species, probability of occurrence (average presence), species overall abundance, and species richness were calculated. If all transects had only 5 sections, the results would be exactly the same for the abundance and presence/absence matrices, species abundance per transect, and species richness per transect.

Abundance matrix

    termite.plot<-(5*termite.plot.obs)/as.vector(effort.plot)

Presence-Absence matrix (probability of occurrence)

    # With replacement
    #termite.plot.pa<-1-((1-termite.plot.obs/as.vector(effort.plot))^5)
    # Without replacement (as simulated and used; ignore possible warnings)
    termite.plot.pa<-poccur(as.vector(effort.plot),termite.plot.obs,5)
    # NaNs are 1s
    termite.plot.pa[is.nan(termite.plot.pa)]<-1

5.1 Calculate species overall abundance and species richness

Now that the matrices of estimated species abundance and presence/absence have been calculated, it is easy to calculate the expected overall abundance and species richness for all species. The procedure is the same as if we were not rarefying the community: overall species abundance is the sum of the abundances of all species, and the expected species richness is the sum of the probabilities of occurrence of all individual species.

Total abundance per transect

    # The sum of individual abundances is overall abundance
    termite.n<-rowSums(termite.plot,na.rm=TRUE)

Total species richness per transect

    # The sum of presences is species richness
    termite.s<-rowSums(termite.plot.pa,na.rm=TRUE)

5.2 Calculate similarity matrices

To quantify the changes in termite species composition, the Bray-Curtis pairwise dissimilarity matrix was calculated using the vegdist function. We added a column with 1s in all entries to the site x species dataframe, so that sites without any shared species are not considered completely dissimilar, and sites without any species can also be included.

Composition

    termite.bray<-vegdist(cbind(termite.plot.pa,1),"bray") # Used in db-RDA analyses
    termite.pcoa<-cmdscale(termite.bray,k=2,add = TRUE)

6 MEMs construction

This section can be skipped if the reader is not interested in the particularities of the sampling design used in this paper. A more general and simpler way to construct MEMs is provided in Dray et al. (2012). In this paper, transects were nested within grids, and this nested design was used to create an overall connectivity matrix. The connectivity matrix represents two hierarchical levels, and assumes that transects within a grid are much more connected to each other than transects in separate grids. Moreover, all transects within a grid have the same connectivity with a transect in another grid. The hierarchical matrix was designed to represent local communities (within grid) and a broad metacommunity (between grids).
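The effect of the extra column of 1s used when calculating the similarity matrices (section 5.2) can be seen in a small base-R sketch. The helper `bray` implements the usual Bray-Curtis formula, sum(|x - y|) / sum(x + y), and stands in for `vegdist` here; the three sites and their values are hypothetical.

```r
# Bray-Curtis dissimilarity between two sites (stand-in for vegan::vegdist)
bray <- function(x, y) sum(abs(x - y)) / sum(x + y)

site.a <- c(sp1 = 0.8, sp2 = 0,   sp3 = 0)   # hypothetical occurrence probabilities
site.b <- c(sp1 = 0,   sp2 = 0.5, sp3 = 0)
empty  <- c(sp1 = 0,   sp2 = 0,   sp3 = 0)   # transect without termites

d.raw         <- bray(site.a, site.b)              # no shared species
d.empty       <- bray(empty, empty)                # 0/0, undefined
d.dummy       <- bray(c(site.a, 1), c(site.b, 1))  # with the column of 1s
d.empty.dummy <- bray(c(empty, 1), c(empty, 1))    # now defined

c(raw = d.raw, empty = d.empty, dummy = d.dummy, empty.dummy = d.empty.dummy)
```

Without the dummy column, any two sites with no shared species reach the maximum dissimilarity of 1 and two empty sites give an undefined value. With it, dissimilarities stay below 1 and empty sites can be compared.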

6.1 Extract geographical coordinates for transects and grids

In this step we start preparing the data for the construction of Moran Eigenvector Maps. The first step is to create a matrix with two columns representing the spatial coordinates of the sampling transects (e.g. longitude and latitude). We then aggregated the coordinates of transects within each grid. To obtain the coordinates of a grid, the mean of the coordinates of the individual transects in the grid was calculated.

    coords<-env[,c("UTM_Easting","UTM_Northing")] # The same as LongLat but in UTM (meters)
    regional.coords<-aggregate(coords,list(env$GridID),mean) # Spatial coordinates for grids

6.2 Create connectivity matrix

Because we wanted to treat transects within the same grid differently from those in distinct grids (such as in communities within a metacommunity), we started by creating two connectivity matrices: transect-transect within individual grids and transect-transect between grids. Note that if you have a simpler sampling design, you can create a single matrix representing the pairwise connectivity between all pairs of sites, and then run the eigen analysis on it (see R code in Dray et al. 2012). However, using this simple method in our analyses would produce a connectivity matrix in which a single transect from a grid is connected to a single transect in the adjacent grid. Moreover, some transects would be more connected to transects in other grids than to transects within the same grid.

6.2.1 Transect-transect within grids

In the within matrix, all transects that are separated by less than 1700 meters are connected. This threshold can be any value greater than the diagonal distance between adjacent transects (hypotenuse = sqrt(1 km^2 + 1 km^2), about 1414 m; see map in the main article) but smaller than the 2 km separating second-order neighbors. This number is referred to as the truncation threshold in the literature.

    # Create dispersal matrix to represent spatial autocorrelation
    # Local communities
    # Connect plots that are less than 1700 meters apart, but not self
    local.nb.mat<-((as.matrix(dist(coords))<1700&as.matrix(dist(coords))>5)*1)
    # Prob of leaving a plot to metacom = prob of leaving to another plot within grid
    pleave<-1/(colSums(local.nb.mat)+1)
    # probs to disperse to neighboring plots within grid
    within<-t(local.nb.mat/(colSums(local.nb.mat)+1))

6.2.2 Transect-transect among grids

The between-grid connectivity matrix is similar to the within-grid matrix, but the criterion used to connect transects is different. Now there is an extra step to create a Gabriel graph that determines which grids are connected to one another.

    # Regional community (dispersal between grids of plots)
    # Determine in which grid (hub in metacomm) a particular transect is
    # (factor() keeps this working when GridID is read as character)
    reg.hub<-as.integer(factor(env$GridID))
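The role of the 1700 m threshold used for the within-grid matrix can be illustrated on a toy grid. The sketch below assumes a hypothetical 3 x 3 grid of transects spaced 1000 m apart and mirrors the thresholding and scaling steps of the original code; the coordinates are not taken from the study.

```r
# Hypothetical 3 x 3 grid of transects, 1 km apart (coordinates in meters)
xy <- expand.grid(x = c(0, 1000, 2000), y = c(0, 1000, 2000))
d  <- as.matrix(dist(xy))

# Connect transects less than 1700 m apart, excluding self-connections
nb <- (d < 1700 & d > 5) * 1
n.neighbors <- rowSums(nb)        # corners 3, edges 5, centre 8

# Same scaling as in the original code: 1/(number of neighbours + 1)
pleave <- 1 / (colSums(nb) + 1)
within <- t(nb / (colSums(nb) + 1))

unname(n.neighbors)
```

Diagonal neighbours (about 1414 m apart) are connected while transects 2000 m apart are not, which gives the Moore neighborhood. Each column of `within` sums to n/(n + 1), leaving a share of 1/(n + 1) for dispersal out of the grid.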

    # Use a Gabriel graph (a subgraph of the Delaunay triangulation) to create
    # the metacommunity structure (requires the spdep package)
    regional.nb <- graph2nb(gabrielneigh(as.matrix(regional.coords[, 2:3])),
                            sym = TRUE, row.names = regional.coords[, 1])
    # Calculate the number of neighbors a grid (hub) has
    reg.nlink <- sapply(regional.nb, length)
    # Create a matrix of 1s & 0s of dispersal from transects to grids
    regional.nb.mat <- sapply(regional.nb, function(x) {(1 * matrix(reg.hub %in% as.integer(x)))})
    # Create matrix of dispersal from plot to plot through the metacommunity
    regional.nb.mat1 <- t(t(regional.nb.mat) * (1 / reg.nlink))[, reg.hub]
    # Define how many plots each grid is connected to
    hub.out <- (1 / (table(reg.hub)))[reg.hub]
    # Final probability of dispersal plot-plot in metacommunity
    between <- matrix(hub.out, nrow(env), nrow(env)) * regional.nb.mat1
    between.nb <- sapply(as.data.frame((t(pleave * t(between)))), function(x) (1:length(x))[x != 0])
    attributes(between.nb) <- list(class = "nb", sym = TRUE)

Merge local and regional dispersal into a single matrix

In a final step, the within and between connectivity matrices were merged. In this final matrix, pairs of transects were connected either by being in the same grid or by being in distinct grids that were close to each other. We set the probability of an individual leaving the grid to be the same as the probability of moving from one transect to the next within the grid. [2]

    # Merge local and regional:
    # p(plot2plot) = p(leave & arrive by metacomm hub) or p(stay in grid and move)
    plot2plot <- (t(pleave * t(between))) + within
    # Transform into a list removing zeros (sparse matrix)
    plot2plot.nb <- sapply(as.data.frame(plot2plot), function(x) (1:length(x))[x != 0])
    # Transform into a class nb (spdep), just for plotting
    attributes(plot2plot.nb) <- list(class = "nb", sym = TRUE)

After all this hard work, here is a graph showing the between and within grid connectivity.
Remember that all this complexity is particular to this study; other sampling designs and other ecological questions might require different approaches.

[2] This procedure is different from the usual one in PCNM analyses. In most studies, the connectivity is 1 divided by the total number of connections, so that the connectivity of one transect to any other is the same. Here, the connectivity of one transect to another within the grid is 1/(neighbors in the grid + 1), whereas the connectivity of one transect to another in another grid is 1/(neighbors in same grid + transects in the other grid).

6.3 Create Moran Eigenvector Maps (MEMs)

Now that a single connectivity matrix W has been created, W needs to be centered; an eigen analysis of the centered matrix then generates multiple vectors (called Moran Eigenvector Maps). We used functions in the spdep and spacemakeR packages to do this.
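The centering and eigen analysis can be sketched in base R to make the two steps explicit. This is a minimal illustration, not the packaged routine used in the study: the 3 x 3 lattice, the binary weights, and the names W.sym, W.centered, mem.vectors, mem.values and moran.i are all assumptions made for the example. It also shows why the sign of an eigenvalue matters: for a centered eigenvector, Moran's I equals the eigenvalue times n divided by the sum of the weights.

```r
# Hypothetical 3 x 3 lattice, sites linked when less than 1700 m apart
xy <- expand.grid(x = c(0, 1000, 2000), y = c(0, 1000, 2000))
d <- as.matrix(dist(xy))
W <- (d < 1700 & d > 0) * 1

# Symmetrize the weights (a no-op here, needed when W is not symmetric)
W.sym <- (W + t(W)) / 2

# Double-center W with H = I - 11'/n
n <- nrow(W.sym)
H <- diag(n) - matrix(1 / n, n, n)
W.centered <- H %*% W.sym %*% H

# Eigen analysis: each eigenvector with a non-zero eigenvalue is a candidate MEM
eig <- eigen(W.centered, symmetric = TRUE)
keep <- abs(eig$values) > 1e-8
mem.vectors <- eig$vectors[, keep, drop = FALSE]
mem.values <- eig$values[keep]

# Moran's I of a vector v under weights W
moran.i <- function(v, W) {
  z <- v - mean(v)
  (length(v) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}
mem.moran <- apply(mem.vectors, 2, moran.i, W = W.sym)

# Moran's I is proportional to the eigenvalue, so the sign of the eigenvalue
# gives the sign of the spatial autocorrelation carried by each vector
expected.moran <- (n / sum(W.sym)) * mem.values
```

Vectors with positive eigenvalues describe positive spatial autocorrelation and vectors with negative eigenvalues describe negative spatial autocorrelation.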

Figure 1: Connectivity of all transects with detail for one of the sampling grids

Calculate Moran eigenvectors representing spatial autocorrelation

    glist <- sapply(as.data.frame(plot2plot), function(x) x[x != 0])  # Create spatial weights list
    lw1 <- nb2listw(plot2plot.nb, glist)  # Transform nb object into listw (with spatial weights)
    MEMs <- scores.listw(lw1)  # Calculate eigenvalues and eigenvectors

    listw not symmetric, (w+t(w)) used in the place of w

6.4 Selection of MEMs with spatial autocorrelation

Because the eigen analysis generated 197 vectors that could be individually used as predictor variables in regression and RDA models, we selected only those vectors with significant spatial autocorrelation. Note that the alternative hypothesis differs between MEMs with positive and negative associated eigenvalues. This happens because the vectors with negative eigenvalues represent negative spatial autocorrelation (so we are interested in the probability of getting values as low as or lower than observed).

    # Detect if Moran eigenvectors have significant spatial autocorrelation
    pvals.less <- apply(MEMs$vectors, 2, morani.pvals, w = plot2plot, alternative = "less", reps = 1000)
    pvals.greater <- apply(MEMs$vectors, 2, morani.pvals, w = plot2plot, alternative = "greater", reps = 1000)
    pvals <- ifelse(MEMs$values < 0, pvals.less, pvals.greater)
    # pvals <- pvals.greater  # When using only positive eigenvalues
    # Select only vectors with significant spatial autocorrelation

    MEMs.signif.vec <- as.data.frame(MEMs$vectors[, pvals < 0.01])
    MEMs.signif.val <- MEMs$values[pvals < 0.01]
    names(MEMs.signif.val) <- colnames(MEMs.signif.vec)

7 Analyses (species richness and composition analyzed separately)

In the previous section MEMs were created to represent spatial structure in the data. These MEMs are vectors (similar to vectors representing environmental variables, lat, long, etc.) that can be incorporated into any classical statistical analysis, such as regressions. In this section, we show how the analyses using MEMs as covariates are performed to test the effect of the environment and space on species richness and composition. The first step in all these models is to reduce the number of MEMs so that only MEMs with high explanatory power are included in the statistical models.

7.1 Species richness

Forward selection of MEMs

To further reduce the number of spatial predictor variables in our regression and RDA models, we selected only MEMs that had high explanatory power for an individual response variable. Note that the rda function from the vegan package is used even for termite species richness (univariate). RDA is an extension of regression to multiple response variables, and it is exactly the same as a regression when used with a single response variable. To facilitate the forward selection using R2, and to standardize our procedures, we used the rda function for both univariate and multivariate models.
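The idea behind forward selection on R2 can be sketched in base R for the univariate case. This is a simplified stand-in, not the procedure used in the study: it adds, at each step, the predictor that most increases the adjusted R2 of a linear model and stops when no candidate improves it, without the permutation test that a packaged routine may also apply. The function forward.select.adjR2 and the simulated data are hypothetical.

```r
# Simplified forward selection on adjusted R2 (hypothetical helper)
forward.select.adjR2 <- function(y, X) {
  selected <- character(0)
  best <- 0
  repeat {
    candidates <- setdiff(colnames(X), selected)
    if (length(candidates) == 0) break
    # Adjusted R2 of the current model plus each candidate in turn
    adj <- sapply(candidates, function(v) {
      fit <- lm(y ~ ., data = X[, c(selected, v), drop = FALSE])
      summary(fit)$adj.r.squared
    })
    if (max(adj) <= best) break   # no candidate improves the model: stop
    best <- max(adj)
    selected <- c(selected, names(which.max(adj)))
  }
  list(selected = selected, adj.r.squared = best)
}

# Simulated example: six candidate predictors, two of which drive the response
set.seed(1)
n <- 100
X <- as.data.frame(matrix(rnorm(n * 6), n, 6))
colnames(X) <- paste0("V", 1:6)
y <- 2 * X$V1 - 1.5 * X$V3 + rnorm(n, sd = 0.5)

sel.toy <- forward.select.adjR2(y, X)
```

Because adjusted R2 penalizes each added term, predictors unrelated to the response tend to be left out, which is how the number of MEMs is kept small.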
    set.seed(102)  # Use to obtain exactly the same results
    # Regression models with intercept as predictor
    reg.s0 <- rda(termite.s ~ 1, data = MEMs.signif.vec)
    # Create model with all MEMs as predictors
    reg.s.mem <- rda(termite.s ~ ., data = MEMs.signif.vec)
    # Use rda to perform forward selection based on R2
    reg.s.fw <- ordiR2step(reg.s0, formula(reg.s.mem), steps = 10000, direction = "forward", R2scope = FALSE)
    sel <- attr(reg.s.fw$terms, "term.labels")
    MEMs.S.sel <- MEMs.signif.vec[, sel]
    MEMvals.S.sel <- MEMs.signif.val[sel]

Using either the rda or the lm function, it is simple to calculate the variation in termite species richness explained by broad and local scale spatial autocorrelation.

Regression analysis - Species richness

Simple regressions (without accounting for spatial autocorrelation)

After obtaining the spatial predictors of our models (MEMs), we can proceed and use them as covariates along with the environmental predictors and biogeographic regions.
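The logic of adding spatial covariates before testing a categorical region term can be sketched on simulated data. Everything in the block is an assumption made for illustration (variable names, effect sizes, three hypothetical regions A, B and C); it is not the study data. The response depends on a spatial gradient and on one environmental variable, while the regions are merely cut along the same gradient, so region looks important until the spatial covariate is included.

```r
set.seed(42)
n <- 120
space <- rnorm(n)        # stand-in for a selected MEM (spatial gradient)
env.var <- rnorm(n)      # stand-in for an environmental predictor
# Hypothetical regions defined along the spatial gradient (terciles)
region <- cut(space, breaks = quantile(space, c(0, 1/3, 2/3, 1)),
              include.lowest = TRUE, labels = c("A", "B", "C"))
richness <- 10 + 0.8 * env.var + 1.5 * space + rnorm(n)
sim <- data.frame(richness, env.var, region, space)

# Test of region WITHOUT the spatial covariate (nested-model F test)
fit.env <- lm(richness ~ env.var, data = sim)
fit.env.region <- lm(richness ~ env.var + region, data = sim)
p.region.nospace <- anova(fit.env, fit.env.region)[2, "Pr(>F)"]

# Test of region WITH the spatial covariate in both models
fit.noregion <- lm(richness ~ env.var + space, data = sim)
fit.full <- lm(richness ~ env.var + region + space, data = sim)
cmp <- anova(fit.noregion, fit.full)
p.region.space <- cmp[2, "Pr(>F)"]
```

Comparing p.region.nospace with p.region.space shows how much of the apparent region effect is shared with the spatial gradient; the comparison of a full model against a model without the region term is the same kind of nested test.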

    # Model with only environmental data
    reg.s <- lm(termite.s ~ ., data = cbind(predictors.std, rivers)); summary(reg.s)

    Call:
    lm(formula = termite.s ~ ., data = cbind(predictors.std, rivers))

    Residuals:
    Min 1Q Median 3Q Max

    Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
    (Intercept)       < 2e-16 ***
    Temp              **
    Prec              *
    TreeCover
    lnp
    lnbases
    Clay
    riversguianaeast
    riversnegro
    riversinambari
    riversrondonia
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: on 187 degrees of freedom
    Multiple R-squared: , Adjusted R-squared:
    F-statistic: on 10 and 187 DF, p-value: 2.255e

Regressions after the removal of spatial autocorrelation

Because the output of the forward selection is a regression model with the spatial predictors, we can extract the residuals from this model. The residuals can then be regressed against the environmental variables. In other words, we are asking whether the variance not explained by space can be explained by the environment.

    # Combine all data into a single data frame
    alldata <- cbind(predictors.std, rivers, MEMs.S.sel)
    # Regression analysis
    reg.s.full <- lm(termite.s ~ ., data = alldata); summary(reg.s.full)

    Call:
    lm(formula = termite.s ~ ., data = alldata)

    Residuals:
    Min 1Q Median 3Q Max

    Coefficients:
                      Estimate Std. Error t value Pr(>|t|)

    (Intercept)       e-10 ***
    Temp
    Prec              *
    TreeCover
    lnp
    lnbases
    Clay
    riversguianaeast  *
    riversnegro       **
    riversinambari    *
    riversrondonia    **
    V2
    V1
    V10               e-05 ***
    V4
    V43               **
    V17               **
    V36               **
    V39               **
    V153              ***
    V48               **
    V8
    V126              *
    V47
    V118              *
    V13
    V92               *
    V163              *
    V159              *
    V149              *
    V26               *
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: on 167 degrees of freedom
    Multiple R-squared: , Adjusted R-squared:
    F-statistic: 8.98 on 30 and 167 DF, p-value: < 2.2e-16

    reg.s.noriver <- lm(termite.s ~ ., data = cbind(predictors.std, MEMs.S.sel))
    anova(reg.s.full, reg.s.noriver)

    Analysis of Variance Table

    Model 1: termite.s ~ Temp + Prec + TreeCover + lnp + lnbases + Clay + rivers +
        V2 + V1 + V10 + V4 + V43 + V17 + V36 + V39 + V153 + V48 + V8 + V126 +
        V47 + V118 + V13 + V92 + V163 + V159 + V149 + V26
    Model 2: termite.s ~ Temp + Prec + TreeCover + lnp + lnbases + Clay +
        V2 + V1 + V10 + V4 + V43 + V17 + V36 + V39 + V153 + V48 + V8 + V126 +
        V47 + V118 + V13 + V92 + V163 + V159 + V149 + V26
      Res.Df RSS Df Sum of Sq F Pr(>F)
      *

    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    # Using rda and permutation to test for significance of individual variables
    # (similar to regression)
    # To use varpart with a mix of quantitative and categorical variables
    hs1 <- dudi.hillsmith(rivers, scannf = F, nf = 50)
    # Temp individually
    anova(rda(termite.s, predictors.std[, 1], cbind(predictors.std[, -1], hs1$li, MEMs.S.sel)))

    Permutation test for rda under reduced model
    Permutation: free
    Number of permutations: 999

    Model: rda(X = termite.s, Y = predictors.std[, 1], Z = cbind(predictors.std[, -1], hs1$li, MEMs.S.sel
             Df Variance F Pr(>F)
    Model
    Residual

    # Prec individually
    anova(rda(termite.s, predictors.std[, 2], cbind(predictors.std[, -2], hs1$li, MEMs.S.sel)))

    Permutation test for rda under reduced model
    Permutation: free
    Number of permutations: 999

    Model: rda(X = termite.s, Y = predictors.std[, 2], Z = cbind(predictors.std[, -2], hs1$li, MEMs.S.sel
             Df Variance F Pr(>F)
    Model    *
    Residual
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    # ...
    # Not interested in individual variables at this moment

Variance partitioning

We can partition the variance into many different components and use the varpart function to simplify this procedure.

    # Decompose R2 into Rivers, Environment, and Space
    # Importance of components
    plot(varpart(termite.s, hs1$li, predictors.std, MEMs.S.sel), Xnames = c("Riv", "Env", "Dist"))

[Figure: variance partitioning diagram with components Riv, Env and Dist; Residuals = 0.45; values < 0 not shown]

    # Test for significance of Rivers, Environment, and Space
    anova(rda(termite.s, hs1$li, cbind(MEMs.S.sel, predictors)))  # rivers

    Permutation test for rda under reduced model
    Permutation: free
    Number of permutations: 999

    Model: rda(X = termite.s, Y = hs1$li, Z = cbind(MEMs.S.sel, predictors))
             Df Variance F Pr(>F)
    Model    *
    Residual
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    anova(rda(termite.s, predictors, cbind(MEMs.S.sel, hs1$li)))  # environment

    Permutation test for rda under reduced model
    Permutation: free
    Number of permutations: 999

    Model: rda(X = termite.s, Y = predictors, Z = cbind(MEMs.S.sel, hs1$li))
             Df Variance F Pr(>F)
    Model    *
    Residual
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    anova(rda(termite.s, MEMs.S.sel, cbind(predictors, hs1$li)))  # space

    Permutation test for rda under reduced model
    Permutation: free
    Number of permutations: 999

    Model: rda(X = termite.s, Y = MEMs.S.sel, Z = cbind(predictors, hs1$li))
             Df Variance F Pr(>F)

    Model    ***
    Residual
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

7.2 Species composition

Forward selection of MEMs

As before, the first step in the analysis of species composition was to filter the spatial predictors (MEMs), keeping those with high explanatory power for species composition. Because we are using Bray-Curtis dissimilarities as the response variable in the Redundancy Analysis, we use the capscale function (distance-based RDA, or simply db-RDA).

    set.seed(102)  # Use to obtain exactly the same results
    # db-RDA models with intercept as predictor
    rda.comp0 <- capscale(termite.bray ~ 1, data = MEMs.signif.vec, add = TRUE)
    # Create model with all MEMs as predictors
    rda.comp.mem <- capscale(termite.bray ~ ., data = MEMs.signif.vec, add = TRUE)
    # Use rda to perform forward selection based on R2
    rda.comp.fw <- ordiR2step(rda.comp0, formula(rda.comp.mem), direction = "forward")
    sel <- attr(rda.comp.fw$terms, "term.labels")
    MEMs.comp.sel <- MEMs.signif.vec[, sel]
    MEMvals.comp.sel <- MEMs.signif.val[sel]

Again, we calculate the total variation in termite species composition (db-RDA axes) explained by space.

RDA analysis - Species composition

As with richness, after obtaining the spatial predictors of our models (MEMs), we can proceed and use them as covariates along with the environmental predictors.

Simple RDA (without accounting for spatial autocorrelation)

    # Model with only environmental data
    rda.comp <- capscale(termite.bray ~ ., data = cbind(predictors.std), add = TRUE)
    anova(rda.comp, by = "margin")

    Permutation test for capscale under reduced model
    Marginal effects of terms
    Permutation: free
    Number of permutations: 999

    Model: capscale(formula = termite.bray ~ Temp + Prec + TreeCover + lnp + lnbases + Clay, data = cbind
             Df SumOfSqs F Pr(>F)
    Temp     ***

Diversity and composition of termites in Amazonia CSDambros 09 January, 2015

Abstract

Put the abstract here. Missing code is being cleaned.

Contents

1 Intro
2 Load required packages
3 Import data