Diversity and composition of termites in Amazonia CSDambros 09 January, 2015


Abstract: Put the abstract here. Missing code is being cleaned.

Contents

1 Intro
2 Load required packages
3 Import data
4 Extract geographical coordinates for transects and grids
5 Create default MAP
6 Create site x species matrix
7 Calculate PCoA
8 Dealing with missing values
9 Create connectivity matrix
    9.1 Transect-transect within grids
    9.2 Transect-transect among grids
    9.3 Merge local and regional dispersal into a single matrix
    9.4 Plot connectivity matrix
10 Create Moran Eigenvectors Maps (MEMs)
11 Calculate spatial autocorrelation of MEMs
12 Analysis
13 Regression analysis
    13.1 Termite abundance
14 Results in MAP
    14.1 Observed
    14.2 Predicted
    14.3 Residual
15 Forward Selection of MEMs
16 Define Broad and Fine scale MEMs
17 Variance partitioning
18 Significance of environmental variables controlling for space
    18.1 Plot variance partitioning graph
19 Supplementary Analysis: RDA and dbRDA
    19.1 RDA (Euclidean distance on Hellinger transformed data)
    19.2 Distance-Based RDA

1 Intro

This script describes in detail all analyses conducted for the paper XXX. All datasets used are publicly available online, and the R code provides links for their individual download. Some datasets were modified beforehand to facilitate their use here. The original datasets, as well as the R scripts used to transform them, are available online at XXX. The clean R code presented in this document is also available at XXX in .r format.

2 Load required packages

Some of the analyses in our study are coded directly in this text, some rely on pre-built packages, and some were wrapped in functions that are publicly available online. The following lines load the required libraries, and download and source the necessary functions.

    rm(list=ls())
    library(vegan)    # installed using install.packages("vegan")
    library(raster)   # installed using install.packages("raster")
    library(maptools) # installed using install.packages("maptools")
    library(rgdal)    # installed using install.packages("rgdal")
    library(spdep)    # installed using install.packages("spdep")
    library(sp)       # installed using install.packages("sp")
    #source(" # poncho function for figures
    source("additionalfunctions.r")
    source("pie.ord.r")

3 Import data

As with the functions, we made our datasets publicly available online. They are downloaded using the next chunk of code. If you have problems running these lines, you might have an Internet connection problem. Try downloading the files to your directory using the provided links, and then read the files directly from your folder.

The files imported here are not the original datasets. The data were transformed to facilitate analyses. For example, environmental data from rasters were extracted prior to these analyses and incorporated into the main environmental data frame. The original datasets, as well as the scripts used for data transformation, are also available at: XXX

    # Import maps
    MAP<-readShapeSpatial("MAP.shp")
    rivers<-readShapeSpatial("rivers.shp")
    # Termite data
    # Import termite record data
    isoptera<-read.csv("termiteproject.csv",row.names=1)
    # Import termite species data - Path to be changed
    isopterataxonid<-read.csv("~/dropbox/labtermes/tabela_geral/isopterataxonid.csv")

    # Environmental data
    env<-read.csv("termiteprojectenvironment.csv")

4 Extract geographical coordinates for transects and grids

    coords<-env[,c("utm_easting","utm_northing")]
    regional.coords<-aggregate(coords,list(env$moduloid),mean) # Spatial coordinates for grids

5 Create default MAP

Before starting the analyses, the following lines can be run to create a default map to be used when plotting the results. This avoids repeating the same lines every time a map is plotted.

    # Create default map
    MAP.1<-function(){
    # Reduce fig margins
    par(mar=c(0.1,0.1,0.1,0.1))
    # Nice positioning of labels
    lab.coords<-matrix(c( , ,886041, ,151940, , ,77515, , , , , , , , , , ),ncol=2)
    # Country labels
    labels<-c("Brazil","F. Guiana","Guiana","Suriname","Venezuela","Colombia","Peru","Bolivia",
      "Brazilian Amazonia")
    # Plot Maps
    plot(MAP,border="grey40",col="grey80",bg="grey100",xlim=c(-80463, ),ylim=c( , ))
    plot(MAP[9,],col="grey34",bg="grey100",border="grey34",add=TRUE)
    plot(MAP[6,],border="grey20",col=NA,bg=NA,add=TRUE)
    # Add labels
    text(lab.coords[,1],lab.coords[,2],labels,adj=0,col=rep(c("grey30","grey70"),c(length(labels)-1,1)))
    plot(rivers,col="grey20",add=TRUE)
    # Add grid locations
    points(regional.coords[,-1],pch=21,col="white",bg=1,cex=.8,lwd=1.3)
    op<-par(no.readonly=TRUE)
    # Add rectangle for barplot
    par(fig=c(0.52,.98,0.1,.33),usr=c(0,1,0,1),new=TRUE)
    plot.new()
    rect(0,0,1,1,col="grey85",border="grey15")

    par(op)
    par(fig=c(0.6,.95,0.16,.3),new=TRUE)
    plot.new()
    # Add labels for barplot
    axis(1,at=c(0.05,0.95),labels=c("Broad","Fine"),tick=FALSE,cex.axis=1,padj=-1)
    axis(1,at=0.5,labels="MEMs",tick=FALSE,cex.axis=1,padj=-.5)
    axis(2,at=c(0,1),labels=0:1,cex.axis=1,las=2)
    axis(2,at=0.5,labels=expression(R^2),tick=FALSE,cex.axis=1)
    box(bty="l")
    par(op)
    }
    # Call map
    #MAP.1()
    # The following lines only define where points will actually be plotted to avoid overlapping
    x2<-c( ,828511,209624,693235,70966,615452,432829,388864,453120,977315, ,422683,970551)
    y2<-c( , , , , , , , , , , , , )
    # They were all merged into the nx table to facilitate plotting later
    nx<-cbind(regional.coords[,-1],x2,y2)

6 Create site x species matrix

The termite data are provided in the long format, that is, each occurrence is a row in the original spreadsheet. Species, sampling location, etc. are attributes (columns) in this spreadsheet. To run our analyses, we started by transforming the termite data into a short table, where each row represents a sampling location and each column represents a species. Note that the long format contains more information and is preferable for storage.

Before starting, the variables PlotID and TaxonID in table isoptera were reordered to match the order of the same variables in the environmental table (env) and in the species info table (isopterataxonid). The number of subsamples sampled within each transect was also recorded.

    # Make sure transects in termite dataset are in the same order as in the environmental dataset
    isoptera$PlotID<-factor(isoptera$PlotID,levels=env$PlotID)
    # Make sure species in termite dataset are in the same order as in the species dataset
    isoptera$TaxonID<-factor(isoptera$TaxonID,levels=isopterataxonid$TaxonID)
    attach(isoptera)
    termite.plot.obs<-tapply(isoptera$n,list(PlotID,TaxonID),sum) # Create transect x species matrix
    termite.plot.obs[is.na(termite.plot.obs)]<-0 # NAs are true zeros
    effort.plot<-tapply(n_subplots,PlotID,mean) # Number of sections sampled per transect

    detach(isoptera)

The number of subsamples in each transect varied, as can be checked with the following code:

    range(effort.plot)
    ## [1] 5 12

To avoid comparing transects with different sampling effort, transects with more than 5 sections were rarefied. In other words, 5 sections were randomly selected in each transect, and the average abundance of each species, the probability of presence (average presence), the overall abundance, and the species richness were calculated. For a transect with exactly 5 sections, rarefaction leaves the abundance and presence/absence matrices, the abundance per transect, and the species richness per transect unchanged. The rarefaction was performed for all species together, and for wood feeding and soil feeding termites separately.

    # Abundance matrix
    termite.plot<-(5*termite.plot.obs)/as.vector(effort.plot)
    termite.plot.wood<-termite.plot[,isopterataxonid$fg1=="w"]
    termite.plot.soil<-termite.plot[,isopterataxonid$fg1=="s"]
    # Presence-Absence matrix
    #termite.plot.pa<-1-((1-termite.plot.obs/as.vector(effort))^5)#with replacement
    termite.plot.pa<-poccur(as.vector(effort.plot),termite.plot.obs,5) # Without replacement (as simulated)
    ## Warning: NaNs produced
    termite.plot.pa[is.nan(termite.plot.pa)]<-1
    termite.plot.pa.wood<-termite.plot.pa[,isopterataxonid$fg1=="w"]
    termite.plot.pa.soil<-termite.plot.pa[,isopterataxonid$fg1=="s"]
    # Abundance
    termite.N<-rowSums(termite.plot,na.rm=TRUE)
    termite.N.wood<-rowSums(termite.plot.wood,na.rm=TRUE)
    termite.N.soil<-rowSums(termite.plot.soil,na.rm=TRUE)
    # Richness
    termite.s<-rowSums(termite.plot.pa,na.rm=TRUE)
    termite.s.wood<-rowSums(termite.plot.pa.wood,na.rm=TRUE)
    termite.s.soil<-rowSums(termite.plot.pa.soil,na.rm=TRUE)

7 Calculate PCoA
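As a minimal illustration of the ordination step in this section, the sketch below computes Bray-Curtis dissimilarities by hand on a toy matrix and runs a principal coordinates analysis with base R's cmdscale. The toy data, the helper bray_curtis, and the reading of the extra column of 1s as a dummy species that keeps dissimilarities between empty sites defined are assumptions made for illustration, not statements about the authors' intent.

```r
# Toy site x species abundance matrix (hypothetical values, not the termite data)
abund <- matrix(c(5, 0, 2,
                  3, 1, 0,
                  0, 0, 0,   # empty site
                  0, 4, 6,
                  0, 0, 0),  # second empty site
                nrow = 5, byrow = TRUE,
                dimnames = list(paste0("site", 1:5), paste0("sp", 1:3)))

# Hypothetical helper: Bray-Curtis dissimilarity, sum|x - y| / sum(x + y)
bray_curtis <- function(m) {
  n <- nrow(m)
  d <- matrix(0, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(abs(m[i, ] - m[j, ])) / sum(m[i, ] + m[j, ])
    }
  }
  as.dist(d)
}

# Without adjustment, two empty sites give 0/0 (NaN)
bray_raw <- bray_curtis(abund)

# Appending a constant column of 1s adds a dummy species shared by all sites
bray_adj <- bray_curtis(cbind(abund, dummy = 1))

# Principal coordinates analysis, keeping the first two axes
pcoa_axes <- cmdscale(bray_adj, k = 2)
```

With the dummy column, the two empty sites are at dissimilarity zero from each other instead of being undefined, so every transect can be placed in the ordination.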

    # Composition
    termite.bray<-vegdist(cbind(termite.plot,1),"bray")
    termite.pcoa<-scores(cmdscale(termite.bray))
    termite.pcoa1<-termite.pcoa[,1]
    termite.pcoa2<-termite.pcoa[,2]
    # Composition soil
    termite.bray.soil<-vegdist(cbind(termite.plot.soil,1),"bray")
    termite.pcoa.soil<-scores(cmdscale(termite.bray.soil))
    termite.pcoa1.soil<-termite.pcoa.soil[,1]
    termite.pcoa2.soil<-termite.pcoa.soil[,2]
    # Composition wood
    termite.bray.wood<-vegdist(cbind(termite.plot.wood,1),"bray")
    termite.pcoa.wood<-scores(cmdscale(termite.bray.wood))
    termite.pcoa1.wood<-termite.pcoa.wood[,1]
    termite.pcoa2.wood<-termite.pcoa.wood[,2]

8 Dealing with missing values

Some environmental soil variables and the ant data were not available for all transects. Removing these transects from the analyses could prevent the detection of associations between other variables and termite species abundance and composition. To overcome the problem of missing data we performed data imputation, that is, we filled missing data with values. For transects with missing values, we randomly selected values from other transects. We analyzed the ant data by including a grid:antdensity interaction, so that in grids without ant information slopes are close to zero because of the imputation, whereas in the remaining grids the slopes are not affected by it.

The number of samples without environmental data was 58 for ants, 61 for P and Clay, and 51 for Mg and K. Grids with environmental data were sampled in all sampling regions of the study, so data imputation was spread across the study region.

    # Data imputation
    # For those transects without env data, randomly select values from other transects
    set.seed(102)
    # Randomly select from observations
    env.1<-as.data.frame(sapply(env,function(x)ifelse(is.na(as.numeric(x)),sample(x[!is.na(x)]),x)))
    # Restore variables that are factors
    env.1[,colnames(env)[(sapply(env,is.factor))]]<-env[,sapply(env,is.factor)]
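The random-draw imputation described in this section can be sketched on a toy table as follows. The data frame env_toy and the helper impute_random are hypothetical. Unlike the one-line version above, this sketch draws with replacement and one column at a time, which is an assumption made to keep the example easy to read.

```r
set.seed(102)  # any seed would do; it only makes the draws repeatable

# Toy environmental table with gaps (hypothetical values)
env_toy <- data.frame(clay = c(10, NA, 30, 40, NA, 60),
                      p    = c(NA, 2, 3, NA, 5, 6))

# Hypothetical helper: replace each NA with a value drawn at random from the
# observed values of the same variable. Indexing through sample.int avoids
# the special behaviour of sample() when only one observed value exists.
impute_random <- function(x) {
  missing <- is.na(x)
  observed <- x[!missing]
  x[missing] <- observed[sample.int(length(observed), sum(missing), replace = TRUE)]
  x
}

env_imputed <- as.data.frame(lapply(env_toy, impute_random))
```

Because the filled values are drawn independently of the response, they carry no signal of their own, which is consistent with the expectation stated above that slopes stay close to zero where data were imputed.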

9 Create connectivity matrix

9.1 Transect-transect within grids

    # Create dispersal matrix to represent spatial autocorrelation
    # Local communities
    # Connect plots that are less than 1700 meters apart, but not self
    local.nb.mat<-((as.matrix(dist(coords))<1700&as.matrix(dist(coords))>5)*1)
    # Prob of leaving a plot to metacom = prob of leaving to another plot within grid
    pleave<-1/(colSums(local.nb.mat)+1)
    # Probs to disperse to neighboring plots within grid
    within<-t(local.nb.mat/(colSums(local.nb.mat)+1))

9.2 Transect-transect among grids

    # Regional community (dispersal between grids of plots)
    reg.hub<-as.integer(env$moduloid) # determine in which grid (hub in metacomm) a particular plot is
    # Use a Gabriel graph (a subgraph of the Delaunay triangulation) to create
    # the metacommunity structure (requires spdep package)
    regional.nb<-graph2nb(gabrielneigh(as.matrix(regional.coords[,2:3])),sym=TRUE,row.names = regional.coords[,
    # Calculate the number of neighbors a grid (hub) has
    reg.nlink<-sapply(regional.nb,length)
    # Create a matrix of 1s & 0s of dispersal from plots to grids
    regional.nb.mat<-sapply(regional.nb,function(x){(1*matrix(reg.hub%in%as.integer(x)))})
    # Create matrix of dispersal from plot to plot through the metacommunity
    regional.nb.mat1<-t(t(regional.nb.mat)*(1/reg.nlink))[,reg.hub]
    # Define how many plots each grid is connected to
    hub.out<-(1/(table(reg.hub)))[reg.hub]
    # Final probability of dispersal plot-plot in metacommunity
    between<-matrix(hub.out,198,198)*regional.nb.mat1

9.3 Merge local and regional dispersal into a single matrix

    # Merge local and regional: p(plot2plot)=p(leave & arrive by metacomm hub) or p(stay in grid and move)
    plot2plot<-(t(pleave*t(between)))+within
    # Transform into a list removing zeros (sparse matrix)
    plot2plot.nb<-sapply(as.data.frame(plot2plot),function(x)(1:length(x))[x!=0])

    # Transform into a class nb (spdep), just for plotting
    attributes(plot2plot.nb)<-list(class="nb",sym=TRUE)

9.4 Plot connectivity matrix

Figure 1: plot of chunk plotgabriel

10 Create Moran Eigenvectors Maps (MEMs)

    # Calculate Moran eigenvectors representing spatial autocorrelation
    w<-(t(plot2plot)+plot2plot)/2 # Make weight (dispersal) matrix symmetric
    w<-t(t((w-rowMeans(w)))-colMeans(w)+mean(w)) # Center data (column and row means to zero)
    MEMs<-eigen(w) # Calculate eigenvalues and eigenvectors for w
    # Eliminate the vector whose eigenvalue is approximately zero
    MEMs$vectors<-MEMs$vectors[,abs(MEMs$values)!=min(abs(MEMs$values))]
    # Eliminate the eigenvalue that is approximately zero
    MEMs$values<-MEMs$values[abs(MEMs$values)!=min(abs(MEMs$values))]

11 Calculate spatial autocorrelation of MEMs

    # Detect if Moran eigenvectors have significant spatial autocorrelation
    pvals<-sapply(1:length(MEMs$values),function(x){
      morani.pvals(MEMs$vectors[,x],plot2plot,c("less","greater")[(MEMs$values[x]>0)+1],100)})
    # Select only vectors with significant spatial autocorrelation
    MEMs.signif.vec <- as.data.frame(MEMs$vectors[,pvals<0.01])
    MEMs.signif.val <- MEMs$values[pvals<0.01]
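The construction in sections 10 and 11 can be sketched on a small example: double-center a symmetric weight matrix, take its eigenvectors, and check that the Moran's I of each eigenvector is proportional to its eigenvalue, so positive eigenvalues describe broad-scale (positively autocorrelated) patterns and negative ones fine-scale patterns. The toy chain of six sites, the helper moran_i, and the numerical tolerance used to drop the zero eigenvalue are assumptions for illustration only.

```r
# Toy connectivity: 6 sites along a line, each linked to its immediate neighbours
n <- 6
w <- matrix(0, n, n)
w[abs(row(w) - col(w)) == 1] <- 1

# Double-center the symmetric weight matrix (row and column means become zero)
w_centered <- t(t(w - rowMeans(w)) - colMeans(w)) + mean(w)

# Eigen-decomposition of the centered matrix
eig <- eigen(w_centered, symmetric = TRUE)

# Drop the eigenvector whose eigenvalue is numerically zero (tolerance is an assumption)
keep <- abs(eig$values) > 1e-8
mem_vectors <- eig$vectors[, keep]
mem_values <- eig$values[keep]

# Hypothetical helper: Moran's I of a vector under weights w
moran_i <- function(v, w) {
  v <- v - mean(v)
  (length(v) / sum(w)) * sum(w * outer(v, v)) / sum(v^2)
}
moran <- apply(mem_vectors, 2, function(v) moran_i(v, w))

# Moran's I of each eigenvector equals (n / sum(w)) times its eigenvalue
expected <- (n / sum(w)) * mem_values
```

In this sketch moran and expected agree to numerical precision, which is why the sign of the eigenvalue can be used later to separate broad from fine scale MEMs.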

12 Analysis

To analyze the data, we first created a data frame with the predictor variables to be included in our models. Defining the predictors once avoids repeating the same long list in every section where a model is fitted. Some variables were also log transformed prior to the analyses, as log(variable + h). h is a constant added to all log transformed variables so that zeros do not produce -Inf (log(0)=-Inf; log(1)=0).

    # Select environmental variables to include in the model
    h=1 # Constant added to variables prior to log transformation
    # Create dataframe only with predictor variables
    attach(env.1)
    predictors<-data.frame(layer.1,layer.12,TCLandsat,Clay,lnca=log(ca+h),lnk=log(env.1$k+h),lnp=log(p+h))
    detach(env.1)
    head(predictors)
    ## layer.1 layer.12 TCLandsat Clay lnca lnk lnp

13 Regression analysis

We ran two types of models. For univariate responses, we ran multiple regressions. For multivariate responses (species composition), we ran Redundancy Analysis (RDA), which is an extension of multiple regression. Note that using rda with a single response variable produces exactly the same results as a multiple regression, so we used the function rda for univariate models in some cases where the RDA implementation is easier. In this annotated script, only the analyses for termite abundance and overall composition are shown for brevity. The analyses for species richness, and for abundance, richness, and composition of wood and soil feeding termites, are available in the clean R script provided along with the data.

13.1 Termite abundance

As with the predictor variables, the response variable was referenced by a string, which is later retrieved with the code get(response). We used the string method because the same code is repeated several times, so only the string needs to be modified when changing the response variable.

We also performed regression analyses with only environmental variables as predictors, and then compared the spatial structure observed in the data, the spatial structure expected from the environmental variables, and the spatial structure in the residuals.

    # Create regression model
    REG<-lm(termite.N~.,data=predictors)
    summary(REG)

    ## Call: lm(formula = termite.N ~ ., data = predictors)
    ## Residuals: Min 1Q Median 3Q Max
    ## Coefficients (significance codes only): Estimate Std. Error t value Pr(>|t|)
    ## (Intercept) ***
    ## layer.1     **
    ## layer.12
    ## TCLandsat
    ## Clay
    ## lnca        *
    ## lnk
    ## lnp         *
    ## ---
    ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## Residual standard error: 4.61 on 190 degrees of freedom
    ## Multiple R-squared: 0.173, Adjusted R-squared:
    ## F-statistic: 5.67 on 7 and 190 DF, p-value: 5.74e-06

    # Com in this case is just the termite.N variable, termite abundance
    Com<-termite.N
    # Termite abundance predicted by the environment using the regression model
    Env<-predict(REG,type="response")
    # Residuals from the regression model
    Res<-termite.N-predict(REG,type="response")
    # Create a dataframe with observed, predicted, and residuals
    scores.n<-cbind(Com,Env,Res)

We then calculated the variance explained by individual MEMs. These R2 values were pooled into 28 groups ordered from broad to fine scale (smoothed MEMs; Dray et al. 2012). The P-value for each group was calculated by randomizing the values of termite abundance and recalculating the R2 values 1000 times (Monte Carlo permutations):

P-value = (number of randomizations with R2 greater than or equal to the observed R2 + 1)/(number of randomizations + 1)

    # Calculate variance explained by MEMs
    test.n<-groupr2.pvals(scores.n,MEMs$vectors[,1:196],28,1000)

14 Results in MAP

14.1 Observed

    #Observed
    MAP.1()
    segments(nx[,1],nx[,2],nx[,3],nx[,4],col="white")
    points(regional.coords[,-1],pch=21,col="white",bg=1,cex=.8,lwd=1.3)
    pie.ord(nx[as.integer(env$moduloid),3:4],scores.n[,1]-min(scores.n[,1]),groups=env$moduloid,
      col=c("green","green"),starpie=TRUE,expand=.08,border=adjustcolor(1,alpha=.8))
    ## Loading required package: plotrix
    points(nx[,3:4],pch=21,bg=1,cex=.6,col="white")
    par(fig=c(0.6,.95,0.16,.3),new=TRUE)
    barplot(test.n$obs[,1],ylim=c(0,1),col=(test.n$pvals[,1]<0.05),axes=FALSE)
    legend.ord(0.80,0.40,round(seq(max(scores.n[,1]),min(scores.n[,1]),length=4)),
      title="Abundance",col="green",bg="grey85",border="grey15")

Figure 2: plot of chunk YinMap

    #dev.copy2pdf(file="yn1.pdf") # Save figure as a pdf

14.2 Predicted

    #Predicted
    MAP.1()
    segments(nx[,1],nx[,2],nx[,3],nx[,4],col="white")
    points(regional.coords[,-1],pch=21,col="white",bg=1,cex=.8,lwd=1.3)
    pie.ord(nx[as.integer(env$moduloid),3:4],scores.n[,2]-min(scores.n[,2]),groups=env$moduloid,
      col=c("green","green"),starpie=TRUE,expand=.08,border=adjustcolor(1,alpha=.8))
    points(nx[,3:4],pch=21,bg=1,cex=.6,col="white")
    par(fig=c(0.6,.95,0.16,.3),new=TRUE)
    barplot(test.n$obs[,2],ylim=c(0,1),col=(test.n$pvals[,2]<0.05),axes=FALSE)
    legend.ord(0.80,0.40,round(seq(max(scores.n[,2]),min(scores.n[,2]),length=4)),
      title="Abundance",col="green",bg="grey85",border="grey15")

Figure 3: plot of chunk FinMap

    #dev.copy2pdf(file="fn1.pdf") # Save figure as a pdf

14.3 Residual

    #Residual
    MAP.1()

    segments(nx[,1],nx[,2],nx[,3],nx[,4],col="white")
    points(regional.coords[,-1],pch=21,col="white",bg=1,cex=.8,lwd=1.3)
    pie.ord(nx[as.integer(env$moduloid),3:4],scores.n[,3],groups=env$moduloid,
      col=c("gold","green"),starpie=TRUE,expand=.08,border=adjustcolor(1,alpha=.8))
    points(nx[,3:4],pch=21,bg=1,cex=.6,col="white")
    par(fig=c(0.6,.95,0.16,.3),new=TRUE)
    barplot(test.n$obs[,3],ylim=c(0,1),col=(test.n$pvals[,3]<0.05),axes=FALSE)
    legend.ord(0.80,0.40,round(seq(max(scores.n[,3]),min(scores.n[,3]),length=4)),
      title="Abundance",col=c("green","green","gold","gold"),bg="grey85",border="grey15")

Figure 4: plot of chunk RinMap

    #dev.copy2pdf(file="rn1.pdf") # Save figure as a pdf

15 Forward Selection of MEMs

After defining the response variable, we performed a forward selection of MEMs and retained only those spatial predictors that significantly increased the adjusted R2 of the models, that is, the MEMs that best explained the variance in the response variable.

    # Forward selection and definition of broad and fine scale variables
    # Create null model
    reg.0 <- rda(termite.N ~ 1, data=MEMs.signif.vec)

    # Create model with all MEMs
    reg.full <- rda(termite.N ~ ., data=MEMs.signif.vec)
    # Use rda to perform forward selection based on R2
    reg.fw <- ordiR2step(reg.0, formula(reg.full), steps = 10000,direction="forward",R2scope=FALSE)
    # Extract the name of MEMs selected
    selected<-attr(reg.fw$terms,"term.labels")
    # Create dataframe with selected MEMs
    vec.sel<-MEMs.signif.vec[,match(selected,colnames(MEMs.signif.vec))]
    # Create vector with the corresponding eigenvalues for selected MEMs
    val.sel<-MEMs.signif.val[match(selected,colnames(MEMs.signif.vec))]

16 Define Broad and Fine scale MEMs

    broad <- vec.sel[,val.sel>0] # Selected MEMs with positive eigenvalues: broad scale
    fine <- vec.sel[,val.sel<0] # Selected MEMs with negative eigenvalues: fine scale

17 Variance partitioning

After conducting the forward selection and splitting the MEMs into broad and fine scale MEMs, we partitioned the variance of termite abundance into 1) variance explained by environmental predictors, 2) variance explained by broad scale spatial autocorrelation, and 3) variance explained by fine scale spatial autocorrelation.

    # Spatial structure of data and variance partitioning
    varpart.table.n <- varpart(termite.N, predictors, broad, fine)
    varpart.table.n
    ## Partition of variation in RDA
    ## Call: varpart(Y = termite.N, X = predictors, broad, fine)
    ## Explanatory tables:
    ## X1: predictors
    ## X2: broad
    ## X3: fine
    ## No. of explanatory tables: 3
    ## Total variation (SS):
    ## Variance:
    ## No. of observations: 198
    ## Partition table: Df R.square Adj.R.square Testable

    ## [a+d+f+g] = X1            TRUE
    ## [b+d+e+g] = X2            TRUE
    ## [c+e+f+g] = X3            TRUE
    ## [a+b+d+e+f+g] = X1+X2     TRUE
    ## [a+c+d+e+f+g] = X1+X3     TRUE
    ## [b+c+d+e+f+g] = X2+X3     TRUE
    ## [a+b+c+d+e+f+g] = All     TRUE
    ## Individual fractions
    ## [a] = X1 | X2+X3          TRUE
    ## [b] = X2 | X1+X3          TRUE
    ## [c] = X3 | X1+X2          TRUE
    ## [d]                       FALSE
    ## [e]                       FALSE
    ## [f]                       FALSE
    ## [g]                       FALSE
    ## [h] = Residuals           FALSE
    ## Controlling 1 table X
    ## [a+d] = X1 | X3           TRUE
    ## [a+f] = X1 | X2           TRUE
    ## [b+d] = X2 | X3           TRUE
    ## [b+e] = X2 | X1           TRUE
    ## [c+e] = X3 | X1           TRUE
    ## [c+f] = X3 | X2           TRUE
    ## ---
    ## Use function 'rda' to test significance of fractions of interest

18 Significance of environmental variables controlling for space

    # Run anova by margin (each variable against all others)
    pval.n<-anova(rda(termite.N~.+Condition(as.matrix(vec.sel)),data=predictors),by="margin",step=999)
    pval.n
    ## Model: rda(formula = termite.N ~ layer.1 + layer.12 + TCLandsat + Clay + lnca + lnk + lnp + Condition
    ## Permutation test for rda under reduced model
    ## Marginal effects of terms
    ##            Df Var F N.Perm Pr(>F)
    ## layer.1
    ## layer.12
    ## TCLandsat  *
    ## Clay       *
    ## lnca
    ## lnk
    ## lnp
    ## Residual
    ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
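To make the fractions reported by the variance partitioning of section 17 concrete, the sketch below partitions the variance of a simulated response between two predictor sets using adjusted R2 values from ordinary regressions. It uses two sets rather than the three used in the analysis, and the simulated variables and the helper adj_r2 are hypothetical.

```r
set.seed(1)
n <- 100
env_x <- rnorm(n)                    # hypothetical environmental predictor
space_x <- 0.5 * env_x + rnorm(n)    # hypothetical spatial predictor, correlated with env_x
y <- env_x + space_x + rnorm(n)      # hypothetical response

# Hypothetical helper: adjusted R2 of a linear model
adj_r2 <- function(formula) summary(lm(formula))$adj.r.squared

r2_env <- adj_r2(y ~ env_x)
r2_space <- adj_r2(y ~ space_x)
r2_both <- adj_r2(y ~ env_x + space_x)

# Fractions obtained by subtraction of adjusted R2 values
frac_env_only <- r2_both - r2_space           # environment, controlling for space
frac_space_only <- r2_both - r2_env           # space, controlling for environment
frac_shared <- r2_env + r2_space - r2_both    # jointly explained, not testable on its own
frac_residual <- 1 - r2_both                  # unexplained
```

The shared fraction is obtained only by subtraction, which is why the corresponding rows in the partition table above are marked as not testable.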

18.1 Plot variance partitioning graph

    plot(varpart.table.n)
    title("Abundance",cex.main=2)

Figure 5: plot of chunk plotvarpart (panel title: Abundance; Residuals = 0.50; values < 0 not shown)

    dev.copy2pdf(file="vennn.pdf")
    ## pdf
    ## 2

We then repeated the same procedure for species richness, PCoA1, and PCoA2. The code for these analyses is not shown to avoid repetition, but the complete code is available in the clean R script at XXX.

19 Supplementary Analysis: RDA and dbRDA

19.1 RDA (Euclidean distance on Hellinger transformed data)

    # Hellinger transformation
    # Hellinger transform termite data (Poisson-like ---> Normal-like, weight in abundant species)
    termite.plot.h <- decostand(termite.plot, method="hellinger")
    termite.plot.h.w <- decostand(termite.plot.wood, method="hellinger")
    termite.plot.h.s <- decostand(termite.plot.soil, method="hellinger")
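For readers unfamiliar with the Hellinger transformation used here, it takes the square root of each species' relative abundance within a site. The base R sketch below applies that definition to a toy matrix. The data and the helper hellinger are hypothetical and are not a substitute for decostand.

```r
# Toy site x species abundance matrix (hypothetical values)
abund <- matrix(c(4, 0, 12,
                  1, 1, 2,
                  0, 9, 0),
                nrow = 3, byrow = TRUE,
                dimnames = list(paste0("site", 1:3), paste0("sp", 1:3)))

# Hypothetical helper: square root of relative abundance within each row.
# A row with no individuals would give NaN in this sketch.
hellinger <- function(m) {
  totals <- rowSums(m)
  sqrt(m / totals)
}

abund_h <- hellinger(abund)

# Squared values in each row sum to 1, so every site has unit length
row_lengths <- rowSums(abund_h^2)
```

Because every row has unit length after the transformation, Euclidean distances between transformed rows are bounded and depend on relative rather than absolute abundances, which is what makes the transformed data suitable for RDA.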

RDA

19.2 Distance-Based RDA


Ecography. Supplementary material. Ecography ECOG-02663. Dambros, C. S., Morais, J. W., Azevedo, R. A. and Gotelli, N. J. 2016. Isolation by distance, not rivers, control the distribution of termite species in the Amazonian rain forest.