Ecography. Supplementary material


Ecography ECOG Dambros, C. S., Morais, J. W., Azevedo, R. A. and Gotelli, N. J Isolation by distance, not rivers, control the distribution of termite species in the Amazonian rain forest. Ecography doi: / ecog Supplementary material

APPENDIX 1 DETAILED DESCRIPTION OF METHODS AND SUPPLEMENTARY RESULTS

Text A1 Details on the development of a termite sampling protocol for ecological studies

We used the data collected at Ducke Forest, central Amazonia, Brazil, to establish a sampling design for ecological studies of termites that would maximize the power of statistical tests. The results were used to develop a sampling protocol for other parts of Amazonia. At Ducke Forest, termites were sampled in 250 m long transects. These transects are used for Long Term Ecological Research (LTER) and are ideal for sampling termites and other groups, and for comparing their distributions (Magnusson et al. 2005). The transects followed the elevation isocline to minimize variation in edaphic conditions within transects. The sampling protocol used within each transect was modified from Jones and Eggleton (2000). In each transect, 10 sections of 2 x 5 m were sampled, and 3 investigators searched for termites in logs, branches, leaves, and soil for 20 minutes. This yielded a total sampling effort of 300 sections searched. Each transect was used as a sampling unit in regression analyses testing for the association of soil variables and termite species composition. Termite species composition was measured as the Bray-Curtis dissimilarity index between all pairs of transects. A Principal Coordinates Analysis (PCoA) was performed on the Bray-Curtis dissimilarity matrix, and the first PCoA axis was used as the response variable in regression models. The weakest association detected was that of termite species composition with soil bases (R² = 0.16; P = 0.03). The association with soil clay content was the strongest detected (R² = 0.42; P < 0.01). Termite occurrence and species richness were associated only with ant predator density (Dambros et al. 2016).
To develop the sampling design, we considered the trade-off between the number of transects sampled and the number of sections surveyed within each transect. Assuming that a research project has the resources to sample a given number of sections, we determined the distribution of these sections that would maximize statistical power (i.e. the probability of detecting an association when it exists). For example, if one has the resources and time to sample 150 sections, which arrangement of sections maximizes statistical power: intensively sampling 15 transects with 10 sections each, or spreading the sections over 30 transects with 5 sections each? Of course, the cost of sampling two sections in a single transect is not the same as the cost of sampling one section in each of two transects. We would need to include several other aspects to truly establish the most cost-effective sampling design. The costs of sampling additional sections within a transect or additional transects probably differ among studies, depending on the logistics of the sampling areas. However, our study considers LTER sites, where trails and transects were pre-established, so most of the logistic costs are not relevant. In other words, in our study, sampling four sections in one transect or two sections in each of two transects has about the same cost. It is also important to note that our transects followed the elevation isocline (not necessarily linear), so there is little environmental variation within each transect. Therefore, the similarity in termite species richness and composition between two sections within a transect is higher in our study than in other termite studies (e.g. Jones and Eggleton 2000; Davies et al. 2003). This is relevant because sections within a transect are highly redundant in our study and provide similar information, whereas in other studies sections can be complementary.
To test the effect of reducing the number of transects and sections on the association of termite species composition with soil bases and soil clay content, we rarefied the number of transects and the number of sections sampled per transect. We then re-ran all analyses on the reduced dataset 1000 times and determined whether the association would be detected at the chosen alpha level. The rarefaction of sections is similar to what was conducted by Jones and Eggleton (2000), but in our study we are not evaluating the number of species sampled, but the power of statistical tests commonly used in ecological studies. For both soil bases and soil clay content, statistical power increased with the number of transects and the number of sections sampled (Fig. A1). However, increasing the number of transects had a stronger influence on the power of statistical tests than increasing the number of sections sampled per transect (Fig. A1). For example, sampling 150 sections spread over 25 transects (6 sections each) would lead to the detection of the association between soil bases and termite species composition ~40% of the time. In contrast, the association would be detected only ~20% of the time when sampling 150 sections spread over 15 transects (10 sections each). Statistical power is still relatively high when sampling only one section per transect if 30 transects (or more) are sampled. Our results support the sampling of fewer than 10 sections per transect in ecological studies of termites when multiple transects (>30) are sampled. However, the particularities of our sampling design should be taken into account when designing future studies.
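The rarefaction-based power analysis described in Text A1 can be illustrated with a minimal sketch in base R. Everything in it is an assumption made for illustration: the community is simulated (30 transects, 10 sections, 20 species responding to a made-up clay gradient), the names `bray_curtis` and `power_for` are hypothetical helpers, and the alpha level of 0.05 is a placeholder. It is not the code used to produce Fig. A1.

```r
set.seed(1)

# Bray-Curtis dissimilarity between all pairs of rows (base R only)
bray_curtis <- function(x) {
  n <- nrow(x)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      total <- sum(x[i, ] + x[j, ])
      if (total > 0) d[i, j] <- sum(abs(x[i, ] - x[j, ])) / total
    }
  }
  as.dist(d)
}

# Simulated data (assumption): counts[transect, section, species]
n_transects <- 30; n_sections <- 10; n_species <- 20
clay <- runif(n_transects)      # invented environmental gradient
optimum <- runif(n_species)     # invented species optima along the gradient
counts <- array(0, dim = c(n_transects, n_sections, n_species))
for (tr in seq_len(n_transects)) {
  for (sp in seq_len(n_species)) {
    lambda <- 2 * exp(-(clay[tr] - optimum[sp])^2 / 0.1)
    counts[tr, , sp] <- rpois(n_sections, lambda)
  }
}

# Proportion of rarefied datasets in which the regression of the first
# PCoA axis on clay is significant at the given alpha level
power_for <- function(k_transects, k_sections, n_rep = 100, alpha = 0.05) {
  detected <- replicate(n_rep, {
    keep_tr <- sample(n_transects, k_transects)
    comm <- t(sapply(keep_tr, function(tr) {
      keep_se <- sample(n_sections, k_sections)
      # Sum abundances over the retained sections, per species
      apply(counts[tr, keep_se, , drop = FALSE], 3, sum)
    }))
    axis1 <- cmdscale(bray_curtis(comm), k = 1)[, 1]
    p_value <- summary(lm(axis1 ~ clay[keep_tr]))$coefficients[2, 4]
    p_value < alpha
  })
  mean(detected)
}

# Same total effort (150 sections), two arrangements
power_25x6  <- power_for(25, 6)
power_15x10 <- power_for(15, 10)
```

Comparing `power_25x6` with `power_15x10` mirrors the two scenarios marked by arrows in Fig. A1; the values obtained depend entirely on the simulated effect size.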

Text A2 - Detailed description of rarefaction procedure applied to individual transects with more than five sections.

To calculate the abundance of each species expected when sampling five sections in transects with more than five sections, we divided the species abundances by the number of sections sampled in a given transect. This measurement represents the density of termites of a particular species in a transect. For example, a species with an abundance of 10 colonies in a transect where 10 sections were sampled has a density of 1 colony per section. To obtain the abundance expected in five sections, we multiplied the species density in a given transect by five. The expected abundance for all species within a transect was measured as the sum of the expected abundances of individual species. To calculate the probability that a species occurs in a given transect when only five sections are sampled (the expected presence of that species in the transect), we derived the following formula:

$$P_i = 1 - \frac{(N-n)!}{N!} \cdot \frac{(N-N_i)!}{(N-N_i-n)!}$$

where N represents the number of sections surveyed, N_i represents the number of sections where species i was present, and n represents the number of sections to be subsampled (in our case n = 5 for all transects). The code to run this calculation in R is

    1-(factorial(N-n)/factorial(N))*(factorial(N-Ni)/factorial(N-Ni-n))

Note that this formula gives the probability that a species is sampled when 5 sections are drawn in a single try (without replacement). This calculation is different from sequentially sampling one section, replacing it, and repeating the procedure until five sections are obtained.
In the latter case, the calculation would simply be

$$P_i = 1 - \left(1 - \frac{N_i}{N}\right)^{n}$$

where P_i is the probability of occurrence of species i. The estimated species richness per transect was calculated as the sum of the probabilities of occurrence of all species sampled in each transect, or

$$S = \sum_i P_i$$

These formulas give the same results as randomly selecting five sections in each transect and recording the species abundances, species richness, and presence and absence of each species. To demonstrate this, we randomly selected only five sections in all transects (rarefaction), and used measures, such as termite abundance, obtained in five sections for analyses. The random selection of sections was repeated 999 times for each transect, and the mean abundance, mean species richness, and mean abundance per species were recorded. Note that for transects where only five sections were sampled, the results from rarefaction are identical to the observed values because there is only one possible combination of five sections that could be selected in a randomization. For transects with more than five sections, the resulting values represent the average values that would be obtained by sampling only five sections. This procedure should not change the type I error rate of our analyses, but should increase the statistical power of our models (compared to sampling only five sections in all transects) because the values obtained in transects with more than five sections represent a more precise measurement, closer to the true expected value.
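The equivalence between the formula and random subsampling claimed in Text A2 can be checked with a short sketch in base R. The function names `p_occur` and `p_occur_repl` and the example values (N = 10, N_i = 3) are hypothetical; the sketch writes the factorial expression with `choose()`, which is algebraically the same ratio of binomial coefficients and returns 1 instead of NaN when N - N_i < n.

```r
# Probability that a species present in Ni of N sections is found in a
# subsample of n sections drawn without replacement.
# Equivalent to 1-(factorial(N-n)/factorial(N))*(factorial(N-Ni)/factorial(N-Ni-n))
p_occur <- function(N, Ni, n) 1 - choose(N - Ni, n) / choose(N, n)

# Same probability when sections are drawn with replacement
p_occur_repl <- function(N, Ni, n) 1 - (1 - Ni / N)^n

# Hypothetical transect: 10 sections surveyed, species found in 3 of them
N <- 10; Ni <- 3; n <- 5
analytic <- p_occur(N, Ni, n)

# Rarefaction by simulation: draw 5 of the 10 sections many times and
# record how often at least one occupied section is included
set.seed(42)
present <- c(rep(TRUE, Ni), rep(FALSE, N - Ni))
simulated <- mean(replicate(20000, any(sample(present, n))))

# Expected richness of a transect is the sum of occurrence probabilities
# (hypothetical occupancy counts for four species)
expected_richness <- sum(p_occur(N, c(1, 3, 6, 10), n))
```

The simulated frequency should approach `analytic` as the number of random draws grows, which is the sense in which the formula and the randomization agree.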

Text A3 - Detailed description for the construction of Moran Eigenvector Maps and associated weighting matrix, w.

In our study, two sampling designs were used. In each of 12 grids within the Amazonian forest, we sampled from five to 32 transects spaced regularly at intervals of 1 km. The transects were organized within regular grids, whereas the grids had an irregular distribution (Fig. 1 in main text). We determined that transects within a grid should be much more connected than transects in distinct grids. The idea in our procedure was to represent a local community within a grid, and a metacommunity among grids, in a hierarchy. We established that 1) transects close to each other within a grid would be connected; and 2) the connectivity (e.g. dispersal probability) between two transects within a grid would be equal to the connectivity of a transect with all transects outside the grid combined. The connectivity matrix between pairs of transects within a grid was created by connecting each transect to all its adjacent transects within a fixed radius (Moore neighborhood; 1 if connected, zero otherwise; Fig. 1b in manuscript). We then multiplied the within-grid connectivity matrix by 1/(1 + n_i), where n_i represents the number of neighbors to which a given transect is connected. We added 1 in the denominator because each transect was later connected to other transects outside the grid (Fig. A2). The connectivity between grids was determined by a Gabriel graph (Legendre and Legendre 2012) and was used to determine the connectivity between pairs of transects in distinct grids (1 if connected, zero otherwise). The matrix of connectivity between transects in distinct grids was then multiplied element-wise by 1/[(1 + n_i) g_j], where g_j represents the number of transects sampled in the grid where a given transect is located. Finally, we summed both matrices to obtain w.
If we had only two grids with 2 transects each, the connectivity between the transects within a grid would be 1/(1+1), or 0.5. The connectivity of two transects in distinct grids would be 1/[(1+1)2], or 0.25.

Moran Eigenvector Maps construction and selection

To create MEMs, we ran an eigen analysis on the final connectivity matrix w. The eigen analysis generated 197 vectors representing spatial autocorrelation from broad to fine spatial scales, which were determined from their associated eigenvalues (large and small eigenvalues represent broad and fine spatial autocorrelation, respectively; Dray et al. 2012). To reduce the number of vectors to be included in our models, we performed two further steps. First, we assessed the spatial autocorrelation of MEMs by calculating Moran's I, and selected only MEMs significantly correlated with the geographical distance separating transects (Dray et al. 2012). Second, we created a regression or RDA model, when appropriate, using only MEMs as predictor variables of termite abundance, species richness, and species composition (PCoA axis using the Bray-Curtis dissimilarity matrix). We then ran a forward stepwise selection of MEMs based on the adjusted R² of the model (Dray et al. 2012; Legendre and Gauthier 2014). This procedure was conducted independently for each response variable, and the final number of MEMs depended on the explanatory power of each MEM for a particular variable. Note that because MEMs are orthogonal and independent, including all MEMs in the analyses would explain 100% of the variation in any response variable, so including all MEMs would not be very informative. After the forward selection, the selected MEMs were divided into two groups: broad and fine scale MEMs.
Finally, we applied a variance partitioning approach to separate the portion of variance in the response variable explained by 1) spatial autocorrelation in species distribution that could be a result of limited dispersal at fine scales; 2) spatial autocorrelation in species distribution that could be a result of limited dispersal at broad scales; 3) species association with environmental variables spatially structured at fine scales; 4) species association with environmental variables spatially structured at broad scales; 5) species association with non-spatially structured variables; and 6) residual variation.
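The worked example in Text A3 (two grids with 2 transects each) can be reproduced with a minimal sketch in base R. The object names are hypothetical, the two grids are assumed to be linked to each other, and g is taken here as the size of the grid of the focal (row) transect, which is an assumption; in this example all grids have the same size, so the choice does not affect the result.

```r
# Two hypothetical grids (A and B) with two transects each
grid <- c("A", "A", "B", "B")
n <- length(grid)

# Within-grid adjacency: transects in the same grid are neighbours (not self)
same_grid <- outer(grid, grid, "==")
adj_within <- (same_grid & !diag(n)) * 1

# n_i: number of within-grid neighbours of each transect (1 in this example)
n_i <- rowSums(adj_within)

# Within-grid weights: 1 / (1 + n_i), applied row by row
within <- adj_within / (1 + n_i)

# Between-grid adjacency: transects in distinct (linked) grids
adj_between <- (!same_grid) * 1

# g: number of transects in the grid of each transect (2 in this example)
g <- as.vector(table(grid)[grid])

# Between-grid weights: 1 / [(1 + n_i) * g]
between <- adj_between / ((1 + n_i) * g)

# Final weighting matrix
w <- within + between
```

In this example, `w[1, 2]` is 0.5 and `w[1, 3]` is 0.25, and each row sums to 1, so the connectivity to one within-grid neighbour equals the connectivity to all transects outside the grid combined.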

Figure A1. Probability of detecting an association of termite species composition with soil bases and soil clay content when the number of transects and the number of sections within each transect is rarefied. Termite species composition was measured as the first axis of a Principal Coordinates Analysis (PCoA) on the Bray-Curtis pairwise dissimilarity matrix. Arrows represent two scenarios for spreading sections in transects with the same sampling effort (measured as the overall number of sections). Highest power is obtained by increasing the number of transects sampled. See main text for details on the measurements of soil bases and soil clay content.

Figure A2. Biplot based on a distance-based Redundancy Analysis (db-RDA) representing the association of termite species composition (response variable) and environmental variables (predictor variables) before (a-b) and after (c-d) the removal of spatial structure on termite data. Termite species composition was measured using the abundance balanced component of the Bray-Curtis dissimilarity index (Baselga 2013; summarized in PCoA axes in db-RDA analysis). Polygons represent clusters of transects delimited by the major rivers in Amazonia. Temp: mean annual temperature; Prec: mean annual precipitation.

Figure A3. Biplot based on a Non-metric Multidimensional Scaling analysis (NMDS) representing the association of termite species composition (response variable) and environmental variables (predictor variables) before (a-b) and after (c-d) the removal of spatial structure on termite data. Termite species composition was measured using the Bray-Curtis dissimilarity index (summarized in NMDS axes in NMDS analysis). Polygons in (a) and (c) represent clusters of transects delimited by the major rivers in Amazonia. Temp: mean annual temperature; Prec: mean annual precipitation.

References

Baselga, A. (2013) Separating the two components of abundance-based dissimilarity: balanced changes in abundance vs. abundance gradients. Methods in Ecology and Evolution, 4,

Dambros, C.S., Morais, J.W., Vasconcellos, A., Souza, J.L.P., Franklin, E. & Gotelli, N.J. (2016) Association of ant predators and edaphic conditions with termite diversity in an Amazonian rain forest. Biotropica.

Davies, R.G., Hernández, L.M., Eggleton, P., Didham, R.K., Fagan, L.L. & Winchester, N.N. (2003) Environmental and spatial influences upon species composition of a termite assemblage across neotropical forest islands. Journal of Tropical Ecology, 19,

Dray, S., Legendre, P. & Peres-Neto, P.R. (2006) Spatial modelling: a comprehensive framework for principal coordinate analysis of neighbour matrices (PCNM). Ecological Modelling, 196,

Jones, D.T. & Eggleton, P. (2000) Sampling termite assemblages in tropical forests: testing a rapid biodiversity assessment protocol. Journal of Applied Ecology, 37,

Magnusson, W.E., Lima, A.P., Luizão, R., Luizão, F., Costa, F.R., Castilho, C.V. de & Kinupp, V.F. (2005) RAPELD: a modification of the Gentry method for biodiversity surveys in long-term ecological research sites. Biota Neotropica, 5,

Appendix 2

Diversity and composition of termites in Amazonia

CSDambros

19 October, 2016

Abstract

This document describes the analyses conducted in the manuscript about termite distribution in Amazonia submitted to Ecography. Environmental data were previously extracted from rasters, and missing soil data were imputed as explained in the main text. All datasets used are publicly available on-line, and the R code provides links for their individual download.

Contents

1 Load required packages
2 Import data
3 Dealing with missing values
4 Select only predictors of interest
5 Create site x species matrix
    Calculate species overall abundance and species richness
    Calculate similarity matrices
6 MEMs construction
    Extract geographical coordinates for transects and grids
    Create connectivity matrix
        Transect-transect within grids
        Transect-transect among grids
        Merge local and regional dispersal into a single matrix
    Create Moran Eigenvectors Maps (MEMs)
    Selection of MEMs with spatial autocorrelation
7 Analyses (species richness and composition analyzed separately)
    Species richness
        Forward selection of MEMs
        Regression analysis - Species richness
            Simple regressions (without accounting for spatial autocorrelation)
            Regressions after the removal of spatial autocorrelation
        Variance partitioning
    Species composition
        Forward selection of MEMs
        RDA analysis - Species composition
            Simple RDA (without accounting for spatial autocorrelation)
            RDA after the removal of spatial autocorrelation
        Variance partitioning
        Biplot of RDA - Composition in biogeographic regions and environment
            Raw (not removing spatial autocorrelation)
            Residual (after removal of spatial autocorrelation)
Appendix: Additional Analyses
    Species composition using Non-Metric Multidimensional Scaling (NMDS)
        Run NMDS and extract scores
        Forward selection of MEMs
        Simple regression (without accounting for spatial autocorrelation)
        Regression after the removal of spatial autocorrelation
        Biplot of NMDS - Composition in biogeographic regions and environment
            Raw (not removing spatial autocorrelation)
            Residual (after removal of spatial autocorrelation)
    Species composition using only turnover component (nestedness removed)
        Calculate Simpson's dissimilarities (turnover component of Sorensen)
        Forward selection of MEMs
        RDA analysis
            Simple RDA (without accounting for spatial autocorrelation)
            RDA after the removal of spatial autocorrelation
        Variance partitioning
        Biplot of RDA using turnover - Composition in biogeographic regions and environment
            Raw (not removing spatial autocorrelation)
            Residual (after removal of spatial autocorrelation)
    Species composition using balanced abundances (turnover component of Bray-Curtis)
        Calculate balanced dissimilarities (turnover component of Bray)
        Forward selection of MEMs
        RDA analysis
            Simple RDA (without accounting for spatial autocorrelation)
            RDA after the removal of spatial autocorrelation
        Variance partitioning
        Biplot of RDA using turnover - Composition in biogeographic regions and environment
            Raw (not removing spatial autocorrelation)
            Residual (after removal of spatial autocorrelation)
    Species composition using RDA on Hellinger transformed termite data (not db-RDA)
        Hellinger transformation
        Forward selection of MEMs
        RDA analysis
            Simple RDA (without accounting for spatial autocorrelation)
            RDA after the removal of spatial autocorrelation
        Variance partitioning
        Biplot of RDA using turnover - Composition in biogeographic regions and environment
            Raw (not removing spatial autocorrelation)
            Residual (after removal of spatial autocorrelation)

1 Load required packages

Some of the analyses run in our study were converted into functions, which are publicly available on-line. The following lines load the required libraries, and download and source the necessary functions.

    library(vegan) # installed using install.packages("vegan")
    library(spdep) # installed using install.packages("spdep")
    library(sp) # installed using install.packages("sp")
    library(spacemaker) # installed using:
    #install.packages("tripack")
    #install.packages("spacemaker",repos="
    source("

2 Import data

As with the functions, we made our datasets publicly available on-line. They are downloaded using the next chunk of code. If you have problems running these lines, you might have an Internet connection problem (proxies can be a problem when trying to download things directly from R). Try to download the files to your directory using the provided links, and then read the files directly from your folder by changing the path to the file (e.g. instead of the on-line address, use c://file_path). Some files imported here are not original datasets. These data were transformed to facilitate analyses. For example, environmental data from rasters were extracted prior to these analyses and were incorporated into the main environmental dataframe. The original datasets, as well as scripts with data transformation, are available upon request to the first author.

    # Termite data
    # Import termite record data
    isoptera<-read.csv("
    # Import termite species trait data
    isopterataxonid<-read.csv("
    # Import environmental data
    env<-read.csv("
    head(env)

Output of head(env): the first six rows are transects campusufam_1_1 and campusufam_1_10 to campusufam_1_14 (GridID campusufam; Region0 I; Region GuianaEast). Columns: PlotID, GridID, LONG, LAT, UTM_Easting, UTM_Northing, Region0, Region, Temp, Prec, lnp.input, lnbases.input, PC1.input, PC2.input, TreeCover, Clay.input, K.input, Mg.input, Ca.input, Clay, K, Mg, Ca, P, P.input. The columns Clay, K, Mg, Ca and P are NA in these six rows.

3 Dealing with missing values

Some soil variables were not available for all transects. Removing these data from the analyses could prevent the detection of the association of other variables with termite species richness and composition. For example, removing data from the Jaú National Park, a distinct biogeographic region with extreme values for temperature and precipitation, could affect the association of termite species composition with rivers and climate. To overcome the problem of missing data, we performed data imputation, that is, we filled missing data with values. For those transects with missing values, we randomly selected values from other transects. [1] Grids with environmental data were sampled in all sampling regions of the study, so data imputation was spread across the study region. Moreover, 137 transects had all environmental data available, and climatic variables, tree cover, and biogeographic information were available for all transects, so data imputation was not necessary for these variables.

    # Impute missing soil data
    set.seed(102) # Guarantees the results will be exactly the same every time
    # For each column, replace missing values with a random sample from non-missing values
    for(i in 9:ncol(env)){
      # Create random sample from non-missing entries
      sample.env<-sample(env[!is.na(env[,i]),i],size = nrow(env),replace = TRUE)
      env[,i]<-ifelse(is.na(env[,i]),sample.env,env[,i]) # Replace NAs with random sample
    }
    head(env)

Output of head(env): the same six transects and columns as above, with the soil columns Clay, K, Mg, Ca and P no longer NA.

[1] This procedure will certainly add noise to the data and reduce the power of statistical tests (similar to the removal of sampling areas). The removal of areas with missing soil data provided similar results.

4 Select only predictors of interest

To facilitate analyses, we create a new dataframe with only those variables used as predictor variables in the statistical models. This makes the following code simpler, especially when all variables are used in a regression model.

    # Select only variables to be used as predictors, combine and log transform variables
    predictors<-data.frame(env[,c("Temp","Prec","TreeCover")],
      lnp=log(env$P+1),lnbases=log(env$K+env$Mg+env$Ca+1),Clay=env$Clay)
    # Define the variable rivers (biogeographic region)
    rivers<-factor(env$Region,
      levels=c("GuianaWest","GuianaEast","Negro","Inambari","Rondonia"))
    # Standardize predictor variables
    predictors.std<-decostand(predictors,"standardize")
    # Visualize the first rows of predictors
    head(predictors)

Output of head(predictors): columns Temp, Prec, TreeCover, lnp, lnbases, Clay.

5 Create site x species matrix

The termite data provided are in the long format, that is, each occurrence is a row in the original spreadsheet. Species, sampling location, etc., are attributes (columns) in this spreadsheet. To run our analyses, we started by transforming the termite data into a short table, where each row represents a sampling location and each column represents a species. Note that the long format contains more information and is preferable for storage. Before starting, the variables PlotID and TaxonID in table isoptera were reordered to be in the same order as these variables in the environmental table (env) and species info table (isopterataxonid). The number of subsamples sampled within each transect was also recorded.

    # Make sure transects in termite dataset are in the same order as
    # in environmental dataset
    isoptera$PlotID<-factor(isoptera$PlotID,levels=env$PlotID)
    # Make sure species in termite dataset are in the same order as in species dataset
    isoptera$TaxonID<-factor(isoptera$TaxonID,levels=isopterataxonid$TaxonID)
    attach(isoptera)
    # Create transect X species matrix
    termite.plot.obs<-tapply(isoptera$n,list(PlotID,TaxonID),sum)
    termite.plot.obs[is.na(termite.plot.obs)]<-0 # NAs are true zeros
    effort.plot<-tapply(n_subplots,PlotID,mean) # Number of sections sampled per transect
    detach(isoptera)

The number of subsamples in each transect varied, as can be checked by the following code:

    range(effort.plot)
    [1] 5 12

To avoid comparing transects with different sampling effort, transects with more than 5 sections were rarefied. In other words, 5 sections were randomly selected in each transect (without replacement), and the average abundance of individual species, probability of occurrence (average presence), species overall abundance, and species richness were calculated.
If all transects had only 5 sections, the results would be exactly the same for the abundance and presence/absence matrices, species abundance per transect, and species richness per transect.

Abundance matrix

    termite.plot<-(5*termite.plot.obs)/as.vector(effort.plot)

Presence-Absence matrix (probability of occurrence)

    # With replacement
    #termite.plot.pa<-1-((1-termite.plot.obs/as.vector(effort))^5)
    # Without replacement (as simulated and used; ignore possible warnings)
    termite.plot.pa<-poccur(as.vector(effort.plot),termite.plot.obs,5)
    # NaNs are 1s
    termite.plot.pa[is.nan(termite.plot.pa)]<-1

5.1 Calculate species overall abundance and species richness

Now that the matrices of estimated species abundance and presence/absence are calculated, it is easy to calculate the expected overall abundance and species richness for all species. The procedure is the same as if we were not rarefying the community: overall species abundance is the sum of the abundances of all species, and the expected species richness is the sum of the probabilities of occurrence of all individual species.

Total abundance per transect

    # The sum of individual abundances is overall abundance
    termite.n<-rowSums(termite.plot,na.rm=TRUE)

Total species richness per transect

    # The sum of presences is species richness
    termite.s<-rowSums(termite.plot.pa,na.rm=TRUE)

5.2 Calculate similarity matrices

To quantify the changes in termite species composition, the Bray-Curtis pairwise dissimilarity matrix was calculated using the vegdist function. We added a column to the site x species dataframe with 1s in all entries so that sites without any shared species are not considered completely dissimilar, and sites without any species can also be included.

Composition

    termite.bray<-vegdist(cbind(termite.plot.pa,1),"bray") # Used in dbrda analyses
    termite.pcoa<-cmdscale(termite.bray,k=2,add = TRUE)

6 MEMs construction

This section can be skipped if the reader is not interested in the particularities of the sampling design used in this paper. A more general and simpler way to construct MEMs is provided in Dray et al. (2012). In this paper, transects were nested within grids, and this nested design was used to create an overall connectivity matrix.
The connectivity matrix represents two hierarchical levels, and assumes that transects within a grid are much more connected to each other than transects in separate grids. Moreover, all transects within a grid have the same connectivity with a transect in another grid. The hierarchical matrix was designed to represent local communities (within grid), and a broad metacommunity (between grids).
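The effect of the extra column of 1s added before computing the Bray-Curtis dissimilarity (section 5.2) can be seen in a small base R sketch. The helper `bray` and the three toy sites are hypothetical; the helper implements the usual Bray-Curtis formula for a single pair of sites rather than calling vegdist.

```r
# Bray-Curtis dissimilarity between two abundance vectors
bray <- function(a, b) sum(abs(a - b)) / sum(a + b)

# Toy sites (assumption): two sites with no shared species and one empty site
site1 <- c(2, 0, 0)
site2 <- c(0, 3, 0)
empty <- c(0, 0, 0)

# Without the extra column
d_raw   <- bray(site1, site2)   # no shared species: maximum dissimilarity
d_empty <- bray(empty, empty)   # 0/0: undefined

# With a column of 1s appended to every site
d_raw_1   <- bray(c(site1, 1), c(site2, 1))
d_empty_1 <- bray(c(empty, 1), c(empty, 1))
```

With the added column, sites that share no species still differ in proportion to their abundances instead of all being tied at the maximum, and empty sites yield a defined dissimilarity.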

6.1 Extract geographical coordinates for transects and grids

In this step we start preparing the data for the construction of Moran Eigenvector Maps. The first step is to create a matrix with two columns representing the spatial coordinates of the sampling transects (e.g. LatLong). We then aggregated the coordinates of transects within each grid. To obtain the coordinates of a grid, the mean of the coordinates of the individual transects in the grid was calculated.

    coords<-env[,c("UTM_Easting","UTM_Northing")] # The same as LongLat but in UTM (meters)
    regional.coords<-aggregate(coords,list(env$GridID),mean) # Spatial coordinates for grids

6.2 Create connectivity matrix

Because we wanted to treat transects within the same grid differently from those in distinct grids (such as in communities within a metacommunity), we started by creating two connectivity matrices: transect-transect within individual grids and transect-transect between grids. Note that if you have a simpler sampling design, you can create a single matrix representing the pairwise connectivity between all pairs of sites, and then run the eigen analysis on it (see R code in Dray et al. 2012). However, using this simple method in our analyses would produce a connectivity matrix in which a single transect from a grid is connected to a single transect in the adjacent grid. Moreover, some transects would be more connected to transects in other grids than to other transects within the same grid.

Transect-transect within grids

In the within matrix, all transects that are separated by less than 1700 meters are connected. This number is simply any number greater than the diagonal distance between transects (the hypotenuse of a right triangle with 1 km sides, i.e. the square root of 1² + 1² km, or about 1414 m; see map in the main article). This number is referred to as the truncation threshold in the literature.

    # Create dispersal matrix to represent spatial autocorrelation
    # Local communities
    # Connect plots that are less than 1700 meters apart, but not self
    local.nb.mat<-((as.matrix(dist(coords))<1700&as.matrix(dist(coords))>5)*1)
    # Prob of leaving a plot to metacom = prob of leaving to another plot within grid
    pleave<-1/(colSums(local.nb.mat)+1)
    # Probs to disperse to neighboring plots within grid
    within<-t(local.nb.mat/(colSums(local.nb.mat)+1))

Transect-transect among grids

The between-grid connectivity matrix is similar to the within-grid matrix, but the criterion to connect transects is different. Now there is an extra step to create a Gabriel graph that will inform which grids are connected to one another.

    # Regional community (dispersal between grids of plots)
    # Determine in which grid (hub in metacomm) a particular transect is
    # (factor() makes this work whether GridID was read as character or factor)
    reg.hub<-as.integer(factor(env$GridID))

    # Use a Gabriel graph to create metacommunity structure (requires spdep package)
    regional.nb <- graph2nb(gabrielneigh(as.matrix(regional.coords[, 2:3])),
                            sym = TRUE, row.names = regional.coords[, 1])
    # Calculate the number of neighbors a grid (hub) has
    reg.nlink <- sapply(regional.nb, length)
    # Create a matrix of 1s & 0s of dispersal from transects to grids
    regional.nb.mat <- sapply(regional.nb, function(x) {(1 * matrix(reg.hub %in% as.integer(x)))})
    # Create matrix of dispersal from plot to plot through the metacommunity
    regional.nb.mat1 <- t(t(regional.nb.mat) * (1 / reg.nlink))[, reg.hub]
    # Define how many plots each grid is connected to
    hub.out <- (1 / (table(reg.hub)))[reg.hub]
    # Final probability of dispersal plot-plot in metacommunity
    between <- matrix(hub.out, nrow(env), nrow(env)) * regional.nb.mat1
    between.nb <- sapply(as.data.frame((t(pleave * t(between)))), function(x) (1:length(x))[x != 0])
    attributes(between.nb) <- list(class = "nb", sym = TRUE)

Merge local and regional dispersal into a single matrix

In a final step, the within and between connectivity matrices were merged. In this final matrix, pairs of transects are connected either by being in the same grid, or by being in distinct grids that are close to each other. We set the probability of an individual leaving the grid to be the same as the probability of moving from one transect to the next within the grid (footnote 2).

    # merge local and regional:
    # p(plot2plot) = p(leave & arrive by metacomm hub) or p(stay in grid and move)
    plot2plot <- (t(pleave * t(between))) + within
    # transform into a list removing zeros (sparse matrix)
    plot2plot.nb <- sapply(as.data.frame(plot2plot), function(x) (1:length(x))[x != 0])
    # Transform into a class nb (spdep), just for plotting
    attributes(plot2plot.nb) <- list(class = "nb", sym = TRUE)

Figure 1 shows the resulting between- and within-grid connectivity.
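The merging step can be followed on a small hypothetical design. The sketch below assumes five plots in two neighboring grids (the names grid.id, n.in.grid and plot2plot.toy are illustrative only) and builds the within and between matrices in the same way as the code above, without the Gabriel graph step. It shows that every column of the merged matrix, i.e. every source plot, has dispersal probabilities that sum to 1.

```r
# Hypothetical design: plots 1-3 in grid 1 and plots 4-5 in grid 2;
# the two grids are assumed to be neighbors of each other
grid.id <- c(1, 1, 1, 2, 2)

# Within-grid links: plots in the same grid are connected, but not to themselves
local.nb <- outer(grid.id, grid.id, "==") * 1
diag(local.nb) <- 0

# Probability of leaving the grid, and of moving to each neighbor within the grid
pleave <- 1 / (colSums(local.nb) + 1)
within <- t(local.nb / (colSums(local.nb) + 1))

# Between-grid links: a disperser that leaves its grid arrives at any plot of the
# other grid with equal probability (rows = destination, columns = source)
n.in.grid <- ave(grid.id, grid.id, FUN = length)  # number of plots in each plot's grid
between <- (outer(grid.id, grid.id, "!=") * 1) / n.in.grid

# Merge: p(leave and arrive through the other grid) + p(stay in grid and move)
plot2plot.toy <- t(pleave * t(between)) + within

# Each column (source plot) sums to 1
print(colSums(plot2plot.toy))
```

For plot 1, the probability of leaving the grid (1/3) equals the probability of moving to each of its two within-grid neighbors (1/3 each), which is the assumption stated in the text.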
Remember that all this complexity is particular to this study; other sampling designs and other ecological questions might require different approaches.

6.3 Create Moran Eigenvector Maps (MEMs)

Now that a single connectivity matrix W has been created, W needs to be centered. An eigen analysis of the centered matrix then generates multiple vectors, called Moran Eigenvector Maps. We used functions in the spdep and spacemakeR packages to do this.

Footnote 2: This procedure differs from the usual procedure in PCNM analyses. In most studies, the connectivity is 1 divided by the total number of connections, so that the connectivity of one transect to any other is the same. Here, the connectivity of one transect to another within the grid is 1/(neighbors in the grid + 1), whereas the connectivity of one transect to another in another grid is 1/(neighbors in same grid + transects in the other grid).
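The centering and eigen analysis can be illustrated in base R. This is a minimal sketch under simplifying assumptions: a hypothetical symmetric, binary connectivity matrix for six sites along a line, not the asymmetric weighted matrix of this study. The helper moran.I and all object names are illustrative. It also shows why the sign of an eigenvalue indicates the sign of the spatial autocorrelation: for a symmetric W, Moran's I of each eigenvector equals n/sum(W) times its eigenvalue.

```r
# Hypothetical example: 6 sites along a line, adjacent sites connected
n <- 6
W <- matrix(0, n, n)
W[abs(row(W) - col(W)) == 1] <- 1

# Center rows and columns of W: (I - 11'/n) W (I - 11'/n)
C <- diag(n) - matrix(1 / n, n, n)
W.centered <- C %*% W %*% C

# Eigen analysis of the centered matrix
eig <- eigen(W.centered, symmetric = TRUE)

# Centering creates a null eigenvalue (constant vector); drop it
keep <- abs(eig$values) > 1e-8
toy.MEMs <- eig$vectors[, keep]
toy.values <- eig$values[keep]

# Moran's I of a vector x for connectivity matrix W
moran.I <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

# Moran's I of each eigenvector is proportional to its eigenvalue
I.obs <- apply(toy.MEMs, 2, moran.I, W = W)
print(cbind(eigenvalue = toy.values, moran.I = I.obs))
```

Eigenvectors with positive eigenvalues describe positive spatial autocorrelation (broad-scale patterns), and those with negative eigenvalues describe negative autocorrelation (alternating values between neighbors).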

Figure 1: Connectivity of all transects with detail for one of the sampling grids

Calculate Moran EigenVectors representing spatial autocorrelation

    glist <- sapply(as.data.frame(plot2plot), function(x) x[x != 0])  # Create spatial weights list
    lw1 <- nb2listw(plot2plot.nb, glist)  # transform nb obj into listw (with spatial weights)
    MEMs <- scores.listw(lw1)  # calculate eigenvalues and vectors for w

    listw not symmetric, (w+t(w)) used in the place of w

6.4 Selection of MEMs with spatial autocorrelation

Because the eigen analysis generated 197 vectors that could be individually used as predictor variables in regression and RDA models, we selected only those vectors with significant spatial autocorrelation. Note that the alternative hypothesis differs between MEMs with positive and negative eigenvalues. Vectors with negative eigenvalues represent negative spatial autocorrelation, so for them we are interested in the probability of obtaining values as low as or lower than the observed one.

    # Detect if Moran eigenvectors have significant spatial autocorrelation
    pvals.less <- apply(MEMs$vectors, 2, morani.pvals, w = plot2plot, alternative = "less", reps = 1000)
    pvals.greater <- apply(MEMs$vectors, 2, morani.pvals, w = plot2plot, alternative = "greater",
                           reps = 1000)
    pvals <- ifelse(MEMs$values < 0, pvals.less, pvals.greater)
    # pvals <- pvals.greater  # When using only positive eigenvalues
    # Select only vectors with significant spatial autocorrelation

    MEMs.signif.vec <- as.data.frame(MEMs$vectors[, pvals < 0.01])
    MEMs.signif.val <- MEMs$values[pvals < 0.01]
    names(MEMs.signif.val) <- colnames(MEMs.signif.vec)

7 Analyses (species richness and composition analyzed separately)

In the previous section MEMs were created to represent spatial structure in the data. These MEMs are vectors (similar to vectors representing environmental variables, lat, long, etc.) that can be incorporated into any classical statistical analysis, such as regression. In this section, we show how the analyses using MEMs as covariates are performed to test the effect of the environment and space on species richness and composition. The first step in all these models is to reduce the number of MEMs so that only MEMs with high explanatory power are included in the statistical models.

7.1 Species richness

Forward selection of MEMs

To further reduce the number of spatial predictor variables in our regression and RDA models, we selected only MEMs that had a high explanatory power for an individual response variable. Note that the rda function from the vegan package is used even for termite species richness (univariate). RDA is an extension of regression to multiple response variables, and it is exactly the same as a regression when used with a single response variable. To facilitate the forward selection using R2, and to standardize our procedures, we used the rda function for both univariate and multivariate models.
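Because RDA with a single response is equivalent to regression, the logic of forward selection on R2 can be sketched in base R with lm. The example below is a simplified greedy search on adjusted R2 using simulated data. It is not the ordiR2step procedure itself (which, as documented in vegan, also relies on permutation P-values and additional stopping rules), and the names X, y and forward.adjR2 are hypothetical.

```r
set.seed(1)
n <- 60
X <- as.data.frame(matrix(rnorm(n * 5), n, 5))  # hypothetical candidate predictors V1..V5
y <- 2 * X$V1 - 1.5 * X$V3 + rnorm(n)           # response depends on V1 and V3 only

# Greedy forward selection: at each step add the predictor that gives the
# highest adjusted R2, and stop when no candidate improves it
forward.adjR2 <- function(y, X) {
  selected <- character(0)
  best <- 0
  repeat {
    candidates <- setdiff(names(X), selected)
    if (length(candidates) == 0) break
    adj <- sapply(candidates, function(v) {
      summary(lm(y ~ ., data = X[, c(selected, v), drop = FALSE]))$adj.r.squared
    })
    if (max(adj) <= best) break
    best <- max(adj)
    selected <- c(selected, names(which.max(adj)))
  }
  list(selected = selected, adj.r.squared = best)
}

sel.toy <- forward.adjR2(y, X)
print(sel.toy)
```

In this simulation the two predictors that generate the response are picked up by the search; uninformative predictors may occasionally be added when they raise the adjusted R2 by chance, which is one reason a significance test is normally combined with the R2 criterion.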
    set.seed(102)  # Use to obtain exactly the same results
    # Regression models with intercept as predictor
    reg.s0 <- rda(termite.s ~ 1, data = MEMs.signif.vec)
    # Create model with all MEMs as predictors
    reg.s.mem <- rda(termite.s ~ ., data = MEMs.signif.vec)
    # Use rda to perform forward selection based on R2
    reg.s.fw <- ordiR2step(reg.s0, formula(reg.s.mem), steps = 10000,
                           direction = "forward", R2scope = FALSE)
    sel <- attr(reg.s.fw$terms, "term.labels")
    MEMs.S.sel <- MEMs.signif.vec[, sel]
    MEMvals.S.sel <- MEMs.signif.val[sel]

Using either the rda or the lm function, it is simple to calculate the variation in termite species richness explained by broad- and local-scale spatial autocorrelation.

Regression analysis - Species richness

Simple regressions (without accounting for spatial autocorrelation)

After obtaining the spatial predictors of our models (MEMs), we can use them as covariates along with the environmental predictors and biogeographic regions.

    # Model with only environmental data
    reg.s <- lm(termite.s ~ ., data = cbind(predictors.std, rivers)); summary(reg.s)

    Call:
    lm(formula = termite.s ~ ., data = cbind(predictors.std, rivers))

    Residuals:
    Min 1Q Median 3Q Max

    Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
    (Intercept)       < 2e-16 ***
    Temp              **
    Prec              *
    TreeCover
    lnp
    lnbases
    Clay
    riversguianaeast
    riversnegro
    riversinambari
    riversrondonia
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: on 187 degrees of freedom
    Multiple R-squared: , Adjusted R-squared:
    F-statistic: on 10 and 187 DF, p-value: 2.255e

Regressions after the removal of spatial autocorrelation

Because the output of the forward selection is a regression model with the spatial predictors, we can extract the residuals from this model. The residuals can then be regressed against the environmental variables. In other words, we are asking whether the variance not explained by space can be explained by the environment.

    # combine all data into a single data frame
    alldata <- cbind(predictors.std, rivers, MEMs.S.sel)
    # regression analysis
    reg.s.full <- lm(termite.s ~ ., data = alldata); summary(reg.s.full)

    Call:
    lm(formula = termite.s ~ ., data = alldata)

    Residuals:
    Min 1Q Median 3Q Max

    Coefficients:
                      Estimate Std. Error t value Pr(>|t|)

    (Intercept)       e-10 ***
    Temp
    Prec              *
    TreeCover
    lnp
    lnbases
    Clay
    riversguianaeast  *
    riversnegro       **
    riversinambari    *
    riversrondonia    **
    V2
    V1
    V10               e-05 ***
    V4
    V43               **
    V17               **
    V36               **
    V39               **
    V153              ***
    V48               **
    V8
    V126              *
    V47
    V118              *
    V13
    V92               *
    V163              *
    V159              *
    V149              *
    V26               *
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: on 167 degrees of freedom
    Multiple R-squared: , Adjusted R-squared:
    F-statistic: 8.98 on 30 and 167 DF, p-value: < 2.2e-16

    reg.s.noriver <- lm(termite.s ~ ., data = cbind(predictors.std, MEMs.S.sel))
    anova(reg.s.full, reg.s.noriver)

    Analysis of Variance Table

    Model 1: termite.s ~ Temp + Prec + TreeCover + lnp + lnbases + Clay + rivers +
        V2 + V1 + V10 + V4 + V43 + V17 + V36 + V39 + V153 + V48 + V8 + V126 +
        V47 + V118 + V13 + V92 + V163 + V159 + V149 + V26
    Model 2: termite.s ~ Temp + Prec + TreeCover + lnp + lnbases + Clay +
        V2 + V1 + V10 + V4 + V43 + V17 + V36 + V39 + V153 + V48 + V8 + V126 +
        V47 + V118 + V13 + V92 + V163 + V159 + V149 + V26
      Res.Df RSS Df Sum of Sq F Pr(>F)
    1
    2                                  *

    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    # Using rda and permutation to test for significance of individual variables
    # (similar to regression)
    # to use varpart with mix of quantitative and categorical variables
    hs1 <- dudi.hillsmith(rivers, scannf = FALSE, nf = 50)
    # Temp individually
    anova(rda(termite.s, predictors.std[, 1], cbind(predictors.std[, -1], hs1$li, MEMs.S.sel)))

    Permutation test for rda under reduced model
    Permutation: free
    Number of permutations: 999

    Model: rda(X = termite.s, Y = predictors.std[, 1], Z = cbind(predictors.std[, -1], hs1$li, MEMs.S.sel
             Df Variance F Pr(>F)
    Model
    Residual

    # Prec individually
    anova(rda(termite.s, predictors.std[, 2], cbind(predictors.std[, -2], hs1$li, MEMs.S.sel)))

    Permutation test for rda under reduced model
    Permutation: free
    Number of permutations: 999

    Model: rda(X = termite.s, Y = predictors.std[, 2], Z = cbind(predictors.std[, -2], hs1$li, MEMs.S.sel
             Df Variance F Pr(>F)
    Model    *
    Residual
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    # ...
    # Not interested in individual variables at this moment

Variance partitioning

We can partition the variance into many different components, and the varpart function simplifies this procedure.

    # Decompose R2 into Rivers, Environment, and Space
    # Importance of components
    plot(varpart(termite.s, hs1$li, predictors.std, MEMs.S.sel), Xnames = c("Riv", "Env", "Dist"))
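The arithmetic behind variance partitioning can be shown with lm on simulated data. This is a minimal sketch with two hypothetical predictor sets (env and space) rather than the three sets used above, and with made-up variable names. It uses adjusted R2 values, which is also the statistic that varpart is documented to report for linear models.

```r
set.seed(2)
n <- 100
env <- data.frame(e1 = rnorm(n))                   # hypothetical environmental predictor
space <- data.frame(s1 = 0.6 * env$e1 + rnorm(n))  # spatial predictor correlated with environment
y <- env$e1 + space$s1 + rnorm(n)                  # simulated response

# Adjusted R2 of a regression of y on all columns of X
adjR2 <- function(y, X) summary(lm(y ~ ., data = X))$adj.r.squared

R2.env <- adjR2(y, env)
R2.space <- adjR2(y, space)
R2.both <- adjR2(y, cbind(env, space))

# Unique and shared fractions obtained by subtraction
fractions <- c(env.only = R2.both - R2.space,
               shared = R2.env + R2.space - R2.both,
               space.only = R2.both - R2.env,
               residual = 1 - R2.both)
print(round(fractions, 3))
```

The four fractions add up to 1 by construction. The shared fraction is obtained by subtraction rather than estimated directly, so it can be negative, which is why negative values are not displayed in the varpart plot.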

[Figure: variance partitioning diagram with components Riv, Env and Dist; Residuals = 0.45; values < 0 not shown]

    # Test for significance of Rivers, Environment, and Space
    anova(rda(termite.s, hs1$li, cbind(MEMs.S.sel, predictors)))  # rivers

    Permutation test for rda under reduced model
    Permutation: free
    Number of permutations: 999

    Model: rda(X = termite.s, Y = hs1$li, Z = cbind(MEMs.S.sel, predictors))
             Df Variance F Pr(>F)
    Model    *
    Residual
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    anova(rda(termite.s, predictors, cbind(MEMs.S.sel, hs1$li)))  # environment

    Permutation test for rda under reduced model
    Permutation: free
    Number of permutations: 999

    Model: rda(X = termite.s, Y = predictors, Z = cbind(MEMs.S.sel, hs1$li))
             Df Variance F Pr(>F)
    Model    *
    Residual
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    anova(rda(termite.s, MEMs.S.sel, cbind(predictors, hs1$li)))  # space

    Permutation test for rda under reduced model
    Permutation: free
    Number of permutations: 999

    Model: rda(X = termite.s, Y = MEMs.S.sel, Z = cbind(predictors, hs1$li))
             Df Variance F Pr(>F)

    Model    ***
    Residual
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

7.2 Species composition

Forward selection of MEMs

As before, the first step in the analysis of species composition was to select the spatial predictors (MEMs) with high explanatory power for species composition. Because we use Bray-Curtis dissimilarities as the response variable in the Redundancy Analysis, we use the capscale function (distance-based RDA, or simply db-RDA).

    set.seed(102)  # Use to obtain exactly the same results
    # db-rda models with intercept as predictor
    rda.comp0 <- capscale(termite.bray ~ 1, data = MEMs.signif.vec, add = TRUE)
    # Create model with all MEMs as predictors
    rda.comp.mem <- capscale(termite.bray ~ ., data = MEMs.signif.vec, add = TRUE)
    # Use rda to perform forward selection based on R2
    rda.comp.fw <- ordiR2step(rda.comp0, formula(rda.comp.mem), direction = "forward")
    sel <- attr(rda.comp.fw$terms, "term.labels")
    MEMs.comp.sel <- MEMs.signif.vec[, sel]
    MEMvals.comp.sel <- MEMs.signif.val[sel]

Again, we calculate the total variation in termite species composition (db-RDA axes) explained by space.

RDA analysis - Species composition

As with richness, after obtaining the spatial predictors of our models (MEMs), we can use them as covariates along with the environmental predictors.

Simple RDA (without accounting for spatial autocorrelation)

    # Model with only environmental data
    rda.comp <- capscale(termite.bray ~ ., data = cbind(predictors.std), add = TRUE)
    anova(rda.comp, by = "margin")

    Permutation test for capscale under reduced model
    Marginal effects of terms
    Permutation: free
    Number of permutations: 999

    Model: capscale(formula = termite.bray ~ Temp + Prec + TreeCover + lnp + lnbases + Clay, data = cbind
             Df SumOfSqs F Pr(>F)
    Temp     ***



More information

Variance Decomposition and Goodness of Fit

Variance Decomposition and Goodness of Fit Variance Decomposition and Goodness of Fit 1. Example: Monthly Earnings and Years of Education In this tutorial, we will focus on an example that explores the relationship between total monthly earnings

More information

Creating and Managing a W Matrix

Creating and Managing a W Matrix Creating and Managing a W Matrix Carlos Hurtado Department of Economics University of Illinois at Urbana-Champaign hrtdmrt2@illinois.edu Junel 22th, 2016 C. Hurtado (UIUC - Economics) Spatial Econometrics

More information

8. FROM CLASSICAL TO CANONICAL ORDINATION

8. FROM CLASSICAL TO CANONICAL ORDINATION Manuscript of Legendre, P. and H. J. B. Birks. 2012. From classical to canonical ordination. Chapter 8, pp. 201-248 in: Tracking Environmental Change using Lake Sediments, Volume 5: Data handling and numerical

More information

Lab 7. Direct & Indirect Gradient Analysis

Lab 7. Direct & Indirect Gradient Analysis Lab 7 Direct & Indirect Gradient Analysis Direct and indirect gradient analysis refers to a case where you have two datasets with variables that have cause-and-effect or mutual influences on each other.

More information

MEMGENE package for R: Tutorials

MEMGENE package for R: Tutorials MEMGENE package for R: Tutorials Paul Galpern 1,2 and Pedro Peres-Neto 3 1 Faculty of Environmental Design, University of Calgary 2 Natural Resources Institute, University of Manitoba 3 Département des

More information

What are the important spatial scales in an ecosystem?

What are the important spatial scales in an ecosystem? What are the important spatial scales in an ecosystem? Pierre Legendre Département de sciences biologiques Université de Montréal Pierre.Legendre@umontreal.ca http://www.bio.umontreal.ca/legendre/ Seminar,

More information

4. Ordination in reduced space

4. Ordination in reduced space Université Laval Analyse multivariable - mars-avril 2008 1 4.1. Generalities 4. Ordination in reduced space Contrary to most clustering techniques, which aim at revealing discontinuities in the data, ordination

More information

Stat 412/512 TWO WAY ANOVA. Charlotte Wickham. stat512.cwick.co.nz. Feb

Stat 412/512 TWO WAY ANOVA. Charlotte Wickham. stat512.cwick.co.nz. Feb Stat 42/52 TWO WAY ANOVA Feb 6 25 Charlotte Wickham stat52.cwick.co.nz Roadmap DONE: Understand what a multiple regression model is. Know how to do inference on single and multiple parameters. Some extra

More information

Types of spatial data. The Nature of Geographic Data. Types of spatial data. Spatial Autocorrelation. Continuous spatial data: geostatistics

Types of spatial data. The Nature of Geographic Data. Types of spatial data. Spatial Autocorrelation. Continuous spatial data: geostatistics The Nature of Geographic Data Types of spatial data Continuous spatial data: geostatistics Samples may be taken at intervals, but the spatial process is continuous e.g. soil quality Discrete data Irregular:

More information

"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200 Spring 2014 University of California, Berkeley

PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION Integrative Biology 200 Spring 2014 University of California, Berkeley "PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200 Spring 2014 University of California, Berkeley D.D. Ackerly April 16, 2014. Community Ecology and Phylogenetics Readings: Cavender-Bares,

More information

Chaper 5: Matrix Approach to Simple Linear Regression. Matrix: A m by n matrix B is a grid of numbers with m rows and n columns. B = b 11 b m1 ...

Chaper 5: Matrix Approach to Simple Linear Regression. Matrix: A m by n matrix B is a grid of numbers with m rows and n columns. B = b 11 b m1 ... Chaper 5: Matrix Approach to Simple Linear Regression Matrix: A m by n matrix B is a grid of numbers with m rows and n columns B = b 11 b 1n b m1 b mn Element b ik is from the ith row and kth column A

More information

This lab exercise will try to answer these questions using spatial statistics in a geographic information system (GIS) context.

This lab exercise will try to answer these questions using spatial statistics in a geographic information system (GIS) context. by Introduction Problem Do the patterns of forest fires change over time? Do forest fires occur in clusters, and do the clusters change over time? Is this information useful in fighting forest fires? This

More information

Package PVR. February 15, 2013

Package PVR. February 15, 2013 Package PVR February 15, 2013 Type Package Title Computes phylogenetic eigenvectors regression (PVR) and phylogenetic signal-representation curve (PSR) (with null and Brownian expectations) Version 0.2.1

More information

Stat 5102 Final Exam May 14, 2015

Stat 5102 Final Exam May 14, 2015 Stat 5102 Final Exam May 14, 2015 Name Student ID The exam is closed book and closed notes. You may use three 8 1 11 2 sheets of paper with formulas, etc. You may also use the handouts on brand name distributions

More information

Inferences on Linear Combinations of Coefficients

Inferences on Linear Combinations of Coefficients Inferences on Linear Combinations of Coefficients Note on required packages: The following code required the package multcomp to test hypotheses on linear combinations of regression coefficients. If you

More information

Multivariate Statistics Fundamentals Part 1: Rotation-based Techniques

Multivariate Statistics Fundamentals Part 1: Rotation-based Techniques Multivariate Statistics Fundamentals Part 1: Rotation-based Techniques A reminded from a univariate statistics courses Population Class of things (What you want to learn about) Sample group representing

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

Multiple Predictor Variables: ANOVA

Multiple Predictor Variables: ANOVA What if you manipulate two factors? Multiple Predictor Variables: ANOVA Block 1 Block 2 Block 3 Block 4 A B C D B C D A C D A B D A B C Randomized Controlled Blocked Design: Design where each treatment

More information

Lecture 5 Geostatistics

Lecture 5 Geostatistics Lecture 5 Geostatistics Lecture Outline Spatial Estimation Spatial Interpolation Spatial Prediction Sampling Spatial Interpolation Methods Spatial Prediction Methods Interpolating Raster Surfaces with

More information

Visualizing Tests for Equality of Covariance Matrices Supplemental Appendix

Visualizing Tests for Equality of Covariance Matrices Supplemental Appendix Visualizing Tests for Equality of Covariance Matrices Supplemental Appendix Michael Friendly and Matthew Sigal September 18, 2017 Contents Introduction 1 1 Visualizing mean differences: The HE plot framework

More information

Unconstrained Ordination

Unconstrained Ordination Unconstrained Ordination Sites Species A Species B Species C Species D Species E 1 0 (1) 5 (1) 1 (1) 10 (4) 10 (4) 2 2 (3) 8 (3) 4 (3) 12 (6) 20 (6) 3 8 (6) 20 (6) 10 (6) 1 (2) 3 (2) 4 4 (5) 11 (5) 8 (5)

More information

Structure in Data. A major objective in data analysis is to identify interesting features or structure in the data.

Structure in Data. A major objective in data analysis is to identify interesting features or structure in the data. Structure in Data A major objective in data analysis is to identify interesting features or structure in the data. The graphical methods are very useful in discovering structure. There are basically two

More information

22s:152 Applied Linear Regression. Returning to a continuous response variable Y...

22s:152 Applied Linear Regression. Returning to a continuous response variable Y... 22s:152 Applied Linear Regression Generalized Least Squares Returning to a continuous response variable Y... Ordinary Least Squares Estimation The classical models we have fit so far with a continuous

More information

Last updated: Oct 18, 2012 LINEAR REGRESSION PSYC 3031 INTERMEDIATE STATISTICS LABORATORY. J. Elder

Last updated: Oct 18, 2012 LINEAR REGRESSION PSYC 3031 INTERMEDIATE STATISTICS LABORATORY. J. Elder Last updated: Oct 18, 2012 LINEAR REGRESSION Acknowledgements 2 Some of these slides have been sourced or modified from slides created by A. Field for Discovering Statistics using R. Simple Linear Objectives

More information

MODELS WITHOUT AN INTERCEPT

MODELS WITHOUT AN INTERCEPT Consider the balanced two factor design MODELS WITHOUT AN INTERCEPT Factor A 3 levels, indexed j 0, 1, 2; Factor B 5 levels, indexed l 0, 1, 2, 3, 4; n jl 4 replicate observations for each factor level

More information

G562 Geometric Morphometrics. Statistical Tests. Department of Geological Sciences Indiana University. (c) 2012, P. David Polly

G562 Geometric Morphometrics. Statistical Tests. Department of Geological Sciences Indiana University. (c) 2012, P. David Polly Statistical Tests Basic components of GMM Procrustes This aligns shapes and minimizes differences between them to ensure that only real shape differences are measured. PCA (primary use) This creates a

More information

Regression on Faithful with Section 9.3 content

Regression on Faithful with Section 9.3 content Regression on Faithful with Section 9.3 content The faithful data frame contains 272 obervational units with variables waiting and eruptions measuring, in minutes, the amount of wait time between eruptions,

More information

R Output for Linear Models using functions lm(), gls() & glm()

R Output for Linear Models using functions lm(), gls() & glm() LM 04 lm(), gls() &glm() 1 R Output for Linear Models using functions lm(), gls() & glm() Different kinds of output related to linear models can be obtained in R using function lm() {stats} in the base

More information

22s:152 Applied Linear Regression. In matrix notation, we can write this model: Generalized Least Squares. Y = Xβ + ɛ with ɛ N n (0, Σ)

22s:152 Applied Linear Regression. In matrix notation, we can write this model: Generalized Least Squares. Y = Xβ + ɛ with ɛ N n (0, Σ) 22s:152 Applied Linear Regression Generalized Least Squares Returning to a continuous response variable Y Ordinary Least Squares Estimation The classical models we have fit so far with a continuous response

More information

Lecture 8. Spatial Estimation

Lecture 8. Spatial Estimation Lecture 8 Spatial Estimation Lecture Outline Spatial Estimation Spatial Interpolation Spatial Prediction Sampling Spatial Interpolation Methods Spatial Prediction Methods Interpolating Raster Surfaces

More information

Areal data. Infant mortality, Auckland NZ districts. Number of plant species in 20cm x 20 cm patches of alpine tundra. Wheat yield

Areal data. Infant mortality, Auckland NZ districts. Number of plant species in 20cm x 20 cm patches of alpine tundra. Wheat yield Areal data Reminder about types of data Geostatistical data: Z(s) exists everyhere, varies continuously Can accommodate sudden changes by a model for the mean E.g., soil ph, two soil types with different

More information

Differentially Private Linear Regression

Differentially Private Linear Regression Differentially Private Linear Regression Christian Baehr August 5, 2017 Your abstract. Abstract 1 Introduction My research involved testing and implementing tools into the Harvard Privacy Tools Project

More information

22s:152 Applied Linear Regression. Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA)

22s:152 Applied Linear Regression. Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) 22s:152 Applied Linear Regression Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA) We now consider an analysis with only categorical predictors (i.e. all predictors are

More information

Psychology 405: Psychometric Theory

Psychology 405: Psychometric Theory Psychology 405: Psychometric Theory Homework Problem Set #2 Department of Psychology Northwestern University Evanston, Illinois USA April, 2017 1 / 15 Outline The problem, part 1) The Problem, Part 2)

More information

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY (formerly the Examinations of the Institute of Statisticians) GRADUATE DIPLOMA, 2007

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY (formerly the Examinations of the Institute of Statisticians) GRADUATE DIPLOMA, 2007 EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY (formerly the Examinations of the Institute of Statisticians) GRADUATE DIPLOMA, 2007 Applied Statistics I Time Allowed: Three Hours Candidates should answer

More information

Algebra of Principal Component Analysis

Algebra of Principal Component Analysis Algebra of Principal Component Analysis 3 Data: Y = 5 Centre each column on its mean: Y c = 7 6 9 y y = 3..6....6.8 3. 3.8.6 Covariance matrix ( variables): S = -----------Y n c ' Y 8..6 c =.6 5.8 Equation

More information

Multiple Linear Regression. Chapter 12

Multiple Linear Regression. Chapter 12 13 Multiple Linear Regression Chapter 12 Multiple Regression Analysis Definition The multiple regression model equation is Y = b 0 + b 1 x 1 + b 2 x 2 +... + b p x p + ε where E(ε) = 0 and Var(ε) = s 2.

More information

Lab 3 A Quick Introduction to Multiple Linear Regression Psychology The Multiple Linear Regression Model

Lab 3 A Quick Introduction to Multiple Linear Regression Psychology The Multiple Linear Regression Model Lab 3 A Quick Introduction to Multiple Linear Regression Psychology 310 Instructions.Work through the lab, saving the output as you go. You will be submitting your assignment as an R Markdown document.

More information

Multivariate analysis of genetic data: an introduction

Multivariate analysis of genetic data: an introduction Multivariate analysis of genetic data: an introduction Thibaut Jombart MRC Centre for Outbreak Analysis and Modelling Imperial College London XXIV Simposio Internacional De Estadística Bogotá, 25th July

More information

STAT 3022 Spring 2007

STAT 3022 Spring 2007 Simple Linear Regression Example These commands reproduce what we did in class. You should enter these in R and see what they do. Start by typing > set.seed(42) to reset the random number generator so

More information

Basics of Geographic Analysis in R

Basics of Geographic Analysis in R Basics of Geographic Analysis in R Spatial Autocorrelation and Spatial Weights Yuri M. Zhukov GOV 2525: Political Geography February 25, 2013 Outline 1. Introduction 2. Spatial Data and Basic Visualization

More information