Ecography. Supplementary material
Ecography ECOG
Dambros, C. S., Morais, J. W., Azevedo, R. A. and Gotelli, N. J. Isolation by distance, not rivers, control the distribution of termite species in the Amazonian rain forest. Ecography doi: / ecog
Supplementary material
APPENDIX 1. DETAILED DESCRIPTION OF METHODS AND SUPPLEMENTARY RESULTS

Text A1. Details on the development of a termite sampling protocol for ecological studies

We used data collected at Ducke Forest, central Amazonia, Brazil, to establish a sampling design for ecological studies of termites that would maximize the power of statistical tests. The results were used to develop a sampling protocol for other parts of Amazonia. At Ducke Forest, termites were sampled in 250 m long transects. These transects are used for Long Term Ecological Research (LTER) and are well suited to sampling termites and other groups, and to comparing their distributions (Magnusson et al. 2005). The transects followed the elevation isocline to minimize variation in edaphic conditions within transects. The sampling protocol used within each transect was modified from Jones and Eggleton (2000). In each transect, 10 sections of 2 x 5 m were sampled, and termites were searched for in logs, branches, leaves, and soil for 20 minutes by 3 investigators. This yielded a total sampling effort of 300 sections searched. Each transect was used as a sampling unit in regression analyses testing for the association between soil variables and termite species composition. Termite species composition was measured as the Bray-Curtis dissimilarity index between all pairs of transects. A Principal Coordinates Analysis (PCoA) was performed on the Bray-Curtis dissimilarity matrix, and the first PCoA axis was used as the response variable in regression models. The weakest association detected was between termite species composition and soil bases (R² = 0.16; P = 0.03). The strongest association detected was with soil clay content (R² = 0.42; P < 0.01). Termite occurrence and species richness were associated only with ant predator density (Dambros et al. 2016).
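The composition-to-regression step described above can be sketched in base R. This is only an illustration on a made-up abundance matrix: the object names (`abund`, `clay`, `bray_curtis`) and all values are hypothetical, and the Bray-Curtis index is computed by hand rather than with the function the authors used.

```r
# Toy abundance matrix (hypothetical values): 5 transects x 4 species
abund <- matrix(c(5, 0, 2, 1,
                  4, 1, 2, 0,
                  0, 6, 1, 3,
                  1, 5, 0, 4,
                  3, 2, 2, 2),
                nrow = 5, byrow = TRUE,
                dimnames = list(paste0("t", 1:5), paste0("sp", 1:4)))
clay <- c(10, 14, 40, 45, 25)   # hypothetical soil clay content per transect

# Bray-Curtis dissimilarity: sum|x_i - x_j| / sum(x_i + x_j)
bray_curtis <- function(m) {
  n <- nrow(m)
  d <- matrix(0, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n)) for (j in seq_len(n)) {
    d[i, j] <- sum(abs(m[i, ] - m[j, ])) / sum(m[i, ] + m[j, ])
  }
  as.dist(d)
}

bray  <- bray_curtis(abund)
pcoa1 <- cmdscale(bray, k = 1)[, 1]   # first PCoA axis
fit   <- summary(lm(pcoa1 ~ clay))    # regression of composition on the predictor
print(fit$r.squared)
```

The R² printed here has no relation to the values reported for Ducke Forest; the sketch only shows how a dissimilarity matrix is reduced to a single response variable.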
To develop the sampling design, we considered the trade-off between the number of transects sampled and the number of sections surveyed within each transect. Assuming that a research project has the resources to sample a fixed number of sections, we determined the distribution of these sections that would maximize statistical power (i.e. the probability of detecting an association when it exists). For example, if one has the resources and time to sample 150 sections, which arrangement of sections maximizes statistical power: intensively sampling 15 transects with 10 sections each, or spreading the sections over 30 transects with 5 sections each? Of course, the cost of sampling two sections in a single transect is not the same as the cost of sampling one section in each of two transects. Several other aspects would need to be included to truly establish the most cost-effective sampling design. The costs of sampling additional sections within a transect, or additional transects, probably differ among studies depending on the logistics of the sampling areas. However, our study used LTER sites, where trails and transects were pre-established, so most logistic costs are not relevant here. In other words, in our study, sampling four sections in one transect or two sections in each of two transects has about the same cost. It is also important to note that our transects followed the elevation isocline (and were not necessarily linear), so there is little environmental variation within each transect. Therefore, the similarity in termite species richness and composition between two sections within a transect is higher in our study than in other termite studies (e.g. Jones and Eggleton 2000; Davies et al. 2003). This is relevant because sections within a transect are highly redundant in our study and provide similar information, whereas in other studies sections can be complementary.
To test the effect of reducing the number of transects and sections on the association of termite species composition with soil bases and soil clay content, we rarefied the number of transects and the number of sections sampled per transect. We then re-ran all analyses on the reduced dataset 1000 times and determined whether the association would be detected at the chosen alpha level. The rarefaction of sections is similar to that conducted by Jones and Eggleton (2000), but in our study we evaluated not the number of species sampled but the power of statistical tests commonly used in ecological studies. For both soil bases and soil clay content, statistical power increased with the number of transects and the number of sections sampled (Fig. A1). However, the number of transects had a stronger influence on statistical power than the number of sections sampled per transect (Fig. A1). For example, sampling 150 sections spread over 25 transects (6 sections each) would lead to the detection of the association between soil bases and termite species composition ~40% of the time. In contrast, the association would be detected only ~20% of the time when sampling 150 sections spread over 15 transects (10 sections each). Statistical power is still relatively high when sampling only one section per transect if 30 transects (or more) are sampled. Our results support sampling fewer than 10 sections per transect in ecological studies of termites when multiple transects (>30) are sampled. However, the particularities of our sampling design should be taken into account when designing future studies.
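The idea of estimating power by rarefaction can be sketched as follows. This is a simplified illustration on simulated data, not the analysis behind Fig. A1: only the number of transects is rarefied, the helper `power_by_rarefaction` is hypothetical, and the alpha level of 0.05 is an assumption made for the example.

```r
# Hypothetical sketch: estimate statistical power by rarefying transects.
# Data are simulated; alpha = 0.05 is an assumption, not taken from the text.
set.seed(1)

n_total <- 30                             # transects available
clay    <- rnorm(n_total)                 # stand-in for a soil predictor
pcoa1   <- 0.6 * clay + rnorm(n_total)    # stand-in for the first PCoA axis

power_by_rarefaction <- function(x, y, n_keep, reps = 1000, alpha = 0.05) {
  detected <- replicate(reps, {
    keep <- sample(length(x), n_keep)     # draw transects without replacement
    fit  <- summary(lm(y[keep] ~ x[keep]))
    fit$coefficients[2, 4] < alpha        # p-value of the slope
  })
  mean(detected)                          # proportion of runs with a detection
}

sizes <- c(10, 15, 20, 25, 30)
power <- sapply(sizes, function(n) power_by_rarefaction(clay, pcoa1, n_keep = n))
names(power) <- sizes
print(round(power, 2))
```

When all 30 transects are kept, every draw is the full dataset, so the estimate is either 0 or 1. Rarefying sections as well would require section-level records, which this sketch does not simulate.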
Text A2. Detailed description of the rarefaction procedure applied to individual transects with more than five sections

To calculate the abundance of each species expected when sampling five sections in transects with more than five sections, we divided the species abundances by the number of sections sampled in a given transect. This measurement represents the density of termites of a particular species in a transect. For example, a species with an abundance of 10 colonies in a transect where 10 sections were sampled has a density of 1 colony per section. To obtain the abundance expected in five sections, we multiplied the species density in a given transect by five. The expected abundance for all species within a transect was measured as the sum of the expected abundances of the individual species. To calculate the probability that a species occurs in a given transect when only five sections are sampled (the expected presence of that species in the transect), we derived the following formula:

$$P_i = 1 - \frac{(N-n)!}{N!}\cdot\frac{(N-N_i)!}{(N-N_i-n)!}$$

where P_i is the probability of occurrence of species i, N represents the number of sections surveyed, N_i represents the number of sections where species i was present, and n represents the number of sections to be subsampled (in our case n = 5 for all transects). The code to run this calculation in R is

    1-(factorial(N-n)/factorial(N))*(factorial(N-Ni)/factorial(N-Ni-n))

Note that this formula gives the probability that species i is detected when five sections are drawn in a single try (without replacement). This calculation differs from sequentially sampling one section, replacing it, and repeating the procedure until five sections are obtained. In the latter case, the calculation would be simply

$$P_i = 1 - \left(1 - \frac{N_i}{N}\right)^{n}$$

The estimated species richness per transect, S, was calculated as the sum of the probabilities of occurrence of all species sampled in each transect, or

$$S = \sum_i P_i$$

These formulas give the same results as randomly selecting five sections in each transect and recording the species abundances, species richness, and presence or absence of each species. To demonstrate this, we randomly selected only five sections in all transects (rarefaction), and used measures, such as termite abundance, obtained in five sections for analyses. The random selection of sections was repeated 999 times for each transect, and the mean abundance, mean species richness, and mean abundance per species were recorded. Note that for transects where only five sections were sampled, the results from rarefaction are identical to the observed values because there is only one possible combination of five sections that could be selected in a randomization. For transects with more than five sections, the resulting values represent the average values that would be obtained by sampling only five sections. This procedure should not change the type I error rate of our analyses, but should increase the statistical power of our models (compared to sampling only five sections in all transects) because the values obtained in transects with more than five sections are more precise measurements, closer to the true expected value.
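The occurrence probabilities in Text A2 can be checked on a small example. The two helper functions below are hypothetical names introduced for this sketch (they are not the authors' own function), and the without-replacement version is written with `choose()`, which is algebraically equal to the factorial expression given in the text.

```r
# Hypothetical helpers (names are ours, not the authors' function).
# N: sections surveyed; Ni: sections where the species occurred; n: sections drawn.
p_occur_without <- function(N, Ni, n = 5) {
  # choose() returns 0 when n > N - Ni, so the probability is then 1
  1 - choose(N - Ni, n) / choose(N, n)
}
p_occur_with <- function(N, Ni, n = 5) {
  1 - (1 - Ni / N)^n
}

# One transect with 10 sections and three species found in 1, 4 and 10 sections
N  <- 10
Ni <- c(sp1 = 1, sp2 = 4, sp3 = 10)

p_without <- p_occur_without(N, Ni)
p_with    <- p_occur_with(N, Ni)
richness  <- sum(p_without)     # expected richness in five sections

# Factorial form from the text, evaluated for the species found in one section
factorial_form <- 1 - (factorial(N - 5) / factorial(N)) *
  (factorial(N - 1) / factorial(N - 1 - 5))

print(rbind(p_without, p_with))
print(richness)
```

A species found in one of ten sections has a probability of 0.5 of appearing in a subsample of five, under both the `choose()` and the factorial form. The factorial form returns NaN once N - Ni - n is negative, because `factorial()` is undefined for negative values, whereas `choose()` handles that case directly.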
Text A3. Detailed description of the construction of Moran Eigenvector Maps and the associated weighting matrix, w

In our study, two sampling designs were used. In each of 12 grids within the Amazonian forest, we sampled from five to 32 transects spaced regularly at intervals of 1 km. The transects were organized within regular grids, whereas the grids had an irregular distribution (Fig. 1 in main text). We determined that transects within a grid should be much more connected than transects in distinct grids. The idea of our procedure was to represent a local community within a grid, and a metacommunity among grids, in a hierarchy. We established that 1) transects close to each other within a grid would be connected; and 2) the connectivity (e.g. dispersal probability) between two transects within a grid would be equal to the connectivity of a transect with all transects outside the grid combined. The connectivity matrix between pairs of transects within a grid was created by connecting each transect to all adjacent transects within a fixed radius (Moore neighborhood; 1 if connected, zero otherwise; Fig. 1b in manuscript). We then multiplied the within-grid connectivity matrix by 1/(1 + n_i), where n_i represents the number of neighbors to which a given transect is connected. We added 1 to the denominator because each transect was later connected to other transects outside the grid (Fig. A2). The connectivity between grids was determined by a Gabriel graph (Legendre and Legendre 2012) and was used to determine the connectivity between pairs of transects in distinct grids (1 if connected, zero otherwise). The matrix of connectivity between transects in distinct grids was then multiplied element-wise by 1/[(1 + n_i) g_j], where g_j represents the number of transects sampled in the grid where a given transect is located. Finally, we summed both matrices to obtain w.
If we had only two grids with 2 transects each, the connectivity between the transects within a grid would be 1/(1+1), or 0.5. The connectivity of two transects in distinct grids would be 1/[(1+1)2], or 0.25.

Moran Eigenvector Maps construction and selection

To create MEMs, we ran an eigen analysis on the final connectivity matrix w. The eigen analysis generated 197 vectors representing spatial autocorrelation from broad to fine spatial scales, which were determined from their associated eigenvalues (large and small eigenvalues represent broad and fine spatial autocorrelation, respectively; Dray et al. 2012). To reduce the number of vectors to be included in our models, we performed two further steps. First, we assessed the spatial autocorrelation of MEMs by calculating Moran's I, and selected only MEMs significantly correlated with the geographical distance separating transects (Dray et al. 2012). Second, we created a regression or RDA model, as appropriate, using only MEMs as predictor variables of termite abundance, species richness, and species composition (PCoA axes from the Bray-Curtis dissimilarity matrix). We then ran a forward stepwise selection of MEMs based on the adjusted R² of the model (Dray et al. 2012; Legendre and Gauthier 2014). This procedure was conducted independently for each response variable, and the final number of MEMs depended on the explanatory power of each MEM for a particular variable. Note that because MEMs are orthogonal and independent, including all MEMs in the analyses would explain 100% of the variation in any response variable, so including all MEMs would not be very informative. After the forward selection, the selected MEMs were divided into two groups: broad-scale and fine-scale MEMs.
Finally, we applied a variance partitioning approach to separate the portions of variance in the response variable explained by 1) spatial autocorrelation in species distribution that could result from limited dispersal at fine scales; 2) spatial autocorrelation in species distribution that could result from limited dispersal at broad scales; 3) species association with environmental variables spatially structured at fine scales; 4) species association with environmental variables spatially structured at broad scales; 5) species association with non-spatially structured variables; and 6) residual variation.
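The two-grid example given in Text A3 for the weighting matrix w can be reproduced in a few lines. This is a minimal sketch, assuming that both transects in a grid are neighbors of each other and that the two grids are connected; the object names are ours, and g_j is taken here as the size of the grid of the transect in column j, which makes no difference in this toy case because both grids hold two transects.

```r
# Toy example: two grids (A, B) with two transects each, grids connected to each other
grid <- c("A", "A", "B", "B")
same_grid <- outer(grid, grid, "==")

# Within-grid adjacency: 1 if two distinct transects share a grid
within_adj <- (same_grid & !diag(4)) * 1
n_i <- rowSums(within_adj)             # neighbors of each transect inside its grid
g_j <- as.vector(table(grid)[grid])    # number of transects in each transect's grid

# Within-grid weights: 1 / (1 + n_i)
w_within <- within_adj / (1 + n_i)

# Between-grid weights: 1 / [(1 + n_i) * g_j] for transects in connected grids
between_adj <- (!same_grid) * 1
w_between <- t(t(between_adj / (1 + n_i)) / g_j)

w <- w_within + w_between
print(w)
```

Each row of `w` sums to 1: a transect sends half of its weight to its single neighbor within the grid (0.5) and the other half to the two transects of the other grid combined (0.25 each), which matches the rule that within-grid connectivity equals the connectivity with all outside transects combined.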
Figure A1. Probability of detecting an association of termite species composition with soil bases and soil clay content when the number of transects and the number of sections within each transect are rarefied. Termite species composition was measured as the first axis of a Principal Coordinates Analysis (PCoA) on the Bray-Curtis pairwise dissimilarity matrix. Arrows represent two scenarios for spreading sections among transects with the same sampling effort (measured as the overall number of sections). The highest power is obtained by increasing the number of transects sampled. See main text for details on the measurements of soil bases and soil clay content.
Figure A2. Biplot based on a distance-based Redundancy Analysis (db-RDA) representing the association of termite species composition (response variable) and environmental variables (predictor variables) before (a-b) and after (c-d) the removal of spatial structure on termite data. Termite species composition was measured using the abundance balanced component of the Bray-Curtis dissimilarity index (Baselga 2013; summarized in PCoA axes in db-RDA analysis). Polygons represent clusters of transects delimited by the major rivers in Amazonia. Temp: mean annual temperature; Prec: mean annual precipitation.
Figure A3. Biplot based on a Non-metric Multidimensional Scaling analysis (NMDS) representing the association of termite species composition (response variable) and environmental variables (predictor variables) before (a-b) and after (c-d) the removal of spatial structure on termite data. Termite species composition was measured using the Bray-Curtis dissimilarity index (summarized in NMDS axes in NMDS analysis). Polygons in (a) and (c) represent clusters of transects delimited by the major rivers in Amazonia. Temp: mean annual temperature; Prec: mean annual precipitation.
References

Baselga, A. (2013) Separating the two components of abundance-based dissimilarity: balanced changes in abundance vs. abundance gradients. Methods in Ecology and Evolution, 4.
Dambros, C.S., Morais, J.W., Vasconcellos, A., Souza, J.L.P., Franklin, E. & Gotelli, N.J. (2016) Association of ant predators and edaphic conditions with termite diversity in an Amazonian rain forest. Biotropica.
Davies, R.G., Hernández, L.M., Eggleton, P., Didham, R.K., Fagan, L.L. & Winchester, N.N. (2003) Environmental and spatial influences upon species composition of a termite assemblage across neotropical forest islands. Journal of Tropical Ecology, 19.
Dray, S., Legendre, P. & Peres-Neto, P.R. (2006) Spatial modelling: a comprehensive framework for principal coordinate analysis of neighbour matrices (PCNM). Ecological Modelling, 196.
Jones, D.T. & Eggleton, P. (2000) Sampling termite assemblages in tropical forests: testing a rapid biodiversity assessment protocol. Journal of Applied Ecology, 37.
Magnusson, W.E., Lima, A.P., Luizão, R., Luizão, F., Costa, F.R., Castilho, C.V. de & Kinupp, V.F. (2005) RAPELD: a modification of the Gentry method for biodiversity surveys in long-term ecological research sites. Biota Neotropica, 5.
Appendix 2
Diversity and composition of termites in Amazonia

CSDambros
19 October, 2016

Abstract

This document describes the analyses conducted in the manuscript about termite distribution in Amazonia submitted to Ecography. Environmental data were previously extracted from rasters, and missing soil data were imputed as explained in the main text. All datasets used are publicly available on-line, and the R code provides links for their individual download.

Contents

1 Load required packages
2 Import data
3 Dealing with missing values
4 Select only predictors of interest
5 Create site x species matrix
  5.1 Calculate species overall abundance and species richness
  5.2 Calculate similarity matrices
6 MEMs construction
  6.1 Extract geographical coordinates for transects and grids
  6.2 Create connectivity matrix
    Transect-transect within grids
    Transect-transect among grids
    Merge local and regional dispersal into a single matrix
  6.3 Create Moran Eigenvectors Maps (MEMs)
  6.4 Selection of MEMs with spatial autocorrelation
7 Analyses (species richness and composition analyzed separately)
  7.1 Species richness
    Forward selection of MEMs
    Regression analysis - Species richness
      Simple regressions (without accounting for spatial autocorrelation)
      Regressions after the removal of spatial autocorrelation
    Variance partitioning
  7.2 Species composition
    Forward selection of MEMs
    RDA analysis - Species composition
      Simple RDA (without accounting for spatial autocorrelation)
      RDA after the removal of spatial autocorrelation
    Variance partitioning
    Biplot of RDA - Composition in biogeographic regions and environment
      Raw (not removing spatial autocorrelation)
      Residual (after removal of spatial autocorrelation)
8 Appendix: Additional Analyses
  8.1 Species composition using Non-Metric Multidimensional Scaling (NMDS)
    Run NMDS and extract scores
    Forward selection of MEMs
    Simple regression (without accounting for spatial autocorrelation)
    Regression after the removal of spatial autocorrelation
    Biplot of NMDS - Composition in biogeographic regions and environment
      Raw (not removing spatial autocorrelation)
      Residual (after removal of spatial autocorrelation)
  8.2 Species composition using only turnover component (nestedness removed)
    Calculate Simpson's dissimilarities (turnover component of Sorensen)
    Forward selection of MEMs
    RDA analysis
      Simple RDA (without accounting for spatial autocorrelation)
      RDA after the removal of spatial autocorrelation
    Variance partitioning
    Biplot of RDA using turnover - Composition in biogeographic regions and environment
      Raw (not removing spatial autocorrelation)
      Residual (after removal of spatial autocorrelation)
  8.3 Species composition using balanced abundances (turnover component of Bray-Curtis)
    Calculate balanced dissimilarities (turnover component of Bray)
    Forward selection of MEMs
    RDA analysis
      Simple RDA (without accounting for spatial autocorrelation)
      RDA after the removal of spatial autocorrelation
    Variance partitioning
    Biplot of RDA using turnover - Composition in biogeographic regions and environment
      Raw (not removing spatial autocorrelation)
      Residual (after removal of spatial autocorrelation)
  8.4 Species composition using RDA on Hellinger transformed termite data (not db-RDA)
    Hellinger transformation
    Forward selection of MEMs
    RDA analysis
      Simple RDA (without accounting for spatial autocorrelation)
      RDA after the removal of spatial autocorrelation
    Variance partitioning
    Biplot of RDA using turnover - Composition in biogeographic regions and environment
      Raw (not removing spatial autocorrelation)
      Residual (after removal of spatial autocorrelation)
1 Load required packages

Some of the analyses run in our study were turned into functions, which are publicly available on-line. The following lines load the required libraries, and download and source the necessary functions.

    library(vegan) # installed using install.packages("vegan")
    library(spdep) # installed using install.packages("spdep")
    library(sp) # installed using install.packages("sp")
    library(spacemakeR)
    # installed using:
    #install.packages("tripack")
    #install.packages("spacemakeR",repos="
    source("

2 Import data

As with the functions, we made our datasets publicly available on-line. They are downloaded using the next chunk of code. If you have problems running these lines, you might have an Internet connection problem (proxies can be a problem when trying to download things directly from R). Try to download the files to your directory using the provided links, and then read the files directly from your folder by changing the path to the file (e.g. use c://file_path instead of the URL). Some files imported here are not original datasets. These data were transformed to facilitate analyses. For example, environmental data from rasters were extracted before these analyses and incorporated into the main environmental dataframe. The original datasets, as well as the scripts used for data transformation, are available on request from the first author.

    # Termite data
    # Import termite record data
    isoptera<-read.csv("
    # Import termite species trait data
    isopterataxonid<-read.csv("
    # Import environmental data
    env<-read.csv("
    head(env)

Output of head(env), selected columns:

      PlotID           GridID      Region0  Region      Clay  K   Mg  Ca  P
    1 campusufam_1_1   campusufam  I        GuianaEast  NA    NA  NA  NA  NA
    2 campusufam_1_10  campusufam  I        GuianaEast  NA    NA  NA  NA  NA
    3 campusufam_1_11  campusufam  I        GuianaEast  NA    NA  NA  NA  NA
    4 campusufam_1_12  campusufam  I        GuianaEast  NA    NA  NA  NA  NA
    5 campusufam_1_13  campusufam  I        GuianaEast  NA    NA  NA  NA  NA
    6 campusufam_1_14  campusufam  I        GuianaEast  NA    NA  NA  NA  NA

Other columns: LONG, LAT, UTM_Easting, UTM_Northing, Temp, Prec, lnp.input, lnbases.input, PC1.input, PC2.input, TreeCover, Clay.input, K.input, Mg.input, Ca.input, P.input.

3 Dealing with missing values

Some soil variables were not available for all transects. Removing these data from the analyses could prevent the detection of the association of other variables with termite species richness and composition. For example, removing data from the Jaú National Park, a distinct biogeographic region with extreme values for temperature and precipitation, could affect the association of termite species composition with rivers and climate. To overcome the problem of missing data, we performed data imputation, that is, we filled in missing data with values. For transects with missing values, we randomly selected values from other transects (footnote 1). Grids with environmental data were sampled in all sampling regions of the study, so data imputation was spread across the study region. Moreover, 137 transects had all environmental data available, and climatic variables, tree cover, and biogeographic information were available for all transects, so data imputation was not necessary for these variables.

Footnote 1: This procedure will certainly add noise to the data and reduce the power of statistical tests (similar to the removal of sampling areas). The removal of areas with missing soil data provided similar results.

    # Impute missing soil data
    set.seed(102) # Guarantees the results will be exactly the same every time
    # For each column, replace missing values with a random sample from non-missing values
    for(i in 9:ncol(env)){
      # Create random sample from non-missing entries
      sample.env<-sample(env[!is.na(env[,i]),i],size = nrow(env),replace = TRUE)
      env[,i]<-ifelse(is.na(env[,i]),sample.env,env[,i]) # Replace NAs with random sample
    }
    head(env)

Output of head(env): the same six transects (campusufam_1_1 and campusufam_1_10 to campusufam_1_14, grid campusufam, region GuianaEast) and the same columns as above, with the soil columns (Clay, K, Mg, Ca, P) now filled in.

4 Select only predictors of interest

To facilitate analyses, we created a new dataframe with only those variables used as predictors in the statistical models. This makes the following code simpler, especially when all variables are used in a regression model.

    # Select only variables to be used as predictors, combine and log transform variables
    predictors<-data.frame(env[,c("Temp","Prec","TreeCover")],
      lnp=log(env$P+1),lnbases=log(env$K+env$Mg+env$Ca+1),Clay=env$Clay)
    # Define the variable rivers (biogeographic region)
    rivers<-factor(env$Region,
      levels=c("GuianaWest","GuianaEast","Negro","Inambari","Rondonia"))
    # Standardize predictor variables
    predictors.std<-decostand(predictors,"standardize")
    # Visualize the first rows of predictors
    head(predictors)

Output of head(predictors): columns Temp, Prec, TreeCover, lnp, lnbases, Clay.
5 Create site x species matrix

The termite data are provided in long format, that is, each occurrence is a row in the original spreadsheet. Species, sampling location, etc. are attributes (columns) in this spreadsheet. To run our analyses, we started by transforming the termite data into a wide table, where each row represents a sampling location and each column represents a species. Note that the long format contains more information and is preferable for storage. Before starting, the variables PlotID and TaxonID in table isoptera were reordered to be in the same order as these variables in the environmental table (env) and species info table (isopterataxonid). The number of subsamples sampled within each transect was also recorded.

    # Make sure transects in termite dataset are in the same order as
    # in environmental dataset
    isoptera$PlotID<-factor(isoptera$PlotID,levels=env$PlotID)
    # Make sure species in termite dataset are in the same order as in species dataset
    isoptera$TaxonID<-factor(isoptera$TaxonID,levels=isopterataxonid$TaxonID)
    attach(isoptera)
    # Create transect X species matrix
    termite.plot.obs<-tapply(isoptera$n,list(PlotID,TaxonID),sum)
    termite.plot.obs[is.na(termite.plot.obs)]<-0 # NAs are true zeros
    effort.plot<-tapply(n_subplots,PlotID,mean) # Number of sections sampled per transect
    detach(isoptera)

The number of subsamples in each transect varied, as can be checked with the following code.

    range(effort.plot)
    [1] 5 12

To avoid comparing transects with different sampling effort, transects with more than 5 sections were rarefied. In other words, 5 sections were randomly selected in each transect (without replacement), and the average abundance of individual species, probability of occurrence (average presence), species overall abundance, and species richness were calculated.
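The long-to-wide step and the rescaling to five sections can be tried on a toy table. All names and values below are hypothetical and do not come from the termite dataset; `with()` is used in place of `attach()` only to keep the sketch self-contained.

```r
# Toy long-format records (hypothetical values): one row per species record
records <- data.frame(
  PlotID  = c("t1", "t1", "t1", "t2", "t2"),
  TaxonID = c("spA", "spB", "spA", "spA", "spC"),
  N       = c(2, 1, 3, 4, 1)
)
# Fixing factor levels keeps rows and columns in a known order,
# and keeps transects or species with no records in the table
records$PlotID  <- factor(records$PlotID,  levels = c("t1", "t2", "t3"))
records$TaxonID <- factor(records$TaxonID, levels = c("spA", "spB", "spC"))

# Sum abundances per transect x species; empty cells come back as NA
wide <- with(records, tapply(N, list(PlotID, TaxonID), sum))
wide[is.na(wide)] <- 0                  # unrecorded combinations are true zeros

# Rescale to the abundance expected in five sections
effort <- c(t1 = 10, t2 = 5, t3 = 5)    # sections sampled per transect
wide5  <- 5 * wide / effort[rownames(wide)]
print(wide5)
```

Transect t1 was sampled with ten sections, so its counts are halved, while t2 keeps its observed counts and the empty transect t3 stays in the matrix as a row of zeros.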
If all transects had only 5 sections, the results would be exactly the same for the abundance and presence/absence matrices, species abundance per transect, and species richness per transect.

Abundance matrix

    termite.plot<-(5*termite.plot.obs)/as.vector(effort.plot)

Presence-Absence matrix (probability of occurrence)
    # With replacement
    #termite.plot.pa<-1-((1-termite.plot.obs/as.vector(effort))^5)
    # Without replacement (as simulated and used; ignore possible warnings)
    termite.plot.pa<-poccur(as.vector(effort.plot),termite.plot.obs,5)
    # NaNs are 1s
    termite.plot.pa[is.nan(termite.plot.pa)]<-1

5.1 Calculate species overall abundance and species richness

Now that the matrices of estimated species abundance and presence/absence are calculated, it is easy to calculate the expected overall abundance and species richness for all species. The procedure is the same as if we were not rarefying the community: overall species abundance is the sum of the abundances of all species, and the expected species richness is the sum of the probabilities of occurrence of all individual species.

Total abundance per transect

    # The sum of individual abundances is overall abundance
    termite.n<-rowSums(termite.plot,na.rm=TRUE)

Total species richness per transect

    # The sum of presences is species richness
    termite.s<-rowSums(termite.plot.pa,na.rm=TRUE)

5.2 Calculate similarity matrices

To quantify the changes in termite species composition, the Bray-Curtis pairwise dissimilarity matrix was calculated using the vegdist function. We added a column of 1s to the site x species dataframe so that sites without any shared species are not considered completely dissimilar, and sites without any species can also be included.

Composition

    termite.bray<-vegdist(cbind(termite.plot.pa,1),"bray") # Used in db-RDA analyses
    termite.pcoa<-cmdscale(termite.bray,k=2,add = TRUE)

6 MEMs construction

This section can be skipped if the reader is not interested in the particularities of the sampling design used in this paper. A more general and simpler way to construct MEMs is provided in Dray et al. (2012). In this paper, transects were nested within grids, and this nested design was used to create an overall connectivity matrix.
The connectivity matrix represents two hierarchical levels, and assumes that transects within a grid are much more connected to each other than transects in separate grids. Moreover, all transects within a grid have the same connectivity with a transect in another grid. The hierarchical matrix was designed to represent local communities (within grids) and a broad metacommunity (between grids).
6.1 Extract geographical coordinates for transects and grids

In this step we start preparing the data for the construction of Moran Eigenvector Maps. The first step is to create a matrix with two columns representing the spatial coordinates of the sampling transects (e.g. LatLong). We then aggregated the coordinates of transects within each grid. To obtain the coordinates of a grid, the mean of the coordinates of the individual transects in the grid was calculated.

    coords<-env[,c("UTM_Easting","UTM_Northing")] # The same as LongLat but in UTM (meters)
    regional.coords<-aggregate(coords,list(env$GridID),mean) # Spatial coordinates for grids

6.2 Create connectivity matrix

Because we wanted to treat transects within the same grid differently from those in distinct grids (as in communities within a metacommunity), we started by creating two connectivity matrices: transect-transect within individual grids and transect-transect between grids. Note that if you have a simpler sampling design, you can create a single matrix representing the pairwise connectivity between all pairs of sites, and then run the eigen analysis on it (see R code in Dray et al. 2012). However, using this simple method in our analyses would produce a connectivity matrix in which a single transect from a grid is connected to a single transect in the adjacent grid. Moreover, some transects would be more connected to transects in other grids than to other transects within the same grid.

Transect-transect within grids

In the within matrix, all transects that are separated by less than 1700 meters are connected. This threshold is simply a number greater than the diagonal distance between transects (the hypotenuse, the square root of 1 km squared plus 1 km squared, or about 1414 m; see map in the main article). This number is referred to as the truncation threshold in the literature.
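The effect of the 1700 m threshold can be checked on a toy layout. The sketch below assumes a hypothetical 3 x 3 grid of transects spaced exactly 1 km apart; the coordinates and object names are made up for illustration.

```r
# Toy 3 x 3 grid of transects spaced 1 km apart (coordinates in meters)
coords <- expand.grid(x = c(0, 1000, 2000), y = c(0, 1000, 2000))
d <- as.matrix(dist(coords))

# Diagonal neighbors are sqrt(1000^2 + 1000^2), about 1414 m, apart, so a 1700 m
# threshold links the surrounding transects but not those 2000 m away
adj <- (d < 1700 & d > 0) * 1

n_neighbors <- rowSums(adj)
within <- adj / (n_neighbors + 1)   # row i scaled by 1 / (1 + n_i)
print(n_neighbors)
```

The central transect gets eight neighbors (a Moore neighborhood), edge transects get five and corner transects get three. Each row of `within` sums to n_i / (1 + n_i), leaving a share of 1 / (1 + n_i) for connections outside the grid.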
    # Create dispersal matrix to represent spatial autocorrelation
    # Local communities
    # Connect plots that are less than 1700 meters apart, but not self
    local.nb.mat<-((as.matrix(dist(coords))<1700&as.matrix(dist(coords))>5)*1)
    # Prob of leaving a plot to metacom = prob of leaving to another plot within grid
    pleave<-1/(colSums(local.nb.mat)+1)
    # probs to disperse to neighboring plots within grid
    within<-t(local.nb.mat/(colSums(local.nb.mat)+1))

Transect-transect among grids

The between-grid connectivity matrix is similar to the within-grid matrix, but the criterion used to connect transects is different. There is now an extra step that creates a Gabriel graph, which determines which grids are connected to one another.

    # Regional community (dispersal between grids of plots)
    # determine in which grid (hub in metacomm) a particular transect is
    # (factor() keeps this working when GridID is read as character)
    reg.hub<-as.integer(factor(env$GridID))
    # Use a Gabriel graph to create metacommunity structure (requires spdep package)
    regional.nb<-graph2nb(gabrielneigh(as.matrix(regional.coords[,2:3])),
      sym=TRUE,row.names = regional.coords[,1])
    # Calculate the number of neighbors a grid (hub) has
    reg.nlink<-sapply(regional.nb,length)
    # Create a matrix of 1s & 0s of dispersal from transects to grids
    regional.nb.mat<-sapply(regional.nb,function(x){(1*matrix(reg.hub%in%as.integer(x)))})
    # Create matrix of dispersal from plot to plot through the metacommunity
    regional.nb.mat1<-t(t(regional.nb.mat)*(1/reg.nlink))[,reg.hub]
    # Define how many plots each grid is connected to
    hub.out<-(1/(table(reg.hub)))[reg.hub]
    # Final probability of dispersal plot-plot in metacommunity
    between<-matrix(hub.out,nrow(env),nrow(env))*regional.nb.mat1
    between.nb<-sapply(as.data.frame((t(pleave*t(between)))),function(x)(1:length(x))[x!=0])
    attributes(between.nb)<-list(class="nb",sym=TRUE)

Merge local and regional dispersal into a single matrix

In a final step, the within and between connectivity matrices were merged. In this final matrix, pairs of transects were connected by being in the same grid, or by being in distinct grids that were close to each other. We determined that the probability of an individual leaving the grid would be the same as the probability of moving from one transect to the next within the grid (see footnote 2).

    # merge local and regional:
    # p(plot2plot)=p(leave & arrive by metacomm hub) or p(stay in grid and move)
    plot2plot<-(t(pleave*t(between)))+within
    # transform into a list removing zeros (sparse matrix)
    plot2plot.nb<-sapply(as.data.frame(plot2plot),function(x)(1:length(x))[x!=0])
    # Transform into a class nb (spdep), just for plotting
    attributes(plot2plot.nb)<-list(class="nb",sym=TRUE)

After all this hard work, here is a graph showing the between and within grid connectivity.
Remember that all this complexity is particular to this study; other sampling designs and other ecological questions might require different approaches.

[2] This procedure differs from the usual one in PCNM analyses. In most studies, the connectivity is 1 divided by the total number of connections, so that the connectivity of one transect to any other is the same. Here, the connectivity of one transect to another within the grid is 1/(neighbors in the grid + 1), whereas the connectivity of one transect to another in another grid is 1/(neighbors in same grid + transects in the other grid).

6.3 Create Moran Eigenvector Maps (MEMs)

Now that a single connectivity matrix W has been created, W needs to be centered. An eigen analysis of the centered matrix then generates multiple vectors, called Moran Eigenvector Maps (MEMs). We used functions from the spdep and spacemakeR packages to do this.
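As a rough illustration of the centering and eigen analysis described in section 6.3, the base R sketch below builds a toy connectivity matrix for five sites along a line, double-centers it, and extracts the eigenvectors. All object names (coords.toy, W.toy, mem.toy) and the toy coordinates are hypothetical and are not part of the study. The symmetrization step is an assumption of this sketch, and it is not a substitute for the package functions used in the analysis.

```r
# Toy example: five sites along a line, 1 km apart (hypothetical data)
coords.toy <- cbind(x = 0:4, y = 0)
d <- as.matrix(dist(coords.toy))

# Binary connectivity: connect sites at most 1.5 km apart, excluding self
W.toy <- (d < 1.5 & d > 0) * 1

# Symmetrize W (a no-op here; assumed step for matrices that are not symmetric)
W.sym <- (W.toy + t(W.toy)) / 2

# Double-center W: remove row and column means
n <- nrow(W.sym)
C <- diag(n) - matrix(1 / n, n, n)
W.centered <- C %*% W.sym %*% C

# Eigen analysis: the eigenvectors are the MEMs and the eigenvalues are
# proportional to the Moran's I of the corresponding eigenvector
eig <- eigen(W.centered, symmetric = TRUE)

# Drop the eigenvector tied to the zero eigenvalue introduced by centering
keep <- abs(eig$values) > 1e-8
mem.toy <- eig$vectors[, keep, drop = FALSE]
mem.values <- eig$values[keep]
```

In this toy case the analysis returns n - 1 = 4 vectors with both positive and negative eigenvalues, which mirrors why the two signs are tested with different alternative hypotheses in section 6.4.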
Figure 1: Connectivity of all transects, with detail for one of the sampling grids.

    # Calculate Moran eigenvectors representing spatial autocorrelation
    glist <- sapply(as.data.frame(plot2plot), function(x) x[x != 0])  # create spatial weights list
    lw1 <- nb2listw(plot2plot.nb, glist)  # transform nb object into listw (with spatial weights)
    MEMs <- scores.listw(lw1)  # calculate eigenvalues and eigenvectors for W

    listw not symmetric, (w+t(w)) used in the place of w

6.4 Selection of MEMs with spatial autocorrelation

Because the eigen analysis generated 197 vectors that could each be used as a predictor variable in regression and RDA models, we selected only those vectors with significant spatial autocorrelation. Note that the alternative hypothesis differs between MEMs associated with positive and negative eigenvalues. Vectors with negative eigenvalues represent negative spatial autocorrelation, so for these vectors we are interested in the probability of obtaining values as low as, or lower than, the observed one.

    # Detect whether Moran eigenvectors have significant spatial autocorrelation
    pvals.less <- apply(MEMs$vectors, 2, morani.pvals, w = plot2plot, alternative = "less", reps = 1000)
    pvals.greater <- apply(MEMs$vectors, 2, morani.pvals, w = plot2plot, alternative = "greater", reps = 1000)
    pvals <- ifelse(MEMs$values < 0, pvals.less, pvals.greater)
    # pvals <- pvals.greater  # when using only positive eigenvalues
    # Select only vectors with significant spatial autocorrelation
    MEMs.signif.vec <- as.data.frame(MEMs$vectors[, pvals < 0.01])
    MEMs.signif.val <- MEMs$values[pvals < 0.01]
    names(MEMs.signif.val) <- colnames(MEMs.signif.vec)

7 Analyses (species richness and composition analyzed separately)

In the previous section, MEMs were created to represent spatial structure in the data. These MEMs are vectors (similar to vectors representing environmental variables, latitude, longitude, etc.) that can be incorporated into any classical statistical analysis, such as regression. In this section, we show how analyses using MEMs as covariates are performed to test the effects of environment and space on species richness and composition. The first step in all these models is to reduce the number of MEMs so that only MEMs with high explanatory power are included in the statistical models.

7.1 Species richness

Forward selection of MEMs

To further reduce the number of spatial predictor variables in our regression and RDA models, we selected only MEMs that had high explanatory power for an individual response variable. Note that the rda function from the vegan package is used even for termite species richness (a univariate response). RDA is an extension of regression to multiple response variables, and it is exactly the same as a regression when used with a single response variable. To facilitate the forward selection using R2, and to standardize our procedures, we used the rda function for both univariate and multivariate models.
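The idea of forward selection on R2 can be sketched in base R with lm and simulated data. This is a simplified stand-in and not the procedure implemented in vegan: it adds, at each step, the candidate that gives the largest adjusted R2 and stops when no candidate improves it, with no permutation test. The data, the helper adj.r2.vars, and all object names are hypothetical.

```r
set.seed(1)
# Simulated data (hypothetical): 60 sites, 8 candidate spatial predictors
n <- 60
X <- as.data.frame(matrix(rnorm(n * 8), n, 8))  # columns are named V1..V8
y <- 2 * X$V3 - 1.5 * X$V6 + rnorm(n)           # response depends on V3 and V6

# Adjusted R2 of a linear model of y on a given set of predictor names
adj.r2.vars <- function(vars) {
  if (length(vars) == 0) return(0)  # intercept-only model
  dat <- cbind(y = y, X[, vars, drop = FALSE])
  summary(lm(y ~ ., data = dat))$adj.r.squared
}

selected <- character(0)
repeat {
  candidates <- setdiff(names(X), selected)
  if (length(candidates) == 0) break
  # Adjusted R2 obtained by adding each remaining candidate in turn
  gains <- sapply(candidates, function(v) adj.r2.vars(c(selected, v)))
  # Stop when no candidate improves the adjusted R2
  if (max(gains) <= adj.r2.vars(selected)) break
  selected <- c(selected, names(which.max(gains)))
}
selected
```

Because adjusted R2 alone is a permissive stopping rule, a sketch like this may also retain a few noise variables alongside the true predictors.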
    set.seed(102)  # use to obtain exactly the same results
    # Regression model with the intercept as the only predictor
    reg.s0 <- rda(termite.s ~ 1, data = MEMs.signif.vec)
    # Create model with all MEMs as predictors
    reg.s.mem <- rda(termite.s ~ ., data = MEMs.signif.vec)
    # Use rda to perform forward selection based on R2
    reg.s.fw <- ordiR2step(reg.s0, formula(reg.s.mem), steps = 10000,
                           direction = "forward", R2scope = FALSE)
    sel <- attr(reg.s.fw$terms, "term.labels")
    MEMs.S.sel <- MEMs.signif.vec[, sel]
    MEMvals.S.sel <- MEMs.signif.val[sel]

Using either the rda or the lm function, it is simple to calculate the variation in termite species richness explained by broad- and local-scale spatial autocorrelation.

Regression analysis - Species richness

Simple regressions (without accounting for spatial autocorrelation)

After obtaining the spatial predictors for our models (MEMs), we can use them as covariates along with the environmental predictors and biogeographic regions.
    # Model with only environmental data
    reg.s <- lm(termite.s ~ ., data = cbind(predictors.std, rivers)); summary(reg.s)

    Call:
    lm(formula = termite.s ~ ., data = cbind(predictors.std, rivers))

    Residuals:
    Min 1Q Median 3Q Max

    Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
    (Intercept)                                    < 2e-16 ***
    Temp                                                   **
    Prec                                                   *
    TreeCover
    lnp
    lnbases
    Clay
    riversguianaeast
    riversnegro
    riversinambari
    riversrondonia
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: on 187 degrees of freedom
    Multiple R-squared: , Adjusted R-squared:
    F-statistic: on 10 and 187 DF, p-value: 2.255e

Regressions after the removal of spatial autocorrelation

Because the output of the forward selection is a regression model with the spatial predictors, we can extract the residuals from this model. The residuals can then be regressed against the environmental variables. In other words, we are asking whether the variance not explained by space can be explained by the environment.

    # Combine all data into a single data frame
    alldata <- cbind(predictors.std, rivers, MEMs.S.sel)
    # Regression analysis
    reg.s.full <- lm(termite.s ~ ., data = alldata); summary(reg.s.full)

    Call:
    lm(formula = termite.s ~ ., data = alldata)

    Residuals:
    Min 1Q Median 3Q Max

    Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
    (Intercept)                                       e-10 ***
    Temp
    Prec                                                   *
    TreeCover
    lnp
    lnbases
    Clay
    riversguianaeast                                       *
    riversnegro                                            **
    riversinambari                                         *
    riversrondonia                                         **
    V
    V
    V                                                 e-05 ***
    V
    V                                                      **
    V                                                      **
    V                                                      **
    V                                                      **
    V                                                      ***
    V                                                      **
    V
    V                                                      *
    V
    V                                                      *
    V
    V                                                      *
    V                                                      *
    V                                                      *
    V                                                      *
    V                                                      *
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: on 167 degrees of freedom
    Multiple R-squared: , Adjusted R-squared:
    F-statistic: 8.98 on 30 and 167 DF, p-value: < 2.2e-16

    reg.s.noriver <- lm(termite.s ~ ., data = cbind(predictors.std, MEMs.S.sel))
    anova(reg.s.full, reg.s.noriver)

    Analysis of Variance Table

    Model 1: termite.s ~ Temp + Prec + TreeCover + lnp + lnbases + Clay + rivers +
        V2 + V1 + V10 + V4 + V43 + V17 + V36 + V39 + V153 + V48 + V8 + V126 +
        V47 + V118 + V13 + V92 + V163 + V159 + V149 + V26
    Model 2: termite.s ~ Temp + Prec + TreeCover + lnp + lnbases + Clay +
        V2 + V1 + V10 + V4 + V43 + V17 + V36 + V39 + V153 + V48 + V8 + V126 +
        V47 + V118 + V13 + V92 + V163 + V159 + V149 + V26
      Res.Df RSS Df Sum of Sq F Pr(>F)
                                       *
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    # Using rda and permutation to test for significance of individual variables
    # (similar to regression)
    # To use varpart with a mix of quantitative and categorical variables
    hs1 <- dudi.hillsmith(rivers, scannf = FALSE, nf = 50)
    # Temp individually
    anova(rda(termite.s, predictors.std[, 1], cbind(predictors.std[, -1], hs1$li, MEMs.S.sel)))

    Permutation test for rda under reduced model
    Permutation: free
    Number of permutations: 999

    Model: rda(X = termite.s, Y = predictors.std[, 1], Z = cbind(predictors.std[, -1], hs1$li, MEMs.S.sel
             Df Variance F Pr(>F)
    Model
    Residual

    # Prec individually
    anova(rda(termite.s, predictors.std[, 2], cbind(predictors.std[, -2], hs1$li, MEMs.S.sel)))

    Permutation test for rda under reduced model
    Permutation: free
    Number of permutations: 999

    Model: rda(X = termite.s, Y = predictors.std[, 2], Z = cbind(predictors.std[, -2], hs1$li, MEMs.S.sel
             Df Variance F Pr(>F)
    Model                         *
    Residual
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    # ...
    # Not interested in individual variables at this moment

Variance partitioning

We can partition the variance into many different components, and the varpart function simplifies this procedure.

    # Decompose R2 into Rivers, Environment, and Space
    # Importance of components
    plot(varpart(termite.s, hs1$li, predictors.std, MEMs.S.sel), Xnames = c("Riv", "Env", "Dist"))
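The logic of the partitioning can be sketched in base R for the simpler case of two sets of predictors. This sketch assumes the usual decomposition based on adjusted R2 values of the separate and combined models, uses simulated data, and is not the three-set analysis of rivers, environment and space performed in this study. The helper adj.r2.set and all object names are hypothetical.

```r
set.seed(2)
# Simulated data (hypothetical): 100 sites, two sets of predictors
n <- 100
space <- data.frame(s1 = rnorm(n), s2 = rnorm(n))
env <- data.frame(e1 = 0.6 * space$s1 + rnorm(n),  # e1 is spatially structured
                  e2 = rnorm(n))
y <- 1.0 * env$e1 + 0.8 * space$s2 + rnorm(n)

# Adjusted R2 of y regressed on a data frame of predictors
adj.r2.set <- function(pred) {
  summary(lm(y ~ ., data = cbind(y = y, pred)))$adj.r.squared
}

r2.env <- adj.r2.set(env)
r2.space <- adj.r2.set(space)
r2.both <- adj.r2.set(cbind(env, space))

fractions <- c(
  env.only   = r2.both - r2.space,           # explained uniquely by environment
  shared     = r2.env + r2.space - r2.both,  # explained jointly
  space.only = r2.both - r2.env,             # explained uniquely by space
  residual   = 1 - r2.both                   # unexplained
)
round(fractions, 3)
```

By construction the four fractions sum to 1. The shared fraction can be negative, which is why a partitioning plot may omit values below zero.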
[Figure: variance partitioning diagram with the components Riv, Env and Dist. Residuals = 0.45. Values < 0 not shown.]

    # Test for significance of Rivers, Environment, and Space
    anova(rda(termite.s, hs1$li, cbind(MEMs.S.sel, predictors)))  # rivers

    Permutation test for rda under reduced model
    Permutation: free
    Number of permutations: 999

    Model: rda(X = termite.s, Y = hs1$li, Z = cbind(MEMs.S.sel, predictors))
             Df Variance F Pr(>F)
    Model                         *
    Residual
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    anova(rda(termite.s, predictors, cbind(MEMs.S.sel, hs1$li)))  # environment

    Permutation test for rda under reduced model
    Permutation: free
    Number of permutations: 999

    Model: rda(X = termite.s, Y = predictors, Z = cbind(MEMs.S.sel, hs1$li))
             Df Variance F Pr(>F)
    Model                         *
    Residual
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    anova(rda(termite.s, MEMs.S.sel, cbind(predictors, hs1$li)))  # space

    Permutation test for rda under reduced model
    Permutation: free
    Number of permutations: 999

    Model: rda(X = termite.s, Y = MEMs.S.sel, Z = cbind(predictors, hs1$li))
             Df Variance F Pr(>F)
    Model                         ***
    Residual
    ---
    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

7.2 Species composition

Forward selection of MEMs

As before, the first step in the analysis of species composition was to retain only the spatial predictors (MEMs) with high explanatory power for species composition. Because we are using Bray-Curtis dissimilarities as the response variable in the Redundancy Analysis, we use the capscale function (distance-based RDA, or simply db-RDA).

    set.seed(102)  # use to obtain exactly the same results
    # db-RDA model with the intercept as the only predictor
    rda.comp0 <- capscale(termite.bray ~ 1, data = MEMs.signif.vec, add = TRUE)
    # Create model with all MEMs as predictors
    rda.comp.mem <- capscale(termite.bray ~ ., data = MEMs.signif.vec, add = TRUE)
    # Use db-RDA to perform forward selection based on R2
    rda.comp.fw <- ordiR2step(rda.comp0, formula(rda.comp.mem), direction = "forward")
    sel <- attr(rda.comp.fw$terms, "term.labels")
    MEMs.comp.sel <- MEMs.signif.vec[, sel]
    MEMvals.comp.sel <- MEMs.signif.val[sel]

Again, we calculate the total variation in termite species composition (db-RDA axes) explained by space.

RDA analysis - Species composition

As with richness, after obtaining the spatial predictors for our models (MEMs), we can use them as covariates along with the environmental predictors.

Simple RDA (without accounting for spatial autocorrelation)

    # Model with only environmental data
    rda.comp <- capscale(termite.bray ~ ., data = cbind(predictors.std), add = TRUE)
    anova(rda.comp, by = "margin")

    Permutation test for capscale under reduced model
    Marginal effects of terms
    Permutation: free
    Number of permutations: 999

    Model: capscale(formula = termite.bray ~ Temp + Prec + TreeCover + lnp + lnbases + Clay, data = cbind
             Df SumOfSqs F Pr(>F)
    Temp                          ***
Diversity and composition of termites in Amazonia CSDambros 09 January, 2015
Diversity and composition of termites in Amazonia CSDambros 09 January, 2015 Put the abstract here Missing code is being cleaned. Abstract Contents 1 Intro 3 2 Load required packages 3 3 Import data 3
More informationAnalysis of Multivariate Ecological Data
Analysis of Multivariate Ecological Data School on Recent Advances in Analysis of Multivariate Ecological Data 24-28 October 2016 Prof. Pierre Legendre Dr. Daniel Borcard Département de sciences biologiques
More informationCommunity surveys through space and time: testing the space-time interaction in the absence of replication
Community surveys through space and time: testing the space-time interaction in the absence of replication Pierre Legendre, Miquel De Cáceres & Daniel Borcard Département de sciences biologiques, Université
More informationCommunity surveys through space and time: testing the space-time interaction in the absence of replication
Community surveys through space and time: testing the space-time interaction in the absence of replication Pierre Legendre Département de sciences biologiques Université de Montréal http://www.numericalecology.com/
More information4/4/2018. Stepwise model fitting. CCA with first three variables only Call: cca(formula = community ~ env1 + env2 + env3, data = envdata)
0 Correlation matrix for ironmental matrix 1 2 3 4 5 6 7 8 9 10 11 12 0.087451 0.113264 0.225049-0.13835 0.338366-0.01485 0.166309-0.11046 0.088327-0.41099-0.19944 1 1 2 0.087451 1 0.13723-0.27979 0.062584
More informationAppendix A : rational of the spatial Principal Component Analysis
Appendix A : rational of the spatial Principal Component Analysis In this appendix, the following notations are used : X is the n-by-p table of centred allelic frequencies, where rows are observations
More informationCommunity surveys through space and time: testing the space time interaction
Suivi spatio-temporel des écosystèmes : tester l'interaction espace-temps pour identifier les impacts sur les communautés Community surveys through space and time: testing the space time interaction Pierre
More information4/2/2018. Canonical Analyses Analysis aimed at identifying the relationship between two multivariate datasets. Cannonical Correlation.
GAL50.44 0 7 becki 2 0 chatamensis 0 darwini 0 ephyppium 0 guntheri 3 0 hoodensis 0 microphyles 0 porteri 2 0 vandenburghi 0 vicina 4 0 Multiple Response Variables? Univariate Statistics Questions Individual
More informationTemporal eigenfunction methods for multiscale analysis of community composition and other multivariate data
Temporal eigenfunction methods for multiscale analysis of community composition and other multivariate data Pierre Legendre Département de sciences biologiques Université de Montréal Pierre.Legendre@umontreal.ca
More informationFigure 43 - The three components of spatial variation
Université Laval Analyse multivariable - mars-avril 2008 1 6.3 Modeling spatial structures 6.3.1 Introduction: the 3 components of spatial structure For a good understanding of the nature of spatial variation,
More informationIsolation by distance, not rivers, control the distribution of termite species in the Amazonian rain forest
Ecography 40: 1242 1250, 2017 doi: 10.1111/ecog.02663 2016 The Authors. Ecography 2016 Nordic Society Oikos Subject Editor: Andres Baselga. Editor-in-Chief: Catherine Graham. Accepted 12 September 2016
More informationChapter 11 Canonical analysis
Chapter 11 Canonical analysis 11.0 Principles of canonical analysis Canonical analysis is the simultaneous analysis of two, or possibly several data tables. Canonical analyses allow ecologists to perform
More informationSupplementary Material
Supplementary Material The impact of logging and forest conversion to oil palm on soil bacterial communities in Borneo Larisa Lee-Cruz 1, David P. Edwards 2,3, Binu Tripathi 1, Jonathan M. Adams 1* 1 Department
More informationMultivariate Analysis of Ecological Data using CANOCO
Multivariate Analysis of Ecological Data using CANOCO JAN LEPS University of South Bohemia, and Czech Academy of Sciences, Czech Republic Universitats- uric! Lanttesbibiiothek Darmstadt Bibliothek Biologie
More informationVarCan (version 1): Variation Estimation and Partitioning in Canonical Analysis
VarCan (version 1): Variation Estimation and Partitioning in Canonical Analysis Pedro R. Peres-Neto March 2005 Department of Biology University of Regina Regina, SK S4S 0A2, Canada E-mail: Pedro.Peres-Neto@uregina.ca
More informationPart II { Oneway Anova, Simple Linear Regression and ANCOVA with R
Part II { Oneway Anova, Simple Linear Regression and ANCOVA with R Gilles Lamothe February 21, 2017 Contents 1 Anova with one factor 2 1.1 The data.......................................... 2 1.2 A visual
More informationContinuous soil attribute modeling and mapping: Multiple linear regression
Continuous soil attribute modeling and mapping: Multiple linear regression Soil Security Laboratory 2017 1 Multiple linear regression Multiple linear regression (MLR) is where we regress a target variable
More information2/7/2018. Strata. Strata
The strata option allows you to control how permutations are done. Specifically, to constrain permutations. Why would you want to do this? In this dataset, there are clear differences in area (A vs. B),
More informationSpatial eigenfunction modelling: recent developments
Spatial eigenfunction modelling: recent developments Pierre Legendre Département de sciences biologiques Université de Montréal http://www.numericalecology.com/ Pierre Legendre 2018 Outline of the presentation
More informationDETECTING BIOLOGICAL AND ENVIRONMENTAL CHANGES: DESIGN AND ANALYSIS OF MONITORING AND EXPERIMENTS (University of Bologna, 3-14 March 2008)
Dipartimento di Biologia Evoluzionistica Sperimentale Centro Interdipartimentale di Ricerca per le Scienze Ambientali in Ravenna INTERNATIONAL WINTER SCHOOL UNIVERSITY OF BOLOGNA DETECTING BIOLOGICAL AND
More informationDissimilarity and transformations. Pierre Legendre Département de sciences biologiques Université de Montréal
and transformations Pierre Legendre Département de sciences biologiques Université de Montréal http://www.numericalecology.com/ Pierre Legendre 2017 Definitions An association coefficient is a function
More informationBIO 682 Multivariate Statistics Spring 2008
BIO 682 Multivariate Statistics Spring 2008 Steve Shuster http://www4.nau.edu/shustercourses/bio682/index.htm Lecture 11 Properties of Community Data Gauch 1982, Causton 1988, Jongman 1995 a. Qualitative:
More informationINTRODUCTION TO MULTIVARIATE ANALYSIS OF ECOLOGICAL DATA
INTRODUCTION TO MULTIVARIATE ANALYSIS OF ECOLOGICAL DATA David Zelený & Ching-Feng Li INTRODUCTION TO MULTIVARIATE ANALYSIS Ecologial similarity similarity and distance indices Gradient analysis regression,
More informationMultivariate Data Analysis a survey of data reduction and data association techniques: Principal Components Analysis
Multivariate Data Analysis a survey of data reduction and data association techniques: Principal Components Analysis For example Data reduction approaches Cluster analysis Principal components analysis
More informationSTATISTICS 174: APPLIED STATISTICS FINAL EXAM DECEMBER 10, 2002
Time allowed: 3 HOURS. STATISTICS 174: APPLIED STATISTICS FINAL EXAM DECEMBER 10, 2002 This is an open book exam: all course notes and the text are allowed, and you are expected to use your own calculator.
More informationAnalysis of Multivariate Ecological Data
Analysis of Multivariate Ecological Data School on Recent Advances in Analysis of Multivariate Ecological Data 24-28 October 2016 Prof. Pierre Legendre Dr. Daniel Borcard Département de sciences biologiques
More informationLecture 2: Diversity, Distances, adonis. Lecture 2: Diversity, Distances, adonis. Alpha- Diversity. Alpha diversity definition(s)
Lecture 2: Diversity, Distances, adonis Lecture 2: Diversity, Distances, adonis Diversity - alpha, beta (, gamma) Beta- Diversity in practice: Ecological Distances Unsupervised Learning: Clustering, etc
More informationPartial regression and variation partitioning
Partial regression and variation partitioning Pierre Legendre Département de sciences biologiques Université de Montréal http://www.numericalecology.com/ Pierre Legendre 2017 Outline of the presentation
More informationIntroduction to multivariate analysis Outline
Introduction to multivariate analysis Outline Why do a multivariate analysis Ordination, classification, model fitting Principal component analysis Discriminant analysis, quickly Species presence/absence
More informationAnalysis of community ecology data in R
Analysis of community ecology data in R Jinliang Liu ( 刘金亮 ) Institute of Ecology, College of Life Science Zhejiang University Email: jinliang.liu@foxmail.com http://jinliang.weebly.com R packages ###
More informationHistorical contingency, niche conservatism and the tendency for some taxa to be more diverse towards the poles
Electronic Supplementary Material Historical contingency, niche conservatism and the tendency for some taxa to be more diverse towards the poles Ignacio Morales-Castilla 1,2 *, Jonathan T. Davies 3 and
More informationNatureza & Conservação Brazilian Journal of Nature Conservation
NAT CONSERVACAO. 2014; 12(1):42-46 Natureza & Conservação Brazilian Journal of Nature Conservation Supported by O Boticário Foundation for Nature Protection Research Letters Spatial and environmental patterns
More informationDistance Measures. Objectives: Discuss Distance Measures Illustrate Distance Measures
Distance Measures Objectives: Discuss Distance Measures Illustrate Distance Measures Quantifying Data Similarity Multivariate Analyses Re-map the data from Real World Space to Multi-variate Space Distance
More informationExperimental Design and Data Analysis for Biologists
Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1
More informationMultivariate Statistics Summary and Comparison of Techniques. Multivariate Techniques
Multivariate Statistics Summary and Comparison of Techniques P The key to multivariate statistics is understanding conceptually the relationship among techniques with regards to: < The kinds of problems
More informationBIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression
BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression Introduction to Correlation and Regression The procedures discussed in the previous ANOVA labs are most useful in cases where we are interested
More information-Principal components analysis is by far the oldest multivariate technique, dating back to the early 1900's; ecologists have used PCA since the
1 2 3 -Principal components analysis is by far the oldest multivariate technique, dating back to the early 1900's; ecologists have used PCA since the 1950's. -PCA is based on covariance or correlation
More informationEXAM PRACTICE. 12 questions * 4 categories: Statistics Background Multivariate Statistics Interpret True / False
EXAM PRACTICE 12 questions * 4 categories: Statistics Background Multivariate Statistics Interpret True / False Stats 1: What is a Hypothesis? A testable assertion about how the world works Hypothesis
More information2/19/2018. Dataset: 85,122 islands 19,392 > 1km 2 17,883 with data
The group numbers are arbitrary. Remember that you can rotate dendrograms around any node and not change the meaning. So, the order of the clusters is not meaningful. Taking a subset of the data changes
More informationMultivariate Analysis of Ecological Data
Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology
More information1 Multiple Regression
1 Multiple Regression In this section, we extend the linear model to the case of several quantitative explanatory variables. There are many issues involved in this problem and this section serves only
More informationDimensionality Reduction Techniques (DRT)
Dimensionality Reduction Techniques (DRT) Introduction: Sometimes we have lot of variables in the data for analysis which create multidimensional matrix. To simplify calculation and to get appropriate,
More informationInference with Heteroskedasticity
Inference with Heteroskedasticity Note on required packages: The following code requires the packages sandwich and lmtest to estimate regression error variance that may change with the explanatory variables.
More informationMS&E 226: Small Data
MS&E 226: Small Data Lecture 15: Examples of hypothesis tests (v5) Ramesh Johari ramesh.johari@stanford.edu 1 / 32 The recipe 2 / 32 The hypothesis testing recipe In this lecture we repeatedly apply the
More informationOrdination & PCA. Ordination. Ordination
Ordination & PCA Introduction to Ordination Purpose & types Shepard diagrams Principal Components Analysis (PCA) Properties Computing eigenvalues Computing principal components Biplots Covariance vs. Correlation
More informationMultiple Predictor Variables: ANOVA
Multiple Predictor Variables: ANOVA 1/32 Linear Models with Many Predictors Multiple regression has many predictors BUT - so did 1-way ANOVA if treatments had 2 levels What if there are multiple treatment
More informationSpatial Analysis I. Spatial data analysis Spatial analysis and inference
Spatial Analysis I Spatial data analysis Spatial analysis and inference Roadmap Outline: What is spatial analysis? Spatial Joins Step 1: Analysis of attributes Step 2: Preparing for analyses: working with
More informationST430 Exam 2 Solutions
ST430 Exam 2 Solutions Date: November 9, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textbook are permitted but you may use a calculator. Giving
More information1.3. Principal coordinate analysis. Pierre Legendre Département de sciences biologiques Université de Montréal
1.3. Pierre Legendre Département de sciences biologiques Université de Montréal http://www.numericalecology.com/ Pierre Legendre 2018 Definition of principal coordinate analysis (PCoA) An ordination method
More informationRegression and the 2-Sample t
Regression and the 2-Sample t James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Regression and the 2-Sample t 1 / 44 Regression
More informationR Demonstration ANCOVA
R Demonstration ANCOVA Objective: The purpose of this week s session is to demonstrate how to perform an analysis of covariance (ANCOVA) in R, and how to plot the regression lines for each level of the
More informationDIMENSION REDUCTION AND CLUSTER ANALYSIS
DIMENSION REDUCTION AND CLUSTER ANALYSIS EECS 833, 6 March 2006 Geoff Bohling Assistant Scientist Kansas Geological Survey geoff@kgs.ku.edu 864-2093 Overheads and resources available at http://people.ku.edu/~gbohling/eecs833
More informationHypothesis Testing hypothesis testing approach
Hypothesis Testing In this case, we d be trying to form an inference about that neighborhood: Do people there shop more often those people who are members of the larger population To ascertain this, we
More informationMultiple Regression Introduction to Statistics Using R (Psychology 9041B)
Multiple Regression Introduction to Statistics Using R (Psychology 9041B) Paul Gribble Winter, 2016 1 Correlation, Regression & Multiple Regression 1.1 Bivariate correlation The Pearson product-moment
More informationGeneral Linear Statistical Models - Part III
General Linear Statistical Models - Part III Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Interaction Models Lets examine two models involving Weight and Domestic in the cars93 dataset.
More informationData Preprocessing Tasks
Data Tasks 1 2 3 Data Reduction 4 We re here. 1 Dimensionality Reduction Dimensionality reduction is a commonly used approach for generating fewer features. Typically used because too many features can
More informationTests of Linear Restrictions
Tests of Linear Restrictions 1. Linear Restricted in Regression Models In this tutorial, we consider tests on general linear restrictions on regression coefficients. In other tutorials, we examine some
More informationPrincipal components
Principal components Principal components is a general analysis technique that has some application within regression, but has a much wider use as well. Technical Stuff We have yet to define the term covariance,
More information> nrow(hmwk1) # check that the number of observations is correct [1] 36 > attach(hmwk1) # I like to attach the data to avoid the '$' addressing
Homework #1 Key Spring 2014 Psyx 501, Montana State University Prof. Colleen F Moore Preliminary comments: The design is a 4x3 factorial between-groups. Non-athletes do aerobic training for 6, 4 or 2 weeks,
More informationVariance Decomposition and Goodness of Fit
Variance Decomposition and Goodness of Fit 1. Example: Monthly Earnings and Years of Education In this tutorial, we will focus on an example that explores the relationship between total monthly earnings
More informationCreating and Managing a W Matrix
Creating and Managing a W Matrix Carlos Hurtado Department of Economics University of Illinois at Urbana-Champaign hrtdmrt2@illinois.edu Junel 22th, 2016 C. Hurtado (UIUC - Economics) Spatial Econometrics
More information8. FROM CLASSICAL TO CANONICAL ORDINATION
Manuscript of Legendre, P. and H. J. B. Birks. 2012. From classical to canonical ordination. Chapter 8, pp. 201-248 in: Tracking Environmental Change using Lake Sediments, Volume 5: Data handling and numerical
More informationLab 7. Direct & Indirect Gradient Analysis
Lab 7 Direct & Indirect Gradient Analysis Direct and indirect gradient analysis refers to a case where you have two datasets with variables that have cause-and-effect or mutual influences on each other.
More informationMEMGENE package for R: Tutorials
MEMGENE package for R: Tutorials Paul Galpern 1,2 and Pedro Peres-Neto 3 1 Faculty of Environmental Design, University of Calgary 2 Natural Resources Institute, University of Manitoba 3 Département des
More informationWhat are the important spatial scales in an ecosystem?
What are the important spatial scales in an ecosystem? Pierre Legendre Département de sciences biologiques Université de Montréal Pierre.Legendre@umontreal.ca http://www.bio.umontreal.ca/legendre/ Seminar,
More information4. Ordination in reduced space