Species distribution modelling with MAXENT Mikael von Numers Åbo Akademi
Why model species distribution? Knowledge about the geographical distribution of species is crucial for conservation and spatial planning. Detailed data on species distribution are usually not available, and collecting such data is costly and labor-intensive. In many cases conservationists have to rely on predictive models to estimate patterns of species distribution and to plan conservation strategies. Species distribution models (SDMs) provide one of the best ways to overcome the sparseness typical of distributional data, by relating the data to a set of geographic or environmental predictors.
What do we need for SDM?
- Reliable data on species presences (and absences)
- Environmental data as GIS rasters (predictors)
Typical workflow:
Maxent: a short introduction
Maxent is a presence-only (po) modelling method, which means that no absence data are needed. Maxent (or another po method) might be a good choice, for instance, when:
- No absence data are available, which is often the case (absences are not recorded; data come from museums, herbaria etc.).
- There is reason to believe that the absence data are not reliable.
- Several other reasons apply, for instance:
  - The species is not stationary (satellite-tagged animals, e.g. porpoises; radiotelemetry data).
  - The species is hard to detect (e.g. reptiles).
  - The species is temporarily absent.
  - The species occurs in patches.
  - You have only a single observation within a large suitable territory (for instance singing bird males).
How it works
The Maxent method does not need species absences; instead it uses background environmental data for the entire study area. The method focuses on how the environment where the species is known to occur relates to the environment across the rest of the study area. The idea is to find the probability distribution of maximum entropy (the most spread out), subject to constraints imposed by the information available on the species presences and the environmental conditions across the study area (more in Phillips et al. 2006 and in Elith et al. 2011, "A statistical explanation of MaxEnt for ecologists"). Maxent has similarities to GAM and GLM, but Maxent models a probability distribution over all pixels in the study area. In no sense are pixels without species records interpreted as absences, meaning that pseudoabsences are not used.
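To make the idea of a "most spread out" distribution over pixels concrete, here is a minimal sketch, not the Maxent program's own code. It assumes the exponential (Gibbs) form described in Phillips et al. 2006, where each pixel gets a probability proportional to exp(weights · features). The pixel values, the two features and the weights are all made up for illustration.

```python
import math

def gibbs_distribution(features, weights):
    """Return q(x) = exp(w . f(x)) / Z for every pixel x."""
    scores = [sum(w * f for w, f in zip(weights, fx)) for fx in features]
    top = max(scores)  # subtract the maximum for numerical stability
    expd = [math.exp(s - top) for s in scores]
    z = sum(expd)      # normalising constant over all pixels
    return [e / z for e in expd]

def entropy(q):
    """Shannon entropy of a discrete distribution."""
    return -sum(p * math.log(p) for p in q if p > 0)

# Hypothetical toy landscape: 5 pixels, 2 features (scaled depth, scaled exposure)
pixels = [(0.1, 0.9), (0.3, 0.7), (0.5, 0.5), (0.7, 0.3), (0.9, 0.1)]

# With all weights at zero nothing is constrained: the result is uniform.
uniform = gibbs_distribution(pixels, [0.0, 0.0])

# Made-up weights standing in for a fitted model: shallow, exposed pixels score higher.
fitted = gibbs_distribution(pixels, [-3.0, 1.0])

print(entropy(uniform), entropy(fitted))
```

The uniform distribution has the highest entropy; constraints derived from the presence records pull the fitted distribution away from it, and regularization limits how far it is pulled.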
Advantages:
- Maxent can use both continuous and categorical environmental variables (predictors).
- Maxent is able to fit complex relationships between the species and the environmental variables (features in Maxent), including interactions between the predictors.
- Produces test statistics, measures of variable importance and response curves.
- Cross-validation is possible.
- The regularization parameters can be adjusted. These determine how focused the output distribution is: a larger parameter gives a less localized prediction.
- Works well together with, for instance, ArcView.
- Is reported to be effective with a relatively small number of presences.
- The output raster represents a continuous measure of probability of occurrence.
- Maxent is quite a new method, but it has performed excellently in tests compared to other similar methods.
- It is quite easy to use and has a nice, user-friendly interface.
- Shareware, active discussion group, lots of recently published papers.
Download from: www.cs.princeton.edu/~schapire/maxent/
Major conclusions drawn from Elith et al. 2006:
- Presence-only data are useful for modelling species distributions.
- Presence-only data can be sufficiently accurate to be used in conservation planning.
- New modelling methods, such as MAXENT, generally outperform established methods.
Drawbacks:
- A black box: it is not easy to understand how the method works, compared to, for instance, GLM or GAM.
- According to the literature, not as mature a statistical method as GAM or GLM.
- Sample selection bias is a bigger problem for presence-only methods than for presence-absence methods. If there is a bias, you will get a model that combines the species distribution with the distribution of sampling effort. There are methods to deal with this problem: you can provide Maxent with a bias raster to correct for the bias in sampling effort.
- If absence data are available, a presence-absence method is a better choice than a po method.
When sampling is biased, a fitted model might be closer to a model of survey effort than of distribution.
The Maxent user interface
Zostera marina
Species data: 75 presence points of Zostera marina in the S. Archipelago Sea
Species, X_coord, Y_coord
Zostera, 3214710, 6666810
Zostera, 3191860, 6681080
Zostera, 3195940, 6674130
Zostera, 3215030, 6679040
Zostera, 3208580, 6653860
Zostera, 3184780, 6642620
Zostera, 3205750, 6669300
Zostera, 3196800, 6646150
Zostera, 3213730, 6678190
Zostera, 3206280, 6678010
Zostera, 3199600, 6647510
Zostera, 3197280, 6646490
Zostera, 3200910, 6648660
Zostera, 3212160, 6647820
Zostera, 3212160, 6647890
Zostera, 3189660, 6683280
Zostera, 3205810, 6669390
Zostera, 3213530, 6654590
Fucus, 3209220, 6657510
Fucus, 3194840, 6646240
Fucus, 3196250, 6646940
Fucus, 3189310, 6683540
Species data format:
- Data as a comma-delimited *.csv file (use Excel).
- Only 3 columns needed: species name(s) and co-ordinates.
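Before a Maxent run it can be useful to check that the samples file really has three columns and numeric co-ordinates. The following is a minimal sketch using only the Python standard library; the function name `read_samples` and the three-row example are illustrative and not part of Maxent.

```python
import csv
import io

def read_samples(text):
    """Parse 'species, x, y' rows into {species: [(x, y), ...]}."""
    samples = {}
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    next(reader)  # skip the header row
    for row in reader:
        if not row:
            continue  # ignore blank lines
        if len(row) != 3:
            raise ValueError(f"expected 3 columns, got {row!r}")
        species = row[0].strip()
        x, y = float(row[1]), float(row[2])  # raises ValueError if not numeric
        samples.setdefault(species, []).append((x, y))
    return samples

# Illustrative excerpt in the same layout as the species data above
example = """Species, X_coord, Y_coord
Zostera, 3214710, 6666810
Zostera, 3191860, 6681080
Fucus, 3209220, 6657510
"""

samples = read_samples(example)
for name, points in samples.items():
    print(name, len(points))
```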
Predictor layers describing the environmental variables
- The grids have to be in ASCII raster format (ESRI .asc).
- The grids must have the same geographic bounds and cell size.
- The layers can be continuous or categorical.
Example (header and the start of the data):
ncols 1827
nrows 2044
xllcorner 3176430
yllcorner 6636626
cellsize 25
NODATA_value -9999
-9999 -9999 -9999 -9999 -9999 -9999 -0.1697558 -0.3892355 -0.629083 -0.8858771
-1.15194 -1.418818 -1.683608 -1.943836 -2.19765 -2.453322 -2.724256 -3.016762 -3.336428 -3.700734
-4.129993 -4.631121 -5.202521 -5.847729 -6.573002 -7.368282 -8.198206 -9.017972 -9.795128 -10.51915
-11.18508 -11.76465 -12.1964 -12.40763 -12.36905 -12.19018 -12.21704 -12.41916 -13.14217 -14.7096
-17.03474 -19.10044 -20.86929 -22.32145 -23.51356 -24.54868 -25.52947 -26.52113 -27.53157 -28.51738
-29.42035 -30.20646 -30.87352 -31.43348 -31.87958 -32.1587 -32.1725 -31.80916 -30.98529 -29.68661
-27.99283 -26.07573 -24.18093 -22.60346 -21.62301 -21.31837 -21.2968 -21.14873 -20.38395 -18.66777
-15.95716 -13.17438 -10.75567 -8.945774 -6.735695 -4.542916 -2.320879 -9999 -9999 -9999
-9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999 -9999
-9999 -9999 -0.5598915
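The requirement that all grids share the same bounds and cell size can be checked before a run by comparing the raster headers. This is a minimal sketch, assuming a six-line header with the keys shown above (some files use `xllcenter`/`yllcenter` or omit `NODATA_value`, which this sketch does not handle). The helper names and the two extra layers are hypothetical.

```python
def read_asc_header(lines):
    """Read the six-line header of an ESRI ASCII grid into a dict."""
    header = {}
    for line in lines[:6]:
        key, value = line.split()
        header[key.lower()] = float(value)
    return header

def same_grid(a, b):
    """True if two headers describe the same extent and resolution."""
    keys = ("ncols", "nrows", "xllcorner", "yllcorner", "cellsize")
    return all(a[k] == b[k] for k in keys)

depth_header = """ncols 1827
nrows 2044
xllcorner 3176430
yllcorner 6636626
cellsize 25
NODATA_value -9999""".splitlines()

depth = read_asc_header(depth_header)
exposure = dict(depth)              # hypothetical second layer on the same grid
coarse = dict(depth, cellsize=50.0) # hypothetical layer with a different cell size

print(same_grid(depth, exposure), same_grid(depth, coarse))
```

In practice the header lines would be read from each .asc file, and any layer that fails the check would be resampled or clipped in the GIS before modelling.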
Predictors: Depth (DEM)
Predictors: exposure
Predictors: distance from sand. A proxy for sandy substrate (that is not available).
Predictors: Slope (derived from the DEM)
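As a rough illustration of how a slope layer can be derived from a depth grid, the sketch below uses simple central differences on interior cells. This is an assumption made for brevity: GIS packages typically estimate slope from the full 3x3 neighbourhood, and the sketch ignores NODATA cells. The tiny grid is made up.

```python
import math

def slope_degrees(dem, cellsize):
    """Slope in degrees for interior cells, by central differences."""
    nrows, ncols = len(dem), len(dem[0])
    out = [[None] * ncols for _ in range(nrows)]  # edge cells stay None
    for r in range(1, nrows - 1):
        for c in range(1, ncols - 1):
            dzdx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cellsize)
            dzdy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cellsize)
            out[r][c] = math.degrees(math.atan(math.hypot(dzdx, dzdy)))
    return out

# Hypothetical 3x3 depth grid (m): the bottom drops 25 m per 25 m cell eastwards
dem = [[0.0, -25.0, -50.0],
       [0.0, -25.0, -50.0],
       [0.0, -25.0, -50.0]]

slope = slope_degrees(dem, cellsize=25)
print(slope[1][1])
```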
The Maxent output probability raster is an ascii (.asc) raster, which is easy to export to ArcView for further analysis and symbolisation.
Substrate data = categorical data
Cormorant fishing areas Substrate included as a categorical variable
Worth remembering when modelling:
1. Garbage in, garbage out.
2. Use a sufficient number of records. No algorithm can model extremely sparse species data. Guideline: > 30 records.
3. Each record should bring new information to the model; clusters of observations -> one observation.
4. Samples should be spread across the whole area of interest. -> Stratified sampling.
5. Beware of sampling bias, especially in po-methods.
6. Pre-process the predictors carefully: resolution, collinearity etc.
7. Check the model fit (AUC, cross-validation, learn-test datasets). Large literature available.
8. Many sources of error. -> Predictions will always be uncertain. -> Be realistic and cautious when interpreting the results.
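Two of the points above, reducing clusters to one observation and checking the fit with AUC, can be sketched in a few lines. This is an illustration under stated assumptions, not what Maxent does internally: thinning keeps the first record in each raster cell, and AUC is computed for presence points against background points. The records, the grid origin used for thinning and the scores are all hypothetical.

```python
def thin_to_cells(points, xll, yll, cellsize):
    """Keep only the first record that falls in each raster cell."""
    seen, kept = set(), []
    for x, y in points:
        cell = (int((x - xll) // cellsize), int((y - yll) // cellsize))
        if cell not in seen:
            seen.add(cell)
            kept.append((x, y))
    return kept

def auc(presence_scores, background_scores):
    """Probability that a presence outranks a background point (ties count 0.5)."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))

# Hypothetical records: the first two fall in the same 25 m cell
records = [(3212160, 6647820), (3212165, 6647824), (3205750, 6669300)]
thinned = thin_to_cells(records, xll=3176430, yll=6636626, cellsize=25)

# Hypothetical model scores at presence and background locations
score = auc([0.9, 0.8, 0.4], [0.1, 0.3, 0.5, 0.2])
print(len(thinned), score)
```

An AUC near 0.5 means the model ranks presences no better than random background points; values closer to 1 indicate better discrimination, but sampling bias can still inflate them.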
Workflow:
1. The Maxent program
2. The Maxent output
3. Do a Maxent run using Zostera data and four predictor layers (individually or together)
4. Import the Maxent predictions to ArcView (together)
5. Use ArcView to mask out part of the study area (together)
6. Do a new Maxent run and compare the results.