Maximum entropy modeling of species geographic distributions
Steven Phillips, with Miro Dudík & Rob Schapire

Modeling species distributions
[Figure: Yellow-throated Vireo occurrence points + environmental variables → predicted distribution]

Estimating a probability distribution
Given:
- A map divided into cells
- Environmental variables, with values in each cell
- Occurrence points: samples from an unknown distribution
Our task is to estimate the unknown probability distribution.
Note:
- The distribution sums to 1 over the whole map
- This differs from estimating probability of presence: we estimate Pr(t | y=1) rather than Pr(y=1 | x) (t = cell, y = response, x = environment)

The Maximum Entropy Method
Origins: Jaynes 1957, statistical mechanics
Recent uses:
- machine learning, e.g. automatic language translation
- macroecology: species abundance distributions (SAD) and species-area relationships (SAR) (Harte et al. 2009)
To estimate an unknown distribution:
1. Determine what you know (constraints)
2. Among the distributions satisfying the constraints, output the one with maximum entropy
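In symbols, a minimal statement of the optimization problem (using f_j for the features and Ẽ[f_j] for their sample averages, both defined on the following slides):

```latex
\max_{p} \; H(p) = -\sum_{x} p(x)\,\ln p(x)
\quad\text{subject to}\quad
\sum_{x} p(x)\, f_j(x) = \tilde{E}[f_j] \;\; \forall j,
\qquad \sum_{x} p(x) = 1, \;\; p(x) \ge 0 .
```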

Entropy
More entropy: more spread out, closer to the uniform distribution
2nd law of thermodynamics:
- Without external influences, a system moves to increase entropy
Maximum entropy method:
- Apply constraints to remove external influences
- The species spreads out to fill areas with suitable conditions

Using Maxent for Species Distributions
- Features
- Constraints
- Regularization
Free software: www.cs.princeton.edu/~schapire/maxent/

Features impose constraints
Feature = environmental variable, or a function thereof
[Figure: occurrence points plotted against temperature and precipitation, with their sample average marked]
Find the distribution of maximum entropy such that, for all features f:
mean(f) = sample average of f

Features
Environmental variables or simple functions thereof. The Maxent software has these classes of features (others are possible):
1. Linear: the variable itself
2. Quadratic: square of the variable
3. Product: product of two variables
4. Binary (indicator): membership in a category
5. Threshold: step function of the variable (0 below a threshold, 1 above)
6. Hinge: piecewise-linear ramp of the variable (0 below a threshold, then increasing linearly)
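As an illustration, here is a minimal sketch of how such feature expansions might be computed; the function name and the fixed knot value are mine, not part of the Maxent software:

```python
import numpy as np

def expand_features(temp, precip, knot=10.0):
    """Toy feature expansion for two environmental variables.

    Returns one row of derived features per map cell, illustrating the
    linear, quadratic, product, threshold, and hinge feature classes.
    (Binary features would come from a categorical variable instead.)
    """
    return np.column_stack([
        temp,                            # linear: the variable itself
        temp ** 2,                       # quadratic: its square
        temp * precip,                   # product: interaction of two variables
        (temp > knot).astype(float),     # threshold: 0/1 step at a knot
        np.maximum(temp - knot, 0.0),    # hinge: linear ramp above the knot
    ])

# Example: features for three cells
temp = np.array([5.0, 12.0, 20.0])
precip = np.array([300.0, 800.0, 1200.0])
print(expand_features(temp, precip))
```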

Constraints
Each feature type imposes a constraint on the output distribution:
- Linear features: mean
- Quadratic features: variance
- Product features: covariance
- Threshold features: proportion above the threshold
- Hinge features: mean above the threshold
- Binary (categorical) features: proportion in each category

Regularization
[Figure: temperature-precipitation plot showing the sample average, a confidence region around it, and the true mean inside that region]
Find the distribution of maximum entropy such that:
mean(f) lies in the confidence region of the sample average of f
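Written out, the relaxed constraint (with β_j the width of the confidence interval for feature f_j, introduced two slides later):

```latex
\left|\, \mathbb{E}_{p}[f_j] - \tilde{E}[f_j] \,\right| \le \beta_j
\qquad \text{for all } j .
```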

The Maxent distribution is always a Gibbs distribution:
q_λ(x) = exp(Σ_j λ_j f_j(x)) / Z
- Z is a scaling factor, so the distribution sums to 1
- f_j is the j-th feature
- λ_j is a coefficient, calculated by the program
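A minimal sketch of evaluating this distribution over the map cells (a plain softmax over cells; the variable names are mine):

```python
import numpy as np

def gibbs_distribution(F, lam):
    """q_lambda(x) = exp(sum_j lam_j * f_j(x)) / Z over all cells.

    F   : (num_cells, num_features) matrix of feature values f_j(x)
    lam : (num_features,) vector of coefficients lambda_j
    """
    scores = F @ lam
    scores -= scores.max()          # subtract the max for numerical stability
    q = np.exp(scores)
    return q / q.sum()              # divide by Z so the distribution sums to 1

# Example with random features over 1000 cells
rng = np.random.default_rng(0)
F = rng.normal(size=(1000, 5))
lam = rng.normal(size=5)
q = gibbs_distribution(F, lam)
print(q.sum())  # ~1.0
```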

Maxent is penalized maximum likelihood
Log likelihood:
LogLikelihood(q_λ) = (1/m) Σ_i ln(q_λ(x_i))
where x_1 ... x_m are the occurrence points.
Maxent maximizes the regularized likelihood:
LogLikelihood(q_λ) - Σ_j β_j |λ_j|
where β_j is the width of the confidence interval for f_j.
Similar to the Akaike Information Criterion (AIC) and the lasso.
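Continuing the sketch above (reusing numpy and gibbs_distribution from the previous block), the objective might be computed like this; it is a toy illustration, not the actual implementation:

```python
def regularized_log_likelihood(F, lam, occ_idx, beta):
    """Average log likelihood of the occurrence cells, minus the L1 penalty.

    occ_idx : indices of the cells containing the m occurrence points
    beta    : (num_features,) confidence-interval widths beta_j
    """
    q = gibbs_distribution(F, lam)
    log_lik = np.log(q[occ_idx]).mean()          # (1/m) sum_i ln q(x_i)
    return log_lik - np.sum(beta * np.abs(lam))  # minus sum_j beta_j |lam_j|
```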

Performance guarantees
If the true mean of each feature lies in its confidence region, then for the best Gibbs distribution q_λ:
RE(π ‖ q̂) ≤ RE(π ‖ q_λ) + 2 Σ_j β_j |λ_j|
(q̂ is the Maxent solution, π the true distribution, RE the relative entropy)
Maxent software: β tuned on a reference data set

Estimating probability of presence
Prevalence: the number of sites where the species is present, or the sum of probability of presence
Prevalence is not identifiable from occurrence data (Ward et al. 2009)
Example: sparrow and sparrow-hawk
- Both have the same range map
- Both have the same geographic distribution of occurrences
- The hawk is rarer within its range: lower prevalence
Probability of presence & prevalence depend on sampling:
- Site size
- Observation time

Logistic output format
Minimax: maximize performance for the worst-case prevalence
Exponential logistic model
Offset term: the entropy of the Maxent distribution
Scaled so typical presences have value 0.5
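As a hedged reconstruction (following Phillips & Dudík 2008; the formula is not spelled out on this slide), the default logistic output for a cell x combines the raw Maxent value q_λ(x) with the entropy H of q_λ as the offset:

```latex
P(\text{presence} \mid x) \;=\; \frac{q_\lambda(x)\, e^{H}}{1 + q_\lambda(x)\, e^{H}}
```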

Response curves
How does probability of presence depend on each variable?
Simple features: simpler model. Complex features: more complex model.
[Figure, three panels: linear + quadratic features (top), threshold features (middle), all feature types (bottom)]

Effect of regularization: multiplier = 0.2
- Smaller confidence intervals
- Lower entropy
- Less spread out

Effect of regularization: over-fitting
- Regularization multiplier = 1.0 (not over-fit)
- Regularization multiplier = 0.2 (clearly over-fit)

The dangers of bias
A virtual species in Ontario, Canada, which prefers the mid-range of all climatic variables

Boosted regression tree model: biased presence/absence data
The presence-absence model recovers the species distribution

Model from biased occurrence data
The model recovers the sampling bias, not the species distribution

Correcting bias: golden-crowned kinglet
Maxent model from biased occurrence data: AUC = 0.3

Correcting bias with target-group background: AUC = 0.8
- Infer the sampling distribution from records of other species
- Target group: species collected by the same methods

Aligning Conservation Priorities Across Taxa in Madagascar with High-Resolution Planning Tools
C. Kremen, A. Cameron et al., Science 320, 222 (2008)

Madagascar: Opportunity Knocks?
- 2002: 1.7 million ha protected = 2.9%
- 2003 Durban Vision: 6 million ha = 10%
- 2006: 3.88 million ha = 6.3%

Study outline
Gather biodiversity data:
- 2315 species: lemurs, frogs, geckos, ants, butterflies, plants
- Presences only, limited data, sampling biases
Model species distributions: Maxent
New reserve selection software: Zonation
- 1 km² resolution for the entire country: > 700,000 planning units

Mystrium mysticum, Dracula ant

Adansonia grandidieri, Grandidier's baobab

Uroplatus fimbriatus, common leaf-tailed gecko

Indri indri

Propithecus diadema, diademed sifaka

[Predicted distributions: Grandidier's baobab, Dracula ant, Indri]

Multi-taxon solutions
- Ideal: unconstrained, optimized
- Starting from the existing protected-area system: constrained, optimized
- Includes temporary protected areas through 2006
[Map legend: top 5%, 5 to 10%, 10 to 15%]

Spare slides

Maximum Entropy Principle
"The fact that a certain probability distribution maximizes entropy subject to certain constraints representing our incomplete information, is the fundamental property which justifies the use of that distribution for inference; it agrees with everything that is known, but carefully avoids assuming anything that is not known." (Jaynes, 1990)

Maximizing gain
Unregularized gain:
Gain(q_λ) = LogLikelihood(q_λ) - ln(1/n)
where n is the number of background pixels, so the uniform distribution has gain 0.
E.g. if the unregularized gain is 1.5, the average training sample is exp(1.5) (about 4.5) times more likely than a random background pixel.
Maxent maximizes the regularized gain:
Gain(q_λ) - Σ_j β_j |λ_j|
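In terms of the earlier sketches, the unregularized gain could be computed as follows (naming is mine, and I assume n counts the background cells):

```python
def gain(F, lam, occ_idx):
    """Unregularized gain: average log likelihood minus ln(1/n), where n is
    the number of background cells; the uniform distribution has gain 0."""
    n = F.shape[0]
    q = gibbs_distribution(F, lam)
    return np.log(q[occ_idx]).mean() + np.log(n)   # - ln(1/n) == + ln(n)
```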

Maxent algorithms
Goal: maximize the regularized gain
Algorithm:
- Start with the uniform distribution (gain = 0)
- Iteratively update λ to increase the gain
The optimization problem is convex, so a variety of algorithms work: gradient descent, conjugate gradient, Newton's method, iterative scaling
Our algorithm: coordinate descent
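A toy illustration of coordinate-wise optimization of this objective, reusing the sketches above (the real Maxent implementation uses smarter per-coordinate updates; the step size and stopping rule here are mine). Maximizing the average log likelihood minus the L1 penalty is equivalent to maximizing the regularized gain, since the two differ only by the constant ln n:

```python
def coordinate_ascent(F, occ_idx, beta, max_passes=2000, delta=0.01):
    """Greedy coordinate-wise search: start from the uniform distribution
    (lam = 0) and repeatedly nudge single coordinates of lam whenever the
    move improves the penalized objective; stop when no move helps."""
    lam = np.zeros(F.shape[1])

    def objective(l):
        q = gibbs_distribution(F, l)
        return np.log(q[occ_idx]).mean() - np.sum(beta * np.abs(l))

    best = objective(lam)
    for _ in range(max_passes):
        improved = False
        for j in range(len(lam)):
            for step in (+delta, -delta):      # try a nudge in each direction
                trial = lam.copy()
                trial[j] += step
                val = objective(trial)
                if val > best:
                    best, lam, improved = val, trial, True
        if not improved:                       # converged: no coordinate helps
            break
    return lam, best
```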

Interpretation of regularization
[Figure-only slide]

Conditional vs unconditional Maxent
One class:
- Distribution over sites: p(x|y=1)
- Maximize entropy: -Σ_x p(x|y=1) ln p(x|y=1)
Multiclass:
- Conditional probability of presence: Pr(y|z)
- Maximize conditional entropy: -Σ_z p̃(z) Σ_y p(y|z) ln p(y|z)
Notation:
- y ∈ {0, 1}: species presence
- x: a site in our study region
- z: a vector of environmental conditions
- p̃(z): the empirical probability of z

Effect of regularization: multiplier = 5
- Larger confidence intervals
- Higher entropy
- More spread out

Sample selection bias in Ontario birds

Performance guarantees
The solution returned by Maxent is almost as good as the best Gibbs distribution q_λ, as measured by relative entropy (KL divergence) to the true distribution.
The guarantees depend on:
- the number of samples m
- the number of features n (or the complexity of the features)
- the complexity of the best q_λ