Maximizing biological diversity

Tom Leinster
School of Mathematics, University of Edinburgh
Boyd Orr Centre for Population and Ecosystem Health, University of Glasgow

Where we're going
I'll show you a maximum entropy theorem... with entropy interpreted as biodiversity... but maybe you can interpret the theorem in other interesting ways.
Plan:
1. Measuring biological diversity
2. The theorem
3. Examples and consequences
4. Unanswered questions

1. Measuring biological diversity (joint with Christina Cobbold)

A spectrum of viewpoints on biodiversity
At one end: conserving species is what matters. Rare species count for as much as common ones; every species is precious.
At the other end: conserving communities is what matters. Common species are the really important ones; they shape the community.
In short: rare species are important at one end of the spectrum and unimportant at the other.
[Figure: two example communities. From the first viewpoint, one community is more diverse than the other; from the second viewpoint, the same community is less diverse than the other.]

Quantifying diversity
model of community → formula → measure of diversity
The model of a community starts with a similarity matrix $Z$:
$Z$ is an $n \times n$ matrix ($n$ = number of species).
$Z_{ij}$ = similarity between the $i$th and $j$th species = $Z_{ji}$.
$0 \le Z_{ij} \le 1$ (0 = totally dissimilar, 1 = identical) and $Z_{ii} = 1$.
E.g. naive model: $Z = I$, the identity matrix (species have nothing in common).
E.g. genetic similarity.
E.g. taxonomic: $Z_{ij} = 1$ if same species, $0.7$ if different species but same genus, $0$ otherwise.

The model of a community also includes a frequency distribution $p = (p_1, \ldots, p_n)$:
$p_i$ = relative frequency, or relative abundance, of the $i$th species.
$p_i \ge 0$ and $\sum_i p_i = 1$.

Together, the similarity matrix $Z$ and the frequency distribution $p$ form the model of the community; a formula then turns the model into a measure of diversity.
The formula has a viewpoint parameter $q$, with $0 \le q \le \infty$:
$q = 0$: rare species are important.
$q = \infty$: rare species are unimportant.
The diversity of order $q$ is
$${}^qD^Z(p) = \Bigl( \sum_{i:\, p_i > 0} p_i \, (Zp)_i^{\,q-1} \Bigr)^{1/(1-q)}$$
for $q \ne 1, \infty$; those two cases are defined as limits.
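The formula for the diversity of order $q$ can be evaluated directly. The following is a minimal sketch in plain Python, not the authors' code: the function name `diversity` and the list-of-rows matrix layout are choices made here for illustration. The two limiting cases use the standard limits of the formula, $\prod_i (Zp)_i^{-p_i}$ for $q = 1$ and $1 / \max_i (Zp)_i$ for $q = \infty$, with $i$ ranging over the species present.

```python
import math

def diversity(Z, p, q):
    """Diversity of order q for similarity matrix Z (list of rows) and frequencies p."""
    n = len(p)
    # (Zp)_i for every species i
    zp = [sum(Z[i][j] * p[j] for j in range(n)) for i in range(n)]
    # The sum runs only over species that are actually present (p_i > 0).
    present = [i for i in range(n) if p[i] > 0]
    if q == 1:  # limiting case q -> 1
        return math.exp(-sum(p[i] * math.log(zp[i]) for i in present))
    if q == math.inf:  # limiting case q -> infinity
        return 1.0 / max(zp[i] for i in present)
    total = sum(p[i] * zp[i] ** (q - 1) for i in present)
    return total ** (1.0 / (1.0 - q))

# Naive model (Z = identity) with four equally abundant species.
Z_naive = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
p_uniform = [0.25] * 4
for q in (0, 1, 2, math.inf):
    print(q, diversity(Z_naive, p_uniform, q))
```

For four equally abundant, totally dissimilar species, every order $q$ should give a diversity of 4, up to floating-point rounding.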

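Visualizations
[Figure: two plots of the diversity ${}^qD^Z(p)$ over the list of all frequency distributions $p$, one for $q = 0$ (sensitive to rare species) and one for $q = 10$ (insensitive to rare species). The vertical axis runs from less diverse to more diverse, and two distributions $p$ and $p'$ are marked.]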

2. The theorem

The central questions
Take a list of species, with known similarity matrix $Z$.
Questions:
Which frequency distribution(s) maximize the diversity?
What is the value of the maximum diversity?
Remember the birds! In principle, the answers depend on the viewpoint parameter $q$.

The solution
Theorem (2009). Neither answer depends on $q$. That is:
There is a single frequency distribution $p_{\max}$ that maximizes diversity of all orders $q$ simultaneously ($0 \le q \le \infty$): a "best of all possible worlds".
The maximum diversity, ${}^qD^Z(p_{\max})$, is the same for all $q$.
The proof of the theorem gives a construction of $p_{\max}$.

How is that possible?
[Figure: the two plots of ${}^qD^Z(p)$ from the Visualizations slide ($q = 0$ and $q = 10$), with $p$, $p'$ and $p_{\max}$ marked.]
Different values of the viewpoint parameter $q$ produce different judgements on which distributions are more diverse than which others. But there is a single distribution $p_{\max}$ that is optimal for all $q$.

3. Examples and consequences

The naive model
Put $Z_{ij} = 1$ if $i = j$ and $Z_{ij} = 0$ if $i \ne j$. (Then ${}^qD^Z(p)$ is the exponential of the Rényi entropy.)
The maximizing distribution is uniform: every species has equal abundance.
Why? Can prove it directly, or reason as follows:
One corollary of the theorem is that if a distribution is maximizing for some $q > 0$, then it is maximizing for all $q$.
We know that the uniform distribution maximizes the Shannon entropy ($q = 1$).
It follows that the uniform distribution maximizes all the Rényi entropies.
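The link to Rényi entropy can be written out in one line. This is a worked check rather than part of the slides: it assumes the formula ${}^qD^Z(p) = (\sum_{i: p_i > 0} p_i (Zp)_i^{q-1})^{1/(1-q)}$ for the diversity of order $q \ne 1, \infty$, and the symbol $H_q$ for the Rényi entropy of order $q$ is notation introduced here.

```latex
Z = I \;\Rightarrow\; (Zp)_i = p_i, \qquad
{}^qD^{I}(p) = \Bigl(\sum_{i:\,p_i>0} p_i^{\,q}\Bigr)^{1/(1-q)} = \exp H_q(p),
\qquad
H_q(p) = \frac{1}{1-q}\,\log \sum_{i:\,p_i>0} p_i^{\,q}.

\text{Uniform case } p_i = 1/n:\quad
\sum_i p_i^{\,q} = n \cdot n^{-q} = n^{1-q}
\;\Rightarrow\;
{}^qD^{I}(p) = \bigl(n^{1-q}\bigr)^{1/(1-q)} = n .
```

So in the naive model the uniform distribution on $n$ species has diversity $n$ for every such $q$, which matches the claim that the maximum value does not depend on $q$.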

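A three-species example
Three species: an oak and two species of pine (pine species A and pine species B). With the species ordered (oak, pine A, pine B), the similarity matrix is
$$Z = \begin{pmatrix} 1 & 0.2 & 0.2 \\ 0.2 & 1 & 0.9 \\ 0.2 & 0.9 & 1 \end{pmatrix}$$
Which frequency distribution maximizes the diversity?
Not this: (0.333..., 0.333..., 0.333...).
Or this: (0.5, 0.25, 0.25).
In fact, it's this: (0.48, 0.26, 0.26).
This distribution maximizes diversity for all $q$, with constant value 1.703....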
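The three candidate distributions can be compared numerically. The sketch below is an illustration, not the author's code: it assumes the species order (oak, pine A, pine B), reads the matrix as oak having similarity 0.2 to each pine and the two pines having similarity 0.9 to each other, and uses a hypothetical helper `diversity` that handles only finite $q \ne 1$.

```python
# Similarity matrix, species order assumed to be (oak, pine A, pine B).
Z = [[1.0, 0.2, 0.2],
     [0.2, 1.0, 0.9],
     [0.2, 0.9, 1.0]]

def diversity(Z, p, q):
    """Diversity of order q (finite q != 1 only) for similarity matrix Z and frequencies p."""
    n = len(p)
    zp = [sum(Z[i][j] * p[j] for j in range(n)) for i in range(n)]
    total = sum(p[i] * zp[i] ** (q - 1) for i in range(n) if p[i] > 0)
    return total ** (1.0 / (1.0 - q))

candidates = {
    "uniform": [1 / 3, 1 / 3, 1 / 3],
    "half oak": [0.5, 0.25, 0.25],
    "claimed maximum": [0.48, 0.26, 0.26],  # rounded to two decimals, as on the slide
}

for q in (0, 0.5, 2, 10):
    row = ", ".join(f"{name}: {diversity(Z, p, q):.4f}" for name, p in candidates.items())
    print(f"q = {q}: {row}")
```

For each $q$ tried, the last distribution should score highest of the three, at about 1.703. The tiny variation across $q$ comes from using the two-decimal rounding of the frequencies.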

Properties of maximizing distributions
There can be more than one maximizing distribution.
A maximizing distribution can eliminate some species entirely.
How can that be? To understand, run time backwards...
[Figure: a forest containing oak, ash, eucalyptus and 10 species of pine; then the same forest with 11 species of pine.]
Introducing an 11th species of pine decreases diversity. So if we start with a forest containing all 11 pine species, eliminating the 11th will increase diversity.

Computing the maximizing distribution
Start with an $n \times n$ similarity matrix $Z$.
Computing the maximizing distribution(s) takes $2^n$ steps. But each step is fast. E.g. can do 25 species in a few seconds.
And that's for arbitrary similarity matrices $Z$. For some special types of $Z$, the computation is near-instant.
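The slide does not spell out what the $2^n$ steps are. One plausible reading, sketched here as an assumption rather than as the author's algorithm, is a loop over all nonempty subsets $B$ of species: solve $Z_B w = (1, \ldots, 1)$ for the submatrix $Z_B$, keep the solutions with no negative entries, and pick the subset whose $w$ has the largest total; normalizing that $w$ and setting the other species to zero gives a candidate maximizing distribution. The names `solve` and `maximize_diversity` are hypothetical, and singular submatrices are simply skipped, which is a simplification.

```python
from itertools import combinations

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting; None if singular."""
    n = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def maximize_diversity(Z):
    """Return (maximum diversity, one maximizing distribution) by checking every subset."""
    n = len(Z)
    best_value, best_p = 0.0, None
    for size in range(1, n + 1):
        for B in combinations(range(n), size):
            Z_B = [[Z[i][j] for j in B] for i in B]
            w = solve(Z_B, [1.0] * size)   # candidate weights on the subset B
            if w is None or min(w) < -1e-12:
                continue                   # skip singular or partly negative cases
            total = sum(w)
            if total > best_value + 1e-12:
                p = [0.0] * n
                for i, w_i in zip(B, w):
                    p[i] = w_i / total     # normalize; species outside B get zero
                best_value, best_p = total, p
    return best_value, best_p

# Three-species matrix, species order assumed to be (oak, pine A, pine B).
Z = [[1.0, 0.2, 0.2],
     [0.2, 1.0, 0.9],
     [0.2, 0.9, 1.0]]
value, p_max = maximize_diversity(Z)
print(value, p_max)
```

On the three-species matrix this should reproduce the value 1.703... and frequencies close to (0.48, 0.26, 0.26). The loop visits $2^n - 1$ subsets, so the running time grows exponentially in the number of species, as the slide says.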

Tree-based similarity matrices
Suppose we define similarity via a taxonomic or phylogenetic tree.
Example: put $Z_{ij} = 1$ if same species ($i = j$), $0.7$ if different species but same genus, $0.3$ if different genera but same family, and $0$ otherwise.
Then:
There's a unique maximizing distribution $p_{\max}$.
It eliminates no species; that is, $(p_{\max})_i > 0$ for all $i$.
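A taxonomic similarity matrix of this kind can be built from a table of species, genus and family labels. The sketch below is illustrative only: the four-species taxonomy and all its names are hypothetical, and it assumes genus names are unique across families.

```python
# Hypothetical taxonomy: (species, genus, family). Names are for illustration only.
taxonomy = [
    ("species_a", "genus_1", "family_x"),
    ("species_b", "genus_1", "family_x"),
    ("species_c", "genus_2", "family_x"),
    ("species_d", "genus_3", "family_y"),
]

def taxonomic_similarity(a, b):
    """Similarity from the slide's example: 1 / 0.7 / 0.3 / 0 by shared taxonomic rank."""
    if a[0] == b[0]:
        return 1.0   # same species
    if a[1] == b[1]:
        return 0.7   # different species, same genus
    if a[2] == b[2]:
        return 0.3   # different genera, same family
    return 0.0       # nothing in common at these ranks

Z = [[taxonomic_similarity(a, b) for b in taxonomy] for a in taxonomy]

for row in Z:
    print(row)
```

The resulting matrix is symmetric with ones on the diagonal, so it satisfies the conditions on $Z$ stated earlier in the talk.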

4. Unanswered questions

Why is the theorem true?
Why is there a single frequency distribution that maximizes diversity from all viewpoints simultaneously?
We can prove it... And it's easy in the naive case where $Z$ is the identity (Rényi entropies). Then the maximizing distribution is uniform.
But we lack intuition as to why it's true in general.

What about maximization under constraints?
The theorem concerns maximization of diversity without constraints.
For some constraints, the theorem fails: there is no distribution that maximizes diversity for all $q$ simultaneously.
Under which types of constraint can we maximize diversity for all $q$ simultaneously?

What is the significance of the maximum diversity itself?
The theorem also says that the maximum diversity is independent of the choice of viewpoint parameter $q$.
So, any similarity matrix $Z$ ("list of species") gives rise to a number, $D_{\max}(Z) = {}^qD^Z(p_{\max})$, which does not depend on the choice of $q$.
What does $D_{\max}(Z)$ mean? It has geometric significance... (Mark Meckes)
How else can it be understood?

What are the biological implications of the theorem?
The theorem is fundamentally a maximum entropy theorem.
Interpreting entropy as diversity, it says there's a "best of all possible worlds" (if we think diversity is good).
But there are other biological interpretations of entropy! Under those interpretations, what does the theorem tell us?

Summary

There is a one-parameter family of diversity measures, $({}^qD^Z(p))_{0 \le q \le \infty}$, generalizing Rényi (and Shannon) entropy.
They take into account not only the frequencies $p_i$ of the species, but also the inter-species similarities $(Z_{ij})$.
Different values of the parameter $q$ correspond to different viewpoints on the relative importance of rare/common species.
Different values of $q$ produce different judgements on which communities are more diverse.
Nevertheless: given a similarity matrix $Z$, there's a single frequency distribution $p_{\max}$ that maximizes diversity from all viewpoints $q$ simultaneously.
Moreover, the maximum diversity value ${}^qD^Z(p_{\max})$ is the same for all $q$.
Still, we don't fully understand:
why the theorem is true;
what its biological implications are.