Biology as Information Dynamics


Biology as Information Dynamics John Baez Biological Complexity: Can It Be Quantified? Beyond Center February 2, 2017

IT'S ALL RELATIVE, EVEN INFORMATION! When you learn something, how much information do you gain? It depends on what you believed before! We can model hypotheses as probability distributions. When you update your prior hypothesis p to a new one q, how much information did you gain? This much:

I(q, p) = \sum_{i=1}^n q_i \log\left(\frac{q_i}{p_i}\right)

This is the information of q relative to p, also called the information gain or Kullback-Leibler divergence. It satisfies I(q, p) \ge 0, and I(q, p) = 0 if and only if q = p.

For example, suppose we flip a coin you think is fair. Your prior hypothesis is this:

p_H = 1/2, \quad p_T = 1/2

Then you learn it landed heads up:

q_H = 1, \quad q_T = 0

The relative information is 1 bit:

I(q, p) = 1 \cdot \log\left(\frac{1}{1/2}\right) + 0 \cdot \log\left(\frac{0}{1/2}\right) = \log 2

where we define 0 log 0 = 0. You have gained 1 bit of information.

But suppose you think there's only a 25% chance of heads:

p_H = 1/4, \quad p_T = 3/4

Then you learn the coin landed heads up:

q_H = 1, \quad q_T = 0

Now the relative information is higher:

I(q, p) = 1 \cdot \log\left(\frac{1}{1/4}\right) + 0 \cdot \log\left(\frac{0}{3/4}\right) = \log 4 = 2 \log 2

You have gained 2 bits of information!
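To make the formula concrete, here is a minimal Python sketch (the function name is my own) that reproduces both coin calculations, with logarithms in base 2 so the answers come out in bits:

```python
from math import log

def relative_information(q, p, base=2):
    """Kullback-Leibler divergence I(q, p) = sum_i q_i log(q_i / p_i),
    with the convention 0 log 0 = 0.  base=2 gives the answer in bits."""
    return sum(qi * log(qi / pi, base) for qi, pi in zip(q, p) if qi > 0)

# Fair-coin prior, then we learn the coin landed heads:
bits_fair = relative_information([1, 0], [0.5, 0.5])      # 1 bit
# Prior assigning only a 25% chance to heads:
bits_biased = relative_information([1, 0], [0.25, 0.75])  # 2 bits
```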

FREE ENERGY AS RELATIVE INFORMATION The free energy of a system at temperature T is

F = E - TS

where E is the expected value of its energy and S is its entropy. In equilibrium this is minimized. But in fact, the free energy of a system is proportional to the information it contains, relative to equilibrium.

Equilibrium is a hypothesis about the system's state, in which we guess that the probability of the system being in its ith state is

p_i \propto \exp(-E_i / kT)

Here E_i is the energy of the ith state and k is Boltzmann's constant.

But suppose we choose a different hypothesis, and say the probability of finding the system in its ith state is q_i. A fun calculation shows that

F(q) - F(p) = kT \, I(q, p)

where F(q) is the free energy for the distribution q and F(p) is the free energy in equilibrium. So: the extra free energy of a system out of equilibrium is proportional to the information it has, relative to equilibrium. As a system approaches equilibrium, this information goes away.
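This relation is easy to check numerically. The sketch below is my own toy example (three made-up energy levels, units with k = 1); it verifies F(q) - F(p) = kT I(q, p) for an arbitrary non-equilibrium distribution q:

```python
from math import exp, log, isclose

kT = 1.0                      # work in units where Boltzmann's constant k = 1
E = [0.0, 1.0, 2.5]           # made-up energies of the three states

# Equilibrium (Boltzmann) distribution p_i proportional to exp(-E_i / kT)
Z = sum(exp(-Ei / kT) for Ei in E)
p = [exp(-Ei / kT) / Z for Ei in E]

def free_energy(dist):
    """F = <E> - T S, with entropy S = -k sum_i q_i log q_i (in nats)."""
    mean_E = sum(qi * Ei for qi, Ei in zip(dist, E))
    S = -sum(qi * log(qi) for qi in dist if qi > 0)
    return mean_E - kT * S

def rel_info(q, p):
    return sum(qi * log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

q = [0.5, 0.3, 0.2]           # some non-equilibrium hypothesis
# Extra free energy equals kT times the information relative to equilibrium:
assert isclose(free_energy(q) - free_energy(p), kT * rel_info(q, p))
```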

NATURAL SELECTION AND RELATIVE INFORMATION Suppose we have self-replicating entities of different kinds:

- molecules of different chemicals
- organisms belonging to different species
- genes of different alleles
- restaurants belonging to different chains
- people with different beliefs
- game-players with different strategies
- etc.

I'll call them "organisms" of different "species".

Let P_i be the population of the ith species, as a function of time. Suppose the replicator equation holds:

\frac{dP_i}{dt} = F_i P_i

where F_i = F_i(P_1, \ldots, P_n), the fitness of the ith species, can depend on the populations of all the species.

The probability that a randomly chosen organism belongs to the ith species is

p_i = \frac{P_i}{\sum_j P_j}

If we think of each species as a strategy, the probability distribution p_i can be seen as a hypothesis about the best strategy. The quality of a strategy is measured by its fitness F_i. We can show that

\frac{dp_i}{dt} = (F_i - \overline{F}) \, p_i

where \overline{F} = \sum_j F_j p_j is the mean fitness. This is a continuous-time version of Bayesian hypothesis updating: Marc Harper, The replicator equation as an inference dynamic, arXiv:0911.1763.
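As a sanity check, one can Euler-integrate the replicator equation in a toy case with constant fitnesses (my own choice of numbers) and watch the probability distribution concentrate on the fittest species:

```python
# Euler-integrate the replicator equation dP_i/dt = F_i P_i with constant
# fitnesses (a toy sketch; in general F_i may depend on all the populations).
F = [1.0, 2.0]                 # species 1 is fitter
P = [1.0, 1.0]                 # equal initial populations
dt = 0.001
for _ in range(20000):         # integrate up to t = 20
    P = [Pi + dt * Fi * Pi for Pi, Fi in zip(P, F)]

p = [Pi / sum(P) for Pi in P]  # probability distribution of species
# The fitter species takes over: p approaches (0, 1)
```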

Now suppose that q is some fixed probability distribution of species, while p_i(t) changes as above. A fun calculation shows that

\frac{d}{dt} I(q, p(t)) = -\sum_i (F_i - \overline{F}) \, q_i

But what does this equation mean?

\frac{d}{dt} I(q, p(t)) = -\sum_i (F_i - \overline{F}) \, q_i

I(q, p(t)) is the information left to learn in going from p(t) to q. The quantity \sum_i (F_i - \overline{F}) q_i is the average excess fitness of a small invader population with distribution q_i: it says how much fitter the invaders are, on average, than the general population P_i(t). If this average excess fitness is \ge 0 for all choices of P_i(t), we call q a dominant distribution.

So: if there is a dominant distribution q, the actual probability distribution of species p(t) tends to approach it, and the information left to learn, I(q, p(t)), tends to decrease! But real biology is much more interesting: at all levels, biological systems only approach steady states in limited regions for limited times. Even using just the replicator equation

\frac{dP_i}{dt} = F_i P_i

we can already see more complex phenomena.
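Here is a small numeric illustration (constant fitnesses, my own parameters): taking q to put all its weight on the fittest species makes q dominant, and the information left to learn I(q, p(t)) decreases monotonically along the trajectory:

```python
from math import log

# Replicator dynamics with constant fitnesses.  q puts all its weight on the
# fittest species, so q is dominant and I(q, p(t)) should keep decreasing.
F = [1.0, 2.0, 3.0]
q = [0.0, 0.0, 1.0]
P = [1.0, 1.0, 1.0]
dt = 0.001

def rel_info(q, p):
    # KL divergence with the convention 0 log 0 = 0
    return sum(qi * log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

history = []
for _ in range(5000):                       # integrate up to t = 5
    p = [Pi / sum(P) for Pi in P]
    history.append(rel_info(q, p))
    P = [Pi + dt * Fi * Pi for Pi, Fi in zip(P, F)]

# I(q, p(t)) decreases monotonically: the information left to learn goes away
```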

THE RED QUEEN HYPOTHESIS "Now, here, you see, it takes all the running you can do, to keep in the same place."

The Red Queen Hypothesis: replicators must keep changing simply to survive amid other changing replicators. For example, in males of the common side-blotched lizard, orange beats blue, blue beats yellow, and yellow beats orange:

We can model this using the replicator equation by assuming lizards play randomly chosen opponents in the rock-paper-scissors game, with fitness determined by the game's outcome: Sinervo and Lively, The rock-paper-scissors game and the evolution of alternative male strategies, Nature 380 (1996).

The replicator equation gives the following dynamics for the probability distribution of the strategies "rock", "paper" and "scissors": there is a steady state, but it is not an attractor. In general, the population never stops learning new information!
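A short simulation makes this non-convergence visible. The sketch below uses a standard zero-sum rock-paper-scissors payoff matrix, with my own choice of initial condition and step size; the distribution keeps cycling around the steady state (1/3, 1/3, 1/3) instead of settling down:

```python
# Rock-paper-scissors replicator dynamics: the interior steady state
# (1/3, 1/3, 1/3) is not an attractor -- the distribution keeps cycling.
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]               # zero-sum payoffs: rock beats scissors, etc.
p = [0.6, 0.2, 0.2]            # initial distribution of strategies
dt = 0.001
for _ in range(20000):         # integrate up to t = 20
    F = [sum(aij * pj for aij, pj in zip(row, p)) for row in A]
    mean_F = sum(Fi * pi for Fi, pi in zip(F, p))
    p = [pi + dt * (Fi - mean_F) * pi for pi, Fi in zip(p, F)]

dist = max(abs(pi - 1/3) for pi in p)
# dist stays well away from zero: the steady state is never approached
```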

INFORMATION GEOMETRY How can we quantify the rate of learning? Here we face a paradox: for any probability distribution p(t) that changes smoothly with time, we have

\left. \frac{d}{dt} I(p(t), p(t_0)) \right|_{t = t_0} = 0

for all times t_0. To first order, you're never learning anything.

However, as long as the velocity \dot{p}(t_0) is nonzero, we have

\left. \frac{d^2}{dt^2} I(p(t), p(t_0)) \right|_{t = t_0} > 0

To second order, you're always learning something... unless your opinions are fixed. This lets us define a rate of learning: the speed of the changing probability distribution p(t).

Namely, define the length of the vector \dot{p}(t_0) by

\| \dot{p}(t_0) \|^2 = \left. \frac{d^2}{dt^2} I(p(t), p(t_0)) \right|_{t = t_0}

This notion of length defines a Riemannian metric on the space of probability distributions: the Fisher information metric. In this metric the space of probability distributions becomes round: a portion of a sphere.

Now suppose we have populations obeying the replicator equation

\frac{dP_i}{dt} = F_i P_i

so the probability distribution of species evolves via

\frac{dp_i}{dt} = (F_i - \overline{F}) \, p_i

Then a fun calculation shows that

\left\| \frac{dp}{dt} \right\|^2 = \sum_i (F_i - \overline{F})^2 \, p_i

The square of the rate of learning is the variance of the fitness! This is a clean, general statement of Fisher's fundamental theorem of natural selection.
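This identity can be checked numerically. In the sketch below (constant fitnesses and an initial distribution of my own choosing), a finite-difference second derivative of I(p(t), p(t_0)) at t_0 is compared against the variance of the fitness:

```python
from math import exp, log, isclose

# Check ||dp/dt||^2 = d^2/dt^2 I(p(t), p(t0))|_{t0} = Var(F) numerically,
# for the replicator equation with constant fitnesses (toy example).
f = [1.0, 2.0, 3.0]
p0 = [0.5, 0.3, 0.2]

def p_at(t):
    """Exact solution of dP_i/dt = f_i P_i starting from p0, normalized."""
    w = [pi * exp(fi * t) for pi, fi in zip(p0, f)]
    return [wi / sum(w) for wi in w]

def rel_info(q, p):
    return sum(qi * log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

# Central second difference of I(p(t), p0) at t = 0 (note I(p0, p0) = 0)
h = 1e-3
second_deriv = (rel_info(p_at(h), p0) + rel_info(p_at(-h), p0)) / h**2

# Variance of the fitness under p0
mean_f = sum(fi * pi for fi, pi in zip(f, p0))
variance = sum((fi - mean_f) ** 2 * pi for fi, pi in zip(f, p0))

# The squared rate of learning equals the variance of the fitness
assert isclose(second_deriv, variance, rel_tol=1e-4)
```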

These are some tiny steps toward a theory of biology as information dynamics.