Reinforcement Learning: Part 3 Evolution

Slide 1: Reinforcement Learning: Part 3 Evolution
Chris Watkins, Department of Computer Science, Royal Holloway, University of London
July 27, 2015

Slide 2: Cross-entropy method for TSP

Simple genetic-style methods can be very effective, such as the following algorithm for learning a near-optimal tour in a TSP.

Given: asymmetric distances between C cities.
Initialise: P, a matrix of transition probabilities between cities (to uniform distributions).
Repeat until convergence:
1. Generate a large number (about 10C²) of random tours according to P, starting from city 1, under the constraint that no city may be visited twice.
2. Select the shortest 1% of these tours, and tabulate the transition counts for each city.
3. Adjust P towards the transition counts.
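A minimal Python sketch of this procedure, assuming `D` is a NumPy array of pairwise (possibly asymmetric) distances. The names and parameters (`elite_frac`, `smooth`, `n_iters`) and the smoothing update are illustrative choices, not from the slides.

```python
import numpy as np

def cem_tsp(D, n_iters=50, elite_frac=0.01, smooth=0.7, seed=None):
    rng = np.random.default_rng(seed)
    C = len(D)                      # number of cities
    P = np.full((C, C), 1.0 / C)    # transition probabilities between cities
    n_tours = 10 * C * C            # "a large number (about 10 C^2) of tours"

    def sample_tour():
        tour = [0]                  # start from city 1 (index 0)
        unvisited = set(range(1, C))
        while unvisited:
            cand = list(unvisited)
            w = P[tour[-1], cand]
            w = w / w.sum()         # renormalise: no city visited twice
            tour.append(rng.choice(cand, p=w))
            unvisited.discard(tour[-1])
        return tour

    for _ in range(n_iters):
        tours = [sample_tour() for _ in range(n_tours)]
        lengths = [sum(D[t[i], t[(i + 1) % C]] for i in range(C)) for t in tours]
        elite = np.argsort(lengths)[: max(1, int(elite_frac * n_tours))]
        counts = np.zeros((C, C))
        for e in elite:             # tabulate transitions of the shortest 1%
            t = tours[e]
            for i in range(C):
                counts[t[i], t[(i + 1) % C]] += 1
        counts /= counts.sum(axis=1, keepdims=True)
        P = smooth * P + (1 - smooth) * counts   # adjust P towards the counts
    return P
```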

Slide 3: [Figure] Fraction selected in each generation is ρ. Weak selection: ρ = 0.5, 0.8, 0.9. Strong selection: ρ = 0.01, 0.03, 0.1.

Slide 4: [Venn diagram] Evolutionary computation and machine learning, shown as overlapping fields.

Slide 5: In the overlap of evolutionary computation and machine learning:
- Estimation of distribution algorithms (EDA) (Pelikan and many others)
- Cross-entropy method (CEM) (Rubinstein, Kroese, Mannor 2000-2005)
- Evolution of reward functions (Niv, Singh)
- Simple model of evolution and learning (Hinton and Nowlan 1987)

Slide 6: Perhaps we should take evolution seriously?

Nature's number one learning algorithm: asexual evolution.
Nature's number two learning algorithm: sexual evolution.
Machine learning.

Slide 7: Needed: an enabling model of evolution

"The best model of a cat is a cat." (Norbert Wiener)

Wiener is wryly pointing out that a model should be as simple as possible, and should contain only what is essential for the problem at hand. We should discard as many biological details as possible, while keeping the computational essence. If someone says "But biological detail X is important!", then we can try putting it back in and see if it makes any difference.

Slide 8: Aim

Can we construct a machine-learning-style model of evolution, which includes both genetic evolution and individual learning?

Slide 9: Genetic mechanisms of sexual reproduction

At the DNA level, the mechanisms of sexual reproduction are well known and tantalisingly simple! They were fixed more than 10⁹ years ago, in single-celled protozoa; since then there has been spontaneous evolution of advanced representational systems for multicellular anatomy and complex instincts. Evolution is so robust you can't stop it...

Hypothesis: "fair copying" is the essence of sexual reproduction. Each child, each new member of the population, is a combination of genetic material copied (with errors) from other members of the population. The copying is "fair": all genetic material from all members of the population has an equal chance of being copied into the child.

Slide 10: A Markov chain of populations

Suppose we breed individuals under constant conditions: the procedures for breeding, recombination, mutation and selection are constant. Then each population depends only on the previous population, so the sequence of populations is a Markov chain.

We will consider a sequence of populations where at each transition just one individual is removed, and just one is added. (In genetic language, this is the Moran model of overlapping populations.)

Slide 11: Markov chains: unique stationary distribution

A Markov chain is specified by its transition probabilities $T_{ji} = T(i \to j) = P(X_{t+1} = j \mid X_t = i)$.

There is a unique¹ stationary distribution $\pi$ over the states such that $T\pi = \pi$.

In genetic language, the stationary distribution of the Markov chain of populations is the mutation-selection equilibrium.

¹ All the Markov chains we consider will be irreducible, aperiodic, and positive recurrent: such chains have a unique stationary distribution over states. (Any reasonable model of mutation allows some probability of any genome changing to any other.)
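As a quick illustration of this convention (a column-stochastic $T$ with $T\pi = \pi$), here is a minimal NumPy sketch that finds the stationary distribution of an arbitrary 3-state example chain by power iteration. The chain itself is an invented example, not from the slides.

```python
import numpy as np

# Convention from the slide: T[j, i] = P(X_{t+1} = j | X_t = i),
# so each column of T sums to 1 and T @ pi = pi at stationarity.
T = np.array([[0.5, 0.2, 0.1],
              [0.3, 0.6, 0.4],
              [0.2, 0.2, 0.5]])

pi = np.ones(3) / 3            # any starting distribution
for _ in range(1000):          # power iteration converges for an
    pi = T @ pi                # irreducible, aperiodic chain

print(pi)                      # the stationary distribution
print(np.allclose(T @ pi, pi))  # True: T pi = pi
```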

Slide 12: Markov chains: detailed balance and reversibility

A Markov chain $T$ with stationary distribution $\pi$ is reversible iff, run at stationarity, the time-reversed chain has the same transition probabilities as $T$. Two equivalent characterisations:

Detailed balance condition: for any two states $X, X'$,
$\pi(X)\, T(X \to X') = \pi(X')\, T(X' \to X)$

Kolmogorov cycle condition: for all $n > 2$ and all states $X_1, X_2, \dots, X_n$,
$T(X_1 \to X_2)\, T(X_2 \to X_3) \cdots T(X_n \to X_1) = T(X_n \to X_{n-1})\, T(X_{n-1} \to X_{n-2}) \cdots T(X_1 \to X_n)$
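A small sketch of how one might check detailed balance numerically for a chain like the `T` above; with the column convention of Slide 11, $T(i \to j)$ is `T[j, i]`. The function name and tolerance are illustrative.

```python
def detailed_balance_holds(T, pi, tol=1e-10):
    """Check pi(i) T(i -> j) == pi(j) T(j -> i) for all state pairs."""
    n = len(pi)
    return all(abs(pi[i] * T[j, i] - pi[j] * T[i, j]) < tol
               for i in range(n) for j in range(n))
```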

Slide 13: Markov chain Monte Carlo (MCMC) in a nutshell

MCMC is a computational approach for sampling from a known (typically complicated) probability distribution π. Given π, construct a Markov chain T with stationary distribution π, and run the chain to (eventually) get samples from π.

But how to construct a suitable T? The easiest way is to define a T that satisfies detailed balance for π. The most common techniques for constructing reversible chains with a specified stationary distribution are:
- the Metropolis-Hastings algorithm
- Gibbs sampling

Note that only a small class of Markov chains are reversible: there is no advantage in reversible chains except that it may be easier to characterise the stationary distribution.
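For concreteness, a minimal random-walk Metropolis sketch in Python: the symmetric Gaussian proposal makes the Hastings correction cancel, so the chain satisfies detailed balance for the (unnormalised) target by construction. The target, step size, and sample count are illustrative assumptions.

```python
import math, random

def metropolis_hastings(target, x0=0.0, n_samples=10000, step=1.0):
    """Random-walk Metropolis for an unnormalised 1-D density `target`."""
    x, samples = x0, []
    for _ in range(n_samples):
        x_new = x + random.gauss(0.0, step)   # symmetric proposal
        a = target(x_new) / target(x)         # proposal terms cancel
        if random.random() < a:               # accept with prob min(1, a)
            x = x_new
        samples.append(x)
    return samples

# Usage: sample from an unnormalised standard normal density.
samples = metropolis_hastings(lambda x: math.exp(-0.5 * x * x))
```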

Slide 14: Evolution as MCMC

Idea: can we find a computational model of evolution that satisfies detailed balance (and so is reversible)?

A nice stationary distribution would be:
$\pi(g_1, \dots, g_N) = p_B(g_1, \dots, g_N)\, f(g_1) \cdots f(g_N)$
where
- $\pi$ is an (unnormalised) stationary distribution of a population of N genomes $g_1, \dots, g_N$
- $p_B(g_1, \dots, g_N)$ is the breeding distribution, the stationary probability of the population with no selection
- $f(g)$ is the fitness of genome $g$; we require $f(g) > 0$.

Slide 15: Reversibility: a problem

When a reversible chain has reached its stationary distribution, an observer cannot tell whether it is running forwards or backwards, because every cycle of state transitions has the same probability in both directions (as in physics).

Any model where a child must have two parents is not reversible. Given parents $g_1, g_2$ and their child $c$, the child is more similar to each parent than the parents are to each other; e.g. for genomes of binary values,
$\text{HammingDistance}(g_1, c) \approx \tfrac{1}{2}\, \text{HammingDistance}(g_1, g_2)$
Hence in small populations with large genomes, we can identify parents and children, and so identify the direction of time. So natural evolution (and Holland's genetic algorithms) are not reversible Markov chains.

Slide 16: Key idea: exchangeable breeding

A generative probability model for a sequence of genomes $g_1, g_2, \dots$ (breeding only, no selection):
- $g_1$ consists entirely of mutations
- $g_2$ consists of elements copied from $g_1$, with some mutations
- $g_3$ is elements copied from $g_1$ or $g_2$, with fewer mutations
- ...
- each element of $g_N$ is copied from one of $g_1, \dots, g_{N-1}$, with few additional mutations

Generating the sequence $g_1, g_2, \dots$ in this way does not seem biologically realistic, but conditionally sampling $g_{N+1}$ given $g_1, \dots, g_N$ can be much more plausible. The sequence $g_1, g_2, \dots$ needs to be exchangeable.

Slide 17: Definition of exchangeability

Exchangeable sequences: a sequence of random variables $g_1, g_2, \dots$ is infinitely exchangeable iff for any $N$ and any permutation $\sigma$ of $\{1, \dots, N\}$,
$p(g_1, \dots, g_N) = p(g_{\sigma(1)}, \dots, g_{\sigma(N)})$

Which sequences are exchangeable? De Finetti's theorem: $g_1, g_2, \dots$ is exchangeable iff there is some prior distribution over $\Theta$ and a hidden parameter $\theta \in \Theta$ such that, given knowledge of $\theta$, the $g_1, g_2, \dots$ are all i.i.d. samples distributed according to $\theta$.

But de Finetti's theorem does not help us much at this point; we want an example of a plausible generative breeding distribution.

Slide 18: Example: Blackwell-MacQueen urn model

Pick a mutation distribution H; a new genetic element can be sampled from H, independently of other mutations. Pick a concentration parameter α > 0; this will determine the mutation rate.

Generate a sequence $\theta_1, \theta_2, \dots$ by:
- $\theta_1 \sim H$
- with prob. $\frac{1}{1+\alpha}$, $\theta_2 = \theta_1$; else with prob. $\frac{\alpha}{1+\alpha}$, $\theta_2 \sim H$
- with prob. $\frac{n-1}{n-1+\alpha}$, $\theta_n$ is copied from a uniform random choice among $\theta_1, \dots, \theta_{n-1}$; else with prob. $\frac{\alpha}{n-1+\alpha}$, there is a new mutation $\theta_n \sim H$

This exchangeable sequence is well known in machine learning: it samples from the predictive distribution of a Dirichlet process DP(H, α).
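A minimal sketch of this urn scheme, with `H` given as a zero-argument sampler; the function and parameter names are illustrative.

```python
import random

def blackwell_macqueen(H, alpha, n, rng=random):
    """Sample theta_1, ..., theta_n from the Blackwell-MacQueen urn."""
    thetas = []
    for i in range(n):                         # i previous draws exist
        if rng.random() < alpha / (i + alpha):
            thetas.append(H())                 # new mutation: theta ~ H
        else:
            thetas.append(rng.choice(thetas))  # copy a uniform earlier draw
    return thetas

# Usage: a draw from the predictive distribution of DP(H, alpha).
seq = blackwell_macqueen(H=lambda: random.random(), alpha=2.0, n=20)
```

Note that for $i = 0$ the copy probability is 0, so $\theta_1 \sim H$ automatically, matching the scheme above.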

Slide 19: Example: Cartesian product of DPs

Each genome $g_i = (\theta_{i1}, \dots, \theta_{iL})$ is generated with L independent Blackwell-MacQueen urn processes: for each $j$, $1 \le j \le L$, the sequence $\theta_{1j}, \theta_{2j}, \dots, \theta_{Nj}$ is a sample from the jth urn process.

The sequence of genomes $g_1, g_2, \dots, g_N$ is exchangeable, and is a sample from a Cartesian product of L Dirichlet processes. This is the simplest plausible exchangeable breeding model for sexual reproduction. Many elaborations are possible...
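Building on the `blackwell_macqueen` sketch above, genomes can be assembled by running L independent urn columns and transposing; this assumes the previous definitions are in scope.

```python
def breed_population(H, alpha, N, L):
    """N exchangeable genomes of length L from a product of L urn processes."""
    columns = [blackwell_macqueen(H, alpha, N) for _ in range(L)]
    # Transpose the L columns into N genomes g_i = (theta_i1, ..., theta_iL).
    return [tuple(columns[j][i] for j in range(L)) for i in range(N)]

population = breed_population(H=lambda: random.random(), alpha=2.0, N=10, L=5)
```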

Slide 20: Examples of possible breeding distributions

There is a rich and growing family of highly expressive non-parametric distributions which achieve exchangeability through random copying. The Dirichlet process is the simplest. One can also construct elegant exchangeable distributions over networks, and exchangeable sequences of machines with shared components...

Slide 21: Exchangeable breeding with tournaments (EBT)

Given a fitness function f such that f(g) > 0 for all genomes g, and a current population of N genomes $g_1, \dots, g_N$ with fitnesses $f_1, \dots, f_N$:

Repeat forever:
1. Breed $g_{N+1}$ by sampling from $p_B$ conditional on the current population: $g_{N+1} \sim p_B(\cdot \mid g_1, \dots, g_N)$.
2. Set $f_{N+1} \leftarrow f(g_{N+1})$, and add $g_{N+1}$ to the population.
3. Select one genome $g_i$ to remove from the population, with
$\Pr(\text{remove } g_i) = \frac{1/f_i}{1/f_1 + \cdots + 1/f_{N+1}}$

The genomes $\{g_1, \dots, g_{N+1}\} \setminus \{g_i\}$ become the next population of N genomes.
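A sketch of one EBT transition, using the product-of-urns model above as $p_B$: per locus, the child copies a uniformly chosen existing value with probability $N/(N+\alpha)$, else mutates from H (the DP predictive distribution). Names are illustrative, and the definitions above are assumed to be in scope.

```python
def ebt_step(population, f, H, alpha, rng=random):
    """One EBT transition: breed one child, remove one genome."""
    N, L = len(population), len(population[0])
    # 1. Breed g_{N+1} conditionally on the current population.
    child = tuple(
        H() if rng.random() < alpha / (N + alpha)
        else rng.choice([g[j] for g in population])
        for j in range(L))
    pool = population + [child]
    # 2./3. Remove one genome with probability proportional to 1/fitness.
    weights = [1.0 / f(g) for g in pool]
    r, acc = rng.random() * sum(weights), 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            del pool[i]
            break
    return pool          # the next population of N genomes
```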

Slide 22: EBT satisfies detailed balance

Let $G = \{g_1, g_2, \dots, g_{N+1}\}$, and write $G_{\setminus i}$ for $G$ with $g_i$ removed.

To prove: $\pi(G_{\setminus N+1})\, T(G_{\setminus N+1} \to G_{\setminus i}) = \pi(G_{\setminus i})\, T(G_{\setminus i} \to G_{\setminus N+1})$

Proof:
$\pi(G_{\setminus N+1})\, T(G_{\setminus N+1} \to G_{\setminus i}) = p_B(G_{\setminus N+1})\, f_1 \cdots f_N \; p_B(g_{N+1} \mid G_{\setminus N+1}) \, \frac{1/f_i}{1/f_1 + \cdots + 1/f_{N+1}}$
$= p_B(G)\, f_1 \cdots f_{N+1} \, \frac{(1/f_{N+1})(1/f_i)}{1/f_1 + \cdots + 1/f_{N+1}}$

which is symmetric between $g_i$ and $g_{N+1}$. QED.

Slide 23: [Diagram] Breeding and selection phases in EBT

Current generation: $X_1, \dots, X_5$.

Breeding phase: any number of children are exchangeably sampled in sequence: $X_6 \sim p_B(\cdot \mid X_1, \dots, X_5)$, then $X_7 \sim p_B(\cdot \mid X_1, \dots, X_6)$, and so on.

Selection phase: the current generation is given survival tickets. Any number of tournaments are played between randomly chosen elements with and without tickets, where
$\Pr(X_i \text{ wins the ticket against } X_j) = \frac{f(X_i)}{f(X_i) + f(X_j)}$
Tournaments are shown between $X_3$ and $X_7$, then between $X_1$ and $X_3$, and between $X_5$ and $X_6$. The next generation consists of the ticket owners at the end of the selection phase (here $X_2, X_3, X_4, X_5, X_7$).
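A one-function sketch of a single tournament from the selection phase, using the win probability above; names are illustrative.

```python
def tournament(x_i, x_j, f, rng=random):
    """Return the ticket holder after a tournament between x_i and x_j."""
    p_i = f(x_i) / (f(x_i) + f(x_j))   # Pr(x_i wins the ticket against x_j)
    return x_i if rng.random() < p_i else x_j
```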

Slide 24: Role of the population

EBT reduces to Metropolis-Hastings with a population size of 1. Why have a population larger than 1? Two reasons:
1. A larger population concentrates the conditional breeding distribution $p_B(\cdot \mid g_1, \dots, g_N)$.
2. Log fitness can be scaled as 1/N, improving the acceptance rate for newly generated individuals.

Slide 25: Stochastically generated environments

If environments are generated from an exchangeable distribution that is independent of the genome, then the environment plays a formally similar role to the genome: the stationary distribution for EBT with environments $v_1, \dots, v_N$ exchangeably sampled from a distribution $p_V(\cdot)$ is

$\pi(g_1, \dots, g_N, v_1, \dots, v_N) \propto p_B(g_1, \dots, g_N)\, p_V(v_1, \dots, v_N)\, f(g_1, v_1) \cdots f(g_N, v_N) \quad (1)$

An intuitive interpretation is that each genome is born into its own randomly generated circumstances, or "environment"; the current population will consist of individuals that are particularly suited to the environments into which they happen to have been born. Indeed, each gene is part of the environment of other genes.

Slide 26: Individual learning

An abstract model of individual learning in different environments: a genome g in an environment v
1. performs an experiment, observing a result x. The experiment that is performed, and the results obtained, depend on both g and v: let us write x = experiment(g, v);
2. given g and x, develops a post-learning phenotype learn(g, x);
3. has its fitness in v evaluated as f(g, v) = f(learn(g, experiment(g, v)), v).
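A skeletal sketch of this three-step evaluation; `experiment`, `learn`, and `fitness` are placeholders for the model components described above, to be supplied by any concrete instantiation.

```python
def evaluated_fitness(g, v, experiment, learn, fitness):
    """f(g, v) = f(learn(g, experiment(g, v)), v), as in steps 1-3 above."""
    x = experiment(g, v)             # 1. experiment depends on both g and v
    phenotype = learn(g, x)          # 2. post-learning phenotype
    return fitness(phenotype, v)     # 3. fitness of the learned phenotype in v
```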

Slide 27: Individual learning

This model captures the notion that an organism:
- obtains experience of its environment; the experience depends both on the environment and on the organism's own innate decision making, which depends on the interaction between its genome and the environment;
- then uses this experience to develop its extended phenotype in some way;
- has a fitness that depends on the interaction of its extended phenotype with its environment.

Minimal assumptions about learning. We do not assume:
- that learning is rational in any sense;
- that the organism has subjective utilities or reinforcement.