
Bayesian Phylogenetic Inference using Sequential Monte Carlo Algorithms. Alexandre Bouchard-Côté*, Sriram Sankararaman*, and Michael I. Jordan*†. *Computer Science Division, University of California Berkeley. †Department of Statistics, University of California Berkeley.

Phylogenetic tree inference. Topic of this talk: integration over the space of trees using Sequential Monte Carlo (SMC). Motivation: the Bayesian approach to phylogenetic inference puts a prior on trees and uses the posterior for reconstruction, with heavy use of integrals over the space of trees, e.g. for handling nuisance parameters, computing minimum-risk estimators, Bayes factors, etc. Prelude: a parallel with the simpler problem of maximization over the space of trees.

Maximization over phylogenies. Two strategies: local and sequential search. Key difference: representation. Local: the state t is a tree over the observed species.

Maximization: local strategy. Meta-algorithm:
1. Start at an arbitrary state.
2. Iterate: i. evaluate neighbors; ii. move to a nearby tree.
3. Return the best state visited.
Example: stochastic annealing.
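A generic sketch of this local meta-algorithm in an annealing style, assuming hypothetical score(tree) (e.g. a log likelihood) and random_neighbor(tree) helpers; the slide only gives the meta-algorithm, not these ingredients.

    import math
    import random

    def local_search(init_tree, score, random_neighbor, n_steps, temp=1.0, cooling=0.999):
        """Local strategy: start anywhere, repeatedly move to nearby trees, keep the best."""
        current, best = init_tree, init_tree
        for _ in range(n_steps):
            # i. evaluate a neighbor, ii. move to a nearby tree (annealed acceptance)
            candidate = random_neighbor(current)
            delta = score(candidate) - score(current)
            if delta > 0 or random.random() < math.exp(delta / temp):
                current = candidate
            if score(current) > score(best):
                best = current
            temp *= cooling
        # 3. return the best state visited
        return best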

Maximization over phylogenies. Two strategies: local and sequential search. Key difference: representation. Local: the state t is a tree over the observed species. Sequential: the partial state p is a forest over the observed species.

Maximization: sequential strategy. Meta-algorithm:
1. Start at the initial, unconstrained partial state (every species is its own trivial tree).
2. Iterate: i. extend the partial state; ii. estimate the best successor.
3. Return the best final state.
Example: neighbor joining.
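By contrast with the local strategy, the sequential strategy builds a tree bottom-up from a forest. A greedy sketch, neighbor-joining-like in spirit, assuming hypothetical possible_merges(forest), merge_score(forest, merge), and apply_merge(forest, merge) helpers:

    def sequential_search(initial_forest, possible_merges, merge_score, apply_merge):
        """Sequential strategy: extend the partial state (a forest) until one tree remains."""
        forest = initial_forest
        while len(forest) > 1:
            # i. extend the partial state, ii. estimate the best successor
            best_merge = max(possible_merges(forest), key=lambda m: merge_score(forest, m))
            forest = apply_merge(forest, best_merge)
        # 3. return the best final state (the single remaining tree)
        return forest[0]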

A parallel: classification of phylogenetic algorithms.

                   Local strategy          Sequential strategy
    Maximization   Stochastic annealing    Neighbor-joining
    Integration    MCMC algorithms         Sequential Monte Carlo (SMC)

Outline. Background: importance sampling and Sequential Monte Carlo. SMC for phylogenetic inference. A framework for designing proposals. Experiments: comparisons with MCMC.

Preview: comparative advantages.

    SMC:
    + Trivial to parallelize
    + Easier to get a data likelihood estimate
    + No burn-in

    MCMC:
    + Easier to resample hyper-parameters
    + Easier to design the proposal distribution

Not exclusive: the two approaches can be combined.

Phylogenetic setup: ultrametric trees. A rooted tree T of a given height relates the observed species: hidden sequences X (characters in {A, C, G, T}) sit at the internal nodes, and the observed sequences Y = y sit at the leaves.
Target distribution: T | Y = y ~ π, with density π(t) = γ(t) / Z, where γ(t) is the joint density evaluated at (t, y), summing over the hidden x, and Z = P(Y = y) is the data likelihood (intractable).
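Spelled out, the quantities on this slide are (a minimal restatement, assuming the standard factorization into a tree prior p(t) and a likelihood p(x, y | t) with latent internal sequences):

    \pi(t) = \frac{\gamma(t)}{Z}, \qquad
    \gamma(t) = p(t) \sum_{x} p(x, y \mid t), \qquad
    Z = P(Y = y) = \int \gamma(t)\, \mathrm{d}t .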

Sequential Monte Carlo (SMC). Background: importance sampling (IS). IS approximation for π:
1. Sample trees from a proposal q: t_i ~ q.
2. Compute weights w_i = γ(t_i) / q(t_i). Each weighted tree (t_i, w_i) is a "particle".
3. Normalize the weights.
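A minimal sketch of this importance sampling step, assuming hypothetical sample_tree_from_q(), log_q(t), and log_gamma(t) functions for the proposal and the unnormalized target; weights are normalized in log space for numerical stability.

    import math

    def importance_sample(n_particles, sample_tree_from_q, log_q, log_gamma):
        """IS approximation of pi: draw trees from q and weight by gamma/q."""
        # 1. sample trees t_i ~ q
        trees = [sample_tree_from_q() for _ in range(n_particles)]
        # 2. compute (log) weights  w_i = gamma(t_i) / q(t_i)
        log_w = [log_gamma(t) - log_q(t) for t in trees]
        # 3. normalize the weights (log-sum-exp trick)
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        total = sum(w)
        weights = [wi / total for wi in w]
        return list(zip(trees, weights))  # the weighted "particles"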

Sequential Monte Carlo (SMC). Background: the problem with importance sampling is that π is high-dimensional, so most particles will have tiny weights.

Sequential Monte Carlo (SMC). Background: importance sampling uses a single proposal q aimed directly at π = γ/Z. SMC instead uses a sequence of proposals q targeting intermediate distributions π_1, π_2, ..., π_R = π = γ/Z. SMC for phylogenies: the intermediate π_r are distributions over partial states (forests).

Sequential Monte Carlo (SMC). SMC approximation for π:
1. Initialize each particle with the unconstrained initial partial state.
2. Iterate over r = 1, ..., R:
   i. Sample new partial states p'_i by extending each current particle p_i with the proposal q.
   ii. Compute weights w_i = [γ(p'_i) / γ(p_i)] · [1 / q(p_i → p'_i)].
   iii. Normalize the weights.
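A compact sketch of this SMC iteration, assuming hypothetical helpers: initial_forest for the unconstrained starting state, propose(p) returning a successor p' together with log q(p → p'), and log_gamma(p) for the unnormalized target on partial states. Multinomial resampling between iterations is standard SMC practice and is added here even though the slide only lists weight normalization; this is a schematic under those assumptions, not the authors' implementation.

    import math
    import random

    def smc_phylo(initial_forest, n_particles, n_iters, propose, log_gamma):
        """Sequential Monte Carlo over forests (partial states)."""
        particles = [initial_forest for _ in range(n_particles)]
        for _ in range(n_iters):
            new_particles, log_w = [], []
            for p in particles:
                # i. extend the partial state with the proposal q
                p_new, log_q_fwd = propose(p)
                # ii. incremental weight  w = gamma(p') / gamma(p) * 1 / q(p -> p')
                log_w.append(log_gamma(p_new) - log_gamma(p) - log_q_fwd)
                new_particles.append(p_new)
            # iii. normalize the weights (log-sum-exp trick)
            m = max(log_w)
            w = [math.exp(lw - m) for lw in log_w]
            total = sum(w)
            w = [wi / total for wi in w]
            # multinomial resampling to fight weight degeneracy
            particles = random.choices(new_particles, weights=w, k=n_particles)
        return particles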

Intuition: why it works. Basic result: SMC is asymptotically consistent. Consider a path of partial states ∅ → p1 → p2 → p3 proposed by q. The incremental weights are
w1 = γ(p1) / q(∅ → p1),
w2 = [γ(p2) / γ(p1)] · [1 / q(p1 → p2)],
w3 = [γ(p3) / γ(p2)] · [1 / q(p2 → p3)],
and their product telescopes: w1 · w2 · w3 = γ(p3) / [q(∅ → p1) q(p1 → p2) q(p2 → p3)].

Intuition: why it works. Compare the weights along an SMC path with importance sampling: the product w1 · w2 · w3 = γ(p3) / q(∅ → ... → p3) has the same form as the single IS weight w = γ(t) / q(t).
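In general, over R proposal steps the incremental weights telescope; a restatement of the comparison on this slide, writing p_0 for the initial empty state and p_R for the final state, with the convention γ(p_0) = 1:

    \prod_{r=1}^{R} w_r
      = \prod_{r=1}^{R} \frac{\gamma(p_r)}{\gamma(p_{r-1})} \cdot \frac{1}{q(p_{r-1} \to p_r)}
      = \frac{\gamma(p_R)}{\prod_{r=1}^{R} q(p_{r-1} \to p_r)} .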

Designing a proposal q. Issue: over-counting. (Figure: there are two ways of building tree t1 but only one way of building tree t2, so a naive proposal can over-count some trees.)

Designing a proposal q. Useful abstraction: q induces a partial order (poset) on partial states, with p1 ≤ p2 if q can propose a path from p1 to p2. The poset can be drawn as a Hasse diagram; over-counting is avoided by using proposals whose Hasse diagrams are tree-shaped.

Designing a proposal q. Example of a proposal with a tree-shaped Hasse diagram:
1. Pick a pair of trees in the current forest to merge, uniformly at random.
2. Pick a height for the new merged tree that exceeds the heights of both merged subtrees.
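A sketch of this kind of merge proposal, assuming a forest is a list of subtrees each carrying a height attribute; drawing the new height as the tallest merged subtree's height plus an exponential increment is one simple way to satisfy the constraint (the slide does not fix this choice).

    import random
    from dataclasses import dataclass

    @dataclass
    class Node:
        height: float                 # ultrametric height of this subtree's root
        children: tuple = ()          # empty for leaves
        name: str = ""

    def propose_merge(forest, rate=1.0):
        """Merge two uniformly chosen trees; give the new root a larger height."""
        # 1. pick a pair of trees to merge uniformly at random
        i, j = random.sample(range(len(forest)), 2)
        left, right = forest[i], forest[j]
        # 2. pick a height for the new tree above both merged subtrees
        new_height = max(left.height, right.height) + random.expovariate(rate)
        merged = Node(height=new_height, children=(left, right))
        remaining = [t for k, t in enumerate(forest) if k not in (i, j)]
        return remaining + [merged]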

Experiments: setup.

                        Synthetic-small             Synthetic-med   Real data
    Source              Generated from the model                    Subset of HGDP
    Likelihood model    Brownian motion on frequencies
    Number of sites     100                                         11,511
    Number of nodes     25                          51              25
    Number of leaves    13                          26              13

Synthetic experiments. Goal: comparison against MCMC. Competitor: a standard MCMC sampler with 4 tempering chains and a shared sum-product implementation. Metric: symmetric clade difference between the minimum Bayes risk reconstructed tree and the generating tree. Data points are computed by increasing the number of particles (for SMC) and the number of sampling steps (for MCMC).
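For concreteness, a small sketch of the symmetric clade difference, assuming each tree is given as a nested tuple of leaf names; it counts clades present in exactly one of the two trees. This is the usual definition; the slide does not specify the authors' exact variant (e.g. normalization).

    def clades(tree):
        """Return the set of clades (leaf sets) of a tree given as nested tuples."""
        if isinstance(tree, str):          # a leaf
            return {frozenset([tree])}
        result, here = set(), set()
        for child in tree:
            child_clades = clades(child)
            result |= child_clades
            here |= max(child_clades, key=len)   # the child's full leaf set
        result.add(frozenset(here))
        return result

    def symmetric_clade_difference(t1, t2):
        """Number of clades found in exactly one of the two trees."""
        return len(clades(t1) ^ clades(t2))

    # Example: symmetric_clade_difference((("a", "b"), ("c", "d")), ((("a", "c"), "b"), "d"))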

Comparison with MCMC, Synthetic-small. (Plot: symmetric clade difference versus wall clock time in ms, log scale, for SMC and MCMC.)

Comparison with MCMC, Synthetic-medium. (Plot: symmetric clade difference versus wall clock time in ms, log scale, for SMC and MCMC.)

Experiments on real data. Goal: show that the method scales to a large number of sites. The number of particles (10,000) was determined using the synthetic experiments; timing experiments were run with different numbers of cores. (Plot: wall clock time in ms versus number of cores, 1 to 4.)

Conclusion. SMC can be applied to a wide range of phylogenetic models; previous work was limited to coalescent priors [Teh et al. 07]. An order-theoretic framework for designing proposals. Experiments: there are regimes where SMC outperforms MCMC. Promising applications of SMC in phylogenetic inference: 1. quickly analyzing large datasets; 2. initialization and large-step proposals for MCMC chains.