Bayesian Inference. Anders Gorm Pedersen. Molecular Evolution Group, Center for Biological Sequence Analysis, Technical University of Denmark (DTU)

Background: Conditional probability
P(B|A) = P(A,B) / P(A)
P(A,B) = P(B|A) P(A)

Background: Conditional probability
[Venn diagram: Danish, Swedish, plays the accordion]
Among people working at DTU:
P(Swedish) = 0.16
P(Swedish, plays the accordion) = 0.08
P(plays the accordion | Swedish) = P(plays the accordion, Swedish) / P(Swedish) = 0.08 / 0.16 = 0.5

Background: Conditional probability
Law of total probability (A, B, and C mutually exclusive and exhaustive):
P(D) = P(A,D) + P(B,D) + P(C,D) = P(D|A) P(A) + P(D|B) P(B) + P(D|C) P(C)

Background: Conditional probability
Among people working at DTU:
P(Swedish) = 0.16, P(Danish) = 0.35, P(Other) = 0.49
P(plays the accordion | Swedish) = 0.5
P(plays the accordion | Danish) = 0.05
P(plays the accordion | Other) = 0.06
P(plays the accordion) = P(plays the accordion | Swedish) P(Swedish)
                       + P(plays the accordion | Danish) P(Danish)
                       + P(plays the accordion | Other) P(Other)
                       = 0.5 × 0.16 + 0.05 × 0.35 + 0.06 × 0.49 = 0.1269
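The two calculations above (a conditional probability from a joint probability, and a marginal via the law of total probability) can be reproduced in a few lines of Python; this sketch simply reuses the numbers from the slides and is not part of the original material:

    # Conditional probability and the law of total probability,
    # using the DTU accordion example from the slides.
    p_nationality = {"Swedish": 0.16, "Danish": 0.35, "Other": 0.49}
    p_accordion_given = {"Swedish": 0.5, "Danish": 0.05, "Other": 0.06}

    # P(plays accordion | Swedish) = P(plays accordion, Swedish) / P(Swedish)
    p_joint_swedish = 0.08
    print(p_joint_swedish / p_nationality["Swedish"])   # 0.5

    # Law of total probability: P(plays accordion) = sum over groups of P(accordion | group) P(group)
    p_accordion = sum(p_accordion_given[g] * p_nationality[g] for g in p_nationality)
    print(p_accordion)   # 0.1269 (up to floating-point rounding)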

Bayesians vs. Frequentists
Meaning of probability:
- Frequentist: the long-run frequency of an event in a repeatable experiment
- Bayesian: a degree of belief; a way of quantifying uncertainty
Extracting information about reality from empirical data:
- Frequentist: parameters in the model are fixed constants whose true values we are trying to find good (point) estimates for
- Bayesian: uncertainty about model parameters is expressed by means of a probability distribution over possible parameter values

Probabilities as extended logic
Pólya, Cox, Jeffreys, Jaynes: probabilities are the only consistent basis for plausible reasoning (reasoning when there is insufficient information for deductive reasoning).
- Probabilities should form the basis of all scientific inference
- Evidence from different sources is integrated using the simple laws of probability (multiplication and summation)

Bayes' Theorem
Reverend Thomas Bayes (1702-1761)
P(B|A) = P(A|B) P(B) / P(A)

Bayes' Theorem
Among people working at DTU:
P(Swedish) = 0.16
P(plays the accordion) = 0.1269
P(plays the accordion | Swedish) = 0.5
P(Swedish | plays the accordion) = P(plays the accordion | Swedish) P(Swedish) / P(plays the accordion)
                                 = 0.5 × 0.16 / 0.1269 = 0.6304
Knowledge about reality is updated by data via Bayes' theorem:
Before data: P(Swedish) = 0.16
After data: P(Swedish | Data) = 0.63
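The same update in Python (again just an illustration using the slide's numbers):

    # Bayes' theorem: P(Swedish | accordion) = P(accordion | Swedish) P(Swedish) / P(accordion)
    p_swedish = 0.16
    p_accordion_given_swedish = 0.5
    p_accordion = 0.1269   # marginal probability from the previous slide

    p_swedish_given_accordion = p_accordion_given_swedish * p_swedish / p_accordion
    print(round(p_swedish_given_accordion, 4))   # 0.6304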

Bayes' Theorem
Reverend Thomas Bayes (1702-1761)
P(H|D) = P(D|H) P(H) / P(D)
- P(H): prior probability of the hypothesis
- P(D|H): probability of the data given the hypothesis = the likelihood
- P(H|D): posterior probability of the hypothesis
- P(D): marginal probability of observing the data; essentially a normalizing constant so the posterior will sum to one (but useful for model comparison)

Bayes' Theorem
P(H|D) = P(D|H) P(H) / P(D)
Update knowledge about hypothesis H based on data D.
P(w|D) = P(D|w) P(w) / P(D)
Focus is on learning about w, the parameters within some model H1.
P(w|D,H1) = P(D|w,H1) P(w|H1) / P(D|H1)
We sometimes want to make it explicit that the parameters w belong to (are conditioned upon) this particular model/hypothesis.
P(w|D) = P(D|w) P(w) / Σ_{j=1..N} P(D|w_j) P(w_j)
P(D) can be found by summing P(D|w_j) P(w_j) over all N mutually exclusive, possible parameter values w_j.
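To make the last formula concrete, here is a hypothetical sketch with a discrete grid of parameter values, a uniform prior, and a binomial-style likelihood; the model and the data (7 successes in 10 trials) are invented for illustration and are not from the slides:

    import numpy as np

    w_grid = np.linspace(0.01, 0.99, 99)          # discrete parameter values w_j
    prior = np.ones_like(w_grid) / len(w_grid)    # uniform prior P(w_j)
    likelihood = w_grid**7 * (1 - w_grid)**3      # P(D | w_j), up to a constant factor

    # P(D) = sum_j P(D | w_j) P(w_j); the posterior is likelihood * prior / P(D)
    marginal = np.sum(likelihood * prior)
    posterior = likelihood * prior / marginal
    print(w_grid[np.argmax(posterior)])           # posterior mode, near 0.7
    print(posterior.sum())                        # 1.0: the posterior sums to one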

MCMC: Markov chain Monte Carlo
P(w_i|D,H1) = P(D|w_i,H1) P(w_i|H1) / Σ_{j=1..N} P(D|w_j,H1) P(w_j|H1)
This can be difficult or impossible to compute (either analytically or numerically).
Solution: Markov chain Monte Carlo (MCMC)

MCMC: Markov chain Monte Carlo
Starting point: the parameter space (covering all possible parameter values for all parameters in the model)
- For each possible parameter value we can compute the likelihood = P(D | parameter values)
- For each parameter value we know the prior probability = P(parameter values)
- We can therefore compute prior × likelihood for any given point in parameter space

MCMC: Markov chain Monte Carlo
- Start at a random position in the probability landscape (X). Compute prior × likelihood there; call it PX.
- Based on the current position, attempt a move to a new position (Y) by drawing at random from a proposal distribution q(Y|X). (For example, the proposal distribution can be a normal distribution with mean X and standard deviation 1.)
- Compute prior × likelihood at the new position; call it PY.
- (a) If the move ends higher up, i.e. PY > PX: accept the move.
- (b) If the move ends lower down: accept the move with probability P(accept) = PY q(X|Y) / (PX q(Y|X)). If q(X|Y) = q(Y|X), i.e. q is symmetric, this becomes P(accept) = PY / PX.
- Write the parameter values for accepted moves to a file (if the proposed move is not accepted, write the previous values again).
- After many, many repetitions, points will have been sampled in proportion to the height of the probability landscape (see the sketch below).
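A minimal Python sketch of this recipe (the Metropolis algorithm), using a symmetric normal proposal so the acceptance probability reduces to PY / PX; the target landscape, an unnormalized standard normal handled on the log scale, is a stand-in chosen purely for illustration:

    import math, random

    def log_prior_times_likelihood(x):
        # Stand-in probability landscape: log of an unnormalized standard normal.
        return -0.5 * x * x

    def metropolis(n_samples, start=0.0, step_sd=1.0):
        samples = []
        x = start
        log_px = log_prior_times_likelihood(x)
        for _ in range(n_samples):
            y = random.gauss(x, step_sd)               # symmetric proposal q(Y|X)
            log_py = log_prior_times_likelihood(y)
            # Accept uphill moves always; downhill moves with probability PY / PX.
            if log_py >= log_px or random.random() < math.exp(log_py - log_px):
                x, log_px = y, log_py
            samples.append(x)                          # record current value (old one if move rejected)
        return samples

    chain = metropolis(10000)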

MCMCMC: Metropolis-coupled Markov chain Monte Carlo
Problem: if there are multiple peaks in the probability landscape, MCMC may get stuck on one of them.
Solution: Metropolis-coupled Markov chain Monte Carlo = MCMCMC = MC3
Essential features of MC3:
- Run several Markov chains simultaneously
- One chain is "cold": this chain performs the MCMC sampling
- The remaining chains are "heated": they move faster across valleys
- Each turn, the cold and heated chains may swap positions (the swap probability is proportional to the ratio between the heights)
- More peaks will be visited
- More chains means a better chance of visiting all important peaks, but each additional chain increases run time (unless you use parallelization); see the sketch below
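A sketch of the same idea, where chains are heated by dividing the log target by a temperature and pairs of chains occasionally attempt to swap states; the temperatures and the standard Metropolis-coupled swap rule are assumptions for illustration, not taken from the slides:

    import math, random

    def mc3(log_target, n_steps, temps=(1.0, 2.0, 4.0), step_sd=1.0):
        # One chain per temperature; temps[0] = 1.0 is the "cold" chain that is actually sampled.
        states = [0.0] * len(temps)
        cold_samples = []
        for _ in range(n_steps):
            # Ordinary Metropolis update within each chain, on its heated landscape.
            for i, temp in enumerate(temps):
                y = random.gauss(states[i], step_sd)
                log_ratio = (log_target(y) - log_target(states[i])) / temp
                if log_ratio >= 0 or random.random() < math.exp(log_ratio):
                    states[i] = y
            # Attempt to swap the states of two randomly chosen chains.
            i, j = random.sample(range(len(temps)), 2)
            log_swap = (log_target(states[i]) - log_target(states[j])) * (1.0 / temps[j] - 1.0 / temps[i])
            if log_swap >= 0 or random.random() < math.exp(log_swap):
                states[i], states[j] = states[j], states[i]
            cold_samples.append(states[0])             # only the cold chain is recorded
        return cold_samples

    samples = mc3(lambda x: -0.5 * x * x, 10000)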

MCMCMC for inference of phylogeny
Result of run: (a) substitution parameters and nucleotide frequencies; (b) tree topologies

Posterior probability distributions of substitution parameters

Posterior Probability Distribution over Trees
- MAP (maximum a posteriori) estimate of phylogeny: the tree topology occurring most often in the MCMCMC output
- Clade support: the posterior probability of a group = the frequency of that clade among the sampled trees
- 95% credible set of trees: order the trees from highest to lowest posterior probability, then add trees in that order until the cumulative posterior probability reaches 0.95 (see the sketch below)
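A sketch of how these three summaries could be computed from a list of sampled tree topologies; the topology strings and the naive substring-based clade check are simplifications invented for illustration only:

    from collections import Counter

    # Hypothetical MCMCMC output: one topology string per sampled generation.
    sampled_trees = ["((A,B),C,D)"] * 60 + ["((A,C),B,D)"] * 30 + ["((A,D),B,C)"] * 10
    n = len(sampled_trees)
    counts = Counter(sampled_trees)

    # MAP estimate: the topology sampled most often, with its posterior probability.
    map_tree, map_count = counts.most_common(1)[0]
    print(map_tree, map_count / n)

    # 95% credible set: add topologies from most to least frequent until cumulative probability >= 0.95.
    credible_set, cumulative = [], 0.0
    for tree, count in counts.most_common():
        credible_set.append(tree)
        cumulative += count / n
        if cumulative >= 0.95:
            break
    print(credible_set)

    # Clade support for the group (A,B): fraction of sampled trees containing that clade.
    print(sum("(A,B)" in tree for tree in sampled_trees) / n)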