One-minute responses. Nice class{no complaints. Your explanations of ML were very clear. The phylogenetics portion made more sense to me today.

Similar documents
"PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B Spring 2009 University of California, Berkeley

A Bayesian Approach to Phylogenetics

Bayesian Phylogenetics

Who was Bayes? Bayesian Phylogenetics. What is Bayes Theorem?

Biol 206/306 Advanced Biostatistics Lab 12 Bayesian Inference

Biol 206/306 Advanced Biostatistics Lab 12 Bayesian Inference Fall 2016

Estimating Evolutionary Trees. Phylogenetic Methods

Bayesian Inference using Markov Chain Monte Carlo in Phylogenetic Studies

Bayesian inference. Fredrik Ronquist and Peter Beerli. October 3, 2007

Lecture 5. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 1

Bayesian Inference. Anders Gorm Pedersen. Molecular Evolution Group Center for Biological Sequence Analysis Technical University of Denmark (DTU)

Bayesian phylogenetics. the one true tree? Bayesian phylogenetics

Chapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (part 1)

Algorithmic Methods Well-defined methodology Tree reconstruction those that are well-defined enough to be carried out by a computer. Felsenstein 2004,

Inferring Molecular Phylogeny

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016

Bayesian Phylogenetics

Infer relationships among three species: Outgroup:

Bayesian inference & Markov chain Monte Carlo. Note 1: Many slides for this lecture were kindly provided by Paul Lewis and Mark Holder

Molecular Evolution & Phylogenetics

Univariate Normal Distribution; GLM with the Univariate Normal; Least Squares Estimation

Bayesian Learning. Artificial Intelligence Programming. 15-0: Learning vs. Deduction

Physics 509: Bootstrap and Robust Parameter Estimation

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Bayesian Estimation An Informal Introduction

Bootstrap confidence levels for phylogenetic trees B. Efron, E. Halloran, and S. Holmes, 1996

Bayesian Regression Linear and Logistic Regression

Constructing Evolutionary/Phylogenetic Trees

MCMC: Markov Chain Monte Carlo

Constructing Evolutionary/Phylogenetic Trees

Integrative Biology 200 "PRINCIPLES OF PHYLOGENETICS" Spring 2018 University of California, Berkeley

Assessing Regime Uncertainty Through Reversible Jump McMC

Announcements. CS 188: Artificial Intelligence Fall Causality? Example: Traffic. Topology Limits Distributions. Example: Reverse Traffic

Bayesian Learning Extension

Introduction to Bayesian Learning. Machine Learning Fall 2018

Introduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Intro to Learning Theory Date: 12/8/16

Machine Learning for Data Science (CS4786) Lecture 24

Integrative Biology 200A "PRINCIPLES OF PHYLOGENETICS" Spring 2008

Using phylogenetics to estimate species divergence times... Basics and basic issues for Bayesian inference of divergence times (plus some digression)

Lecture 6 Phylogenetic Inference

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01

Lecture 02: Summations and Probability. Summations and Probability

Fundamental Probability and Statistics

(5) Multi-parameter models - Gibbs sampling. ST440/540: Applied Bayesian Analysis

Bayesian Methods for Machine Learning

Phylogenetics: Bayesian Phylogenetic Analysis. COMP Spring 2015 Luay Nakhleh, Rice University

Dr. Amira A. AL-Hosary

Physics 509: Non-Parametric Statistics and Correlation Testing

Bayesian Statistics: An Introduction

Generative Clustering, Topic Modeling, & Bayesian Inference

Relationship between Least Squares Approximation and Maximum Likelihood Hypotheses

Bayes Nets III: Inference

EE562 ARTIFICIAL INTELLIGENCE FOR ENGINEERS

A Decision Stump. Decision Trees, cont. Boosting. Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University. October 1 st, 2007

Bayesian Phylogenetics:

Systematic uncertainties in statistical data analysis for particle physics. DESY Seminar Hamburg, 31 March, 2009

STA 4273H: Statistical Machine Learning

Lab 9: Maximum Likelihood and Modeltest

Markov chain Monte-Carlo to estimate speciation and extinction rates: making use of the forest hidden behind the (phylogenetic) tree

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

Principles of Bayesian Inference

Introduction: MLE, MAP, Bayesian reasoning (28/8/13)

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

How should we go about modeling this? Model parameters? Time Substitution rate Can we observe time or subst. rate? What can we observe?

Evolutionary trees. Describe the relationship between objects, e.g. species or genes

PhyQuart-A new algorithm to avoid systematic bias & phylogenetic incongruence

Phylogeny and Molecular Evolution. Introduction

Artificial Intelligence

STA 4273H: Sta-s-cal Machine Learning

Some of these slides have been borrowed from Dr. Paul Lewis, Dr. Joe Felsenstein. Thanks!

The Bayes classifier

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012

Bayesian Networks BY: MOHAMAD ALSABBAGH

Departamento de Economía Universidad de Chile

MCMC 2: Lecture 2 Coding and output. Phil O Neill Theo Kypraios School of Mathematical Sciences University of Nottingham

PIER HLM Course July 30, 2011 Howard Seltman. Discussion Guide for Bayes and BUGS

Related Concepts: Lecture 9 SEM, Statistical Modeling, AI, and Data Mining. I. Terminology of SEM

The statistical and informatics challenges posed by ascertainment biases in phylogenetic data collection

A Re-Introduction to General Linear Models (GLM)

Lecture 6: Markov Chain Monte Carlo

Bayesian Learning. CSL603 - Fall 2017 Narayanan C Krishnan

Statistical Methods in Particle Physics Lecture 1: Bayesian methods

AN INTRODUCTION TO TOPIC MODELS

Bayesian Learning (II)

Accounting for Phylogenetic Uncertainty in Comparative Studies: MCMC and MCMCMC Approaches. Mark Pagel Reading University.

Phylogeny and Molecular Evolution. Introduction

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering

A first model of learning

Bayesian inference: what it means and why we care

Announcements. Inference. Mid-term. Inference by Enumeration. Reminder: Alarm Network. Introduction to Artificial Intelligence. V22.

Probability and Statistics

Particle-Based Approximate Inference on Graphical Model

MCMC 2: Lecture 3 SIR models - more topics. Phil O Neill Theo Kypraios School of Mathematical Sciences University of Nottingham

Principles of Bayesian Inference

CISC 889 Bioinformatics (Spring 2004) Hidden Markov Models (II)

Principles of Bayesian Inference

Systematics - Bio 615

Lecture 2. G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 1

Bayes factors, marginal likelihoods, and how to choose a population model

Transcription:

One-minute responses Nice class{no complaints. Your explanations of ML were very clear. The phylogenetics portion made more sense to me today. The pace/material covered for likelihoods was more dicult than previous lectures' topics, but I think I just need to look it over for longer. The bioinformatics portion was pretty fuzzy for me.

Introduction to Phylogenies: Bayesian methods Bayes' theorem Bayesian phylogeny estimation Markov chain Monte Carlo Consensus trees

{ The { The { Normalized Bayes' theorem P(H jd) = P(DjH)P(H) P H P(DjH)P(H) The probability that our hypothesis is correct is: support given by the data times prior probability of that hypothesis by a sum over all hypotheses

Bayes' theorem Bayes's theorem is true, but use of it can be controversial Often we don't really know P(H), the prior probability Is it okay to use a not-quite-correct prior? Also, the denominator can be dicult to compute Maximum likelihood methods are an attempt to avoid Bayes' theorem... What if we bite the bullet and try to use it?

{ Branch Bayes' theorem P(H jd) = P(DjH)P(H) P H P(DjH)P(H) In phylogenetics, P(DjH) is the probability of the data for a given tree hypothesis This comes from a mutational model and is straightforward (it's what ML methods maximize) P(H) is the prior chance that any given tree is the right one This has two components: Topology { lengths

Bayesian prior on trees: topology We can assume that every topology is equally likely a priori Unfortunately there are two dierent ways to count them! In practice this choice does not seem to matter much A really good prior would reect our prior knowledge of reasonable trees No one has even attempted this

Bayesian prior on trees: topology We assume a minimum and maximum length for branches Usually the minimum is 0 and the maximum is some large value We assume a at prior between those boundaries Again, this is not a very satisfying representation of our prior knowledge, but no one has been able to do better

Bayes' theorem P(H jd) = P(DjH)P(H) P H P(DjH)P(H) We have P(DjH) (from mutational model) We have P(H) (from prior) How can the enormous summation be handled?

Markov chain Monte Carlo (MCMC)

Markov chain Monte Carlo (MCMC) Monte Carlo means solving a problem by simulation A Markov chain is a simulation that moves around on its surface in steps In our case, we move from genealogy to genealogy, being guided by P(DjH)P(H)

Show MCRobot here This is the MCRobot program of Paul Lewis. It's available at: http://www.eeb.uconn.edu/people/plewis/software.php Unfortunately it requires Windows, but if you have the chance I encourage you to try it out.

Bayesian phylogenetics Search the space of possible trees, guided by P(DjH)P(H) Can report the best tree ever found A better option is often making a consensus tree of all trees found

Consensus trees Here are three trees which don't entirely agree:

Consensus trees Here is their majority-rule consensus:

Consensus trees Majority-rule consensus: Include all groupings with more than 50% support All of these must t on a single tree (why?) Finish up the tree, if necessary, using the best groupings among the less than 50% category Branches are often labeled with their support

{ Similiar { Gives { Vulnerable { Because { Search Strengths and weaknesses of Bayesian approach Strengths: to likelihood in power, robustness, and consistency information about which parts of tree are well supported Weaknesses: to bad choice of priors it searches among whole trees, user may easily stop too soon can be extremely dicult for some data sets

Clade probabilities H C G O H C G O H C G O H C G O H C G O H C G O

Progress toward assessing clade probabilities Courtesy of David Swoord

Courtesy of David Swoord, circa 2004

Summary Bayesian methods are quite new They are similar to likelihood methods but add a prior They must use MCMC to search among possible trees Their most powerful output is not a single tree but a set of possible trees Consensus tree algorithm converts this set of trees into a useful diagram Bayesian methods are vulnerable to too-short MCMC searches

{ are { does { are { transition/transversion { base { proportion Summary Other uses of Bayesian methods besides nding the best tree: Clade support: bats monophyletic? the human mtdna tree root in Africa? pandas closer to bears or raccoons? Model parameters: ratio frequencies of sites which do not vary