BMI/CS 776 Lecture 4. Colin Dewey. 2007.02.01

Outline
- Common nucleotide substitution models
- Directed graphical models
- Ancestral sequence inference

Poisson process / continuous-time Markov process

Substitution events occur according to a Poisson process X_t (the number of events up to time t), and the character Y_t changes by applying the jump matrix R once per event:

P[X_{t_{i+1}} = x_{i+1} | X_{t_i} = x_i] = e^{−λ(t_{i+1} − t_i)} (λ(t_{i+1} − t_i))^{x_{i+1} − x_i} / (x_{i+1} − x_i)!

P[Y_{t_{i+1}} = b | Y_{t_i} = a, X_{t_{i+1}} = x_{i+1}, X_{t_i} = x_i] = (R^{x_{i+1} − x_i})_{ab}
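This construction can be checked numerically (a sketch, not from the slides): sum the Poisson-weighted powers of a jump matrix R to get the substitution matrix over an interval. The jump matrix and rate below are illustrative values, not parameters from the lecture.

```python
import numpy as np

def poisson_transition_matrix(R, lam, t, max_events=60):
    """P(t) = sum_k e^{-lam*t} (lam*t)^k / k! * R^k, truncated at max_events.

    Each term is the Poisson probability of k substitution events in time t,
    times the k-step jump probabilities R^k.
    """
    n = R.shape[0]
    P = np.zeros((n, n))
    term = np.exp(-lam * t) * np.eye(n)        # k = 0 term of the series
    for k in range(max_events):
        P += term
        term = term @ R * (lam * t) / (k + 1)  # advance to the k+1 term
    return P

# Illustrative jump matrix over {A, C, G, T}: each event picks a base uniformly.
R = np.full((4, 4), 0.25)
P = poisson_transition_matrix(R, lam=1.0, t=0.5)
```

With this uniform R the series collapses to the Jukes-Cantor closed form that appears on the next slide, which makes it a convenient sanity check.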

Jukes-Cantor model

Π = (1/4) I₄ (uniform base frequencies: π_a = 1/4 for every base)

Each substitution event picks one of the four bases uniformly at random, so every entry of the jump matrix is 1/4:

R = (1/4) ·
| 1 1 1 1 |
| 1 1 1 1 |
| 1 1 1 1 |
| 1 1 1 1 |

Q = R − I = (1/4) ·
| −3  1  1  1 |
|  1 −3  1  1 |
|  1  1 −3  1 |
|  1  1  1 −3 |

P(t) = (1/4) ·
| 1+3e^{−λt}  1−e^{−λt}   1−e^{−λt}   1−e^{−λt}  |
| 1−e^{−λt}   1+3e^{−λt}  1−e^{−λt}   1−e^{−λt}  |
| 1−e^{−λt}   1−e^{−λt}   1+3e^{−λt}  1−e^{−λt}  |
| 1−e^{−λt}   1−e^{−λt}   1−e^{−λt}   1+3e^{−λt} |

The Jukes-Cantor model satisfies two important properties:
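The closed form for P(t) is simple enough to code directly (a minimal sketch; the value of t below is arbitrary):

```python
import math

def jc_transition(t, lam=1.0):
    """Closed-form Jukes-Cantor P(t) as a 4x4 list of lists (order A, C, G, T)."""
    same = (1 + 3 * math.exp(-lam * t)) / 4  # probability of the same base after time t
    diff = (1 - math.exp(-lam * t)) / 4      # probability of each particular other base
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

P = jc_transition(0.5)
P_long = jc_transition(50.0)  # for large t every entry approaches 1/4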

Properties of Jukes-Cantor
- Ergodic: lim_{t→∞} P(t) = ΠJ (every row of P(t) converges to the stationary distribution, so every entry tends to 1/4)
- Time reversible: ΠQ = Qᵀ Π (equivalently, π_a Q_ab = π_b Q_ba)

Transitions vs. Transversions
- Transition: purine ↔ purine (A ↔ G) or pyrimidine ↔ pyrimidine (C ↔ T)
- Transversion: purine ↔ pyrimidine
- Purines: A, G; pyrimidines: C, T

HKY85

Rows and columns ordered A, C, G, T; transitions (A ↔ G, C ↔ T) get rate parameter α, transversions get β:

R =
| 1 − π_G α − π_Y β   π_C β               π_G α               π_T β             |
| π_A β               1 − π_T α − π_R β   π_G β               π_T α             |
| π_A α               π_C β               1 − π_A α − π_Y β   π_T β             |
| π_A β               π_C α               π_G β               1 − π_C α − π_R β |

Q = R − I =
| −π_G α − π_Y β   π_C β            π_G α            π_T β          |
| π_A β            −π_T α − π_R β   π_G β            π_T α          |
| π_A α            π_C β            −π_A α − π_Y β   π_T β          |
| π_A β            π_C α            π_G β            −π_C α − π_R β |

Here α, β ≥ 0 and 0 ≤ α + 2β < 1, and π_A + π_C + π_G + π_T = 1, with π_R = π_A + π_G and π_Y = π_C + π_T.
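As a sanity check, the HKY85 rate matrix can be built from its parameters and tested for the properties used later: rows of Q sum to 0, and time reversibility π_a Q_ab = π_b Q_ba. This is a sketch; the base frequencies, α, and β below are arbitrary illustration values.

```python
import numpy as np

def hky85_Q(pi, alpha, beta):
    """HKY85 rate matrix Q (order A, C, G, T): transitions (A<->G, C<->T)
    get rate pi_b * alpha, transversions pi_b * beta; the diagonal makes
    each row sum to zero."""
    bases = "ACGT"
    purines = {"A", "G"}
    Q = np.zeros((4, 4))
    for i, a in enumerate(bases):
        for j, b in enumerate(bases):
            if i == j:
                continue
            transition = (a in purines) == (b in purines)  # same chemical class
            Q[i, j] = pi[j] * (alpha if transition else beta)
        Q[i, i] = -Q[i].sum()
    return Q

pi = np.array([0.3, 0.2, 0.2, 0.3])  # illustrative base frequencies
Q = hky85_Q(pi, alpha=0.4, beta=0.1)
```

Reversibility follows because every off-diagonal entry has the form π_a π_b α or π_a π_b β, which is symmetric in a and b once multiplied by Π.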

Branch lengths

What do the lengths of tree edges mean? The expected number of mutations per site. Time (t) and rate (λ) are not separately identifiable; only their product λt is.

P(mutation event) = 1 − Σ_x π_x R_xx = 1 − trace(ΠR) = −trace(ΠQ)

Expected number of mutations = λt · (−trace(ΠQ))

Typically λ is scaled so that λ · (−trace(ΠQ)) = 1; branch lengths are then directly expected numbers of mutations per site.
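Continuing the Jukes-Cantor running example (a sketch; λ and t are arbitrary illustration values), the per-event mutation probability −trace(ΠQ) is easy to compute, and for Jukes-Cantor it comes out to 3/4, since a uniform event lands on one of the three other bases with probability 3/4:

```python
import numpy as np

# Jukes-Cantor: each event picks a base uniformly, and Q = R - I.
Pi = np.diag([0.25] * 4)       # stationary distribution on the diagonal
R = np.full((4, 4), 0.25)      # jump matrix
Q = R - np.eye(4)

p_mutation = -np.trace(Pi @ Q)           # probability an event changes the base
lam, t = 1.2, 0.7                        # illustrative rate and time
expected_mutations = lam * t * p_mutation
```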

Estimating branch length

Given an empirical P(t), we can estimate branch length with the logdet transform:

logdet(P(t)) = −log(det(P(t)))

Since P(t) = e^{Qλt}, we have det(P(t)) = e^{λt · trace(Q)}, so

logdet(P(t)) = λt · (−trace(Q)).
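For Jukes-Cantor this is easy to verify numerically (a sketch with arbitrary λ and t): trace(Q) = −3, so the logdet of the closed-form P(t) should equal 3λt.

```python
import numpy as np

def jc_P(lam, t):
    """Closed-form Jukes-Cantor P(t): off-diagonal (1-e)/4, diagonal (1+3e)/4."""
    e = np.exp(-lam * t)
    return (np.full((4, 4), 1 - e) + np.diag([4 * e] * 4)) / 4

lam, t = 0.8, 0.6
logdet = -np.log(np.linalg.det(jc_P(lam, t)))  # the logdet transform
```

The check works because P(t) has one eigenvalue 1 and three eigenvalues e^{−λt}, so det(P(t)) = e^{−3λt}.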

Directed Graphical Models
- Same thing as Bayesian networks
- Directed graph G = (V, E)
- Vertices correspond to random variables
- Missing edges represent independence statements

Example DGM (figure: two example directed graphs over random variables X1–X6)

Tree model

P[x_1, x_2, ..., x_9] = π_{x_9} ∏_{(i,j)∈E} P_{x_i x_j}(t_{ij})

L(data | model) = Σ_{x_6, x_7, x_8, x_9} P(x_1, x_2, ..., x_9)

data = (x_1, x_2, x_3, x_4, x_5); model = (T, P(t), Π)

(figure: rooted tree with root X9, internal vertices X6, X7, X8, leaves X1–X5, and branch lengths t_97, t_98, t_71, t_72, t_83, t_64, t_86, t_65)

Likelihood computation

L_k(a): likelihood of the subtree rooted by k, with x_k = a

for each vertex k in a topological ordering (children before parents):
    if k is a leaf:
        set L_k(a) = 1 if x_k = a, 0 otherwise
    else:
        let i and j be the child vertices of k
        set L_k(a) = Σ_{b,c} P_ab(t_ki) L_i(b) P_ac(t_kj) L_j(c)
return L(data | model) = Σ_a π_a L_r(a), where r is the root vertex

(Felsenstein's algorithm; an instance of the sum-product algorithm)
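The recursion above can be sketched directly. This is an illustration, not the lecture's code: the tree topology, branch lengths, and leaf characters below are made up, and the substitution model is Jukes-Cantor.

```python
import math

def jc_p(a, b, t, lam=1.0):
    """Jukes-Cantor transition probability P_ab(t)."""
    e = math.exp(-lam * t)
    return (1 + 3 * e) / 4 if a == b else (1 - e) / 4

BASES = "ACGT"

def subtree_likelihood(k, tree, leaf_char):
    """L_k(a) for all a, by post-order recursion (Felsenstein pruning).

    tree maps internal vertex -> [(child, branch_length), (child, branch_length)];
    leaf_char maps leaf vertex -> observed base.
    """
    if k in leaf_char:  # leaf: indicator on the observed base
        return {a: 1.0 if a == leaf_char[k] else 0.0 for a in BASES}
    (i, t_ki), (j, t_kj) = tree[k]
    Li = subtree_likelihood(i, tree, leaf_char)
    Lj = subtree_likelihood(j, tree, leaf_char)
    # The double sum over (b, c) factorizes into a product of two single sums.
    return {a: sum(jc_p(a, b, t_ki) * Li[b] for b in BASES)
             * sum(jc_p(a, c, t_kj) * Lj[c] for c in BASES)
            for a in BASES}

# Tiny hypothetical tree: root r with internal child u and leaf c.
tree = {"r": [("u", 0.2), ("c", 0.5)],
        "u": [("a", 0.1), ("b", 0.1)]}
leaf_char = {"a": "A", "b": "A", "c": "G"}
Lr = subtree_likelihood("r", tree, leaf_char)
likelihood = sum(0.25 * Lr[a] for a in BASES)  # sum_a pi_a L_r(a)
```

On a tree this small the result can be checked against brute-force summation over all assignments to the two interior vertices.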

Marginal reconstruction

Most likely character at the root given the leaf characters:
- compute L_r(a) using the likelihood algorithm
- return argmax_a π_a L_r(a)

Joint reconstruction

Character at the root for the most likely assignment to all interior vertices, given the leaf characters.

V_k(a): likelihood of the most likely interior vertex assignments for the subtree rooted by k, with x_k = a

for each vertex k in a topological ordering (children before parents):
    if k is a leaf:
        set V_k(a) = 1 if x_k = a, 0 otherwise
    else:
        let i and j be the child vertices of k
        set V_k(a) = max_{b,c} P_ab(t_ki) V_i(b) P_ac(t_kj) V_j(c)
return argmax_a π_a V_r(a), where r is the root vertex
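The joint recursion is the same sketch with max in place of sum (again a hypothetical toy tree with Jukes-Cantor probabilities, not the lecture's example):

```python
import math

def jc_p(a, b, t, lam=1.0):
    """Jukes-Cantor transition probability P_ab(t)."""
    e = math.exp(-lam * t)
    return (1 + 3 * e) / 4 if a == b else (1 - e) / 4

BASES = "ACGT"

def max_likelihood(k, tree, leaf_char):
    """V_k(a): value of the best joint interior assignment for the subtree at k."""
    if k in leaf_char:
        return {a: 1.0 if a == leaf_char[k] else 0.0 for a in BASES}
    (i, t_ki), (j, t_kj) = tree[k]
    Vi = max_likelihood(i, tree, leaf_char)
    Vj = max_likelihood(j, tree, leaf_char)
    # max over (b, c) of a product factorizes into a product of two maxes.
    return {a: max(jc_p(a, b, t_ki) * Vi[b] for b in BASES)
             * max(jc_p(a, c, t_kj) * Vj[c] for c in BASES)
            for a in BASES}

tree = {"r": [("u", 0.3), ("c", 0.4)],
        "u": [("a", 0.1), ("b", 0.1)]}
leaf_char = {"a": "T", "b": "T", "c": "T"}
Vr = max_likelihood("r", tree, leaf_char)
root_char = max(BASES, key=lambda a: 0.25 * Vr[a])  # argmax_a pi_a V_r(a)
```

With all three leaves observed as T, the reconstructed root character is T, as expected.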