CSCI1950-Z Computational Methods for Biology* (*Working Title). Lecture 1. Ben Raphael, January 21, 2009

CSCI1950-Z Computational Methods for Biology* (*Working Title)
Lecture 1
Ben Raphael
January 21, 2009

Course Particulars
Three major topics:
1. Phylogeny: ~50% of lectures
2. Functional Genomics: ~25% of lectures
3. Network/Systems Biology: ~25% of lectures
Tools:
Computer Science: algorithms and discrete math (e.g. graph theory), programming
Mathematics: discrete probability, linear algebra (vectors and matrices)
Biology: basics (What is DNA?)

Course Particulars
Webpage: http://cs.brown.edu/courses/csci1950-z/ (readings, including some background material)
Textbook: None
Assignments: mens et manus
1. 4 written assignments: ~40% of grade
2. 3 programming assignments: ~40% of grade
3. Take-home final: ~20% of grade
Graduate credit: extra assignment/project. Talk to me before March 1.
Survey

Topic 1: Phylogeny

Early Evolutionary Studies
200th anniversary of the birth of Charles Darwin.
From On the Origin of Species (1859).
Darwin to the 1960s: Anatomical features were the dominant criteria used to derive evolutionary relationships between species.
Imprecise, often subjective, observations often led to inconclusive, contradictory, or incorrect evolutionary relationships between species.
Molecular data (DNA and protein sequences) dramatically improved the situation.

Species Trees
Is a panda more closely related to a bear or a raccoon?
Traits compared for bear vs. raccoon: looks, hibernation pattern.
~100 years of arguments.
Tree derived from DNA sequence data: Steven O'Brien et al. (1985).

Human Evolutionary History
From: Molecular Evolution: A Phylogenetic Approach, R. Page & E. Holmes

More Recent Human History
Out of Africa hypothesis: the most ancient ancestor lived in Africa roughly 200,000 years ago.
Source: http://www.becominghuman.org

The Origin of Humans: Out of Africa vs. Multiregional Hypothesis
Out of Africa:
Humans evolved in Africa ~200,000 years ago.
Humans migrated out of Africa, replacing other humanoids around the globe.
Multiregional:
Humans evolved in the last two million years as a single species.
Independent appearance of modern traits in different areas.
Humans migrated out of Africa, mixing with other humanoids on the way.

Human Evolutionary Tree
DNA-based reconstruction of the human evolutionary tree.
Source: http://www.mun.ca/biology/scarr/out_of_africa2.htm

Evolutionary Tree of Humans (mtDNA)
Vigilant, Stoneking, Harpending, Hawkes, and Wilson (1991).
The African population is the most diverse (sub-populations had more time to diverge).
The evolutionary tree separates one group of Africans from a group containing all five populations.
The tree is rooted on the branch between the groups of greatest difference.

Evolutionary Tree of Humans (microsatellites)
Neighbor-joining tree for 14 human populations genotyped with 30 microsatellite loci.

Lineage of Genghis Khan?
In humans, the Y chromosome is passed from the father only, so it can be used to identify paternal lineages.
~8% of males in parts of Asia and 0.5% worldwide are estimated to be descendants of a resident of Mongolia ~1000 years ago (Zerjal et al., AJHG 2003).

Lafayette, Louisiana, 1994: A woman claimed her ex-lover (who was a physician) injected her with HIV+ blood.
Records show the physician had drawn blood from an HIV+ patient that day.
Is there a way to show that blood from that HIV+ patient ended up in the woman?

HIV Transmission
HIV has a high mutation rate, which can be used to trace paths of transmission.
Two people who were infected from different sources will have very different HIV sequences.
Figure: alignment of fourteen amino acid sequences from the V3 region of HIV-1 gp120 genes. Azizi et al., BMC Immunology 2006, 7:25.

To the Lab!
Wet lab: Take multiple samples from the patient, the woman, and controls (unrelated HIV+ people). Obtain DNA sequences from two HIV genes (gp120 and RT).
Computer lab: Build a phylogenetic tree from the DNA sequences.

Phylogenetic Tree Conviction
Three different tree reconstruction techniques were used.
In every reconstruction, the victim's sequences were related to the patient's sequences.
Nesting of the victim's sequences within the patient's sequences indicated that the direction of transmission was from patient to victim.
First time phylogenetic analysis was used as evidence in a court case (Metzker et al., 2002).

Phylogenetic Trees
How to build a phylogenetic tree from data?
Data:
1. Characters/Features
2. Pairwise distances
Algorithm

Phylogenetic Trees
What is a phylogenetic tree?
Biology definition: None (picture). A branching diagram.
Intuition:
Leaves represent existing species.
Branch points represent most recent common ancestors.
Lengths of branches represent evolutionary time.
The root represents the oldest evolutionary ancestor.

Phylogenetic Trees
What is a phylogenetic tree?
Computer science definition:
tree: A connected acyclic graph G = (V, E).
graph: A set V of vertices and a set E of edges, where each edge connects a pair of vertices.

Tree Definitions
tree: A connected acyclic graph G = (V, E).
graph: A set V of vertices and a set E of edges, where each edge (v_i, v_j) connects a pair of vertices.
A path in G is a sequence (v_1, v_2, ..., v_n) of vertices in V such that each (v_i, v_{i+1}) is an edge in E.
A graph is connected provided that for every pair v_i, v_j of vertices there is a path between v_i and v_j.
A cycle is a path with the same starting and ending vertices.
A graph is acyclic provided it has no cycles.
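The definitions above translate directly into a small check. The following is a minimal sketch, not part of the original slides: the function name `is_tree` and the edge-list representation are assumptions made for illustration. It relies on the standard fact that a connected graph on n vertices is acyclic exactly when it has n - 1 edges.

```python
from collections import deque


def is_tree(vertices, edges):
    """Return True if the undirected graph (vertices, edges) is connected and acyclic."""
    vertices = list(vertices)
    if not vertices:
        return False
    # A connected graph on n vertices has no cycle exactly when it has n - 1 edges.
    if len(edges) != len(vertices) - 1:
        return False
    adjacency = {v: [] for v in vertices}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    # Breadth-first search from an arbitrary vertex tests connectivity.
    seen = {vertices[0]}
    queue = deque([vertices[0]])
    while queue:
        u = queue.popleft()
        for w in adjacency[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(vertices)


print(is_tree("abcd", [("a", "b"), ("b", "c"), ("b", "d")]))  # True
print(is_tree("abcd", [("a", "b"), ("b", "c"), ("c", "a")]))  # False: cycle, and d is unreachable
```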

Tree Definitions
tree: A connected acyclic graph G = (V, E).
The degree of a vertex v is the number of edges incident to v.
A phylogenetic tree is a tree with a label for each leaf (vertex of degree one).
A binary phylogenetic tree is a phylogenetic tree where every interior (non-leaf) vertex has degree 3; i.e. two children.
A rooted (*binary) phylogenetic tree is a phylogenetic tree with a single designated vertex r (*of degree 2).

Rooted and Unrooted Trees
In an unrooted tree the position of the root ("oldest ancestor") is unknown. Otherwise, unrooted trees are like rooted trees.
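As an illustration of the degree-based definitions, the sketch below computes vertex degrees from an edge list and checks the binary condition for an unrooted tree. The function name `degree_profile` and the four-leaf example are hypothetical, and the input is assumed to already be a tree.

```python
from collections import Counter


def degree_profile(edges):
    """Return (leaves, interior vertices, is_binary) for an unrooted tree given as an edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    leaves = sorted(v for v, d in degree.items() if d == 1)
    interior = sorted(v for v, d in degree.items() if d > 1)
    # Unrooted binary tree: every interior vertex has degree 3.
    is_binary = all(degree[v] == 3 for v in interior)
    return leaves, interior, is_binary


# Hypothetical unrooted tree on leaves A-D with interior vertices x and y.
quartet = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"), ("D", "y")]
leaves, interior, is_binary = degree_profile(quartet)
print(leaves, interior, is_binary)  # ['A', 'B', 'C', 'D'] ['x', 'y'] True
```

In this example the 4 leaves come with 2 interior vertices and 5 edges, matching the general counts n - 2 and 2n - 3 for an unrooted binary tree with n leaves.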

Evaluating Different Phylogenies
            Value 1   Value 2
Mouth       Smile     Frown
Eyebrows    Normal    Pointed

Character-Based Tree Reconstruction
Which tree is better?

Character-Based Tree Reconstruction
Count changes on the tree.

Character-Based Tree Reconstruction
Parsimony: minimize the number of changes on the edges of the tree.
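One standard way to count the minimum number of changes for a single character on a fixed rooted binary tree is Fitch's algorithm. The slides do not name a method here, so the sketch below is an assumption about how the count could be computed; the nested-tuple tree encoding and the leaf names A to D are hypothetical.

```python
def fitch(tree, states):
    """Return (candidate state set, minimum number of changes) for one character.

    tree is either a leaf name (str) or a pair (left, right) of subtrees.
    states maps each leaf name to its observed character state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # The children can agree on a state, so no change is needed at this vertex.
        return common, left_cost + right_cost
    # The children disagree: take the union and charge one change.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical "mouth" character for four faces.
mouth = {"A": "smile", "B": "smile", "C": "frown", "D": "frown"}
print(fitch((("A", "B"), ("C", "D")), mouth)[1])  # 1 change
print(fitch((("A", "C"), ("B", "D")), mouth)[1])  # 2 changes
```

Under parsimony the first tree is preferred for this character because it explains the data with fewer changes.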

Character-Based Tree Reconstruction
Maximum Likelihood: Given Pr[change], what is the tree with maximum probability?

Identifying the Highest-Scoring Tree
Naïve, exhaustive algorithm: check all trees.
How many possibilities? Restrict to binary trees.
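To get a feel for "how many possibilities", the sketch below uses the standard counting result that there are (2n-5)!! = 3 · 5 · ... · (2n-5) unrooted binary trees on n labeled leaves (n ≥ 3), and (2n-3)!! rooted ones. The function names are illustrative and not from the slides.

```python
def num_unrooted_binary_trees(n):
    """Number of unrooted binary trees on n labeled leaves: (2n-5)!! for n >= 3."""
    if n < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    for k in range(3, 2 * n - 4, 2):  # odd factors 3, 5, ..., 2n-5
        count *= k
    return count


def num_rooted_binary_trees(n):
    """Number of rooted binary trees on n labeled leaves: (2n-3)!! for n >= 2."""
    # Rooting corresponds to attaching one extra leaf at the root position.
    return num_unrooted_binary_trees(n + 1)


for n in (4, 5, 10):
    print(n, num_unrooted_binary_trees(n))
# 4 3
# 5 15
# 10 2027025
```

The count grows faster than exponentially in n, which is why checking all trees is feasible only for a handful of species.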

Phylogenetic Trees
How to efficiently build trees from data?
Figure: two trees on leaves 1-5.
Data:
1. Characters/Features
2. Pairwise distances

Phylogenetic Trees
How to efficiently build trees from data?
Methods:
1. Characters/Features
Parsimony: minimum number of changes
Probabilistic model
2. Pairwise distances
Clustering (UPGMA, neighbor joining, ...)
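As a rough illustration of the distance-based route, here is a minimal UPGMA sketch: repeatedly merge the two closest clusters and set the distance from the merged cluster to every other cluster to the size-weighted average. The input format (a dict keyed by leaf pairs), the function name `upgma`, and the example distances are assumptions made for illustration; branch lengths are omitted and only the topology is returned.

```python
from itertools import combinations


def upgma(names, dist):
    """Return a rooted tree topology as nested pairs.

    names: list of leaf labels.
    dist: dict mapping a pair of leaf labels (a, b) to their distance.
    """
    size = {name: 1 for name in names}  # cluster -> number of leaves
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(size) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(size, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        for c in size:
            if c != a and c != b:
                # Size-weighted average of the distances from a and b to c.
                d[frozenset((merged, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[merged] = size[a] + size[b]
        del size[a], size[b]
    return next(iter(size))


# Made-up distances for four hypothetical species.
distances = {
    ("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
    ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10,
}
print(upgma(["A", "B", "C", "D"], distances))  # ('D', ('C', ('A', 'B')))
```

A and B are joined first, then C, then D. Neighbor joining follows the same merge loop but uses a different criterion for choosing the pair.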

Additional Models and Extensions
Comparing trees: distances between trees.
Statistical tests: bootstrap, permutation tests, etc.
Supertrees and consensus.
Gene trees vs. species trees.
Whole-genome phylogeny.

Topic 2: Functional Genomics

Biology 101
Central Dogma

What can we measure?
Sequencing (expensive); Hybridization (noisy)
Sequencing (expensive); Hybridization (noisy)
Mass spectrometry (noisy); Hybridization (very noisy!)

DNA Basepairing

DNA Microarrays

Clustering of Gene Expression
Each microarray experiment: expression vector u = (u_1, ..., u_n), where u_i = expression value of gene i.
Group similar vectors.
Figure: gene expression across samples. BMC Genomics 2006, 7:279.
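"Group similar vectors" needs a notion of similarity. The sketch below computes pairwise Euclidean distances between expression vectors, which could then feed a clustering method. Euclidean distance is only one possible choice (correlation-based distances are also common), and the sample names and values are made up.

```python
import math
from itertools import combinations


def pairwise_distances(vectors):
    """Map each pair of sample names to the Euclidean distance between their expression vectors."""
    return {
        (a, b): math.dist(vectors[a], vectors[b])
        for a, b in combinations(vectors, 2)
    }


# Made-up expression vectors (one value per gene) for three hypothetical experiments.
expression = {
    "exp1": (0.0, 0.0, 1.0),
    "exp2": (3.0, 4.0, 1.0),
    "exp3": (0.0, 0.0, 2.0),
}
for pair, value in pairwise_distances(expression).items():
    print(pair, value)
# ('exp1', 'exp2') 5.0
# ('exp1', 'exp3') 1.0
# ('exp2', 'exp3') 5.0990195135927845
```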

Clustering
Clustering algorithms are related to distance-based phylogenetic algorithms.
Phylogeny gives a grouping of related data points.

Classification
Binary classification: Given a set of examples (x_i, y_i), where y_i = ±1, from an unknown distribution D, design a function f: R^n → {-1, +1} that assigns additional samples x to one of two classes optimally.
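One simple way to obtain such a function f is a 1-nearest-neighbor rule: label a new sample with the label of the closest training example. This is only a sketch of one classifier, not a claim about optimality, and the training points below are invented.

```python
import math


def nearest_neighbor_classifier(examples):
    """Build f: R^n -> {-1, +1} from labeled examples [(x_i, y_i), ...]."""
    def f(x):
        # Return the label of the training point closest to x (Euclidean distance).
        _, label = min(examples, key=lambda example: math.dist(example[0], x))
        return label
    return f


# Invented training set in R^2 with labels -1 and +1.
training = [((0.0, 0.0), -1), ((1.0, 0.0), -1), ((5.0, 5.0), +1), ((6.0, 5.0), +1)]
f = nearest_neighbor_classifier(training)
print(f((4.0, 4.0)))  # 1
print(f((0.2, 0.1)))  # -1
```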

Topics
Methods for clustering: hierarchical, matrix-based (PCA), graph-based (clique finding)
Methods for classification: nearest neighbors, support vector machines
Data integration: Bayesian networks

Topic 3: Network and Systems Biology

Biological Interaction Networks
Many types:
Protein-DNA (regulatory)
Protein-metabolite (metabolic)
Protein-protein (signaling)
RNA-RNA (regulatory)
Genetic interactions (gene knockouts)

Regulatory Networks

Cis-regulatory Network

Metabolic Networks
Nodes = reactants
Edges = reactions, labeled by the enzyme (protein) that catalyzes the reaction

Protein-Protein Interaction (PPI) Network
Proteins are nodes.
Interactions are edges.
Edges may have weights.
Yeast PPI network: H. Jeong et al., Nature 411, 41 (2001)

Computational Problems
1. Classifying network topology: finding paths, cliques, dense subnetworks, etc.
2. Comparing networks across species
3. Using networks to explain data: dependencies revealed by network topology
4. Modeling dynamics of networks

Network Motifs
Subnetworks with more occurrences than expected by chance.
How to find them? How to assess statistical significance?
Shen-Orr et al. 2002
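As a small example of the "how to find" step, the sketch below counts occurrences of one three-node pattern, the feed-forward loop (X regulates Y, Y regulates Z, and X also regulates Z), in a directed network. The choice of this pattern, the function name, and the toy edge list are assumptions for illustration. Assessing significance would additionally require comparing the count against counts in randomized networks, which is not shown.

```python
def count_feed_forward_loops(edges):
    """Count triples (x, y, z) of distinct nodes with edges x->y, y->z and x->z."""
    edge_set = set(edges)
    targets = {}
    for u, v in edge_set:
        targets.setdefault(u, set()).add(v)
    count = 0
    for x, y in edge_set:
        if x == y:
            continue  # ignore self-regulation
        for z in targets.get(y, ()):
            if z != x and z != y and (x, z) in edge_set:
                count += 1
    return count


# Toy directed regulatory network with hypothetical node names.
toy_network = [("X", "Y"), ("Y", "Z"), ("X", "Z"), ("Z", "W")]
print(count_feed_forward_loops(toy_network))  # 1
```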

Network Alignment
Sharan and Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology 24, pp. 427-433, 2006.

The Network Alignment Problem
Given: k different interaction networks belonging to different species.
Find: conserved subnetworks within these networks.
Conserved is defined by protein sequence similarity (node similarity) and interaction similarity (network topology similarity).

Protein Signaling Networks
Art Salomon, Biology Department
Use machine learning methods (Bayesian networks, etc.) to derive network structure.

Course Themes
Topics: Phylogeny, Functional Genomics, Systems & Network Biology
Mixture of theory and practice (real data)
Graph algorithms: path and clique finding, isomorphism, heavy subgraphs, matching, vertex cover, spanning and Steiner problems, etc.
Statistics: hypothesis testing, permutation tests, bootstrap and resampling, enrichment (hypergeometric), etc.
Data mining and machine learning: clustering and classification
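The hypergeometric enrichment test mentioned under Statistics can be sketched in a few lines. The question it answers: if a sample of genes is drawn at random from a population, how likely is it to contain at least the observed number of genes carrying some annotation? The parameter names and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_tail(population, annotated, sample, overlap):
    """P(X >= overlap), where X counts annotated genes in a random sample.

    population: total number of genes
    annotated:  number of genes in the population carrying the annotation
    sample:     number of genes drawn without replacement
    overlap:    observed number of annotated genes in the sample
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    favorable = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(overlap, upper + 1)
    )
    return favorable / total


# Hypothetical numbers: 10 genes, 5 annotated, a sample of 5 that are all annotated.
print(hypergeom_tail(10, 5, 5, 5))  # 1/252, about 0.004
```

A small tail probability suggests that the annotation is over-represented in the sample relative to chance.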

Sources
http://bioalgorithms.info (portions of Out of Africa and character phylogeny slides)