Comparative Genomics. Dept. of Computer Science Comenius University in Bratislava, Slovakia
1 Comparative Genomics
Broňa Brejová
Dept. of Computer Science, Comenius University in Bratislava, Slovakia
3 Why sequence so many genomes?
4 Comparative genomics
Compare the genomic sequences of multiple related species to find similarities and differences: substitutions, indels, genome rearrangements and duplications.
Explore evolutionary processes: neutral mutations vs. positive / negative selection.
Find functional regions (genes, regulatory regions, etc.), which are often characterized by negative (purifying) selection.
Look for differences explaining different phenotypes, e.g.:
human versus other primates
domesticated animals/plants vs. wild counterparts
pathogenic species vs. free-living relatives
adaptations to different environments and diets
5 Gene family evolution
[Figure: species tree of species 1, 2, 3; gene tree of genes A1, A2, A3, B1, B2, B3; and the reconstructed history with speciation, duplication and loss events]
Homolog: genes with a shared evolutionary origin.
Ortholog: the closest common ancestor is a speciation node (e.g. A1/A3).
Paralog: the closest common ancestor is a duplication node (e.g. B1/B2, A1/B1, B1/B3, B2/B3).
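These definitions can be checked mechanically once every internal node of a gene tree is labeled as a speciation or a duplication. Below is a minimal Python sketch that assumes a nested-tuple tree encoding. The tree `TREE` and the helper names are hypothetical and do not reproduce the tree drawn in the figure.

```python
# A gene tree as nested tuples: leaves are gene names, internal nodes are
# (event, left_subtree, right_subtree) with event "speciation" or "duplication".
# HYPOTHETICAL example tree, not the tree from the slide figure.
TREE = ("duplication",
        ("speciation", ("speciation", "A1", "A2"), "A3"),
        ("speciation", ("speciation", "B1", "B2"), "B3"))


def leaves(node):
    """Set of gene names below a node."""
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return leaves(left) | leaves(right)


def lca_event(node, g1, g2):
    """Event label of the lowest common ancestor of genes g1 and g2."""
    _, left, right = node
    for child in (left, right):
        if not isinstance(child, str) and {g1, g2} <= leaves(child):
            return lca_event(child, g1, g2)
    return node[0]


def relationship(tree, g1, g2):
    """Orthologs if the LCA is a speciation, paralogs if it is a duplication."""
    return "ortholog" if lca_event(tree, g1, g2) == "speciation" else "paralog"


print(relationship(TREE, "A1", "A3"))  # ortholog
print(relationship(TREE, "A1", "B1"))  # paralog
```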
6 Gene tree / species tree reconciliation
Given a species tree and a gene tree, infer the history of speciations, duplications and losses.
Favor histories with fewer events (parsimony).
The gene tree is inferred from gene sequences and may contain errors.
[Figure: gene tree of A1, A2, A3, B1, B2, B3 reconciled with the species tree of species 1, 2, 3]
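A common way to obtain the event labels is the LCA mapping. Each gene-tree node is mapped to the smallest species-tree clade that contains the species of its genes. A node that maps to the same clade as one of its children is called a duplication. The sketch below counts duplications only and ignores losses. The trees, `SPECIES_OF` and all function names are hypothetical illustrations, not the exact method behind the slides.

```python
# Species tree and gene tree as nested pairs; leaves are names.
# HYPOTHETICAL example data (SPECIES_OF maps each gene to its species).
SPECIES_TREE = (("species 1", "species 2"), "species 3")
GENE_TREE = ((("A1", "A2"), "A3"), (("B1", "B2"), "B3"))
SPECIES_OF = {"A1": "species 1", "A2": "species 2", "A3": "species 3",
              "B1": "species 1", "B2": "species 2", "B3": "species 3"}


def clades(tree):
    """All clades of the species tree as frozensets; the last one is the root."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    left, right = tree
    cl, cr = clades(left), clades(right)
    return cl + cr + [cl[-1] | cr[-1]]


def lca(species, all_clades):
    """Smallest species-tree clade containing every species in `species`."""
    return min((c for c in all_clades if species <= c), key=len)


def reconcile(node, species_of, all_clades):
    """LCA mapping: return (clade the node maps to, number of duplications below)."""
    if isinstance(node, str):
        return lca(frozenset([species_of[node]]), all_clades), 0
    m_left, d_left = reconcile(node[0], species_of, all_clades)
    m_right, d_right = reconcile(node[1], species_of, all_clades)
    m = lca(m_left | m_right, all_clades)
    # A node is a duplication if it maps to the same clade as one of its children.
    is_dup = m == m_left or m == m_right
    return m, d_left + d_right + int(is_dup)


_, n_dup = reconcile(GENE_TREE, SPECIES_OF, clades(SPECIES_TREE))
print(n_dup)  # 1 (the root of the gene tree)
```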
7 Global view of gene family evolution
Do not infer the history of each family separately; instead, assume a global evolutionary model of family size change:
Find genes in the genomes.
Assign them to families (by sequence similarity).
Summarize the counts for each family and each species.
Infer a species tree.
Infer the overall rate λ of gene gain/loss (e.g. in yeasts, measured in gains and losses/gene/million years).
Look for families with significantly accelerated evolution.
[Hahn et al. 2005]
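The summarizing step produces a family-by-species table of gene counts. Here is a small sketch with hypothetical gene and family names. It assumes that gene finding and clustering by sequence similarity have already been done.

```python
from collections import Counter, defaultdict

# HYPOTHETICAL input: family assignment and species of each gene.
GENE_FAMILY = {"g1": "F1", "g2": "F1", "g3": "F1", "g4": "F2", "g5": "F2"}
GENE_SPECIES = {"g1": "species 1", "g2": "species 1", "g3": "species 2",
                "g4": "species 2", "g5": "species 3"}


def family_size_table(gene_family, gene_species):
    """Count genes for each (family, species) pair; missing pairs count as 0."""
    table = defaultdict(Counter)
    for gene, family in gene_family.items():
        table[family][gene_species[gene]] += 1
    return table


table = family_size_table(GENE_FAMILY, GENE_SPECIES)
for family in sorted(table):
    print(family, [table[family][s] for s in ("species 1", "species 2", "species 3")])
# F1 [2, 1, 0]
# F2 [0, 1, 1]
```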
8 Stochastic model of gene family evolution
A simplified view of how evolution might operate: imagine generating simulated data.
Birth and death process for gene families.
[Figure: gene family sizes evolving along the species tree of species 1, 2, 3]
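Generating simulated data from this model can be sketched as a continuous-time simulation. With n genes, the next event arrives at total rate 2λn, and it is a duplication or a loss with equal probability. The tree, branch lengths, rate and function names below are hypothetical.

```python
import random


def simulate_branch(n, lam, t, rng):
    """Family size after time t; each gene duplicates or is lost at rate lam."""
    elapsed = 0.0
    while n > 0:  # a family of size 0 is extinct and stays extinct
        # With n genes, the next event (duplication or loss) comes at total rate 2*lam*n.
        elapsed += rng.expovariate(2 * lam * n)
        if elapsed > t:
            break
        n += 1 if rng.random() < 0.5 else -1  # both events are equally likely
    return n


def simulate_tree(node, n, lam, rng):
    """node: species name (leaf) or list of (child, branch_length) pairs."""
    if isinstance(node, str):
        return {node: n}
    counts = {}
    for child, length in node:
        counts.update(simulate_tree(child, simulate_branch(n, lam, length, rng), lam, rng))
    return counts


# HYPOTHETICAL tree (branch lengths in million years) and rate.
TREE = [([("species 1", 10.0), ("species 2", 10.0)], 20.0), ("species 3", 30.0)]
rng = random.Random(42)
print(simulate_tree(TREE, 5, 0.01, rng))
```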
9 Stochastic model of gene family evolution
Each gene can duplicate or be lost at rate λ.
For a short time t we expect 2λtn events in an n-gene family.
For longer t, some genes can be affected multiple times:
$$P(X_t = c \mid X_0 = s) = \sum_{j=0}^{\min(s,c)} \binom{s}{j}\binom{s+c-j-1}{s-1}\,\alpha^{s+c-2j}\,(1-2\alpha)^{j}, \qquad \alpha = \frac{\lambda t}{1+\lambda t}$$
Using these formulas, we can compute the probability of a history.
We can also compute the probability (likelihood) of observing the counts in the current species.
We can find the value of λ maximizing the probability over all gene families.
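The transition probability above and a simple maximum-likelihood fit of λ can be written in a few lines. This sketch rests on a strong, hypothetical simplification: each family is observed on a single branch of length t with a known ancestral size s. The full method works on the whole species tree with unknown ancestral sizes. The data and function names are hypothetical.

```python
import math


def transition_prob(s, c, lam, t):
    """P(X_t = c | X_0 = s) for the birth-death model with equal gain/loss rate lam."""
    if s == 0:
        return 1.0 if c == 0 else 0.0  # an extinct family stays extinct
    alpha = lam * t / (1 + lam * t)
    return sum(math.comb(s, j) * math.comb(s + c - j - 1, s - 1)
               * alpha ** (s + c - 2 * j) * (1 - 2 * alpha) ** j
               for j in range(min(s, c) + 1))


def log_likelihood(lam, families, t):
    """Sum of log P(c | s) over families given as (ancestral size s, current size c)."""
    return sum(math.log(transition_prob(s, c, lam, t)) for s, c in families)


def estimate_lambda(families, t, grid):
    """Grid-search maximum-likelihood estimate of lam (grid values must be > 0)."""
    return max(grid, key=lambda lam: log_likelihood(lam, families, t))


# HYPOTHETICAL data: ancestral and current sizes of four families after t = 10.
families = [(2, 2), (3, 4), (1, 1), (2, 1)]
grid = [k / 1000 for k in range(1, 201)]
print(estimate_lambda(families, 10.0, grid))
```

A grid search keeps the example short. With many families, a one-dimensional optimizer over λ would be the more usual choice.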
10 Results on five yeast genomes [Hahn et al. 2005]
The rate λ of gene gain/loss was estimated (in gains and losses/gene/million years).
1254 out of 3517 gene families show some change in size.
Look for families with significantly accelerated evolution.
[Figure: stress response family in S. cer., S. par., S. mik., S. kud. and S. bay.]
11 Whole-genome alignments
For each region of a reference genome (e.g. human), find and align the corresponding parts of other genomes.
Human         AGTGGCTGCCAGGCTG---GGATGCTGAGGCCTTGTTTGCAGGGAGGT
Rhesus        AGTGGCTGCCAGGCTG---GGTTGCTGAGGCCTTGTTTGCCGGGAGGT
Mouse         GGTGGCTGCCGGGCTG---GGTGGCTGAGGCCTTGTTGGTGGGGTGGT
Dog           AGTGGCTGCCCGGCTG---GGTGGCTGAGGCCTTATTTGCAGGGAGGT
Horse         GATGGCTGCCGGGCTG---GGCTGCCGAGGCCTTGTTCGTGGGGAGGT
Armadillo     AGTGGCTGCCGGGCTG---GGAGGCCAAGGCCTTGTTCGCGGGCAGGT
Chicken       AGTGGCTGCCAGTCTGCGCCGTGGCCGACGTCTTGCTCGGGGGAAGGT
X. tropicalis AATGGCTTCCATTTTGTGCCGCTGCTGAGGTCTTGTTCTGGGGAAGAT
13 Nets and chains from the UCSC genome browser
For each region of a reference genome (e.g. human), find and align the corresponding parts of other genomes (e.g. mouse, dog, chicken, etc.).
Local alignments align exons and other conserved elements, but many parts of the genomes have changed too much to be aligned.
For duplicated regions, decide which pairs are orthologs.
This can be done using synteny: a chain of local alignments in the same order and orientation in the two genomes.
14 Nets and chains from the UCSC genome browser
Start with local alignments.
Connect them into chains, allowing big indels and unaligned regions but requiring the same order and orientation in both genomes.
Select some chains to form a hierarchical net: choose the highest-scoring chains that do not overlap in the reference.
Only parts of some chains may be used.
Parts of the non-reference genome can be used more than once (if the reference is duplicated).
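Chaining can be illustrated with a simple dynamic program: a block may follow another block only if it starts after that block ends in both genomes. This O(n²) sketch has an optional linear gap cost. It is a simplified illustration, not the UCSC chaining program, and the block coordinates and names are hypothetical.

```python
def best_chain(blocks, gap_cost=0.0):
    """Highest-scoring chain of local alignments kept in the same order in both genomes.

    blocks: (ref_start, ref_end, other_start, other_end, score) tuples, all on the
    same strand (one orientation). Returns (chain_score, list_of_blocks).
    """
    blocks = sorted(blocks)
    if not blocks:
        return 0.0, []
    best, prev = [], []
    for i, (r_start, r_end, o_start, o_end, score) in enumerate(blocks):
        best_i, prev_i = score, None
        for j in range(i):
            _, p_r_end, _, p_o_end, _ = blocks[j]
            # Block j may precede block i only if it ends before i starts in both genomes.
            if p_r_end <= r_start and p_o_end <= o_start:
                cand = best[j] + score - gap_cost * ((r_start - p_r_end) + (o_start - p_o_end))
                if cand > best_i:
                    best_i, prev_i = cand, j
        best.append(best_i)
        prev.append(prev_i)
    i = max(range(len(blocks)), key=best.__getitem__)
    total, chain = best[i], []
    while i is not None:  # backtrack through the predecessor links
        chain.append(blocks[i])
        i = prev[i]
    return total, chain[::-1]


# HYPOTHETICAL blocks: the third one is out of order in the other genome.
blocks = [(0, 100, 0, 100, 90.0), (150, 250, 160, 260, 80.0), (300, 400, 50, 150, 70.0)]
print(best_chain(blocks)[0])  # 170.0
```

With `gap_cost=1.0`, the same blocks give a best chain made of the first block alone (score 90.0), because the gap penalty outweighs the second block's score.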
15 [Figure: UCSC Genome Browser view of a region of chr13 (2 kb scale) showing mouse chained alignments and the mouse alignment net, levels 1 to 6]
16 Negative (purifying) selection
Important parts of a genome accumulate mutations more slowly.
Find conserved elements in genomes.
Many correspond to known functional elements (genes, regulatory elements).
Conserved elements not overlapping these are interesting targets for future research.
[Figure: UCSC Genome Browser view with UCSC Genes, placental mammal conservation by PhastCons, and Multiz alignments of 46 vertebrates]
Human    CAAGACGAGACAGGTAAATCTCATGAGCTTTATTCTATATTT
Chimp    CAAGACGAGACAGGTAAATCTCATGAGCTTTATTCTATATTT
Mouse    CAAGGCGGGACAGGTGAGCCTCCTGCGCTGCGCTCTCTGCTT
Dog      CAAGGCGAGACAGGTAAAGCTCATGAGATTTATTCTATATTT
Chicken  CAAGGCGAGACAGGTAATTCTTATGAGATTTCGACTGTACTT
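17 Substitution models
Jukes-Cantor model: each base x mutates at rate α, and the new base y is any of the other three with equal probability (the rate is the same for all pairs x ≠ y).
Probability of a change from A to C over time t:
$$\Pr(X_t = C \mid X_0 = A) = \frac{1}{4}\left(1 - e^{-\frac{4}{3}\alpha t}\right)$$
This includes the possibility of multiple mutations.
We can compute the probability of a history (a tree with all ancestral bases known).
We can also compute the likelihood given only the current sequences (summing over the unknown ancestral bases).
We can estimate the best α for a given alignment.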
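For two aligned sequences without gaps, the fraction p of differing sites estimates 3·Pr(X_t = C | X_0 = A). Inverting this gives αt = −(3/4) ln(1 − 4p/3), where t is the total time separating the two sequences. A minimal sketch (the function names are hypothetical):

```python
import math


def jc_change_prob(alpha, t):
    """P(X_t = y | X_0 = x) for one specific base y != x under Jukes-Cantor."""
    return 0.25 * (1 - math.exp(-4 * alpha * t / 3))


def jc_estimate(seq1, seq2):
    """Maximum-likelihood estimate of alpha*t between two aligned, gap-free sequences.

    The fraction of differing sites p estimates 3 * jc_change_prob, which gives
    alpha*t = -3/4 * ln(1 - 4p/3).
    """
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    p = diffs / len(seq1)
    if p >= 0.75:
        return math.inf  # saturated: the model cannot explain this many differences
    return -0.75 * math.log(1 - 4 * p / 3)


print(jc_estimate("ACGTACGTAC", "ACGTACGTTC"))  # one difference in 10 sites
```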
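18 More complex substitution models
The Jukes-Cantor model assumes that all mutations are equally likely.
In general, $\mu_{xy}$ is the substitution rate from base x to base y, and $\mu_x = \sum_{y \ne x} \mu_{xy}$. Substitution rate matrix:
$$Q = \begin{pmatrix} -\mu_A & \mu_{AC} & \mu_{AG} & \mu_{AT} \\ \mu_{CA} & -\mu_C & \mu_{CG} & \mu_{CT} \\ \mu_{GA} & \mu_{GC} & -\mu_G & \mu_{GT} \\ \mu_{TA} & \mu_{TC} & \mu_{TG} & -\mu_T \end{pmatrix}$$
$\Pr(X_t = C \mid X_0 = A)$ does not in general have a closed formula, but it can be computed by algebraic methods (via the matrix exponential $P(t) = e^{Qt}$).
The equilibrium frequencies $\pi_A, \pi_C, \pi_G, \pi_T$ stay stable in the model.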
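Numerically, P(t) = e^{Qt} is a matrix exponential. The sketch below uses plain Python with scaling and squaring of a truncated Taylor series; in practice a library routine such as SciPy's `scipy.linalg.expm` would normally be preferred. The rates in `RATES` are hypothetical. The equilibrium frequencies are read off a row of P(t) for large t.

```python
import math

BASES = "ACGT"


def rate_matrix(rates):
    """4x4 matrix Q from off-diagonal rates {(x, y): mu_xy}; rows sum to zero."""
    q = [[rates.get((x, y), 0.0) for y in BASES] for x in BASES]
    for i in range(4):
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)
    return q


def mat_mult(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_exp(m, terms=20):
    """exp(m) by scaling and squaring with a truncated Taylor series."""
    n = len(m)
    norm = max(sum(abs(x) for x in row) for row in m)
    squarings = max(0, math.ceil(math.log2(norm / 0.5))) if norm > 0 else 0
    scaled = [[x / 2 ** squarings for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mult(term, scaled)]
        result = [[r + s for r, s in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mult(result, result)
    return result


def transition_matrix(q, t):
    """P(t) = exp(Q t); entry [x][y] is Pr(X_t = y | X_0 = x)."""
    return mat_exp([[x * t for x in row] for row in q])


# HYPOTHETICAL rates mu_xy, only for illustration.
RATES = {("A", "C"): 0.1, ("A", "G"): 0.4, ("A", "T"): 0.1,
         ("C", "A"): 0.2, ("C", "G"): 0.1, ("C", "T"): 0.5,
         ("G", "A"): 0.5, ("G", "C"): 0.1, ("G", "T"): 0.1,
         ("T", "A"): 0.1, ("T", "C"): 0.4, ("T", "G"): 0.2}
Q = rate_matrix(RATES)
print(transition_matrix(Q, 1.0)[0][1])  # Pr(X_1 = C | X_0 = A)
# For long times every row approaches the equilibrium frequencies pi.
pi = transition_matrix(Q, 200.0)[0]
print([round(p, 3) for p in pi])
```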
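19 HKY model [Hasegawa, Kishino and Yano 1985]
Fewer parameters (each diagonal entry makes its row sum to zero):
$$Q = \begin{pmatrix} -\mu_A & \beta\pi_C & \alpha\pi_G & \beta\pi_T \\ \beta\pi_A & -\mu_C & \beta\pi_G & \alpha\pi_T \\ \alpha\pi_A & \beta\pi_C & -\mu_G & \beta\pi_T \\ \beta\pi_A & \alpha\pi_C & \beta\pi_G & -\mu_T \end{pmatrix}$$
Transition rate α: C ↔ T, A ↔ G.
Transversion rate β: {C, T} ↔ {A, G}.
Five parameters: π_A, π_C, π_G, α, κ = α/β (π_T = 1 − π_A − π_C − π_G).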
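The HKY matrix can be built directly from π, α and β, and a numerical check confirms that π is its equilibrium (πQ = 0). The parameter values below are hypothetical.

```python
BASES = "ACGT"
PURINES = {"A", "G"}


def hky_rate_matrix(pi, alpha, beta):
    """HKY rates: x -> y at alpha*pi[y] for transitions, beta*pi[y] for transversions."""
    q = {}
    for x in BASES:
        for y in BASES:
            if x != y:
                # Transitions stay within purines (A, G) or within pyrimidines (C, T).
                is_transition = (x in PURINES) == (y in PURINES)
                q[x, y] = (alpha if is_transition else beta) * pi[y]
        q[x, x] = -sum(q[x, y] for y in BASES if y != x)  # row sums to zero
    return q


# HYPOTHETICAL parameters: base frequencies, beta, and kappa = alpha / beta.
pi = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
beta = 0.1
kappa = 2.0
q = hky_rate_matrix(pi, kappa * beta, beta)

# pi is the equilibrium: sum_x pi[x] * q[x, y] = 0 for every y (up to rounding).
print(max(abs(sum(pi[x] * q[x, y] for x in BASES)) for y in BASES))
```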
20 Back to conserved elements
Basic idea: infer two rates, α_c for slowly evolving and α_n for fast evolving alignment columns.
For each column, try to determine which rate is more likely.
But a single column may look conserved purely by chance (one column does not carry enough information).
Combine the information from a short window, or use a phylogenetic hidden Markov model (phylo-HMM).
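The window idea can be sketched for the simplest case of two sequences under Jukes-Cantor. Each column gets a log-likelihood ratio comparing the slow distance with the fast one, and windows with a positive sum are reported. The distances `d_cons` and `d_neut` stand for α_c·t and α_n·t; their values and all names are hypothetical. PhastCons itself uses the full phylogeny and a phylo-HMM.

```python
import math


def mismatch_prob(d):
    """Jukes-Cantor probability that two sequences at distance d differ at a site."""
    return 0.75 * (1 - math.exp(-4 * d / 3))


def column_llr(a, b, d_cons, d_neut):
    """log P(column | slow rate) - log P(column | fast rate) for a two-row column."""
    p_c, p_n = mismatch_prob(d_cons), mismatch_prob(d_neut)
    if a == b:
        return math.log(1 - p_c) - math.log(1 - p_n)
    return math.log(p_c) - math.log(p_n)


def conserved_windows(seq1, seq2, d_cons, d_neut, width):
    """Start positions of windows whose summed log-likelihood ratio favours the slow rate."""
    scores = [column_llr(a, b, d_cons, d_neut) for a, b in zip(seq1, seq2)]
    return [i for i in range(len(scores) - width + 1) if sum(scores[i:i + width]) > 0]


# HYPOTHETICAL example: 20 identical columns followed by 20 differing ones.
seq1 = "ACGT" * 10
seq2 = seq1[:20] + seq1[20:].translate(str.maketrans("ACGT", "CATG"))
print(conserved_windows(seq1, seq2, d_cons=0.1, d_neut=0.5, width=10))
```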
21 PhastCons: detection of conserved elements using phylo-HMMs
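A phylo-HMM replaces fixed windows with a hidden conserved/neutral state for each column. The sketch below is a strongly simplified two-state HMM whose emissions are just match/mismatch of two sequences; a real phylo-HMM computes emission probabilities as likelihoods on the tree. All probabilities are hypothetical. Viterbi decoding returns the most likely state path.

```python
import math


def viterbi_conserved(seq1, seq2, p_mis=(0.05, 0.4), stay=(0.99, 0.99), start=(0.5, 0.5)):
    """Most likely state per column: 0 = conserved, 1 = neutral.

    p_mis[s]: probability that the two rows differ in state s (hypothetical values);
    stay[s]: probability of remaining in state s at the next column.
    """
    obs = [a != b for a, b in zip(seq1, seq2)]

    def emit(state, mismatch):
        return math.log(p_mis[state] if mismatch else 1 - p_mis[state])

    def trans(s, r):
        return math.log(stay[s] if s == r else 1 - stay[s])

    score = [math.log(start[s]) + emit(s, obs[0]) for s in (0, 1)]
    back = []
    for o in obs[1:]:
        ptr, new = [], []
        for r in (0, 1):
            best = max((0, 1), key=lambda s: score[s] + trans(s, r))
            ptr.append(best)
            new.append(score[best] + trans(best, r) + emit(r, o))
        back.append(ptr)
        score = new
    # Trace back from the best final state.
    state = max((0, 1), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# HYPOTHETICAL example: 20 identical columns followed by 20 differing ones.
seq1 = "ACGT" * 10
seq2 = seq1[:20] + seq1[20:].translate(str.maketrans("ACGT", "CATG"))
print(viterbi_conserved(seq1, seq2))
```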
22 PhastCons results
[Figure: results on whole-genome alignments of human, mouse, chicken and fugu]
23 Conserved elements in 29 mammals [Lindblad-Toh et al. 2011]
[Figure: four binding sites of the NRSF transcription factor]
24 Comparative gene finding
Improve genome annotation using whole-genome alignments.
Look for specific signatures typical of genes (synonymous substitutions, indels preserving the reading frame).
[Lin et al.]
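Two of these signatures are easy to compute on aligned coding sequences: whether substitutions change the encoded amino acid, and whether gap lengths are divisible by 3. The sketch uses the standard genetic code. Function names are hypothetical, and real comparative gene finders score such signals inside probabilistic models rather than simply counting them.

```python
import re

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> amino acid ('*' marks a stop codon).
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def codon_changes(cds1, cds2):
    """(synonymous, nonsynonymous) differing codons of two aligned gap-free coding sequences."""
    syn = nonsyn = 0
    for i in range(0, min(len(cds1), len(cds2)) // 3 * 3, 3):
        c1, c2 = cds1[i:i + 3], cds2[i:i + 3]
        if c1 != c2:
            if CODE[c1] == CODE[c2]:
                syn += 1
            else:
                nonsyn += 1
    return syn, nonsyn


def gaps_preserve_frame(aligned_row):
    """True if every run of gap characters '-' has a length divisible by 3."""
    return all(len(run) % 3 == 0 for run in re.findall(r"-+", aligned_row))


print(codon_changes("CTGAAAGAA", "CTAAAGGAT"))  # (2, 1): CTG/CTA and AAA/AAG are silent, GAA/GAT is not
print(gaps_preserve_frame("ATG---AAA"), gaps_preserve_frame("ATG--AAA"))  # True False
```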
25 Comparative gene finding
Comparative genomics also helps to find special cases, e.g.:
stop codon readthrough
selenoproteins (UGA encoding selenocysteine)
RNA editing (adenosine to inosine)
[Lin et al.]
26 Human Accelerated Regions [Pollard et al. 2006]
We are looking for genomic regions which:
were mutating slowly for a long time (negative selection),
but change very fast in the human lineage (positive selection).
Details:
Consider regions of length 100 with 96% sequence identity between chimpanzee and mouse/rat (35,000 regions).
Compare with other mammals and select the regions that have many mutations in human and few elsewhere.
A probabilistic model allows scaling of the human branch.
Result: 49 statistically significant regions, 96% of them non-coding.
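The first filtering step (100-column windows with high chimpanzee-mouse identity) can be sketched as follows. The sequences are hypothetical, gaps are ignored when computing identity, and the 0.96 threshold is treated as a minimum. The published pipeline is more involved.

```python
def identity(row1, row2):
    """Fraction of identical columns among columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(row1, row2) if x != "-" and y != "-"]
    return sum(x == y for x, y in pairs) / len(pairs) if pairs else 0.0


def highly_conserved_windows(row1, row2, width=100, min_identity=0.96):
    """Start positions of alignment windows of `width` columns with identity >= min_identity."""
    return [i for i in range(len(row1) - width + 1)
            if identity(row1[i:i + width], row2[i:i + width]) >= min_identity]


# HYPOTHETICAL aligned rows (e.g. chimpanzee and mouse): 5 mismatches at the start.
chimp = "A" * 150
mouse = "C" * 5 + "A" * 145
windows = highly_conserved_windows(chimp, mouse)
print(windows[0], len(windows))  # 1 50
```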
27 Human Accelerated Regions: HAR1
Region of length 118 bases.
18 changes between human and chimpanzee (about 6 million years of divergence).
Only 2 changes between chimpanzee and chicken (about 300 million years of divergence).
Human       CTGAAATGATGGGCGTAGACGCACGTCAGCGGCGGAAATGGTTTCTAT
Chimpanzee  CTGAAATTATAGGTGTAGACACATGTCAGCAGTGGAAATAGTTTCTAT
Gorilla     CTGAAATTATAGGTGTAGACACATGTCAGCAGTGGAAATAGTTTCTAT
Rhesus      CTGAAATTATAGGTGTAGACACATGTCAGCAGTGGAAATAGTTTCTAT
Mouse       CTGAAATTATAGGTGTAGACACATGTCAGCCGTGGAAATGGTTTCTAT
Cow         CTGAAATTATAGGTGTAGACACATGTCAGCAGTGGAAACCGTTTCTAT
Dog         CTGAAATTATAGGTGTAGACACATGTCAGCGGTGCAAACAGTTTCTAT
Chicken     CTGAAATTATAGGTGTAGACACATGTCAGCAGTAGAAACAGTTTCTAT
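Counting differences in the alignment fragment shown on this slide illustrates the contrast. The fragment is 48 columns, not the whole 118-base region. The code counts pairwise differences and the columns where human differs from all other species. Function names are hypothetical.

```python
# Alignment fragment from the HAR1 slide (48 columns, not the full 118-base region).
HAR1 = {
    "Human":      "CTGAAATGATGGGCGTAGACGCACGTCAGCGGCGGAAATGGTTTCTAT",
    "Chimpanzee": "CTGAAATTATAGGTGTAGACACATGTCAGCAGTGGAAATAGTTTCTAT",
    "Gorilla":    "CTGAAATTATAGGTGTAGACACATGTCAGCAGTGGAAATAGTTTCTAT",
    "Rhesus":     "CTGAAATTATAGGTGTAGACACATGTCAGCAGTGGAAATAGTTTCTAT",
    "Mouse":      "CTGAAATTATAGGTGTAGACACATGTCAGCCGTGGAAATGGTTTCTAT",
    "Cow":        "CTGAAATTATAGGTGTAGACACATGTCAGCAGTGGAAACCGTTTCTAT",
    "Dog":        "CTGAAATTATAGGTGTAGACACATGTCAGCGGTGCAAACAGTTTCTAT",
    "Chicken":    "CTGAAATTATAGGTGTAGACACATGTCAGCAGTAGAAACAGTTTCTAT",
}


def differences(a, b):
    """Number of columns where two aligned rows differ."""
    return sum(x != y for x, y in zip(a, b))


def unique_columns(rows, name):
    """Columns where `name` differs from every other species in the alignment."""
    own = rows[name]
    others = [seq for species, seq in rows.items() if species != name]
    return [i for i, base in enumerate(own) if all(seq[i] != base for seq in others)]


print(differences(HAR1["Human"], HAR1["Chimpanzee"]))    # 8
print(differences(HAR1["Chimpanzee"], HAR1["Chicken"]))  # 2
print(len(unique_columns(HAR1, "Human")))                # 6
```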
28 What is the function of HAR1?
It overlaps the RNA genes HAR1R and HAR1F.
HAR1F is expressed in the neocortex of 7- and 9-week-old embryos, and later also in other parts of the brain (in human and other primates).
29 What is the function of HAR1?
Mutations change the RNA structure.
30 Functional enrichment
Results of whole-genome studies often come in the form of a list of significant genes.
In comparative genomics, e.g. families with accelerated gain and loss, human accelerated regions, genes under positive selection, etc.
Also from other studies, e.g. differential expression analysis.
How to use such lists?
Look manually at the most significant candidates.
Try to find common characteristics of the whole set.
31 Gene ontology
A hierarchical structure of biological terms describing the functions of genes, e.g.:
biological process
  localization
    establishment of localization
      transport
        ion transport
          ion transmembrane transport
Databases contain gene ontology terms for many proteins.
Is some function enriched in our gene set?
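Because of the hierarchy, a gene annotated with a specific term is usually also counted for all of that term's ancestors (the GO "true path rule"). The sketch below uses the chain of terms listed on this slide. The annotations are hypothetical, and real GO terms can have several parents and several relation types.

```python
from collections import Counter

# Parent links for the GO branch listed on the slide (child -> list of parents).
# Real GO terms can have several parents and several relation types.
PARENTS = {
    "ion transmembrane transport": ["ion transport"],
    "ion transport": ["transport"],
    "transport": ["establishment of localization"],
    "establishment of localization": ["localization"],
    "localization": ["biological process"],
}


def with_ancestors(terms, parents):
    """A gene annotated with a term is also annotated with all of its ancestors."""
    result, stack = set(), list(terms)
    while stack:
        term = stack.pop()
        if term not in result:
            result.add(term)
            stack.extend(parents.get(term, []))
    return result


def term_counts(annotations, parents):
    """Number of genes carrying each term after propagation to ancestors."""
    counts = Counter()
    for terms in annotations.values():
        counts.update(with_ancestors(terms, parents))
    return counts


# HYPOTHETICAL annotations.
annotations = {"gene1": ["ion transmembrane transport"], "gene2": ["transport"]}
counts = term_counts(annotations, PARENTS)
print(counts["transport"], counts["ion transport"])  # 2 1
```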
32 Example [Kosiol et al. 2007]
n = total number of genes
n_i = 70 genes with the innate immune response term (0.4% of all genes)
n_p = 400 genes with positive selection overall
n_ip = 8 of them with the innate immune response term (2% of the genes with positive selection)
Contingency table:
            Pos. sel.          No pos. sel.            Total
Immunity    n_ip = 8           n_i - n_ip = 62         n_i = 70
Other       n_p - n_ip = 392   n - n_i - n_p + n_ip    n - n_i
Total       n_p = 400          n - n_p                 n
33 Null hypothesis
The genes in our list were randomly selected from all genes.
The whole genome has n_i/n = 0.4% immunity genes.
Our list should therefore contain about n_p (n_i/n) immunity genes: we expect 1.7 genes, but we get 8.
But purely by chance, the number can be larger or smaller.
Urn model: an urn with n_i white and n - n_i black balls. Randomly select n_p balls; how many of them (X_ip) are white?
Hypergeometric distribution:
$$\Pr(X_{ip} = n_{ip}) = \frac{\binom{n_i}{n_{ip}}\binom{n-n_i}{n_p-n_{ip}}}{\binom{n}{n_p}}$$
P-value:
$$P(X_{ip} \ge 8) = \sum_{k=8}^{\min(n_i,\, n_p)} \Pr(X_{ip} = k)$$
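The P-value sums the hypergeometric probabilities over all counts at least as large as the observed one. The sketch below uses only the standard library (`math.comb`), and the toy numbers are hypothetical. For the slide's example, one would call `enrichment_p_value(8, n, 70, 400)` with the total number of genes n from the study.

```python
from math import comb


def hypergeom_pmf(k, n, n_i, n_p):
    """P(X_ip = k): k term genes among n_p genes drawn from n genes, n_i of which carry the term."""
    return comb(n_i, k) * comb(n - n_i, n_p - k) / comb(n, n_p)


def enrichment_p_value(n_ip, n, n_i, n_p):
    """One-sided P-value P(X_ip >= n_ip) under the null hypothesis of random selection."""
    return sum(hypergeom_pmf(k, n, n_i, n_p) for k in range(n_ip, min(n_i, n_p) + 1))


# Toy example: 10 genes, 3 with the term, 4 selected, 3 of the selected have the term.
print(enrichment_p_value(3, 10, 3, 4))  # 7/210, about 0.0333
```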
34 Our research: ancestral gene orders in mitochondrial genomes [Valach et al. NAR, 2011, Kovac, Brejova, Vinar WABI 2011]
[Figure: phylogeny of C. parapsilosis, C. orthopsilosis, C. jiufengensis, L. elongisporus, C. tropicalis, C. sojae, C. viswanathii, C. frijolesensis, C. neerlandica, C. albicans, C. maltosa, C. alai, C. subhashii, D. hansenii and P. sorbitophila with numbers (and ranges) on the branches, and the orders of the mitochondrial genes nad1, nad2, nad3, nad4, nad4l, nad5, nad6, cob, cox1, cox2, cox3, atp6, atp8, atp9, rnl and rns in present-day and inferred ancestral genomes]
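A simple way to compare two gene orders is to count breakpoints: pairs of genes that are adjacent in one order but not in the other. The sketch assumes linear orders and ignores gene orientation. Both are simplifications, and the cited works use more elaborate models. The two orders are taken from the gene orders in the figure.

```python
def adjacencies(order):
    """Unordered pairs of neighbouring genes in a linear gene order."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def breakpoints(order1, order2):
    """Number of adjacencies of order1 that are broken in order2 (gene orientation ignored)."""
    return len(adjacencies(order1) - adjacencies(order2))


# Two of the gene orders appearing in the figure; the second equals the first
# with the segment cox2 ... atp9 reversed (a single inversion).
ORDER1 = "nad3 nad2 cob cox2 rnl cox1 nad4 rns atp9 nad6 nad1 cox3 nad4l nad5 atp8 atp6".split()
ORDER2 = "nad3 nad2 cob atp9 rns nad4 cox1 rnl cox2 nad6 nad1 cox3 nad4l nad5 atp8 atp6".split()
print(breakpoints(ORDER1, ORDER2))  # 2
```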
35 Our research: history inference for duplicated gene clusters [Vinar, Brejova, Song, Siepel. JCB 2010]
[Figure: human IFN cluster, chr 9]
36 Conclusion
Comparative genomics can help us annotate genomes and study their evolution.
Evolution is typically characterized by stochastic models.
We can estimate the parameters of these models from data to study typical patterns of evolution.
We can also detect atypical elements.
Examples: gene family evolution, conserved elements, accelerated elements.
Next: positive selection in protein-coding genes.
More information9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)
I9 Introduction to Bioinformatics, 0 Phylogenetic ree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & omputing, IUB Evolution theory Speciation Evolution of new organisms is driven by
More informationAlgorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment
Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot
More informationUoN, CAS, DBSC BIOL102 lecture notes by: Dr. Mustafa A. Mansi. The Phylogenetic Systematics (Phylogeny and Systematics)
- Phylogeny? - Systematics? The Phylogenetic Systematics (Phylogeny and Systematics) - Phylogenetic systematics? Connection between phylogeny and classification. - Phylogenetic systematics informs the
More informationSubstitution = Mutation followed. by Fixation. Common Ancestor ACGATC 1:A G 2:C A GAGATC 3:G A 6:C T 5:T C 4:A C GAAATT 1:G A
GAGATC 3:G A 6:C T Common Ancestor ACGATC 1:A G 2:C A Substitution = Mutation followed 5:T C by Fixation GAAATT 4:A C 1:G A AAAATT GAAATT GAGCTC ACGACC Chimp Human Gorilla Gibbon AAAATT GAAATT GAGCTC ACGACC
More informationInferring Molecular Phylogeny
Dr. Walter Salzburger he tree of life, ustav Klimt (1907) Inferring Molecular Phylogeny Inferring Molecular Phylogeny 55 Maximum Parsimony (MP): objections long branches I!! B D long branch attraction
More informationLecture 4. Models of DNA and protein change. Likelihood methods
Lecture 4. Models of DNA and protein change. Likelihood methods Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 4. Models of DNA and protein change. Likelihood methods p.1/36
More informationElements of Bioinformatics 14F01 TP5 -Phylogenetic analysis
Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis 10 December 2012 - Corrections - Exercise 1 Non-vertebrate chordates generally possess 2 homologs, vertebrates 3 or more gene copies; a Drosophila
More informationTree of Life iological Sequence nalysis Chapter http://tolweb.org/tree/ Phylogenetic Prediction ll organisms on Earth have a common ancestor. ll species are related. The relationship is called a phylogeny
More informationLecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) p.1/22
Lecture 24. Phylogeny methods, part 4 (Models of DNA and protein change) Joe Felsenstein Department of Genome Sciences and Department of Biology Lecture 24. Phylogeny methods, part 4 (Models of DNA and
More informationOrthology Part I: concepts and implications Toni Gabaldón Centre for Genomic Regulation (CRG), Barcelona
Orthology Part I: concepts and implications Toni Gabaldón Centre for Genomic Regulation (CRG), Barcelona (tgabaldon@crg.es) http://gabaldonlab.crg.es Homology the same organ in different animals under
More informationReading for Lecture 13 Release v10
Reading for Lecture 13 Release v10 Christopher Lee November 15, 2011 Contents 1 Evolutionary Trees i 1.1 Evolution as a Markov Process...................................... ii 1.2 Rooted vs. Unrooted Trees........................................
More informationThe African coelacanth genome provides insights into tetrapod evolution
The African coelacanth genome provides insights into tetrapod evolution bioinformaatika ajakirjaklubi 27.05.2013 Ülesehitus Täisgenoomi sekveneerimisest vankrid mille ette neid andmeid on rakendatud evolutsiooni
More informationGenomes and Their Evolution
Chapter 21 Genomes and Their Evolution PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley with contributions from
More informationThe Causes and Consequences of Variation in. Evolutionary Processes Acting on DNA Sequences
The Causes and Consequences of Variation in Evolutionary Processes Acting on DNA Sequences This dissertation is submitted for the degree of Doctor of Philosophy at the University of Cambridge Lee Nathan
More informationResearch Proposal. Title: Multiple Sequence Alignment used to investigate the co-evolving positions in OxyR Protein family.
Research Proposal Title: Multiple Sequence Alignment used to investigate the co-evolving positions in OxyR Protein family. Name: Minjal Pancholi Howard University Washington, DC. June 19, 2009 Research
More informationMolecular Evolution and Phylogenetic Tree Reconstruction
1 4 Molecular Evolution and Phylogenetic Tree Reconstruction 3 2 5 1 4 2 3 5 Orthology, Paralogy, Inparalogs, Outparalogs Phylogenetic Trees Nodes: species Edges: time of independent evolution Edge length
More informationLecture 3: Markov chains.
1 BIOINFORMATIK II PROBABILITY & STATISTICS Summer semester 2008 The University of Zürich and ETH Zürich Lecture 3: Markov chains. Prof. Andrew Barbour Dr. Nicolas Pétrélis Adapted from a course by Dr.
More informationC3020 Molecular Evolution. Exercises #3: Phylogenetics
C3020 Molecular Evolution Exercises #3: Phylogenetics Consider the following sequences for five taxa 1-5 and the known outgroup O, which has the ancestral states (note that sequence 3 has changed from
More informationBayesian Models for Phylogenetic Trees
Bayesian Models for Phylogenetic Trees Clarence Leung* 1 1 McGill Centre for Bioinformatics, McGill University, Montreal, Quebec, Canada ABSTRACT Introduction: Inferring genetic ancestry of different species
More informationPhylogenetic trees 07/10/13
Phylogenetic trees 07/10/13 A tree is the only figure to occur in On the Origin of Species by Charles Darwin. It is a graphical representation of the evolutionary relationships among entities that share
More informationComputational Biology: Basics & Interesting Problems
Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information
More informationPhylogeny and Evolution. Gina Cannarozzi ETH Zurich Institute of Computational Science
Phylogeny and Evolution Gina Cannarozzi ETH Zurich Institute of Computational Science History Aristotle (384-322 BC) classified animals. He found that dolphins do not belong to the fish but to the mammals.
More informationWhat is Phylogenetics
What is Phylogenetics Phylogenetics is the area of research concerned with finding the genetic connections and relationships between species. The basic idea is to compare specific characters (features)
More informationComputational approaches for functional genomics
Computational approaches for functional genomics Kalin Vetsigian October 31, 2001 The rapidly increasing number of completely sequenced genomes have stimulated the development of new methods for finding
More informationExploring Evolution & Bioinformatics
Chapter 6 Exploring Evolution & Bioinformatics Jane Goodall The human sequence (red) differs from the chimpanzee sequence (blue) in only one amino acid in a protein chain of 153 residues for myoglobin
More informationPage 1. Evolutionary Trees. Why build evolutionary tree? Outline
Page Evolutionary Trees Russ. ltman MI S 7 Outline. Why build evolutionary trees?. istance-based vs. character-based methods. istance-based: Ultrametric Trees dditive Trees. haracter-based: Perfect phylogeny
More informationBrowsing Genomic Information with Ensembl Plants
Browsing Genomic Information with Ensembl Plants Etienne de Villiers, PhD (Adapted from slides by Bert Overduin EMBL-EBI) Outline of workshop Brief introduction to Ensembl Plants History Content Tutorial
More informationWhat Is Conservation?
What Is Conservation? Lee A. Newberg February 22, 2005 A Central Dogma Junk DNA mutates at a background rate, but functional DNA exhibits conservation. Today s Question What is this conservation? Lee A.
More informationComparative Network Analysis
Comparative Network Analysis BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2016 Anthony Gitter gitter@biostat.wisc.edu These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by
More informationPhylogenetics: Building Phylogenetic Trees
1 Phylogenetics: Building Phylogenetic Trees COMP 571 Luay Nakhleh, Rice University 2 Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary model should
More informationWhat can sequences tell us?
Bioinformatics What can sequences tell us? AGACCTGAGATAACCGATAC By themselves? Not a heck of a lot...* *Indeed, one of the key results learned from the Human Genome Project is that disease is much more
More information