Comparative Bioinformatics Midterm II Fall 2004

Similar documents
08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

8/23/2014. Phylogeny and the Tree of Life

C3020 Molecular Evolution. Exercises #3: Phylogenetics

BLAST. Varieties of BLAST

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010

Comparative genomics: Overview & Tools + MUMmer algorithm

Elements of Bioinformatics 14F01 TP5 -Phylogenetic analysis

15.2 Prokaryotic Transcription *

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types


2 Genome evolution: gene fusion versus gene fission

Using Bioinformatics to Study Evolutionary Relationships Instructions

Visit to BPRC. Data is crucial! Case study: Evolution of AIRE protein 6/7/13

Comparative Genomics II

Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Basic Local Alignment Search Tool

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

Dr. Amira A. AL-Hosary

Phylogenetic Tree Reconstruction

Bacillus anthracis. Last Lecture: 1. Introduction 2. History 3. Koch s Postulates. 1. Prokaryote vs. Eukaryote 2. Classifying prokaryotes

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

MiGA: The Microbial Genome Atlas

Bioinformatics Chapter 1. Introduction

Phylogenetics - Orthology, phylogenetic experimental design and phylogeny reconstruction. Lesser Tenrec (Echinops telfairi)

Phylogeny 9/8/2014. Evolutionary Relationships. Data Supporting Phylogeny. Chapter 26

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

What is Phylogenetics

Inferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)

CONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018

AP Biology Notes Outline Enduring Understanding 1.B. Big Idea 1: The process of evolution drives the diversity and unity of life.

Computational methods for predicting protein-protein interactions

Michael Yaffe Lecture #5 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D

Motivating the need for optimal sequence alignments...

Name: Class: Date: ID: A

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task.

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting.

Motifs and Logos. Six Introduction to Bioinformatics. Importance and Abundance of Motifs. Getting the CDS. From DNA to Protein 6.1.

Phylogeny: building the tree of life

Homology Modeling. Roberto Lins EPFL - summer semester 2005

Orthology Part I concepts and implications Toni Gabaldón Centre for Genomic Regulation (CRG), Barcelona

Figure S1: Mitochondrial gene map for Pythium ultimum BR144. Arrows indicate transcriptional orientation, clockwise for the outer row and

Ensembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are:

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT

Bioinformatics and BLAST

Cladistics and Bioinformatics Questions 2013

Comparing Genomes! Homologies and Families! Sequence Alignments!

Early History up to Schedule. Proteins DNA & RNA Schwann and Schleiden Cell Theory Charles Darwin publishes Origin of Species

Chapter 20. Initiation of transcription. Eukaryotic transcription initiation

BMI/CS 776 Lecture #20 Alignment of whole genomes. Colin Dewey (with slides adapted from those by Mark Craven)

Bioinformatics. Dept. of Computational Biology & Bioinformatics

1 ATGGGTCTC 2 ATGAGTCTC

Orthology Part I: concepts and implications Toni Gabaldón Centre for Genomic Regulation (CRG), Barcelona

Research Proposal. Title: Multiple Sequence Alignment used to investigate the co-evolving positions in OxyR Protein family.

The Eukaryotic Genome and Its Expression. The Eukaryotic Genome and Its Expression. A. The Eukaryotic Genome. Lecture Series 11

Molecular Phylogenetics (part 1 of 2) Computational Biology Course João André Carriço

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

Computational Biology: Basics & Interesting Problems

SUPPLEMENTARY INFORMATION

3.B.1 Gene Regulation. Gene regulation results in differential gene expression, leading to cell specialization.

Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata.

Sequence Alignment Techniques and Their Uses

Algorithms in Bioinformatics

Designer Genes C Test

Consensus Methods. * You are only responsible for the first two

Phylogenetic analyses. Kirsi Kostamo

Chapter 19: Taxonomy, Systematics, and Phylogeny

molecular evolution and phylogenetics

Pairwise & Multiple sequence alignments

Bioinformatics Exercises

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

Biology. Biology. Slide 1 of 26. End Show. Copyright Pearson Prentice Hall

DNA Phylogeny. Signals and Systems in Biology Kushal EE, IIT Delhi

Chapter 19 Organizing Information About Species: Taxonomy and Cladistics

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

AP Biology Notes Outline Enduring Understanding 1.B. Big Idea 1: The process of evolution drives the diversity and unity of life.

Supplemental Materials

Genetic Variation: The genetic substrate for natural selection. Horizontal Gene Transfer. General Principles 10/2/17.

Algorithms in Computational Biology (236522) spring 2008 Lecture #1

BINF6201/8201. Molecular phylogenetic methods

Computational methods for the analysis of bacterial gene regulation Brouwer, Rutger Wubbe Willem

Comparative Network Analysis

Supplementary Information

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson

Study and Implementation of Various Techniques Involved in DNA and Protein Sequence Analysis

Phylogeny and the Tree of Life

Computational approaches for functional genomics

Genomics and bioinformatics summary. Finding genes -- computer searches

Controlling Gene Expression

Phylogeny and the Tree of Life

Constructing Evolutionary/Phylogenetic Trees

EVOLUTIONARY DISTANCES

Transcription:

Comparative Bioinformatics Midterm II Fall 2004 Objective Answer, part I: For each of the following, select the single best answer or completion of the phrase. (3 points each) 1. Deinococcus radiodurans has been isolated from: a. Radiation "sterilized" canned meat. b. The surface of Mars. c. The inside of an autoclave. d. All of the above. 2. NP-Hard (nonparametric polynomial) problems are a class of problems for which a. exact solutions become very difficult as the size of the problem increases. b. it is possible to calculate a goodness-of-fit score for a candidate solution. c. heuristic algorithms are widely used to find approximate solutions. d. all of the above. 3. Clusters of Orthologous Groups are identified by a. reciprocal best hits from "all-against-all" BLAST analyses of genomes. b. phylogenetic analysis of sequences excluded by the "triangle criterion." c. Hierarchical clustering and "synteny-mapping" of chromosomes. d. none of the above. 4. In a Markov process a. The system is defined by a set of states, the allowed transitions that can occur among these states. b. Probabilities are associated either with the states, or with the allowed transitions. c. The states into which the system can change are dependent only on the state the system is in. d. All of the above. 5. Paralogs are homologous copies of a gene that are related by a. phylogenetic descent. b. gene duplication. c. at least 85% identity. d. none of the above. 1/7

Objective Answer, part II: The groups Archaea, Bacteria, and Eukarya, as well as combinations of these, are listed below. a. Transcription start site with TATA motif (TATA box) b. Formylmethionine as first amino acid in new protein c. Circular chromosomes d. Protein-coding genes organized into operons e. Abundant introns f. DNA bound to histones or histone-like proteins g. Several distinct RNA polymerases that require transcription factors h. Nucleus i. Ether-linked phospholipids For each of the following sets of organisms, circle those of the characteristics shown above that apply to all (or many) of the organisms in that set. (10 points) 6. Archaea 7. Bacteria 8. Eukarya 9. Archaea and Bacteria 10. Archaea and Eukarya 2/7

Short Answer. (5 pts each) 11. In principle, one could perform multiple sequence alignment by extending the Smith- Waterman algorithm to create an n-dimensional matrix, where n corresponds to the number of sequences to be aligned. In fact such an algorithm has been implemented, and provides an exact solution to multiple sequence alignment. Why is this software tool unlikely to find widespread use? 5 points. 12. Describe the cluster approach to multiple sequence alignment (e.g., as implemented by clustalw or pileup). 5 points. 13. What is a consensus sequence? Provide an example. 5 points. 3/7

For each of the scenarios described below, (1) indicate what type of analysis you would recommend as the next step, and (2) explain why you recommend this approach (5 points each). 14. Kevin has determined 20,000 EST sequences from five different wombat tissues. He would like to make tentative inferences of homology for each of these sequences. 15. Julia has completed the genome sequence of five different species of nematode, has assembled each of the genomes, and has identified a large number of open reading frames. She has used BLAST analysis to search each ORF against all of the genomes. 16. Arya has determined the gene encoding a highly variable surface antigenic protein from 75 isolates of Bacillus anthracis, and would like to understand how these strains are related to each other. 4/7

17. We have obtained the following DNA sequence, and would like to determine whether or not it is derived from the Plasmodium genome. We can represent the sequence with a Markov model showing the frequencies with which individual nucleotides follow each other. (25 points) QUERY: AACCTGGTTG ATCCTGCCAG TAGTCATATG CTTGTCTCAA AGATTAAGCC a A e f C b c g h m o n p i k G l T j d a) What dinucleotide pair does transformation "j" represent? b) How often is this dinucleotide pair observed in the sequence above? c) What are the observed frequencies of each transformation? a: b: c: d: e: f: g: h: i: j: k: l: m: n: o: p: d) The Plasmodium genome is 80% A+T. From this we can estimate some relationships among expected transformation frequencies in Plasmodium. Provide a numerical estimate for each of the following values: a+e+h+o = d+l+i+n =? c+g+k+m = b+f+j+p =? e) What assumptions did you use to make your inference in (d)? f) [optional] Is the sequence likely to have been derived from Plasmodium? Why? 5/7

18. Assume that you have sequence data from four animals, Albatross, Beluga Whale, Cat, and Dugong (10 points). a. Draw the three possible unrooted trees that can interrelate these taxa. b. Draw the three corresponding ROOTED trees that assume that the Albatross is the outgroup. 6/7

You may wish to refer to the following algorithm (described by Swofford and Maddison, 1987), which can be used to determine the length of a tree under the parsimony criterion: 1.Assign a state to each terminal node (2). Visit first internal node A. is the intersection of states non-empty? I. Yes: set internal state to this. II. Else: a. set the state to the smallest set containing the states of the daughter nodes b. increase the tree length by 1. 3. Are you at the root of the tree? A. No: go to 2. B. Yes: go to 4. (4). Is the state at this node the same as the outgroup state? A. Yes: Proceed to the next character B. Else: Add one to the length of the tree; proceed to next character 123456789 Albatross GACCGTATA Beluga GTTCCTCTC Cat GTTCCCTTC Dugong GTCCGCTTA 19. Use the Swofford and Maddison algorithm (above) to (a) determine which of the three rooted trees is most parsimonious, and (b) show the node assignments of character states for characters 5 and 6 on the most parsimonious tree (10 points). 7/7