Homolog. Orthologue. Comparative Genomics. Paralog. What is Comparative Genomics


Orthologue
Orthologs are genes in different species that evolved from a common ancestral gene by speciation. Normally, orthologs retain the same function in the course of evolution. Identification of orthologs is critical for reliable prediction of gene function in newly sequenced genomes.

Homolog
A homolog is a gene related to a second gene by descent from a common ancestral DNA sequence. The term may apply to the relationship between genes separated by a speciation event (see ortholog) or to the relationship between genes separated by a gene duplication event (see paralog).

Paralog
Paralogs are genes related by duplication within a genome. Whereas orthologs generally retain the same function in the course of evolution, paralogs tend to evolve new functions, even if these are related to the original one.

The functions of human genes and other DNA regions are often revealed by studying their parallels in nonhumans. To enable such comparisons, Human Genome Project (HGP) researchers have obtained complete genomic sequences for the bacterium Escherichia coli, the yeast Saccharomyces cerevisiae, the roundworm Caenorhabditis elegans, the fruitfly Drosophila melanogaster, the laboratory mouse, and many other organisms. The availability of complete genome sequences generated both inside and outside the HGP is driving a major breakthrough in fundamental biology as scientists compare entire genomes to gain new insights into evolutionary, biochemical, genetic, metabolic, and physiological pathways. HGP planners stress the need for a sustainable sequencing capacity to facilitate future comparisons.

What is Comparative Genomics
Comparative genomics is the analysis and comparison of genomes from different species. The purpose is to gain a better understanding of how species have evolved and to determine the function of genes and noncoding regions of the genome. Researchers have learned a great deal about the function of human genes by examining their counterparts in simpler model organisms such as the mouse.

Genome researchers look at many different features when comparing genomes: sequence similarity, gene location, the length and number of coding regions (called exons) within genes, the amount of noncoding DNA in each genome, and highly conserved regions maintained in organisms as simple as bacteria and as complex as humans.

Comparative genomics involves the use of computer programs that can line up multiple genomes and look for regions of similarity among them. Some of these sequence-similarity tools are accessible to the public over the Internet. One of the most widely used is BLAST, which is available from the National Center for Biotechnology Information.
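To make the ortholog/paralog distinction concrete, here is a minimal sketch of one common heuristic for proposing ortholog pairs, the reciprocal best hit: two genes from different species are paired only if each is the other's most similar sequence. This heuristic is not described in the slides; it is brought in here for illustration. The similarity function is a toy stand-in built on Python's `difflib` (a real analysis would use alignment scores from a tool such as BLAST), and the gene names and sequences are invented.

```python
from difflib import SequenceMatcher


def similarity(seq_a: str, seq_b: str) -> float:
    """Toy similarity in [0, 1]; a stand-in for a real alignment score."""
    return SequenceMatcher(None, seq_a, seq_b, autojunk=False).ratio()


def best_hit(query: str, targets: dict) -> str:
    """Name of the target sequence most similar to the query."""
    return max(targets, key=lambda name: similarity(query, targets[name]))


def reciprocal_best_hits(genome_a: dict, genome_b: dict) -> list:
    """Pairs (gene_a, gene_b) that are each other's best hit."""
    pairs = []
    for name_a, seq_a in genome_a.items():
        name_b = best_hit(seq_a, genome_b)
        # Keep the pair only if the relationship holds in both directions.
        if best_hit(genome_b[name_b], genome_a) == name_a:
            pairs.append((name_a, name_b))
    return pairs


# Hypothetical protein fragments, invented for illustration only.
species_a = {
    "geneA1": "MKTAYIAKQRQISFVKSHFSRQ",
    "geneA1_dup": "MKTAYIAKQRQIPPPPPHFSRQ",  # diverged duplicate (paralog) of geneA1
    "geneA2": "GGLLPPWWNNCCDDEEHHTTYY",
}
species_b = {
    "geneB1": "MKTAYIAKQRQISFVKSHFSRL",
    "geneB2": "GGLLPPWWNNCCDDEEHHTTYV",
}

for pair in reciprocal_best_hits(species_a, species_b):
    print(pair)
```

On this toy data the diverged duplicate `geneA1_dup` still finds `geneB1` as its best hit, but `geneB1` points back to `geneA1`, so the duplicate is left unpaired. That is the behaviour one would hope for when separating candidate orthologs from paralogs, although real gene families are rarely this clean.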

Goals
Complete the sequence of the roundworm C. elegans genome by 1998.
Complete the sequence of the fruitfly Drosophila genome by 2002.
Develop an integrated physical and genetic map for the mouse, generate additional mouse cDNA resources, and complete the sequence of the mouse genome by 2008.
Identify other useful model organisms and support appropriate genomic studies.

The complete DNA sequence of the human genome is a remarkable achievement for molecular biology and represents the work of many people in a number of large sequencing centers. Far from resting on their laurels, those centers have gone on to sequence the genomes of the mouse, rat, pufferfish, zebrafish, chicken, chimpanzee... you name it, they're sequencing it. Why this drive to sequence every animal in the zoo? Do we really care about the genetics of pufferfish? In isolation, not so much, but comparisons with the other genomes yield tremendous insights into the genes that are essential for life and those that define the species. They reveal the mechanisms of evolution and the hidden mechanisms of gene regulation.

Geographic maps are a useful analogy for how we study genomes. If you were given a detailed map of London, you could learn a lot about what defines a large cosmopolitan city. You would see a large number of apartments, shops, and restaurants and might reasonably conclude that these are essential for life in the city. But you could not assess the relative importance of unique features like Buckingham Palace or the Brick Lane street market. Things would be clearer if you were also given a detailed map of Paris. That too has apartments, shops, and restaurants, confirming your earlier hypothesis. It also has street markets, so perhaps those are an important, albeit secondary, aspect of city life. In contrast, Paris has no "active" royal palaces. Why not? One interpretation might be that Buckingham Palace is an important feature that distinguishes London from other cities. Another might be that a royal family has no function whatsoever in a modern society and survives in London merely as an evolutionary remnant.

The same holds for genomes: comparing one genome sequence to a second can answer many of these questions. We can compare one with the other, locate conserved sequence segments, and assess their significance. The more genomes we have, the more confident we can become of our assignments and the higher the "resolution" at which we can examine the subtleties.
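The idea of locating conserved sequence segments can be sketched with a sliding window over two sequences that have already been aligned. This is an illustrative assumption rather than the method of any particular tool: the function names, the window size, and the identity threshold are all hypothetical choices, and the input is assumed to be a pairwise alignment of equal length with "-" marking gaps.

```python
def window_identity(seq_a: str, seq_b: str, window: int) -> list:
    """Fraction of identical, non-gap positions in each sliding window."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for start in range(len(seq_a) - window + 1):
        pairs = zip(seq_a[start:start + window], seq_b[start:start + window])
        matches = sum(1 for x, y in pairs if x == y and x != "-")
        scores.append(matches / window)
    return scores


def conserved_segments(seq_a: str, seq_b: str, window: int = 5,
                       threshold: float = 1.0) -> list:
    """Half-open (start, end) intervals covered by windows at or above threshold."""
    segments = []
    for start, score in enumerate(window_identity(seq_a, seq_b, window)):
        if score < threshold:
            continue
        end = start + window
        if segments and start <= segments[-1][1]:
            # Overlapping or touching windows are merged into one segment.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments


# Invented aligned sequences: two identical flanks around a divergent middle.
human = "ACGTACGTAC" + "T" * 10 + "GGCCGGCCGG"
mouse = "ACGTACGTAC" + "A" * 10 + "GGCCGGCCGG"

print(conserved_segments(human, mouse, window=5, threshold=1.0))
```

With a third or fourth aligned genome, the same scan could be repeated pairwise and the intervals intersected, which is one simple way to picture how additional genomes raise the "resolution" of the comparison.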

Synteny
Evolution never makes things simple for biologists. We can't just line up the mouse and human genomes starting at one end of a chromosome and expect to find matching regions one after another. On the time scale of evolution, the process of recombination (the genetic equivalent of cut-and-paste) is continually at work rearranging the genome. Large blocks of genes are moved around within, and between, genomes.

The Software: Genome Browsers
To explore comparative genomics we will use the VISTA Genome Browser from Ed Rubin's group at Lawrence Berkeley National Laboratory (LBNL) in Berkeley, Calif.
LAGAN and Multi-LAGAN
Glocal alignment
http://www.tigr.org/software/

References
[1] Couronne O, Poliakov A, Bray N, Ishkhanov T, Ryaboy D, Rubin E, Pachter L, Dubchak I. Strategies and tools for whole-genome alignments. Genome Research 2003 Jan;13(1):73-80.
[2] Brudno M, Do CB, Cooper GM, Kim MF, Davydov E, Green ED, Sidow A, Batzoglou S. LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA. Genome Research 2003 Apr;13(4):721-31.
[3] Brudno M, Malde S, Poliakov A, Do CB, Couronne O, Dubchak I, Batzoglou S. Glocal alignment: finding rearrangements during alignment. Special Issue on the Proceedings of the ISMB 2003, Bioinformatics 2003;19:i54-i62.

Comparative Genome Databases
http://www.hgmp.mrc.ac.uk/genomeweb/comp-gen-db.html

Like other scientific discoveries, genes are named by the person who discovers them. Technically, a scientist can name a gene anything he or she wants. Some scientists choose names based on the disorder thought to be associated with changes in the gene. For example, changes in the CFTR gene cause cystic fibrosis.

Some genes are named with abbreviations
After reading about WNT2, RELN, HOXA1, OXTR, and others, you may wonder how genes are named. As you may have guessed, these names are abbreviations for the full gene names. Abbreviations are especially useful for genes with long names: WNT2 is short for "wingless-type MMTV integration site family member 2". While "wingless" seems an unnecessary adjective (of course humans don't have wings!), some genes are named after similar genes in other organisms, such as fruit flies.

When a gene is named for a disorder, the abbreviation follows the same pattern: CFTR stands for cystic fibrosis transmembrane conductance regulator.

Sometimes the gene name is a variation of the name of the protein the gene makes. For example, the RELN gene contains instructions for making the "reelin" protein, which was named for the "reeling" walking motion of mice that have changes in their own version of the RELN gene.

Other genes are named based on their functions. For example, HOX genes (short for homeobox) are a whole group of genes involved in development. Individual HOX genes are named with additional letters and numbers, such as HOXA1 or HOXD9. Because the naming system is consistent, we know that all HOX genes play a specific type of role in development.

There are even playful gene names, such as the SHH gene, which is involved in the development of the brain, spinal cord, and limbs. The SHH gene is named after Sonic the Hedgehog!
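Because HOX symbols follow a regular pattern, they can be taken apart mechanically. The sketch below assumes the human convention of the prefix HOX, a cluster letter from A to D, and a one- or two-digit number; the function name `parse_hox_symbol` is hypothetical, and the pattern is deliberately strict, so symbols from other species or other gene families are simply not matched.

```python
import re

# Assumed pattern for human HOX symbols: HOX + cluster letter (A-D) + number.
HOX_PATTERN = re.compile(r"HOX([A-D])(\d{1,2})")


def parse_hox_symbol(symbol: str):
    """Return (cluster, number) for a HOX symbol, or None if it does not match."""
    match = HOX_PATTERN.fullmatch(symbol)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


for symbol in ["HOXA1", "HOXD9", "SHH", "RELN"]:
    print(symbol, parse_hox_symbol(symbol))
```

Symbols such as SHH or RELN return `None`, which reflects the point made above: only some gene families carry their relationships in the name itself.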

International BCB-Workshop on Gene Annotation Analysis and Alternative Splicing
http://www.medizin.fu-berlin.de/molbiochem/bioinf/konferenz_04/Start.html

Candidate gene
A candidate gene is a gene that researchers think may be related to a particular disease or condition. Researchers find candidate genes in a variety of ways, but candidate genes in general fall into two categories: positional or functional.

Positional candidate genes
A positional candidate gene is one that researchers think may be associated with a disorder based on the gene's location on a chromosome.

Functional candidate genes
Researchers sometimes look at candidate genes that make products that may have something in common, medically or biologically, with the disorder they are studying. In practice, scientists often identify functional candidate genes by correlating the expression of certain genes with the traits under study.

Methods for comparative mapping
1. Map many known genes by linkage analysis, then compare the genes within each linkage group across species for similar genes and gene orders.
2. Physically analyze a large segment of DNA containing known genes in human or another species, and compare whether the genes, their order, and their orientation are the same.
3. Find a group of genes located in the same region in human, then conduct an in silico Southern blot analysis to determine whether the same genes are organized in similar regions in, e.g., pufferfish or zebrafish.
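The comparison of genes, gene order, and orientation in the methods above can be sketched as a scan for conserved blocks between two gene lists. This is a simplified illustration, not the in silico Southern blot itself. It assumes each region is given as an ordered list of (gene, strand) pairs, that orthologous genes already share a name in both species, and that a block counts as conserved either in the same order and orientation or as a clean inversion. The function name and the example genes are hypothetical.

```python
def conserved_blocks(order_a: list, order_b: list, min_genes: int = 2) -> list:
    """Runs of genes from species A that are also adjacent in species B.

    A run is extended when the next gene sits at the neighbouring position in B,
    either in the same order with matching strands (relative orientation +1)
    or in reverse order with flipped strands (relative orientation -1).
    """
    position_b = {gene: (index, strand) for index, (gene, strand) in enumerate(order_b)}
    blocks, current, previous = [], [], None
    for gene, strand_a in order_a:
        hit = None
        if gene in position_b:
            index_b, strand_b = position_b[gene]
            orientation = 1 if strand_a == strand_b else -1
            hit = (index_b, orientation)
        extends = (
            hit is not None
            and previous is not None
            and hit[1] == previous[1]
            and hit[0] - previous[0] == hit[1]
        )
        if extends:
            current.append(gene)
        else:
            if len(current) >= min_genes:
                blocks.append(current)
            current = [gene] if hit is not None else []
        previous = hit
    if len(current) >= min_genes:
        blocks.append(current)
    return blocks


# Invented gene orders: g1-g3 are collinear, g4 has moved, g5-g6 are inverted.
human_region = [("g1", "+"), ("g2", "+"), ("g3", "-"), ("g4", "+"), ("g5", "+"), ("g6", "-")]
fish_region = [("g1", "+"), ("g2", "+"), ("g3", "-"), ("g6", "+"), ("g5", "-"), ("g9", "+"), ("g4", "+")]

print(conserved_blocks(human_region, fish_region))
```

Here g1 to g3 form a block with the same order and orientation, g5 and g6 form an inverted block, and the relocated g4 belongs to neither. Requiring strict adjacency is a design simplification; tolerating a few intervening genes would need a gap parameter.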