Comparative genomics. Lucy Skrabanek ICB, WMC 6 May 2008
|
|
- Bathsheba Brooks
- 5 years ago
- Views:
Transcription
1 Comparative genomics Lucy Skrabanek ICB, WMC 6 May 2008
2 What does it encompass? Genome conservation transfer knowledge gained from model organisms to non-model organisms Genome evolution understand how genomes change over time in order to identify evolutionary processes and constraints Genome variation understand how genomes vary within a species to identify genes central to particular processes
3 Main uses Whole genome comparisons Genome evolution Coding regions comparisons Gene prediction Gene structure (exon-intron) prediction Function prediction Non-coding region comparisons Regulatory region discovery Protein-protein interaction prediction
4 Metazoan evolutionary tree Ureta-Vidal et al, NRG 2003
5 Rearrangement rate Can estimate the number of rearrangements from cytological comparisons Two very different rates: Very slow rate of rearrangement (1 or 2 exchanges per 10 MYR) ~ 7 rearrangements between the human genome from the hypothetical primate ancestor 13 rearrangements between cat and human
6 Rearrangement rate Punctuated by abrupt global genome rearrangement in some lineages Gibbons and siamangs rearranged 3 or 4 times more extensively than human or other great apes Rodent species exhibit very rapid patterns of chromosome change ~180 conserved segments between mouse and human ~100 conserved segments between rat and human
7 Human, cat and mouse X chromosome comparison O Brien et al, Science 1999
8 Genome comparison across species gross changes in chromosome number O Brien et al, Science 1999
9 Genome alignment Very different from single gene or protein alignments Standard DPAs are too expensive Made complicated by extensive rearrangements of large homologous segments
10 Problems Looking for syntenic regions Rearrangements disrupting syntenic regions Insertions Deletions Inversions Translocations Duplications
11 Assumptions The two genomes to be aligned derived from a common ancestor There remains sufficient similarity between the genomes to enable easy identification of homologous regions For the alignment to be informative, there has to have been time for the genomes to diverge and for selection to have occurred
12 Algorithm requirements Genome alignment algorithms must Scale linearly (computationally) Be robust (not too many parameters) Be memory-efficient Be able to handle rearrangements, gene duplications, repetitive elements Smith-Waterman, Needleman Wunsch Time to do calculation of the order of (O) 4 Not feasible for sequence length > 10,000 bp Cannot handle rearrangements or inversions
13 Alignment methods Seeding methods (e.g., BLASTZ, BLAT, EXONERATE) Produce local alignments All matches found (including all paralogs) Very sensitive, not very specific Very fast Anchor-based methods (e.g., MUMmer, GLASS, AVID) Produce global alignments Specific Difficulty with rearrangements Ureta-Vidal et al, Nat Rev Genet April 2003
14 Local aligner: BLASTZ As in BLAST, find seeds Seeds are determined by a 12of19 weightedspaced seeds strategy where the positions for strict matches are specified combination shown to be the most sensitive (and more sensitive than the 11-consecutive match seed strategy used by BLAST) Also allows one transition in 1/12 strict match positions Seeds with many matches masked out (assumed to be repetitive regions)
15 BLASTZ Gap-free extension Further extend seeds by DPA, allowing gaps Low complexity regions have their scores specifically down-weighted Repeat above steps (using a more sensitive seed, e.g., 7-mer) for regions that lie between matches, that share the same order and orientation, and are separated by <50 kb Post-processing to remove multiple hits to the same region When aligning human and mouse genomes, can achieve 98% alignment of known coding regions
16 Local aligner: BLAT Two modes: untranslated and translated Untranslated mode performs poorly when conservation < 90% so translated mode usually used when aligning genomes Will more efficiently identify regions that are conserved for their coding ability rather than for the regulatory functions Seeds created by building an index of nonoverlapping 5-mers from one genome Frequent 5-mers and ambiguous sequences (repeats and low-complexity regions) excluded The other genome is chopped into <200kb chunks Comparisons are made between the indexed 5-mers and the chunks DPA applied both upstream and downstream of matches
17 Global aligner: AVID Assumes that no gene duplications, inversions or translocations have occurred Need a pre-processing step to identify syntenic regions - use BLAT hits, filtered for specificity Find maximally repeated matches in syntenic regions Matches are flanked on either side by mismatches Determine an anchor set Exclude all maximally repeated matches that are less than half the length of the longest match Allow only non-overlapping non-crossing matches Use clean matches first, then repeat matches
18 Bray et al, Genome Res 2003 Blue + Red: all maximally repeated matches Red: anchor set n anchors n+1 regions to be aligned Add any matches that were discarded because they were too short on the first round Add any new anchors When there are no more anchors to be added, use the NW algorithm to align the intervening sequence If the regions are longer than ~ 4kb, the alignment is returned with a gap in both sequences at that position
19 Global aligner: LAGAN Different matching algorithm to AVID Generates local alignments Allows mismatches Find highest weight chain Compute NW alignment, limiting it to area around chains MLAGAN - multiple global alignment tool
20 Global aligner: WABA Takes into account degeneracy of genetic code Seeding step similar to BLAT Uses the two weighted-spaced seed 6of , where the third position is allowed to mismatch In the alignment of homologous regions, uses a seven-state pair HMM which also allows mismatches in the third position Finally, join overlapping matches from the alignment phase
21 Visualization tools: VISTA vs. PipMaker VISTA (VISualization Tools for Alignments) Uses AVID to generate alignments Uses a sliding window approach Plots percent identity within a fixed window size, at regular intervals PipMaker (Percent Identity Plot) Uses BLASTZ X-axis is the reference sequence; horizontal lines represent gap-free alignments
22 Once we have a macro-alignment, we can study the evolution of genomes between species, and also can trace the evolutionary history of the structure of each genome itself Genome structure rearrangement Gene duplication, chromosomal duplication and polyploidization (whole genome duplication) New genes
23 Unravelling the history of the genome Plants Wheat Allohexaploid (AABBCC) Maize Diploidized allotetraploid Has grown 12x in the past 5 MY due to increased numbers of transposable elements Arabidopsis Haploid, 5N Yeast Saccharomyces cerevisiae vs. Kluyveromyces lactis Human Genome duplication?
24 Decipher history of lineages Genome size changes Compaction Large scale deletions - Fugu Intron loss Expansion Baxendale et al, Nature Genet, 1995 Duplication Transposable element insertions Transposons - Alus, LINEs Ancient insertions prior to eutherian radiation More recent insertions - maize
25 Why gen(om)e duplication? Duplicated genes provide a source for genetic novelty during evolution Either member of a duplicated gene pair can diverge to either Acquire a new function which may be positively selected for Subfunctionalize Be differently regulated (e.g., tissue-specific) Become a pseudogene Be deleted Whole genome duplication allows for the duplication (and subsequent divergence) of whole pathways at a time
26 Duplication events Resolution of polyploidy Non-disjunction of chromosomes (form multivalents instead of divalents) Sterility Duplicated genes do not start to diverge (or get deleted) until disomic inheritance resolved
27 Rearrangements Are genome rearrangements the cause or consequence of diploidization? Most widely accepted hypothesis is that diploidization proceeds by structural divergence of chromosomes Some loci appear disomic, others tetrasomic Stage 1: pairing between similar chromosomes allowed Loci near the centromeres can display residual tetrasomy Stage 2: non-homologous chromosome pairing resolved
28 Just how hard can it be to tell what happened? Take four, or maybe eight, decks of 52 playing cards. Shuffle them all together and then throw some cards away. Pick 20 cards at random and drop the rest on the floor. Give the 20 cards to some evolutionary biologists and ask them to figure out what you ve done. Skrabanek and Wolfe, Curr Opin Genet Dev, 1998 Now that the human genome has been sequenced, things aren t quite so bleak.
29 Effects of polyploidization Wolfe, Nature Rev Genet 2001
30 Effects of polyploidization Wolfe, Nature Rev Genet 2001
31 Saccharomyces cerevisiae (baker's yeast)
32 Whole Genome Duplication (WGD) WGD in S. cerevisiae: Wolfe & Shields, Nature 387:708 (1997)
33 Yeast Saccharomyces cerevisiae Degenerate tetraploid Polyploidy followed by extensive deletion and (70-100) reciprocal translocations 8% of genes duplicated in 55 blocks (plus many missed smaller blocks) Relative orientation of genes in blocks conserved with respect to the centromere Seoighe and Wolfe, PNAS, 1998
34 Duplicated blocks in yeast Wolfe and Shields, Nature 1997
35 Estimation of time of polyploidy event Diverged from Kluyveromyces lactis (unduplicated) ~150 MYA Comparison of gene sequences and gene order reveals conservation 59% of adjacent gene pairs in K. lactis or K. marxianus are also adjacent in S. cerevisiae 16% of Kluveromyces neighbors can be explained in terms of inferred ancestral gene order Phylogenetic analyses of duplicated genes where both the Kluveromyces orthologue and an outgroup orthologue were available, deduced that the polyploidization event in S. cerevisiae occurred around 100 MYA
36 5,000 genes 10,000 genes 10% retention 5,500 genes Different subsets retained 5,000 genes Evidence from conserved order of a very few genes Evidence from interleaving genes from sister segments Kellis et al Nature 428: (2004)
37 Each region of K.waltii matches two regions of S.cerevisiae We don t even need any remaining twocopy genes to infer the ancestral order
38 Human Susumu Ohno proposed that vertebrate genomes had originated via genome duplication 2R (two Rounds [of genome duplication]) hypothesis One before, and one after divergence of agnathans (lamprey) from tetrapods (~430 MYA and ~500 MYA) Popular and controversial Split between the map-based people and the treebased people Duplicated regions are evident (covering 44% of the genome), but it is difficult to tell whether it is due to (a) genome duplication(s) or chromosomal duplications
39 Susumu Ohno Book: Evolution by Gene Duplication (1970) Whole-genome duplication (polyploidy)
40 Timing of tetraploidy events Skrabanek, PhD thesis, 1999
41 Evidence? There are regions in the genome which are quadruplicated HSA 1, 6, 9 and 19 (MHC region, 10 gene families) HSA 4, 5, 8 and 10 HSA 2, 7, 12 and 17 (Hox clusters) Expect to see (A,B)(C,D) tree, where the time of divergence of A from B and C from D is approximately the same However, this is not consistently the case
42 Proposed evolution of the Hox clusters Skrabanek, PhD thesis, 1999
43 Possible routes to 4- gene families Hokamp et al, J Struct Func Genomics 2003
44 Estimates of divergence times for genes in the MHC region on chromosomes 1, 6, 9 and 19 Hughes, Mol Biol Evol 1998
45 Hughes et al, Genome Res, 2001 Divergence times of genes with members on at least two of the Hox cluster bearing chromosomes (2, 7, 12, 17)
46 Phylogenetic relationships of four- and threemembered gene families on Hox cluster bearing chromosomes Hughes et al, Genome Res, 2001
47 Conclusions Duplicated regions seen in human genome are most likely vertebrate specific Significant amount of duplication occurring ~ MYA Possible to explain large margin by alloploidy? Probable support for at least one whole genome duplication event Once more genomes are available, such as Ciona intestinalis or amphioxus, it may be easier to decipher the history of duplication However, long time spans under consideration, and diploidization requires extensive genomic changes
48 PLOS Biology 3:e314 (2005)
49 Fish-specific genome duplication event Panopoulou and Poustka, TIG 21:559, 2005
50 Lander et al - Intl. Human Genome Sequencing Consortium paper Nature 409:860 (2001)
51 Hypothetical genomic region 1R decay 2R decay
52 Finding all homologs, and only homologs Find most similar vertebrate gene (here M1) to the Ciona gene. Other vertebrate genes are added to the cluster if they are more similar to M1 than M1 is to the Ciona gene. > S1 S1 < S1 Fugu-specific duplication Duplicates may have arisen by speciation (lineage-splitting) or by gene duplication events specific to one or more vertebrates Human-specific duplication
53
54 Number of gene duplications 46.6% of ancestral chordate genes are duplicated in 1 lineage 34.5% with at least one duplication before Fugu-tetrapod split 23.5% with at least one duplication after Fugu-tetrapod split No evidence of 2R hypothesis from gene family membership alone, nor from phylogenetics (since duplications abundant on every branch)
55 Paralogs generated by a gene duplication before the Fugu-tetrapod split count as matches. N-fold redundancy calculated by identifying all cases where 2 different duplicates are within a 100-gene window, and then counting the number of times that their paralogs occur within a 100-gene window elsewhere in the genome. 2R
56 4-fold redundancy most common - accounts for 25% of the genome
57 1,912 genes duplicated prior to fish-tetrapod split 2,953 paralogous gene pairs 32.4% are found in 386 detectable paralogous segments, comprising 772 individual genomic segments 454 are tetra-paralogons, where overlapping sets fall into 4-fold groups Of the genes that duplicated after the fish-tetrapod split, only 11% are found in paralogous regions (i.e., duplications after the split did not include large segments of the genome)
58 Could be of recent origin, or could have undergone multiple rearrangement events that destroyed the tetra-paralogons signal
59 Old duplications Recent duplications Hox-bearing chromosomes: 50% of genes duplicated after the fish-tetrapod split are tandem duplicates (separated by < 10 genes), whereas only 6% of genes duplicated before the split are tandem duplicates.
60 Conclusions 2R hypothesis most likely scenario Two rounds of closely spaced autotetraploidization events Some paralogous pairs within tetraparalogons extend over longer regions than others So octaploidy unlikely, since pairs of regions would have had to lose the same sets of genes Phylogenetic trees are not consistently nested So allotetraploidy or two distantly spaced autotetraploidy events unlikely Tree topologies within paralogous blocks not always congruent Gene loss and rediploidization processes probably spanned the two duplication events
61 Identification of functional elements Coding sequences Relatively easy to identify Many gene prediction programs available General gene structure known, e.g., TATA boxes Splice donor-acceptor sites ESTs and cdnas available to aid gene prediction Non-coding functional sequences Much harder to identify No common structure of regulatory regions TF binding sites are short and ubiquitous Comparative genomics A genomic sequence that provides a function that is under selection and tends to be conserved between species is called a functional sequence (e.g., a protein-coding region or a transcription factor binding site)
62 Coding region comparison Discover new genes Annotate gene structure Exon-intron structure Compare gene content Find genes common to sets of organisms Find genes unique to an organism We can study how genomes vary within a species to identify genes central to particular processes Can also compare subspecies e.g., E. coli K12 and O157:H7 (pathogenic) Discover gene function Can find missing genes in metabolic pathways
63 Nature 420:520, 2002
64 Predicting structure of new genes Many gene finding programs Look for start sites, termination sites, splice sites Analyze codon usage, exon length Comparative method A Find syntenic regions Predict genes using conventional gene finding techniques in both species Genes predicted in both species are probable
65 Gene prediction Comparative method B Find syntenic regions Perform pattern filtering Coding exons tend to be well conserved Conservation higher in first and second positions of codons Advantage: can deal with sequencing errors
66 Identification of paralogues and orthologues in other species Best Reciprocal Hit (BRH) Mouse Human Reciprocal Hit by Synteny (RHS) Identification of adjacent orthologues Domain checking - internal quality control
67 Identification of orthologues BRH RHS Human Others Orphan Mouse Matches to some other chromosome
68 Genes in mouse shared with other organisms Nature 420:520, 2002
69 Non-coding region comparison Regulatory region discovery No accurate methods to identify regulatory sequences based on scanning the DNA sequence alone Putative TF binding sites found everywhere Low specificity Assume that regulatory regions are conserved because they serve a vital function
70 Rationale DNA sequences encoding and regulating the expression of essential proteins and RNAs will be conserved Consequently, the regulatory profiles of genes involved in similar processes among related species will be conserved Conversely, sequences that encode or control the expression of proteins or RNAs responsible for differences between species will be divergent
71 What species? How distant should the compared species be? Transgenic mice tend to express the transgene in a similar manner to the source mammalian organism, including those for which the ortholog does not exist in mice! Rapidly evolving regulatory regions (e.g., β globin) allow comparisons between closely related species However, some regulatory regions evolve very slowly (e.g., T-cell receptors), so comparisons with more distant species are required
72 Finding regulatory regions Find conserved regions between species Can also find conserved regions between similarly expressed genes Search for TF binding sites in these conserved regions Pennacchio and Rubin, Nature Rev Genet 2001
73 Human vs. mouse Similar gene content and linear organization ~340 syntenic blocks (~150 more than discovered using cytological studies) spanning 90% of the genome Difference in genome size Mouse genome is 14% smaller, probably due to a higher rate of deletion Sequence Conservation ~40% in alignments ~5% under selection ~1.5% protein coding ~3.5% non-coding: untranslated regions, regulatory elements, non-protein-coding genes, and chromosomal structural elements Nature 420:520, 2002
74 Human vs. mouse genomes Nature 420:520, 2002
Whole Genome Alignments and Synteny Maps
Whole Genome Alignments and Synteny Maps IINTRODUCTION It was not until closely related organism genomes have been sequenced that people start to think about aligning genomes and chromosomes instead of
More informationComparative / Evolutionary Genomics
Canestro et al 2003 Genome Biology Comparative / Evolutionary Genomics What processes have shaped metazoan genomes? What genes are responsible for anatomical & physiological differences among metazoan
More informationEvolution by duplication
6.095/6.895 - Computational Biology: Genomes, Networks, Evolution Lecture 18 Nov 10, 2005 Evolution by duplication Somewhere, something went wrong Challenges in Computational Biology 4 Genome Assembly
More informationComputational analyses of ancient polyploidy
Computational analyses of ancient polyploidy Kevin P. Byrne 1 and Guillaume Blanc 2* 1 Department of Genetics, Smurfit Institute, University of Dublin, Trinity College, Dublin 2, Ireland. 2 Laboratoire
More informationEnsembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are:
Comparative genomics and proteomics Species available Ensembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are: Vertebrates: human, chimpanzee, mouse, rat,
More informationI519 Introduction to Bioinformatics, Genome Comparison. Yuzhen Ye School of Informatics & Computing, IUB
I519 Introduction to Bioinformatics, 2015 Genome Comparison Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Whole genome comparison/alignment Build better phylogenies Identify polymorphism
More informationAlignment Strategies for Large Scale Genome Alignments
Alignment Strategies for Large Scale Genome Alignments CSHL Computational Genomics 9 November 2003 Algorithms for Biological Sequence Comparison algorithm value scoring gap time calculated matrix penalty
More informationBMI/CS 776 Lecture #20 Alignment of whole genomes. Colin Dewey (with slides adapted from those by Mark Craven)
BMI/CS 776 Lecture #20 Alignment of whole genomes Colin Dewey (with slides adapted from those by Mark Craven) 2007.03.29 1 Multiple whole genome alignment Input set of whole genome sequences genomes diverged
More informationC3020 Molecular Evolution. Exercises #3: Phylogenetics
C3020 Molecular Evolution Exercises #3: Phylogenetics Consider the following sequences for five taxa 1-5 and the known outgroup O, which has the ancestral states (note that sequence 3 has changed from
More informationBio 1B Lecture Outline (please print and bring along) Fall, 2007
Bio 1B Lecture Outline (please print and bring along) Fall, 2007 B.D. Mishler, Dept. of Integrative Biology 2-6810, bmishler@berkeley.edu Evolution lecture #5 -- Molecular genetics and molecular evolution
More informationAlgorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment
Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot
More informationPhylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata.
Supplementary Note S2 Phylogenetic relationship among S. castellii, S. cerevisiae and C. glabrata. Phylogenetic trees reconstructed by a variety of methods from either single-copy orthologous loci (Class
More informationComparative genomics: Overview & Tools + MUMmer algorithm
Comparative genomics: Overview & Tools + MUMmer algorithm Urmila Kulkarni-Kale Bioinformatics Centre University of Pune, Pune 411 007. urmila@bioinfo.ernet.in Genome sequence: Fact file 1995: The first
More information3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT
3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT.03.239 25.09.2012 SEQUENCE ANALYSIS IS IMPORTANT FOR... Prediction of function Gene finding the process of identifying the regions of genomic DNA that encode
More informationMultiple Whole Genome Alignment
Multiple Whole Genome Alignment BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 206 Anthony Gitter gitter@biostat.wisc.edu These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by
More informationHandling Rearrangements in DNA Sequence Alignment
Handling Rearrangements in DNA Sequence Alignment Maneesh Bhand 12/5/10 1 Introduction Sequence alignment is one of the core problems of bioinformatics, with a broad range of applications such as genome
More informationMultiple Alignment of Genomic Sequences
Ross Metzger June 4, 2004 Biochemistry 218 Multiple Alignment of Genomic Sequences Genomic sequence is currently available from ENTREZ for more than 40 eukaryotic and 157 prokaryotic organisms. As part
More informationComparing Genomes! Homologies and Families! Sequence Alignments!
Comparing Genomes! Homologies and Families! Sequence Alignments! Allows us to achieve a greater understanding of vertebrate evolution! Tells us what is common and what is unique between different species
More informationModule: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment
Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment Introduction to Bioinformatics online course : IBT Jonathan Kayondo Learning Objectives Understand
More information10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison
10-810: Advanced Algorithms and Models for Computational Biology microrna and Whole Genome Comparison Central Dogma: 90s Transcription factors DNA transcription mrna translation Proteins Central Dogma:
More informationGenomes and Their Evolution
Chapter 21 Genomes and Their Evolution PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley with contributions from
More informationApplications of genome alignment
Applications of genome alignment Comparing different genome assemblies Locating genome duplications and conserved segments Gene finding through comparative genomics Analyzing pathogenic bacteria against
More informationBLAST. Varieties of BLAST
BLAST Basic Local Alignment Search Tool (1990) Altschul, Gish, Miller, Myers, & Lipman Uses short-cuts or heuristics to improve search speed Like speed-reading, does not examine every nucleotide of database
More informationDivergence Pattern of Duplicate Genes in Protein-Protein Interactions Follows the Power Law
Divergence Pattern of Duplicate Genes in Protein-Protein Interactions Follows the Power Law Ze Zhang,* Z. W. Luo,* Hirohisa Kishino,à and Mike J. Kearsey *School of Biosciences, University of Birmingham,
More informationMotivating the need for optimal sequence alignments...
1 Motivating the need for optimal sequence alignments... 2 3 Note that this actually combines two objectives of optimal sequence alignments: (i) use the score of the alignment o infer homology; (ii) use
More informationSequence Database Search Techniques I: Blast and PatternHunter tools
Sequence Database Search Techniques I: Blast and PatternHunter tools Zhang Louxin National University of Singapore Outline. Database search 2. BLAST (and filtration technique) 3. PatternHunter (empowered
More informationHow to detect paleoploidy?
Genome duplications (polyploidy) / ancient genome duplications (paleopolyploidy) How to detect paleoploidy? e.g. a diploid cell undergoes failed meiosis, producing diploid gametes, which selffertilize
More informationComparative Genomics. Chapter for Human Genetics - Principles and Approaches - 4 th Edition
Chapter for Human Genetics - Principles and Approaches - 4 th Edition Editors: Friedrich Vogel, Arno Motulsky, Stylianos Antonarakis, and Michael Speicher Comparative Genomics Ross C. Hardison Affiliations:
More informationGenomics and bioinformatics summary. Finding genes -- computer searches
Genomics and bioinformatics summary 1. Gene finding: computer searches, cdnas, ESTs, 2. Microarrays 3. Use BLAST to find homologous sequences 4. Multiple sequence alignments (MSAs) 5. Trees quantify sequence
More informationComparative Bioinformatics Midterm II Fall 2004
Comparative Bioinformatics Midterm II Fall 2004 Objective Answer, part I: For each of the following, select the single best answer or completion of the phrase. (3 points each) 1. Deinococcus radiodurans
More informationA PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS
A PARSIMONY APPROACH TO ANALYSIS OF HUMAN SEGMENTAL DUPLICATIONS CRYSTAL L. KAHN and BENJAMIN J. RAPHAEL Box 1910, Brown University Department of Computer Science & Center for Computational Molecular Biology
More informationMolecular Evolution & the Origin of Variation
Molecular Evolution & the Origin of Variation What Is Molecular Evolution? Molecular evolution differs from phenotypic evolution in that mutations and genetic drift are much more important determinants
More informationMolecular Evolution & the Origin of Variation
Molecular Evolution & the Origin of Variation What Is Molecular Evolution? Molecular evolution differs from phenotypic evolution in that mutations and genetic drift are much more important determinants
More informationComputational methods for predicting protein-protein interactions
Computational methods for predicting protein-protein interactions Tomi Peltola T-61.6070 Special course in bioinformatics I 3.4.2008 Outline Biological background Protein-protein interactions Computational
More informationBiology. Biology. Slide 1 of 26. End Show. Copyright Pearson Prentice Hall
Biology Biology 1 of 26 Fruit fly chromosome 12-5 Gene Regulation Mouse chromosomes Fruit fly embryo Mouse embryo Adult fruit fly Adult mouse 2 of 26 Gene Regulation: An Example Gene Regulation: An Example
More informationSequence analysis and comparison
The aim with sequence identification: Sequence analysis and comparison Marjolein Thunnissen Lund September 2012 Is there any known protein sequence that is homologous to mine? Are there any other species
More informationOrthology Part I: concepts and implications Toni Gabaldón Centre for Genomic Regulation (CRG), Barcelona
Orthology Part I: concepts and implications Toni Gabaldón Centre for Genomic Regulation (CRG), Barcelona (tgabaldon@crg.es) http://gabaldonlab.crg.es Homology the same organ in different animals under
More informationSession 5: Phylogenomics
Session 5: Phylogenomics B.- Phylogeny based orthology assignment REMINDER: Gene tree reconstruction is divided in three steps: homology search, multiple sequence alignment and model selection plus tree
More informationBioinformatics and BLAST
Bioinformatics and BLAST Overview Recap of last time Similarity discussion Algorithms: Needleman-Wunsch Smith-Waterman BLAST Implementation issues and current research Recap from Last Time Genome consists
More informationPractical considerations of working with sequencing data
Practical considerations of working with sequencing data File Types Fastq ->aligner -> reference(genome) coordinates Coordinate files SAM/BAM most complete, contains all of the info in fastq and more!
More informationI519 Introduction to Bioinformatics, Genome Comparison. Yuzhen Ye School of Informatics & Computing, IUB
I519 Introduction to Bioinformatics, 2011 Genome Comparison Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Whole genome comparison/alignment Build better phylogenies Identify polymorphism
More informationOutline. Genome Evolution. Genome. Genome Architecture. Constraints on Genome Evolution. New Evolutionary Synthesis 11/8/16
Genome Evolution Outline 1. What: Patterns of Genome Evolution Carol Eunmi Lee Evolution 410 University of Wisconsin 2. Why? Evolution of Genome Complexity and the interaction between Natural Selection
More informationChromosomal rearrangements in mammalian genomes : characterising the breakpoints. Claire Lemaitre
PhD defense Chromosomal rearrangements in mammalian genomes : characterising the breakpoints Claire Lemaitre Laboratoire de Biométrie et Biologie Évolutive Université Claude Bernard Lyon 1 6 novembre 2008
More informationPairwise & Multiple sequence alignments
Pairwise & Multiple sequence alignments Urmila Kulkarni-Kale Bioinformatics Centre 411 007 urmila@bioinfo.ernet.in Basis for Sequence comparison Theory of evolution: gene sequences have evolved/derived
More informationUoN, CAS, DBSC BIOL102 lecture notes by: Dr. Mustafa A. Mansi. The Phylogenetic Systematics (Phylogeny and Systematics)
- Phylogeny? - Systematics? The Phylogenetic Systematics (Phylogeny and Systematics) - Phylogenetic systematics? Connection between phylogeny and classification. - Phylogenetic systematics informs the
More informationBasic Local Alignment Search Tool
Basic Local Alignment Search Tool Alignments used to uncover homologies between sequences combined with phylogenetic studies o can determine orthologous and paralogous relationships Local Alignment uses
More information1 ATGGGTCTC 2 ATGAGTCTC
We need an optimality criterion to choose a best estimate (tree) Other optimality criteria used to choose a best estimate (tree) Parsimony: begins with the assumption that the simplest hypothesis that
More informationThe Eukaryotic Genome and Its Expression. The Eukaryotic Genome and Its Expression. A. The Eukaryotic Genome. Lecture Series 11
The Eukaryotic Genome and Its Expression Lecture Series 11 The Eukaryotic Genome and Its Expression A. The Eukaryotic Genome B. Repetitive Sequences (rem: teleomeres) C. The Structures of Protein-Coding
More informationTE content correlates positively with genome size
TE content correlates positively with genome size Mb 3000 Genomic DNA 2500 2000 1500 1000 TE DNA Protein-coding DNA 500 0 Feschotte & Pritham 2006 Transposable elements. Variation in gene numbers cannot
More informationOutline. Genome Evolution. Genome. Genome Architecture. Constraints on Genome Evolution. New Evolutionary Synthesis 11/1/18
Genome Evolution Outline 1. What: Patterns of Genome Evolution Carol Eunmi Lee Evolution 410 University of Wisconsin 2. Why? Evolution of Genome Complexity and the interaction between Natural Selection
More informationWhole Genome Alignment. Adam Phillippy University of Maryland, Fall 2012
Whole Genome Alignment Adam Phillippy University of Maryland, Fall 2012 Motivation cancergenome.nih.gov Breast cancer karyotypes www.path.cam.ac.uk Goal of whole-genome alignment } For two genomes, A and
More informationGenome Rearrangements In Man and Mouse. Abhinav Tiwari Department of Bioengineering
Genome Rearrangements In Man and Mouse Abhinav Tiwari Department of Bioengineering Genome Rearrangement Scrambling of the order of the genome during evolution Operations on chromosomes Reversal Translocation
More informationHomolog. Orthologue. Comparative Genomics. Paralog. What is Comparative Genomics. What is Comparative Genomics
Orthologue Orthologs are genes in different species that evolved from a common ancestral gene by speciation. Normally, orthologs retain the same function in the course of evolution. Identification of orthologs
More informationCase Study. Who s the daddy? TEACHER S GUIDE. James Clarkson. Dean Madden [Ed.] Polyploidy in plant evolution. Version 1.1. Royal Botanic Gardens, Kew
TEACHER S GUIDE Case Study Who s the daddy? Polyploidy in plant evolution James Clarkson Royal Botanic Gardens, Kew Dean Madden [Ed.] NCBE, University of Reading Version 1.1 Polypoidy in plant evolution
More informationO 3 O 4 O 5. q 3. q 4. Transition
Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in
More informationPhylogenetic inference
Phylogenetic inference Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, March 7 th 016 After this lecture, you can discuss (dis-) advantages of different information types
More informationGenome-wide analysis of the MYB transcription factor superfamily in soybean
Du et al. BMC Plant Biology 2012, 12:106 RESEARCH ARTICLE Open Access Genome-wide analysis of the MYB transcription factor superfamily in soybean Hai Du 1,2,3, Si-Si Yang 1,2, Zhe Liang 4, Bo-Run Feng
More informationSequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University
Sequence Alignment: A General Overview COMP 571 - Fall 2010 Luay Nakhleh, Rice University Life through Evolution All living organisms are related to each other through evolution This means: any pair of
More informationComputational approaches for functional genomics
Computational approaches for functional genomics Kalin Vetsigian October 31, 2001 The rapidly increasing number of completely sequenced genomes have stimulated the development of new methods for finding
More information3/8/ Complex adaptations. 2. often a novel trait
Chapter 10 Adaptation: from genes to traits p. 302 10.1 Cascades of Genes (p. 304) 1. Complex adaptations A. Coexpressed traits selected for a common function, 2. often a novel trait A. not inherited from
More informationFUNDAMENTALS OF MOLECULAR EVOLUTION
FUNDAMENTALS OF MOLECULAR EVOLUTION Second Edition Dan Graur TELAVIV UNIVERSITY Wen-Hsiung Li UNIVERSITY OF CHICAGO SINAUER ASSOCIATES, INC., Publishers Sunderland, Massachusetts Contents Preface xiii
More information"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky
MOLECULAR PHYLOGENY "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky EVOLUTION - theory that groups of organisms change over time so that descendeants differ structurally
More information17 Non-collinear alignment Motivation A B C A B C A B C A B C D A C. This exposition is based on:
17 Non-collinear alignment This exposition is based on: 1. Darling, A.E., Mau, B., Perna, N.T. (2010) progressivemauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One 5(6):e11147.
More informationBiol478/ August
Biol478/595 29 August # Day Inst. Topic Hwk Reading August 1 M 25 MG Introduction 2 W 27 MG Sequences and Evolution Handouts 3 F 29 MG Sequences and Evolution September M 1 Labor Day 4 W 3 MG Database
More informationBLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010
BLAST Database Searching BME 110: CompBio Tools Todd Lowe April 8, 2010 Admin Reading: Read chapter 7, and the NCBI Blast Guide and tutorial http://www.ncbi.nlm.nih.gov/blast/why.shtml Read Chapter 8 for
More informationStatistical Analyses and Markov Modeling of Duplication in Genome Evolution
Statistical Analyses and Markov Modeling of Duplication in Genome Evolution By Yi (Joey) Zhou A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy
More informationHomology Modeling. Roberto Lins EPFL - summer semester 2005
Homology Modeling Roberto Lins EPFL - summer semester 2005 Disclaimer: course material is mainly taken from: P.E. Bourne & H Weissig, Structural Bioinformatics; C.A. Orengo, D.T. Jones & J.M. Thornton,
More informationIntroduction to Bioinformatics
CSCI8980: Applied Machine Learning in Computational Biology Introduction to Bioinformatics Rui Kuang Department of Computer Science and Engineering University of Minnesota kuang@cs.umn.edu History of Bioinformatics
More informationPhylogenetic Reconstruction of Orthology, Paralogy, and Conserved Synteny for Dog and Human
Phylogenetic Reconstruction of Orthology, Paralogy, and Conserved Synteny for Dog and Human Leo Goodstadt *, Chris P. Ponting Medical Research Council Functional Genetics Unit, University of Oxford, Department
More informationExample of Function Prediction
Find similar genes Example of Function Prediction Suggesting functions of newly identified genes It was known that mutations of NF1 are associated with inherited disease neurofibromatosis 1; but little
More informationUnderstanding relationship between homologous sequences
Molecular Evolution Molecular Evolution How and when were genes and proteins created? How old is a gene? How can we calculate the age of a gene? How did the gene evolve to the present form? What selective
More informationChapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships
Chapter 26: Phylogeny and the Tree of Life You Must Know The taxonomic categories and how they indicate relatedness. How systematics is used to develop phylogenetic trees. How to construct a phylogenetic
More informationIntro Gene regulation Synteny The End. Today. Gene regulation Synteny Good bye!
Today Gene regulation Synteny Good bye! Gene regulation What governs gene transcription? Genes active under different circumstances. Gene regulation What governs gene transcription? Genes active under
More informationGenômica comparativa. João Carlos Setubal IQ-USP outubro /5/2012 J. C. Setubal
Genômica comparativa João Carlos Setubal IQ-USP outubro 2012 11/5/2012 J. C. Setubal 1 Comparative genomics There are currently (out/2012) 2,230 completed sequenced microbial genomes publicly available
More informationCladistics and Bioinformatics Questions 2013
AP Biology Name Cladistics and Bioinformatics Questions 2013 1. The following table shows the percentage similarity in sequences of nucleotides from a homologous gene derived from five different species
More information5/4/05 Biol 473 lecture
5/4/05 Biol 473 lecture animals shown: anomalocaris and hallucigenia 1 The Cambrian Explosion - 550 MYA THE BIG BANG OF ANIMAL EVOLUTION Cambrian explosion was characterized by the sudden and roughly simultaneous
More informationChapter 18 Lecture. Concepts of Genetics. Tenth Edition. Developmental Genetics
Chapter 18 Lecture Concepts of Genetics Tenth Edition Developmental Genetics Chapter Contents 18.1 Differentiated States Develop from Coordinated Programs of Gene Expression 18.2 Evolutionary Conservation
More informationGene Families part 2. Review: Gene Families /727 Lecture 8. Protein family. (Multi)gene family
Review: Gene Families Gene Families part 2 03 327/727 Lecture 8 What is a Case study: ian globin genes Gene trees and how they differ from species trees Homology, orthology, and paralogy Last tuesday 1
More informationA SINE in the genome of the cephalochordate amphioxus is an Alu element
Int. J. Biol. Sci. 2006, 2 61 Research paper International Journal of Biological Sciences ISSN 1449-2288 www.biolsci.org 2006 2(2):61-65 2006 Ivyspring International Publisher. All rights reserved A SINE
More informationDrosophila melanogaster and D. simulans, two fruit fly species that are nearly
Comparative Genomics: Human versus chimpanzee 1. Introduction The chimpanzee is the closest living relative to humans. The two species are nearly identical in DNA sequence (>98% identity), yet vastly different
More informationIntroduction to Evolutionary Concepts
Introduction to Evolutionary Concepts and VMD/MultiSeq - Part I Zaida (Zan) Luthey-Schulten Dept. Chemistry, Beckman Institute, Biophysics, Institute of Genomics Biology, & Physics NIH Workshop 2009 VMD/MultiSeq
More informationMathangi Thiagarajan Rice Genome Annotation Workshop May 23rd, 2007
-2 Transcript Alignment Assembly and Automated Gene Structure Improvements Using PASA-2 Mathangi Thiagarajan mathangi@jcvi.org Rice Genome Annotation Workshop May 23rd, 2007 About PASA PASA is an open
More informationTiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1
Tiffany Samaroo MB&B 452a December 8, 2003 Take Home Final Topic 1 Prior to 1970, protein and DNA sequence alignment was limited to visual comparison. This was a very tedious process; even proteins with
More informationSequence analysis and Genomics
Sequence analysis and Genomics October 12 th November 23 rd 2 PM 5 PM Prof. Peter Stadler Dr. Katja Nowick Katja: group leader TFome and Transcriptome Evolution Bioinformatics group Paul-Flechsig-Institute
More informationDr. Amira A. AL-Hosary
Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological
More informationFitness constraints on horizontal gene transfer
Fitness constraints on horizontal gene transfer Dan I Andersson University of Uppsala, Department of Medical Biochemistry and Microbiology, Uppsala, Sweden GMM 3, 30 Aug--2 Sep, Oslo, Norway Acknowledgements:
More informationIntroduction to Bioinformatics Introduction to Bioinformatics
Dr. rer. nat. Gong Jing Cancer Research Center Medicine School of Shandong University 2012.11.09 1 Chapter 4 Phylogenetic Tree 2 Phylogeny Evidence from morphological ( 形态学的 ), biochemical, and gene sequence
More informationTools and Algorithms in Bioinformatics
Tools and Algorithms in Bioinformatics GCBA815, Fall 2015 Week-4 BLAST Algorithm Continued Multiple Sequence Alignment Babu Guda, Ph.D. Department of Genetics, Cell Biology & Anatomy Bioinformatics and
More informationIntroduction to Sequence Alignment. Manpreet S. Katari
Introduction to Sequence Alignment Manpreet S. Katari 1 Outline 1. Global vs. local approaches to aligning sequences 1. Dot Plots 2. BLAST 1. Dynamic Programming 3. Hash Tables 1. BLAT 4. BWT (Burrow Wheeler
More informationBiochemistry 324 Bioinformatics. Pairwise sequence alignment
Biochemistry 324 Bioinformatics Pairwise sequence alignment How do we compare genes/proteins? When we have sequenced a genome, we try and identify the function of unknown genes by finding a similar gene
More informationInDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9
Lecture 5 Alignment I. Introduction. For sequence data, the process of generating an alignment establishes positional homologies; that is, alignment provides the identification of homologous phylogenetic
More informationImpact of recurrent gene duplication on adaptation of plant genomes
Impact of recurrent gene duplication on adaptation of plant genomes Iris Fischer, Jacques Dainat, Vincent Ranwez, Sylvain Glémin, Jacques David, Jean-François Dufayard, Nathalie Chantret Plant Genomes
More informationEukaryotic vs. Prokaryotic genes
BIO 5099: Molecular Biology for Computer Scientists (et al) Lecture 18: Eukaryotic genes http://compbio.uchsc.edu/hunter/bio5099 Larry.Hunter@uchsc.edu Eukaryotic vs. Prokaryotic genes Like in prokaryotes,
More informationBrowsing Genomic Information with Ensembl Plants
Browsing Genomic Information with Ensembl Plants Etienne de Villiers, PhD (Adapted from slides by Bert Overduin EMBL-EBI) Outline of workshop Brief introduction to Ensembl Plants History Content Tutorial
More information8/23/2014. Phylogeny and the Tree of Life
Phylogeny and the Tree of Life Chapter 26 Objectives Explain the following characteristics of the Linnaean system of classification: a. binomial nomenclature b. hierarchical classification List the major
More informationAn Introduction to Sequence Similarity ( Homology ) Searching
An Introduction to Sequence Similarity ( Homology ) Searching Gary D. Stormo 1 UNIT 3.1 1 Washington University, School of Medicine, St. Louis, Missouri ABSTRACT Homologous sequences usually have the same,
More informationSmall RNA in rice genome
Vol. 45 No. 5 SCIENCE IN CHINA (Series C) October 2002 Small RNA in rice genome WANG Kai ( 1, ZHU Xiaopeng ( 2, ZHONG Lan ( 1,3 & CHEN Runsheng ( 1,2 1. Beijing Genomics Institute/Center of Genomics and
More informationAnnotation and Nomenclature: A Zebrafish Example. Ingo Braasch, Julian Catchen and John Postlethwait
Annotation and Nomenclature: A Zebrafish Example Ingo Braasch, Julian Catchen and John Postlethwait Annotation and Nomenclature: An Example: Zebrafish The goal Solutions Annotation and Nomenclature: An
More informationTitle slide (1) Tree of life 1891 Ernst Haeckel, Title on left
MDIBL talk July 14, 2005 The Evolution of Cytochrome P450 in animals. Title slide (1) Tree of life 1891 Ernst Haeckel, Title on left My opening slide is a collage (2) containing 35 eukaryotic species with
More informationEukaryotic Gene Expression
Eukaryotic Gene Expression Lectures 22-23 Several Features Distinguish Eukaryotic Processes From Mechanisms in Bacteria 123 Eukaryotic Gene Expression Several Features Distinguish Eukaryotic Processes
More information