Repetitive sequences analysis

1 Repetitive sequences analysis. Érica Ramos. Repetitive elements characterization (Martins et al., 2010).

2 Repetitive elements characterization (Martins et al.)
Identical or similar sequences, which can be in tandem or dispersed throughout the genome.
Repetitive element: Description
Multigene families: Group of genes that descend from a common ancestral gene and therefore have similar functions and similar sequences.
Satellite (SatDNA): Highly repeated sequences, units bp, vary in structure, location and quantity. HC marker.
Minisatellite (VNTR): Moderately repeated, units bp, variable number of repeats (markers).
Microsatellite (SSR): Short, highly polymorphic repeats, units 1-6 bp (markers).
Transposable elements (TEs): Mobile repeated sequences, able to transpose in the genome.
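The microsatellite definition above (tandem repeats with units of 1-6 bp) can be illustrated with a regular expression. This is a minimal sketch, not a tool from the slides: the function name `find_microsatellites` and the `min_repeats` threshold are assumptions chosen for illustration, and only perfect (mismatch-free) repeats are reported.

```python
import re

def find_microsatellites(seq, min_repeats=4):
    """Return (start, unit, copies) for perfect tandem repeats with 1-6 bp units.

    min_repeats is a hypothetical cutoff; real SSR finders use unit-specific
    thresholds and tolerate imperfect repeats.
    """
    # A lazily matched 1-6 bp unit followed by at least (min_repeats - 1) more copies.
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits

hits = find_microsatellites("GGCACACACACATT")
print(hits)
```

Because the unit is matched lazily, the shortest repeating unit is preferred (for example `CA` rather than `CACA`).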

3 General structure of TEs (Martins et al.)

4 Martins et al. Repetitive landscape in genome. de Koning APJ, Gu W, Castoe TA, Batzer MA, et al. (2011) Repetitive Elements May Comprise Over Two-Thirds of the Human Genome. PLoS Genet 7(12): e doi: /journal.pgen

5 IDENTIFYING REPETITIVE ELEMENTS: GENOMIC TOOLS
Next-generation data issues
Assembly problems (Martins et al.)

6 Biological issues
Repetitive sequences are poorly conserved.
They can be truncated.
They undergo many transposition and duplication events.
Identification of repetitive elements
Three main principles: homology, structure, de novo.

7 Homology search
Similarity and identity with described elements (from phylogenetically related species).
I) RepeatMasker
Input: repeats library and assembled reads.
RepeatMasker: alignment (cross_match/WU-BLAST), then sequence masking. Masking is important for gene/EST annotation.
Output: CSV or PNG file.
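The masking step can be sketched as follows. This is not RepeatMasker's implementation, only an illustration of what masking a sequence means once alignment hits against a repeat library are known. The function name `mask_sequence` and the hit format (0-based, end-exclusive intervals) are assumptions.

```python
def mask_sequence(seq, hits, soft=True):
    """Mask repeat hits in a sequence.

    hits: iterable of (start, end) intervals, 0-based and end-exclusive
    (a hypothetical format chosen for this sketch).
    soft=True lowercases masked bases; soft=False replaces them with 'N'.
    """
    chars = list(seq)
    for start, end in hits:
        # Clamp each interval to the sequence bounds before masking.
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = chars[i].lower() if soft else "N"
    return "".join(chars)

print(mask_sequence("ACGTACGTAC", [(2, 5)]))              # soft masking
print(mask_sequence("ACGTACGTAC", [(2, 5)], soft=False))  # hard masking
```

Soft masking keeps the original bases available to downstream tools, while hard masking hides them entirely; which one suits gene/EST annotation depends on the annotation tool.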

9 II) BLAST
Basic Local Alignment Search Tool. Designed for searches in large databases.
Structure search
Element signatures: common structures for some classes of elements (Wicker et al. 2007).
LTRfinder
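One structural signature is the pair of direct repeats that flank LTR retrotransposons. The sketch below searches for two identical short words separated by a bounded gap. It is a toy illustration of signature-based search, not the LTRfinder algorithm; the function name and the parameters `repeat_len`, `min_gap` and `max_gap` are assumptions, and real LTRs are far longer and only approximately identical.

```python
def find_direct_repeats(seq, repeat_len=6, min_gap=5, max_gap=50):
    """Return (left_start, right_start, word) for exact direct repeats.

    The two copies of `word` must be separated by min_gap..max_gap bases.
    All thresholds are hypothetical values for illustration.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - repeat_len + 1):
        left = seq[i:i + repeat_len]
        lo = i + repeat_len + min_gap                       # first allowed right-copy start
        hi = min(i + repeat_len + max_gap, len(seq) - repeat_len)
        for j in range(lo, hi + 1):
            if seq[j:j + repeat_len] == left:
                hits.append((i, j, left))
    return hits

example = "TT" + "ACGTGA" + "CCCCCCCC" + "ACGTGA" + "GG"
print(find_direct_repeats(example))
```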

10 De novo search
I) Annotating assemblies: k-mer approach, sequence self-comparison, periodicity approach.
II) Clustering reads (RepeatExplorer).
I) Annotating assemblies: k-mer approach
Scan for overrepresented oligos (short k-mers), allowing some mismatches.
Oligo size and number of mismatches depend on the repetitive element and the genome.
Tools: REPuter, Vmatch, Repeat-match.
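The k-mer idea can be sketched by counting every k-mer in a sequence and keeping those seen at least a given number of times. This is a simplified illustration that counts exact matches only (the mismatch tolerance mentioned on the slide is left out); the function name and the defaults `k=4` and `min_count=3` are assumptions, not settings of REPuter, Vmatch or Repeat-match.

```python
from collections import Counter

def overrepresented_kmers(seq, k=4, min_count=3):
    """Return {kmer: count} for k-mers occurring at least min_count times.

    Exact matching only; k and min_count are hypothetical values that would
    be tuned to the repeat family and genome in practice.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n >= min_count}

print(overrepresented_kmers("ACGTACGTACGTTTGA"))
```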

11 I) Annotating assemblies: sequence self-comparison
Similarity search, clustering of hits, consensus element.
I) Annotating assemblies: periodicity approach
Sliding-window analysis of the assembly, searching for periodicity.
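A sliding-window periodicity scan can be sketched as follows: for each window, every candidate period p is scored by the fraction of positions where base i equals base i + p, and windows with a high-scoring period are reported. The function names, window size, step and score cutoff are assumptions for illustration, not parameters of any tool named in the slides.

```python
def best_period(window, max_period=10):
    """Return (period, score) with the highest fraction of self-matches."""
    best, best_score = None, 0.0
    for p in range(1, min(max_period, len(window) - 1) + 1):
        compared = len(window) - p
        matches = sum(window[i] == window[i + p] for i in range(compared))
        score = matches / compared
        if score > best_score:  # strict '>' keeps the shortest period on ties
            best, best_score = p, score
    return best, best_score

def scan_periodicity(seq, window=20, step=10, min_score=0.9):
    """Slide a window along seq and report (start, period, score) hits."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        period, score = best_period(seq[start:start + window])
        if period is not None and score >= min_score:
            hits.append((start, period, score))
    return hits

print(scan_periodicity("ACG" * 10))
```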

12 II) Clustering reads: RepeatExplorer
Graph-based clustering (Novák et al. 2010)
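The graph-based idea can be sketched with reads as nodes, an edge between two reads whenever they are similar, and clusters taken as connected components. This is a deliberate simplification: sharing one exact k-mer stands in for the read-overlap similarity, and connected components stand in for the clustering step, so it should not be read as the RepeatExplorer method. All names and the `k=8` default are assumptions.

```python
from itertools import combinations

def shares_kmer(a, b, k=8):
    """True if reads a and b have at least one exact k-mer in common."""
    kmers = {a[i:i + k] for i in range(len(a) - k + 1)}
    return any(b[i:i + k] in kmers for i in range(len(b) - k + 1))

def cluster_reads(reads, k=8):
    """Group read indices into connected components of the similarity graph."""
    parent = list(range(len(reads)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(len(reads)), 2):
        if shares_kmer(reads[i], reads[j], k):
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(reads)):
        clusters.setdefault(find(i), []).append(i)
    # Largest clusters first: abundant repeats yield the biggest components.
    return sorted(clusters.values(), key=len, reverse=True)

reads = ["AAAACCCCGGGG", "CCCCGGGGTTTT", "GGGGTTTTAAAA", "ACGTACGTAGCT"]
print(cluster_reads(reads))
```

The all-against-all comparison is quadratic in the number of reads, so this only suits toy inputs.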

13 Comparison table
Methods compared: Homology; Structure based; De novo; Clustering reads.
Advantages listed: Fast; Detect even low copy number; Detect new elements; Flexible; Low coverage data; No computational knowledge needed; Use reads directly.
Disadvantages listed: [...] to described sequences; Machine learning problems; [...] to described structure; Only detect high number of copies; Do not distinguish TEs; Must be manually curated; Public server is slow.
Automated annotation
A pipeline with combined methods generates a consensus annotation: JigSaw (requires training), EVidenceModeler (user must set expected errors), Evigan (unsupervised learning method).

14 References
Bergman, C. M.; Quesneville, H. Discovering and detecting transposable elements in genome sequences. Briefings in Bioinformatics (2007).
Janicki, M.; Rooke, R.; Yang, G. Bioinformatics and genomic analysis of transposable elements in eukaryotic genomes. Chromosome Research (2011).
de Koning, A. P. J.; Gu, W.; Castoe, T. A.; Batzer, M. A.; Pollock, D. D. Repetitive Elements May Comprise Over Two-Thirds of the Human Genome. PLoS Genetics (2011).
Lerat, E. Identifying repeats and transposable elements in sequenced genomes: how to find your way through the dense forest of programs. Heredity (2010).
Makałowski, W.; Pande, A.; Gotea, V.; Makałowska, I. Transposable Elements and Their Identification. Evolutionary Genomics: Statistical and Computational Methods (2012).
Martins, C.; Cabral-de-Mello, D. C.; Valente, G. T.; Mazzuchelli, J.; Oliveira, S. G. Cytogenetic mapping and its contribution to the knowledge of animal genomes. In Genetic Mapping. Columbus F (Ed.). Nova Science Publisher, Hauppauge, NY, USA (2010).
Novák, P.; Neumann, P.; Pech, J.; Macas, J. RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads.
Novák, P.; Neumann, P.; Macas, J. Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data. BMC Bioinformatics (2010).
Wicker, T.; Sabot, F.; Hua-Van, A.; Bennetzen, J. L.; Capy, P.; Chalhoub, B.; Flavell, A.; Leroy, P.; Morgante, M.; Panaud, O.; Paux, E.; SanMiguel, P.; Schulman, A. H. A unified classification system for eukaryotic transposable elements. Nature Reviews Genetics (2007).
Yandell, M.; Ence, D. A beginner's guide to eukaryotic genome annotation. Nature Reviews Genetics (2012).


More information

Computational Genetics Winter 2013 Lecture 10. Eleazar Eskin University of California, Los Angeles

Computational Genetics Winter 2013 Lecture 10. Eleazar Eskin University of California, Los Angeles Computational Genetics Winter 2013 Lecture 10 Eleazar Eskin University of California, Los ngeles Pair End Sequencing Lecture 10. February 20th, 2013 (Slides from Ben Raphael) Chromosome Painting: Normal

More information

Assigning Taxonomy to Marker Genes. Susan Huse Brown University August 7, 2014

Assigning Taxonomy to Marker Genes. Susan Huse Brown University August 7, 2014 Assigning Taxonomy to Marker Genes Susan Huse Brown University August 7, 2014 In a nutshell Taxonomy is assigned by comparing your DNA sequences against a database of DNA sequences from known taxa Marker

More information

Evolution (Chapters 15 & 16)

Evolution (Chapters 15 & 16) Evolution (Chapters 15 & 16) Before You Read... Use the What I Know column to list the things you know about evolution. Then list the questions you have about evolution in the What I Want to Find Out column.

More information

CONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018

CONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018 CONCEPT OF SEQUENCE COMPARISON Natapol Pornputtapong 18 January 2018 SEQUENCE ANALYSIS - A ROSETTA STONE OF LIFE Sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of

More information

The nature of genomes. Viral genomes. Prokaryotic genome. Nonliving particle. DNA or RNA. Compact genomes with little spacer DNA

The nature of genomes. Viral genomes. Prokaryotic genome. Nonliving particle. DNA or RNA. Compact genomes with little spacer DNA The nature of genomes Genomics: study of structure and function of genomes Genome size variable, by orders of magnitude number of genes roughly proportional to genome size Plasmids symbiotic DNA molecules,

More information

Graduate Funding Information Center

Graduate Funding Information Center Graduate Funding Information Center UNC-Chapel Hill, The Graduate School Graduate Student Proposal Sponsor: Program Title: NESCent Graduate Fellowship Department: Biology Funding Type: Fellowship Year:

More information

Investigation 3: Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST

Investigation 3: Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST Investigation 3: Comparing DNA Sequences to Understand Evolutionary Relationships with BLAST Introduction Bioinformatics is a powerful tool which can be used to determine evolutionary relationships and

More information

Typical Life Cycle of Algae and Fungi. 5 Copyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Typical Life Cycle of Algae and Fungi. 5 Copyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Module 3B Meiosis and Sexual Life Cycles In this module, we will examine a second type of cell division used by eukaryotic cells called meiosis. In addition, we will see how the 2 types of eukaryotic cell

More information

Annotation of Drosophila grimashawi Contig12

Annotation of Drosophila grimashawi Contig12 Annotation of Drosophila grimashawi Contig12 Marshall Strother April 27, 2009 Contents 1 Overview 3 2 Genes 3 2.1 Genscan Feature 12.4............................................. 3 2.1.1 Genome Browser:

More information

Supplementary Information for: The genome of the extremophile crucifer Thellungiella parvula

Supplementary Information for: The genome of the extremophile crucifer Thellungiella parvula Supplementary Information for: The genome of the extremophile crucifer Thellungiella parvula Maheshi Dassanayake 1,9, Dong-Ha Oh 1,9, Jeffrey S. Haas 1,2, Alvaro Hernandez 3, Hyewon Hong 1,4, Shahjahan

More information

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships Chapter 26: Phylogeny and the Tree of Life You Must Know The taxonomic categories and how they indicate relatedness. How systematics is used to develop phylogenetic trees. How to construct a phylogenetic

More information

G4120: Introduction to Computational Biology

G4120: Introduction to Computational Biology ICB Fall 2009 G4120: Introduction to Computational Biology Oliver Jovanovic, Ph.D. Columbia University Department of Microbiology & Immunology Copyright 2008 Oliver Jovanovic, All Rights Reserved. Genome

More information

1. CHEMISTRY OF LIFE. Tutorial Outline

1. CHEMISTRY OF LIFE. Tutorial Outline Tutorial Outline North Carolina Tutorials are designed specifically for the Common Core State Standards for English language arts, the North Carolina Standard Course of Study for Math, and the North Carolina

More information

Taxonomy. Content. How to determine & classify a species. Phylogeny and evolution

Taxonomy. Content. How to determine & classify a species. Phylogeny and evolution Taxonomy Content Why Taxonomy? How to determine & classify a species Domains versus Kingdoms Phylogeny and evolution Why Taxonomy? Classification Arrangement in groups or taxa (taxon = group) Nomenclature

More information

Synteny Portal Documentation

Synteny Portal Documentation Synteny Portal Documentation Synteny Portal is a web application portal for visualizing, browsing, searching and building synteny blocks. Synteny Portal provides four main web applications: SynCircos,

More information

Tandem repeat 16,225 20,284. 0kb 5kb 10kb 15kb 20kb 25kb 30kb 35kb

Tandem repeat 16,225 20,284. 0kb 5kb 10kb 15kb 20kb 25kb 30kb 35kb Overview Fosmid XAAA112 consists of 34,783 nucleotides. Blat results indicate that this fosmid has significant identity to the 2R chromosome of D.melanogaster. Evidence suggests that fosmid XAAA112 contains

More information

Practical considerations of working with sequencing data

Practical considerations of working with sequencing data Practical considerations of working with sequencing data File Types Fastq ->aligner -> reference(genome) coordinates Coordinate files SAM/BAM most complete, contains all of the info in fastq and more!

More information

Lecture 7 Sequence analysis. Hidden Markov Models

Lecture 7 Sequence analysis. Hidden Markov Models Lecture 7 Sequence analysis. Hidden Markov Models Nicolas Lartillot may 2012 Nicolas Lartillot (Universite de Montréal) BIN6009 may 2012 1 / 60 1 Motivation 2 Examples of Hidden Markov models 3 Hidden

More information

Genome Rearrangements In Man and Mouse. Abhinav Tiwari Department of Bioengineering

Genome Rearrangements In Man and Mouse. Abhinav Tiwari Department of Bioengineering Genome Rearrangements In Man and Mouse Abhinav Tiwari Department of Bioengineering Genome Rearrangement Scrambling of the order of the genome during evolution Operations on chromosomes Reversal Translocation

More information

Jeremy Chang Identifying protein protein interactions with statistical coupling analysis

Jeremy Chang Identifying protein protein interactions with statistical coupling analysis Jeremy Chang Identifying protein protein interactions with statistical coupling analysis Abstract: We used an algorithm known as statistical coupling analysis (SCA) 1 to create a set of features for building

More information