Multi-Assembly Problems for RNA Transcripts
1 Multi-Assembly Problems for RNA Transcripts. Alexandru Tomescu, Department of Computer Science, University of Helsinki. Joint work with Veli Mäkinen, Anna Kuosmanen, Romeo Rizzi, Travis Gagie, Alex Popa. CiE, July 3.
2 CENTRAL DOGMA OF MOLECULAR BIOLOGY DNA (gene with introns and exons) → transcription → pre-mRNA → alternative splicing → mature mRNA transcripts → translation → proteins.
3 RNA-SEQUENCING [Figure labels: DNA methylation, gene, mature mRNA transcripts, RNA sequencing, proteins.] Problem: assemble the RNA transcripts from the RNA-Seq reads and quantify their expression levels.
4 MULTI-ASSEMBLY Assembly of fragments from different, but related, sequences: transcriptomics (RNA-Seq), viral quasi-species, metagenomics. Assumptions: an existing reference is available (genome-guided multi-assembly); no existing annotation is assumed.
6 SPLICING GRAPHS Splicing graphs: exons → nodes; reads overlapping two exons → arcs; plus coverage information. Existing reference ⇒ directed acyclic graphs (DAGs).
7 OVERLAP GRAPHS Overlap graphs: reads → nodes; overlaps → arcs; plus coverage information. Existing reference ⇒ directed acyclic graphs (DAGs).
8 OUTLINE OF THE TALK Three problem formulations: 1. Assembly only. 2. Simultaneous assembly and estimation of expression levels. 3. Assembly only, with long reads or paired-end reads.
10 ASSEMBLY: MINIMUM PATH COVER (MPC) What is the minimum number of paths required to cover all nodes of a DAG? RNA-Seq: Cufflinks 2010, CLASS 2012, BRANCH 2013. Viral quasi-species: ShoRAH.
13 ASSEMBLY: MINIMUM PATH COVER (MPC)
On general directed graphs the problem is NP-hard (one path suffices iff G has a Hamiltonian path).
On DAGs it is solvable in polynomial time: Dilworth's theorem; Fulkerson's constructive proof (1956).
By a maximum matching algorithm, it is solvable in time $O(t(G)\sqrt{n})$.
The weighted version can be solved in time $O(n^2 \log n + t(G)\,n)$,
where $t(G)$, with $m \le t(G) \le n^2$, is the number of arcs in the transitive closure of G.
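The matching-based approach on this slide can be illustrated with a minimal Python sketch. It is not the speaker's implementation: it assumes nodes are numbered 0..n-1, builds the transitive closure by depth-first search, and uses a simple augmenting-path bipartite matching rather than a faster matching algorithm. The function names `transitive_closure` and `min_path_cover_size` are made up for this example. The size of a minimum path cover (paths may share nodes) is then n minus the size of a maximum matching in the closure.

```python
from collections import defaultdict


def transitive_closure(n, arcs):
    """Map each node u to the set of nodes reachable from u by a nonempty path.

    Assumes the input graph is a DAG on nodes 0..n-1.
    """
    adj = defaultdict(list)
    for u, v in arcs:
        adj[u].append(v)
    reach = {}

    def visit(u):
        if u in reach:
            return reach[u]
        out = set()
        for v in adj[u]:
            out.add(v)
            out |= visit(v)
        reach[u] = out
        return out

    for u in range(n):
        visit(u)
    return reach


def min_path_cover_size(n, arcs):
    """Minimum number of paths covering all nodes of a DAG (paths may overlap)."""
    reach = transitive_closure(n, arcs)
    match_right = {}  # right copy of a node -> left copy it is matched to

    def try_augment(u, seen):
        # Look for an augmenting path starting at the left copy of u.
        for v in reach[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in match_right or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    matching = sum(try_augment(u, set()) for u in range(n))
    return n - matching


# Two sources (0, 1) meet in node 2 and split again into two sinks (3, 4).
x_shaped = [(0, 2), (1, 2), (2, 3), (2, 4)]
print(min_path_cover_size(5, x_shaped))
```

On the X-shaped example two paths suffice (for instance 0-2-3 and 1-2-4) because both may pass through node 2; this is why the matching is computed on the transitive closure rather than on the original arcs.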
14 MIN-COST MPC VIA MIN-COST FLOWS Unweighted case: MPC via min-flows, e.g. [Pijls, Potharst, 2013]. Weighted case: MPC via min-cost flows.
17 MPC VIA MIN-COST FLOWS This min-cost flow problem can be solved in time $O(n^2 \log n + nm)$ by [Gabow and Tarjan, 1991], as observed in [Rizzi, T., Mäkinen, 2014]. This is better than $O(n^2 \log n + n\,t(G))$, since $m \le t(G) \le n^2$: as soon as there is a path of length $\Theta(n)$, we have $t(G) = \Theta(n^2)$.
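The gap between m and t(G) can be checked on a small example. The sketch below is only an illustration: it assumes nodes are numbered in topological order (every arc goes from a smaller to a larger index) and stores reachability sets as integer bitmasks. The helper name `closure_arc_count` is hypothetical. For a single path on n nodes, m = n - 1 while t(G) = n(n - 1)/2.

```python
def closure_arc_count(n, arcs):
    """Number of arcs t(G) in the transitive closure of a DAG.

    Assumes every arc (u, v) satisfies u < v, so that processing nodes in
    decreasing order visits each node after all of its successors.
    """
    out = [[] for _ in range(n)]
    for u, v in arcs:
        out[u].append(v)
    reach = [0] * n  # reach[u] is a bitmask of the nodes reachable from u
    for u in reversed(range(n)):
        for v in out[u]:
            reach[u] |= (1 << v) | reach[v]
    return sum(bin(mask).count("1") for mask in reach)


n = 6
path_arcs = [(i, i + 1) for i in range(n - 1)]
print(len(path_arcs), closure_arc_count(n, path_arcs))  # m versus t(G)
```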
19 ASSEMBLY AND ESTIMATION OF EXPRESSION LEVELS
INPUT: An arc-weighted DAG G, a superset S of the sources, and a superset T of the sinks.
TASK: Find a collection of paths $P_1, \dots, P_k$ in G, and their expression levels $e_1, \dots, e_k$, such that every $P_i$ starts in S and ends in T, and the following cost is minimized:
$$\sum_{(x,y) \in E} \Bigl|\, w(x,y) - \sum_{j \,:\, (x,y) \in P_j} e_j \Bigr|$$
Variants for RNA-Seq in: IsoInfer 2010, IsoLasso 2011, CLIIQ 2012, FlipFlop.
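To make the objective concrete, the following Python sketch evaluates the cost of a given set of paths and expression levels. It assumes the cost is the sum over arcs of the absolute difference between the arc weight and the total expression of the paths using that arc (the formula is garbled in the source, so the absolute value is an assumption). The names `decomposition_cost`, `weights`, `paths` and `levels` are illustrative only.

```python
def decomposition_cost(weights, paths, levels):
    """Sum over arcs of |w(x, y) - total expression of the paths through (x, y)|.

    weights: dict mapping arcs (x, y) to observed weights
    paths:   list of node sequences, each a path in the graph
    levels:  list of expression levels, one per path
    The absolute-difference form of the cost is an assumption of this sketch.
    """
    explained = {arc: 0 for arc in weights}
    for path, level in zip(paths, levels):
        for arc in zip(path, path[1:]):
            if arc not in weights:
                raise ValueError(f"path uses an arc not in the graph: {arc}")
            explained[arc] += level
    return sum(abs(weights[arc] - explained[arc]) for arc in weights)


weights = {("a", "b"): 5, ("b", "c"): 3, ("b", "d"): 3}
paths = [["a", "b", "c"], ["a", "b", "d"]]
levels = [3, 2]
print(decomposition_cost(weights, paths, levels))
```

In this toy instance arc (a, b) is explained exactly (3 + 2 = 5), arc (b, c) is explained exactly, and arc (b, d) is under-explained by 1, so the cost is 1.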
20 ASSEMBLY AND ESTIMATION OF EXPRESSION LEVELS [Figure: example on nodes a to h, shown twice; the resulting cost value is missing from the source.]
21 ASSEMBLY AND ESTIMATION OF EXPRESSION LEVELS Previous solutions are based on enumeration of all paths (plus ILP). The problem is solvable in polynomial time by min-cost flows [T., Kuosmanen, Rizzi, Mäkinen, 2013]. If the number k of paths is given in the input, then it is NP-hard, but solvable in time $O(W^k\,\mathrm{aw}(G)^k\,n^2)$ [T., Gagie, Popa, Rizzi, Kuosmanen, Mäkinen, 2015].
23 ASSEMBLY WITH LONG READS [Figures only.]
26 MIN-COST MPC WITH SUBPATH CONSTRAINTS
INPUT: An arc-weighted DAG G, and
1. a superset S of the sources, and a superset T of the sinks;
2. a family $\mathcal{P}^{in} = \{P^{in}_1, \dots, P^{in}_c\}$ of directed paths in G.
TASK: Find a minimum number k of directed paths $P^{sol}_1, \dots, P^{sol}_k$ in G such that
1. every node in V(G) occurs in some $P^{sol}_i$;
2. every path $P \in \mathcal{P}^{in}$ is a subpath of some $P^{sol}_i$;
3. every path $P^{sol}_i$ starts in S and ends in T;
4. $\sum_{i=1}^{k} \sum_{\text{edge } e \in P^{sol}_i} w(e)$ is minimum among all such k paths.
Introduced by [Bao, Jiang, Girke, 2013, BRANCH], but the case of overlapping constraints was not solved.
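The conditions of the task can be read as a checklist. The Python sketch below only verifies conditions 1 to 3 for a candidate solution and computes the quantity in condition 4; it does not search for an optimal solution. All names (`is_subpath`, `is_valid_cover`, `cover_cost`) and the toy graph are assumptions made for this illustration.

```python
def is_subpath(p, q):
    """True if node sequence p occurs as a contiguous run inside q."""
    p, q = list(p), list(q)
    k = len(p)
    return any(q[i:i + k] == p for i in range(len(q) - k + 1))


def is_valid_cover(nodes, arcs, S, T, constraints, solution):
    """Check conditions 1-3 of the task for a list of nonempty solution paths."""
    arc_set = set(arcs)
    for path in solution:
        if path[0] not in S or path[-1] not in T:
            return False  # condition 3: start in S, end in T
        if any(arc not in arc_set for arc in zip(path, path[1:])):
            return False  # not a path of G
    covered = {v for path in solution for v in path}
    if covered != set(nodes):
        return False  # condition 1: every node is covered
    # condition 2: every constraint is a subpath of some solution path
    return all(any(is_subpath(c, p) for p in solution) for c in constraints)


def cover_cost(weights, solution):
    """Condition 4: total weight of the arcs of all solution paths."""
    return sum(weights[arc] for path in solution for arc in zip(path, path[1:]))


nodes = ["a", "b", "c", "d", "e"]
arcs = [("a", "c"), ("b", "c"), ("c", "d"), ("c", "e")]
constraints = [["a", "c", "e"]]
good = [["a", "c", "e"], ["b", "c", "d"]]
bad = [["a", "c", "d"], ["b", "c", "e"]]
print(is_valid_cover(nodes, arcs, {"a", "b"}, {"d", "e"}, constraints, good))
print(is_valid_cover(nodes, arcs, {"a", "b"}, {"d", "e"}, constraints, bad))
```

Both candidate solutions cover every node with two paths, but only the first one contains the constraint a, c, e as a subpath.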
27 MIN-COST MPC WITH SUBPATH CONSTRAINTS [Figure: example graph with nodes s and t.]
28 MIN-COST MPC WITH SUBPATH CONSTRAINTS Subpath constraints as arc demands. [Figure not recoverable.]
29 MIN-COST MPC WITH SUBPATH CONSTRAINTS Problem 1: a constraint P is included in another constraint Q. Solution: remove P. This can be implemented in time O(N) with a suffix tree for large alphabets [Farach, 1997], where N = sum of the lengths of the subpath constraints.
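The removal of included constraints can be sketched naively in Python. This is a quadratic-time illustration of the idea, not the O(N) suffix-tree method cited on the slide; the function names are hypothetical and constraints are assumed to be given as sequences of node labels.

```python
def is_subpath(p, q):
    """True if node sequence p occurs as a contiguous run inside q."""
    k = len(p)
    return any(q[i:i + k] == p for i in range(len(q) - k + 1))


def remove_contained(constraints):
    """Drop duplicate constraints and constraints included in another one.

    Naive pairwise comparison; the slide's O(N) bound needs a suffix tree.
    """
    # Deduplicate while keeping the original order.
    unique = list(dict.fromkeys(tuple(c) for c in constraints))
    return [p for p in unique
            if not any(p != q and is_subpath(p, q) for q in unique)]


constraints = [["a", "b", "c"], ["b", "c"], ["c", "d"], ["a", "b", "c"]]
print(remove_contained(constraints))
```

Here the constraint b, c is dropped because it is included in a, b, c, and the repeated copy of a, b, c is kept only once.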
30 MIN-COST MPC WITH SUBPATH CONSTRAINTS Problem 2: suffix-prefix overlaps. Solution: iteratively merge constraints with the longest suffix-prefix overlap. All suffix-prefix overlaps can be found in optimal time O(N + overlaps) by [Gusfield, Landau and Schieber, 1992]. Our iterative merging also takes O(N + overlaps) time.
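A single merge step can be sketched as follows. The code is a naive illustration under stated assumptions: constraints are tuples of node labels, the overlap is found by trying all lengths, and the order in which the talk's algorithm performs merges is not given in the source and is not modelled here. The names `longest_overlap` and `merge` are made up for this example.

```python
def longest_overlap(p, q):
    """Length of the longest suffix of p that equals a prefix of q (0 if none)."""
    for k in range(min(len(p), len(q)), 0, -1):
        if p[-k:] == q[:k]:
            return k
    return 0


def merge(p, q):
    """Merge p and q along their longest suffix-prefix overlap.

    Returns None when the two constraints do not overlap.
    """
    k = longest_overlap(p, q)
    if k == 0:
        return None
    return p + q[k:]


p = ("a", "b", "c", "d")
q = ("c", "d", "e")
print(longest_overlap(p, q), merge(p, q))
```

In a DAG no node repeats on a path, so any single path containing both p and q above must contain the merged sequence a, b, c, d, e as a subpath.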
31 MIN-COST MPC WITH SUBPATH CONSTRAINTS The pre-processing phase takes O(N + overlaps) time. The flow network has size O(n) nodes and O(m + c) arcs. Min-cost MPC with Subpath Constraints can be solved in time $O(N + \mathit{overlaps} + n^2 \log n + n(m + c))$ using [Gabow and Tarjan, 1991] [Rizzi, T., Mäkinen, 2014].
32 MPC WITH PAIRED SUBPATH CONSTRAINTS
INPUT: A DAG G and
1. a family $\mathcal{P}^{in} = \{(P^{in}_{1,1}, P^{in}_{1,2}), \dots, (P^{in}_{t,1}, P^{in}_{t,2})\}$ of pairs of directed paths in G.
TASK: Find a minimum number k of directed paths $P^{sol}_1, \dots, P^{sol}_k$ in G such that
1. every node in V(G) occurs in some $P^{sol}_i$;
2. for every pair $(P^{in}_{j,1}, P^{in}_{j,2}) \in \mathcal{P}^{in}$, there exists $P^{sol}_i$ such that both $P^{in}_{j,1}$ and $P^{in}_{j,2}$ are subpaths of $P^{sol}_i$.
Introduced by [Song and Florea, 2013, CLASS]. NP-hard [Rizzi, T., Mäkinen, 2014] [Beerenwinkel, Beretta, Bonizzoni, Dondi and Pirola, 2014].
33 CONCLUSIONS
Min-cost Minimum Path Cover: $O(n^2 \log n + nm)$.
Simultaneous assembly and expression estimation: polynomial time, but NP-hard for given k.
Min-cost Minimum Path Cover with Subpath Constraints: $O(N + \mathit{overlaps} + n^2 \log n + n(m + c))$, where c = number of subpath constraints and N = sum of the lengths of the subpath constraints.
Minimum Path Cover with Pairs of Subpath Constraints: NP-hard.
34 ADVERTISEMENT Veli Mäkinen, Djamal Belazzougui, Fabio Cunial and Alexandru I. Tomescu: Genome-Scale Algorithm Design: Biological Sequence Analysis in the Era of High-Throughput Sequencing.
35 Thank you
More informationBio 1B Lecture Outline (please print and bring along) Fall, 2007
Bio 1B Lecture Outline (please print and bring along) Fall, 2007 B.D. Mishler, Dept. of Integrative Biology 2-6810, bmishler@berkeley.edu Evolution lecture #5 -- Molecular genetics and molecular evolution
More informationCSE 549: Computational Biology. Computer Science for Biologists Biology
CSE 549: Computational Biology Computer Science for Biologists Biology What is Computer Science? http://people.cs.pitt.edu/~kirk/cs2110/computer_science_major.png What is Computer Science? Not actually
More informationDOWNLOAD OR READ : TEXTBOOK OF STRUCTURAL BIOLOGY SERIES IN STRUCTURAL BIOLOGY PDF EBOOK EPUB MOBI
DOWNLOAD OR READ : TEXTBOOK OF STRUCTURAL BIOLOGY SERIES IN STRUCTURAL BIOLOGY PDF EBOOK EPUB MOBI Page 1 Page 2 textbook of structural biology series in structural biology textbook of structural biology
More informationGrundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson
Grundlagen der Bioinformatik, SS 10, D. Huson, April 12, 2010 1 1 Introduction Grundlagen der Bioinformatik Summer semester 2010 Lecturer: Prof. Daniel Huson Office hours: Thursdays 17-18h (Sand 14, C310a)
More informationCMPSCI 311: Introduction to Algorithms Second Midterm Exam
CMPSCI 311: Introduction to Algorithms Second Midterm Exam April 11, 2018. Name: ID: Instructions: Answer the questions directly on the exam pages. Show all your work for each question. Providing more
More informationHidden Markov Models. Three classic HMM problems
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info Hidden Markov Models Slides revised and adapted to Computational Biology IST 2015/2016 Ana Teresa Freitas Three classic HMM problems
More informationSupplemental Information
Molecular Cell, Volume 52 Supplemental Information The Translational Landscape of the Mammalian Cell Cycle Craig R. Stumpf, Melissa V. Moreno, Adam B. Olshen, Barry S. Taylor, and Davide Ruggero Supplemental
More informationUNIT 6 PART 3 *REGULATION USING OPERONS* Hillis Textbook, CH 11
UNIT 6 PART 3 *REGULATION USING OPERONS* Hillis Textbook, CH 11 REVIEW: Signals that Start and Stop Transcription and Translation BUT, HOW DO CELLS CONTROL WHICH GENES ARE EXPRESSED AND WHEN? First of
More informationOrganization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p
Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p.110-114 Arrangement of information in DNA----- requirements for RNA Common arrangement of protein-coding genes in prokaryotes=
More informationComparative Gene Finding. BMI/CS 776 Spring 2015 Colin Dewey
Comparative Gene Finding BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2015 Colin Dewey cdewey@biostat.wisc.edu Goals for Lecture the key concepts to understand are the following: using related genomes
More informationAnalysis and Design of Algorithms Dynamic Programming
Analysis and Design of Algorithms Dynamic Programming Lecture Notes by Dr. Wang, Rui Fall 2008 Department of Computer Science Ocean University of China November 6, 2009 Introduction 2 Introduction..................................................................
More information