List of Code Challenges. Meet the Authors Meet the Development Team... xxxii Meet our Adopting Institutions... xxxiv Acknowledgments...
- Beverley Perry
- 5 years ago
Transcription
Contents

List of Code Challenges
Meet the Authors
Meet the Development Team
Meet our Adopting Institutions
Acknowledgments

1 Where in the Genome Does DNA Replication Begin?
A Journey of a Thousand Miles
Hidden Messages in the Replication Origin
DnaA boxes
Hidden messages in The Gold-Bug
Counting words
The Frequent Words Problem
Frequent words in Vibrio cholerae
Some Hidden Messages are More Surprising than Others
An Explosion of Hidden Messages
Looking for hidden messages in multiple genomes
The Clump Finding Problem
The Simplest Way to Replicate DNA
Asymmetry of Replication
Peculiar Statistics of the Forward and Reverse Half-Strands
Lurking biological phenomenon or statistical fluke?
Deamination
The skew diagram
Some Hidden Messages are More Elusive than Others
A Final Attempt at Finding DnaA Boxes in E. coli
Epilogue: Complications in ori Predictions
Open Problems
Multiple replication origins in a bacterial genome
Finding replication origins in archaea
Finding replication origins in yeast
Computing probabilities of patterns in a string
Charging Stations
The frequency array
Converting patterns to numbers and vice-versa
Finding frequent words by sorting
Solving the Clump Finding Problem
Solving the Frequent Words with Mismatches Problem
Generating the neighborhood of a string
Finding frequent words with mismatches by sorting
Detours
Big-O notation
Probabilities of patterns in a string
The most beautiful experiment in biology
Directionality of DNA strands
The Towers of Hanoi
The overlapping words paradox
Bibliography Notes

2 Which DNA Patterns Play the Role of Molecular Clocks?
Do We Have a Clock Gene?
Motif Finding Is More Difficult Than You Think
Identifying the evening element
Hide and seek with motifs
A brute force algorithm for motif finding
Scoring Motifs
From motifs to profile matrices and consensus strings
Towards a more adequate motif scoring function
Entropy and the motif logo
From Motif Finding to Finding a Median String
The Motif Finding Problem
Reformulating the Motif Finding Problem
The Median String Problem
Why have we reformulated the Motif Finding Problem?
Greedy Motif Search
Using the profile matrix to roll dice
Analyzing greedy motif finding
Motif Finding Meets Oliver Cromwell
What is the probability that the sun will not rise tomorrow?
Laplace's Rule of Succession
An improved greedy motif search
Randomized Motif Search
Rolling dice to find motifs
Why randomized motif search works
How Can a Randomized Algorithm Perform So Well?
Gibbs Sampling
Gibbs Sampling in Action
Epilogue: How Does Tuberculosis Hibernate to Hide from Antibiotics?
Charging Stations
Solving the Median String Problem
Detours
Gene expression
DNA arrays
Buffon's needle
Complications in motif finding
Relative entropy
Bibliography Notes

3 How Do We Assemble Genomes?
Exploding Newspapers
The String Reconstruction Problem
Genome assembly is more difficult than you think
Reconstructing strings from k-mers
Repeats complicate genome assembly
String Reconstruction as a Walk in the Overlap Graph
From a string to a graph
The genome vanishes
Two graph representations
Hamiltonian paths and universal strings
Another Graph for String Reconstruction
Gluing nodes and de Bruijn graphs
Walking in the de Bruijn Graph
Eulerian paths
Another way to construct de Bruijn graphs
Constructing de Bruijn graphs from k-mer composition
De Bruijn graphs versus overlap graphs
The Seven Bridges of Königsberg
Euler's Theorem
From Euler's Theorem to an Algorithm for Finding Eulerian Cycles
Constructing Eulerian cycles
From Eulerian cycles to Eulerian paths
Constructing universal strings
Assembling Genomes from Read-Pairs
From reads to read-pairs
Transforming read-pairs into long virtual reads
From composition to paired composition
Paired de Bruijn graphs
A pitfall of paired de Bruijn graphs
Epilogue: Genome Assembly Faces Real Sequencing Data
Breaking reads into k-mers
Splitting the genome into contigs
Assembling error-prone reads
Inferring multiplicities of edges in de Bruijn graphs
Charging Stations
The effect of gluing on the adjacency matrix
Generating all Eulerian cycles
Reconstructing a string spelled by a path in the paired de Bruijn graph
Maximal non-branching paths in a graph
Detours
A short history of DNA sequencing technologies
Repeats in the human genome
Graphs
The icosian game
Tractable and intractable problems
From Euler to Hamilton to de Bruijn
The seven bridges of Kaliningrad
Pitfalls of assembling double-stranded DNA
The BEST Theorem
Bibliography Notes

4 How Do We Sequence Antibiotics?
The Discovery of Antibiotics
How Do Bacteria Make Antibiotics?
How peptides are encoded by the genome
Where is Tyrocidine encoded in the Bacillus brevis genome?
From linear to cyclic peptides
Dodging the Central Dogma of Molecular Biology
Sequencing Antibiotics by Shattering Them into Pieces
Introduction to mass spectrometry
The Cyclopeptide Sequencing Problem
A Brute Force Algorithm for Cyclopeptide Sequencing
A Branch-and-Bound Algorithm for Cyclopeptide Sequencing
Mass Spectrometry Meets Golf
From theoretical to real spectra
Adapting cyclopeptide sequencing for spectra with errors
From 20 to More than 100 Amino Acids
The Spectral Convolution Saves the Day
Epilogue: From Simulated to Real Spectra
Open Problems
The Beltway and Turnpike Problems
Sequencing cyclic peptides in primates
Charging Stations
Generating the theoretical spectrum of a peptide
How fast is CYCLOPEPTIDESEQUENCING?
Trimming the peptide leaderboard
Detours
Gause and Lysenkoism
Discovery of codons
Quorum sensing
Molecular mass
Selenocysteine and pyrrolysine
Pseudo-polynomial algorithm for the Turnpike Problem
Split genes
Bibliography Notes

5 How Do We Compare Biological Sequences?
Cracking the Non-Ribosomal Code
The RNA Tie Club
From protein comparison to the non-ribosomal code
What do oncogenes and growth factors have in common?
Introduction to Sequence Alignment
Sequence alignment as a game
Sequence alignment and the longest common subsequence
The Manhattan Tourist Problem
What is the best sightseeing strategy?
Sightseeing in an arbitrary directed graph
Sequence Alignment is the Manhattan Tourist Problem in Disguise
An Introduction to Dynamic Programming: The Change Problem
Changing money greedily
Changing money recursively
Changing money using dynamic programming
The Manhattan Tourist Problem Revisited
From Manhattan to an Arbitrary Directed Acyclic Graph
Sequence alignment as building a Manhattan-like graph
Dynamic programming in an arbitrary DAG
Topological orderings
Backtracking in the Alignment Graph
Scoring Alignments
What is wrong with the LCS scoring model?
Scoring matrices
From Global to Local Alignment
Global alignment
Limitations of global alignment
Free taxi rides in the alignment graph
The Changing Faces of Sequence Alignment
Edit distance
Fitting alignment
Overlap alignment
Penalizing Insertions and Deletions in Sequence Alignment
Affine gap penalties
Building Manhattan on three levels
Space-Efficient Sequence Alignment
Computing alignment score using linear memory
The Middle Node Problem
A surprisingly fast and memory-efficient alignment algorithm
The Middle Edge Problem
Epilogue: Multiple Sequence Alignment
Building a three-dimensional Manhattan
A greedy multiple alignment algorithm
Detours
Fireflies and the non-ribosomal code
Finding a longest common subsequence without building a city
Constructing a topological ordering
PAM scoring matrices
Divide-and-conquer algorithms
Scoring multiple alignments
Bibliography Notes

6 Are There Fragile Regions in the Human Genome?
Of Mice and Men
How different are the human and mouse genomes?
Synteny blocks
Reversals
Rearrangement hotspots
The Random Breakage Model of Chromosome Evolution
Sorting by Reversals
A Greedy Heuristic for Sorting by Reversals
Breakpoints
What are breakpoints?
Counting breakpoints
Sorting by reversals as breakpoint elimination
Rearrangements in Tumor Genomes
From Unichromosomal to Multichromosomal Genomes
Translocations, fusions, and fissions
From a genome to a graph
2-breaks
Breakpoint Graphs
Computing the 2-Break Distance
Rearrangement Hotspots in the Human Genome
The Random Breakage Model meets the 2-Break Distance Theorem
The Fragile Breakage Model
Epilogue: Synteny Block Construction
Genomic dot-plots
Finding shared k-mers
Constructing synteny blocks from shared k-mers
Synteny blocks as connected components in graphs
Open Problem: Can Rearrangements Shed Light on Bacterial Evolution?
Charging Stations
From genomes to the breakpoint graph
Solving the 2-Break Sorting Problem
Detours
Why is the gene content of mammalian X chromosomes so conserved?
Discovery of genome rearrangements
The exponential distribution
Bill Gates and David X. Cohen flip pancakes
Sorting linear permutations by reversals
Bibliography Notes

7 Which Animal Gave Us SARS?
The Fastest Outbreak
Trouble at the Metropole Hotel
The evolution of SARS
Transforming Distance Matrices into Evolutionary Trees
Constructing a distance matrix from coronavirus genomes
Evolutionary trees as graphs
Distance-based phylogeny construction
Toward An Algorithm for Distance-Based Phylogeny Construction
A quest for neighboring leaves
Computing limb lengths
Additive Phylogeny
Trimming the tree
Attaching a limb
An algorithm for distance-based phylogeny construction
Constructing an evolutionary tree of coronaviruses
Using Least Squares to Construct Approximate Distance-Based Phylogenies
Ultrametric Evolutionary Trees
The Neighbor-Joining Algorithm
Transforming a distance matrix into a neighbor-joining matrix
Analyzing coronaviruses with the neighbor-joining algorithm
Limitations of distance-based approaches to tree construction
Character-Based Tree Reconstruction
Character tables
From anatomical to genetic characters
How many times has evolution invented insect wings?
The Small Parsimony Problem
The Large Parsimony Problem
Epilogue: Evolutionary Trees Fight Crime
Detours
When did HIV jump from primates to humans?
Searching for a tree fitting a distance matrix
The four point condition
Did bats give us SARS?
Why does the neighbor-joining algorithm find neighboring leaves?
Computing limb lengths in the neighbor-joining algorithm
Giant panda: bear or raccoon?
Where did humans come from?
Bibliography Notes

8 How Did Yeast Become a Wine Maker?
An Evolutionary History of Wine Making
How long have we been addicted to alcohol?
The diauxic shift
Identifying Genes Responsible for the Diauxic Shift
Two evolutionary hypotheses with different fates
Which yeast genes drive the diauxic shift?
Introduction to Clustering
Gene expression analysis
Clustering yeast genes
The Good Clustering Principle
Clustering as an Optimization Problem
Farthest First Traversal
k-means Clustering
Squared error distortion
k-means clustering and the center of gravity
The Lloyd Algorithm
From centers to clusters and back again
Initializing the Lloyd algorithm
k-means++ Initializer
Clustering Genes Implicated in the Diauxic Shift
Limitations of k-means Clustering
From Coin Flipping to k-means Clustering
Flipping coins with unknown biases
Where is the computational problem?
From coin flipping to the Lloyd algorithm
Return to clustering
Making Soft Decisions in Coin Flipping
Expectation maximization: the E-step
Expectation maximization: the M-step
The expectation maximization algorithm
Soft k-means Clustering
Applying expectation maximization to clustering
Centers to soft clusters
Soft clusters to centers
Hierarchical Clustering
Introduction to distance-based clustering
Inferring clusters from a tree
Analyzing the diauxic shift with hierarchical clustering
Epilogue: Clustering Tumor Samples
Detours
Whole genome duplication or a series of duplications?
Measuring gene expression
Microarrays
Proof of the Center of Gravity Theorem
Transforming an expression matrix into a distance/similarity matrix
Clustering and corrupted cliques
Bibliography Notes

9 How Do We Locate Disease-Causing Mutations?
What Causes Ohdo Syndrome?
Introduction to Multiple Pattern Matching
Herding Patterns into a Trie
Constructing a trie
Applying the trie to multiple pattern matching
Preprocessing the Genome Instead
Introduction to suffix tries
Using suffix tries for pattern matching
Suffix Trees
Suffix Arrays
Constructing a suffix array
Pattern matching with the suffix array
The Burrows-Wheeler Transform
Genome compression
Constructing the Burrows-Wheeler transform
From repeats to runs
Inverting the Burrows-Wheeler Transform
A first attempt at inverting the Burrows-Wheeler transform
The First-Last Property
Using the First-Last property to invert the Burrows-Wheeler transform
Pattern Matching with the Burrows-Wheeler Transform
A first attempt at Burrows-Wheeler pattern matching
Moving backward through a pattern
The Last-to-First mapping
Speeding Up Burrows-Wheeler Pattern Matching
Substituting the Last-to-First mapping with count arrays
Getting rid of the first column of the Burrows-Wheeler matrix
Where are the Matched Patterns?
Burrows and Wheeler Set Up Checkpoints
Epilogue: Mismatch-Tolerant Read Mapping
Reducing approximate pattern matching to exact pattern matching
BLAST: Comparing a sequence against a database
Approximate pattern matching with the Burrows-Wheeler transform
Charging Stations
Constructing a suffix tree
Solving the Longest Shared Substring Problem
Partial suffix array construction
Detours
The reference human genome
Rearrangements, insertions, and deletions in human genomes
The Aho-Corasick algorithm
From suffix trees to suffix arrays
From suffix arrays to suffix trees
Binary search
Bibliography Notes

10 Why Have Biologists Still Not Developed an HIV Vaccine?
Classifying the HIV Phenotype
How does HIV evade the human immune system?
Limitations of sequence alignment
Gambling with Yakuza
Two Coins up the Dealer's Sleeve
Finding CG-Islands
Hidden Markov Models
From coin flipping to a Hidden Markov Model
The HMM diagram
Reformulating the Casino Problem
The Decoding Problem
The Viterbi graph
The Viterbi algorithm
How fast is the Viterbi algorithm?
Finding the Most Likely Outcome of an HMM
Profile HMMs for Sequence Alignment
How do HMMs relate to sequence alignment?
Building a profile HMM
Transition and emission probabilities of a profile HMM
Classifying proteins with profile HMMs
Aligning a protein against a profile HMM
The return of pseudocounts
The troublesome silent states
Are profile HMMs really all that useful?
Learning the Parameters of an HMM
Estimating HMM parameters when the hidden path is known
Viterbi learning
Soft Decisions in Parameter Estimation
The Soft Decoding Problem
The forward-backward algorithm
Baum-Welch Learning
The Many Faces of HMMs
Epilogue: Nature is a Tinkerer and not an Inventor
Detours
The Red Queen Effect
Glycosylation
DNA methylation
Conditional probability
Bibliography Notes

11 Was T. rex Just a Big Chicken?
Paleontology Meets Computing
Which Proteins Are Present in This Sample?
Decoding an Ideal Spectrum
From Ideal to Real Spectra
Peptide Sequencing
Scoring peptides against spectra
Where are the suffix peptides?
Peptide sequencing algorithm
Peptide Identification
The Peptide Identification Problem
Identifying peptides in the unknown T. rex proteome
Searching for peptide-spectrum matches
Peptide Identification and the Infinite Monkey Theorem
False discovery rate
The monkey and the typewriter
Statistical significance of a peptide-spectrum match
Spectral Dictionaries
T. rex Peptides: Contaminants or Treasure Trove of Ancient Proteins?
The hemoglobin riddle
The dinosaur DNA controversy
Epilogue: From Unmodified to Modified Peptides
Post-translational modifications
Searching for modifications as an alignment problem
Building a Manhattan grid for spectral alignment
Spectral alignment algorithm
Detours
Gene prediction
Finding all paths in a graph
The Anti-Symmetric Path Problem
Transforming spectra into spectral vectors
The infinite monkey theorem
The probabilistic space of peptides in a spectral dictionary
Are terrestrial dinosaurs really the ancestors of birds?
Solving the Most Likely Peptide Vector Problem
Selecting parameters for transforming spectra into spectral vectors
Bibliography Notes

Appendix: Introduction to Pseudocode
What is Pseudocode?
Nuts and Bolts of Pseudocode
The if condition
The for loop
The while loop
Recursive algorithms
Arrays

Glossary
Bibliography
Image Courtesies
Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between
More informationChapter 16: Reconstructing and Using Phylogenies
Chapter Review 1. Use the phylogenetic tree shown at the right to complete the following. a. Explain how many clades are indicated: Three: (1) chimpanzee/human, (2) chimpanzee/ human/gorilla, and (3)chimpanzee/human/
More informationCISC 889 Bioinformatics (Spring 2004) Hidden Markov Models (II)
CISC 889 Bioinformatics (Spring 24) Hidden Markov Models (II) a. Likelihood: forward algorithm b. Decoding: Viterbi algorithm c. Model building: Baum-Welch algorithm Viterbi training Hidden Markov models
More informationLecture 9. Intro to Hidden Markov Models (finish up)
Lecture 9 Intro to Hidden Markov Models (finish up) Review Structure Number of states Q 1.. Q N M output symbols Parameters: Transition probability matrix a ij Emission probabilities b i (a), which is