Computational Theory MAT542 (Computational Methods in Genomics) Benjamin King Mount Desert Island Biological Laboratory

Similar documents
STRUCTURAL BIOINFORMATICS I. Fall 2015

Dynamical Modeling in Biology: a semiotic perspective. Junior Barrera BIOINFO-USP

Networks & pathways. Hedi Peterson MTAT Bioinformatics

Protein Structure and Visualisation. Introduction to PDB and PyMOL

From neuronal oscillations to complexity

Introduction to Systems Biology

Network motifs in the transcriptional regulation network (of Escherichia coli):

Models of Computation

Big-O Notation and Complexity Analysis

Data Structures in Java

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche

An introduction to SYSTEMS BIOLOGY

Lecture 4: Finite Automata

Sugars, such as glucose or fructose are the basic building blocks of more complex carbohydrates. Which of the following

Introduction to Molecular Evolution (BIOL 3046) Course website:

Short Course Mathematical Molecular Biology Bob Eisenberg Shanghai Jiao Tong University Sponsor Zhenli Xu

RNA and Protein Structure Prediction

Lecture Notes for Fall Network Modeling. Ernest Fraenkel

86 Part 4 SUMMARY INTRODUCTION

Computational Complexity and Intractability: An Introduction to the Theory of NP. Chapter 9

Unit 1A: Computational Complexity

Tools and Algorithms in Bioinformatics

Analysis of Algorithms. Unit 5 - Intractable Problems

Introduction to Bioinformatics

CHAPTER 3 FUNDAMENTALS OF COMPUTATIONAL COMPLEXITY. E. Amaldi Foundations of Operations Research Politecnico di Milano 1

Data Structures in Java. Session 22 Instructor: Bert Huang

BME 5742 Biosystems Modeling and Control

Principles of Computing, Carnegie Mellon University. The Limits of Computing

The Beauty and Joy of Computing

#33 - Genomics 11/09/07

Spring Lecture 21 NP-Complete Problems

Cross Discipline Analysis made possible with Data Pipelining. J.R. Tozer SciTegic

Introduction to Complexity Theory

Math 345 Intro to Math Biology Lecture 20: Mathematical model of Neuron conduction

Hands-on Course in Computational Structural Biology and Molecular Simulation BIOP590C/MCB590C. Course Details

Algorithm Design and Analysis

Measuring Goodness of an Algorithm. Asymptotic Analysis of Algorithms. Measuring Efficiency of an Algorithm. Algorithm and Data Structure

CS/COE

P vs NP & Computational Complexity

Computational Structural Bioinformatics

1 R.V k V k 1 / I.k/ here; we ll stimulate the action potential another way.) Note that this further simplifies to. m 3 k h k.

A Database of human biological pathways

Lecture 3: Finite Automata

Computational Biology: Basics & Interesting Problems

Computational Molecular Biology (

Computer Sciences Department

CSCE555 Bioinformatics. Protein Function Annotation

Chapter 9: The Perceptron

Computational Methods for Nonlinear Systems

In complexity theory, algorithms and problems are classified by the growth order of computation time as a function of instance size.

Electrophysiology of the neuron

CMSC 441: Algorithms. NP Completeness

PROBLEM SOLVING AND SEARCH IN ARTIFICIAL INTELLIGENCE

Limits and Future of Computing Where do we go from here?

Voltage-clamp and Hodgkin-Huxley models

Modeling Biological Systems Opportunities for Computer Scientists

Hidden Markov Models

ICS141: Discrete Mathematics for Computer Science I

Data Structures and Algorithms

The Role of Network Science in Biology and Medicine. Tiffany J. Callahan Computational Bioscience Program Hunter/Kahn Labs

Complexity Theory Part II

Fall 2008 CSE Qualifying Exam. September 13, 2008

What is Systems Biology

Computer Science 385 Analysis of Algorithms Siena College Spring Topic Notes: Limitations of Algorithms

Unravelling the biochemical reaction kinetics from time-series data

MATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME

Running Time Evaluation

CSCI3390-Lecture 18: Why is the P =?NP Problem Such a Big Deal?

Gene Network Science Diagrammatic Cell Language and Visual Cell

NP-Completeness I. Lecture Overview Introduction: Reduction and Expressiveness

Computational Biology Course Descriptions 12-14

Homework 9: Protein Folding & Simulated Annealing : Programming for Scientists Due: Thursday, April 14, 2016 at 11:59 PM

Protein Structure Prediction

Computational methods for predicting protein-protein interactions

Introduction to Computer Science Lecture 5: Algorithms

Chapter 2. Reductions and NP. 2.1 Reductions Continued The Satisfiability Problem (SAT) SAT 3SAT. CS 573: Algorithms, Fall 2013 August 29, 2013

F. Piazza Center for Molecular Biophysics and University of Orléans, France. Selected topic in Physical Biology. Lecture 1

Lecture 3: Finite Automata

Kd = koff/kon = [R][L]/[RL]

Algorithms and Complexity theory

Nondeterministic Polynomial Time

Computational Biology From The Perspective Of A Physical Scientist

Protein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki.

STUDENT PAPER. Santiago Santana University of Illinois, Urbana-Champaign Blue Waters Education Program 736 S. Lombard Oak Park IL, 60304

Friday Four Square! Today at 4:15PM, Outside Gates

Understanding Science Through the Lens of Computation. Richard M. Karp Nov. 3, 2007

Syllabus BINF Computational Biology Core Course

Structure-based maximal affinity model predicts small-molecule druggability

What Can Physics Say About Life Itself?

NP-Completeness Part II

Updated: 10/11/2018 Page 1 of 5

Input Decidable Language -- Program Halts on all Input Encoding of Input -- Natural Numbers Encoded in Binary or Decimal, Not Unary

Modelling biological oscillations

Practice Midterm Exam

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

Graph Alignment and Biological Networks

Category Question Data

COMP 598 Advanced Computational Biology Methods & Research. Introduction. Jérôme Waldispühl School of Computer Science McGill University

3/14/2016. The Big ORF Theory: The Intensive Enumeration & The Triplet Approximation. The Central Dogma: DNA RNA Protein

Transcription:

Computational Theory MAT542 (Computational Methods in Genomics) Benjamin King Mount Desert Island Biological Laboratory bking@mdibl.org

The $1000 Genome Is Revolutionizing How We Study Biology and Practice Medicine $2.7B Over Decade $1000 in ~3 Days

$1000 Human Genome Resequencing Massively Parallel Sequencing Reference Human Assembly ~30x coverage Every base covered by approximately least 30 reads Reference sequence Your genome for $9,500 Illumina s Personal Genome Service (http://www.everygenome.com)

James Watson 1962 Nobel Prize Craig Venter

Biology Increasingly Data Intensive Need to leverage existing knowledge to advance your own research Existing Knowledge

Analysis Techniques Becoming More Transparent

Analysis of Genomic Data Not Only Critical, But Fun Wolbachia endosymbionts found in fly embryos 95% (1,440,650 base pairs) for one of the new species

Bioinformatics - the creation and advancement of algorithms, computational and statistical techniques, and theory to solve formal and practical problems posed by or inspired from the management and analysis of biological data. Computational Biology - hypothesis-driven investigation of a specific biological problem using computers, carried out with experimental and simulated data, with the primary goal of discovery and the advancement of biological knowledge. www.wikipedia.org

Bioinformatics Example: Comparative Toxicogenomics Database

Comparative Toxicogenomics Database

... that transforms into a discovery tool Curated Facts Inferred Discoveries C G G D Integrate C G G D G C C D C G D A A G G C A G G D Attributes GO, KEGG, Reactome, etc. Allan Peter Davis

Autistic Disorder: genes 1. direct 2. inferred via network 3. direct and inferred G M autism G C G M autism (240 direct genes) C autism C inferred Allan Peter Davis

Autistic Disorder: chemicals 2. inferred C-D, via network diazinon 27 genes Inference Score computed using local network topology statistics; ranks C-D inferences for atypical connectivity (higher score = better) 27 genes autism inferred Allan Peter Davis

BioGRID Gene and Protein Interactions at CTD Inferred C-D, via network Bisphenol A 220 genes 220 genes breast neoplasms inferred

Theory of Computation

Theory of Computation ENIAC Unveiled on Feb. 14, 1946

Computation Many applications including: Statistics Bioinformatics and Computational Biology Computation Often Involves The Use of Models Develop a model that describes a system - Helps you understand the system - Use it to predict responses of the system INPUT Biological System OUTPUT INPUT Model OUTPUT

Model of Electric Current Flow Across the Cell Membrane of the Giant Nerve Fiber of a Squid A.L. Hodgkin and A.F. Huxley, 1952 I = C M dv M dt + i Na + i K + i L

Hille, B. Ionic Basis of Resting and Action Potentials. Compr Physiol 2011, Supplement 1: Handbook of Physiology, The Nervous System, Cellular Biology of Neurons: 99-136.

Genetic regulatory network controlling the development of the body plan of the sea urchin embryo Davidson et al., Science, 295(5560):1669-1678.

Modeling Protein Structure Mouse MHC class I Protein H-2K(d) with Bound Peptide from Influenza A/PR/8/34 Nucleoprotein Gascoigne NR. Do T cells need endogenous peptides for activation? Nat Rev Immunol. 2008 Nov;8(11):895-900. Mitaksov V, Fremont DH. Structural definition of the H-2Kd peptide-binding motif. J Biol Chem. 2006 Apr 14;281(15):10618-25.

Molecular Dynamics Proteins constantly moving: Local Motions (0.01 to 5 Å, 10-15 to 10-1 s) Atomic fluctuations Sidechain motions Loop motions Rigid Body Motions (1 to 10 Å, 10-9 to 1s) Helix motions Domain motions (hinge bending) Subunit motions Large-Scale Motions (> 5 Å, 10-7 to 10 4 s) Helix coil transitions Dissociation / Association Folding and unfolding Model System Containing Protein: - Energy within system - E = kt - Each atom has: - Position in system (x, y, z) - Velocity - Set of constraints (e.g., bonds)

Influenza Neuraminidase N1 Complexed with Tamiflu (Oseltamivir) Inhibitor https://www.youtube.com/watch?v=vj8ri57ge_m

Algorithms Precise method usable by a computer for the solution of a problem Composed of a finite set of steps, each of which may require one or more operations Each operation must be definite Each operation must be effective (can be completed) Operations must terminate after a finite number of operations Have zero or more inputs Have one or more outputs

Types of Algorithms Deterministic algorithms Result of each operation clearly defined Nondeterministic algorithms Contains operations whose outcome are not uniquely defined Limited to a specific set of possibilities

Algorithms are Written in a Programming Language Languages are defined so that each legitimate sentence has a unique meaning A program is the expression of an algorithm in a programming language Scripting Languages Program Program C/C++ Java Compiler Perl Python Compiled Code (executable in operating system) Use Interpreter to execute commands Processor Processor

Algorithms Research How to devise algorithms e.g., Dynamic programming, How to validate algorithms Proof that algorithm works How to analyze algorithms (Computational Complexity Theory) How to express algorithms e.g., Object-oriented programming, procedural programming How to test a program Debugging test for faults Profiling measuring time and space it takes to run

Computational Complexity Theory How much computing time and space (memory) an algorithm will require? Used to make qualitative judgments about the value of one algorithm over another O notation Represent computing time f(n) = O(g(n)) Number of inputs or outputs

Big O Notation Represents length of computing time sorting two nested loops (e.g., all versus all search)

Complexity Classes of Computational Problems P = deterministic in polynomial time Size of input determines length of computation by a polynomial NP = non-deterministic in polynomial time Traveling salesman problem Shortest route to visit all points n! permutations NP-hard -> not solvable by an algorithm that is guaranteed to: run in polynomial time and always produces an optimal solution e.g., Find shortest route for salesman Reason that Heuristic Algorithms are used NP-complete -> Special case of NP-hard problems that are also NP e.g., Given costs and a budget, is there a route that costs less than budget?

http://www.nr.com Reference for ways to implement many algorithms Example topics: Fitting data to a straight line Markov Chain Monte Carlo Fast Fourier Transform

Overview of 4 Lectures Introduction to Computation and Programming Mon. Sept. 14 Programming (Text File Processing) Wed. Sept 16 & Mon. Sept 21 Genome Sequencing and Informatics Wed. Sept 23 Homework Due on Oct. 7 th (Wed) by 2pm

Perl http://www.cpan.org Comprehensive Perl Archive Network http://www.activestate.com Active Perl (for PC, Mac OS X, Linux)

Programming Concepts Variables Used to store: character string integer real number Boolean value (True or False) $a = Go Bears ; $b = 25; $c = 3.1415; $d = 0; Data Structures Store collections of data in an organized fashion Common Operations Mathematical operations Testing for specific values (if / then loop) Iteration (for, while loops) Translation operations Printing messages Reading in files Writing output

#!/usr/bin/perl # Header # Example script # Variable declarations $a = Go ; $b = Black ; $c = Bears ; go_bears.pl # Main print $a, $b, $c; perl go_bears.pl Go Black Bears

Anatomy of a Program #!/bin/perl # Header # Variable declarations # Input and Output file handling # Main

Variables $a = TAATAA ; print $a; $n = 25; $m = 100; $sum = $n + $m; Scalar Types: character string integer real number Boolean value (True or False)