Algorithmics and Bioinformatics

Gregory Kucherov and Philippe Gambette
LIGM/CNRS, Université Paris-Est Marne-la-Vallée, France

Schedule
- Course webpage: https://wikimpri.dptinfo.ens-cachan.fr/doku.php?id=cours:c-1-32
- Lectures on Mondays, 2pm to 6:15pm, between Sept 11 and Nov 6

Mode of grading
(from https://wikimpri.dptinfo.ens-cachan.fr/doku.php?id=cours:c-1-32, section evaluation_exams_2017-2018)
- two programming exercises (deadline Nov 6): coefficient 1 [more details at the next lecture]
- written exam (2 hours long, on Nov 6): coefficient 3

Plan
- Part 1: Molecular biology primer for computer scientists
- Part 2: Basics of sequence algorithmics
  - Dynamic programming and sequence comparison

What is bioinformatics

Bioinformatics: different viewpoints
- Bioinformatics is
  - [for practitioners] software tools and associated analysis methods (often statistical)
  - [for biological scientists] a way of thinking of biology (evolution) in terms of information processes
    "Nothing in biology makes sense except in the light of evolution" (Theodosius Dobzhansky)
    E. Koonin, The Logic of Chance, FT Press, 2011
    E. Koonin, M. Galperin, Sequence - Evolution - Function: Computational Approaches in Comparative Genomics, Springer, 2003
    J. Pevsner, Bioinformatics and Functional Genomics, Wiley, 2015 (3rd edition)
  - [for computer scientists] an exciting application area
  - [for some theoreticians] a source of new problems
    "Ask not what mathematics can do for biology; ask rather what biology can do for mathematics!" (Stanislaw Ulam)
- Bioinformatics is not the same as DNA computing (cf. e.g. L. Adleman's experiment of 1994)

Part 1: Molecular biology primer

Cells
- living organisms consist of cells
- cells are complex molecular systems hosting cascades of chemical reactions (metabolic pathways)
- prokaryotic and eukaryotic cells are distinguished; eukaryotic cells have a nucleus

Kingdoms of life
[Figure: prokaryotes and eukaryotes]

Three main types of molecules in a living cell
- main biological molecules are polymers
  - DNA (deoxyribonucleic acid) consists of nucleotides: A (adenine), C (cytosine), G (guanine), T (thymine)
  - RNA (ribonucleic acid): A, C, G, U (uracil)
  - proteins consist of amino acids: 20 types (A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y)

Three main types of molecules in a living cell
- DNA
  - holds genetic information
  - has a double helix structure
  - is contained in chromosomes (linear in eukaryotes, often circular in prokaryotes)
- RNA
  - participates in protein synthesis (mRNA, tRNA, rRNA)
  - has different regulatory functions
- proteins
  - do all essential work in the cell

Human chromosomes
- Somatic cells in humans have 2 sets of 22 chromosomes (22 pairs) + XX (female) or XY (male) = total of 46 chromosomes
- Gametes (mature germline cells) have 22 chromosomes + either X or Y = total of 23 chromosomes
- The human body has 10^13 to 10^14 cells, all with the same genome (set of DNA)

Central dogma of molecular biology

Genes
- Gene = fragment of DNA that encodes a functional RNA or protein product
- molecular units of heredity defining phenotypic traits (e.g. eye colour)
- the human genome contains ~20,000 protein-coding genes, taking ~2% of the genome
- a gene can have different variants (alleles)
- two human individuals differ in ~0.1% of their DNA
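To make these percentages concrete, here is a small arithmetic sketch. It assumes a genome size of ~3×10^9 bp (the figure quoted on the "Some numbers" slide); the variable names are illustrative only, and the results are order-of-magnitude estimates rather than measured values.

```python
# Rough orders of magnitude implied by the percentages on the slide.
# Assumption: human genome size of ~3 * 10**9 bp.
GENOME_SIZE_BP = 3_000_000_000

# ~2% of the genome is taken by protein-coding genes.
coding_bp = GENOME_SIZE_BP * 2 // 100

# Two individuals differ in ~0.1% of their DNA.
differing_bp = GENOME_SIZE_BP // 1000

print(f"protein-coding part: ~{coding_bp:,} bp")
print(f"differences between two individuals: ~{differing_bp:,} bp")
```

In other words, under these assumptions a 0.1% difference still amounts to a few million positions.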

A few alleles of my chromosome 6

Gene expression: promoters

Splicing (eukaryotes)

Alternative splicing

Genetic code (translation)
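As an illustration of how the genetic code maps codons to amino acids, here is a minimal sketch of transcription and translation. It assumes the standard genetic code (written in the compact TCAG ordering), a reading frame starting at position 0, and translation that stops at the first stop codon; the function names `transcribe` and `translate` are illustrative and not part of the course material.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
# (first base varies slowest, third base fastest). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """Return the mRNA corresponding to a coding-strand DNA sequence (T -> U)."""
    return dna.upper().replace("T", "U")


def translate(seq: str) -> str:
    """Translate a DNA or RNA coding sequence, codon by codon, from position 0.

    Translation stops at the first stop codon; a trailing incomplete
    codon is ignored.
    """
    seq = seq.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")
print(mrna, "->", translate(mrna))
```

The table holds 64 codons for 20 amino acids plus stop signals, so several codons encode the same amino acid.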

Codon structure of a gene
Nucleotide sequence of the gfp10 cDNA from Aequorea victoria and the deduced amino acid sequence (Prasher et al., Gene, 1992). Green arrows: intron positions (spliced out)

RNA
- folds to form a secondary structure that is functionally important
- hydrogen bonds:
  - Watson-Crick: A-U, C-G
  - weak: G-U, U-C, G-A
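The pairing rules above can be encoded directly. The following is a minimal sketch, assuming only the pair types listed on the slide; the helper names `pair_type` and `stem_pairs` are hypothetical, and the sketch classifies pairs without predicting any actual folding.

```python
# Base pairs listed on the slide, stored as unordered pairs.
WATSON_CRICK = {frozenset("AU"), frozenset("CG")}
WEAK = {frozenset("GU"), frozenset("UC"), frozenset("GA")}


def pair_type(x: str, y: str):
    """Classify a pair of RNA bases as 'Watson-Crick', 'weak', or None."""
    pair = frozenset((x.upper(), y.upper()))
    if pair in WATSON_CRICK:
        return "Watson-Crick"
    if pair in WEAK:
        return "weak"
    return None


def stem_pairs(five_prime: str, three_prime: str):
    """Classify the pairs of a candidate stem.

    The two strands of a stem are antiparallel, so the i-th base of the
    first strand faces the i-th base from the end of the second strand.
    """
    return [pair_type(x, y) for x, y in zip(five_prime, reversed(three_prime))]


print(stem_pairs("GGAC", "GUCU"))
```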

Protein structure

Some numbers
- bacterial genomes are typically 10^6 to 10^7 bp; the human genome is ~3×10^9 bp
- gene sizes are highly variable, up to a few Mbp (10^6 bp) in the human genome
- transcript (RNA) sizes for the human genome are in the range of 1 to 20 Kbp (10^3 bp)
- proteins are typically a few hundred amino acids long (100-800 aa)

Genome sizes
- human: 3 Gbp
- maize (corn): 5 Gbp
- salamander: 50 Gbp
- trumpet lily: 90 Gbp
- toad: 6.9 Gbp
- Amoeba dubia: 670 Gbp

Sequencing
- the first DNA molecules were sequenced in the 1970s (1972-76)
- the first proteins were sequenced much earlier (1949)
- How is DNA sequenced?

Genome sequencing

Recovery of shredded newspaper

Sanger sequencing
- Sanger method (1977-2000s): sequencing fragments of ~800 bp
- whole-genome shotgun sequencing:
  - from multiple copies of genomic DNA, obtain inserts of a certain length (a few Kbp)
  - clone the inserts into a vector used to infect bacteria
  - sequence the DNA collected from the bacteria using the shotgun approach, producing reads
  - assemble the reads
- 1977: first genome sequenced, bacteriophage phi X 174 (5,386 bp)
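The final step, assembling reads, can be illustrated with a toy example. The sketch below uses a naive greedy strategy (repeatedly merge the two reads with the longest suffix-prefix overlap); this is a simplification chosen for illustration, not the method used by real assemblers, and it assumes error-free reads from a single strand. The names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Overlaps shorter than min_len are ignored (returns 0).
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Greedily merge reads by longest overlap; returns the remaining contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads entirely contained in another read.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # no overlaps left: several contigs remain
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


genome = "ATGCGTACGTTAGC"
reads = [genome[0:8], genome[4:12], genome[8:14]]
print(greedy_assemble(reads))
```

On real data, repeats in the genome and sequencing errors make greedy merging unreliable, which is one reason assembly is a hard algorithmic problem.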

Milestones: full genome sequencing
- 1995: H. influenzae (~1.8×10^6 bp, ~1,700 genes): first fully sequenced genome of a free-living organism
- 1998: C. elegans (~100×10^6 bp, ~20,000 genes)
- 2000: D. melanogaster (~140×10^6 bp, ~15,000 genes)
- 2000: Arabidopsis thaliana (~135×10^6 bp, ~27,000 genes)
- February 2001: H. sapiens (~3,000×10^6 bp, ~21,000 genes)
  - published in Nature (public consortium) and Science (Celera company)
  - $2.7 billion (public project), several years
  - 30M reads of length <800 bp, total 24 Gbp, or ~8-fold coverage of the 3 Gbp human genome

Next-generation sequencing
- ~2005 onward: Next-Generation Sequencing (NGS) produces millions (1M-1G) of short reads (35-400 bp) of DNA in hours
- Example: octopus genome (Nature, August 13, 2015), size ~2.7 Gbp, sequenced with 60x coverage, total 162 Gbp
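The coverage figures quoted on the slides follow from a simple relation: coverage = (number of reads × read length) / genome size. The sketch below checks the two examples given in the lecture (human genome project and octopus genome); the function names are illustrative, and read length is taken as a uniform 800 bp for the human example, which is an upper-bound simplification.

```python
def coverage(num_reads: int, read_len: int, genome_size: int) -> float:
    """Average number of reads covering each genome position."""
    return num_reads * read_len / genome_size


def total_bases(genome_size: int, target_coverage: int) -> int:
    """Total amount of sequence needed to reach a target coverage."""
    return genome_size * target_coverage


# Human genome project (2001): 30M reads of (at most) 800 bp over a 3 Gbp genome.
print(coverage(30_000_000, 800, 3_000_000_000))

# Octopus genome (2015): 2.7 Gbp at 60x coverage.
print(total_bases(2_700_000_000, 60))
```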

Main NGS technologies
[Figure: Illumina HiSeq, Oxford Nanopore]

Data deluge in genomics
- The growth of DNA sequencing data has outpaced advances in hardware and storage
- Moore's law (chip performance doubling every two years) vs volume of sequencing data (x1.8 every year)
- Next-generation sequencing technologies generate on the order of 10^10 B (= 10 GB) per run
- Recall that the sequencing of the human genome (2001) used ~24 GB, which took several years to produce
- Nowadays, a typical sequencing project may use 150-200 GB of sequence data
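To see how quickly the two growth rates diverge, the sketch below compares doubling every two years with a 1.8-fold increase every year. It assumes both trends hold as pure exponentials over the whole period, which is an idealization; the function names are illustrative.

```python
def moore_factor(years: float) -> float:
    """Growth factor under Moore's law: doubling every two years."""
    return 2 ** (years / 2)


def data_factor(years: float, yearly: float = 1.8) -> float:
    """Growth factor of sequencing data volume: x1.8 every year."""
    return yearly ** years


for years in (2, 5, 10):
    m, d = moore_factor(years), data_factor(years)
    print(f"{years:2d} years: hardware x{m:.1f}, data x{d:.1f}, gap x{d / m:.1f}")
```

Over ten years, hardware improves by a factor of 32 under these assumptions while data volume grows by a factor of about 357, so the gap itself grows roughly elevenfold.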

Number of complete genomes sequenced
[Figure: number of complete genomes sequenced as of May 2014, from http://gregoryzynda.com/ncbi/genome/python/2014/03/31/ncbi-genome.html]

Cost of sequencing
[Figure: cost of sequencing a human-size genome, from https://en.wikipedia.org/wiki/personal_genomics]

Nucleotide databases
- International Nucleotide Sequence Database Collaboration (insdc.org)
  - GenBank: NCBI, USA
  - ENA: EMBL, Europe
  - DDBJ: Japan
- Contributors: research labs, sequencing programs, ...
- All types of nucleotide sequences (chromosomes, contigs, RNAs, ESTs, ...) but also (huge) NGS read collections

EMBL: nucleotide bank growth

ENA: Sequence Read Archive growth

Today's datasets reach teras
[Figure: logarithmic scale from 10^6 (megas) through 10^9 (gigas), 10^12 (teras) and 10^15 (petas) to 10^18, placing in increasing order: bacterial genome; human genome; human genome sequencing project (2001); typical eukaryotic genome sequencing project in ~2010; today's projects; Sequence Read Archive (e.g. the TCGA project is ~2.3 Pb); current worldwide annual sequencing capacity]

1000 Genomes Project and Amazon

Google cloud

Impact of NGS
- NGS made genomic studies cheaper, high-throughput and (importantly) genome-wide
- NGS revolutionized genomic research and has already made possible a number of unprecedented discoveries:
  - metagenomics
  - ncRNA
  - reconstructing ancient genomes (Neanderthal, mammoth, ...)
  - 1000 Genomes Project
  - low-cost sequencing and personalized medicine (23andMe, ...)

Protein databases
- UniProt (Universal Protein Resource)
  - Swiss-Prot
  - TrEMBL
  - PIR-PSD
- PDB (Protein Data Bank): database of protein structures (as well as structures of other macromolecules)