Algorithms in Computational Biology (236522) spring 2008 Lecture #1

Lecturer: Shlomo Moran, Taub 639, tel 4363. Office hours: 15:30-16:30 / by appointment.
TA: Ilan Gronau, Taub 700, tel 4894. Office hours: ??
Lecture: Monday 12:30-14:30, Taub 5. Tutorial: Tuesday 9:30-10:30, Taub 4.
URL: webcourse.cs.technion.ac.il/~cs236522
This class was initially edited from Nir Friedman's lecture at the Hebrew University, with changes made by Dan Geiger and then by Shlomo Moran.

Course Information
Requirements & grades: 15-25% homework, in five assignments (each to be submitted within two weeks). Homework is obligatory. 75-85% test. A test grade above 55 is required for the homework grade to count.
Exam dates: Moed A 7/9/2008, Moed B 3/11/2008.

Bibliography
Biological Sequence Analysis, R. Durbin et al., Cambridge University Press, 1998.
Introduction to Computational Molecular Biology, J. Setubal, J. Meidanis, PWS Publishing Company, 1997.
Bioinformatics, A. Polanski, M. Kimmel, Springer, 2007.

Course Prerequisites
Computer science and probability background: Data Structures 1 (cs234218), Algorithms 1 (cs234247), Probability (any course).
Some biology background. Formally: none, to allow CS students to take this course. Recommended: Molecular Biology 1 (especially for those in the Bioinformatics track) or a similar biology course, and/or a serious desire to complement your knowledge of biology by reading the appropriate material (see the course web site).

Relations to Some Other Courses
Bioinformatics Software (cs236523). The course Introduction to Bioinformatics covers practical aspects and gives hands-on experience with many web-based bioinformatics programs. Although it is not a formal requirement, it is recommended that you look at the web site www.cs.technion.ac.il/~cs236606 and examine the relevant software.
Bioinformatics Algorithms (cs236522). This is the current course, which focuses on modeling some bioinformatics problems and presents algorithms for their solution.
Bioinformatics Project (cs236524). Developing bioinformatics tools under close guidance.

Biological Background
First homework assignment: read the first chapter (pages 1-30) of Setubal et al., 1997 (copies are available in the Taub building library and in the central library). Answer the questions of the first assignment on the course site (more details on the course site).

Computational Biology
Computational biology is the application of computational tools and techniques to (primarily) molecular biology. It enables new ways of study in the life sciences, allowing analytic and predictive methodologies that support and enhance laboratory work. It is a multidisciplinary area of study that combines biology, computer science, and statistics. Computational biology is also called bioinformatics.

Examples of Areas of Interest
Building evolutionary trees from molecular (and other) data
Efficiently constructing genomes of various organisms
Understanding the structure of genomes (SNPs, SSRs, genes)
Understanding the function of genes in the cell cycle and in disease
Deciphering the structure and function of proteins
SNP: Single Nucleotide Polymorphism. SSR: Simple Sequence Repeat.

Exponential growth of biological information
[Figure: growth of sequences, structures, and literature, 1965-1995]

Four Aspects
Biological: What is the task?
Algorithmic: How can the task at hand be performed efficiently?
Learning: How can parameters and models describing the task be adapted, estimated, or learned from examples?
Statistics: How can true phenomena be distinguished from artifacts?

Example: Sequence Comparison
Biological: Evolution preserves sequences, so similar genes might have similar functions.
Algorithmic: Consider all ways to align one sequence against another.
Learning: How do we define similar sequences? Use examples to define similarity.
Statistics: When we compare against ~10^6 sequences, what is a random match and what is a true one?
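The algorithmic step, "consider all ways to align one sequence against another", does not require listing the exponentially many alignments: dynamic programming scores all of them implicitly. Below is a minimal sketch of global alignment scoring (the Needleman-Wunsch recurrence), assuming a hypothetical linear scoring scheme of +1 per match, -1 per mismatch and -1 per gap; the function name is illustrative, and choosing better scores from examples is exactly the "learning" aspect above.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best global alignment score of a and b under a linear gap penalty.

    Scoring values are hypothetical defaults, not taken from the course.
    """
    n, m = len(a), len(b)
    # dp[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap          # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        dp[0][j] = j * gap          # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + pair,   # align a[i-1] with b[j-1]
                           dp[i - 1][j] + gap,        # a[i-1] against a gap
                           dp[i][j - 1] + gap)        # b[j-1] against a gap
    return dp[n][m]
```

For example, `global_alignment_score("ACGT", "AGT")` is 2 (A-GT under ACGT: three matches and one gap). The table has (n+1)(m+1) cells, so the running time is O(nm) instead of exponential.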

Course Goals
Learning about computational tools for (primarily) molecular biology:
Describe computational tasks posed by modern molecular biology.
Discuss the biological motivation and setup for these tasks.
Understand the kinds of solutions that exist and the principles that justify them.

Topics I
Dealing with DNA/protein sequences:
Informal biological background (1 week)
Finding similar sequences (~3 weeks)
Models of sequences: hidden Markov models (~2 weeks)
Parameter estimation: ML methods and the EM algorithm (~4 weeks)

Topics II
Reconstructing evolutionary trees:
Background: Darwin's theory of evolution
Distance-based methods (~2 weeks)
Character-based methods (~2 weeks)
The presentations are similar to those given in the spring 2007 semester and can be found on that semester's site. Updated presentations will be uploaded to the course site before the lectures.

Topics III (if time allows)
Protein world:
How proteins fold: secondary and tertiary structure
How to predict protein folds from sequence data
How to analyze protein changes from raw experimental measurements (mass spectrometry, MassSpec)

Human Genome
Most human cells contain 46 chromosomes: 2 sex chromosomes (X, Y), XY in males and XX in females, and 22 pairs of chromosomes called autosomes.

DNA Organization
[Figure. Source: Alberts et al.]

The Double Helix
[Figure. Source: Alberts et al.]

DNA Components
Four nucleotide types: adenine (A), guanine (G), cytosine (C), thymine (T).
Hydrogen bonds (electrostatic connections) pair the bases of the two strands: A-T and C-G.
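Because A pairs only with T and C only with G, either strand determines the other. A minimal sketch of computing the partner strand follows; the reversal reflects the fact that the two strands run antiparallel, so the partner read in the conventional 5'->3' direction is the reversed complement. The names `COMPLEMENT` and `reverse_complement` are illustrative.

```python
# Watson-Crick pairing rules from the slide: A-T, C-G
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(dna: str) -> str:
    """Return the partner strand of dna, both read 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))
```

For example, `reverse_complement("ATGC")` returns `"GCAT"`. Applying the function twice gives back the original strand, which is a quick sanity check on the pairing table.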

Genome Sizes
E. coli (bacteria): 4.6 x 10^6 bases
Yeast (simple fungi): 15 x 10^6 bases
Smallest human chromosome: 50 x 10^6 bases
Entire human genome: 3 x 10^9 bases

Genetic Information
Genome: the collection of genetic information.
Chromosomes: storage units of genes.
Gene: the basic unit of genetic information. Genes determine the inherited characters.

Genes
The DNA strings include:
Coding regions ("genes"): E. coli has ~4,000 genes, yeast ~6,000, C. elegans ~20,000, and humans ~20,000 protein-coding genes.
Control regions: these are typically adjacent to the genes and determine when a gene should be expressed.
"Junk" DNA: of unknown function, ~90% of the DNA in human chromosomes.

The Cell
All cells of an organism contain the same DNA content (and the same genes), yet there is a variety of cell types.

Example: Tissues in the Stomach
How is this variety encoded and expressed?

Central Dogma
Gene --(transcription)--> mRNA --(translation)--> Protein
Cells express different subsets of their genes in different tissues and under different conditions.

Transcription
Coding sequences can be transcribed to RNA.
RNA is similar to DNA but has slightly different nucleotides: a different backbone (ribose instead of deoxyribose), and uracil (U) instead of thymine (T).
[Figure. Source: Mathews & van Holde]
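At the sequence level, transcription can be sketched as string rewriting. The mRNA has the same sequence as the coding (sense) strand with U in place of T. Equivalently, it is built complementary to the template strand, so it can also be read off as the reversed RNA complement of that strand. A minimal sketch, with hypothetical function names and both strands assumed to be given 5'->3':

```python
# DNA template base -> RNA base it pairs with (U replaces T in RNA)
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(coding_strand: str) -> str:
    """mRNA from the coding (sense) strand: same sequence, T -> U."""
    return coding_strand.upper().replace("T", "U")


def transcribe_template(template_strand: str) -> str:
    """mRNA built against the template strand (antiparallel, so reversed)."""
    return "".join(TEMPLATE_TO_RNA[b] for b in reversed(template_strand.upper()))
```

For example, the coding strand `ATGGCT` and its template `AGCCAT` both yield `AUGGCU`. This ignores promoters, termination and the processing steps on the next slide.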

Transcription: RNA Splicing
1. Transcribe to RNA.
2. Eliminate introns.
3. Splice (connect) exons. (Alternative splicing exists.)
Exons hold the information; they are more stable during evolution. This process takes place in the nucleus. The mRNA molecules then pass through the nuclear membrane into the cytoplasm.
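If the exon boundaries are known, steps 2 and 3 reduce to keeping the exon intervals and concatenating them in order. A minimal sketch, assuming exons are given as hypothetical 0-based, half-open `(start, end)` intervals; in the cell the boundaries are found from splice-site signals, which this does not model. In the toy pre-mRNA the intron starts with GU and ends with AG, the most common intron boundaries. Passing a different subset of exons mimics alternative splicing.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Keep the exon intervals [start, end) in order; drop everything else (introns)."""
    ordered = sorted(exons)
    for (_, end1), (start2, _) in zip(ordered, ordered[1:]):
        if start2 < end1:
            raise ValueError("exon intervals must not overlap")
    return "".join(pre_mrna[start:end] for start, end in ordered)


# Toy pre-mRNA: exon "AUGCC", intron "GUAUAG", exon "GAUAA"
pre = "AUGCC" + "GUAUAG" + "GAUAA"
mature = splice(pre, [(0, 5), (11, 16)])   # -> "AUGCCGAUAA"
```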

RNA Roles
Messenger RNA (mRNA): encodes protein sequences. Each three nucleotides (a codon) translate to one amino acid (the protein building block).
Transfer RNA (tRNA): decodes the mRNA molecules into amino acids. It binds to the mRNA with one side and carries the appropriate amino acid on its other side.
Ribosomal RNA (rRNA): part of the ribosome, a machine for translating mRNA into proteins. It catalyzes (like an enzyme) the reaction that attaches the amino acid carried by the tRNA to the growing amino acid chain.

Translation
Translation is mediated by the ribosome, a complex of protein and rRNA molecules. The ribosome attaches to the mRNA at a translation initiation site. It then moves along the mRNA sequence and, in the process, constructs a sequence of amino acids (a polypeptide), which is released and folds into a protein.

Genetic Code
There are 20 amino acids from which proteins are built. The code maps each of the 64 codons to one of these amino acids or to a stop signal.
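The genetic code can be stored as a lookup table from the 64 codons to one-letter amino acid codes, with `*` marking stop. Below is a minimal sketch, assuming the standard code (NCBI translation table 1, written as the usual 64-letter string over the base order U, C, A, G). As a simplification, it also assumes that translation starts at the first AUG; real initiation-site selection depends on more sequence context. `CODON_TABLE` and `translate` are hypothetical names.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code; codons enumerated with the first base varying slowest
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")
    if start == -1:
        return ""                      # no initiation site found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":                  # stop codon: release the polypeptide
            break
        protein.append(aa)
    return "".join(protein)
```

For example, `translate("GGAUGUUUGGCUAAGC")` reads AUG, UUU, GGC, stops at UAA, and returns `"MFG"`.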

Protein Structure
Proteins are polypeptides of 70-3000 amino acids. Their structure is (mostly) determined by the sequence of amino acids that make up the protein.


Evolution
Related organisms have similar DNA:
Similarity in the sequences of proteins
Similarity in the organization of genes along the chromosomes
Evolution plays a major role in biology:
Many mechanisms are shared across a wide range of organisms.
During the course of evolution, existing components are adapted for new functions.

Evolution
The evolution of new organisms is driven by:
Diversity: different individuals carry different variants of the same basic blueprint.
Mutations: the DNA sequence can be changed by single-base changes, deletion/insertion of DNA segments, etc.
Selection bias.
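The mutation types listed above can be modeled as simple edits on a DNA string. These are the same operations that sequence alignment later has to "undo" when comparing related sequences. A minimal sketch with hypothetical helper names. Positions are 0-based and chosen by the caller rather than drawn at random, so the example stays deterministic.

```python
def substitute(seq: str, pos: int, base: str) -> str:
    """Single-base change at position pos."""
    return seq[:pos] + base + seq[pos + 1:]


def insert_segment(seq: str, pos: int, segment: str) -> str:
    """Insert a DNA segment before position pos."""
    return seq[:pos] + segment + seq[pos:]


def delete_segment(seq: str, pos: int, length: int) -> str:
    """Delete length bases starting at position pos."""
    return seq[:pos] + seq[pos + length:]
```

For example, starting from `ACGT`, a substitution at position 1 gives `ATGT`, inserting `AA` at position 2 gives `ACAAGT`, and deleting two bases at position 1 gives `AT`.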

The Tree of Life
[Figure. Source: Alberts et al.]

Example of a graph-theoretic problem related to evolutionary trees: the perfect phylogeny problem

Characters in Species
A (discrete) character is a property that distinguishes between species (e.g., dental structure, a certain gene).
A character's state is a value of the character (e.g., human dental structure).
Problem: given a set of species, specified by their characters, reconstruct their evolutionary tree.

Species correspond to vertices, characters to colorings, and states to colors.
Each species is identified by its states.
Evolutionary tree: a tree containing the given vertices, with many colorings (one coloring for each character).
[Figure: species A, B, C, D on a tree, colored by the states "teeth" and "no teeth"]

Another tree for the same character
[Figure: species A, C, B, D on a tree, colored by the states "teeth" and "no teeth"]
Which tree is more reasonable?

Evolutionary trees should avoid reversal transitions: a species regains a state its direct ancestor has lost. Famous (and rare) examples: teeth in birds, legs in snakes.

Evolutionary trees should avoid convergence transitions: two species possess the same state while their least common ancestor possesses a different state. Famous example: the marsupials.


Common assumption: characters with reversal or convergent transitions are highly unlikely in the evolutionary tree. A character that exhibits neither reversals nor convergence is called homoplasy free.

A character is homoplasy free exactly when the corresponding coloring is convex (each color induces a connected subtree).
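Convexity of a total coloring can be checked directly. For every color, the vertices carrying it must induce a connected subtree, which a breadth-first search restricted to that color class can verify. A minimal sketch, assuming the tree is given as a hypothetical adjacency dict and every vertex is colored. Partial colorings, which would need a completion step, are not handled.

```python
from collections import deque


def is_convex(tree: dict[str, list[str]], coloring: dict[str, str]) -> bool:
    """True if every color class induces a connected subtree of the tree."""
    for color in set(coloring.values()):
        members = {v for v, c in coloring.items() if c == color}
        start = next(iter(members))
        seen = {start}
        queue = deque([start])
        while queue:                      # BFS that never leaves the color class
            v = queue.popleft()
            for u in tree[v]:
                if u in members and u not in seen:
                    seen.add(u)
                    queue.append(u)
        if seen != members:               # color class split into several pieces
            return False
    return True


# Hypothetical path A - B - C - D
path = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
```

On this path, coloring A, B with "teeth" and C, D with "none" is convex. Alternating "teeth", "none", "teeth", "none" is not, because "teeth" would have to be lost and regained (or arise twice).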

A partial coloring is convex if it can be completed to a (total) convex coloring 45

The Perfect Phylogeny Problem
Input: a set of species and many characters, each assigning states (colors) to the species.
Question: is there a tree T containing the species as vertices in which all the characters (colorings) are convex?

The Perfect Phylogeny Problem (combinatorial setting)
Input: colorings C_1, ..., C_k of a set of vertices. In the example there are 3 colorings (left, center, and right letter), each using the same two colors R and B, on the vertices RRB, BBR, RRR, RBR.
Problem: is there a tree T that includes these vertices such that (T, C_i) is convex for i = 1, ..., k?
NP-hard in general; in P for some special cases.
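One classical special case that is in P is the case of two-state characters, as in the R/B example. Here a perfect phylogeny exists if and only if no pair of characters shows all four combinations of states (the pairwise compatibility, or "four-gamete", condition). A minimal sketch of that test follows. It assumes each vertex is given as a string with one state letter per character and that the tree is unrooted and may contain extra vertices. The function name is hypothetical, and this is not a general algorithm: it rejects characters with more than two states.

```python
from itertools import combinations


def has_binary_perfect_phylogeny(species: list[str]) -> bool:
    """Pairwise-compatibility (four-gamete) test for two-state characters."""
    if not species:
        return True
    k = len(species[0])
    if any(len(s) != k for s in species):
        raise ValueError("every species needs one state per character")
    for c in range(k):
        if len({s[c] for s in species}) > 2:
            raise ValueError(f"character {c} has more than two states")
    for i, j in combinations(range(k), 2):
        # All four state combinations on characters i and j: no convex tree exists
        if len({(s[i], s[j]) for s in species}) == 4:
            return False
    return True
```

On the slide's example, `has_binary_perfect_phylogeny(["RRB", "BBR", "RRR", "RBR"])` returns `True`. One tree that works is the path RRB - RRR - RBR - BBR, in which all three colorings are convex. By contrast, `["RR", "RB", "BR", "BB"]` shows all four combinations and returns `False`.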