Advanced Algorithms and Models for Computational Biology

Similar documents
Machine Learning. Theory of Classification and Nonparametric Classifier. Lecture 2, January 16, What is theoretically the best classifier

Procedure to Create NCBI KOGS

11/24/13. Science, then, and now. Computational Structural Bioinformatics. Learning curve. ECS129 Instructor: Patrice Koehl

Computational Structural Bioinformatics

ECOL/MCB 320 and 320H Genetics

Related Courses He who asks is a fool for five minutes, but he who does not ask remains a fool forever.

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Evaluation. Course Homepage.

16 The Cell Cycle. Chapter Outline The Eukaryotic Cell Cycle Regulators of Cell Cycle Progression The Events of M Phase Meiosis and Fertilization

Welcome to BIOL 572: Recombinant DNA techniques

Types of biological networks. I. Intra-cellurar networks

Genomes and Their Evolution

6.096 Algorithms for Computational Biology. Prof. Manolis Kellis

Introduction to Bioinformatics. Shifra Ben-Dor Irit Orr

Introduction to Bioinformatics

Graph Alignment and Biological Networks

Introduction to Bioinformatics Integrated Science, 11/9/05

Bacterial Genetics & Operons

BSC 4934: QʼBIC Capstone Workshop" Giri Narasimhan. ECS 254A; Phone: x3748

How much non-coding DNA do eukaryotes require?

L3.1: Circuits: Introduction to Transcription Networks. Cellular Design Principles Prof. Jenna Rickus

CGS 5991 (2 Credits) Bioinformatics Tools

Plant Molecular and Cellular Biology Lecture 8: Mechanisms of Cell Cycle Control and DNA Synthesis Gary Peter

Chapter 15 Active Reading Guide Regulation of Gene Expression

Biological Pathways Representation by Petri Nets and extension

Understanding Science Through the Lens of Computation. Richard M. Karp Nov. 3, 2007

Unit 5: Cell Division and Development Guided Reading Questions (45 pts total)

Lecture 10: Cyclins, cyclin kinases and cell division

Computational Biology: Basics & Interesting Problems

BioControl - Week 6, Lecture 1

Signal Transduction. Dr. Chaidir, Apt

growth, the replacement of damaged cells, and development from an embryo into an adult. These cell division occur by means of a process of mitosis

3.B.1 Gene Regulation. Gene regulation results in differential gene expression, leading to cell specialization.

Biology 105/Summer Bacterial Genetics 8/12/ Bacterial Genomes p Gene Transfer Mechanisms in Bacteria p.

Introduction to Molecular and Cell Biology

Valley Central School District 944 State Route 17K Montgomery, NY Telephone Number: (845) ext Fax Number: (845)

Map of AP-Aligned Bio-Rad Kits with Learning Objectives

2012 Univ Aguilera Lecture. Introduction to Molecular and Cell Biology

Introduction. Gene expression is the combined process of :

!"#$%&!"&'(&%")(*(+& '4567,846/-&*,/69450.:&*,.;42/9450&<&#7595.=097,.4.& #259,40& &95&523/0,--,.

Review: mostly probability and some statistics

Curriculum Map. Biology, Quarter 1 Big Ideas: From Molecules to Organisms: Structures and Processes (BIO1.LS1)

Recitation 2: Probability

Chapter 11. Development: Differentiation and Determination

Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p

Genome Sequences and Evolution

active site Region of an enzyme surface to which a substrate molecule binds in order to undergo a catalyzed reaction.

Boolean models of gene regulatory networks. Matthew Macauley Math 4500: Mathematical Modeling Clemson University Spring 2016

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting.

AP Biology Gene Regulation and Development Review

Three types of RNA polymerase in eukaryotic nuclei

The Eukaryotic Genome and Its Expression. The Eukaryotic Genome and Its Expression. A. The Eukaryotic Genome. Lecture Series 11

CHAPTER 13 PROKARYOTE GENES: E. COLI LAC OPERON

Warm Up. What are some examples of living things? Describe the characteristics of living things

Honors Biology Reading Guide Chapter 11

You are required to know all terms defined in lecture. EXPLORE THE COURSE WEB SITE 1/6/2010 MENDEL AND MODELS

Biology: Life on Earth

Introduction to cells

Biology. Biology. Slide 1 of 26. End Show. Copyright Pearson Prentice Hall

Chapter 18 Lecture. Concepts of Genetics. Tenth Edition. Developmental Genetics

Classical AI and ML research ignored this phenomena Another example

Mouth animalcules (bacteria)

Lesson Overview. Gene Regulation and Expression. Lesson Overview Gene Regulation and Expression

Principles of Genetics

CHAPTER : Prokaryotic Genetics

GSBHSRSBRSRRk IZTI/^Q. LlML. I Iv^O IV I I I FROM GENES TO GENOMES ^^^H*" ^^^^J*^ ill! BQPIP. illt. goidbkc. itip31. li4»twlil FIFTH EDITION

Genetic Variation: The genetic substrate for natural selection. Horizontal Gene Transfer. General Principles 10/2/17.

Enduring understanding 1.A: Change in the genetic makeup of a population over time is evolution.

Analysis of Escherichia coli amino acid transporters

BE/APh161: Physical Biology of the Cell Homework 2 Due Date: Wednesday, January 18, 2017

Molecular evolution - Part 1. Pawan Dhar BII

Proteomics. Yeast two hybrid. Proteomics - PAGE techniques. Data obtained. What is it?

full file at

Unit 2: Cellular Chemistry, Structure, and Physiology Module 5: Cellular Reproduction

A A A A B B1

The Minimal-Gene-Set -Kapil PHY498BIO, HW 3

Basic modeling approaches for biological systems. Mahesh Bule

Gene Regulation and Expression

Name: SBI 4U. Gene Expression Quiz. Overall Expectation:

Developmental genetics: finding the genes that regulate development

Bioinformatics 2 - Lecture 4

The Gene The gene; Genes Genes Allele;

2 The Proteome. The Proteome 15

Frequently Asked Questions (FAQs)

Introduction Biology before Systems Biology: Reductionism Reduce the study from the whole organism to inner most details like protein or the DNA.

Pyrobayes: an improved base caller for SNP discovery in pyrosequences

AP Bio Module 16: Bacterial Genetics and Operons, Student Learning Guide

Big Idea 3: Living systems store, retrieve, transmit and respond to information essential to life processes. Tuesday, December 27, 16

Big Idea 1: The process of evolution drives the diversity and unity of life.

AP Curriculum Framework with Learning Objectives

5.1 Cell Division and the Cell Cycle

Model plants and their Role in genetic manipulation. Mitesh Shrestha

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools

Biology 112 Practice Midterm Questions

Cellular Organization

Reading Assignments. A. Systems of Cell Division. Lecture Series 5 Cell Cycle & Cell Division

Lecture Series 5 Cell Cycle & Cell Division

Gene expression in prokaryotic and eukaryotic cells, Plasmids: types, maintenance and functions. Mitesh Shrestha

Statistics for scientists and engineers

Molecular Developmental Physiology and Signal Transduction

Basics on Probability. Jingrui He 09/11/2007

Transcription:

Advanced Algorithms and Models for Computational Biology Introduction to cell biology genomics development and probability Eric ing Lecture January 3 006 Reading: Chap. DTM book Introduction to cell biology functional genomics development etc.

Model Organisms Bacterial hage: T4

Bacteria: E. Coli The Budding Yeast: Saccharomyces cerevisiae 3

The Fission Yeast: Schizosaccharomyces pombe The Nematode: Caenorhabditis elegans SMALL: ~ 50 µm TRANSARENT 959 CELLS 300 NEURONS SHORT GENERATION TIME SIMLE GROWTH MEDIUM SELF- FERTILIZING HERMAHRODITE RAID ISOLATION AND CLONING OF MULTILE TYES OF MUTANT ORGANISMS 4

The Fruit Fly: Drosophila Melanogaster The Mouse transgenic for human growth hormone 5

rokaryotic and Eukaryotic Cells A Close Look of a Eukaryotic Cell The structure: The information flow: 6

Cell Cycle Signal Transduction A variety of plasma membrane receptor proteins bind extracellular signaling molecules and transmit signals across the membrane to the cell interior 7

Signal Transduction athway Functional Genomics and -omics 8

A Multi-resolution View of the Chromosome DNA Content of Representative Types of Cells Organism ROKARYOTIC Mycoplasma genitalum Bacterium Helicobacter pylori Bacterium Haemophilus influenza Bacterium EUKARYOTIC Number of base pairs millions 0.58.67.83 Number of encoded proteins 470 590 743 Number of chromosomes Saccharomyces cerevisiae yeast 5885 7 Drosophila melanogaster insect 65 360 4 Caenorhabditis elegans worm 97 9099 6 Homo sapiens human 900 30000 TO 40000 3 Arabidopsis thaliana plant 5 5498 0 9

Functional Genomics The various genome projects have yielded the complete DNA sequences of many organisms. E.g. human mouse yeast fruitfly etc. Human: 3 billion base-pairs 30-40 thousand genes. Challenge: go from sequence to function i.e. define the role of each gene and understand how the genome functions as a whole. Regulatory Machinery of Gene Expression motif 0

Classical Analysis of Transcription Regulation Interactions Gel shift : electorphoretic mobility shift assay EMSA for DNA-binding proteins * rotein-dna complex * Free DNA probe Advantage: sensitive Disadvantage: requires stable complex; little structural information about which protein is binding Modern Analysis of Transcription Regulation Interactions Genome-wide Location Analysis ChI-chip Advantage: High throughput Disadvantage: Inaccurate

Gene Regulatory Network Biological Networks and Systems Biology rotein-protein Interaction networks Regulatory networks Gene Expression networks Systems Biology: understanding cellular event under a systemlevel context Metabolic networks Genome + proteome + lipome +

Gene Regulatory Functions in Development Temporal-spatial Gene Regulation and Regulatory Artifacts A normal fly Hopeful monster? 3

Gene Regulation and Carcinogenesis oncogenetic stimuli ie. Ras p4 time required for DNA repaircell damage severe DNA damage M G 0 or G G p53 p53 Cancer! p6 S p5 p extracellular stimuli TGF-β activates Inhibits activates activates transcriptional activation Cdk + EF - hosphorylation of Rb Cycli CNA n DNA repair Rb Gadd45 + E A B G CNA not cycle specific romotes Apoptosis Fas TNF TGF-β... The athogenesis of Cancer Normal BCH DYS CIS SCC 4

Genetic Engineering: Manipulating the Genome Restriction Enzymes naturally occurring in bacteria that cut DNA at very specific places. Recombinant DNA 5

Transformation Formation of Cell Colony 6

How was Dolly cloned? Dolly is an exact genetic replica of another sheep. Definitions Recombinant DNA: Two or more segments of DNA that have been combined by humans into a sequence that does not exist in nature. Cloning: Making an exact genetic copy. A clone is one of the exact genetic copies. Cloning vector: Self-replicating agents that serve as vehicles to transfer and replicate genetic material. 7

Software and Databases NCBI/NLM Databases Genbank ubmed DB DNA rotein rotein 3D Literature Introduction to robability fx µ x 8

Basic robability Theory Concepts A sample space S is the set of all possible outcomes of a conceptual or physical repeatable experiment. S can be finite or infinite. E.g. S may be the set of all possible nucleotides of a DNA site: S { AT CG} A random variable is a function that associates a unique numerical value a token with every outcome of an experiment. The value of the r.v. will vary from trial to trial as the experiment is repeated ω S E.g. seeing an "A" at a site = o/w =0. This describes the true or false outcome a random event. Can we describe richer outcomes in the same way? i.e. = 3 4 for being A C G T --- think about what would happen if we take expectation of. Unit-Base Random vector i =[ ia it ig ic ] T i =[000] T seeing a "G" at site i ω Basic rob. Theory Concepts ctd In the discrete case a probability distribution on S and hence on the domain of is an assignment of a non-negative real number s to each s S or each valid value of x such that Σ s S s=. 0 s intuitively s corresponds to the frequency or the likelihood of getting s in the experiments if repeated many times call θ s = s the parameters in a discrete probability distribution A probability distribution on a sample space is sometimes called a probability model in particular if several different distributions are under consideration write models as M M probabilities as M M e.g. M may be the appropriate prob. dist. if is from "splice site" M is for the "background". M is usually a two-tuple of {dist. family dist. parameters} 9

Discrete Distributions Bernoulli distribution: Berp p for x = 0 x = p for x = x x Multinomial distribution: Multθ A Multinomial indicator variable: C = G T p x j = Multinomial distribution: Multnθ where = [ 0 ] j and = w.p. θ { j = where j index the observed nucleotide} xa xc xg xt xk x j = θa θc θg θt = θ k θ = θ = x Count variable: = M where x j = n j x K x x x θ θ LθK =! LxK! x! x k x = p p n! n! K p x = θ x! x! Lx! K j x j j j [ACGT] j j [ACGT] = θ =. Basic rob. Theory Concepts ctd A continuous random variable can assume any value in an interval on the real line or in a region in a high dimensional space usually corresponds to a real-valued measurements of some property e.g. length position It is not possible to talk about the probability of the random variable assuming a particular value --- x = 0 Instead we talk about the probability of the random variable assuming a value within a given interval or half interval [ x x ] < x = [ x ] The probability of the random variable assuming a value within some given interval from x to x is defined to be the area under the graph of the probability density function between x and x. x robability mass: [ x x] = p x dx note that Cumulative distribution function CDF: robability density function DF: x p x = x = d dx x < x = x + p x dx =. p x ' dx ' 0

Continuous Distributions Uniform robability Density Function p x = / b a for a x b = 0 elsewhere Normal robability Density Function p x = x µ / σ e πσ The distribution is symmetric and is often illustrated x µ as a bell-shaped curve. Two parameters µ mean and σ standard deviation determine the location and shape of the distribution. The highest point on the normal curve is at the mean which is also the median and mode. fx The mean can be any numerical value: negative zero or positive. Exponential robability Distribution x / µ density : p x = x / µ e o CDF : x x = e µ fx.4.3.. x < = area =.4866 x 0 3 4 5 6 7 8 9 0 Time Between Successive Arrivals mins. Statistical Characterizations Expectation: the center of mass mean value first moment: Sample mean: E = i i S x p x i xp x dx discrete continuous Variance: the spreadness second moment: Var = x S [ x E ] p x i [ x E ] p x dx i discrete continuous Sample variance

Basic rob. Theory Concepts ctd Joint probability: For events E i.e. =x and H say Y=y the probability of both events are true: E and H := xy Conditional probability The probability of E is true given outcome of H E and H := x y Marginal probability The probability of E is true regardless of the outcome of H E := x=σ x xy utting everything together: x y = xy/y Independence and Conditional Independence Recall that for events E i.e. =x and H say Y=y the conditional probability of E given H written as E H is E and H/H = the probability of both E and H are true given H is true E and H are statistically independent if E = E H i.e. prob. E is true doesn't depend on whether H is true; or equivalently E and H=EH. E and F are conditionally independent given H if E HF = E H or equivalently EF H = E HF H

3 = 5 4 3 6 4 3 5 3 4 3 6 5 4 3 = = 6 5 4 3 6 5 4 3 i i 3 4 5 6 p 6 5 p p 5 4 p 4 p p 3 3 4 5 6 = 3 4 5 4 6 5 Representing multivariate dist. Joint probability dist. on multiple variables: If i 's are independent: i = i If i 's are conditionally independent the joint can be factored to simpler products e.g. The Graphical Model representation The Bayesian Theory The Bayesian Theory: e.g. for date D and model M M D = D MM/D the posterior equals to the likelihood times the prior up to a constant. This allows us to capture uncertainty about the model in a principled way