CSEP 590A Summer Lecture 4 MLE, EM, RE, Expression


CSEP 590A Summer 2006, Lecture 4: MLE, EM, RE, Expression

FYI, re HW #2: Hemoglobin History (Alberts et al., 3rd ed., p. 389)

Tonight
MLE: Maximum Likelihood Estimators
EM: the Expectation Maximization Algorithm
Bio: Gene expression and regulation
Next week: Motif description & discovery

MLE: Maximum Likelihood Estimators

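Probability Basics, I
Discrete example: sample space $\{1, 2, \ldots, 6\}$; distribution $p_1, \ldots, p_6 \ge 0$ with $\sum_{1 \le i \le 6} p_i = 1$; e.g. $p_1 = \cdots = p_6 = 1/6$.
Continuous example: sample space $\mathbb{R}$; density $f(x) \ge 0$ with $\int_{\mathbb{R}} f(x)\,dx = 1$; e.g. $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2/(2\sigma^2)}$ (a pdf, not a probability).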

Probability Basics, II

Parameter Estimation
Assuming sample $x_1, x_2, \ldots, x_n$ is from a parametric distribution $f(x \mid \theta)$, estimate $\theta$.
E.g.: $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2/(2\sigma^2)}$, with $\theta = (\mu, \sigma^2)$

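Maximum Likelihood Parameter Estimation
One (of many) approaches to parameter estimation.
Likelihood of (independent) observations $x_1, x_2, \ldots, x_n$: $L(x_1, x_2, \ldots, x_n \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta)$
As a function of $\theta$, what $\theta$ maximizes the likelihood of the data actually observed?
Typical approach: solve $\frac{\partial}{\partial\theta} L(\vec{x} \mid \theta) = 0$ or $\frac{\partial}{\partial\theta} \log L(\vec{x} \mid \theta) = 0$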

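Example 1
$n$ coin flips, $x_1, x_2, \ldots, x_n$; $n_0$ tails, $n_1$ heads, $n_0 + n_1 = n$; $\theta$ = probability of heads
$L(x_1, x_2, \ldots, x_n \mid \theta) = (1-\theta)^{n_0}\, \theta^{n_1}$
$\log L(x_1, x_2, \ldots, x_n \mid \theta) = n_0 \log(1-\theta) + n_1 \log\theta$
$\frac{\partial}{\partial\theta} \log L(x_1, x_2, \ldots, x_n \mid \theta) = \frac{-n_0}{1-\theta} + \frac{n_1}{\theta}$
Setting to zero and solving: $\hat\theta = \frac{n_1}{n}$
(Also verify it's a max, not a min, and not better on the boundary.)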
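As a quick numerical check of Example 1, the sketch below compares the closed-form estimate $n_1/n$ with a brute-force search over a grid of interior values of $\theta$. The flip data and the function names are made up for illustration.

```python
import math

def coin_log_likelihood(theta, n0, n1):
    """log L = n0*log(1 - theta) + n1*log(theta) for n0 tails and n1 heads."""
    return n0 * math.log(1.0 - theta) + n1 * math.log(theta)

def coin_mle(flips):
    """Closed-form MLE n1/n; flips is a list of 0 (tail) / 1 (head)."""
    return sum(flips) / len(flips)

# Hypothetical data: 7 heads and 3 tails.
flips = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
n1 = sum(flips)
n0 = len(flips) - n1
theta_hat = coin_mle(flips)

# Brute-force check on interior grid points 0.001, 0.002, ..., 0.999.
grid = [k / 1000 for k in range(1, 1000)]
best_on_grid = max(grid, key=lambda t: coin_log_likelihood(t, n0, n1))
print(theta_hat, best_on_grid)
```

The grid search stands in for the "verify it's a max" step; it only covers interior points, so the boundary cases $\theta = 0$ and $\theta = 1$ still need the separate argument the slide asks for.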

Ex. 2: $x_i \sim N(\mu, \sigma^2)$, $\sigma^2 = 1$, $\mu$ unknown
And verify it's a max, not a min, and not better on the boundary.
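The steps for this example follow the same pattern as Example 1. A sketch of the standard derivation, writing $\theta$ for the unknown mean $\mu$ (a notational choice made here, not taken from the slide):

```latex
\begin{align*}
L(x_1,\dots,x_n \mid \theta)
  &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}}\, e^{-(x_i-\theta)^2/2} \\
\ln L(x_1,\dots,x_n \mid \theta)
  &= \sum_{i=1}^{n} \left( -\tfrac{1}{2}\ln 2\pi - \frac{(x_i-\theta)^2}{2} \right) \\
\frac{d}{d\theta} \ln L(x_1,\dots,x_n \mid \theta)
  &= \sum_{i=1}^{n} (x_i-\theta)
   = \Big(\sum_{i=1}^{n} x_i\Big) - n\theta = 0 \\
\hat\theta &= \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}
\end{align*}
```

The second derivative is $-n < 0$ everywhere, so the stationary point is a maximum, and the sample mean is the maximum likelihood estimate of the population mean.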

Ex. 3: $x_i \sim N(\mu, \sigma^2)$, $\mu, \sigma^2$ both unknown
[Figure: surface plot over $\theta_1$ and $\theta_2$]

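Ex. 3, (cont.)
(Here $\theta_1 = \mu$ and $\theta_2 = \sigma^2$.)
$\ln L(x_1, x_2, \ldots, x_n \mid \theta_1, \theta_2) = \sum_{1 \le i \le n} \left( -\frac{1}{2} \ln 2\pi\theta_2 - \frac{(x_i - \theta_1)^2}{2\theta_2} \right)$
$\frac{\partial}{\partial\theta_2} \ln L(x_1, x_2, \ldots, x_n \mid \theta_1, \theta_2) = \sum_{1 \le i \le n} \left( -\frac{1}{2} \cdot \frac{2\pi}{2\pi\theta_2} + \frac{(x_i - \theta_1)^2}{2\theta_2^2} \right) = 0$
$\hat\theta_2 = \left( \sum_{1 \le i \le n} (x_i - \hat\theta_1)^2 \right) / n = s^2$
A consistent, but biased, estimate of population variance. (An example of overfitting.)
Unbiased estimate is: $\hat\theta_2 = \sum_{1 \le i \le n} \frac{(x_i - \hat\theta_1)^2}{n-1}$
Moral: MLE is a great idea, but not a magic bullet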
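The bias of the divide-by-$n$ estimate can be checked exactly on a toy case. The sketch below swaps the Gaussian for a fair 0/1 coin (an assumption made only so that every possible sample can be enumerated) and computes the exact expected value of both estimators with rational arithmetic. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def mle_variance(sample):
    """MLE-style estimate: squared deviations from the sample mean, divided by n."""
    n = len(sample)
    mean = Fraction(sum(sample), n)
    return sum((x - mean) ** 2 for x in sample) / n

def unbiased_variance(sample):
    """Same sum of squared deviations, divided by n - 1 instead of n."""
    n = len(sample)
    mean = Fraction(sum(sample), n)
    return sum((x - mean) ** 2 for x in sample) / (n - 1)

def expected_value(estimator, n):
    """Exact expectation over all 2**n equally likely fair-coin samples of size n."""
    samples = list(product([0, 1], repeat=n))
    return sum(estimator(s) for s in samples) / len(samples)

# A fair 0/1 coin has population variance 1/4.
n = 3
print(expected_value(mle_variance, n))       # falls short of 1/4 by a factor (n-1)/n
print(expected_value(unbiased_variance, n))  # equals the population variance
```

The shortfall factor $(n-1)/n$ tends to 1 as $n$ grows, which is why the divide-by-$n$ estimate is consistent even though it is biased.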

EM: The Expectation-Maximization Algorithm

More Complex Example
This? Or this?

Gaussian Mixture Models / Model-based Clustering
Parameters $\theta$: means $\mu_1, \mu_2$; variances $\sigma_1^2, \sigma_2^2$; mixing parameters $\tau_1$, $\tau_2 = 1 - \tau_1$
P.D.F.: $f(x \mid \mu_1, \sigma_1^2)$, $f(x \mid \mu_2, \sigma_2^2)$
Likelihood: $L(x_1, x_2, \ldots, x_n \mid \mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \tau_1, \tau_2) = \prod_{i=1}^{n} \sum_{j=1}^{2} \tau_j f(x_i \mid \mu_j, \sigma_j^2)$
No closed-form max

Likelihood Surface
[Figure: surface plot of the likelihood]

[Figure: likelihood surface]
Data: $x_i$ = -10.2, -10, -9.8, -0.2, 0, 0.2, 11.8, 12, 12.2
$\sigma^2 = 1.0$, $\tau_1 = .5$, $\tau_2 = .5$

[Figure: the same likelihood surface, with peaks labeled (-5,12), (-10,6), (12,-5), (6,-10)]
Data: $x_i$ = -10.2, -10, -9.8, -0.2, 0, 0.2, 11.8, 12, 12.2
$\sigma^2 = 1.0$, $\tau_1 = .5$, $\tau_2 = .5$
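To make the surface concrete, the sketch below evaluates the two-component mixture log-likelihood at the four labeled points, with $\sigma^2 = 1$ and $\tau_1 = \tau_2 = .5$ as on the slide. The data values assume that the first six points are negative or centered on zero (the minus signs are inferred from the peak locations). The function names are illustrative.

```python
import math

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mixture_log_likelihood(xs, mu1, mu2, var=1.0, tau1=0.5):
    """log of prod_i [tau1*f(x_i | mu1, var) + tau2*f(x_i | mu2, var)]."""
    tau2 = 1.0 - tau1
    total = 0.0
    for x in xs:
        total += math.log(tau1 * normal_pdf(x, mu1, var) + tau2 * normal_pdf(x, mu2, var))
    return total

# Assumed data: three tight clusters near -10, 0 and 12.
xs = [-10.2, -10.0, -9.8, -0.2, 0.0, 0.2, 11.8, 12.0, 12.2]

for mu1, mu2 in [(-5, 12), (-10, 6), (12, -5), (6, -10)]:
    print((mu1, mu2), mixture_log_likelihood(xs, mu1, mu2))
```

Because the two components share the same variance and mixing weight, swapping $\mu_1$ and $\mu_2$ leaves the likelihood unchanged, which is why the labeled peaks come in mirror-image pairs. This plain version can underflow for means very far from the data; a log-sum-exp formulation would be more robust.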

A What-If Puzzle
Messy: no closed-form solution is known for finding the $\theta$ that maximizes $L$.
But what if we knew the hidden data, i.e. which component each $x_i$ was drawn from?

EM as Egg vs Chicken
IF $z_{ij}$ known, could estimate parameters $\theta$
IF parameters $\theta$ known, could estimate $z_{ij}$
But we know neither; (optimistically) iterate:
E: calculate expected $z_{ij}$, given parameters
M: calculate MLE of parameters, given $E(z_{ij})$

The E-step
Assume $\theta$ known & fixed
A (B): the event that $x_i$ was drawn from $f_1$ ($f_2$)
D: the observed datum $x_i$
Expected value of $z_{i1}$ is $P(A \mid D)$, since $E = 0 \cdot P(0) + 1 \cdot P(1)$
Repeat for each $x_i$
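The quantity $P(A \mid D)$ can be written out with Bayes' rule. In the sketch below, $P(A) = \tau_1$ and $P(B) = \tau_2$ are the mixing parameters, and the densities $f_1(x_i)$, $f_2(x_i)$ stand in for $P(D \mid A)$, $P(D \mid B)$, which is the usual convention for continuous data and is assumed here:

```latex
\begin{align*}
E[z_{i1}] = P(A \mid D)
  &= \frac{P(D \mid A)\,P(A)}{P(D \mid A)\,P(A) + P(D \mid B)\,P(B)} \\
  &= \frac{\tau_1 f_1(x_i)}{\tau_1 f_1(x_i) + \tau_2 f_2(x_i)},
\qquad
E[z_{i2}] = 1 - E[z_{i1}]
\end{align*}
```

For example, a point $x_i$ that lies midway between two components with equal variances and $\tau_1 = \tau_2$ gets $E[z_{i1}] = E[z_{i2}] = 1/2$.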

The M-Step

M-step Details

EM Summary
Fundamentally a max likelihood parameter estimation problem
Useful if analysis is more tractable when 0/1 hidden data $z$ known
Iterate:
E-step: estimate $E(z)$ given $\theta$
M-step: estimate $\theta$ maximizing $E(\text{likelihood})$ given $E(z)$
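The iteration can be sketched in a few lines for the two-component Gaussian mixture. This is a minimal illustration under simplifying assumptions, not the course's reference implementation: only the two means are re-estimated, the variance and mixing weights are held fixed at $\sigma^2 = 1$ and $\tau_1 = \tau_2 = .5$, the data values are assumed (three clusters near -10, 0 and 12), and all function names are made up.

```python
import math

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def log_likelihood(xs, mus, var, taus):
    """log of prod_i sum_j tau_j * f(x_i | mu_j, var)."""
    return sum(math.log(sum(t * normal_pdf(x, m, var) for t, m in zip(taus, mus)))
               for x in xs)

def em_two_means(xs, mu_init, var=1.0, taus=(0.5, 0.5), iterations=50):
    """EM for the two means only; var and taus stay fixed (a simplification)."""
    mus = list(mu_init)
    history = [log_likelihood(xs, mus, var, taus)]
    for _ in range(iterations):
        # E-step: expected z_ij = P(x_i came from component j | x_i, current means).
        resp = []
        for x in xs:
            weighted = [t * normal_pdf(x, m, var) for t, m in zip(taus, mus)]
            total = sum(weighted)
            resp.append([w / total for w in weighted])
        # M-step: each mean becomes the E[z_ij]-weighted average of the data.
        for j in range(2):
            weight = sum(r[j] for r in resp)
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / weight
        history.append(log_likelihood(xs, mus, var, taus))
    return mus, history

# Assumed data: three tight clusters near -10, 0 and 12.
xs = [-10.2, -10.0, -9.8, -0.2, 0.0, 0.2, 11.8, 12.0, 12.2]
mus, history = em_two_means(xs, mu_init=(-1.0, 1.0))
print(mus, history[-1])
```

The `history` list records the log-likelihood after every iteration, so it can be inspected to confirm that no step makes things worse. Re-running with a different `mu_init`, for instance `(-10.0, 5.0)`, shows that the answer depends on the starting point.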

EM Issues
Under mild assumptions (sect 11.6), EM is guaranteed to increase (or at least not decrease) the likelihood with every E-M iteration, hence will converge.
But it may converge to a local, not global, max. (Recall the 4-bump surface...)
The issue is probably intrinsic, since EM is often applied to NP-hard problems (including clustering, above, and motif discovery, soon).
Nevertheless, widely used, often effective.

Relative entropy

Relative Entropy
AKA Kullback-Leibler Distance/Divergence, AKA Information Content
Given distributions $P$, $Q$: $H(P \| Q) = \sum_{x \in \Omega} P(x) \log \frac{P(x)}{Q(x)}$
Notes:
Let $P(x) \log \frac{P(x)}{Q(x)} = 0$ if $P(x) = 0$ [since $\lim_{y \to 0} y \log y = 0$]
Undefined if $0 = Q(x) < P(x)$

$\ln x \le x - 1$
[Figure: plot of $\ln x$ and $x - 1$ for $x$ up to 2.5]
$-\ln x \ge 1 - x$
$\ln(1/x) \ge 1 - x$
$\ln x \ge 1 - 1/x$

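Theorem: $H(P \| Q) \ge 0$
Proof, using $\ln x \ge 1 - 1/x$ with $x = P(x)/Q(x)$:
$H(P \| Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)} \ge \sum_x P(x) \left( 1 - \frac{Q(x)}{P(x)} \right) = \sum_x (P(x) - Q(x)) = \sum_x P(x) - \sum_x Q(x) = 1 - 1 = 0$
Furthermore: $H(P \| Q) = 0$ if and only if $P = Q$
Bottom line: bigger means more different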
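A small sketch of the definition, including the two conventions from the notes (a term is 0 when $P(x) = 0$; the sum is undefined when $Q(x) = 0 < P(x)$). The two nucleotide distributions are invented for illustration, and natural logarithms are used, so the result is in nats.

```python
import math

def relative_entropy(p, q):
    """H(P||Q) = sum_x P(x) * log(P(x)/Q(x)); p and q map outcomes to probabilities."""
    total = 0.0
    for x, px in p.items():
        if px == 0.0:
            continue  # convention: the term is 0 when P(x) = 0
        qx = q.get(x, 0.0)
        if qx == 0.0:
            raise ValueError("undefined: Q(x) = 0 < P(x)")
        total += px * math.log(px / qx)
    return total

# Hypothetical distributions over the four nucleotides.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
skewed = {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}

print(relative_entropy(skewed, uniform))
print(relative_entropy(uniform, skewed))
print(relative_entropy(uniform, uniform))
```

The first two values differ, which illustrates that relative entropy is not symmetric in $P$ and $Q$; that is one reason "distance" is a loose name for it.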

Gene Expression & Regulation

Gene Expression
Recall a gene is a DNA sequence. To say a gene is expressed means that it
1. is transcribed from DNA to RNA
2. has its mRNA processed in various ways
3. is exported from the nucleus (eukaryotes)
4. is translated into protein
A key point: not all genes are expressed all the time, in all cells, or at equal levels

Transcription
RNA polymerase complex
E. coli: 5 proteins (2α, β, β′, σ); σ is the initiation factor: it finds the promoter, then is released/replaced by elongation factors
Eukaryotes: 3 pols, each >10 subunits
Attaches to DNA, melts helix, makes RNA copy (5′→3′) of template (3′→5′) at ~30 nt/sec
(Alberts, et al.)

Some genes are heavily transcribed (but many are not). Alberts, et al.

5′ Processing: Capping
Methylated G added to 5′ end, and methyl added to ribose of 1st nucleotide of transcript
Probably helps distinguish protein-coding mRNAs from other RNA junk
Prevents degradation
Facilitates start of translation

3′ Processing: Poly A (Eukaryotes)
Transcript cleaved after AAUAAA (roughly)
Pol keeps running (until it falls off), but no 5′ cap is added to the strand downstream of the poly A site, so it's rapidly degraded
10s to 100s of A's added to 3′ end of transcript: its poly A tail

More processing: Splicing
Also in eukaryotes, most genes are spliced: protein-coding exons are interrupted by non-coding introns, which are cut out & degraded; the exons are spliced together
More details about this when we get to gene finding

Alberts, et al.

Nuclear Export
In eukaryotes, mature mRNAs are actively transported out of the nucleus & ferried to specific destinations (e.g., mitochondria, ribosomes)

Regulation
In most cells, pro- or eukaryote, there is easily a 10,000-fold difference between least- and most-highly expressed genes
Regulation happens at all steps. E.g., some transcripts can be sequestered then released, or rapidly degraded; some are weakly translated, some are very actively translated; some are highly transcribed, some are not transcribed at all
Below, focus on 1st step only: transcriptional regulation

DNA Binding Proteins
A variety of DNA binding proteins ("transcription factors"; a significant fraction, perhaps 10%?, of all human proteins) modulate transcription of protein-coding genes

The Double Helix (Los Alamos Science)

In the groove
Different patterns of potential H bonds at edges of different base pairs, accessible esp. in major groove (Alberts, et al.)

Helix-Turn-Helix DNA Binding Motif (Alberts, et al.)

H-T-H Dimers (Alberts, et al.)
Bind 2 DNA patches, ~1 turn apart
Increases both specificity and affinity

Zinc Finger Motif (Alberts, et al.)

Leucine Zipper Motif
Homo-/hetero-dimers and combinatorial control (Alberts, et al.)

Bacterial Met Repressor
A beta-sheet DNA binding domain
Negative feedback loop: high Met level represses Met synthesis genes
Met precursor (Alberts, et al.)

Summary
Learning from data:
MLE: Max Likelihood Estimators
EM: Expectation Maximization (MLE w/hidden data)
Expression & regulation:
Expression: creation of gene products
Regulation: when/where/how much of each gene product; complex and critical
Next week: using MLE/EM to find regulatory motifs in biological sequence data