CSEP 590A Summer Tonight MLE. FYI, re HW #2: Hemoglobin History. Lecture 4 MLE, EM, RE, Expression. Maximum Likelihood Estimators

Similar documents
CSEP 590A Summer Lecture 4 MLE, EM, RE, Expression

Review of Maximum Likelihood Estimators

CSE P 590 A Spring : MLE, EM

GCD3033:Cell Biology. Transcription

Name: SBI 4U. Gene Expression Quiz. Overall Expectation:

The Eukaryotic Genome and Its Expression. The Eukaryotic Genome and Its Expression. A. The Eukaryotic Genome. Lecture Series 11

Chapter 15 Active Reading Guide Regulation of Gene Expression

Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus:

Outline CSE 527 Autumn 2009

Reading Assignments. A. Genes and the Synthesis of Polypeptides. Lecture Series 7 From DNA to Protein: Genotype to Phenotype

DNA Binding Proteins CSE 527 Autumn 2007

BME 5742 Biosystems Modeling and Control

CSEP 527 Spring Maximum Likelihood Estimation and the E-M Algorithm

CSE 527 Autumn Lectures 8-9 (& part of 10) Motifs: Representation & Discovery

Eukaryotic vs. Prokaryotic genes

Gene Control Mechanisms at Transcription and Translation Levels

UNIT 6 PART 3 *REGULATION USING OPERONS* Hillis Textbook, CH 11

RNA Processing: Eukaryotic mrnas

GENE ACTIVITY Gene structure Transcription Transcript processing mrna transport mrna stability Translation Posttranslational modifications

Controlling Gene Expression

1. In most cases, genes code for and it is that

From gene to protein. Premedical biology

Lecture 7: Simple genetic circuits I

Regulation of Gene Expression

Old FINAL EXAM BIO409/509 NAME. Please number your answers and write them on the attached, lined paper.

Regulation of Transcription in Eukaryotes

Prokaryotic Regulation

15.2 Prokaryotic Transcription *

Regulation of Gene Expression

REVIEW SESSION. Wednesday, September 15 5:30 PM SHANTZ 242 E

Bi 8 Lecture 11. Quantitative aspects of transcription factor binding and gene regulatory circuit design. Ellen Rothenberg 9 February 2016

Big Idea 3: Living systems store, retrieve, transmit and respond to information essential to life processes. Tuesday, December 27, 16

Introduction. Gene expression is the combined process of :

3.B.1 Gene Regulation. Gene regulation results in differential gene expression, leading to cell specialization.

Lecture 18 June 2 nd, Gene Expression Regulation Mutations

Molecular Biology (9)

Co-ordination occurs in multiple layers Intracellular regulation: self-regulation Intercellular regulation: coordinated cell signalling e.g.

Chapter 17. From Gene to Protein. Biology Kevin Dees

From Gene to Protein

Complete all warm up questions Focus on operon functioning we will be creating operon models on Monday

GENE REGULATION AND PROBLEMS OF DEVELOPMENT

L3.1: Circuits: Introduction to Transcription Networks. Cellular Design Principles Prof. Jenna Rickus

Translation and Operons

Chapters 12&13 Notes: DNA, RNA & Protein Synthesis

Multiple Choice Review- Eukaryotic Gene Expression

Flow of Genetic Information

CS-E5880 Modeling biological networks Gene regulatory networks

Computational Biology: Basics & Interesting Problems

Chapter 18: Control of Gene Expression

Quiz answers. Allele. BIO 5099: Molecular Biology for Computer Scientists (et al) Lecture 17: The Quiz (and back to Eukaryotic DNA)

Biology. Biology. Slide 1 of 26. End Show. Copyright Pearson Prentice Hall

O 3 O 4 O 5. q 3. q 4. Transition

Regulation of Transcription in Eukaryotes. Nelson Saibo

Regulation of gene expression. Premedical - Biology

Peter Pristas. Gene regulation in eukaryotes

Name Period The Control of Gene Expression in Prokaryotes Notes

RNA Synthesis and Processing

Lecture 4: Transcription networks basic concepts

Initiation of translation in eukaryotic cells:connecting the head and tail

Villa et al. (2005) Structural dynamics of the lac repressor-dna complex revealed by a multiscale simulation. PNAS 102:

Eukaryotic Gene Expression

13.4 Gene Regulation and Expression

Three types of RNA polymerase in eukaryotic nuclei

Topic 4 - #14 The Lactose Operon

Chapter 16 Lecture. Concepts Of Genetics. Tenth Edition. Regulation of Gene Expression in Prokaryotes

9/11/18. Molecular and Cellular Biology. 3. The Cell From Genes to Proteins. key processes

Protein Synthesis. Unit 6 Goal: Students will be able to describe the processes of transcription and translation.

Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p

Section 7. Junaid Malek, M.D.

Computational Cell Biology Lecture 4

What Kind Of Molecules Carry Protein Assembly Instructions From The Nucleus To The Cytoplasm

PROTEIN SYNTHESIS INTRO

Gene regulation I Biochemistry 302. Bob Kelm February 25, 2005

16 CONTROL OF GENE EXPRESSION

Molecular Biology of the Cell

Honors Biology Reading Guide Chapter 11

HMMs and biological sequence analysis

Lecture 25: Protein Synthesis Key learning goals: Be able to explain the main stuctural features of ribosomes, and know (roughly) how many DNA and

Gene Regulation and Expression

Ch. 18 Regula'on of Gene Expression BIOL 222

9/2/17. Molecular and Cellular Biology. 3. The Cell From Genes to Proteins. key processes

Molecular Biology of the Cell

CSEP 590B Fall Motifs: Representation & Discovery

2012 Univ Aguilera Lecture. Introduction to Molecular and Cell Biology

Control of Prokaryotic (Bacterial) Gene Expression. AP Biology

Lesson Overview. Gene Regulation and Expression. Lesson Overview Gene Regulation and Expression

Motifs and Logos. Six Introduction to Bioinformatics. Importance and Abundance of Motifs. Getting the CDS. From DNA to Protein 6.1.

4. Why not make all enzymes all the time (even if not needed)? Enzyme synthesis uses a lot of energy.

What is the central dogma of biology?

The Gene The gene; Genes Genes Allele;

Biophysics Lectures Three and Four

Chapter 12. Genes: Expression and Regulation

32 Gene regulation, continued Lecture Outline 11/21/05

-14. -Abdulrahman Al-Hanbali. -Shahd Alqudah. -Dr Ma mon Ahram. 1 P a g e

12-5 Gene Regulation

CHAPTER : Prokaryotic Genetics

Introduction to Molecular and Cell Biology

APGRU6L2. Control of Prokaryotic (Bacterial) Genes

UNIT 5. Protein Synthesis 11/22/16

CHAPTER 13 PROKARYOTE GENES: E. COLI LAC OPERON

Transcription:

CSEP 59A Summer 26 Lecture 4 MLE, EM, RE, Expression FYI, re HW #2: Hemoglobin History 1 Alberts et al., 3rd ed.,pg389 2 Tonight MLE: Maximum Likelihood Estimators EM: the Expectation Maximization Algorithm Bio: Gene expression and regulation MLE Maximum Likelihood Estimators Next week: Motif description & discovery 3 4

Probability Basics, I Ex. Ex. Probability Basics, II Sample Space {1, 2,..., 6} R Distribution e.g. p 1,..., p 6 ; 1 i 6 p i = 1 f(x) >= ; f(x)dx = 1 R p 1 = = p 6 = 1/6 f(x) = 1 2πσ 2 e (x µ)2 /(2σ 2 ) pdf, not probability 5 6 Parameter Estimation Assuming sample x 1, x 2,..., x n is from a parametric distribution f(x!), estimate!. E.g.: f(x) = 1 2πσ 2 e (x µ)2 /(2σ 2 ) θ = (µ, σ 2 ) Maximum Likelihood Parameter Estimation One (of many) approaches to param. est. Likelihood of (indp) observations x 1, x 2,..., x n n L(x 1, x 2,..., x n ) = f(x i θ) i=1 As a function of!, what! maximizes the likelihood of the data actually observed Typical approach: or L( x θ) = θ log L( x θ) = θ 7 8

Example 1 n coin flips, x 1, x 2,..., x n ; n tails, n 1 heads, n + n 1 = n;! = probability of heads L(x 1, x 2,..., x n θ) = (1 θ) n θ n 1 log L(x 1, x 2,..., x n θ) = n log(1 θ) + n 1 log θ θ log L(x 1, x 2,..., x n θ) = n 1 θ + n 1 θ Setting to zero and solving: θ = n 1 n (Also verify it s max, not min, & not better on boundary) 9 Ex. 2: x i N(µ, σ 2 ), σ 2 = 1, µ unknown And verify it s max, not min & not better on boundary 1 Ex 3: x i N(µ, σ 2 ), µ, σ 2 both unknown Ex. 3, (cont.) ln L(x 1, x 2,..., x n θ 1, θ 2 ) = 1 2 ln 2πθ 2 (x i θ 1 ) 2 2θ 2 1 i n θ 2 ln L(x 1, x 2,..., x n θ 1, θ 2 ) = 1 2π + (x i θ 1 ) 2 2 2πθ 2 2θ 2 = 1 i n 2 ( ˆθ 2 = 1 i n (x i ˆθ ) 1 ) 2 /n = s 2 3 2 1.6.8 A consistent, but biased estimate of population variance. (An example of overfitting.) Unbiased estimate is: -.4 -.2.2.2.4 ˆθ 2 = 1 i n (x i ˆθ 1 ) 2 n 1! 1! 2 12.4 11 Moral: MLE is a great idea, but not a magic bullet

More Complex Example EM The Expectation-Maximization Algorithm This? Or this? 13 14 Gaussian Mixture Models / Model-based Clustering Likelihood Surface Parameters θ means µ 1 µ 2 variances σ1 2 σ2 2 mixing parameters τ 1 τ 2 = 1 τ 1.15.1 2 P.D.F. f(x µ 1, σ 2 1) f(x µ 2, σ 2 2).5 1 Likelihood L(x 1, x 2,..., x n µ 1, µ 2, σ 2 1, σ 2 2, τ 1, τ 2 ) = n i=1 2 j=1 τ jf(x i µ j, σ 2 j ) No closedform max -2-1 1-1 15 2-2 16

(-5,12) (-1,6).15.15 (12,-5).1 2.1 2.5 1.5 (6,-1) 1 x i = -2-1 1.2, 1, 9.8.2,,.2 1-1 σ 2 = 1. τ 1 =.5 τ 2 =.5 11.8, 12, 12.2 2-2 17 11.8, 12, 12.2 2-2 18 x i = -2-1 1.2, 1, 9.8.2,,.2 1-1 σ 2 = 1. τ 1 =.5 τ 2 =.5 A What-If Puzzle EM as Egg vs Chicken Messy: no closed form solution known for finding! maximizing L IF z ij known, could estimate parameters! IF parameters! known, could estimate z ij But we know neither; (optimistically) iterate: E: calculate expected z ij, given parameters But what if we knew the hidden data? 19 M: calc MLE of parameters, given E(z ij ) 2

The E-step Assume! known & fixed A (B): the event that x i was drawn from f 1 (f 2 ) D: the observed datum x i Expected value of z i1 is P(A D) E = P () + 1 P (1) The M-Step } Repeat for each x i 22 21 M-step Details EM Summary Fundamentally a max likelihood parameter estimation problem Useful if analysis is more tractable when /1 hidden data z known Iterate: E-step: estimate E(z) given! M-step: estimate! maximizing E(likelihood) given E(z) 23 24

EM Issues Under mild assumptions (sect 11.6), EM is guaranteed to increase likelihood with every E-M iteration, hence will converge. But may converge to local, not global, max. (Recall the 4-bump surface...) Issue is probably intrinsic, since EM is often applied to NP-hard problems (including clustering, above, and motif-discovery, soon) Nevertheless, widely used, often effective Relative entropy 25 26 Relative Entropy ln x x 1 AKA Kullback-Liebler Distance/Divergence, AKA Information Content Given distributions P, Q Notes: H(P Q) = x Ω P (x) log P (x) Q(x) Let P (x) log P (x) = if P (x) = [since lim y log y = ] Q(x) y 1-1 -2.5 1 1.5 2 2.5 ln x 1 x ln(1/x) 1 x ln x 1 1/x Undefined if = Q(x) < P (x) 27 28

Theorem: H(P Q) H(P Q) = P (x) x P (x) log Q(x) ( ) x P (x) 1 Q(x) P (x) = x (P (x) Q(x)) = x P (x) x Q(x) = 1 1 = Gene Expression & Regulation Furthermore: H(P Q) = if and only if P = Q Bottom line: bigger means more different 29 3 Gene Expression Recall a gene is a DNA sequence To say a gene is expressed means that it 1. is transcribed from DNA to RNA 2. the mrna is processed in various ways 3. is exported from the nucleus (eukaryotes) 4. is translated into protein A key point: not all genes are expressed all the time, in all cells, or at equal levels Transcription RNA polymerase complex E. coli: 5 proteins (2", #, #, $) $ is initiation factor; finds promoter, then released/replaced by elongation factors Eukaryotes: 3 pols, each >1 subunits attaches to DNA, melts helix, makes RNA copy (5 % 3 ) of template (3 % 5 ) at ~3nt/sec 31 32

Some genes are heavily transcribed (but many are not). 5 Processing: Capping methylated G added to 5 end, and methyl added to ribose of 1st nucleotide of transcript probably helps distinguish protein-coding mrnas from other RNA junk prevents degradation facilitates start of translation 34 3 Processing: Poly A More processing: Splicing Transcript cleaved after AAUAAA (roughly) pol keeps running (until it falls off) but no 5 cap added to strand downstream of poly A site, so it s rapidly degraded Also in eukaryotes, most genes are spliced: protein coding exons are interrupted by non-coding introns, which are cut out & degraded, exons spliced together 1s - 1s of A s added to 3 end of transcript - its poly A tail More details about this when we get to gene finding (Eukaryotes) 35 36

Nuclear Export In eukaryotes, mature mrnas are actively transported out of the nucleus & ferried to specific destinations (e.g., mitochondria, ribosomes) 38 Regulation In most cells, pro- or eukaryote, easily a 1,-fold difference between least- and most-highly expressed genes Regulation happens at all steps. E.g., some transcripts can be sequestered then released, or rapidly degraded, some are weakly translated, some are very actively translated, some are highly transcribed, some are not transcribed at all Below, focus on 1st step only: transcriptional regulation DNA Binding Proteins A variety of DNA binding proteins ( transcription factors ; a significant fraction, perhaps 1%?, of all human proteins) modulate transcription of protein coding genes 39 4

In the groove The Double Helix Different patterns of potential H bonds at edges of different base pairs, accessible esp. in major groove 42 41 Los Alamos Science Helix-Turn-Helix DNA Binding Motif H-T-H Dimers Bind 2 DNA patches, ~ 1 turn apart Increases both specificity and affinity 43 44

Leucine Zipper Motif Zinc Finger Motif Homo-/hetero-dimers and combinatorial control 45 Bacterial Met Repressor 46 Summary a beta-sheet DNA binding domain Negative feedback loop: high Met level repress Met synthesis genes Learning from data: MLE: Max Likelihood Estimators EM: Expectation Maximization (MLE w/hidden data) Expression & regulation Expression: creation of gene products Regulation: when/where/how much of each gene product; complex and critical Next week: using MLE/EM to find regulatory motifs in biological sequence data Met precursor 47 48