Sequence Analysis. BBSI 2006: Lecture #(χ+1) Takis Benos (2006) BBSI MAY P. Benos. Molecular Genetics 101. Cell s internal world

Size: px
Start display at page:

Download "Sequence Analysis. BBSI 2006: Lecture #(χ+1) Takis Benos (2006) BBSI MAY P. Benos. Molecular Genetics 101. Cell s internal world"

Transcription

1 Sequence Analysis BBSI 2006: ecture #χ+ Takis Benos 2006 Molecular Genetics 0 Cell s internal world

2 DNA - Chromosomes - Genes We cannot define it but we know it when we see it A loose definition: What is a gene? Gene is a DNA/RNA information unit that is able to perform a function in a cellular environment Central Dogma: rotein coding genes transcribed translated DNA RNA protein 2

3 Open Reading Frames ORFs aatagcgaat tttccaacga caaaagctaa atatcgcaaa aacctcagta aaaatcttgc 60 tggagctatt attgctaagt aacatttacc ccctgaagtt aatggatcaa tcaagagaga 20 tgtgggctgt aatgaatcgt cttattgaat taacaggttg gatcgttctt gtcgtttcag 80 tcattcttct tggcgtggcg agtcacattg acaactatca gccacctgaa cagagtgctt 240 cggtacaaca caagtaagct ctgcacttgt ggagcgacat gctgcccgtc cgggtgcatg 300 ttttcacttg tcggatatta aaccaggaat ttattatctt gttcgatgtt gtaataaa 358 Open Reading Frames ORFs aatagcgaat tttccaacga caaaagctaa atatcgcaaa aacctcagta aaaatcttgc 60 tggagctatt attgctaagt aacatttacc ccctgaagtt aatggatcaa tcaagagaga 20 tgtgggctgt aatgaatcgt cttattgaat taacaggttg gatcgttctt gtcgtttcag 80 M N R tcattcttct tggcgtggcg agtcacattg acaactatca gccacctgaa cagagtgctt 240 cggtacaaca caagtaagct ctgcacttgt ggagcgacat gctgcccgtc cgggtgcatg 300 ttttcacttg tcggatatta K stop aaccaggaat ttattatctt gttcgatgtt gtaataaa 358 MNRIETGWIVVVSVIGVASHIDNYQEQSASVQHK Gene s characteristics aatagcgaat tttccaacga caaaagctaa atatcgcaaa aacctcagta aaaatcttgc 60 tggagctatt attgctaagt aacatttacc ccctgaagtt aatggatcaa tcaagagaga 20 tgtgggctgt aatgaatcgt cttattgaat taacaggttg gatcgttctt gtcgtttcag 80 tcattcttct tggcgtggcg agtcacattg acaactatca gccacctgaa cagagtgctt 240 cggtacaaca caagtaagct ctgcacttgt ggagcgacat gctgcccgtc cgggtgcatg 300 ttttcacttg tcggatatta aaccaggaat ttattatctt gttcgatgtt gtaataaa 358 ATG TAA promoter 5 UTR ORF 3 UTR mrna 3

4 Transcription cugaaguu aauggaucaa ucaagagaga 20 ugugggcugu aaugaaucgu cuuauugaau uaacagguug gaucguucuu gucguuucag 80 ucauucuucu uggcguggcg agucacauug acaacuauca gccaccugaa cagagugcuu 240 cgguacaaca caaguaagcu cugcacuugu ggagcgacau gcugcccguc cgggugcaug 300 uuuucacuug ucggauauua aaccaggaau uuauuaucuu guucgauguu guaauaaa 358 CA cuga aauaaa aaaaaaa olya tail Translation cuga aauaaa aaaaaaa MNRIETGWIVVVSVIGVASHIDNYQEQSASVQHK rotein coding genes cntd Source: OTSYn.html 4

5 Genes and roteins Alterations of the DNA Base substitutions: silent: no a.a. replacement missense: a.a. replacement non-sense: a.a. stop codon replacement Alterations of the DNA cntd UGU UGC silent UGA non-sense UGG missense Source: 5

6 Inheritance - genetic differences Molecular evolution Two species will acquire mutations proportionally to their divergence time. However: all proteins do not change in the same pace a given protein does not necessarily change in the same pace throughout time different parts of the same protein change at different paces Molecular evolution cntd Human CA_HUMAN; 0508 vs. Drosophila CA_IG; 062 Query: MAKGRSVVKGYQTFSAREGGRRVTGEGAGISTRSRFNEISGDNGW 60 MA+G RSVVKG Q FSARE G RV TGEGA IST++RF+EISGDNGW+ Sbjct: MARGARSVVKGCQFSARECGHRVGTGEGACISTKTRFSEISGDNGWI 60 Query: 6 NYHFWRETGTHKVHHHVQNFQKYGIYREKGNVESVYVIDEDVAFKSEGNER 20 NY FW+E GT K+H HHVQNFQKYGIYREKGN+ESVY+IDEDVAFK EGNER Sbjct: 6 NYRFWKEKGTQKIHYHHVQNFQKYGIYREKGNESVYIIDEDVAFKFEGNER 20 Query: 2 FIWVAYHQYYQRIGVKKSAAWKKDRVANQEVMAEATKNFDAVSRDFVS 80 + IWVAYHQ+YQ++GVKKS AWKKDR+ N EVMAEA KNF+D VS+DFV Sbjct: 2 YNIWVAYHQHYQKVGVKKSGAWKKDRVNTEVMAEAIKNFIDTVSQDFVG 80 Query: 8 VHRRIKKAGSGNYSGDISDDFRFAFESITNVIFGERQGMEEVVNEAQRFIDAIYQM 240 VHRRIK+ GSG +SGDI +DFRFAFESITNVIFGER GMEE+V+EAQ+FIDA+YQM Sbjct: 8 VHRRIKQQGSGKFSGDIREDFRFAFESITNVIFGERGMEEIVDEAQKFIDAVYQM 240 Query: 24 FHTSVMNDFRFRTKTWKDHVAAWDVIFSKADIYTQNFYWERQKGSVHHDYRG 300 FHTSVMNDFRFRTKTW+DHVAAWD IF+KA+ YTQNFYW+R+K ++Y G Sbjct: 24 FHTSVMNDFRFRTKTWRDHVAAWDTIFNKAEKYTQNFYWDRRKRE-FNNYG 299 Query: 30 MYRGDSKMSFEDIKANVTEMAGGVDTTSMTQWHYEMARNKVQDMRAEVAAR 360 +YRG+ K+ ED+KANVTEMAGGVDTTSMTQWHYEMAR+ VQ+MR EV AR Sbjct: 300 IYRGNDKSEDVKANVTEMAGGVDTTSMTQWHYEMARSNVQEMREEVNAR 359 Query: 36 HQAQGDMATMQVKASIKETRHISVTQRYVNDVRDYMIAKTVQVAIY 420 QAQGD + MQVKASIKETRHISVTQRYVNDVRDYMIAKTVQVA+Y Sbjct: 360 RQAQGDTSKMQVKASIKETRHISVTQRYVNDVRDYMIAKTVQVAVY 49 Query: 42 AGRETFFFDENFDTRWSKDKNITYFRNGFGWGVRQCGRRIAEEMTIFINM 480 A+GR+ FF + FDTRW K++++ +FRNGFGWGVRQC+GRRIAEEMT+FI++ Sbjct: 420 AMGRDAFFSNGQFDTRWGKERDIHFRNGFGWGVRQCVGRRIAEEMTFIHI 479 Query: 48 ENFRVEIQHSDVGTTFNIMEKISFTFWFNQEATQ 520 ENF+VE+QH SDV T FNIM+KI F FNQ+ Q Sbjct: 480 ENFKVEQHFSDVDTIFNIMDKIFVFRFNQDQ 59 6

7 Molecular evolution cntd Human CA_HUMAN; 0508 vs. zebrafish Cypa; Q8JH93 Query: 34 TGEGAGISTRSRFNEISGDNGWNYHFWRETGTHKVHHHVQNFQKYGIYREK 93 T G + +FN+I N ++ F + G VH V NF+ +GIYREK+ Sbjct: 27 TRSGRAQNSTVQFNKIGRWRNSSVAFTKMGGRNVHRIMVHNFKTFGIYREKV 86 Query: 94 GNVESVYVIDEDVAFKSEGNERFIWVAYHQYYQRIGVKKSAAWKKDRVA 53 G +SVY+I ED A+FK+EG + R + W AY Y + GVK+ AWK DR+ Sbjct: 87 GIYDSVYIIKEDGAIFKAEGHHNRINVDAWTAYRDYRNQKYGVKEGKAWKTDRMI 46 Query: 54 NQEVMAEATKNFDAVSRDFVSVHRRIKKAGSGNYSGDISDDFRFAFESITNV 23 N+E++ + F+D V +DFV+ ++++I+++G ++ D++ DFRF+ ES++ V Sbjct: 47 NKEKQGTFVDEVGQDFVARVNKQIERSGQKQWTTDTHDFRFSESVSAV 206 Query: 24 IFGERQGMEEVVNEAQRFIDAIYQMFHTSVMNDFRFRTKTWKDHVAAWDVI GER G+ + ++E Q FID + MF T+ M R + WK+HV AWD I Sbjct: 207 YGERGDNIDEFQHFIDCVSVMFKTTSMYGRSIGSNIWKNHVEAWDGI 266 Query: 274 FSKADIYTQNFYWERQKGSVHHDYRGMYRGDSKMSFEDIKANVTEMAGGVDTTSM 333 F++AD QN Y G+ K+S EDIKA+VTE++AGGVD+ + Sbjct: 267 FNQADRCIQNIFKQWKENEGNGKYGVAIMQDKSIEDIKASVTEMAGGVDSVTF 326 Query: 334 TQWHYEMARNKVQDMRAEVAARHQAQGDMATMQVKASIKETRHISVT 393 T W YE+AR +QD RAE+ AAR +GDM M++++KA++KETRH++++ Sbjct: 327 TWTYEARQDQDERAEISAARIAFKGDMVQMVKMIKAAKETRHVAMS 386 Query: 394 QRYVNDVRDYMIAKTVQVAIYAGRETFFFDENFDTRWSKDKNITYFRN 453 RY+ D V+++Y IA TVQ+ +YA+GR+ FF E + +RW+S ++ YF++ Sbjct: 387 RYITEDTVIQNYHIAGTVQGVYAMGRDHQFFKEQYCSRWISSNRQ--YFKS 444 Query: 454 GFGWGVRQCGRRIAEEMTIFINMENFRVEIQHSDVGTTFNIMEKISFTFW 53 GFG+G RQCGRRIAE EM IFI+MENFR+E Q +V + F +MEKI T Sbjct: 445 GFGFGRQCGRRIAETEMQIFIHMENFRIEKQKQIEVRSKFEMEKIITIK 504 Query: 54 FN 55 N Sbjct: 505 N 506 Transcription regulation promoter region expression rates degradation post modifications Splicing genomic DNA intron- intron-2 intron-3 mrna precursor mrna 7

8 trna ribosomal RNA snorna microrna etc Non-coding genes Source: Elements of robability Theory with examples Outline Conditional robabilities Markov Chains Hidden Markov Models Information measures 8

9 robabilities Definition: x # favourable outcomes x total # possible outcomes Conditional robabilities Definition: xa # favourable outcomes x given A total # possible outcomes given A Conditional robabilities cntd Example: outcome x: tomorrow I ll go hiking event A: tomorrow will be sunny x xa A NOT x 9

10 Conditional robabilities cntd Joint probability:,y Y Y If Y then,y independent,y Y Marginal probability:,y YY Y Y Conditional robabilities cntd Bayes theorem Y Y Y Y Y! x osterior probabilities are the compromise between data and prior information. Bayes: Application- roblem from Durbin et al., 998: A rare genetic disease is discovered with population frequency one in million. An extremely good genetic test is 00% sensitive always correct if you have the disease and 99.99% specific false positive rate 0.0%. Will you be willing to take such a test? Hint: What is the probability that you have the disease, if the test is positive? 0

11 Bayes: Application- cntd Answer: D + + D D / +.0 * 0-6 / [.0 * * ] roblem: Application of Bayes-2 Given a set of transmembrane proteins with specified membrane domains of length training set, can you develop a probabilistic model that predicts which parts of a new transmembrane protein are likely to be membrane domains? Application of Bayes-2 cntd Solution: Suppose that we suspect that the amino acid frequencies differ between membrane and nonmembrane regions. Using the training set, calculate the probabilities, a i D, that each amino acid a i is part of a membrane domain D. Also, using the nonmembrane parts, calculate the corresponding probabilities, a i not D.

12 2 Application of Bayes-2 cntd Solution cntd: Divide the new protein into segments. Using Bayes theorem, calculate the posterior probability of each segment being a membrane domain using the a i D.! d M d d M D D M M M arg max : ; Markov chains What is a Markov chain? Markov chain of order n is a stochastic process of a series of outcomes, in which the probability of outcome x depends on the state of the previous n outcomes. Markov chains cntd Markov chain of first order: Transition probabilities: i i-! i i i x ,...,,...,,...,,

13 Application of Markov chains roblem from Durbin et al.: CpG islands Given two sets of sequences from the human genome, one with CpG islands and one without, can you calculate a model that can predict the CpG islands? Application of Markov chains cntd Solution: Application of Markov chains cntd Histogram of scores CpG islands: CpG other 3

14 Hidden Markov Models What is a Hidden Markov Model? A Markov process in which the probability of an outcome depends also in a hidden random variable state. Transition probability: the probability of reaching a state given the previous state. Emission probability: the probability of an outcome given the state. Hidden Markov Models cntd Graphical representation of the HMM: CpG islands transition probabilities Question: Where is the Markov process here? Application of HMMs roblem from Durbin et al.: dishonest casino Fair : /6 2: /6 : /0 2: / : /6 4: /6 5: /6 6: /6 0. 3: /0 4: /0 5: /0 6: /2 oaded 4

15 Application of HMMs cntd roblem from Durbin et al.: dishonest casino Given the previous model and 2 a series of die rolls x i, i,,, can we predict which of the rolls are coming from the fair and which from the loaded die? Question: What is hidden here? Application of HMMs cntd Answer: YES Viterbi algorithm best path Forward-backward algorithm probability of state k in outcome x i HMMs: Viterbi algorithm Viterbi predictions: 300 rolls of die 5

16 HMMs in biology General comments: Usually the structure of the model is unknown The transition and emission probabilities are calculated based on trusted training sets and the postulated model redictions are based on the Viterbi or the forward-backward algorithm, depending on the question asked Definitions: Information measures Entropy: E#log # $ p i log p i n Relative Entropy:, Q! i n i pi pi log q M,Y x Mutual Information: i,y j log x, y i j i, j i x i y j 6

Sequence Analysis. BBSI 2006: Lecture #(χ+1) Takis Benos (2006) BBSI MAY P. Benos

Sequence Analysis. BBSI 2006: Lecture #(χ+1) Takis Benos (2006) BBSI MAY P. Benos Sequence Analysis BBSI 2006: Lecture #(χ+1) Takis Benos (2006) Molecular Genetics 101 What is a gene? We cannot define it (but we know it when we see it ) A loose definition: Gene is a DNA/RNA information

More information

Multiple Choice Review- Eukaryotic Gene Expression

Multiple Choice Review- Eukaryotic Gene Expression Multiple Choice Review- Eukaryotic Gene Expression 1. Which of the following is the Central Dogma of cell biology? a. DNA Nucleic Acid Protein Amino Acid b. Prokaryote Bacteria - Eukaryote c. Atom Molecule

More information

CISC 889 Bioinformatics (Spring 2004) Hidden Markov Models (II)

CISC 889 Bioinformatics (Spring 2004) Hidden Markov Models (II) CISC 889 Bioinformatics (Spring 24) Hidden Markov Models (II) a. Likelihood: forward algorithm b. Decoding: Viterbi algorithm c. Model building: Baum-Welch algorithm Viterbi training Hidden Markov models

More information

Data Mining in Bioinformatics HMM

Data Mining in Bioinformatics HMM Data Mining in Bioinformatics HMM Microarray Problem: Major Objective n Major Objective: Discover a comprehensive theory of life s organization at the molecular level 2 1 Data Mining in Bioinformatics

More information

Hidden Markov Models for biological sequence analysis

Hidden Markov Models for biological sequence analysis Hidden Markov Models for biological sequence analysis Master in Bioinformatics UPF 2017-2018 http://comprna.upf.edu/courses/master_agb/ Eduardo Eyras Computational Genomics Pompeu Fabra University - ICREA

More information

Hidden Markov Models (I)

Hidden Markov Models (I) GLOBEX Bioinformatics (Summer 2015) Hidden Markov Models (I) a. The model b. The decoding: Viterbi algorithm Hidden Markov models A Markov chain of states At each state, there are a set of possible observables

More information

1. In most cases, genes code for and it is that

1. In most cases, genes code for and it is that Name Chapter 10 Reading Guide From DNA to Protein: Gene Expression Concept 10.1 Genetics Shows That Genes Code for Proteins 1. In most cases, genes code for and it is that determine. 2. Describe what Garrod

More information

Markov Chains and Hidden Markov Models. = stochastic, generative models

Markov Chains and Hidden Markov Models. = stochastic, generative models Markov Chains and Hidden Markov Models = stochastic, generative models (Drawing heavily from Durbin et al., Biological Sequence Analysis) BCH339N Systems Biology / Bioinformatics Spring 2016 Edward Marcotte,

More information

Example: The Dishonest Casino. Hidden Markov Models. Question # 1 Evaluation. The dishonest casino model. Question # 3 Learning. Question # 2 Decoding

Example: The Dishonest Casino. Hidden Markov Models. Question # 1 Evaluation. The dishonest casino model. Question # 3 Learning. Question # 2 Decoding Example: The Dishonest Casino Hidden Markov Models Durbin and Eddy, chapter 3 Game:. You bet $. You roll 3. Casino player rolls 4. Highest number wins $ The casino has two dice: Fair die P() = P() = P(3)

More information

Lecture 18 June 2 nd, Gene Expression Regulation Mutations

Lecture 18 June 2 nd, Gene Expression Regulation Mutations Lecture 18 June 2 nd, 2016 Gene Expression Regulation Mutations From Gene to Protein Central Dogma Replication DNA RNA PROTEIN Transcription Translation RNA Viruses: genome is RNA Reverse Transcriptase

More information

CSCE 471/871 Lecture 3: Markov Chains and

CSCE 471/871 Lecture 3: Markov Chains and and and 1 / 26 sscott@cse.unl.edu 2 / 26 Outline and chains models (s) Formal definition Finding most probable state path (Viterbi algorithm) Forward and backward algorithms State sequence known State

More information

Hidden Markov Models for biological sequence analysis I

Hidden Markov Models for biological sequence analysis I Hidden Markov Models for biological sequence analysis I Master in Bioinformatics UPF 2014-2015 Eduardo Eyras Computational Genomics Pompeu Fabra University - ICREA Barcelona, Spain Example: CpG Islands

More information

Reading Assignments. A. Genes and the Synthesis of Polypeptides. Lecture Series 7 From DNA to Protein: Genotype to Phenotype

Reading Assignments. A. Genes and the Synthesis of Polypeptides. Lecture Series 7 From DNA to Protein: Genotype to Phenotype Lecture Series 7 From DNA to Protein: Genotype to Phenotype Reading Assignments Read Chapter 7 From DNA to Protein A. Genes and the Synthesis of Polypeptides Genes are made up of DNA and are expressed

More information

Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus:

Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus: m Eukaryotic mrna processing Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus: Cap structure a modified guanine base is added to the 5 end. Poly-A tail

More information

Hidden Markov Models. based on chapters from the book Durbin, Eddy, Krogh and Mitchison Biological Sequence Analysis via Shamir s lecture notes

Hidden Markov Models. based on chapters from the book Durbin, Eddy, Krogh and Mitchison Biological Sequence Analysis via Shamir s lecture notes Hidden Markov Models based on chapters from the book Durbin, Eddy, Krogh and Mitchison Biological Sequence Analysis via Shamir s lecture notes music recognition deal with variations in - actual sound -

More information

Markov Models & DNA Sequence Evolution

Markov Models & DNA Sequence Evolution 7.91 / 7.36 / BE.490 Lecture #5 Mar. 9, 2004 Markov Models & DNA Sequence Evolution Chris Burge Review of Markov & HMM Models for DNA Markov Models for splice sites Hidden Markov Models - looking under

More information

BME 5742 Biosystems Modeling and Control

BME 5742 Biosystems Modeling and Control BME 5742 Biosystems Modeling and Control Lecture 24 Unregulated Gene Expression Model Dr. Zvi Roth (FAU) 1 The genetic material inside a cell, encoded in its DNA, governs the response of a cell to various

More information

O 3 O 4 O 5. q 3. q 4. Transition

O 3 O 4 O 5. q 3. q 4. Transition Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in

More information

Stephen Scott.

Stephen Scott. 1 / 27 sscott@cse.unl.edu 2 / 27 Useful for modeling/making predictions on sequential data E.g., biological sequences, text, series of sounds/spoken words Will return to graphical models that are generative

More information

Introduction to Hidden Markov Models (HMMs)

Introduction to Hidden Markov Models (HMMs) Introduction to Hidden Markov Models (HMMs) But first, some probability and statistics background Important Topics 1.! Random Variables and Probability 2.! Probability Distributions 3.! Parameter Estimation

More information

CSCE 478/878 Lecture 9: Hidden. Markov. Models. Stephen Scott. Introduction. Outline. Markov. Chains. Hidden Markov Models. CSCE 478/878 Lecture 9:

CSCE 478/878 Lecture 9: Hidden. Markov. Models. Stephen Scott. Introduction. Outline. Markov. Chains. Hidden Markov Models. CSCE 478/878 Lecture 9: Useful for modeling/making predictions on sequential data E.g., biological sequences, text, series of sounds/spoken words Will return to graphical models that are generative sscott@cse.unl.edu 1 / 27 2

More information

Bioinformatics 2 - Lecture 4

Bioinformatics 2 - Lecture 4 Bioinformatics 2 - Lecture 4 Guido Sanguinetti School of Informatics University of Edinburgh February 14, 2011 Sequences Many data types are ordered, i.e. you can naturally say what is before and what

More information

The Eukaryotic Genome and Its Expression. The Eukaryotic Genome and Its Expression. A. The Eukaryotic Genome. Lecture Series 11

The Eukaryotic Genome and Its Expression. The Eukaryotic Genome and Its Expression. A. The Eukaryotic Genome. Lecture Series 11 The Eukaryotic Genome and Its Expression Lecture Series 11 The Eukaryotic Genome and Its Expression A. The Eukaryotic Genome B. Repetitive Sequences (rem: teleomeres) C. The Structures of Protein-Coding

More information

From Gene to Protein

From Gene to Protein From Gene to Protein Gene Expression Process by which DNA directs the synthesis of a protein 2 stages transcription translation All organisms One gene one protein 1. Transcription of DNA Gene Composed

More information

6.047/6.878/HST.507 Computational Biology: Genomes, Networks, Evolution. Lecture 05. Hidden Markov Models Part II

6.047/6.878/HST.507 Computational Biology: Genomes, Networks, Evolution. Lecture 05. Hidden Markov Models Part II 6.047/6.878/HST.507 Computational Biology: Genomes, Networks, Evolution Lecture 05 Hidden Markov Models Part II 1 2 Module 1: Aligning and modeling genomes Module 1: Computational foundations Dynamic programming:

More information

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison 10-810: Advanced Algorithms and Models for Computational Biology microrna and Whole Genome Comparison Central Dogma: 90s Transcription factors DNA transcription mrna translation Proteins Central Dogma:

More information

The Computational Problem. We are given a sequence of DNA and we wish to know which subsequence or concatenation of subsequences constitutes a gene.

The Computational Problem. We are given a sequence of DNA and we wish to know which subsequence or concatenation of subsequences constitutes a gene. GENE FINDING The Computational Problem We are given a sequence of DNA and we wish to know which subsequence or concatenation of subsequences constitutes a gene. The Computational Problem Confounding Realities:

More information

Videos. Bozeman, transcription and translation: https://youtu.be/h3b9arupxzg Crashcourse: Transcription and Translation - https://youtu.

Videos. Bozeman, transcription and translation: https://youtu.be/h3b9arupxzg Crashcourse: Transcription and Translation - https://youtu. Translation Translation Videos Bozeman, transcription and translation: https://youtu.be/h3b9arupxzg Crashcourse: Transcription and Translation - https://youtu.be/itsb2sqr-r0 Translation Translation The

More information

CS711008Z Algorithm Design and Analysis

CS711008Z Algorithm Design and Analysis .. Lecture 6. Hidden Markov model and Viterbi s decoding algorithm Institute of Computing Technology Chinese Academy of Sciences, Beijing, China . Outline The occasionally dishonest casino: an example

More information

(Lys), resulting in translation of a polypeptide without the Lys amino acid. resulting in translation of a polypeptide without the Lys amino acid.

(Lys), resulting in translation of a polypeptide without the Lys amino acid. resulting in translation of a polypeptide without the Lys amino acid. 1. A change that makes a polypeptide defective has been discovered in its amino acid sequence. The normal and defective amino acid sequences are shown below. Researchers are attempting to reproduce the

More information

Computational Biology: Basics & Interesting Problems

Computational Biology: Basics & Interesting Problems Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information

More information

GCD3033:Cell Biology. Transcription

GCD3033:Cell Biology. Transcription Transcription Transcription: DNA to RNA A) production of complementary strand of DNA B) RNA types C) transcription start/stop signals D) Initiation of eukaryotic gene expression E) transcription factors

More information

Introduction to Hidden Markov Models for Gene Prediction ECE-S690

Introduction to Hidden Markov Models for Gene Prediction ECE-S690 Introduction to Hidden Markov Models for Gene Prediction ECE-S690 Outline Markov Models The Hidden Part How can we use this for gene prediction? Learning Models Want to recognize patterns (e.g. sequence

More information

Hidden Markov Models. By Parisa Abedi. Slides courtesy: Eric Xing

Hidden Markov Models. By Parisa Abedi. Slides courtesy: Eric Xing Hidden Markov Models By Parisa Abedi Slides courtesy: Eric Xing i.i.d to sequential data So far we assumed independent, identically distributed data Sequential (non i.i.d.) data Time-series data E.g. Speech

More information

Chapter 17. From Gene to Protein. Biology Kevin Dees

Chapter 17. From Gene to Protein. Biology Kevin Dees Chapter 17 From Gene to Protein DNA The information molecule Sequences of bases is a code DNA organized in to chromosomes Chromosomes are organized into genes What do the genes actually say??? Reflecting

More information

Regulation of Gene Expression

Regulation of Gene Expression Chapter 18 Regulation of Gene Expression Edited by Shawn Lester PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley

More information

Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p

Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p.110-114 Arrangement of information in DNA----- requirements for RNA Common arrangement of protein-coding genes in prokaryotes=

More information

1/22/13. Example: CpG Island. Question 2: Finding CpG Islands

1/22/13. Example: CpG Island. Question 2: Finding CpG Islands I529: Machine Learning in Bioinformatics (Spring 203 Hidden Markov Models Yuzhen Ye School of Informatics and Computing Indiana Univerty, Bloomington Spring 203 Outline Review of Markov chain & CpG island

More information

COMS 4771 Probabilistic Reasoning via Graphical Models. Nakul Verma

COMS 4771 Probabilistic Reasoning via Graphical Models. Nakul Verma COMS 4771 Probabilistic Reasoning via Graphical Models Nakul Verma Last time Dimensionality Reduction Linear vs non-linear Dimensionality Reduction Principal Component Analysis (PCA) Non-linear methods

More information

Hidden Markov Models. Aarti Singh Slides courtesy: Eric Xing. Machine Learning / Nov 8, 2010

Hidden Markov Models. Aarti Singh Slides courtesy: Eric Xing. Machine Learning / Nov 8, 2010 Hidden Markov Models Aarti Singh Slides courtesy: Eric Xing Machine Learning 10-701/15-781 Nov 8, 2010 i.i.d to sequential data So far we assumed independent, identically distributed data Sequential data

More information

Hidden Markov Models. Main source: Durbin et al., Biological Sequence Alignment (Cambridge, 98)

Hidden Markov Models. Main source: Durbin et al., Biological Sequence Alignment (Cambridge, 98) Hidden Markov Models Main source: Durbin et al., Biological Sequence Alignment (Cambridge, 98) 1 The occasionally dishonest casino A P A (1) = P A (2) = = 1/6 P A->B = P B->A = 1/10 B P B (1)=0.1... P

More information

Computational Genomics and Molecular Biology, Fall

Computational Genomics and Molecular Biology, Fall Computational Genomics and Molecular Biology, Fall 2014 1 HMM Lecture Notes Dannie Durand and Rose Hoberman November 6th Introduction In the last few lectures, we have focused on three problems related

More information

Hidden Markov Models. Ron Shamir, CG 08

Hidden Markov Models. Ron Shamir, CG 08 Hidden Markov Models 1 Dr Richard Durbin is a graduate in mathematics from Cambridge University and one of the founder members of the Sanger Institute. He has also held carried out research at the Laboratory

More information

GENE REGULATION AND PROBLEMS OF DEVELOPMENT

GENE REGULATION AND PROBLEMS OF DEVELOPMENT GENE REGULATION AND PROBLEMS OF DEVELOPMENT By Surinder Kaur DIET Ropar Surinder_1998@ yahoo.in Mob No 9988530775 GENE REGULATION Gene is a segment of DNA that codes for a unit of function (polypeptide,

More information

HMMs and biological sequence analysis

HMMs and biological sequence analysis HMMs and biological sequence analysis Hidden Markov Model A Markov chain is a sequence of random variables X 1, X 2, X 3,... That has the property that the value of the current state depends only on the

More information

Proteomics. 2 nd semester, Department of Biotechnology and Bioinformatics Laboratory of Nano-Biotechnology and Artificial Bioengineering

Proteomics. 2 nd semester, Department of Biotechnology and Bioinformatics Laboratory of Nano-Biotechnology and Artificial Bioengineering Proteomics 2 nd semester, 2013 1 Text book Principles of Proteomics by R. M. Twyman, BIOS Scientific Publications Other Reference books 1) Proteomics by C. David O Connor and B. David Hames, Scion Publishing

More information

Introduction to Machine Learning CMU-10701

Introduction to Machine Learning CMU-10701 Introduction to Machine Learning CMU-10701 Hidden Markov Models Barnabás Póczos & Aarti Singh Slides courtesy: Eric Xing i.i.d to sequential data So far we assumed independent, identically distributed

More information

Lecture 9. Intro to Hidden Markov Models (finish up)

Lecture 9. Intro to Hidden Markov Models (finish up) Lecture 9 Intro to Hidden Markov Models (finish up) Review Structure Number of states Q 1.. Q N M output symbols Parameters: Transition probability matrix a ij Emission probabilities b i (a), which is

More information

Chapters 12&13 Notes: DNA, RNA & Protein Synthesis

Chapters 12&13 Notes: DNA, RNA & Protein Synthesis Chapters 12&13 Notes: DNA, RNA & Protein Synthesis Name Period Words to Know: nucleotides, DNA, complementary base pairing, replication, genes, proteins, mrna, rrna, trna, transcription, translation, codon,

More information

An Introduction to Bioinformatics Algorithms Hidden Markov Models

An Introduction to Bioinformatics Algorithms  Hidden Markov Models Hidden Markov Models Hidden Markov Models Outline CG-islands The Fair Bet Casino Hidden Markov Model Decoding Algorithm Forward-Backward Algorithm Profile HMMs HMM Parameter Estimation Viterbi training

More information

Translation Part 2 of Protein Synthesis

Translation Part 2 of Protein Synthesis Translation Part 2 of Protein Synthesis IN: How is transcription like making a jello mold? (be specific) What process does this diagram represent? A. Mutation B. Replication C.Transcription D.Translation

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training

More information

Flow of Genetic Information

Flow of Genetic Information presents Flow of Genetic Information A Montagud E Navarro P Fernández de Córdoba JF Urchueguía Elements Nucleic acid DNA RNA building block structure & organization genome building block types Amino acid

More information

GENE ACTIVITY Gene structure Transcription Transcript processing mrna transport mrna stability Translation Posttranslational modifications

GENE ACTIVITY Gene structure Transcription Transcript processing mrna transport mrna stability Translation Posttranslational modifications 1 GENE ACTIVITY Gene structure Transcription Transcript processing mrna transport mrna stability Translation Posttranslational modifications 2 DNA Promoter Gene A Gene B Termination Signal Transcription

More information

HIDDEN MARKOV MODELS

HIDDEN MARKOV MODELS HIDDEN MARKOV MODELS Outline CG-islands The Fair Bet Casino Hidden Markov Model Decoding Algorithm Forward-Backward Algorithm Profile HMMs HMM Parameter Estimation Viterbi training Baum-Welch algorithm

More information

Eukaryotic vs. Prokaryotic genes

Eukaryotic vs. Prokaryotic genes BIO 5099: Molecular Biology for Computer Scientists (et al) Lecture 18: Eukaryotic genes http://compbio.uchsc.edu/hunter/bio5099 Larry.Hunter@uchsc.edu Eukaryotic vs. Prokaryotic genes Like in prokaryotes,

More information

An Introduction to Bioinformatics Algorithms Hidden Markov Models

An Introduction to Bioinformatics Algorithms   Hidden Markov Models Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training

More information

Dynamic Approaches: The Hidden Markov Model

Dynamic Approaches: The Hidden Markov Model Dynamic Approaches: The Hidden Markov Model Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Inference as Message

More information

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs15.html Describing & Modeling Patterns

More information

Name: SBI 4U. Gene Expression Quiz. Overall Expectation:

Name: SBI 4U. Gene Expression Quiz. Overall Expectation: Gene Expression Quiz Overall Expectation: - Demonstrate an understanding of concepts related to molecular genetics, and how genetic modification is applied in industry and agriculture Specific Expectation(s):

More information

UNIT 5. Protein Synthesis 11/22/16

UNIT 5. Protein Synthesis 11/22/16 UNIT 5 Protein Synthesis IV. Transcription (8.4) A. RNA carries DNA s instruction 1. Francis Crick defined the central dogma of molecular biology a. Replication copies DNA b. Transcription converts DNA

More information

Hidden Markov Models,99,100! Markov, here I come!

Hidden Markov Models,99,100! Markov, here I come! Hidden Markov Models,99,100! Markov, here I come! 16.410/413 Principles of Autonomy and Decision-Making Pedro Santana (psantana@mit.edu) October 7 th, 2015. Based on material by Brian Williams and Emilio

More information

From gene to protein. Premedical biology

From gene to protein. Premedical biology From gene to protein Premedical biology Central dogma of Biology, Molecular Biology, Genetics transcription replication reverse transcription translation DNA RNA Protein RNA chemically similar to DNA,

More information

MACHINE LEARNING 2 UGM,HMMS Lecture 7

MACHINE LEARNING 2 UGM,HMMS Lecture 7 LOREM I P S U M Royal Institute of Technology MACHINE LEARNING 2 UGM,HMMS Lecture 7 THIS LECTURE DGM semantics UGM De-noising HMMs Applications (interesting probabilities) DP for generation probability

More information

Complete all warm up questions Focus on operon functioning we will be creating operon models on Monday

Complete all warm up questions Focus on operon functioning we will be creating operon models on Monday Complete all warm up questions Focus on operon functioning we will be creating operon models on Monday 1. What is the Central Dogma? 2. How does prokaryotic DNA compare to eukaryotic DNA? 3. How is DNA

More information

Hidden Markov Models. music recognition. deal with variations in - pitch - timing - timbre 2

Hidden Markov Models. music recognition. deal with variations in - pitch - timing - timbre 2 Hidden Markov Models based on chapters from the book Durbin, Eddy, Krogh and Mitchison Biological Sequence Analysis Shamir s lecture notes and Rabiner s tutorial on HMM 1 music recognition deal with variations

More information

Biology 644: Bioinformatics

Biology 644: Bioinformatics A stochastic (probabilistic) model that assumes the Markov property Markov property is satisfied when the conditional probability distribution of future states of the process (conditional on both past

More information

Related Courses He who asks is a fool for five minutes, but he who does not ask remains a fool forever.

Related Courses He who asks is a fool for five minutes, but he who does not ask remains a fool forever. CSE 527 Computational Biology http://www.cs.washington.edu/527 Lecture 1: Overview & Bio Review Autumn 2004 Larry Ruzzo Related Courses He who asks is a fool for five minutes, but he who does not ask remains

More information

Motifs and Logos. Six Introduction to Bioinformatics. Importance and Abundance of Motifs. Getting the CDS. From DNA to Protein 6.1.

Motifs and Logos. Six Introduction to Bioinformatics. Importance and Abundance of Motifs. Getting the CDS. From DNA to Protein 6.1. Motifs and Logos Six Discovering Genomics, Proteomics, and Bioinformatics by A. Malcolm Campbell and Laurie J. Heyer Chapter 2 Genome Sequence Acquisition and Analysis Sami Khuri Department of Computer

More information

Lecture 7 Sequence analysis. Hidden Markov Models

Lecture 7 Sequence analysis. Hidden Markov Models Lecture 7 Sequence analysis. Hidden Markov Models Nicolas Lartillot may 2012 Nicolas Lartillot (Universite de Montréal) BIN6009 may 2012 1 / 60 1 Motivation 2 Examples of Hidden Markov models 3 Hidden

More information

Controlling Gene Expression

Controlling Gene Expression Controlling Gene Expression Control Mechanisms Gene regulation involves turning on or off specific genes as required by the cell Determine when to make more proteins and when to stop making more Housekeeping

More information

CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS

CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS * The contents are adapted from Dr. Jean Gao at UT Arlington Mingon Kang, Ph.D. Computer Science, Kennesaw State University Primer on Probability Random

More information

Advanced Data Science

Advanced Data Science Advanced Data Science Dr. Kira Radinsky Slides Adapted from Tom M. Mitchell Agenda Topics Covered: Time series data Markov Models Hidden Markov Models Dynamic Bayes Nets Additional Reading: Bishop: Chapter

More information

Chapter 4: Hidden Markov Models

Chapter 4: Hidden Markov Models Chapter 4: Hidden Markov Models 4.1 Introduction to HMM Prof. Yechiam Yemini (YY) Computer Science Department Columbia University Overview Markov models of sequence structures Introduction to Hidden Markov

More information

Chapter 15 Active Reading Guide Regulation of Gene Expression

Chapter 15 Active Reading Guide Regulation of Gene Expression Name: AP Biology Mr. Croft Chapter 15 Active Reading Guide Regulation of Gene Expression The overview for Chapter 15 introduces the idea that while all cells of an organism have all genes in the genome,

More information

Bioinformatics Chapter 1. Introduction

Bioinformatics Chapter 1. Introduction Bioinformatics Chapter 1. Introduction Outline! Biological Data in Digital Symbol Sequences! Genomes Diversity, Size, and Structure! Proteins and Proteomes! On the Information Content of Biological Sequences!

More information

Comparative Gene Finding. BMI/CS 776 Spring 2015 Colin Dewey

Comparative Gene Finding. BMI/CS 776  Spring 2015 Colin Dewey Comparative Gene Finding BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2015 Colin Dewey cdewey@biostat.wisc.edu Goals for Lecture the key concepts to understand are the following: using related genomes

More information

Today s Lecture: HMMs

Today s Lecture: HMMs Today s Lecture: HMMs Definitions Examples Probability calculations WDAG Dynamic programming algorithms: Forward Viterbi Parameter estimation Viterbi training 1 Hidden Markov Models Probability models

More information

Biology 112 Practice Midterm Questions

Biology 112 Practice Midterm Questions Biology 112 Practice Midterm Questions 1. Identify which statement is true or false I. Bacterial cell walls prevent osmotic lysis II. All bacterial cell walls contain an LPS layer III. In a Gram stain,

More information

Cellular Neuroanatomy I The Prototypical Neuron: Soma. Reading: BCP Chapter 2

Cellular Neuroanatomy I The Prototypical Neuron: Soma. Reading: BCP Chapter 2 Cellular Neuroanatomy I The Prototypical Neuron: Soma Reading: BCP Chapter 2 Functional Unit of the Nervous System The functional unit of the nervous system is the neuron. Neurons are cells specialized

More information

EECS730: Introduction to Bioinformatics

EECS730: Introduction to Bioinformatics EECS730: Introduction to Bioinformatics Lecture 07: profile Hidden Markov Model http://bibiserv.techfak.uni-bielefeld.de/sadr2/databasesearch/hmmer/profilehmm.gif Slides adapted from Dr. Shaojie Zhang

More information

Hidden Markov Methods. Algorithms and Implementation

Hidden Markov Methods. Algorithms and Implementation Hidden Markov Methods. Algorithms and Implementation Final Project Report. MATH 127. Nasser M. Abbasi Course taken during Fall 2002 page compiled on July 2, 2015 at 12:08am Contents 1 Example HMM 5 2 Forward

More information

RNA & PROTEIN SYNTHESIS. Making Proteins Using Directions From DNA

RNA & PROTEIN SYNTHESIS. Making Proteins Using Directions From DNA RNA & PROTEIN SYNTHESIS Making Proteins Using Directions From DNA RNA & Protein Synthesis v Nitrogenous bases in DNA contain information that directs protein synthesis v DNA remains in nucleus v in order

More information

Bio 119 Bacterial Genomics 6/26/10

Bio 119 Bacterial Genomics 6/26/10 BACTERIAL GENOMICS Reading in BOM-12: Sec. 11.1 Genetic Map of the E. coli Chromosome p. 279 Sec. 13.2 Prokaryotic Genomes: Sizes and ORF Contents p. 344 Sec. 13.3 Prokaryotic Genomes: Bioinformatic Analysis

More information

Markov Chains and Hidden Markov Models. COMP 571 Luay Nakhleh, Rice University

Markov Chains and Hidden Markov Models. COMP 571 Luay Nakhleh, Rice University Markov Chains and Hidden Markov Models COMP 571 Luay Nakhleh, Rice University Markov Chains and Hidden Markov Models Modeling the statistical properties of biological sequences and distinguishing regions

More information

Hidden Markov Models. Ivan Gesteira Costa Filho IZKF Research Group Bioinformatics RWTH Aachen Adapted from:

Hidden Markov Models. Ivan Gesteira Costa Filho IZKF Research Group Bioinformatics RWTH Aachen Adapted from: Hidden Markov Models Ivan Gesteira Costa Filho IZKF Research Group Bioinformatics RWTH Aachen Adapted from: www.ioalgorithms.info Outline CG-islands The Fair Bet Casino Hidden Markov Model Decoding Algorithm

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models Slides revised and adapted to Bioinformática 55 Engª Biomédica/IST 2005 Ana Teresa Freitas Forward Algorithm For Markov chains we calculate the probability of a sequence, P(x) How

More information

Old FINAL EXAM BIO409/509 NAME. Please number your answers and write them on the attached, lined paper.

Old FINAL EXAM BIO409/509 NAME. Please number your answers and write them on the attached, lined paper. Old FINAL EXAM BIO409/509 NAME Please number your answers and write them on the attached, lined paper. Gene expression can be regulated at several steps. Describe one example for each of the following:

More information

Hidden Markov Models (HMMs) November 14, 2017

Hidden Markov Models (HMMs) November 14, 2017 Hidden Markov Models (HMMs) November 14, 2017 inferring a hidden truth 1) You hear a static-filled radio transmission. how can you determine what did the sender intended to say? 2) You know that genes

More information

The Gene The gene; Genes Genes Allele;

The Gene The gene; Genes Genes Allele; Gene, genetic code and regulation of the gene expression, Regulating the Metabolism, The Lac- Operon system,catabolic repression, The Trp Operon system: regulating the biosynthesis of the tryptophan. Mitesh

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models Outline CG-islands The Fair Bet Casino Hidden Markov Model Decoding Algorithm Forward-Backward Algorithm Profile HMMs HMM Parameter Estimation Viterbi training Baum-Welch algorithm

More information

Supplemental Materials

Supplemental Materials JOURNAL OF MICROBIOLOGY & BIOLOGY EDUCATION, May 2013, p. 107-109 DOI: http://dx.doi.org/10.1128/jmbe.v14i1.496 Supplemental Materials for Engaging Students in a Bioinformatics Activity to Introduce Gene

More information

Introduction to molecular biology. Mitesh Shrestha

Introduction to molecular biology. Mitesh Shrestha Introduction to molecular biology Mitesh Shrestha Molecular biology: definition Molecular biology is the study of molecular underpinnings of the process of replication, transcription and translation of

More information

Lesson Overview. Ribosomes and Protein Synthesis 13.2

Lesson Overview. Ribosomes and Protein Synthesis 13.2 13.2 The Genetic Code The first step in decoding genetic messages is to transcribe a nucleotide base sequence from DNA to mrna. This transcribed information contains a code for making proteins. The Genetic

More information

Molecular Biology - Translation of RNA to make Protein *

Molecular Biology - Translation of RNA to make Protein * OpenStax-CNX module: m49485 1 Molecular Biology - Translation of RNA to make Protein * Jerey Mahr Based on Translation by OpenStax This work is produced by OpenStax-CNX and licensed under the Creative

More information

Hidden Markov Models and some applications

Hidden Markov Models and some applications Oleg Makhnin New Mexico Tech Dept. of Mathematics November 11, 2011 HMM description Application to genetic analysis Applications to weather and climate modeling Discussion HMM description Application to

More information

Objective 3.01 (DNA, RNA and Protein Synthesis)

Objective 3.01 (DNA, RNA and Protein Synthesis) Objective 3.01 (DNA, RNA and Protein Synthesis) DNA Structure o Discovered by Watson and Crick o Double-stranded o Shape is a double helix (twisted ladder) o Made of chains of nucleotides: o Has four types

More information

Hidden Markov Models in computational biology. Ron Elber Computer Science Cornell

Hidden Markov Models in computational biology. Ron Elber Computer Science Cornell Hidden Markov Models in computational biology Ron Elber Computer Science Cornell 1 Or: how to fish homolog sequences from a database Many sequences in database RPOBESEQ Partitioned data base 2 An accessible

More information

Predicting RNA Secondary Structure

Predicting RNA Secondary Structure 7.91 / 7.36 / BE.490 Lecture #6 Mar. 11, 2004 Predicting RNA Secondary Structure Chris Burge Review of Markov Models & DNA Evolution CpG Island HMM The Viterbi Algorithm Real World HMMs Markov Models for

More information

Hidden Markov models in population genetics and evolutionary biology

Hidden Markov models in population genetics and evolutionary biology Hidden Markov models in population genetics and evolutionary biology Gerton Lunter Wellcome Trust Centre for Human Genetics Oxford, UK April 29, 2013 Topics for today Markov chains Hidden Markov models

More information