2005 Fall Workshop on Information Theory and Communications: Bioinformatics from a Perspective of Communication Science
1 2005 Fall Workshop on Information Theory and Communications. Bioinformatics from a Perspective of Communication Science. Chung-Chin Lu, Department of Electrical Engineering, National Tsing Hua University, cclu@ee.nthu.edu.tw. July 22, 2005
2 Biological Background: The Central Dogma 1
3 Typical Animal Cell. From Neil Campbell, Jane Reece, and Larry Mitchell, Biology, 5th ed. (Menlo Park, CA: Addison Wesley Longman, 1999). © Addison Wesley Longman, Inc. 2
4 Places for Genetic Information Processing in the Cell. Nucleus: storage and transportation of genetic information (storage: deoxyribonucleic acid, DNA; transportation: ribonucleic acid, RNA). Ribosomes: factories for protein synthesis, using the blueprint carried in RNA. Mitochondria: energy production, with their own circular, double-stranded DNA. 3
5 Double-stranded DNA. From J. D. Watson, N. H. Hopkins, J. W. Roberts, J. A. Steitz, and A. M. Weiner, Molecular Biology of the Gene, 4th ed. (Redwood City, CA: Benjamin/Cummings Publishing Co., 1987). © 1987 James D. Watson. 4
6 Chemical Structures of RNA and DNA. From Christopher K. Mathews, K. E. van Holde, and Kevin G. Ahern, Biochemistry, 3rd ed. (San Francisco, CA: Benjamin/Cummings, 2000). © 2000 Addison Wesley Longman, Inc. 5
7 Repeating Units of DNA and RNA. The three subunits of a repeating unit (a ribonucleotide) of RNA: a phosphate group, a ribose and a base. The three subunits of a repeating unit (a deoxyribonucleotide) of DNA: a phosphate group, a 2′-deoxyribose and a base. 6
8 Nucleic Acid Bases and Pairing of Bases in DNA. Five nucleic acid bases: two purines, adenine (A) and guanine (G), found in both DNA and RNA; three pyrimidines, cytosine (C) in DNA and RNA, thymine (T) in DNA and uracil (U) in RNA. Pairing of bases in DNA: adenine (A) with thymine (T), and guanine (G) with cytosine (C). 7
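The pairing rule above is what lets one strand serve as the template for the other. A minimal Python sketch, assuming plain A/T/C/G strings; the helper names `complement`, `reverse_complement` and `transcribe` are illustrative and not part of the original material:

```python
# Watson-Crick pairing in DNA: A pairs with T, G pairs with C.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Base-by-base complement of a DNA strand, kept in the same left-to-right order."""
    return "".join(DNA_COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Partner strand read 5'->3'; the two strands of the double helix are antiparallel."""
    return complement(strand)[::-1]


def transcribe(coding_strand: str) -> str:
    """RNA carries uracil (U) where DNA carries thymine (T)."""
    return coding_strand.upper().replace("T", "U")


if __name__ == "__main__":
    print(reverse_complement("ATGCGT"))  # ACGCAT
    print(transcribe("ATGCGT"))          # AUGCGU
```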
9 The Bases of DNA. From Christopher K. Mathews, K. E. van Holde, and Kevin G. Ahern, Biochemistry, 3rd ed. (San Francisco, CA: Benjamin/Cummings, 2000). © 2000 Addison Wesley Longman, Inc. 8
10 The Flow of Genetic Information in the Cell. From Christopher K. Mathews, K. E. van Holde, and Kevin G. Ahern, Biochemistry, 3rd ed. (San Francisco, CA: Benjamin/Cummings, 2000). © 2000 Addison Wesley Longman, Inc. 9
11 DNA Replication. From Christopher K. Mathews, K. E. van Holde, and Kevin G. Ahern, Biochemistry, 3rd ed. (San Francisco, CA: Benjamin/Cummings, 2000). © 2000 Addison Wesley Longman, Inc. 10
12 Transcription. From Christopher K. Mathews, K. E. van Holde, and Kevin G. Ahern, Biochemistry, 3rd ed. (San Francisco, CA: Benjamin/Cummings, 2000). © 2000 Addison Wesley Longman, Inc. 11
13 Relationships of DNA to mRNA to Polypeptide. From Blanchetot, Nature (1983) 301: © 1983 Macmillan Magazines, Ltd. 12
14 Table of mRNA Codons and Corresponding Amino Acids (within each group the third letter runs U, C, A, G):
First letter U. Second U: UUU Phe, UUC Phe, UUA Leu, UUG Leu. Second C: UCU, UCC, UCA, UCG Ser. Second A: UAU Tyr, UAC Tyr, UAA stop, UAG stop. Second G: UGU Cys, UGC Cys, UGA stop, UGG Trp.
First letter C. Second U: CUU, CUC, CUA, CUG Leu. Second C: CCU, CCC, CCA, CCG Pro. Second A: CAU His, CAC His, CAA Gln, CAG Gln. Second G: CGU, CGC, CGA, CGG Arg.
First letter A. Second U: AUU Ile, AUC Ile, AUA Ile, AUG Met. Second C: ACU, ACC, ACA, ACG Thr. Second A: AAU Asn, AAC Asn, AAA Lys, AAG Lys. Second G: AGU Ser, AGC Ser, AGA Arg, AGG Arg.
First letter G. Second U: GUU, GUC, GUA, GUG Val. Second C: GCU, GCC, GCA, GCG Ala. Second A: GAU Asp, GAC Asp, GAA Glu, GAG Glu. Second G: GGU, GGC, GGA, GGG Gly. 13
15 Protein as a Polymer. A protein is a chain of amino acids, of which there are 20 different kinds. Each amino acid is coded by a 3-tuple of nucleotides, called a codon. There are 64 codons (61 sense codons plus 3 stop codons) but only 20 amino acids, so many amino acids are represented by more than one codon. Start codon: AUG, which also encodes the amino acid methionine. Stop codons: UAA, UAG and UGA, which do not encode any amino acid. 14
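To make the codon table concrete, here is a minimal Python sketch of translation with the standard genetic code, using one-letter amino acid codes instead of the slide's three-letter ones. Starting at the first AUG and stopping at the first in-frame stop codon is a simplifying assumption for illustration; `CODON_TABLE` and `translate` are hypothetical names.

```python
# Standard genetic code. Each codon position runs over U, C, A, G in that order, so
# AMINO[16*i + 4*j + k] is the amino acid (one-letter code) of codon
# BASES[i] + BASES[j] + BASES[k]; "*" marks a stop codon.
BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W"   # first letter U
         "LLLLPPPPHHQQRRRR"   # first letter C
         "IIIMTTTTNNKKSSRR"   # first letter A
         "VVVVAAAADDEEGGGG")  # first letter G
CODON_TABLE = {
    first + second + third: AMINO[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG (Met) up to, not including, the first in-frame stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start < 0:
        return ""  # no start codon, no protein
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("GGAUGUUUGGCUAAUU"))  # MFG
    sense = [aa for aa in CODON_TABLE.values() if aa != "*"]
    print(len(CODON_TABLE), len(sense), len(set(sense)))  # 64 61 20
```

The last line checks the counting argument on the slide: 64 codons, 61 of which encode one of 20 amino acids.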
16 Translation of an RNA Message into a Protein. From Christopher K. Mathews, K. E. van Holde, and Kevin G. Ahern, Biochemistry, 3rd ed. (San Francisco, CA: Benjamin/Cummings, 2000). © 2000 Addison Wesley Longman, Inc. 15
17 The Central Dogma: Transcription and Translation 16
18 Messenger RNA (mRNA). Pre-messenger RNA (pre-mRNA): the primary RNA transcript. Exons: concatenated to form the coding sequence for protein synthesis. Introns: lie between exons; their functions are not clear. Poly-A signal, cleavage site and downstream element. Splicing: deletion of the introns. Messenger RNA (mRNA): the RNA after splicing, consisting of a cap and a 5′ untranslated region (5′ UTR), the coding sequence (CDS) for protein synthesis, and a 3′ untranslated region (3′ UTR) followed by a poly-A tail sequence. 17
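At the sequence level, splicing as defined above amounts to deleting the introns and joining the exons in order. A minimal sketch, assuming the exon coordinates are already known as 0-based half-open intervals; the toy pre-mRNA and the `splice` helper are made up for illustration.

```python
from typing import List, Tuple


def splice(pre_mrna: str, exons: List[Tuple[int, int]]) -> str:
    """Join exon intervals (0-based, half-open [start, end)) and drop everything between them.

    The exons must be sorted and non-overlapping; the gaps between them are the introns.
    """
    previous_end = 0
    pieces = []
    for start, end in exons:
        if not (previous_end <= start < end <= len(pre_mrna)):
            raise ValueError(f"bad exon interval ({start}, {end})")
        pieces.append(pre_mrna[start:end])
        previous_end = end
    return "".join(pieces)


if __name__ == "__main__":
    # Toy pre-mRNA: exon 1 + intron (starts with GU, ends with AG) + exon 2.
    pre = "AUGGC" + "GUAAGUUUUCAG" + "UAAA"
    print(splice(pre, [(0, 5), (17, 21)]))  # AUGGCUAAA
```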
19 Biological Data Are Digital!! Biological digital information modulates macromolecules made of repetitive units. Analogy: digital information modulates sinusoidal waves. 18
20 Digital Modulation on Macro-molecular of Repetitive Units 19
21 From Christopher K. Mathews, K. E. van Holde, and Kevin G. Ahern, Biochemistry, 3rd ed. (San Francisco, CA: Benjamin/Cummings, 2000). © 2000 Addison Wesley Longman, Inc. 20
22 Gene Structure Prediction and the Decoding Problem. Aim: to predict the structure of a gene from its DNA sequence. Formalism: to decode a DNA sequence (a sequence over A, T, G, C) into a sequence of exon and intron labels (a sequence over E and I). 21
23 Signal Sensors for Gene Structure Prediction 22
24 Signal Sensors. To determine the transcriptional beginning of a gene, called the transcription start site (TSS), and the upstream regulatory region, called the promoter: transcription start site (TSS); elements of the promoter: TATA box, CCAAT box, GC box, etc. To determine the precise exon-intron boundaries in the coding region, called splice sites, a crucial part of gene structure prediction: donor site, the 5′ splice site of an intron; acceptor site, the 3′ splice site of an intron. To determine the transcriptional termination of a gene: 23
25 PolyA signal, cleavage site, downstream element (DSE). 24
26 Transcription. From Christopher K. Mathews, K. E. van Holde, and Kevin G. Ahern, Biochemistry, 3rd ed. (San Francisco, CA: Benjamin/Cummings, 2000). © 2000 Addison Wesley Longman, Inc. 25
27 Transcription Start Site (TSS) Signals. They are recognized by the basal RNA polymerase II transcriptional machinery and encompass three main core promoter elements, the TATA box, the initiator (Inr) and the downstream promoter element (DPE), which are generally located within −60 to +50 of the transcription start site. 26
28 The Basal RNA Polymerase II Transcriptional Machinery 27
29 Consensus in Transcription Start Site Signals. The TATA box has a strong consensus, T A T A A A at positions T+1 to T+6, and is located about 25 to 30 nt upstream of the TSS. The Inr element has a loose consensus, Py Py A N T/A Py Py at positions A−2, A−1, A+1, …, A+5 (Py: pyrimidine, C or T; N: any base), and encompasses the TSS. The DPE has no consensus and is not yet well characterized. 28
30 The Nucleotide Distribution in the TATA Signal [figure: percentage (%) of A, T, C and G at each position] 29
31 The Nucleotide Distribution in the TSS Signal [figure: percentage (%) of A, T, C and G at each position] 30
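Per-position nucleotide distributions like the ones plotted for the TATA and TSS signals can be computed from a set of aligned signal sequences. A minimal sketch; the four toy sequences are made up and are not drawn from the slides' datasets. Taking the most frequent base per position gives a consensus string such as TATAAA.

```python
from collections import Counter
from typing import Dict, List


def nucleotide_distribution(aligned: List[str], alphabet: str = "ATCG") -> List[Dict[str, float]]:
    """Percentage of each base at each position of equal-length, aligned signal sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("signal sequences must be aligned to the same length")
    table = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in aligned)
        table.append({base: 100.0 * counts[base] / len(aligned) for base in alphabet})
    return table


def consensus(table: List[Dict[str, float]]) -> str:
    """Most frequent base at each position (ties go to the earlier base in the alphabet)."""
    return "".join(max(column, key=column.get) for column in table)


if __name__ == "__main__":
    toy_signals = ["TATAAA", "TATAAT", "TATATA", "TAAAAA"]  # made-up examples
    table = nucleotide_distribution(toy_signals)
    print(consensus(table))  # TATAAA
    print(table[2])          # {'A': 25.0, 'T': 75.0, 'C': 0.0, 'G': 0.0}
```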
32 Pre-mRNA Splicing: Spliceosome Cycle. After Sharp, Cell (1994) 77:811. © 1994 Cell Press/The Nobel Foundation. 31
33 Consensus in Splice Signals. Donor sites: exon AG / GUAUGU intron, with A G at positions D−2, D−1 and G U at positions D+1, D+2. Acceptor sites: intron (C/T) N N AG / G exon, with A G at positions A−2, A−1 and G at position A+1. 32
34 Conservation in Splice Signals. Donor sites: only GU, at positions D+1, D+2, is conserved in more than 98% of donor sites. Acceptor sites: only AG, at positions A−2, A−1, is conserved in more than 98% of acceptor sites. The spliceosome, which carries out splicing, recognizes these splice sites. 33
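Because GT...AG (GU...AG in RNA) is present at almost every real splice site but also occurs all over a gene, a natural first step is to enumerate every candidate and leave it to a statistical model to separate real sites from the far more numerous pseudo sites. A minimal sketch on DNA strings; the index conventions in the docstring are assumptions made for illustration.

```python
from typing import List, Tuple


def candidate_splice_sites(dna: str) -> Tuple[List[int], List[int]]:
    """Return (donors, acceptors) under the GT-AG rule, as 0-based indices.

    donors:    index of the G of every GT (the first intron base, position D+1)
    acceptors: index just after every AG (the first exon base, position A+1);
               this can equal len(dna) when AG ends the sequence
    """
    dna = dna.upper()
    donors = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "GT"]
    acceptors = [i + 2 for i in range(len(dna) - 1) if dna[i:i + 2] == "AG"]
    return donors, acceptors


if __name__ == "__main__":
    print(candidate_splice_sites("CAGGTAAGTTTCAGG"))  # ([3, 7], [3, 8, 14])
```

Even this 15-base toy sequence yields five candidates, which is why the dependency models in the following slides are needed.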
35 Polyadenylation Mechanism. There are three major steps: recognition of the authentic polyadenylation signals in the 3′-terminal region of a pre-mRNA; cleavage of the pre-mRNA; and addition of up to 250 adenosine residues (the poly(A) tail). 34
36 Precleavage Complex 35
37 Proteins Involved. CPSF (blue): cleavage and polyadenylation specificity factor; binds to the AAUAAA motif and interacts with PAP and CstF. CstF (brown): cleavage stimulation factor; binds to the GT/T-rich element (as written at the DNA level; GU/U-rich in the RNA). CF I and II (gray): cleavage factors I and II, required for cleavage. RNA polymerase II CTD (carboxyl-terminal domain): stimulates the cleavage reaction. PAP (orange): poly(A) polymerase; initiates poly(A) synthesis, yielding an oligo(A) at least 10 nt long. PAB II (yellow): poly(A)-binding protein II; functions in the elongation of poly(A) in mammals. 36
38 Model for Polyadenylation 37
39 38
40 Authentic Signals of Polyadenylation. There are two major signals. PolyA signal (PAS): located … nucleotides upstream of the cleavage/polyadenylation site; a highly conserved hexamer, AAUAAA (or its common variant AUUAAA); recognized by the cleavage and polyadenylation specificity factor (CPSF). Downstream element (DE): located … nucleotides downstream of the cleavage/polyadenylation site, 39
41 consisting of a much less well-characterized U-rich or G-U-rich sequence; recognized by the cleavage stimulation factor (CstF). Cleavage then occurs between these two signals, directed by two cleavage factors, CF Im and CF IIm. 40
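Scanning for the two PAS hexamers is the obvious first filter for polyA signal candidates. A minimal sketch on DNA strings (so AAUAAA and AUUAAA are written AATAAA and ATTAAA); `find_pas_candidates` is a hypothetical helper. Because the slides' pseudo-polyA dataset also contains AAUAAA, the hexamer alone cannot tell real signals from pseudo ones; the surrounding context still has to be modeled.

```python
import re
from typing import List, Tuple

# DNA-level spellings of the polyA signal hexamers AAUAAA and AUUAAA.
PAS_HEXAMERS = ("AATAAA", "ATTAAA")


def find_pas_candidates(dna: str) -> List[Tuple[int, str]]:
    """0-based start of every (possibly overlapping) PAS hexamer occurrence, sorted by position."""
    dna = dna.upper()
    hits = []
    for hexamer in PAS_HEXAMERS:
        # A zero-width lookahead lets overlapping occurrences all be reported.
        for match in re.finditer(f"(?={hexamer})", dna):
            hits.append((match.start(), hexamer))
    return sorted(hits)


if __name__ == "__main__":
    print(find_pas_candidates("GCAATAAATTAAAGT"))  # [(2, 'AATAAA'), (7, 'ATTAAA')]
```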
42 Polyadenylation Site, Window = [−200, +206] [figure: proportion of A, U, G, C, G+C and G+U at each position] 41
43 Biological data are Noisy!! 42
44 Variation in Nucleotide Positions of a Signal. Variation arises from the evolution of organisms, yet transcriptional and post-transcriptional factors can still recognize the signals. The inter-relations between the nucleotide positions of a signal take part in its recognition through DNA-protein interactions. 43
45 Hidden Encoding and Channel Models Are there hidden legitimate codeword(s) for representing a signal? What is the codeword length for that signal? What is the stochastic mechanism (the channel) between the hidden legitimate codeword(s) and the observed diversified DNA segments for a signal? 44
46 Inter-dependency among Base Positions in Splice Signals. This inter-dependency is not sufficiently addressed by previous models such as the weight matrix model (WMM) (Staden, 1984), the weight array model (WAM) (Zhang and Marr, 1993), maximal dependence decomposition (MDD) (Burge and Karlin, 1997) and the tree model (Cai et al., 2000). Potential models? Higher-order Markov chains; dependency graphs. 45
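For reference, the WMM baseline named above treats every position independently (the "zero-order Markov chain" of the later figure legends). A minimal sketch, assuming a candidate is scored by the log-likelihood ratio between a model trained on true sites and one trained on pseudo sites, and reading "Laplace's rule" as adding one to every count; both are assumptions, and the tiny training sets in the usage lines are made up. A WAM would instead condition each position on the previous base.

```python
import math
from typing import Dict, List

ALPHABET = "ACGT"


def train_wmm(signals: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Weight matrix model: independent base probabilities for each position.

    Laplace's rule adds `pseudocount` (1 by default) to every count, so a base never
    seen at some position in the training signals still gets a non-zero probability.
    """
    model = []
    for pos in range(len(signals[0])):
        counts = {base: pseudocount for base in ALPHABET}
        for signal in signals:
            counts[signal[pos]] += 1
        total = sum(counts.values())
        model.append({base: count / total for base, count in counts.items()})
    return model


def log_odds(signal: str, true_model, false_model) -> float:
    """Log-likelihood ratio of a candidate under the true-site and pseudo-site models."""
    return sum(math.log(true_model[i][b] / false_model[i][b]) for i, b in enumerate(signal))


if __name__ == "__main__":
    true_model = train_wmm(["GT", "GT", "GC"])             # made-up "true" sites
    false_model = train_wmm(["AA", "CC", "GG", "TT"])      # made-up "pseudo" sites
    print(true_model[0]["G"])                              # 4/7 with Laplace's rule
    print(log_odds("GT", true_model, false_model) > 0)     # True
```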
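47 Test of Dependency and Chi-square Statistics. Question: how can we find the dependency (strength of inter-relation) between two positions $i$ and $j$ in a splice signal? Table 1: A contingency table for signals in DNA sequence, where $Y_{mn}$ counts the signals with the $m$th base at position $i$ and the $n$th base at position $j$:
$$\begin{array}{c|cccc|c}
s_i \backslash s_j & A & T & C & G & \text{Total} \\ \hline
A & Y_{11} & Y_{12} & Y_{13} & Y_{14} & Y_{1c} \\
T & Y_{21} & Y_{22} & Y_{23} & Y_{24} & Y_{2c} \\
C & Y_{31} & Y_{32} & Y_{33} & Y_{34} & Y_{3c} \\
G & Y_{41} & Y_{42} & Y_{43} & Y_{44} & Y_{4c} \\ \hline
\text{Total} & Y_{r1} & Y_{r2} & Y_{r3} & Y_{r4} & Y
\end{array}$$
46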
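48 Test of Dependency and Chi-square Statistics. The chi-square test statistic is
$$\chi^2(X_i, X_j) = \sum_{m=1}^{4} \sum_{n=1}^{4} \frac{(Y_{mn} - E_{mn})^2}{E_{mn}}, \qquad E_{mn} = \frac{Y_{mc}\, Y_{rn}}{Y},$$
where $E_{mn}$ is the count expected under the null hypothesis that positions $i$ and $j$ are independent. The critical point $K$ is chosen so that
$$P(\text{null hypothesis is rejected when it is true}) = P\big(\chi^2(X_i, X_j) \ge K \mid \text{null hypothesis}\big) = \alpha,$$
where $\alpha$ is the Type I error of the test. If $\chi^2(X_i, X_j)$ is greater than the critical point $K$, the two positions are said to have strong dependency. 47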
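The statistic above can be computed directly from a set of aligned signals. A minimal sketch, assuming every signal is an A/T/C/G string aligned so that index `i` refers to the same signal position in all of them; `chi_square` is a hypothetical helper name.

```python
from collections import Counter
from itertools import product
from typing import List

BASES = "ATCG"


def chi_square(aligned: List[str], i: int, j: int) -> float:
    """Chi-square statistic between positions i and j of aligned signal sequences.

    observed[(m, n)] = Y_mn, the number of signals with base m at i and base n at j;
    E_mn = Y_mc * Y_rn / Y, with Y_mc the row total and Y_rn the column total.
    """
    observed = Counter((seq[i], seq[j]) for seq in aligned)
    total = len(aligned)                      # Y
    row = Counter(seq[i] for seq in aligned)  # Y_mc
    col = Counter(seq[j] for seq in aligned)  # Y_rn
    chi2 = 0.0
    for m, n in product(BASES, BASES):
        expected = row[m] * col[n] / total
        if expected > 0:  # an empty row or column contributes nothing
            chi2 += (observed[(m, n)] - expected) ** 2 / expected
    return chi2


if __name__ == "__main__":
    dependent = [b + b for b in BASES for _ in range(10)]   # position 1 copies position 0
    independent = [a + b for a in BASES for b in BASES]     # every pair exactly once
    print(chi_square(dependent, 0, 1))    # 120.0
    print(chi_square(independent, 0, 1))  # 0.0
```

With a full 4 × 4 table the statistic has (4 − 1)(4 − 1) = 9 degrees of freedom, so for a chosen α the critical point K could be obtained, for example, with `scipy.stats.chi2.ppf(1 - alpha, df=9)`.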
49 The Chi-square Statistics for the TATA Box Signal [table: pairwise statistics $\chi^2(X_i, X_j)$ for positions $X_{-5}, \ldots, X_{-1}, X_{+1}, \ldots, X_{+8}$]
50 Dependency Graph for the TATA Box Signal
51 Dependency Graph for the TSS Signal 50
52 Dependency Graph for Donor Site 51
53 Difficulty for Statistical Reasoning with Dependency Graphs There are always cycles in a dependency graph. 52
54 A Remedy Expanding a dependency graph into a Bayesian network. 53
55 Expanded Bayesian Network for the TATA Box Signal 54
56 55
57 Expanded Bayesian Network for Donor Site [figure: expanded Bayesian network over donor-site positions $D_{-9}, \ldots, D_{+9}$, with nodes arranged in layers (0) to (8)] 56
58 Datasets for TATA Model Training. 862 non-redundant, experimentally verified human TATA box sequences were extracted from NCBI to form a true dataset. Pseudo-TATA signals were retrieved from the exon and intron regions of the 862 genes to form a false dataset. 57
59 Datasets for TSS Model Training. 1430 non-redundant human promoter sequences were extracted from EPD78 and BLASTed against the human genome to form a true dataset. False TSS signal sequences were obtained at random from the exon and intron regions of the 1430 genes. 58
60 Datasets for Splice Site Model Training. We extract a collection of real and pseudo splice sites from a set of 462 annotated multiple-exon human genes at … Table 2: Number of genes and of true and pseudo splice sites in the dataset [columns: Genes; True: donor, acceptor; False: donor, acceptor]. We exclude splice sites that contain base positions labelled with symbols other than A, T, C, G. 59
61 Datasets for PolyA Signal Model Training. 2923 polyA signal sequences containing AAUAAA were retrieved from GenBank to form a true dataset. Pseudo-polyA signals containing AAUAAA were retrieved from the exon and intron regions of the 2923 genes to form a false dataset. 60
62 Five-fold Cross-Validation. The models are cross-validated by randomly partitioning the dataset into five subsets. Each subset (the testing data) is then tested under the splice site models with the parameters trained on the other four subsets (the training data), and the five predictive accuracy measures from the five testing/training pairs are averaged. In the same manner, we also evaluate each training set with the model trained on it. 61
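The five-fold procedure above can be sketched as follows. `train_fn` and `eval_fn` are hypothetical placeholders for fitting a signal model and computing one accuracy measure; shuffling once with a fixed seed and dealing the items round-robin into folds is one simple way to obtain a random partition, not necessarily the authors'.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_splits(n_items: int, k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (training indices, testing indices) for k-fold cross-validation.

    The items are shuffled once and dealt into k disjoint subsets; each subset serves
    as the testing data exactly once while the other k - 1 subsets form the training data.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[f::k] for f in range(k)]
    for f in range(k):
        training = [idx for g in range(k) if g != f for idx in folds[g]]
        yield training, folds[f]


def cross_validate(data, train_fn, eval_fn, k: int = 5, seed: int = 0) -> float:
    """Average of eval_fn over the k testing/training pairs (train_fn, eval_fn are placeholders)."""
    scores = []
    for training, testing in k_fold_splits(len(data), k, seed):
        model = train_fn([data[i] for i in training])
        scores.append(eval_fn(model, [data[i] for i in testing]))
    return sum(scores) / len(scores)
```

Evaluating a training set with its own model, as the slide also does, amounts to calling `eval_fn(model, training_items)` inside the same loop.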
63 Measures for Predictive Accuracy. Actual positive (AP), actual negative (AN), predicted positive (PP) and predicted negative (PN) counts:
$$AP = TP + FN, \qquad AN = FP + TN, \qquad PP = TP + FP, \qquad PN = FN + TN.$$
False negative rate and false positive rate:
$$FN\ \text{rate} = \frac{\#FN}{\#TP + \#FN}, \qquad FP\ \text{rate} = \frac{\#FP}{\#TN + \#FP}.$$
62
64 Measures for Predictive Accuracy (Cont.)
$$\text{Sensitivity} = 1 - FN\ \text{rate} = \frac{\#TP}{\#TP + \#FN} = \frac{\#TP}{\#AP},$$
$$\text{Specificity} = 1 - FP\ \text{rate} = \frac{\#TN}{\#TN + \#FP} = \frac{\#TN}{\#AN},$$
$$\text{Positive predictive value (PPV)} = \frac{\#TP}{\#TP + \#FP} = \frac{\#TP}{\#PP}.$$
63
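The definitions above reduce to a few ratios of the four confusion-matrix counts. A minimal sketch; `accuracy_measures` is a hypothetical helper, and the counts in the usage line are made up.

```python
from typing import Dict


def accuracy_measures(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Predictive accuracy measures from the four confusion-matrix counts."""
    ap, an, pp = tp + fn, fp + tn, tp + fp  # actual positive, actual negative, predicted positive
    return {
        "fn_rate": fn / ap,
        "fp_rate": fp / an,
        "sensitivity": tp / ap,  # = 1 - fn_rate
        "specificity": tn / an,  # = 1 - fp_rate
        "ppv": tp / pp,
    }


if __name__ == "__main__":
    print(accuracy_measures(tp=90, fp=20, tn=80, fn=10))
```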
65 Results and Comparison 64
66 TATA Signal. [plot: TATA (Testing Data), Dependency Graph Models, α = …; axes: False Positive Rate (%), False Negative Rate (%)] Figure 1: Comparison of prediction accuracy of 6 dependency graph models for the training data of the TATA box, corresponding to 6 different windows. 65
67 [plot: TATA (Training Data), Dependency Graph Models, α = …; axes: False Positive Rate (%), False Negative Rate (%)] Figure 2: Comparison of prediction accuracy of 6 dependency graph models for the testing data of the TATA box, corresponding to 6 different windows. 66
68 TSS Site. [plot: TSS (Training Data), Dependency Graph Models, α = …; axes: False Positive Rate (%), False Negative Rate (%)] Figure 3: Comparison of prediction accuracy of 6 dependency graph models for the training data of the TSS, corresponding to 6 different windows with the same right edge. 67
69 [plot: TSS (Testing Data), Dependency Graph Models, α = …; axes: False Positive Rate (%), False Negative Rate (%)] Figure 4: Comparison of prediction accuracy of 6 dependency graph models for the testing data of the TSS, corresponding to 6 different windows with the same right edge. 68
70 [plot: TSS (Training Data), Dependency Graph Models, α = …; axes: False Positive Rate (%), False Negative Rate (%)] Figure 5: Comparison of prediction accuracy of 6 dependency graph models for the training data of the TSS, corresponding to 6 different windows with the same left edge. 69
71 [plot: TSS (Testing Data), Dependency Graph Models, α = …; axes: False Positive Rate (%), False Negative Rate (%)] Figure 6: Comparison of prediction accuracy of 6 dependency graph models for the testing data of the TSS, corresponding to 6 different windows with the same left edge. 70
72 Donor Site. [plot: Donor site (Training Data); axes: False Positive Rate (%), False Negative Rate (%); curves: zero-order Markov chain model (WMM), window [−6, +9], with Laplace's rule; first-order Markov chain model (WAM), window [−9, +9], with Laplace's rule; second-order Markov chain model, window [−9, +9], with Laplace's rule; third-order Markov chain model, window [−3, +7], with Laplace's rule; MDD, window [−6, +15], with Laplace's rule; Cai's tree model, window [−9, +9], with Laplace's rule; EBN model (at most 1 parent), window [−9, +9], α = 10⁻⁸, with Laplace's rule; EBN model (at most 2 parents), window [−9, +9], α = 10⁻⁸, with Laplace's rule; EBN model (at most 3 parents), window [−6, +15], α = 10⁻¹, with Laplace's rule] Figure 7: Comparison of predictive accuracy for the training data of the donor site under the WMM, WAM, MDD, Cai's tree, 2nd-order Markov chain, 3rd-order Markov chain and expanded Bayesian network (at most 1, 2 and 3 parents) prediction models. 71
73 [plot: Donor site (Testing Data); axes: False Positive Rate (%), False Negative Rate (%); curves: zero-order Markov chain model (WMM), window [−6, +9], with Laplace's rule; first-order Markov chain model (WAM), window [−9, +9], with Laplace's rule; second-order Markov chain model, window [−9, +9], with Laplace's rule; third-order Markov chain model, window [−3, +7], with Laplace's rule; MDD, window [−6, +15], with Laplace's rule; Cai's tree model, window [−9, +9], with Laplace's rule; EBN model (at most 1 parent), window [−9, +9], α = 10⁻⁸, with Laplace's rule; EBN model (at most 2 parents), window [−9, +9], α = 10⁻⁸, with Laplace's rule; EBN model (at most 3 parents), window [−6, +15], α = 10⁻¹, with Laplace's rule] Figure 8: Comparison of predictive accuracy for the testing data of the donor site under the WMM, WAM, MDD, Cai's tree, 2nd-order Markov chain, 3rd-order Markov chain and expanded Bayesian network (at most 1, 2 and 3 parents) prediction models. 72
74 Acceptor Site. [plot: Acceptor site (Training Data); axes: False Positive Rate (%), False Negative Rate (%); curves: zero-order Markov chain model (WMM), window [−27, +3], without Laplace's rule; first-order Markov chain model (WAM), window [−27, +9], without Laplace's rule; second-order Markov chain model, window [−27, +9], with Laplace's rule; third-order Markov chain model, window [−27, +3], with Laplace's rule; MDD, window [−27, +9], with Laplace's rule; Cai's tree model, window [−27, +9], without Laplace's rule; EBN model (at most 1 parent), window [−27, +3], α = 10⁻⁸, without Laplace's rule; EBN model (at most 2 parents), window [−27, +9], α = 10⁻³, without Laplace's rule; EBN model (at most 3 parents), window [−27, +3], α = 10⁻³, with Laplace's rule] Figure 9: Comparison of predictive accuracy for the training data of the acceptor site under the WMM, WAM, MDD, Cai's tree, 2nd-order Markov chain, 3rd-order Markov chain and expanded Bayesian network (at most 1, 2 and 3 parents) prediction models. 73
75 [plot: Acceptor site (Testing Data); axes: False Positive Rate (%), False Negative Rate (%); curves: zero-order Markov chain model (WMM), window [−27, +3], without Laplace's rule; first-order Markov chain model (WAM), window [−27, +9], without Laplace's rule; second-order Markov chain model, window [−27, +9], with Laplace's rule; third-order Markov chain model, window [−27, +3], with Laplace's rule; MDD, window [−27, +9], with Laplace's rule; Cai's tree model, window [−27, +9], without Laplace's rule; EBN model (at most 1 parent), window [−27, +3], α = 10⁻⁸, without Laplace's rule; EBN model (at most 2 parents), window [−27, +9], α = 10⁻³, without Laplace's rule; EBN model (at most 3 parents), window [−27, +3], α = 10⁻³, with Laplace's rule] Figure 10: Comparison of predictive accuracy for the testing data of the acceptor site under the WMM, WAM, MDD, Cai's tree, 2nd-order Markov chain, 3rd-order Markov chain and expanded Bayesian network (at most 1, 2 and 3 parents) prediction models. 74
76 PolyA Signal. [plot: PAS, Training (5-fold cross-validation), Window = [−90, +96]; curves: DG, WMM, WAM, SMC; axes: FP rate (%) in training data, FN rate (%) in training data] Figure 11: Comparing accuracy between different methods using training data (in region [−90, +96]). 75
77 [plot: PAS, Testing (5-fold cross-validation), Window = [−90, +96]; curves: DG, WMM, WAM, SMC, ERPIN, POLYAH; axes: FP rate (%) in testing data, FN rate (%) in testing data] Figure 12: Comparing accuracy between different methods using testing data (in region [−90, +96]). 76
78 Discussion Dependency graph models with their expanded Bayesian networks seem to be sufficient to address the intrinsic cyclic inter-dependency between base positions in a biological signal site. 77
79 Content Sensors for Gene Structure Prediction 78
80 Content Sensors. Exon sensor: detects the coding regions, i.e., the regions where exons reside. We have constructed a dependency graph with a window size of 9 nucleotides and then created an expanded Bayesian network for the exon sensor. Intron sensor: detects the non-coding regions, i.e., the regions where introns reside. We have also constructed a dependency graph with a window size of 9 nucleotides and then created an expanded Bayesian network for the intron sensor. 79
81 Gene Structure Prediction by Stochastic Grammar. A stochastic grammar (encoding process) describes the alternating exon-intron structure of a gene. A state diagram and the corresponding trellis diagram, with suitable states, are created to represent this stochastic grammar. Each state is modeled by a dependency graph with its expanded Bayesian network. The Viterbi algorithm (a dynamic programming algorithm) is used to decode a path through the trellis diagram, which determines the exon-intron structure of the gene. 80
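The decoding step above can be illustrated with a generic Viterbi recursion. The slides' trellis has many more states (E0, E1, E2, I and the donor/acceptor states), each scored by an expanded Bayesian network; the sketch below instead uses a hypothetical two-state exon/intron model with made-up parameters (GC-rich exons, AT-rich introns) just to show the dynamic programming and the traceback.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def viterbi(obs: Sequence[str], states: Sequence[str], log_start: Dict[str, float],
            log_trans: Dict[str, Dict[str, float]],
            log_emit: Callable[[str, str], float]) -> Tuple[List[str], float]:
    """Most probable state path through the trellis and its log-probability."""
    # delta[s]: best log-probability of any path that ends in state s at the current position
    delta = {s: log_start[s] + log_emit(s, obs[0]) for s in states}
    back_pointers = []
    for symbol in obs[1:]:
        prev, delta, pointer = delta, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + log_trans[r][s])
            delta[s] = prev[best] + log_trans[best][s] + log_emit(s, symbol)
            pointer[s] = best
        back_pointers.append(pointer)
    # Trace the surviving path back from the best final state.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for pointer in reversed(back_pointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return path, delta[last]


# Hypothetical toy model: E = exon (GC-rich), I = intron (AT-rich).
STATES = ("E", "I")
LOG_START = {"E": math.log(0.5), "I": math.log(0.5)}
LOG_TRANS = {"E": {"E": math.log(0.9), "I": math.log(0.1)},
             "I": {"E": math.log(0.1), "I": math.log(0.9)}}
EMIT = {"E": {"G": 0.35, "C": 0.35, "A": 0.15, "T": 0.15},
        "I": {"G": 0.15, "C": 0.15, "A": 0.35, "T": 0.35}}


def log_emit(state: str, base: str) -> float:
    return math.log(EMIT[state][base])


if __name__ == "__main__":
    path, _ = viterbi("GCGCGC" + "ATATATAT" + "GCGCGC", STATES, LOG_START, LOG_TRANS, log_emit)
    print("".join(path))  # EEEEEEIIIIIIIIEEEEEE
```

Working in log-probabilities turns the products along a path into sums, which avoids numerical underflow on gene-length sequences.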
82 81
83 State Diagram [figure: intron state I and exon states E0, E1, E2, connected through acceptor states A−27, …, A−2, A−1, A+1, …, A+9 and donor states D−3, D−2, D−1, D+1, D+2, …, D+15] 82
84 The trellis diagram 83
85 [figure: trellis diagram over the exon state E, the intron state I, acceptor states A−27, …, A−1, A+1, …, A+9 and donor states D−3, …, D−1, D+1, …, D+15] 84
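86 Maximum a Posteriori (MAP) Decoding. $\Gamma = \gamma_1 \gamma_2 \cdots \gamma_N$: state sequence; $S = s_1 s_2 \cdots s_N$: DNA sequence; and the $k$-layer observation template
$$\mathbf{S} = \begin{matrix} s^{(1)}_1\, s^{(1)}_2 \cdots s^{(1)}_N \\ s^{(2)}_1\, s^{(2)}_2 \cdots s^{(2)}_N \\ \vdots \\ s^{(k)}_1\, s^{(k)}_2 \cdots s^{(k)}_N \end{matrix}$$
85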
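87 Maximum a Posteriori (MAP) Decoding
$$\hat{\Gamma} = \arg\max_{\Gamma} P(\Gamma \mid S) = \arg\max_{\Gamma} \prod_{i=1}^{N} P(\gamma_i \mid \gamma_1, \ldots, \gamma_{i-1}, S).$$
Assumptions:
(1) $P(\gamma_i \mid \gamma_1, \ldots, \gamma_{i-1}, S) = P(\gamma_i \mid S) = P(\gamma_i \mid \mathbf{S}(i, \gamma_i)) = \dfrac{P(\gamma_i)\, P(\mathbf{S}(i, \gamma_i) \mid \gamma_i)}{P(\mathbf{S}(i, \gamma_i))}$;
(2) $P(\mathbf{S}(i, \gamma_i)) = \prod_{l} P(s^{(j)}_l)$, taken over $s^{(j)}_l \in \mathbf{S}(i, \gamma_i)$.
86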
88 Viterbi Algorithm: The Best Path [figure: best path through trellis states including A+9, E0, E1, E2, D−3, D−2, D−1] 87
89 Measures for Predictive Accuracy: sensitivity, specificity and positive predictive value (PPV), as defined earlier. 88
90 Nucleotide-level Accuracy. Total test genes: 462. [table: AP, AN, PP, PN, TP, FP = 7332, FN, TN, sensitivity, specificity, P.P.V.]
91 Nucleotide-level Accuracy with Start Codon and Stop Codon. Total test genes: 462. [table: AP, AN, PP, PN, TP, FP = 7584, FN = 72675, TN, sensitivity, specificity, P.P.V.]
92 Exon-level Accuracy with Start Codon and Stop Codon. Total actual exons: 2843 (exactly predicted, partially predicted, overlapped, missed). Total predicted exons: 2713 (exact, partial, overlapping, wrong). [table: sensitivity, ME, P.P.V., WE]
93 Prediction of Protein Structure 92
94 Protein 3D Structure. From H. M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, P. E. Bourne: The Protein Data Bank. Nucleic Acids Research, 28, pp. … (2000). 93
95 Levels of Protein Structure Primary structure: the amino acid sequence Secondary structure: the segmentation of an amino acid sequence into α-helices, β-sheets, and loops Tertiary structure: domains and folds 94
96 Different Mappings for Protein Secondary Structure.
RES: KENITNLDACITRLRVSVADVSKVDQAGLKKLG
DSSP: STTBCEEEECSSCEEEEESCGGGCCHHHHHHTT
EHL: CCCECEEEECCCCEEEEECCHHHCCHHHHHHCC
CK: CCCCCEEEECCCCEEEEECCCCCCCHHHHHHCC
95
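Both reductions shown above collapse the 8-state DSSP alphabet to three states. Reading the example rows, EHL maps H and G to helix, E and B to strand and everything else to coil, while CK keeps only H and E. A minimal sketch that reproduces the example; treating DSSP's `I` (pi-helix) as helix in EHL is an assumption, since `I` does not occur in the example.

```python
# Reduction rules read off the example rows above; mapping DSSP "I" (pi-helix) to H
# in the EHL scheme is an assumption, since it does not occur in the example.
EHL_MAP = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}
CK_MAP = {"H": "H", "E": "E"}


def reduce_dssp(dssp: str, mapping: dict) -> str:
    """Map 8-state DSSP labels to three states; any label not in the mapping becomes C (coil)."""
    return "".join(mapping.get(label, "C") for label in dssp)


if __name__ == "__main__":
    dssp = "STTBCEEEECSSCEEEEESCGGGCCHHHHHHTT"
    print(reduce_dssp(dssp, EHL_MAP))  # CCCECEEEECCCCEEEEECCHHHCCHHHHHHCC
    print(reduce_dssp(dssp, CK_MAP))   # CCCCCEEEECCCCEEEEECCCCCCCHHHHHHCC
```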
97 Prediction of Protein Secondary Structure. Let $C$ be the set of all eligible protein secondary structure sequences, i.e., the code of protein secondary structure sequences, assumed to be represented by a trellis, i.e., a trellis code. Each protein secondary structure sequence $Q = (Q_1, Q_2, \ldots, Q_L)$ of length $L$ in $C$ corresponds to a unique (state) path $X = (X_0, X_1, \ldots, X_{L-4})$ in the trellis from the initial state $X_0$ to the final state $X_{L-4}$. To represent interactions among amino acids 96
98 in a protein chain, we define
$$\mathbf{O} = \begin{matrix} O^{(1)}_1\, O^{(1)}_2 \cdots O^{(1)}_L \\ O^{(2)}_1\, O^{(2)}_2 \cdots O^{(2)}_L \\ \vdots \\ O^{(k)}_1\, O^{(k)}_2 \cdots O^{(k)}_L \end{matrix}$$
as the observation template of the unconnected Bayesian network of $k$ layers. Given the observed amino acid sequence $O = (O_1, O_2, \ldots, O_L)$, the template $\mathbf{O}$ of $k$ layers is fully defined by assigning $O^{(j)}_i = O_i$ for every depth $j$ and every $i$. Maximum a posteriori decoding chooses 97
99
$$\hat{Q} = \arg\max_{Q} P(Q \mid O)$$
as the output, where
$$\begin{aligned} P(Q \mid O) &= P(Q_1, Q_2, \ldots, Q_L \mid O) = P(Q_1, Q_2, Q_3 \mid O)\, P(Q_4, Q_5, \ldots, Q_L \mid O, Q_1, Q_2, Q_3) \\ &= P(Q_1, Q_2, Q_3 \mid O) \prod_{i=4}^{L-2} P(Q_i \mid O, Q_{i-1})\; P(Q_{L-1}, Q_L \mid O, Q_{L-2}). \end{aligned}$$
98
100 For the initial states:
$$P(Q_1, Q_2, Q_3 \mid O) = P(Q_1, Q_2, Q_3 \mid \mathbf{O}_{[1:l_I]}) = \frac{P(Q_1, Q_2, Q_3)\, P(\mathbf{O}_{[1:l_I]} \mid Q_1, Q_2, Q_3)}{P(\mathbf{O}_{[1:l_I]})},$$
where $l_I$ is the window size of an initial state. For the intermediate states:
$$P(Q_i \mid O, Q_{i-1}) = P(Q_i \mid \mathbf{O}_{[i-l_L:\,i+l_R-1]}, Q_{i-1}) = \frac{P(Q_i \mid Q_{i-1})\, P(\mathbf{O}_{[i-l_L:\,i+l_R-1]} \mid Q_i, Q_{i-1})}{P(\mathbf{O}_{[i-l_L:\,i+l_R-1]} \mid Q_{i-1})},$$
where $l_L$ and $l_R$ denote the start and the end of the window of an internal state, respectively. 99
101 For the terminal states:
$$P(Q_{L-1}, Q_L \mid O, Q_{L-2}) = P(Q_{L-1}, Q_L \mid \mathbf{O}_{[L-l_T+1:L]}, Q_{L-2}) = \frac{P(Q_{L-1}, Q_L \mid Q_{L-2})\, P(\mathbf{O}_{[L-l_T+1:L]} \mid Q_{L-2}, Q_{L-1}, Q_L)}{P(\mathbf{O}_{[L-l_T+1:L]} \mid Q_{L-2})},$$
where $l_T$ is the window size of a terminal state. The decoding can be carried out by applying the Viterbi algorithm to compute $\hat{Q} = \arg\max_{Q} P(Q \mid O)$ for a given amino acid sequence $O$. 100
102 Factor Graph for Protein Secondary Structure Prediction. Since the observed amino acid sequence $O$ is fixed in the annotating (decoding) problem, the observation template $\mathbf{O}$ is also fixed. The global function of this annotating (decoding) problem is chosen to be the a posteriori probability (APP)
$$g(X_0, \ldots, X_{L-4}) = P(X_0, X_1, \ldots, X_{L-4} \mid O).$$
The global function $g$ can be further factored into a product of 101
103 several local functions:
$$\begin{aligned} g(X_0, \ldots, X_{L-4}) &= P(X_0 \mid O)\, P(X_1 \mid X_0, O) \cdots P(X_{L-4} \mid X_0, X_1, \ldots, X_{L-5}, O) \prod_{i=1}^{L-4} I_i(X_{i-1}, X_i) \\ &= P(X_0 \mid O)\, P(X_1 \mid O) \cdots P(X_{L-4} \mid O) \prod_{i=1}^{L-4} I_i(X_{i-1}, X_i) \\ &= f_1(X_0)\, f_2(X_0, X_1)\, f_3(X_1)\, f_4(X_1, X_2) \cdots f_{2L-9}(X_{L-5})\, f_{2L-8}(X_{L-5}, X_{L-4})\, f_{2L-7}(X_{L-4}), \end{aligned}$$
where $f_{2i}(X_{i-1}, X_i) = I_i(X_{i-1}, X_i)$ is an indicator function of the local behavior of the $i$th trellis section that constrains the possible combinations of $X_{i-1}$ and $X_i$. 102
104 For $f_1(X_0)$:
$$f_1(X_0) = P(X_0 \mid O) = P(X_0 \mid \mathbf{O}_{[1:l_I]}) = \frac{P(X_0)\, P(\mathbf{O}_{[1:l_I]} \mid X_0)}{P(\mathbf{O}_{[1:l_I]})}.$$
For $f_{2i+1}(X_i)$, $1 \le i \le L-5$:
$$f_{2i+1}(X_i) = P(X_i \mid O) = P(X_i \mid \mathbf{O}_{[(i+3)-l_L:\,(i+3)+l_R-1]}) = \frac{P(X_i)\, P(\mathbf{O}_{[(i+3)-l_L:\,(i+3)+l_R-1]} \mid X_i)}{P(\mathbf{O}_{[(i+3)-l_L:\,(i+3)+l_R-1]})}.$$
103
105 For $f_{2L-7}(X_{L-4})$:
$$f_{2L-7}(X_{L-4}) = P(X_{L-4} \mid O) = P(X_{L-4} \mid \mathbf{O}_{[L-l_T+1:L]}) = \frac{P(X_{L-4})\, P(\mathbf{O}_{[L-l_T+1:L]} \mid X_{L-4})}{P(\mathbf{O}_{[L-l_T+1:L]})}.$$
104
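106 The factor graph for the annotating problem is as follows: [figure: chain factor graph with variable nodes $X_0, X_1, X_2, \ldots, X_{L-5}, X_{L-4}$, single-variable factors $f_1, f_3, f_5, \ldots, f_{2L-9}, f_{2L-7}$ and two-variable factors $f_2, f_4, \ldots, f_{2L-8}$] 105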
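107 The Sum-Product Update Rule. The message sent between a variable vertex and a function vertex in the sum-product algorithm is updated according to the following sum-product update rule, where $n(v)$ denotes the set of neighbors of vertex $v$ and $\sum_{\sim\{x\}}$ denotes summation over all variables except $x$.
Variable vertex to function vertex:
$$\mu_{x \to f}(x) = \prod_{h \in n(x) \setminus \{f\}} \mu_{h \to x}(x).$$
Function vertex to variable vertex:
$$\mu_{f \to x}(x) = \sum_{\sim\{x\}} \Big( f(n(f)) \prod_{y \in n(f) \setminus \{x\}} \mu_{y \to f}(y) \Big).$$
106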
108 When the factor graph is cycle-free, the sum-product algorithm is guaranteed, once a suitable message-passing schedule has been completed, to give every marginal function
$$g_v(v) = \prod_{f \in n(v)} \mu_{f \to v}(v), \qquad v \in V.$$
107
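On a chain such as the factor graph above, a suitable schedule is simply one forward sweep and one backward sweep of the update rules. A minimal sketch with small hypothetical factor tables (not the trained window probabilities of the slides); comparing against brute-force enumeration checks that, on this cycle-free graph, the messages give the exact normalized marginals.

```python
from itertools import product
from typing import List

Matrix = List[List[float]]


def chain_marginals(unary: Matrix, pairwise: List[Matrix]) -> Matrix:
    """Sum-product on a chain factor graph X_0 - X_1 - ... - X_{n-1}.

    unary[i][x]       plays the role of the single-variable factor on X_i (f_{2i+1}).
    pairwise[i][x][y] plays the role of the factor on (X_i, X_{i+1}) (f_{2i+2}), e.g. a 0/1 indicator.
    Returns the normalized marginal of every variable.
    """
    n, k = len(unary), len(unary[0])
    # forward[i][y]: message into X_i from the pairwise factor on its left
    forward = [[1.0] * k for _ in range(n)]
    for i in range(1, n):
        for y in range(k):
            forward[i][y] = sum(forward[i - 1][x] * unary[i - 1][x] * pairwise[i - 1][x][y]
                                for x in range(k))
    # backward[i][x]: message into X_i from the pairwise factor on its right
    backward = [[1.0] * k for _ in range(n)]
    for i in range(n - 2, -1, -1):
        for x in range(k):
            backward[i][x] = sum(pairwise[i][x][y] * unary[i + 1][y] * backward[i + 1][y]
                                 for y in range(k))
    marginals = []
    for i in range(n):
        # Product of all messages arriving at X_i: left, its own unary factor, right.
        m = [forward[i][x] * unary[i][x] * backward[i][x] for x in range(k)]
        z = sum(m)
        marginals.append([v / z for v in m])
    return marginals


def brute_force_marginals(unary: Matrix, pairwise: List[Matrix]) -> Matrix:
    """Same marginals by enumerating every configuration (only feasible for tiny chains)."""
    n, k = len(unary), len(unary[0])
    totals = [[0.0] * k for _ in range(n)]
    for xs in product(range(k), repeat=n):
        weight = 1.0
        for i, x in enumerate(xs):
            weight *= unary[i][x]
        for i in range(n - 1):
            weight *= pairwise[i][xs[i]][xs[i + 1]]
        for i, x in enumerate(xs):
            totals[i][x] += weight
    return [[v / sum(row) for v in row] for row in totals]


if __name__ == "__main__":
    unary = [[0.6, 0.4], [0.3, 0.7], [0.5, 0.5], [0.9, 0.1]]
    allowed = [[1.0, 1.0], [0.0, 1.0]]  # indicator: state 1 can never return to state 0
    print(chain_marginals(unary, [allowed] * 3))
    print(brute_force_marginals(unary, [allowed] * 3))
```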
109 Measure for Secondary Structure Prediction Accuracy. Let $M_{ij}$ be the number of residues observed in state $i$ and predicted in state $j$, with $i, j \in \{H, E, L\}$; the total number of residues is simply $N = \sum_{i,j} M_{ij}$. The three-state per-residue accuracy $Q_3$ is then defined as
$$Q_3 = 100\, \frac{\sum_i M_{ii}}{N}.$$
108
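Since $\sum_i M_{ii}$ is just the number of residues whose predicted state equals the observed one, $Q_3$ can be computed straight from two aligned state strings. A minimal sketch; `q3` is a hypothetical helper, and it counts matches over whatever three labels are used (H/E/L here, or H/E/C as in the mapping example).

```python
from collections import Counter


def q3(observed: str, predicted: str) -> float:
    """Three-state per-residue accuracy: 100 * sum_i M_ii / N."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    m = Counter(zip(observed, predicted))  # m[(i, j)] = M_ij
    n = sum(m.values())                    # N = sum_ij M_ij
    diagonal = sum(count for (i, j), count in m.items() if i == j)
    return 100.0 * diagonal / n


if __name__ == "__main__":
    print(q3("HHEEL", "HHELL"))  # 80.0
```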
110 Prediction Results. Table of prediction results for the PDBselect25 data set with EHL- and CK-mapping in the $Q_3$ measure (columns: L | EHL(vi) | EHL(sp) | CK(vi) | CK(sp)):
… | …% | 62.96% | 63.61% | 64.91%
… | …% | 63.60% | 64.16% | 65.35%
… | …% | 63.86% | 65.17% | 65.84%
L: number of secondary structures in the initial state or terminal state; vi: decoded by the Viterbi algorithm; sp: decoded by the sum-product algorithm. 109
111 Table of prediction results for the CB513 data set with EHL- and CK-mapping in the $Q_3$ measure (columns: L | EHL(vi) | EHL(sp) | CK(vi) | CK(sp)):
… | …% | 60.00% | 60.81% | 62.28%
… | …% | 59.91% | 61.26% | 62.62%
… | …% | 58.94% | 61.03% | 62.24%
110
112 Current Challenging Problems in Bioinformatics. Genomics: novel gene discovery; alternative splicing; gene regulatory networks; cell cycle; development; disease finding. Proteomics: prediction of structures and functions of proteins; drug design. 111
113 Metabolic pathways: construction of pathways; cellular functions. Systems biology. 112
114 Systems Biology. From Neil Campbell, Jane Reece, and Larry Mitchell, Biology, 5th ed. (Menlo Park, CA: Addison Wesley Longman, 1999). © Addison Wesley Longman, Inc. 113
115 Bioinformatics is Not Just Computer Algorithms!! Information theory Coding theory Communication theory Signal processing and linguistics System (control) theory Statistics, probability theory and stochastic processes Combinatorics and graph theory 114
116 Co-investigators Dr. Wen-Hsiung Li at the Department of Ecology and Evolution, University of Chicago, USA. Te-Ming Chen, Chao-Chung Chang, Chen-Wei Hsu, Yun Lee, Chiung-Wen He at the Department of Electrical Engineering, National Tsing Hua University, Taiwan. 115
117 Thank You Very Much 116
Translation Translation Videos Bozeman, transcription and translation: https://youtu.be/h3b9arupxzg Crashcourse: Transcription and Translation - https://youtu.be/itsb2sqr-r0 Translation Translation The
More informationEdinburgh Research Explorer
Edinburgh Research Explorer Codon usage patterns in Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster and Homo sapiens; a review of the considerable
More informationLesson Overview. Ribosomes and Protein Synthesis 13.2
13.2 The Genetic Code The first step in decoding genetic messages is to transcribe a nucleotide base sequence from DNA to mrna. This transcribed information contains a code for making proteins. The Genetic
More informationReducing Redundancy of Codons through Total Graph
American Journal of Bioinformatics Original Research Paper Reducing Redundancy of Codons through Total Graph Nisha Gohain, Tazid Ali and Adil Akhtar Department of Mathematics, Dibrugarh University, Dibrugarh-786004,
More informationLecture IV A. Shannon s theory of noisy channels and molecular codes
Lecture IV A Shannon s theory of noisy channels and molecular codes Noisy molecular codes: Rate-Distortion theory S Mapping M Channel/Code = mapping between two molecular spaces. Two functionals determine
More informationGenetic code on the dyadic plane
Genetic code on the dyadic plane arxiv:q-bio/0701007v3 [q-bio.qm] 2 Nov 2007 A.Yu.Khrennikov, S.V.Kozyrev June 18, 2018 Abstract We introduce the simple parametrization for the space of codons (triples
More informationSEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS. Prokaryotes and Eukaryotes. DNA and RNA
SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS 1 Prokaryotes and Eukaryotes 2 DNA and RNA 3 4 Double helix structure Codons Codons are triplets of bases from the RNA sequence. Each triplet defines an amino-acid.
More informationUNIVERSITY OF CAMBRIDGE INTERNATIONAL EXAMINATIONS General Certifi cate of Education Advanced Subsidiary Level and Advanced Level
*1166350738* UNIVERSITY OF CAMBRIDGE INTERNATIONAL EXAMINATIONS General Certifi cate of Education Advanced Subsidiary Level and Advanced Level CEMISTRY 9701/43 Paper 4 Structured Questions October/November
More informationFrom Gene to Protein
From Gene to Protein Gene Expression Process by which DNA directs the synthesis of a protein 2 stages transcription translation All organisms One gene one protein 1. Transcription of DNA Gene Composed
More information1. In most cases, genes code for and it is that
Name Chapter 10 Reading Guide From DNA to Protein: Gene Expression Concept 10.1 Genetics Shows That Genes Code for Proteins 1. In most cases, genes code for and it is that determine. 2. Describe what Garrod
More informationNewly made RNA is called primary transcript and is modified in three ways before leaving the nucleus:
m Eukaryotic mrna processing Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus: Cap structure a modified guanine base is added to the 5 end. Poly-A tail
More informationRNA Processing: Eukaryotic mrnas
RNA Processing: Eukaryotic mrnas Eukaryotic mrnas have three main parts (Figure 13.8): 5! untranslated region (5! UTR), varies in length. The coding sequence specifies the amino acid sequence of the protein
More informationThe degeneracy of the genetic code and Hadamard matrices. Sergey V. Petoukhov
The degeneracy of the genetic code and Hadamard matrices Sergey V. Petoukhov Department of Biomechanics, Mechanical Engineering Research Institute of the Russian Academy of Sciences petoukhov@hotmail.com,
More informationBiology 155 Practice FINAL EXAM
Biology 155 Practice FINAL EXAM 1. Which of the following is NOT necessary for adaptive evolution? a. differential fitness among phenotypes b. small population size c. phenotypic variation d. heritability
More informationFrom DNA to protein, i.e. the central dogma
From DNA to protein, i.e. the central dogma DNA RNA Protein Biochemistry, chapters1 5 and Chapters 29 31. Chapters 2 5 and 29 31 will be covered more in detail in other lectures. ph, chapter 1, will be
More informationGCD3033:Cell Biology. Transcription
Transcription Transcription: DNA to RNA A) production of complementary strand of DNA B) RNA types C) transcription start/stop signals D) Initiation of eukaryotic gene expression E) transcription factors
More informationSlide 1 / 54. Gene Expression in Eukaryotic cells
Slide 1 / 54 Gene Expression in Eukaryotic cells Slide 2 / 54 Central Dogma DNA is the the genetic material of the eukaryotic cell. Watson & Crick worked out the structure of DNA as a double helix. According
More informationC CH 3 N C COOH. Write the structural formulas of all of the dipeptides that they could form with each other.
hapter 25 Biochemistry oncept heck 25.1 Two common amino acids are 3 2 N alanine 3 2 N threonine Write the structural formulas of all of the dipeptides that they could form with each other. The carboxyl
More informationPROTEIN SYNTHESIS INTRO
MR. POMERANTZ Page 1 of 6 Protein synthesis Intro. Use the text book to help properly answer the following questions 1. RNA differs from DNA in that RNA a. is single-stranded. c. contains the nitrogen
More informationBME 5742 Biosystems Modeling and Control
BME 5742 Biosystems Modeling and Control Lecture 24 Unregulated Gene Expression Model Dr. Zvi Roth (FAU) 1 The genetic material inside a cell, encoded in its DNA, governs the response of a cell to various
More informationومن أحياها Translation 1. Translation 1. DONE BY :Maen Faoury
Translation 1 DONE BY :Maen Faoury 0 1 ومن أحياها Translation 1 2 ومن أحياها Translation 1 In this lecture and the coming lectures you are going to see how the genetic information is transferred into proteins
More informationRanjit P. Bahadur Assistant Professor Department of Biotechnology Indian Institute of Technology Kharagpur, India. 1 st November, 2013
Hydration of protein-rna recognition sites Ranjit P. Bahadur Assistant Professor Department of Biotechnology Indian Institute of Technology Kharagpur, India 1 st November, 2013 Central Dogma of life DNA
More informationGENE ACTIVITY Gene structure Transcription Transcript processing mrna transport mrna stability Translation Posttranslational modifications
1 GENE ACTIVITY Gene structure Transcription Transcript processing mrna transport mrna stability Translation Posttranslational modifications 2 DNA Promoter Gene A Gene B Termination Signal Transcription
More informationRNA & PROTEIN SYNTHESIS. Making Proteins Using Directions From DNA
RNA & PROTEIN SYNTHESIS Making Proteins Using Directions From DNA RNA & Protein Synthesis v Nitrogenous bases in DNA contain information that directs protein synthesis v DNA remains in nucleus v in order
More informationMathematics of Bioinformatics ---Theory, Practice, and Applications (Part II)
Mathematics of Bioinformatics ---Theory, Practice, and Applications (Part II) Matthew He, Ph.D. Professor/Director Division of Math, Science, and Technology Nova Southeastern University, Florida, USA December
More informationGENETICS - CLUTCH CH.11 TRANSLATION.
!! www.clutchprep.com CONCEPT: GENETIC CODE Nucleotides and amino acids are translated in a 1 to 1 method The triplet code states that three nucleotides codes for one amino acid - A codon is a term for
More informationCHEMISTRY 9701/42 Paper 4 Structured Questions May/June hours Candidates answer on the Question Paper. Additional Materials: Data Booklet
Cambridge International Examinations Cambridge International Advanced Level CHEMISTRY 9701/42 Paper 4 Structured Questions May/June 2014 2 hours Candidates answer on the Question Paper. Additional Materials:
More informationOrganic Chemistry Option II: Chemical Biology
Organic Chemistry Option II: Chemical Biology Recommended books: Dr Stuart Conway Department of Chemistry, Chemistry Research Laboratory, University of Oxford email: stuart.conway@chem.ox.ac.uk Teaching
More informationA Minimum Principle in Codon-Anticodon Interaction
A Minimum Principle in Codon-Anticodon Interaction A. Sciarrino a,b,, P. Sorba c arxiv:0.480v [q-bio.qm] 9 Oct 0 Abstract a Dipartimento di Scienze Fisiche, Università di Napoli Federico II Complesso Universitario
More informationLaith AL-Mustafa. Protein synthesis. Nabil Bashir 10\28\ First
Laith AL-Mustafa Protein synthesis Nabil Bashir 10\28\2015 http://1drv.ms/1gigdnv 01 First 0 Protein synthesis In previous lectures we started talking about DNA Replication (DNA synthesis) and we covered
More informationMolecular Biology - Translation of RNA to make Protein *
OpenStax-CNX module: m49485 1 Molecular Biology - Translation of RNA to make Protein * Jerey Mahr Based on Translation by OpenStax This work is produced by OpenStax-CNX and licensed under the Creative
More informationChapter 17. From Gene to Protein. Biology Kevin Dees
Chapter 17 From Gene to Protein DNA The information molecule Sequences of bases is a code DNA organized in to chromosomes Chromosomes are organized into genes What do the genes actually say??? Reflecting
More informationTranslation. Genetic code
Translation Genetic code If genes are segments of DNA and if DNA is just a string of nucleotide pairs, then how does the sequence of nucleotide pairs dictate the sequence of amino acids in proteins? Simple
More informationComputational Biology: Basics & Interesting Problems
Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information
More informationProtein Synthesis. Unit 6 Goal: Students will be able to describe the processes of transcription and translation.
Protein Synthesis Unit 6 Goal: Students will be able to describe the processes of transcription and translation. Types of RNA Messenger RNA (mrna) makes a copy of DNA, carries instructions for making proteins,
More informationChapters 12&13 Notes: DNA, RNA & Protein Synthesis
Chapters 12&13 Notes: DNA, RNA & Protein Synthesis Name Period Words to Know: nucleotides, DNA, complementary base pairing, replication, genes, proteins, mrna, rrna, trna, transcription, translation, codon,
More information(Lys), resulting in translation of a polypeptide without the Lys amino acid. resulting in translation of a polypeptide without the Lys amino acid.
1. A change that makes a polypeptide defective has been discovered in its amino acid sequence. The normal and defective amino acid sequences are shown below. Researchers are attempting to reproduce the
More informationMotifs and Logos. Six Introduction to Bioinformatics. Importance and Abundance of Motifs. Getting the CDS. From DNA to Protein 6.1.
Motifs and Logos Six Discovering Genomics, Proteomics, and Bioinformatics by A. Malcolm Campbell and Laurie J. Heyer Chapter 2 Genome Sequence Acquisition and Analysis Sami Khuri Department of Computer
More informationNatural Selection. Nothing in Biology makes sense, except in the light of evolution. T. Dobzhansky
It is interesting to contemplate a tangled bank, clothed with many plants of many kinds, with birds singing on the bushes, with various insects flitting about, and with worms crawling through the damp
More informationProtein Synthesis. Unit 6 Goal: Students will be able to describe the processes of transcription and translation.
Protein Synthesis Unit 6 Goal: Students will be able to describe the processes of transcription and translation. Protein Synthesis: Protein synthesis uses the information in genes to make proteins. 2 Steps
More informationMolecular Biology of the Cell
Alberts Johnson Lewis Raff Roberts Walter Molecular Biology of the Cell Fifth Edition Chapter 6 How Cells Read the Genome: From DNA to Protein Copyright Garland Science 2008 Figure 6-1 Molecular Biology
More informationThe genetic code, 8-dimensional hypercomplex numbers and dyadic shifts. Sergey V. Petoukhov
The genetic code, 8-dimensional hypercomplex numbers and dyadic shifts Sergey V. Petoukhov Head of Laboratory of Biomechanical System, Mechanical Engineering Research Institute of the Russian Academy of
More informationA Mathematical Model of the Genetic Code, the Origin of Protein Coding, and the Ribosome as a Dynamical Molecular Machine
A Mathematical Model of the Genetic Code, the Origin of Protein Coding, and the Ribosome as a Dynamical Molecular Machine Diego L. Gonzalez CNR- IMM Is)tuto per la Microele4ronica e i Microsistemi Dipar)mento
More informationName: SBI 4U. Gene Expression Quiz. Overall Expectation:
Gene Expression Quiz Overall Expectation: - Demonstrate an understanding of concepts related to molecular genetics, and how genetic modification is applied in industry and agriculture Specific Expectation(s):
More informationAbstract Following Petoukhov and his collaborators we use two length n zero-one sequences, α and β,
Studying Genetic Code by a Matrix Approach Tanner Crowder 1 and Chi-Kwong Li 2 Department of Mathematics, The College of William and Mary, Williamsburg, Virginia 23185, USA E-mails: tjcrow@wmedu, ckli@mathwmedu
More informationUNIT 5. Protein Synthesis 11/22/16
UNIT 5 Protein Synthesis IV. Transcription (8.4) A. RNA carries DNA s instruction 1. Francis Crick defined the central dogma of molecular biology a. Replication copies DNA b. Transcription converts DNA
More informationInterpolated Markov Models for Gene Finding. BMI/CS 776 Spring 2015 Colin Dewey
Interpolated Markov Models for Gene Finding BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2015 Colin Dewey cdewey@biostat.wisc.edu Goals for Lecture the key concepts to understand are the following the
More informationTranslation. A ribosome, mrna, and trna.
Translation The basic processes of translation are conserved among prokaryotes and eukaryotes. Prokaryotic Translation A ribosome, mrna, and trna. In the initiation of translation in prokaryotes, the Shine-Dalgarno
More informationMultiple Choice Review- Eukaryotic Gene Expression
Multiple Choice Review- Eukaryotic Gene Expression 1. Which of the following is the Central Dogma of cell biology? a. DNA Nucleic Acid Protein Amino Acid b. Prokaryote Bacteria - Eukaryote c. Atom Molecule
More information1. Contains the sugar ribose instead of deoxyribose. 2. Single-stranded instead of double stranded. 3. Contains uracil in place of thymine.
Protein Synthesis & Mutations RNA 1. Contains the sugar ribose instead of deoxyribose. 2. Single-stranded instead of double stranded. 3. Contains uracil in place of thymine. RNA Contains: 1. Adenine 2.
More informationThe Eukaryotic Genome and Its Expression. The Eukaryotic Genome and Its Expression. A. The Eukaryotic Genome. Lecture Series 11
The Eukaryotic Genome and Its Expression Lecture Series 11 The Eukaryotic Genome and Its Expression A. The Eukaryotic Genome B. Repetitive Sequences (rem: teleomeres) C. The Structures of Protein-Coding
More informationOrganization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p
Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p.110-114 Arrangement of information in DNA----- requirements for RNA Common arrangement of protein-coding genes in prokaryotes=
More informationTypes of RNA. 1. Messenger RNA(mRNA): 1. Represents only 5% of the total RNA in the cell.
RNAs L.Os. Know the different types of RNA & their relative concentration Know the structure of each RNA Understand their functions Know their locations in the cell Understand the differences between prokaryotic
More information9/11/18. Molecular and Cellular Biology. 3. The Cell From Genes to Proteins. key processes
Molecular and Cellular Biology Animal Cell ((eukaryotic cell) -----> compare with prokaryotic cell) ENDOPLASMIC RETICULUM (ER) Rough ER Smooth ER Flagellum Nuclear envelope Nucleolus NUCLEUS Chromatin
More informationThree-Dimensional Algebraic Models of the trna Code and 12 Graphs for Representing the Amino Acids
Life 2014, 4, 341-373; doi:10.3390/life4030341 Article OPEN ACCESS life ISSN 2075-1729 www.mdpi.com/journal/life Three-Dimensional Algebraic Models of the trna Code and 12 Graphs for Representing the Amino
More informationMolecular Biology of the Cell
Alberts Johnson Lewis Raff Roberts Walter Molecular Biology of the Cell Fifth Edition Chapter 6 How Cells Read the Genome: From DNA to Protein Copyright Garland Science 2008 Figure 6-1 Molecular Biology
More informationMolecular Genetics Principles of Gene Expression: Translation
Paper No. : 16 Module : 13 Principles of gene expression: Translation Development Team Principal Investigator: Prof. Neeta Sehgal Head, Department of Zoology, University of Delhi Paper Coordinator: Prof.
More informationAdvanced Topics in RNA and DNA. DNA Microarrays Aptamers
Quiz 1 Advanced Topics in RNA and DNA DNA Microarrays Aptamers 2 Quantifying mrna levels to asses protein expression 3 The DNA Microarray Experiment 4 Application of DNA Microarrays 5 Some applications
More informationIntroduction to the Ribosome Overview of protein synthesis on the ribosome Prof. Anders Liljas
Introduction to the Ribosome Molecular Biophysics Lund University 1 A B C D E F G H I J Genome Protein aa1 aa2 aa3 aa4 aa5 aa6 aa7 aa10 aa9 aa8 aa11 aa12 aa13 a a 14 How is a polypeptide synthesized? 2
More informationToday s Lecture: HMMs
Today s Lecture: HMMs Definitions Examples Probability calculations WDAG Dynamic programming algorithms: Forward Viterbi Parameter estimation Viterbi training 1 Hidden Markov Models Probability models
More informationIntroduction to molecular biology. Mitesh Shrestha
Introduction to molecular biology Mitesh Shrestha Molecular biology: definition Molecular biology is the study of molecular underpinnings of the process of replication, transcription and translation of
More informationPractical Bioinformatics
5/2/2017 Dictionaries d i c t i o n a r y = { A : T, T : A, G : C, C : G } d i c t i o n a r y [ G ] d i c t i o n a r y [ N ] = N d i c t i o n a r y. h a s k e y ( C ) Dictionaries g e n e t i c C o
More informationCrick s early Hypothesis Revisited
Crick s early Hypothesis Revisited Or The Existence of a Universal Coding Frame Ryan Rossi, Jean-Louis Lassez and Axel Bernal UPenn Center for Bioinformatics BIOINFORMATICS The application of computer
More informationEukaryotic vs. Prokaryotic genes
BIO 5099: Molecular Biology for Computer Scientists (et al) Lecture 18: Eukaryotic genes http://compbio.uchsc.edu/hunter/bio5099 Larry.Hunter@uchsc.edu Eukaryotic vs. Prokaryotic genes Like in prokaryotes,
More information-14. -Abdulrahman Al-Hanbali. -Shahd Alqudah. -Dr Ma mon Ahram. 1 P a g e
-14 -Abdulrahman Al-Hanbali -Shahd Alqudah -Dr Ma mon Ahram 1 P a g e In this lecture we will talk about the last stage in the synthesis of proteins from DNA which is translation. Translation is the process
More informationEnergy and Cellular Metabolism
1 Chapter 4 About This Chapter Energy and Cellular Metabolism 2 Energy in biological systems Chemical reactions Enzymes Metabolism Figure 4.1 Energy transfer in the environment Table 4.1 Properties of
More informationGene regulation II Biochemistry 302. February 27, 2006
Gene regulation II Biochemistry 302 February 27, 2006 Molecular basis of inhibition of RNAP by Lac repressor 35 promoter site 10 promoter site CRP/DNA complex 60 Lewis, M. et al. (1996) Science 271:1247
More informationSupplementary Information for
Supplementary Information for Evolutionary conservation of codon optimality reveals hidden signatures of co-translational folding Sebastian Pechmann & Judith Frydman Department of Biology and BioX, Stanford
More informationMolecular Biology (9)
Molecular Biology (9) Translation Mamoun Ahram, PhD Second semester, 2017-2018 1 Resources This lecture Cooper, Ch. 8 (297-319) 2 General information Protein synthesis involves interactions between three
More informationChapter
Chapter 17 17.4-17.6 Molecular Components of Translation A cell interprets a genetic message and builds a polypeptide The message is a series of codons on mrna The interpreter is called transfer (trna)
More informationDNA Feature Sensors. B. Majoros
DNA Feature Sensors B. Majoros What is Feature Sensing? A feature is any DNA subsequence of biological significance. For practical reasons, we recognize two broad classes of features: signals short, fixed-length
More informationATTRIBUTIVE CONCEPTION OF GENETIC CODE, ITS BI-PERIODIC TABLES AND PROBLEM OF UNIFICATION BASES OF BIOLOGICAL LANGUAGES *
Symmetry: Culture and Science Vols. 14-15, 281-307, 2003-2004 ATTRIBUTIVE CONCEPTION OF GENETIC CODE, ITS BI-PERIODIC TABLES AND PROBLEM OF UNIFICATION BASES OF BIOLOGICAL LANGUAGES * Sergei V. Petoukhov
More informationSection 7. Junaid Malek, M.D.
Section 7 Junaid Malek, M.D. RNA Processing and Nomenclature For the purposes of this class, please do not refer to anything as mrna that has not been completely processed (spliced, capped, tailed) RNAs
More informationTranslation Part 2 of Protein Synthesis
Translation Part 2 of Protein Synthesis IN: How is transcription like making a jello mold? (be specific) What process does this diagram represent? A. Mutation B. Replication C.Transcription D.Translation
More informationQuiz answers. Allele. BIO 5099: Molecular Biology for Computer Scientists (et al) Lecture 17: The Quiz (and back to Eukaryotic DNA)
BIO 5099: Molecular Biology for Computer Scientists (et al) Lecture 17: The Quiz (and back to Eukaryotic DNA) http://compbio.uchsc.edu/hunter/bio5099 Larry.Hunter@uchsc.edu Quiz answers Kinase: An enzyme
More informationTranslation and the Genetic Code
Chapter 11. Translation and the Genetic Code 1. Protein Structure 2. Components required for Protein Synthesis 3. Properties of the Genetic Code: An Overview 4. A Degenerate and Ordered Code 1 Sickle-Cell
More informationTranslation and Operons
Translation and Operons You Should Be Able To 1. Describe the three stages translation. including the movement of trna molecules through the ribosome. 2. Compare and contrast the roles of three different
More informationIntroduction to Hidden Markov Models (HMMs)
Introduction to Hidden Markov Models (HMMs) But first, some probability and statistics background Important Topics 1.! Random Variables and Probability 2.! Probability Distributions 3.! Parameter Estimation
More informationEukaryotic Gene Expression: Basics and Benefits Prof. P N RANGARAJAN Department of Biochemistry Indian Institute of Science Bangalore
Eukaryotic Gene Expression: Basics and Benefits Prof. P N RANGARAJAN Department of Biochemistry Indian Institute of Science Bangalore Module No #04 Lecture No # 12 Eukaryotic gene Regulation: Co-transcriptional
More informationCODING A LIFE FULL OF ERRORS
CODING A LIFE FULL OF ERRORS PITP ϕ(c 5 ) c 3 c 4 c 5 c 6 ϕ(c 1 ) ϕ(c 2 ) ϕ(c 3 ) ϕ(c 4 ) ϕ(c i ) c i c 7 c 8 c 9 c 10 c 11 c 12 IAS 2012 PART I What is Life? (biological and artificial) Self-replication.
More informationTRANSLATION: How to make proteins?
TRANSLATION: How to make proteins? EUKARYOTIC mrna CBP80 NUCLEUS SPLICEOSOME 5 UTR INTRON 3 UTR m 7 GpppG AUG UAA 5 ss 3 ss CBP20 PABP2 AAAAAAAAAAAAA 50-200 nts CYTOPLASM eif3 EJC PABP1 5 UTR 3 UTR m 7
More informationO 3 O 4 O 5. q 3. q 4. Transition
Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in
More information9/2/17. Molecular and Cellular Biology. 3. The Cell From Genes to Proteins. key processes
Molecular and Cellular Biology Animal Cell ((eukaryotic cell) -----> compare with prokaryotic cell) ENDOPLASMIC RETICULUM (ER) Rough ER Smooth ER Flagellum Nuclear envelope Nucleolus NUCLEUS Chromatin
More informationLect. 19. Natural Selection I. 4 April 2017 EEB 2245, C. Simon
Lect. 19. Natural Selection I 4 April 2017 EEB 2245, C. Simon Last Time Gene flow reduces among population variability, reduces structure Interaction of climate, ecology, bottlenecks, drift, and gene flow
More informationCSEP 590A Summer Tonight MLE. FYI, re HW #2: Hemoglobin History. Lecture 4 MLE, EM, RE, Expression. Maximum Likelihood Estimators
CSEP 59A Summer 26 Lecture 4 MLE, EM, RE, Expression FYI, re HW #2: Hemoglobin History 1 Alberts et al., 3rd ed.,pg389 2 Tonight MLE: Maximum Likelihood Estimators EM: the Expectation Maximization Algorithm
More informationTHE GENETIC CODE INVARIANCE: WHEN EULER AND FIBONACCI MEET
Symmetry: Culture and Science Vol. 25, No. 3, 261-278, 2014 THE GENETIC CODE INVARIANCE: WHEN EULER AND FIBONACCI MEET Tidjani Négadi Address: Department of Physics, Faculty of Science, University of Oran,
More informationCSEP 590A Summer Lecture 4 MLE, EM, RE, Expression
CSEP 590A Summer 2006 Lecture 4 MLE, EM, RE, Expression 1 FYI, re HW #2: Hemoglobin History Alberts et al., 3rd ed.,pg389 2 Tonight MLE: Maximum Likelihood Estimators EM: the Expectation Maximization Algorithm
More informationIntroduction to Molecular and Cell Biology
Introduction to Molecular and Cell Biology Molecular biology seeks to understand the physical and chemical basis of life. and helps us answer the following? What is the molecular basis of disease? What
More informationNatural Selection. Nothing in Biology makes sense, except in the light of evolution. T. Dobzhansky
It is interesting to contemplate a tangled bank, clothed with many plants of many kinds, with birds singing on the bushes, with various insects flitting about, and with worms crawling through the damp
More informationRegulation of Gene Expression
Chapter 18 Regulation of Gene Expression Edited by Shawn Lester PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley
More information