Encoding of Amino Acids and Proteins from a Communications and Information Theoretic Perspective


Jacobs University Bremen

Semester Project II

By: Dawit Nigatu
Supervisor: Prof. Dr. Werner Henkel
Transmission Systems Group (TrSyS)
School of Engineering and Science
October 2013

Abstract

This research consists of two separate parts. In the first part, we used the classical multidimensional scaling (CMD) technique to reduce a 64-dimensional empirical codon mutation (ECM) matrix and a 20-dimensional chemical distance matrix to two dimensions (2-D). The 2-D plots of the ECM show that most mutations occur between codons that encode the same amino acid, i.e., such codon changes do not alter the amino acid that is produced. Furthermore, most of the highly probable mutations between different amino acids do not result in a dramatic change of chemical properties. However, comparing the 2-D plots of the ECM and chemical distance matrices also reveals some inconsistencies, in which codons close to each other in mutation distance differ significantly in their chemical properties. Such mutations may have severe effects, and hence the results suggest that some protection mechanism is needed to counteract them. In addition, the arrangement of the amino acids is largely in line with the so-called Taylor classification. In the second part of the research, we investigated the relationship between Shannon and Boltzmann entropies using the complete genome sequence of the bacterium E. coli. At some positions the two entropies vary in parallel, and at others in opposite directions. We found that around the terminus, the two entropies show opposite trends, with high Shannon and low Boltzmann entropy, meaning that the sequence is more random and at the same time less stable. In general, the Boltzmann entropy decreases as we move along the genome from the origin to the terminus.
Furthermore, in cooperation with a molecular biology colleague, we compared the entropies with the numbers of genes of different functional types (anabolic, catabolic, aerobic, and anaerobic) located at the same positions. We observed a strong similarity between the distribution of anabolic genes and the two entropies.

Contents

- Abstract
- List of Figures
- 1 Introduction
  - 1.1 Basic Theoretical Background
    - DNA
    - The Central Dogma
  - 1.2 Organization of the Report
- 2 Dimension Reduction of Evolutionary and Chemical Distance Matrices
  - 2.1 Evolutionary Substitution and Chemical Distance Matrices
  - 2.2 Classical Multidimensional Scaling
  - 2.3 Result and Discussion
- 3 Relation Between Boltzmann and Shannon Entropy
  - 3.1 Introduction
  - 3.2 Boltzmann Entropy and Distribution
    - Laws of Thermodynamics (First Law, Second Law)
    - Ideal Gas Law
    - Entropy of a Gas (Macroscopic View; Microscopic View: Boltzmann Entropy)
    - Boltzmann Distribution
    - Gibbs Entropy Formula
    - Entropy of an Ideal Gas
  - 3.3 Entropy of the E. coli Genome
  - 3.4 Result and Discussion
- 4 Conclusions
- A Additional Plots
- Bibliography

List of Figures

- 1.1 The structure of DNA
- 1.2 Central dogma of molecular biochemistry with enzymes
- 1.3 Codon-amino acid encoding chart
- 2.1 2-D plot of the mutation distance matrix
- 2.2 2-D plot of the chemical distance matrix
- 2.3 Taylor classification of amino acids
- 2.4 3-D plot of the mutation distance matrix
- 3.1 Adiabatic expansion of a gas at constant temperature
- 3.2 Boltzmann and Shannon entropies of E. coli genome, 2bp block
- 3.3 Boltzmann and Shannon entropies of E. coli genome, 3bp block
- 3.4 Number of anabolic genes with Boltzmann and Shannon entropies
- 3.5 Number of anabolic genes with difference of Boltzmann and Shannon entropies
- 3.6 Number of catabolic genes with Boltzmann and Shannon entropies
- 3.7 Number of catabolic genes with difference of Boltzmann and Shannon entropies
- 3.8 Number of aerobic genes with Boltzmann and Shannon entropies
- 3.9 Number of aerobic genes with difference of Boltzmann and Shannon entropies
- 3.10 Number of anaerobic genes with Boltzmann and Shannon entropies
- 3.11 Number of anaerobic genes with difference of Boltzmann and Shannon entropies
- A.1 Boltzmann and Shannon entropies of E. coli genome, 4bp block
- A.2 Boltzmann and Shannon entropies of E. coli genome, 5bp block
- A.3 Boltzmann and Shannon entropies of E. coli genome, 6bp block

Chapter 1 Introduction

1.1 Basic Theoretical Background

DNA

Deoxyribonucleic acid (DNA) is a double-stranded molecule found in all cells that contains the genetic information of the living organism. It is built from units called nucleotides. Each nucleotide consists of a sugar-phosphate backbone unit with one of four nitrogenous bases attached to the sugar: adenine, thymine, cytosine, or guanine (A, T, C, G). The nucleotides are linked together into chains, and two such chains wind around each other to form the double helix. The structure of DNA is shown in Fig. 1.1.

[Figure 1.1: The structure of DNA [1].]

The two strands are complementary to each other. According to the Watson-Crick pairing rule, A always pairs with T and G always pairs with C [2]. Hence, if the nucleotide sequence of one strand is known, the sequence of the complementary strand follows immediately. The paired bases are held together by hydrogen bonds: GC pairs have three hydrogen bonds, whereas AT pairs have two. The additional hydrogen bond makes GC pairs more stable than AT pairs.

The Central Dogma

Francis Crick [3] stated that biological information flows from DNA towards proteins and called this process the central dogma of molecular biology (Fig. 1.2). The sequence of bases in a segment of DNA, called a gene, carries the instructions for building proteins that perform specific functions in the cell. Protein synthesis consists of two steps, transcription and translation. The RNA (ribonucleic acid) polymerase enzyme unwinds the DNA molecule, and transcription begins. In transcription, the gene sequence is copied into messenger RNA (mRNA) using the template strand of the DNA. Messenger RNA is a single-stranded molecule similar to DNA, except that thymine (T) is replaced by uracil (U).

In the translation phase, the ribosome translates the mRNA sequence into amino acids, reading the sequence in groups of three bases called codons. There are 20 standard amino acids encoded by the genetic code; the chart in Fig. 1.3 shows the codon-to-amino-acid translation. The process starts when the small ribosomal subunit attaches to the mRNA at the translation initiation site, usually the start codon AUG. Then, a transfer RNA (tRNA) binds to the mRNA. Each tRNA carries an anticodon complementary to the mRNA codon to which it binds, together with the corresponding amino acid. Next, the large ribosomal subunit binds, creating the P-site (peptidyl) and the A-site (aminoacyl). The first tRNA occupies the P-site and the second tRNA enters the A-site. The amino acid (or growing chain) carried by the tRNA at the P-site is then transferred to the amino acid on the tRNA at the A-site, and the empty tRNA exits. The ribosome then moves along the mRNA and the next tRNA enters. This process continues until a stop codon (UAG, UAA, or UGA) signals the end of translation. The amino acids, linked by peptide bonds, form a chain that folds into a specific three-dimensional structure to create the protein.

[Figure 1.2: Central dogma of molecular biochemistry with enzymes [4].]

[Figure 1.3: Codon-amino acid encoding chart [5].]
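To make the complementarity rule and the codon reading described above concrete, here is a small illustrative Python sketch. It is not part of the report's analysis: the function names are hypothetical, the codon table is the standard genetic code (the same mapping as the chart in Fig. 1.3), and translation is simplified to start at the first AUG, ignoring the real initiation signals that position the ribosome.

```python
# Watson-Crick complements for DNA bases
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Standard genetic code: codons enumerated with first, second and third base
# each running over U, C, A, G; "*" marks the three stop codons.
BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def complement(strand):
    """Base-by-base Watson-Crick complement of a DNA strand."""
    return "".join(DNA_COMPLEMENT[b] for b in strand)


def transcribe(template_5to3):
    """mRNA (5'->3') from a template strand written 5'->3'.

    The mRNA is the reverse complement of the template, with T replaced by U.
    """
    return complement(template_5to3)[::-1].replace("T", "U")


def translate(mrna):
    """Read codons from the first AUG until a stop codon (simplified model)."""
    start = mrna.find("AUG")
    if start < 0:
        return ""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[pos:pos + 3]]
        if aa == "*":          # stop codon: translation ends
            break
        protein.append(aa)
    return "".join(protein)
```

For example, the template strand `TTAAAACAT` is transcribed to `AUGUUUUAA`, which translates to `MF` (Met, Phe) before the stop codon UAA.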

1.2 Organization of the Report

In Chapter 2, we first present the different types of evolutionary substitution matrices and the chemical distance matrix, followed by the mathematics behind classical multidimensional scaling. Then, the results of the dimension reduction are presented and discussed. In Chapter 3, the derivations of the Boltzmann entropy and the Boltzmann distribution are presented. Thereafter, the Shannon and Boltzmann entropies of the E. coli genome are computed, presented, and discussed. Finally, the conclusions are given in Chapter 4.

Chapter 2 Dimension Reduction of Evolutionary and Chemical Distance Matrices

2.1 Evolutionary Substitution and Chemical Distance Matrices

There are several substitution matrices that describe the mutational change of one amino acid into another within protein sequences. The first such matrix is the point accepted mutation (PAM) matrix, obtained by counting the number of replacements and computing mutation probabilities from a database of aligned protein sequences [6]. However, the PAM matrix performs poorly when the protein sequences being compared lie on distant parts of the phylogenetic tree. The BLOSUM matrix (Block Substitution Matrix) overcomes this shortcoming by using blocks of aligned protein segments [7]. A third type based on amino acid substitutions is the WAG substitution matrix [8], which uses a large database of aligned proteins from different families and a maximum-likelihood technique to derive the substitution scores.

The evolutionary models mentioned so far are based on amino acid substitutions. There are also models that describe codon-to-codon substitutions. One of them is the empirical codon mutation (ECM) matrix proposed by Schneider et al. [9]. To develop the ECM matrix, 8.3 million aligned codons from five vertebrates were used to tally the number of substitutions and derive the mutation probabilities. Since transitions to and from stop codons are not considered, the matrix is block diagonal: the entries for the three stop codons form a block separate from the rest of the matrix. In addition to transitions between codons encoding different amino acids, the ECM matrix provides the transitions between codons encoding the same amino acid, which gives it an extra edge. Hence, we used this matrix for the rest of our work.

Grantham's chemical distance matrix takes into account three chemical properties (composition, polarity, and molecular volume) that correlate strongly with substitution frequencies. The matrix provides a way to quantify the difference between amino acids: the distance between two amino acids is computed by treating the three chemical properties as axes of a Euclidean space. We would like to compare how these chemical properties relate to the mutation probabilities. Since the matrices are 64- and 20-dimensional, we apply a dimension reduction technique to bring them down to 2 or 3 dimensions for easy comparison and to see whether some kind of clustering appears. More importantly, we would like to see the severity of mutational changes, which is reflected in the chemical properties. To reduce the dimensions of the ECM and chemical distance matrices, we used classical multidimensional scaling (CMD), presented in the following section.

2.2 Classical Multidimensional Scaling

In this section, the mathematics behind the CMD technique is described, following [10]. Assume that we have observed an $n \times n$ Euclidean distance matrix $D = [d_{ij}]$ derived from a raw $n \times p$ data matrix $X$. With CMD, the aim is to recover the original data matrix of $n$ points in $p$ dimensions from the distance matrix. However, since distances are invariant under translation, rotation, and reflection, the original data can only be recovered up to such transformations. Define the $n \times n$ matrix $B$ as

$$B = XX^T. \tag{2.1}$$

The elements of $B$ are given by

$$b_{ij} = \sum_{k=1}^{p} x_{ik} x_{jk}. \tag{2.2}$$

Similarly, since $D$ is a Euclidean distance matrix, the squared distances can be written as

$$d_{ij}^2 = \sum_{k=1}^{p} (x_{ik} - x_{jk})^2 = \sum_{k=1}^{p} x_{ik}^2 + \sum_{k=1}^{p} x_{jk}^2 - 2\sum_{k=1}^{p} x_{ik} x_{jk} = b_{ii} + b_{jj} - 2b_{ij}. \tag{2.3}$$

If we can express the $b_{ij}$ in terms of the $d_{ij}$, $X$ can be derived from $B$. However, unless a location constraint is introduced, $B$ cannot be determined uniquely from $D$. Commonly, the columns of $X$ are centered at the origin, i.e.,

$$\sum_{i=1}^{n} x_{ik} = 0 \quad \forall k. \tag{2.4}$$

This constraint also implies that the entries in any row (or column) of $B$ sum to zero. Let $T = \operatorname{tr}(B)$ be the trace of $B$ and observe that

$$\sum_{i=1}^{n} d_{ij}^2 = T + n b_{jj}, \tag{2.5}$$

$$\sum_{j=1}^{n} d_{ij}^2 = n b_{ii} + T, \tag{2.6}$$

$$\sum_{i=1}^{n}\sum_{j=1}^{n} d_{ij}^2 = 2nT. \tag{2.7}$$

Substituting (2.5) to (2.7) into (2.3) and solving for $b_{ij}$,

$$b_{ij} = -\frac{1}{2}\left( d_{ij}^2 - \frac{1}{n}\sum_{l=1}^{n} d_{il}^2 - \frac{1}{n}\sum_{k=1}^{n} d_{kj}^2 + \frac{1}{n^2}\sum_{k=1}^{n}\sum_{l=1}^{n} d_{kl}^2 \right). \tag{2.8}$$

Since $B$ is symmetric and positive semidefinite, its singular value decomposition (SVD) coincides with its eigendecomposition,

$$B = V\Lambda V^T = V\Lambda^{1/2}\Lambda^{1/2}V^T. \tag{2.9}$$

Using only the 2 (or 3) largest eigenvalues, $\lambda_1$ and $\lambda_2$, and the corresponding eigenvectors $u_1$ and $u_2$, we obtain the low-dimensional coordinates

$$X = V_1 \Lambda_1^{1/2}, \tag{2.10}$$

where $\Lambda_1 = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}$ and $V_1 = [u_1 \; u_2]$.
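Eqs. (2.8) to (2.10) translate almost line by line into NumPy. The sketch below is an illustration assuming a symmetric distance matrix with zero diagonal; it is not the code used to produce Figs. 2.1 to 2.4, and the function name `classical_mds` is hypothetical. It uses the matrix form of Eq. (2.8), $B = -\tfrac{1}{2} J D^{(2)} J$ with the centering matrix $J = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^T$ and $D^{(2)}$ the elementwise squared distances.

```python
import numpy as np


def classical_mds(D, dims=2):
    """Classical (Torgerson) MDS following Eqs. (2.8)-(2.10).

    D    : (n, n) array of pairwise Euclidean distances.
    dims : number of output dimensions (2 or 3 in the text).
    Returns the (n, dims) coordinates and all eigenvalues of B, largest first.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Centering matrix J = I - 11^T / n; B = -1/2 J D^2 J is Eq. (2.8) in
    # matrix form and builds in the column-centering constraint of Eq. (2.4).
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # B is symmetric, so eigh applies; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(B)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    # Keep the largest `dims` eigenpairs; clip tiny negative values caused by
    # round-off (or by a non-Euclidean D) before taking square roots.
    lam = np.clip(eigvals[:dims], 0.0, None)
    coords = eigvecs[:, :dims] * np.sqrt(lam)   # X = V_1 Lambda_1^{1/2}, Eq. (2.10)
    return coords, eigvals
```

When the input distances come from genuinely 2-D points, the 2-D embedding reproduces them exactly (up to rotation and reflection), which is a quick sanity check before applying the method to the 64 x 64 codon distances.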

2.3 Result and Discussion

To apply the CMD method, we need to convert the mutation probabilities in the ECM matrix into some form of Euclidean distance. To do so, we assumed a Gaussian model and computed the codon-based distances from the pairwise error probability expression

$$P_{ij} = \frac{1}{2}\operatorname{erfc}\left(\frac{D_{ij}}{2\sigma}\right), \tag{2.11}$$

where $\sigma$ is the standard deviation. We assumed the same standard deviation for all mutation distances. The two-dimensional (2-D) plots of the mutation and chemical distance matrices are shown in Figs. 2.1 and 2.2, respectively. Codons encoding the same amino acid are clustered together. Also, the clusterings of amino acids are mostly consistent with the Taylor classification shown in Fig. 2.3, which classifies amino acids based on their physicochemical properties [11]. From these observations, we can deduce that most mutational changes will not lead to a significant change in chemical properties. However, there are also some inconsistencies, where small mutation distances coincide with large chemical distances and vice versa. The results can also serve as a reference for deciding where some form of protection is needed, namely for pairs with a high mutation probability and a large chemical difference. The inconsistencies are listed below.

Large chemical distance but small mutation distance:
- C with all others
- G with E
- S with {P, T, A}
- {D, N} with E
- {D, N} with G
- {Q, H} with {W, Y}
- K with N

Small chemical distance but large mutation distance:
- {W, Y} with {F, L, M, I, V}
- {P, T, A} with {Q, H, R}
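Eq. (2.11) has to be inverted to turn mutation probabilities into distances. A minimal sketch, using only the Python standard library: since $\tfrac{1}{2}\operatorname{erfc}(x) = 1 - \Phi(\sqrt{2}\,x)$ with $\Phi$ the standard normal CDF, we get $D_{ij} = -\sqrt{2}\,\sigma\,\Phi^{-1}(P_{ij})$. The helper names are hypothetical, and two preprocessing choices are assumptions not stated in the report: the diagonal is set to zero, and the probability matrix is symmetrized by averaging $P_{ij}$ and $P_{ji}$ because CMD needs a symmetric distance matrix. Zero probabilities would give infinite distances and must be handled separately.

```python
import math
from statistics import NormalDist


def probability_to_distance(p, sigma=1.0):
    """Invert Eq. (2.11), P = 1/2 erfc(D / (2 sigma)), for one probability.

    Uses 1/2 erfc(x) = 1 - Phi(sqrt(2) x), so D = -sqrt(2) sigma Phi^{-1}(P).
    Valid for 0 < P <= 1/2; P = 1/2 corresponds to D = 0.
    """
    if not 0.0 < p <= 0.5:
        raise ValueError("P must lie in (0, 0.5] for a non-negative distance")
    return -math.sqrt(2.0) * sigma * NormalDist().inv_cdf(p)


def distance_matrix(P, sigma=1.0):
    """Distances for a square matrix of mutation probabilities (list of lists).

    Assumed preprocessing: zero diagonal, and P symmetrized by averaging
    P[i][j] and P[j][i].
    """
    n = len(P)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = probability_to_distance(0.5 * (P[i][j] + P[j][i]), sigma)
            D[i][j] = D[j][i] = d
    return D
```

Because a common $\sigma$ is assumed for all codon pairs, it only scales all distances uniformly, so the shape of the CMD embedding does not depend on its value.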

[Figure 2.1: 2-D plot of the mutation distance matrix.]

[Figure 2.2: 2-D plot of the chemical distance matrix.]

[Figure 2.3: Taylor classification of amino acids [12].]

The CMD method works best if the eigenvalues used for the reconstruction are much larger than the unused eigenvalues. In our case, however, the eigenvalues do not decay quickly, and hence the error of the 2-D representation is significant, with a root-mean-square error of around half the mean distance. For this reason, we will try to improve the performance by applying a better dimension reduction and clustering method in future work. The 3-D plot of the ECM mutation matrix is shown in Fig. 2.4.

[Figure 2.4: 3-D plot of the mutation distance matrix.]
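The quality criteria mentioned above (how dominant the kept eigenvalues are, and the root-mean-square error of the embedded distances relative to the mean distance) can be computed directly. The following NumPy sketch is one possible way to do this, not the report's own code; the name `cmds_fit` is hypothetical, each pair of points is counted once in the RMSE, and the "captured" share uses the absolute eigenvalues so that negative eigenvalues from a non-Euclidean input are not ignored.

```python
import numpy as np


def cmds_fit(D, dims=2):
    """Classical MDS plus two goodness-of-fit measures for a `dims`-D embedding.

    Returns (coords, relative_rmse, captured), where relative_rmse is the RMSE
    between original and embedded distances divided by the mean distance, and
    captured is the kept eigenvalue sum over the sum of all |eigenvalues|.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J                       # Eq. (2.8)
    eigvals, eigvecs = np.linalg.eigh(B)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # largest first
    lam = np.clip(eigvals[:dims], 0.0, None)
    Y = eigvecs[:, :dims] * np.sqrt(lam)              # Eq. (2.10)
    # Distances between the embedded points
    D_hat = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    iu = np.triu_indices(n, k=1)                      # each pair counted once
    rmse = np.sqrt(np.mean((D[iu] - D_hat[iu]) ** 2))
    captured = eigvals[:dims].sum() / np.abs(eigvals).sum()
    return Y, rmse / D[iu].mean(), captured
```

For data that are truly two-dimensional the relative RMSE is essentially zero and the captured share is one; slowly decaying eigenvalues, as observed for the ECM distances, show up as a small captured share and a large relative RMSE.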

Chapter 3 Relation Between Boltzmann and Shannon Entropy

3.1 Introduction

DNA consists of two complementary sequences of nucleotides over a 4-letter alphabet: adenine, thymine, cytosine, and guanine (A, T, C, G). For such a sequence, the Shannon entropy gives an average measure of the information obtained from the distribution of the symbols (words) of the source. In addition, the order in which these four letters are arranged in the DNA is a major factor determining the stability of the DNA structure [13]. Hence, it is important to look at the information contained in the nucleotide sequence together with the stability that comes with it. The Shannon block entropy for a block size of $N$ symbols is

$$H_N = -\sum_i P_i^{(N)} \log P_i^{(N)}, \tag{3.1}$$

where $P_i^{(N)}$ is the probability (relative frequency) of observing the $i$-th word of block size $N$. The entropy is maximal when all words occur with equal probability, and it is zero when one word occurs with probability one. In statistical mechanics and thermodynamics, the Boltzmann-Gibbs entropy has a form very similar to the Shannon entropy in Eq. (3.1). However, it is scaled by the Boltzmann constant $k$, which gives this entropy a unit of energy per temperature (e.g., kcal/K), and the natural logarithm is used:

$$S = -k\sum_i P_i^{(N)} \ln P_i^{(N)}. \tag{3.2}$$

Our aim is to apply the two entropy measures to the complete genome of Escherichia coli (E. coli), to see how the entropies develop across the genome, and to find out whether there is some relation between the two.

3.2 Boltzmann Entropy and Distribution

Laws of Thermodynamics

In this section, the first and second laws of thermodynamics are presented, following [14].

First Law of Thermodynamics

For a system undergoing a process, the change in energy is equal to the heat added to the system minus the work done by the system; in other words, energy is conserved. The change in internal energy of the system, $dE$, is

$$dE = dQ - dW, \tag{3.3}$$

where $dQ$ is the heat transferred into or out of the system and $dW$ is the work done by or on the system. If the work is mechanical work done by an expanding or contracting gas, $dW = P\,dV$, and the equation becomes

$$dE = dQ - P\,dV. \tag{3.4}$$

The negative sign comes from the sign convention for work. If the pressure is kept constant throughout the process, the heat transferred equals the change in enthalpy, $dQ = dH$, and the first law can be written as

$$dE = dH - P\,dV. \tag{3.5}$$

Second Law of Thermodynamics

The second law is about entropy, a quantity related to the number of microscopic states of a system in equilibrium. If the system is thermally isolated and undergoes a change of state, its entropy never decreases, i.e.,

$$\Delta S \ge 0. \tag{3.6}$$

If the system is not thermally isolated and the change of state is quasistatic, with heat $dQ$ absorbed, then

$$dS = \frac{dQ}{T}, \tag{3.7}$$

where $T$ is the absolute temperature. Entropy has units of J/K (or cal/K), and it is a state variable, i.e., its change is independent of the path between the initial and final states.

Ideal Gas Law

The state of a gas is determined by its pressure ($P$), volume ($V$), and temperature ($T$) [14]. The ideal gas law is commonly stated as

$$PV = nRT, \tag{3.8}$$

where $n$ is the number of moles of gas and $R$ is the universal gas constant (8.314 J/(K·mol)). The ideal gas law can also be formulated as

$$PV = NkT, \tag{3.9}$$

where $N$ is the number of molecules in the gas and $k$ is the Boltzmann constant.

Entropy of a Gas

Macroscopic View

Consider a gas that expands at constant temperature without exchanging heat with its environment, i.e., an adiabatic, isothermal expansion into an empty chamber (Fig. 3.1). Since the process is adiabatic and isothermal, $dE = 0$ and $dT = 0$ [15]. Because entropy is a state variable, its change can be computed along a reversible isothermal path between the same initial and final states.

[Figure 3.1: Adiabatic expansion of a gas at constant temperature [15].]

Using the laws of thermodynamics (Eqs. (3.4) and (3.7)) and the ideal gas law (Eq. (3.9)),

$$dQ = dE + P\,dV, \tag{3.10}$$

$$T\,dS = NkT\,\frac{dV}{V}, \tag{3.11}$$

$$dS = Nk\,\frac{dV}{V}. \tag{3.12}$$

Integrating from the initial to the final state, we obtain

$$\Delta S = Nk\ln\frac{V_2}{V_1}. \tag{3.13}$$

In this specific case the volume is doubled, i.e., $V_2 = 2V_1$, so

$$\Delta S = Nk\ln 2. \tag{3.14}$$

Microscopic View: Boltzmann Entropy

Boltzmann was the first to relate thermodynamic entropy to the number of arrangements $\Omega$ of the molecules [15]. In the above process, if the gas initially consists of $N$ molecules with $\Omega$ possible arrangements (microscopic states), the final system has $2^N\Omega$ possible arrangements, because each molecule can be either in the left or in the right half. To derive the form of the entropy, let $S_1$ and $S_2$ be the entropies of two independent subsystems with $\Omega_1$ and $\Omega_2$ arrangements, respectively. The following proof is taken from [16]. The entropy of the combined system is

$$S = S_1 + S_2, \tag{3.15}$$

and its number of arrangements is

$$\Omega = \Omega_1\Omega_2. \tag{3.16}$$

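Boltzmann postulated the entropy to be a function of $\Omega$,

$$S = f(\Omega). \tag{3.17}$$

Therefore, $S_1 = f(\Omega_1)$, $S_2 = f(\Omega_2)$, and

$$f(\Omega_1\Omega_2) = f(\Omega_1) + f(\Omega_2). \tag{3.18}$$

Differentiating with respect to $\Omega_1$ leads to

$$\frac{\partial f(\Omega_1\Omega_2)}{\partial\Omega_1} = \frac{\partial f(\Omega_1\Omega_2)}{\partial(\Omega_1\Omega_2)}\,\Omega_2 = \frac{df(\Omega_1)}{d\Omega_1}, \tag{3.19}$$

$$\frac{df(\Omega)}{d\Omega}\,\Omega_2 = \frac{df(\Omega_1)}{d\Omega_1}. \tag{3.20}$$

Differentiating again, this time with respect to $\Omega_2$, yields

$$\frac{\partial f(\Omega_1\Omega_2)}{\partial\Omega_2} = \frac{\partial f(\Omega_1\Omega_2)}{\partial(\Omega_1\Omega_2)}\,\Omega_1 = \frac{df(\Omega_2)}{d\Omega_2}, \tag{3.21}$$

$$\frac{df(\Omega)}{d\Omega}\,\Omega_1 = \frac{df(\Omega_2)}{d\Omega_2}. \tag{3.22}$$

Thus, eliminating $df(\Omega)/d\Omega$,

$$\frac{1}{\Omega_2}\frac{df(\Omega_1)}{d\Omega_1} = \frac{1}{\Omega_1}\frac{df(\Omega_2)}{d\Omega_2}, \tag{3.23}$$

$$\Omega_1\frac{df(\Omega_1)}{d\Omega_1} = \Omega_2\frac{df(\Omega_2)}{d\Omega_2} = C, \tag{3.24}$$

where $C$ is a constant by separation of variables (the left side depends only on $\Omega_1$, the middle only on $\Omega_2$). Integrating,

$$S_1 = f(\Omega_1) = C\ln\Omega_1 + \text{const}, \tag{3.25}$$

$$S_2 = f(\Omega_2) = C\ln\Omega_2 + \text{const}, \tag{3.26}$$

$$S = f(\Omega_1\Omega_2) = C\ln\Omega_1 + C\ln\Omega_2 + \text{const}, \tag{3.27}$$

$$S = S_1 + S_2. \tag{3.28}$$

Comparing (3.27) with the sum of (3.25) and (3.26), the additivity in (3.28) requires $\text{const} = 0$, and we obtain

$$S = C\ln\Omega. \tag{3.29}$$

The value of the constant $C$ can be found by applying this postulate to the expansion of a gas depicted in Fig. 3.1, where $S_1$ and $S_2$ now denote the entropies of the initial and final states:

$$\Delta S = S_2 - S_1, \tag{3.30}$$

$$\Delta S = C\ln\left(2^N\Omega\right) - C\ln\Omega. \tag{3.31}$$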

$$\Delta S = CN\ln 2. \tag{3.32}$$

Comparing with Eq. (3.14), we obtain $C = k$, and the Boltzmann entropy becomes

$$S = k\ln\Omega. \tag{3.33}$$

Boltzmann Distribution

Consider an isolated system with fixed energy $E$, volume $V$, and number of molecules $N$. The $N$ molecules are arranged such that $n_1$ are in the first energy state ($\epsilon_1$), $n_2$ in the second ($\epsilon_2$), $n_3$ in the third, and in general $n_i$ in the energy state $\epsilon_i$. The number of possible arrangements is

$$\Omega = \binom{N}{n_1}\binom{N-n_1}{n_2}\binom{N-n_1-n_2}{n_3}\cdots = \frac{N!}{\prod_i n_i!}. \tag{3.34}$$

When the system reaches equilibrium, the molecules disperse and the number of possible arrangements is maximal [16]. To find the most probable configuration of the molecules, we maximize $\Omega$ for fixed $N$ and $E$ (this section follows [16]):

$$\max_{n_i}\ \Omega \quad \text{subject to} \quad \sum_i n_i = N, \quad \sum_i n_i\epsilon_i = E. \tag{3.35}$$

Reformulating the constraints in terms of the probabilities $P_i = n_i/N$, the constraint $\sum_i n_i = N$ becomes $\sum_i P_i = 1$, and $\sum_i n_i\epsilon_i = E$ becomes $\sum_i P_i\epsilon_i = \bar{E}$ with $\bar{E} = E/N$. Since the logarithm is monotonic, we can maximize $\ln\Omega$ instead of $\Omega$, and the problem becomes

$$\max_{P_i}\ \ln\Omega \quad \text{subject to} \quad \sum_i P_i = 1, \quad \sum_i P_i\epsilon_i = \bar{E}. \tag{3.36}$$

Using Stirling's approximation for large $N$,

$$\ln N! \approx N\ln N - N. \tag{3.37}$$
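Eq. (3.37) is what turns $\ln\Omega$ into the entropy-like expression derived next. As a quick numerical illustration (our addition, not part of the original derivation), the Python check below compares $\ln N!$, computed exactly through `math.lgamma`, with $N\ln N - N$; the relative error shrinks rapidly as $N$ grows, which is why the approximation is safe for macroscopic numbers of molecules. The function name is hypothetical.

```python
import math


def stirling_relative_error(N):
    """Relative error of ln N! ~ N ln N - N (Eq. 3.37) for an integer N >= 2."""
    exact = math.lgamma(N + 1)          # ln N!, evaluated without overflow
    approx = N * math.log(N) - N
    return (exact - approx) / exact


for N in (10, 1_000, 10**6):
    print(N, stirling_relative_error(N))
```

The neglected term is approximately $\tfrac{1}{2}\ln(2\pi N)$, which grows only logarithmically while $\ln N!$ grows like $N\ln N$.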

Applying the approximation to $\ln\Omega$,

$$\ln\Omega = \ln N! - \sum_i \ln n_i! \tag{3.38}$$

$$\approx N\ln N - N - \sum_i n_i\ln n_i + \sum_i n_i \tag{3.39}$$

$$= -N\sum_i P_i\ln P_i. \tag{3.40}$$

Omitting $N$, which has no effect on the maximization, and applying the method of Lagrange multipliers, Eq. (3.40) leads to

$$L = -\sum_i P_i\ln P_i - \alpha\sum_i P_i - \beta\sum_i P_i\epsilon_i. \tag{3.41}$$

Setting $\partial L/\partial P_j = 0$ gives $-\ln P_j - 1 - \alpha - \beta\epsilon_j = 0$; absorbing the constant 1 into $\alpha$,

$$P_j = e^{-\alpha}e^{-\beta\epsilon_j}. \tag{3.42}$$

Writing $P_j = \frac{1}{Z}e^{-\beta\epsilon_j}$ with $Z = e^{\alpha}$ and substituting into the constraint $\sum_j P_j = 1$,

$$\sum_j P_j = \frac{1}{Z}\sum_j e^{-\beta\epsilon_j} = 1, \qquad Z(\beta) = \sum_j e^{-\beta\epsilon_j}. \tag{3.43}$$

Therefore,

$$P_j = \frac{e^{-\beta\epsilon_j}}{\sum_i e^{-\beta\epsilon_i}}. \tag{3.44}$$

The constant $\beta$ can be shown to be $1/(kT)$. One way is to compare the average energy obtained using the Boltzmann distribution, $3/(2\beta)$, with the average kinetic energy of a molecule of an ideal gas at equilibrium, $3kT/2$. Another way to derive $\beta = 1/(kT)$ is as follows [17]. From the definition of temperature, we have

$$\frac{1}{T} = \left(\frac{\partial S}{\partial E}\right)_{V,N}. \tag{3.45}$$

Using Boltzmann's entropy definition, $S = k\ln\Omega$, and substituting Eq. (3.42) into Eq. (3.40),

$$S = k\ln\Omega = -kN\sum_i P_i\ln\left(e^{-\alpha}e^{-\beta\epsilon_i}\right) \tag{3.46}$$

$$= -kN\sum_i P_i\left(-\alpha - \beta\epsilon_i\right) \tag{3.47}$$

$$= kN\alpha\sum_i P_i + kN\beta\sum_i P_i\epsilon_i \tag{3.48}$$

$$= kN\alpha + kN\beta\,\frac{E}{N} \tag{3.49}$$

$$= kN\alpha + k\beta E. \tag{3.50}$$

Differentiating with respect to $E$ (the terms arising from the dependence of $\alpha = \ln Z$ on $\beta$ cancel, because $\partial\alpha/\partial\beta = -E/N$),

$$\frac{1}{T} = \left(\frac{\partial S}{\partial E}\right)_{V,N} = k\beta, \tag{3.51}$$

$$\beta = \frac{1}{kT}. \tag{3.52}$$

Therefore, the Boltzmann distribution relating the energy and temperature to the microscopic properties is

$$P_j = \frac{e^{-\epsilon_j/kT}}{\sum_i e^{-\epsilon_i/kT}}. \tag{3.53}$$

Gibbs Entropy Formula

In the Boltzmann definition of entropy at fixed energy, all states with energy $E$ are assumed to be equally likely [15]. If the states of the thermodynamic system are not equally probable, Gibbs' definition of entropy applies:

$$S = -k\sum_i P_i\ln P_i, \tag{3.54}$$

where the sum is over all microstates and $P_i$ is the probability that the system is in the $i$-th microstate [18]. This definition, like Boltzmann's, is a fundamental postulate that explains the experimental facts accurately [18]. To see that this definition is more general, consider a system with $\Omega$ microstates. If all microstates are equally probable, i.e., $P_i = 1/\Omega$, Eq. (3.54) gives

$$S = -k\sum_{i=1}^{\Omega}\frac{1}{\Omega}\ln\frac{1}{\Omega} = k\ln\Omega, \tag{3.55}$$

which is the Boltzmann definition of entropy.

Entropy of an Ideal Gas

From the first law of thermodynamics (Eq. (3.4)), we have

$$dQ = dE + P\,dV. \tag{3.56}$$

For an ideal gas, the change in internal energy depends only on the change in temperature: $dE = nC_v\,dT$ for $n$ moles, where $C_v$ is the molar specific heat at constant volume. (The specific heat is the amount of heat per unit mass, or per mole for the molar quantity used here, required to raise the temperature by one kelvin.) With $P = nRT/V$ from Eq. (3.8),

$$dQ = nC_v\,dT + nRT\,\frac{dV}{V}. \tag{3.57}$$

Dividing by $T$,

$$\frac{dQ}{T} = nC_v\,\frac{dT}{T} + nR\,\frac{dV}{V}, \tag{3.58}$$

and integrating both sides (with $dS = dQ/T$) leads to

$$S = nC_v\ln T + nR\ln V + \text{constant}. \tag{3.59}$$

Depending on the experimental conditions, the change in entropy differs [14]:
- at constant temperature, $\Delta S = nR\ln(V_2/V_1)$;
- at constant volume, $\Delta S = nC_v\ln(T_2/T_1)$;
- at constant pressure, $\Delta S = nC_p\ln(T_2/T_1)$.

3.3 Entropy of the E. coli Genome

We used the 4,639,221 base pair (bp) genome sequence of E. coli strain K-12. First, the sequence was rearranged to start at the origin of replication. Then, the entropy was computed for block sizes from 2 bp up to 6 bp in nonoverlapping windows of 100 kbp. To calculate the Boltzmann entropy, the base-pair stacking energies from [13] were used. All neighboring base pairs are considered; for example, for the nucleotide sequence AGCT, the energies of AG, GC, and CT are taken into account.
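The Shannon part of this procedure, rotating the genome to the origin and evaluating Eq. (3.1) window by window, can be sketched in a few lines of Python. This is an illustration, not the code behind Figs. 3.2 and 3.3. The helper names are hypothetical, and two details the report does not state are assumptions exposed as parameters: words are counted with a sliding step of 1 base within each window (`step=n` would count non-overlapping blocks instead), and the logarithm is taken to base 2. The index of the origin of replication is left as an input.

```python
import math
from collections import Counter


def rotate_to_origin(genome, ori_index):
    """Rotate a circular genome so that it starts at the origin of replication."""
    return genome[ori_index:] + genome[:ori_index]


def block_entropy(seq, n, step=1, base=2.0):
    """Shannon block entropy H_N = -sum_i P_i log P_i over words of length n (Eq. 3.1).

    step=1 counts overlapping words; step=n counts non-overlapping blocks.
    Which of the two the report used is an assumption here.
    """
    words = [seq[i:i + n] for i in range(0, len(seq) - n + 1, step)]
    counts = Counter(words)
    total = len(words)
    return -sum((c / total) * math.log(c / total, base) for c in counts.values())


def windowed_block_entropy(genome, n, window=100_000, step=1, base=2.0):
    """Block entropy in consecutive non-overlapping windows of `window` bp.

    A trailing window shorter than n bases is skipped.
    """
    return [block_entropy(genome[s:s + window], n, step, base)
            for s in range(0, len(genome), window)
            if len(genome[s:s + window]) >= n]
```

For block sizes up to 6 bp there are at most $4^6 = 4096$ distinct words, so a `Counter` per 100 kbp window is cheap even for the whole 4.6 Mbp genome.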

We assumed that all nearest-neighbor pairs in the window are independent, and we postulated discrete states whose probabilities of having the corresponding stacking energy follow the Boltzmann distribution. We are aware that the Boltzmann distribution gives the most probable distribution of energy for states with a random distribution of energies (e.g., an ideal gas), which is not the case here; nevertheless, we used it to obtain a representation of stability (energy) in an expression that has the structure of an entropy. The Boltzmann distribution for a state with stacking energy $E_i$ at absolute temperature $T$ is

$$P_i = \frac{e^{-E_i/kT}}{\sum_j e^{-E_j/kT}}. \tag{3.60}$$

3.4 Result and Discussion

The result for a block size of 2 bp and a window size of 100 kbp is shown in Fig. 3.2. The Boltzmann entropy is rescaled to the range of the Shannon entropy to ease visual comparison. Although we have not yet found a single general interpretation relating the two entropies, we see opposite trends at some positions (e.g., windows 16 to 25) and parallel tendencies at others (e.g., windows 40 to 46). The plots for 3, 4, 5, and 6 bp are similar, which shows that the shapes of the entropy curves are largely insensitive to the block size. Hence, from now on, only results with a block size of 3 bp are plotted. The plots for 4, 5, and 6 bp are presented in Appendix A.

[Figure 3.2: Boltzmann and Shannon entropies of E. coli genome, 2bp block (window size 100 kb; entropy vs. window number).]
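The Boltzmann entropy of a window, together with the rescaling used for plotting, can be sketched as follows. This rests on one possible reading of the method, stated here as an assumption: every nearest-neighbor step in the window is one state $i$ with energy $E_i$ looked up from a dinucleotide stacking-energy table, Eq. (3.60) normalizes over all steps in the window, and the entropy is $-k\sum_i P_i\ln P_i$. The stacking energies must be taken from [13]; no values are reproduced here, and `kT` must be given in the same energy unit (for energies in kcal/mol, $RT \approx 0.593$ kcal/mol at 25 °C). The function names are hypothetical.

```python
import math


def boltzmann_entropy(seq, stacking_energy, kT, k=1.0):
    """S = -k * sum_i P_i ln P_i, with P_i from Eq. (3.60).

    Assumed interpretation: every nearest-neighbour step seq[i:i+2] in the
    window is one state i with energy E_i = stacking_energy[step].
    `stacking_energy` must map all 16 dinucleotides (e.g. "AG") to energies,
    and `kT` must be in the same energy unit.
    """
    energies = [stacking_energy[seq[i:i + 2]] for i in range(len(seq) - 1)]
    # Shifting every energy by the minimum leaves P_i unchanged but keeps
    # exp() from overflowing for large |E_i| / kT.
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    z = sum(weights)
    # Weights that underflow to 0 contribute nothing (p ln p -> 0).
    return -k * sum((w / z) * math.log(w / z) for w in weights if w > 0)


def rescale_to(values, target):
    """Min-max rescale `values` onto the range of `target` (for plotting only)."""
    lo, hi = min(values), max(values)
    t_lo, t_hi = min(target), max(target)
    if hi == lo:
        return [t_lo for _ in values]
    return [t_lo + (v - lo) * (t_hi - t_lo) / (hi - lo) for v in values]
```

Applied window by window (for example, `[boltzmann_entropy(genome[s:s + 100_000], table, kT) for s in range(0, len(genome), 100_000)]`), the result can be passed through `rescale_to` with the Shannon values as target to obtain curves on a common scale, as in Fig. 3.2.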

[Figure 3.3: Boltzmann and Shannon entropies of E. coli genome, 3bp block (window size 100 kb; entropy vs. window number).]

Once the results for the Shannon and Boltzmann entropies were obtained, we discussed them with molecular biology colleagues. As a result, we decided to examine how the entropies relate to the number of genes in four functional classes, namely anabolic, catabolic, aerobic, and anaerobic. Our colleagues also provided the data on the distribution of these classes of genes in the genome. We used a 500 kb sliding window, starting with the origin at the center of the first window and sliding it 4 kb at a time across the complete genome. Then, the number of genes of each functional class was plotted together with the Shannon and Boltzmann entropies or their difference. The results are presented in Figs. 3.4 to 3.11. Interestingly, Fig. 3.4 shows that the shape of the Boltzmann entropy curve and the number of anabolic genes are strongly related. This suggests a link between stability and the number of anabolic genes present. Also, the distribution of the aerobic genes follows a pattern similar to the difference of the entropies, as shown in Fig. 3.9. Overall, even though some of the curves show no straightforward relationship, there seems to be an underlying connection that we plan to analyze further together with our molecular genetics colleagues.
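The sliding-window gene count described above can be sketched as follows. This is a hypothetical helper, not the code used for Figs. 3.4 to 3.11. It assumes each gene is represented by a single coordinate (for example, its start position) on the genome already rotated to begin at the origin, and it treats the chromosome as circular, so the first window, centered on the origin, wraps around the end of the sequence. Window edges are counted as inside.

```python
def sliding_gene_counts(gene_positions, genome_length, window=500_000, step=4_000):
    """Number of genes within a circular sliding window.

    The first window is centred on position 0 (the origin of replication);
    the centre then advances by `step` bp. A gene counts if its circular
    distance to the centre is at most half the window length.
    """
    half = window // 2
    counts = []
    for centre in range(0, genome_length, step):
        n = 0
        for pos in gene_positions:
            d = abs(pos - centre)
            d = min(d, genome_length - d)   # distance around the circle
            if d <= half:
                n += 1
        counts.append(n)
    return counts
```

With the defaults and the 4,639,221 bp genome this gives 1,160 window positions; running it once per functional class yields the gene-count curves that are compared with the entropies.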

[Figure 3.4: Number of anabolic genes with Boltzmann and Shannon entropies (sliding window of 500 kb, 3 bp blocks; x-axis: chromosomal position in units of 10^6 bp, with oriC and Ter marked; y-axes: entropy and number of genes).]

[Figure 3.5: Number of anabolic genes with difference of Boltzmann and Shannon entropies (sliding window of 500 kb, 3 bp blocks; x-axis: chromosomal position in units of 10^6 bp, with oriC and Ter marked; y-axes: difference of the entropies and number of genes).]

[Figure 3.6: Number of catabolic genes with Boltzmann and Shannon entropies (sliding window of 500 kb, 3 bp blocks; x-axis: chromosomal position in units of 10^6 bp, with oriC and Ter marked; y-axes: entropy and number of genes).]

[Figure 3.7: Number of catabolic genes with difference of Boltzmann and Shannon entropies (sliding window of 500 kb, 3 bp blocks; x-axis: chromosomal position in units of 10^6 bp, with oriC and Ter marked; y-axes: difference of the entropies and number of genes).]

[Figure 3.8: Number of aerobic genes with Boltzmann and Shannon entropies (sliding window of size 500 kb, 3 bp blocks; x-axis: chromosomal position (×10^6), with oriC and Ter marked; y-axes: entropy and number of genes).]

[Figure 3.9: Number of aerobic genes with the difference of Boltzmann and Shannon entropies (sliding window of size 500 kb, 3 bp blocks; x-axis: chromosomal position (×10^6), with oriC and Ter marked; y-axes: difference of the entropies and number of genes).]

[Figure 3.10: Number of anaerobic genes with Boltzmann and Shannon entropies (sliding window of size 500 kb, 3 bp blocks; x-axis: chromosomal position (×10^6), with oriC and Ter marked; y-axes: entropy and number of genes).]

[Figure 3.11: Number of anaerobic genes with the difference of Boltzmann and Shannon entropies (sliding window of size 500 kb, 3 bp blocks; x-axis: chromosomal position (×10^6), with oriC and Ter marked; y-axes: difference of the entropies and number of genes).]

Chapter 4 Conclusions

A comparison between the chemical properties of amino acids and the mutation probabilities of codons was carried out using classical multidimensional scaling. The results showed that most of the highly probable mutations do not lead to a dramatic change of chemical properties. However, some inconsistencies were also observed. Thus, further studies of the severity of these mutations and of possible protection mechanisms that counteract their effects are required. In addition, the error introduced by representing 64-dimensional data in two dimensions is significant. This is due to the slow decay of the eigenvalues of the data: the two largest eigenvalues account for only part of the total. Therefore, other dimension-reduction and clustering methods with better performance could be applied in the future.

Our second task was to look into the relationship between the Shannon and Boltzmann entropies. We have seen that at some positions they follow the same pattern, while at other positions they tend to move in opposite directions, although we have not yet found suitable interpretations for this. We further investigated how both entropies are related to the functional classes of genes located at the same positions in the genome. We found interesting correlations, especially with the distribution of anabolic genes.
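The remark about the loss incurred by a two-dimensional representation can be made concrete with classical MDS: the coordinates come from the leading eigenvectors of the double-centered squared-distance matrix, and the share of the positive eigenvalues that is kept measures how much structure survives. The following is a minimal NumPy sketch, assuming `D` is any symmetric distance matrix (for example, a 64 × 64 codon distance matrix); the function name `classical_mds` and the goodness-of-fit ratio used here are illustrative choices, not necessarily those of this project.

```python
import numpy as np


def classical_mds(D, k=2):
    """Classical (Torgerson) MDS of a symmetric n-by-n distance matrix D.

    Returns (X, ratio): X is the n-by-k configuration, and ratio is the sum
    of the k largest eigenvalues of B divided by the sum of all its positive
    eigenvalues, one common goodness-of-fit measure for the embedding.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]          # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    top = np.clip(eigvals[:k], 0.0, None)      # negative eigenvalues give no real coordinates
    X = eigvecs[:, :k] * np.sqrt(top)          # scale eigenvectors by sqrt(eigenvalue)
    ratio = top.sum() / eigvals[eigvals > 0].sum()
    return X, ratio


# Toy check: the corners of a unit square are recovered exactly in 2-D.
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
D = np.sqrt(((square[:, None, :] - square[None, :, :]) ** 2).sum(-1))
X, ratio = classical_mds(D, k=2)
print(round(ratio, 6))
```

For data whose eigenvalues decay slowly, as noted above for the 64-dimensional case, `ratio` at `k=2` stays well below 1, which quantifies the error introduced by the two-dimensional plots.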

Appendix A Additional Plots

[Figure A.1: Boltzmann and Shannon entropies of the E. coli genome, 4 bp blocks (window size 100 kb; x-axis: window number; y-axis: entropy).]

[Figure A.2: Boltzmann and Shannon entropies of the E. coli genome, 5 bp blocks (window size 100 kb; x-axis: window number; y-axis: entropy).]

[Figure A.3: Boltzmann and Shannon entropies of the E. coli genome, 6 bp blocks (window size 100 kb; x-axis: window number; y-axis: entropy).]
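Figures 3.3 and A.1 to A.3 plot the Shannon entropy computed from the frequencies of k-bp blocks in 100 kb windows. A minimal sketch of that quantity follows, under assumptions not stated in the text: blocks are taken as non-overlapping, windows are consecutive and non-overlapping, and entropy is measured in bits, so that 4^k equally frequent blocks give the maximum of log2(4^k) = 2k bits. The function names are hypothetical, and only the Shannon side is sketched, not the Boltzmann entropy.

```python
import math
from collections import Counter


def block_entropy(seq, k):
    """Shannon entropy, in bits, of the non-overlapping k-bp blocks of seq."""
    blocks = [seq[i:i + k] for i in range(0, len(seq) - k + 1, k)]
    n = len(blocks)
    counts = Counter(blocks)
    # H = -sum_b p_b * log2(p_b) over the observed block types b
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def windowed_entropy(seq, k, window=100_000):
    """Block entropy of each consecutive, non-overlapping window of seq."""
    return [block_entropy(seq[s:s + window], k)
            for s in range(0, len(seq) - window + 1, window)]


# Four equally frequent 1-bp blocks give log2(4) = 2 bits.
print(block_entropy("ACGT" * 3, 1))
```

Changing `k` from 3 to 6 reproduces the block sizes used in Fig. 3.3 and Figs. A.1 to A.3; an overlapping-block variant would only change the step of the `range` in `block_entropy` from `k` to 1.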

Bibliography

[1] Deoxyribonucleic acid (DNA). [Online]. Available:
[2] J. D. Watson, F. H. Crick et al., Molecular structure of nucleic acids, Nature, vol. 171, no. 4356, pp
[3] F. H. Crick, On protein synthesis, in Symposia of the Society for Experimental Biology, vol. 12, 1958, p
[4] Central dogma of molecular biochemistry with enzymes. [Online]. Available: Dogma of Molecular Biochemistry with Enzymes.jpg
[5] More non-random DNA wonders. [Online]. Available: wordpress.com/2011/12/26/more-non-random-dna-wonders/
[6] M. Dayhoff, R. Schwartz, and B. Orcutt, A model for evolutionary change, in M. O. Dayhoff, Ed., Atlas of Protein Sequence and Structure, vol. 5, p. 345
[7] S. Henikoff and J. G. Henikoff, Amino acid substitution matrices from protein blocks, Proceedings of the National Academy of Sciences, vol. 89, no. 22, pp
[8] S. Whelan and N. Goldman, A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach, Molecular Biology and Evolution, vol. 18, no. 5, pp
[9] A. Schneider, G. M. Cannarozzi, and G. H. Gonnet, Empirical codon substitution matrix, BMC Bioinformatics, vol. 6, no. 1, p. 134
[10] S. W. Cheng, Multidimensional scaling (MDS). [Online]. Available: http: // swcheng/teaching/stat5191/lecture/06 MDS.pdf
[11] W. R. Taylor, The classification of amino acid conservation, Journal of Theoretical Biology, vol. 119, no. 2, pp
[12] Amino acids Venn diagram. [Online]. Available: wiki/file:amino Acids Venn Diagram.png
[13] J. SantaLucia, A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics, Proceedings of the National Academy of Sciences, vol. 95, no. 4, pp
[14] F. Reif, Fundamentals of Statistical and Thermal Physics, international student edition. McGraw-Hill Book
[15] W. Allison, Lecture notes on statistical physics. [Online]. Available: wa14/camonly/statistical/lecture2.pdf
[16] A. Huan, Course notes on statistical mechanics. [Online]. Available:
[17] J. Saunders, Classical and statistical thermodynamics. [Online]. Available: files/sect2.pdf
[18] M. Evans, Statistical physics section 1: Information theory approach to statistical mechanics. [Online]. Available: mevans/sp/sp1.pdf


More information

Re- engineering cellular physiology by rewiring high- level global regulatory genes

Re- engineering cellular physiology by rewiring high- level global regulatory genes Re- engineering cellular physiology by rewiring high- level global regulatory genes Stephen Fitzgerald 1,2,, Shane C Dillon 1, Tzu- Chiao Chao 2, Heather L Wiencko 3, Karsten Hokamp 3, Andrew DS Cameron

More information

Algorithms in Computational Biology (236522) spring 2008 Lecture #1

Algorithms in Computational Biology (236522) spring 2008 Lecture #1 Algorithms in Computational Biology (236522) spring 2008 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours: 15:30-16:30/by appointment TA: Ilan Gronau, Taub 700, tel 4894 Office hours:??

More information

Evolutionary Analysis of Viral Genomes

Evolutionary Analysis of Viral Genomes University of Oxford, Department of Zoology Evolutionary Biology Group Department of Zoology University of Oxford South Parks Road Oxford OX1 3PS, U.K. Fax: +44 1865 271249 Evolutionary Analysis of Viral

More information

Genetic Code, Attributive Mappings and Stochastic Matrices

Genetic Code, Attributive Mappings and Stochastic Matrices Genetic Code, Attributive Mappings and Stochastic Matrices Matthew He Division of Math, Science and Technology Nova Southeastern University Ft. Lauderdale, FL 33314, USA Email: hem@nova.edu Abstract: In

More information

Timing molecular motion and production with a synthetic transcriptional clock

Timing molecular motion and production with a synthetic transcriptional clock Timing molecular motion and production with a synthetic transcriptional clock Elisa Franco,1, Eike Friedrichs 2, Jongmin Kim 3, Ralf Jungmann 2, Richard Murray 1, Erik Winfree 3,4,5, and Friedrich C. Simmel

More information

ChemiScreen CaS Calcium Sensor Receptor Stable Cell Line

ChemiScreen CaS Calcium Sensor Receptor Stable Cell Line PRODUCT DATASHEET ChemiScreen CaS Calcium Sensor Receptor Stable Cell Line CATALOG NUMBER: HTS137C CONTENTS: 2 vials of mycoplasma-free cells, 1 ml per vial. STORAGE: Vials are to be stored in liquid N

More information

Biology I Fall Semester Exam Review 2014

Biology I Fall Semester Exam Review 2014 Biology I Fall Semester Exam Review 2014 Biomolecules and Enzymes (Chapter 2) 8 questions Macromolecules, Biomolecules, Organic Compunds Elements *From the Periodic Table of Elements Subunits Monomers,

More information

Number of questions TEK (Learning Target) Biomolecules & Enzymes

Number of questions TEK (Learning Target) Biomolecules & Enzymes Unit Biomolecules & Enzymes Number of questions TEK (Learning Target) on Exam 8 questions 9A I can compare and contrast the structure and function of biomolecules. 9C I know the role of enzymes and how

More information

L I F E S C I E N C E S

L I F E S C I E N C E S 1a L I F E S C I E N C E S 5 -UUA AUA UUC GAA AGC UGC AUC GAA AAC UGU GAA UCA-3 5 -TTA ATA TTC GAA AGC TGC ATC GAA AAC TGT GAA TCA-3 3 -AAT TAT AAG CTT TCG ACG TAG CTT TTG ACA CTT AGT-5 NOVEMBER 7, 2006

More information

Objective 3.01 (DNA, RNA and Protein Synthesis)

Objective 3.01 (DNA, RNA and Protein Synthesis) Objective 3.01 (DNA, RNA and Protein Synthesis) DNA Structure o Discovered by Watson and Crick o Double-stranded o Shape is a double helix (twisted ladder) o Made of chains of nucleotides: o Has four types

More information

Supplementary Figure 1. Schematic of split-merger microfluidic device used to add transposase to template drops for fragmentation.

Supplementary Figure 1. Schematic of split-merger microfluidic device used to add transposase to template drops for fragmentation. Supplementary Figure 1. Schematic of split-merger microfluidic device used to add transposase to template drops for fragmentation. Inlets are labelled in blue, outlets are labelled in red, and static channels

More information

Lecture IV A. Shannon s theory of noisy channels and molecular codes

Lecture IV A. Shannon s theory of noisy channels and molecular codes Lecture IV A Shannon s theory of noisy channels and molecular codes Noisy molecular codes: Rate-Distortion theory S Mapping M Channel/Code = mapping between two molecular spaces. Two functionals determine

More information

CCHS 2014_2015 Biology Fall Semester Exam Review

CCHS 2014_2015 Biology Fall Semester Exam Review CCHS 2014_2015 Biology Fall Semester Exam Review Biomolecule General Knowledge Macromolecule Monomer (building block) Function Structure 1. What type of biomolecule is hair, skin, and nails? Energy Storage

More information

Near-instant surface-selective fluorogenic protein quantification using sulfonated

Near-instant surface-selective fluorogenic protein quantification using sulfonated Electronic Supplementary Material (ESI) for rganic & Biomolecular Chemistry. This journal is The Royal Society of Chemistry 2014 Supplemental nline Materials for ear-instant surface-selective fluorogenic

More information

GENE ACTIVITY Gene structure Transcription Transcript processing mrna transport mrna stability Translation Posttranslational modifications

GENE ACTIVITY Gene structure Transcription Transcript processing mrna transport mrna stability Translation Posttranslational modifications 1 GENE ACTIVITY Gene structure Transcription Transcript processing mrna transport mrna stability Translation Posttranslational modifications 2 DNA Promoter Gene A Gene B Termination Signal Transcription

More information

2012 Univ Aguilera Lecture. Introduction to Molecular and Cell Biology

2012 Univ Aguilera Lecture. Introduction to Molecular and Cell Biology 2012 Univ. 1301 Aguilera Lecture Introduction to Molecular and Cell Biology Molecular biology seeks to understand the physical and chemical basis of life. and helps us answer the following? What is the

More information

Types of RNA. 1. Messenger RNA(mRNA): 1. Represents only 5% of the total RNA in the cell.

Types of RNA. 1. Messenger RNA(mRNA): 1. Represents only 5% of the total RNA in the cell. RNAs L.Os. Know the different types of RNA & their relative concentration Know the structure of each RNA Understand their functions Know their locations in the cell Understand the differences between prokaryotic

More information

Notes Chapter 4 Cell Reproduction. That cell divided and becomes two, two become four, four become eight, and so on.

Notes Chapter 4 Cell Reproduction. That cell divided and becomes two, two become four, four become eight, and so on. 4.1 Cell Division and Mitosis Many organisms start as one cell. Notes Chapter 4 Cell Reproduction That cell divided and becomes two, two become four, four become eight, and so on. Many-celled organisms,

More information

Interphase & Cell Division

Interphase & Cell Division 1 Interphase & Cell Division 2 G1 = cell grows and carries out its normal job. S phase = DNA is copied (replicated/duplicated) G2 = Cell prepares for division 3 During mitosis, the nuclear membrane breaks

More information

THE MATHEMATICAL STRUCTURE OF THE GENETIC CODE: A TOOL FOR INQUIRING ON THE ORIGIN OF LIFE

THE MATHEMATICAL STRUCTURE OF THE GENETIC CODE: A TOOL FOR INQUIRING ON THE ORIGIN OF LIFE STATISTICA, anno LXIX, n. 2 3, 2009 THE MATHEMATICAL STRUCTURE OF THE GENETIC CODE: A TOOL FOR INQUIRING ON THE ORIGIN OF LIFE Diego Luis Gonzalez CNR-IMM, Bologna Section, Via Gobetti 101, I-40129, Bologna,

More information