8 Protein secondary structure


Grundlagen der Bioinformatik, SoSe 11, D. Huson, June 6, 2011

Sources for this chapter, which are all recommended reading:

Introduction to Protein Structure, Branden & Tooze.
V.V. Solovyev and I.N. Shindyalov. Properties and prediction of protein secondary structure. In Current Topics in Computational Molecular Biology, T. Jiang, Y. Xu and M.Q. Zhang (editors), MIT Press, chapter 15, 2002.
D.W. Mount. Bioinformatics: Sequences and Genome Analysis, Cold Spring Harbor Press, Chapter 9: Protein classification and structure prediction.

8.1 Proteins

A protein is a chain of amino acids joined by peptide bonds. It is usually produced by a ribosome that moves along an mRNA and adds amino acids according to the codons that it encounters in the mRNA.

Here are the 20 standard amino acids:

Name            3-letter  1-letter
Alanine         Ala       A
Cysteine        Cys       C
Aspartic acid   Asp       D
Glutamic acid   Glu       E
Phenylalanine   Phe       F
Glycine         Gly       G
Histidine       His       H
Isoleucine      Ile       I
Lysine          Lys       K
Leucine         Leu       L
Methionine      Met       M
Asparagine      Asn       N
Proline         Pro       P
Glutamine       Gln       Q
Arginine        Arg       R
Serine          Ser       S
Threonine       Thr       T
Valine          Val       V
Tryptophan      Trp       W
Tyrosine        Tyr       Y

Here is a classification of these amino acids:

(Figure: classification of the amino acids.)

Here are two amino acids within a polypeptide chain:

(Figure: two adjacent residues of a polypeptide chain, each with backbone atoms N, C_α and C and a side chain R.)

Neighboring amino acids are joined by a peptide bond between the C=O and NH groups. A chain of repeated N-C_α-C units makes up the backbone of the protein.

In such a polypeptide chain, each amino acid has two rotational degrees of freedom: the rotation angle φ (phi) of the bond between N and C_α, and the rotation angle ψ (psi) of the bond between C_α and C. Both bonds are free to rotate, subject to spatial constraints posed by adjacent R groups. The third angle Ω of the peptide bond between the C=O and NH groups is nearly always 180°, which implies the planarity of the peptide bond.

(Image source: wiki.cmbi.ru.nl)

Polypeptide chains of amino acids are called protein sequences. A short chain or fragment is also called a peptide sequence. A protein (sequence) starts with a free amino group (the N-terminus) and ends with a free carboxyl (COOH) group (the C-terminus). Example:

(Figure: a polypeptide chain drawn from the N-terminus to the C-terminus.)

The amino acid sequence here is: R_1 R_2 ...

Hierarchy of protein structure

We distinguish between four levels of protein structure (Linderstrøm-Lang & Schellman 1959):

Primary structure: The sequence of amino acid residues in a polypeptide chain.
Secondary structure: Helices and β-sheets that are formed by hydrogen bonds between the C=O and NH groups of the backbone.
Tertiary structure: The three-dimensional structure of a polypeptide chain, consisting of secondary structure elements linked by loops and stabilized (primarily) by side-chain interactions.
Quaternary structure: The aggregation of different polypeptide chains into a functional protein.
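The rotation angles φ and ψ described above are dihedral angles, so each can be computed from the coordinates of four consecutive backbone atoms (C, N, C_α, C for φ and N, C_α, C, N for ψ). The following Python sketch is not part of the original notes; the function name `dihedral` and the test coordinates are illustrative assumptions, and the sign convention should be checked against whatever reference values you compare with.

```python
import math

def _sub(a, b):
    return [x - y for x, y in zip(a, b)]

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) around the bond p1-p2."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond.
    norm = math.sqrt(_dot(b1, b1))
    b1 = [x / norm for x in b1]
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, [_dot(b0, b1) * x for x in b1])
    w = _sub(b2, [_dot(b2, b1) * x for x in b1])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# A planar "trans" arrangement, as for the peptide bond angle of about 180 degrees.
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
```

With real data, the four points would be atom coordinates read from a structure file; a planar trans arrangement such as the one above yields an angle of magnitude 180°.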

(Image source: Protein Structure and Function, GA Petsko and D Ringe (2004))

The values of all pairs of rotation angles φ and ψ determine the tertiary structure of a protein. The tertiary structure of proteins is of great interest, as the shape of a protein determines much, if not all, of its function.

Here is the structure of myoglobin, the first experimentally derived structure:

(Figure: structure of myoglobin.)

The experimental determination of protein structure via X-ray crystallography or NMR is difficult and time-consuming.

8.2 The Holy Grail of Bioinformatics

Central biochemical assumption: sequence specifies 3D structure. Hence, we would like to be able to determine the structure of a protein from its sequence.

The Holy Grail of Bioinformatics: Develop an algorithm that can reliably predict the structure (and thus function) of a protein from its amino acid sequence.

...MATGDERFYAEHLMPTLQGLLDPESAHR
LAVRFTSLGLLPRARFQDSDMLEVRVLGH
KFRNPVGIAAGFDKHGEAVDGLYKMGFGF
VEIGSVTPKPQEGNPRPRVFRLPEDQAVIN
RYGFNSHGLSVVEHRLRARQQKQAKLTED
GLPLGVNLGKNKTSVDAAEDYAEGVRVLG
PLADYLVVNVSSPNTAGLRSLQGKAELRR
LLTKVLQERDGLRRVHRPAVLVKIAPDLTS
QDKEDIASVVKELGIDGLIVTNTTVSRPAGL
QGALRSETGGLSGKPLRDLSTQTIREMYAL
TQGRVPIIGVGGVSSGQDALEKIRAGASLVQ
LYTALTFWGPPVVGKVKRELEALLKEQGFG
GVTDAIGADHRR...  →  ?

We will return to this problem in the next chapter.

8.3 Secondary structure of proteins

Regular features of the main chain of a protein give rise to the secondary structure. Determining the secondary structure is an important first step toward determining the three-dimensional structure.

There are two main types of (repetitive) secondary structure elements, called α-helices and β-sheets (L. Pauling 1951), corresponding to specific choices of the φ and ψ angles along the chain.

Ramachandran Plot

In a Ramachandran plot, pairs of torsion angles (φ, ψ) are plotted in a scatter plot. Certain torsion angle pairs are energetically particularly favorable. The following is a Ramachandran plot of observed pairs of angles in a collection of known protein structures:

(Figure: Ramachandran plot. Image source: Wikimedia Commons)

The pairs near φ = -60° and ψ = -40° correspond to α-helices. The pairs near (-90°, 120°) correspond to β-strands.

α-helices

Helices arise when hydrogen bonds occur between (the C=O group of) the amino acid at position i and (the NH group of) the amino acid at position i + k (with k = 3, 4 or 5), for a run of consecutive values of i. Here is the bonding pattern of an α-helix:

(Figure: hydrogen bonds between C=O and NH groups along an α-helix.)

Usually, k = 4 and the resulting structure is called an α-helix. The (idealized) torsion angles are (φ, ψ) = (-58°, -47°) and there are 3.6 residues per turn.

Seldom, k = 3 and then we have a 3_10-helix. The (idealized) torsion angles are (-74°, -4°) and there are 3.0 residues per turn.

Very rarely, k = 5 and then we have a π-helix. The idealized torsion angles are (-57°, -70°) and there are 4.4 residues per turn.
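The three helix types differ only in the offset k of the hydrogen-bond partner and in the number of residues per turn. The short Python sketch below (an illustration, not taken from the notes) tabulates these values, derives the rotation per residue as 360° divided by the residues per turn, and lists the i to i + k hydrogen-bond pairs for a run of residues. The names `HELIX_TYPES`, `rotation_per_residue` and `hbond_pairs` are hypothetical, and the value 3.0 for the 3_10-helix is read from the text above.

```python
# (offset k of the H-bond partner, residues per turn), as stated in the text.
HELIX_TYPES = {
    "alpha": (4, 3.6),
    "3_10": (3, 3.0),
    "pi": (5, 4.4),
}

def rotation_per_residue(residues_per_turn):
    """Rotation around the helix axis contributed by one residue, in degrees."""
    return 360.0 / residues_per_turn

def hbond_pairs(start, length, k):
    """Pairs (i, i + k) of residues joined by C=O ... NH bonds in a helical run.

    The run covers residues start .. start + length - 1.
    """
    return [(i, i + k) for i in range(start, start + length - k)]

for name, (k, per_turn) in HELIX_TYPES.items():
    print(name, k, round(rotation_per_residue(per_turn), 1))
```

For the α-helix this gives a rotation of 100° per residue, which is the angle used for helices in the hydrophobic moment later in the chapter.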

(Image source: ???)

β-sheets

So-called β-sheets consist of β-strands that are runs of 5-10 consecutive amino acids, which are held together by H-bonds:

(Figure: β-strands held together by hydrogen bonds. Image source: Wikimedia Commons)

There are two possible configurations of β-sheets. In a parallel β-sheet, all chains run in the same direction, while in an anti-parallel sheet, chains run in alternating directions:

(Figure: parallel and anti-parallel β-sheets. Source: Wikimedia Commons)

Example of an anti-parallel β-sheet (variable light chain of an immunoglobulin):

Loops

All other (non-repetitive) structures are called loops. Loops are regions of a protein chain that lie between α-helices and β-sheets. The lengths and three-dimensional structure of loops can vary. Hairpin loops joining two anti-parallel β-strands may be as short as two amino acids. Loops lie on the surface of the structure.

Turns are narrow 180° loops that contain at least 3 amino acids.

A region of secondary structure that is not a helix, a sheet, or a recognizable turn is called a coil.

8.4 Classification of protein structures

Proteins are classified to reflect both structural and evolutionary relatedness. A typical classification scheme will employ different hierarchical levels, such as:

1. Folds: Based on major structural similarities.
2. Superfamilies: Based on probable evolutionary relationships.
3. Families: Based on clear evolutionary relationships.

Mount describes six principal classes of protein structures based on the three-dimensional arrangement of secondary structures, four taken from Levitt and Chothia (1976), and two additional ones taken from the SCOP database (Murzin et al., 1995). (The four-letter codes are PDB accession numbers.)

(1) A member of class α consists of a bundle of α-helices connected by loops on the surface of the proteins, e.g.:

Hemoglobin (3hhb)

(2) A member of class β consists of β-sheets, usually two sheets in close contact forming a sandwich. Examples are enzymes, transport proteins and antibodies, e.g.:

T-cell receptor CD8 (1cd8)

(3) A member of class α/β consists mainly of β-sheets with intervening α-helices. This class contains many metabolic enzymes, e.g.:

Tryptophan synthase β subunit (2tsy)

(4) A member of class α + β consists of segregated α-helices and β-sheets, e.g.:

G-specific endonuclease (1rnb)

(5) This class consists of all multi-domain (α and β) proteins with domains from more than one of the above four classes.

(6) Membrane and cell-surface proteins and peptides, e.g.:

Integral membrane light-harvesting complex (1kzu)

Databases

The databases SCOP and CATH (cathdb.info) both contain a hierarchical classification of protein domains by their structures.

37,000 structures in the PDB
971 folds in SCOP (release 1.71)

Number of folds in the six classes in SCOP:

Class                                 No. of folds
α                                     226
β                                     149
α/β                                   134
α + β                                 286
Multi-domain proteins                 48
Membrane and cell surface proteins

8.5 Computing the secondary structure of a known 3D structure

Given the positions of the main chain atoms of a protein, the DSSP (definition of secondary structure of proteins) program [1] determines the secondary structure of the protein. (It also computes geometrical features and solvent exposure.)

Note that the program does not predict protein secondary structures from sequences; rather, it computes them from coordinates.

The DSSP algorithm proceeds as follows: First determine which C=O and NH groups in the main chain are joined by hydrogen bonds. This decision is based on an electrostatic model using the following energy calculation:

\[
E = q_1 q_2 \left( \frac{1}{r(ON)} + \frac{1}{r(CH)} - \frac{1}{r(OH)} - \frac{1}{r(CN)} \right) f,
\]

with q_1 = 0.42e and q_2 = 0.20e, where e is the unit electron charge (in electrostatic units, esu), r(AB) is the inter-atomic distance in Ångström between atom A in the first amino acid and atom B in the second, f = 332 is a constant called the dimensionality factor, and E is the energy in kcal/mol.

Hydrogen bonds have a binding energy of about -3 kcal/mol; however, DSSP assigns an H-bond between the C=O of residue i and the NH of residue j if E < -0.5 kcal/mol.

Any H-bond detected in this way is called a k-turn if it connects the C=O group of amino acid i to the NH group of amino acid i + k, where k = 3, 4 or 5, and a bridge if it connects residues that are not close to each other in the sequence.
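The energy criterion above translates directly into code. The following Python sketch is an illustration only, not the DSSP program itself: the function names `hbond_energy` and `is_hbond` and the example coordinates (a collinear C=O ... H-N arrangement with an O-H distance of 1.9 Å) are assumptions made for this example.

```python
import math

Q1 = 0.42   # partial charge on C=O, in units of e
Q2 = 0.20   # partial charge on N-H, in units of e
F = 332.0   # dimensionality factor, giving E in kcal/mol for distances in Angstrom

def hbond_energy(c, o, n, h):
    """Electrostatic interaction energy between a C=O group and an N-H group.

    c, o: coordinates of the carbonyl C and O of the first residue.
    n, h: coordinates of the amide N and H of the second residue.
    """
    r = math.dist
    return Q1 * Q2 * F * (1 / r(o, n) + 1 / r(c, h) - 1 / r(o, h) - 1 / r(c, n))

def is_hbond(c, o, n, h, cutoff=-0.5):
    """Apply the threshold from the text: an H-bond is assigned if E < -0.5 kcal/mol."""
    return hbond_energy(c, o, n, h) < cutoff

# Hypothetical collinear geometry: C at the origin, O 1.23 A away,
# H 1.9 A beyond O, N 1.0 A beyond H.
energy = hbond_energy((0, 0, 0), (1.23, 0, 0), (4.13, 0, 0), (3.13, 0, 0))
```

For this idealized geometry the energy comes out close to the binding energy of about -3 kcal/mol quoted above, while groups that are far apart give an energy near zero and are not assigned an H-bond.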
Here are some of the patterns that are used to identify secondary structure elements (the hydrogen-bond lines of the original figure are not reproduced):

3-turn:
NH-Cα-C=O   NH-Cα-C=O   NH-Cα-C=O   NH-Cα-C=O

Parallel bridge:
NH-C-C=O   NH-C-C=O   NH-C-C=O   NH-C-C=O
NH-C-C=O   NH-C-C=O   NH-C-C=O   NH-C-C=O

Anti-parallel bridge:
NH-C-C=O   NH-C-C=O   NH-C-C=O   NH-C-C=O
C=O-C-NH   C=O-C-NH   C=O-C-NH   C=O-C-NH

An α-helix is identified as a consecutive run of (at least two) 4-turns. Any two helices that are offset by two or three residues are concatenated into a single helix.

[1] Kabsch and Sander, Dictionary of Protein Secondary Structure: Pattern Recognition of Hydrogen-Bonded and Geometrical Features. Biopolymers 22, 1983.
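The rule that an α-helix is a run of at least two consecutive 4-turns can be sketched as follows. This Python example is a simplified illustration under stated assumptions, not the DSSP implementation: residues are numbered from 0, `hbonds` is a hypothetical set of pairs (i, j) meaning that the C=O of residue i is bonded to the NH of residue j, two consecutive turns starting at i - 1 and i are taken to mark residues i to i + k - 1 as helical, and the concatenation of nearby helices is omitted.

```python
def assign_helix(n_residues, hbonds, k=4):
    """Mark helical residues ('H') from runs of consecutive k-turns.

    hbonds: set of (i, i + k) pairs for which an H-bond was detected.
    Returns a string with one character per residue ('H' or '-').
    """
    # turn[i] is True if a k-turn starts at residue i.
    turn = [(i, i + k) in hbonds for i in range(n_residues)]
    state = ["-"] * n_residues
    for i in range(1, n_residues):
        if turn[i - 1] and turn[i]:
            # Two consecutive turns: residues i .. i + k - 1 lie inside both.
            for j in range(i, min(i + k, n_residues)):
                state[j] = "H"
    return "".join(state)

# Three consecutive 4-turns starting at residues 2, 3 and 4.
print(assign_helix(12, {(2, 6), (3, 7), (4, 8)}))
```

A single isolated 4-turn does not produce any helical residue under this rule, which matches the requirement of at least two consecutive turns.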

A β-sheet corresponds to a sequence of bridges between consecutive residues in two different regions of the chain. More precisely, we need to introduce two types of patterns: a ladder is a set of one or more consecutive bridges of the same type, and a sheet is one or more ladders connected by shared residues. Detected sheets are then defined to be β-sheets.

To allow for irregularities, β-bulges are introduced, in which two perfect ladders or bridges can be connected through a gap of one residue on one side and four on the other.

This is how α-helices and β-sheets are defined, detected and annotated in practice.

8.6 Secondary structure prediction from sequences

Secondary structure prediction problem: Assume we are given a protein sequence, e.g.:

MATVAERCPICLEDPSNYSMALPCL
HAFCYVCITRWIRQNPTCPLCKVPV
ESVVHTIESDSEFGDQLI

The secondary structure prediction problem is to assign a secondary structure type to each amino acid in the sequence, e.g. S (for strand), H (for helical), C (for coils or loops):

MATVAERCPICLEDPSNYSMALPCL
SSSCCCC
HAFCYVCITRWIRQNPTCPLCKVPV
SSS-CCHHHHHHHH---CCCC----
ESVVHTIESDSEFGDQLI
--SS

SSP and discriminant analysis

The secondary structure prediction program (SSP) developed by Solovyev and Salamov (1991, 1994) [2] is aimed at getting the location of entire α-helices and β-strands correct rather than assigning each individual residue to the correct type of secondary structure.

The SSP algorithm is based on the assumption that secondary structures can be identified by statistical properties of five regions associated with an α-helix or β-strand, namely the N_l, N-terminal, internal, C-terminal and C_r regions, as indicated here:

(Figure: an α-helix or β-strand segment and its flanking regions, in order from the N-terminus to the C-terminus: N_l, N, internal, C, C_r.)

[2] See the second item in the literature list of this chapter.

The singleton characteristic

The singleton characteristic is an average of single-residue preferences. Using a database of known protein structures, for every amino acid a the preference of being in a specific segment of type k

(e.g., an α-helix or a β-strand) is calculated as

\[
S_k(a) = \frac{P_k(a)}{P(a)},
\]

where P(a) and P_k(a) are the proportions of amino acids of type a that are contained in the whole database and in segments of type k, respectively (see P.Y. Chou and G.D. Fasman 1978).

Consider a sequence of amino acids A = a_1 ... a_L. Choose start and end positions p and q in the sequence, and a structure type k (e.g., α-helix). The singleton characteristic S_k(p, q) is defined as:

\[
S_k(p,q) = \frac{1}{(q-p+1)+2m} \left( \sum_{i=p-m}^{p-1} S_{N_l}(a_i) + \sum_{i=p}^{p+m-1} S_{N}(a_i) + \sum_{i=p+m}^{q-m} S_{\mathrm{internal}}(a_i) + \sum_{i=q-m+1}^{q} S_{C}(a_i) + \sum_{i=q+1}^{q+m} S_{C_r}(a_i) \right).
\]

Here, m is a pre-chosen parameter that determines the size of the non-internal segments N_l, N, C and C_r. It usually equals 3 or ...

The doublet characteristic

The doublet characteristic is similar to the singleton characteristic. The hope is to obtain a better discrimination by considering pairs of amino acids separated by d = 0, 1, 2 or 3 other residues.

The preference for a particular type of secondary segment k for a pair of amino acids of type a and b, separated by d other residues, is defined as:

\[
D_k(a,b,d) = \frac{P_k(a,b,d)}{P(a,b,d)},
\]

where P(a, b, d) is the proportion of pairs of amino acids a and b whose positions differ by d in a segment, and P_k(a, b, d) is the same value restricted to those segments of type k, in the given training database.

The average preference of a segment a_p a_{p+1} ... a_q to be in a particular secondary structure k is denoted by D_k(p, q, d) and is obtained as the normalized sum of all the pair characteristics occurring in the N_l, N, internal, C and C_r segments.

The hydrophobic moment

Secondary structure prediction can be aided by examining the periodicity of amino acids with hydrophobic side chains in the protein chain.

Tables assigning a hydrophobicity value h(a) (Kyte and Doolittle 1982) to each amino acid a are used to determine the hydrophobicity of different regions of a protein:

(Figure: bar chart of hydrophobicity, on a scale from -5 to 5, for each amino acid in the order A R N D C Q E G H I L K M F P S T W Y V.)

Here, a positive value means hydrophobic, whereas a negative value means hydrophilic.

Observation: Helices often lie on the surface of a protein and there is a tendency for hydrophobic residues to face the core of the protein and for polar and charged amino acids to face the aqueous environment on the outside of the helix.

The hydrophobic moment is calculated for a segment and different angles of rotation per residue (from 0° to 180°) and measures how well the peptide separates hydrophobic and hydrophilic regions in a pattern that is typical for a helix or strand (Eisenberg et al., 1984).

For a given segment a_p a_{p+1} ... a_q of sequence, the hydrophobic moment for an angle ω is defined as:

\[
M_\omega(p,q) = \left( \left[ \sum_{i=p}^{q} h(a_i) \cos(i\omega) \right]^2 + \left[ \sum_{i=p}^{q} h(a_i) \sin(i\omega) \right]^2 \right)^{1/2},
\]

where h(a) denotes the hydrophobicity of the amino acid a.

Here, hydrophobicity is treated as a vector, that is, a quantity with both a magnitude (positive or negative!) and a direction. The hydrophobic moment is the length of the sum of these individual hydrophobicity vectors.

In the context of predicting α-helices and β-sheets, the angles considered are ω = 100° and ω = 160°, respectively. We use ω(k) to denote the angle associated with the structure type k.

Combining the discriminant functions

The SSP method for secondary structure prediction uses a linear combination of all three described discriminant functions (LDF, linear discriminant function):

\[
Z_k(p,q) = \alpha^k_1 S_k(p,q) + \alpha^k_2 D_k(p,q,d) + \alpha^k_3 M_{\omega(k)}(p,q).
\]

Given a threshold c_k, this function classifies a segment of sequence a_p a_{p+1} ... a_q into class 1 (i.e., is a structure of type k) if Z_k(p, q) > c_k, or class 2 (i.e., is not a structure of type k) if Z_k(p, q) ≤ c_k.

For each type of structure k, the method of linear discriminant analysis is used to determine the coefficients (α^k_1, α^k_2, α^k_3) and the threshold constant c_k. For a given training set, the goal is to maximize the ratio of the between-class variation of Z_k to the within-class variation. (We will skip the details; see Fisher, 1936.)

The SSP algorithm

Given a protein sequence A = a_1 a_2 ... a_L, the SSP algorithm predicts secondary structures in the following way:

Algorithm (SSP for α-helices)

1. Determine a seed α-helix consisting of a segment a_p a_{p+1} ... a_q of five residues with an average singleton characteristic higher than a pre-given threshold t.
2. Compute the value of Z_k(p, q) for k = α-helix.
3. While Z_k(p, q) > c_k, extend the segment by one residue, up to a maximal extension of 15 residues in each direction.
4. The extended segment that gives rise to the highest LDF score is considered a potential α-helix.

A similar seed-and-extend strategy is used to determine potential β-strand segments. Here the length of the initial seed is 3.

The result of the two seed-and-extend phases is a set of potential α-helices and β-strands. To obtain a final prediction, overlapping pieces are assigned to the secondary structure type that has the higher LDF value. Non-overlapping remainders of such pieces with lower LDF values are retained as predictions if they are still long enough.

SSP server:

Measuring prediction accuracy

How can we determine the accuracy of computational methods that need to be trained on a database of solved structures? Here is a very simple cross-validation method:

Definition (Leave-one-out (LOO) cross-validation) Assume that we have a training set consisting of n datasets and want to evaluate the performance of some computational method M. In the leave-one-out procedure, for each dataset D repeat the following:

Train the method M on all datasets except D ("leave one out").
Run the method M on D.
Determine whether the method M produced the correct answer on D.

Report the accuracy of the method M as the proportion of correct answers.

To evaluate the performance of a secondary structure prediction, one possibility is to assess the level of single-residue accuracy. However, this may be problematic: for example, a clearly wrong prediction such as αβαβα... in an α-helix region will still give rise to a score of 50% correct residue predictions.

Thus, in practice one also evaluates the number of correctly predicted α-helices and β-strands, considering a structure to be correctly predicted if it contains more than a pre-defined number of correctly predicted residues, often just ...

Performance of different characteristics of SSP

An experimental evaluation of secondary structure predictions was performed on 126 non-homologous proteins with known three-dimensional structures (Rost and Sander 1993), the secondary structure of which was assigned using the DSSP program.

Different combinations of characteristics were compared with each other, giving rise to the following results [3]:

Characteristics used               Q (%)
Singleton characteristic (S)       58.5
S + hydrophobic moment M           61.4
Doublet characteristic (D)         62.2
D + M                              64.8
S + D + M

Neural networks

Neural networks are used for classification problems for which there exists a good supply of training data and little understanding of the structure of the problem at hand. They are inspired by the biology of the brain.

A neural network is a graph in which nodes represent neurons and edges represent connections between the neurons. Signals flow through the network and are processed by the neurons. Connections can be weak or strong depending on their weight. These weights are usually set by supervised training.

We distinguish between recurrent architectures that contain directed loops, and feed-forward architectures that do not contain directed loops.

An architecture is called layered if elements are grouped in layers and connections between elements are defined through the layers. Input data is presented to an input layer and the output is read from an output layer. Other layers are called hidden:

(Figure: a layered network with an input layer, a hidden layer and an output layer.)

In bioinformatics, mostly layered feed-forward neural nets are used.

The neuron is the universal basic element of a neural network. One commonly used type is the perceptron:

(Figure: a perceptron with inputs x_1, x_2, ..., x_r, weights w_1, w_2, ..., w_r and output y = f(Σ w_i x_i).)

In a feed-forward neural net, a node y is fed from r nodes x_1, ..., x_r by edges (x_i, y) with weights w_i. It processes these inputs and fires a signal of strength f(x), where x = Σ_{i=1}^{r} w_i x_i.

Here is a very simple example of a neural net whose task it is to determine whether x_1 > (1/2) x_2:

[3] T. Jiang, Y. Xu and M.Q. Zhang (editors), 2002, page 383

(Figure: input nodes x_1 and x_2 connected to an output node y by edges of weight 2 and -1, respectively.)

It takes two numbers x_1 and x_2 as input and produces a signal y = 2x_1 + (-1)x_2 as output that is positive if 2x_1 > x_2, and negative if 2x_1 < x_2.

To mimic the firing of a neuron, we would like the output of the node labeled y to be 1 if 2x_1 > x_2, and 0 if 2x_1 < x_2. This could be realized using a simple step function

\[
y = \begin{cases} 1 & \text{if } 2x_1 > x_2, \\ 0 & \text{else.} \end{cases}
\]

However, it is better to use a continuous function for this purpose, such as a step-like sigmoidal function of the form

\[
f(x) = \mathrm{sgm}(x) = \frac{1}{1+\exp(-x)},
\]

which looks like this:

(Figure: plot of the sigmoid function, rising from 0 to 1.)

Constructing a neural network

There are two steps to constructing a neural network.

The first step is to design the topology of the network. This involves determining the number of input nodes and output nodes and how they are associated with external variables. Additionally, the number of internal or hidden (layers of) nodes must be determined. Finally, nodes have to be connected using edges.

The second step is called training. Supervised training requires a training set consisting of input data points for which the desired output is known. Each such data point is presented to the neural net and then the weights in the net are slightly modified using a gradient descent method so as to increase the performance of the network (as discussed below).

The goal is to set the weights of the edges so that the number of correct results produced for a given training data set is maximized.

8.9 The PHD neural network

The PHD (PHD-sec) algorithm by Rost and Sander [4] uses a neural network to predict the secondary structure of a given residue. The model consists of three layers of processing units: the input layer, the output layer and a hidden layer.

The units of the input layer read a small segment (13-17 residues) of the amino acid sequence around the position of interest, obtained using a sliding window.

There are 21 input units per sequence position, namely one per amino acid and one for padding at the beginning and end of the sequence. Given a single sequence, the input unit corresponding to a given amino acid at a given position is set to 1.

[4] B. Rost and C. Sander, Prediction of protein secondary structure at better than 70% accuracy. J Mol Biol 232, 1993.
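The input encoding just described (a sliding window with 21 units per position) can be sketched in Python as follows. This is an illustration under assumptions, not the PHD code: the function name `encode_window`, the alphabetical ordering of the amino acids and the use of index 20 for the padding unit are choices made for this example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # assumed ordering of the 20 amino-acid units
PADDING_UNIT = 20                      # assumed index of the padding unit
UNITS_PER_POSITION = 21

def encode_window(sequence, center, window=13):
    """One-hot encode the window of residues centered on position `center`.

    Positions that fall before the start or after the end of the sequence
    activate the padding unit instead of an amino-acid unit.
    """
    half = window // 2
    vector = []
    for pos in range(center - half, center + half + 1):
        units = [0] * UNITS_PER_POSITION
        if 0 <= pos < len(sequence):
            units[AMINO_ACIDS.index(sequence[pos])] = 1
        else:
            units[PADDING_UNIT] = 1
        vector.extend(units)
    return vector

# Window around the first residue of the example sequence from section 8.6.
inputs = encode_window("MATVAERCPICLEDPSNYSMALPCL", center=0)
```

For a window of 13 residues this yields 13 × 21 = 273 input values, exactly one of which is set to 1 for each window position.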

Then signals are sent to units in the hidden layer, which process them and pass them on to the units of the output layer. The final output determines which of the three types of secondary structure is assigned to the central residue.

The PHD paper describes three successive neural networks: PHD-sec for secondary structure prediction, PHD-htm for predicting transmembrane helices and PHD-acc for solvent accessibility.

Here is a simplified depiction of PHD-sec:

(Figure: a window of the input sequence, e.g. L S W T K C Y A V S G A P, is presented to the input layer, followed by hidden layer units H_j and output layer units O_k for α, β and coil, which give the predicted structure. Rost, 1996; adapted from Mount, 2001.)

If the input to the neural net consists of a sequence profile, then each input unit is set to the frequency of the associated amino acid at the given position. Additionally, two input units are used to count insertions and deletions.

The predictions obtained for adjacent windows are then post-processed by applying rules or additional neural nets to obtain a final prediction.

Experimental studies show that the PHD method applied to sequences obtains a single-residue accuracy of 70.8%. Application to sequence profiles gives rise to an accuracy of 72% (Rost and Sander 1994).

The PHD algorithm uses sequences from the HSSP (homology-derived secondary structure of proteins) database for training (Sander and Schneider, 1991).

8.10 Training the PHD neural network

A method called back-propagation can be used to train such neural networks.

For example, consider the output node O_k shown in the network above and assume that it predicts whether the central residue lies in an α-helix. The output signal O_k predicts an α-helix, if it is close to 1, or not, if it is

close to 0. Presented with a training data point, we know whether or not the central residue actually lies in an α-helix, and thus what the desired output D_k of O_k should be.

Consider one of the hidden units H_j that is connected to O_k and emits a signal H_j that is modified by the weight w_jk. The signal arriving at O_k is w_jk H_j:

(Figure: hidden node H_j emits the signal H_j, which arrives at output node O_k as w_jk H_j.)

When training the network, the main question is: how should we alter w_jk so as to bring the value O_k = sgm(Σ_j w_jk H_j) of node O_k closer to the desired value D_k?

Assume that the network has p input, q hidden and r output nodes. In this case, the output of a hidden node H_j is given by:

\[
H_j = \mathrm{sgm}\left( \sum_{i=1}^{p} w_{I_i H_j} I_i \right).
\]

The output of an output node O_k is given by:

\[
O_k = \mathrm{sgm}\left( \sum_{j=1}^{q} w_{H_j O_k} H_j \right).
\]

Hence,

\[
O_k = \mathrm{sgm}\left( \sum_{j=1}^{q} w_{H_j O_k} \, \mathrm{sgm}\left( \sum_{i=1}^{p} w_{I_i H_j} I_i \right) \right), \quad \text{for } k = 1, 2, \ldots, r.
\]

This allows us to calculate the output for a given input set.

A training set specifies t pairs of inputs and desired outputs,

\[
(I^1_1, I^1_2, \ldots, I^1_p, D^1_1, \ldots, D^1_r), \ldots, (I^t_1, I^t_2, \ldots, I^t_p, D^t_1, \ldots, D^t_r).
\]

The mean square error is defined as:

\[
E = \sum_{q=1}^{t} \sum_{i=1}^{r} (D^q_i - O^q_i)^2,
\]

where q here indexes the training pairs. It is straightforward to calculate using the previous equation.

The gradient descent method specifies that we repeatedly do the following: Choose some weight w_ij in the network and modify it by a small amount

\[
\Delta w_{ij} = -\eta \, \frac{\partial E}{\partial w_{ij}},
\]

so as to decrease the error E. The factor η is the training rate (.3).

For example, in the case of an edge jk attaching a hidden node H_j to an output node O_k, the partial derivative of the error E with respect to w_jk for a single training pair is given, up to a constant factor of 2 that can be absorbed into the training rate, by

\[
\frac{\partial E}{\partial w_{jk}} = (O_k - D_k) \, O_k (1 - O_k) \, H_j,
\]

which we will not show here.

So in this case, the weight w_jk is modified by this amount:

\[
\Delta w_{jk} = -\eta \, (O_k - D_k) \, O_k (1 - O_k) \, H_j.
\]

Secondary structure prediction web server:
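The forward pass and the weight update for an edge between a hidden node and an output node can be written out as a small Python sketch. It is an illustration under assumptions rather than the PHD training code: the names `forward`, `update_output_weights` and `squared_error`, the tiny network size, the initial weights and the training rate of 0.1 are all arbitrary choices for this example, and only the hidden-to-output weights are updated.

```python
import math

def sgm(x):
    """Sigmoid function sgm(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, w_ih, w_ho):
    """Compute hidden signals H_j and outputs O_k.

    w_ih[i][j] is the weight from input i to hidden node j;
    w_ho[j][k] is the weight from hidden node j to output node k.
    """
    hidden = [sgm(sum(w_ih[i][j] * inputs[i] for i in range(len(inputs))))
              for j in range(len(w_ih[0]))]
    outputs = [sgm(sum(w_ho[j][k] * hidden[j] for j in range(len(hidden))))
               for k in range(len(w_ho[0]))]
    return hidden, outputs

def update_output_weights(hidden, outputs, desired, w_ho, rate):
    """Apply w_jk <- w_jk - rate * (O_k - D_k) * O_k * (1 - O_k) * H_j in place."""
    for j, h in enumerate(hidden):
        for k, (o, d) in enumerate(zip(outputs, desired)):
            w_ho[j][k] -= rate * (o - d) * o * (1.0 - o) * h

def squared_error(outputs, desired):
    return sum((d - o) ** 2 for o, d in zip(outputs, desired))

# One training point for a 2-input, 2-hidden, 1-output network.
inputs, desired = [1.0, 0.0], [1.0]
w_ih = [[0.5, -0.5], [0.3, 0.8]]
w_ho = [[0.2], [-0.4]]

hidden, outputs = forward(inputs, w_ih, w_ho)
error_before = squared_error(outputs, desired)
update_output_weights(hidden, outputs, desired, w_ho, rate=0.1)
error_after = squared_error(forward(inputs, w_ih, w_ho)[1], desired)
```

After one such update the squared error on this training point is slightly smaller than before; in practice the step is repeated over many training points, and the input-to-hidden weights are adjusted in the same gradient-descent manner.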

8.11 Summary

The most important features of the secondary structure of a protein are its α-helices and β-strands.

The DSSP program defines the secondary structure elements of a protein based on the 3D coordinates of the atoms in the protein.

The SSP program uses a linear discriminant function to predict secondary structure from sequence.

The program PHD addresses the same problem using a neural network.

7 Protein secondary structure

7 Protein secondary structure 78 Grundlagen der Bioinformatik, SS 1, D. Huson, June 17, 21 7 Protein secondary structure Sources for this chapter, which are all recommended reading: Introduction to Protein Structure, Branden & Tooze,

More information

12 Protein secondary structure

12 Protein secondary structure Grundlagen der Bioinformatik, SoSe 14, D. Huson, July 2, 214 147 12 Protein secondary structure Sources for this chapter, which are all recommended reading: Introduction to Protein Structure, Branden &

More information

Protein structure. Protein structure. Amino acid residue. Cell communication channel. Bioinformatics Methods

Protein structure. Protein structure. Amino acid residue. Cell communication channel. Bioinformatics Methods Cell communication channel Bioinformatics Methods Iosif Vaisman Email: ivaisman@gmu.edu SEQUENCE STRUCTURE DNA Sequence Protein Sequence Protein Structure Protein structure ATGAAATTTGGAAACTTCCTTCTCACTTATCAGCCACCT...

More information

Introduction to Comparative Protein Modeling. Chapter 4 Part I

Introduction to Comparative Protein Modeling. Chapter 4 Part I Introduction to Comparative Protein Modeling Chapter 4 Part I 1 Information on Proteins Each modeling study depends on the quality of the known experimental data. Basis of the model Search in the literature

More information

Proteins: Characteristics and Properties of Amino Acids

Proteins: Characteristics and Properties of Amino Acids SBI4U:Biochemistry Macromolecules Eachaminoacidhasatleastoneamineandoneacidfunctionalgroupasthe nameimplies.thedifferentpropertiesresultfromvariationsinthestructuresof differentrgroups.thergroupisoftenreferredtoastheaminoacidsidechain.

More information

Protein Secondary Structure Prediction

Protein Secondary Structure Prediction part of Bioinformatik von RNA- und Proteinstrukturen Computational EvoDevo University Leipzig Leipzig, SS 2011 the goal is the prediction of the secondary structure conformation which is local each amino

More information

Physiochemical Properties of Residues

Physiochemical Properties of Residues Physiochemical Properties of Residues Various Sources C N Cα R Slide 1 Conformational Propensities Conformational Propensity is the frequency in which a residue adopts a given conformation (in a polypeptide)

More information

Protein Structure Bioinformatics Introduction

Protein Structure Bioinformatics Introduction 1 Swiss Institute of Bioinformatics Protein Structure Bioinformatics Introduction Basel, 27. September 2004 Torsten Schwede Biozentrum - Universität Basel Swiss Institute of Bioinformatics Klingelbergstr

More information

Secondary Structure. Bioch/BIMS 503 Lecture 2. Structure and Function of Proteins. Further Reading. Φ, Ψ angles alone determine protein structure

Secondary Structure. Bioch/BIMS 503 Lecture 2. Structure and Function of Proteins. Further Reading. Φ, Ψ angles alone determine protein structure Bioch/BIMS 503 Lecture 2 Structure and Function of Proteins August 28, 2008 Robert Nakamoto rkn3c@virginia.edu 2-0279 Secondary Structure Φ Ψ angles determine protein structure Φ Ψ angles are restricted

More information

Properties of amino acids in proteins

Properties of amino acids in proteins Properties of amino acids in proteins one of the primary roles of DNA (but not the only one!) is to code for proteins A typical bacterium builds thousands types of proteins, all from ~20 amino acids repeated

Translation. A ribosome, mRNA, and tRNA.

Translation. A ribosome, mRNA, and tRNA. Translation The basic processes of translation are conserved among prokaryotes and eukaryotes. Prokaryotic Translation A ribosome, mRNA, and tRNA. In the initiation of translation in prokaryotes, the Shine-Dalgarno

Read more about Pauling and more scientists at: Profiles in Science, The National Library of Medicine, profiles.nlm.nih.gov

Read more about Pauling and more scientists at: Profiles in Science, The National Library of Medicine, profiles.nlm.nih.gov 2018 Biochemistry 110 California Institute of Technology Lecture 2: Principles of Protein Structure Linus Pauling (1901-1994) began his studies at Caltech in 1922 and was directed by Arthur Amos Noyes to

PROTEIN SECONDARY STRUCTURE PREDICTION: AN APPLICATION OF CHOU-FASMAN ALGORITHM IN A HYPOTHETICAL PROTEIN OF SARS VIRUS

PROTEIN SECONDARY STRUCTURE PREDICTION: AN APPLICATION OF CHOU-FASMAN ALGORITHM IN A HYPOTHETICAL PROTEIN OF SARS VIRUS Int. J. LifeSc. Bt & Pharm. Res. 2012 Kaladhar, 2012 Research Paper ISSN 2250-3137 www.ijlbpr.com Vol.1, Issue. 1, January 2012 2012 IJLBPR. All Rights Reserved PROTEIN SECONDARY STRUCTURE PREDICTION:

PROTEIN STRUCTURE AMINO ACIDS H R. Zwitterion (dipolar ion) CO 2 H. PEPTIDES Formal reactions showing formation of peptide bond by dehydration:

PROTEIN STRUCTURE AMINO ACIDS H R. Zwitterion (dipolar ion) CO 2 H. PEPTIDES Formal reactions showing formation of peptide bond by dehydration: PROTEIN STRUCTURE Hydrolysis of proteins with aqueous acid or base yields a mixture of free amino acids. Each type of protein yields a characteristic mixture of the ~20 amino acids. AMINO ACIDS Zwitterion (dipolar

Using Higher Calculus to Study Biologically Important Molecules Julie C. Mitchell

Using Higher Calculus to Study Biologically Important Molecules Julie C. Mitchell Using Higher Calculus to Study Biologically Important Molecules Julie C. Mitchell Mathematics and Biochemistry University of Wisconsin - Madison 0 There Are Many Kinds Of Proteins The word protein comes

Protein Structure. Role of (bio)informatics in drug discovery. Bioinformatics

Protein Structure. Role of (bio)informatics in drug discovery. Bioinformatics Bioinformatics Protein Structure Principles & Architecture Marjolein Thunnissen Dep. of Biochemistry & Structural Biology Lund University September 2011 Homology, pattern and 3D structure searches need

Chemistry Chapter 22

Chemistry Chapter 22 Chemistry 2100 Chapter 22 Proteins Proteins serve many functions, including the following. 1. Structure: Collagen and keratin are the chief constituents of skin, bone, hair, and nails. 2. Catalysts: Virtually

Packing of Secondary Structures

Packing of Secondary Structures 7.88 Lecture Notes - 4 7.24/7.88J/5.48J The Protein Folding and Human Disease Professor Gossard Retrieving, Viewing Protein Structures from the Protein Data Base Helix helix packing Packing of Secondary

Viewing and Analyzing Proteins, Ligands and their Complexes 2

Viewing and Analyzing Proteins, Ligands and their Complexes 2 2 Viewing and Analyzing Proteins, Ligands and their Complexes 2 Overview Viewing the accessible surface Analyzing the properties of proteins containing thousands of atoms is best accomplished by representing

Major Types of Association of Proteins with Cell Membranes. From Alberts et al

Major Types of Association of Proteins with Cell Membranes. From Alberts et al Major Types of Association of Proteins with Cell Membranes From Alberts et al Proteins Are Polymers of Amino Acids Peptide Bond Formation Amino Acid central carbon atom to which are attached amino group

Protein Struktur (optional, flexible)

Protein Struktur (optional, flexible) Protein Struktur (optional, flexible) 22/10/2009 [ 1 ] Andrew Torda, Wintersemester 2009 / 2010, AST nur für Informatiker, Mathematiker,.. 26 Okt, 3 Nov 2009 Proteins - who cares? 22/10/2009 [ 2 ] Most important

Protein Structure Marianne Øksnes Dalheim, PhD candidate Biopolymers, TBT4135, Autumn 2013

Protein Structure Marianne Øksnes Dalheim, PhD candidate Biopolymers, TBT4135, Autumn 2013 Protein Structure Marianne Øksnes Dalheim, PhD candidate Biopolymers, TBT4135, Autumn 2013 The presentation is based on the presentation by Professor Alexander Dikiy, which is given in the course compendium:

The Structure of Enzymes!

The Structure of Enzymes! The Structure of Enzymes Levels of Protein Structure 0 order amino acid composition Primary Secondary Motifs Tertiary Domains Quaternary ther sequence repeating structural patterns defined by torsion angles

1. Amino Acids and Peptides Structures and Properties

1. Amino Acids and Peptides Structures and Properties 1. Amino Acids and Peptides Structures and Properties Chemical nature of amino acids The α-amino acids in peptides and proteins (excluding proline) consist of a carboxylic acid ( COOH) and an amino ( NH

Neural Networks for Protein Structure Prediction Brown, JMB CS 466 Saurabh Sinha

Neural Networks for Protein Structure Prediction Brown, JMB CS 466 Saurabh Sinha Neural Networks for Protein Structure Prediction Brown, JMB 1999 CS 466 Saurabh Sinha Outline Goal is to predict secondary structure of a protein from its sequence Artificial Neural Network used for this

Secondary and sidechain structures

Secondary and sidechain structures Lecture 2 Secondary and sidechain structures James Chou BCMP201 Spring 2008 Images from Petsko & Ringe, Protein Structure and Function. Branden & Tooze, Introduction to Protein Structure. Richardson, J.

114 Grundlagen der Bioinformatik, SS 09, D. Huson, July 6, 2009

114 Grundlagen der Bioinformatik, SS 09, D. Huson, July 6, 2009 114 Grundlagen der Bioinformatik, SS 09, D. Huson, July 6, 2009 9 Protein tertiary structure Sources for this chapter, which are all recommended reading: D.W. Mount. Bioinformatics: Sequences and Genome

UNIT TWELVE

UNIT TWELVE PROTEINS: PEPTIDE BONDING AND POLYPEPTIDES 12 CONCEPTS Many proteins are important in biological structure, for example the keratin of hair, collagen of skin and leather, and fibroin of silk. Other

Lecture 15: Realities of Genome Assembly Protein Sequencing

Lecture 15: Realities of Genome Assembly Protein Sequencing Lecture 15: Realities of Genome Assembly Protein Sequencing Study Chapter 8.10-8.15 1 Euler's Theorems A graph is balanced if for every vertex the number of incoming edges equals the number of outgoing

Model Mélange. Physical Models of Peptides and Proteins

Model Mélange. Physical Models of Peptides and Proteins Model Mélange Physical Models of Peptides and Proteins In the Model Mélange activity, you will visit four different stations each featuring a variety of different physical models of peptides or proteins.

CAP 5510 Lecture 3 Protein Structures

CAP 5510 Lecture 3 Protein Structures CAP 5510 Lecture 3 Protein Structures Su-Shing Chen Bioinformatics CISE 8/19/2005 Su-Shing Chen, CISE 1 Protein Conformation 8/19/2005 Su-Shing Chen, CISE 2 Protein Conformational Structures Hydrophobicity

Problem Set 1

Problem Set 1 2006 7.012 Problem Set 1 Due before 5 PM on FRIDAY, September 15, 2006. Turn answers in to the box outside of 68-120. PLEASE WRITE YOUR ANSWERS ON THIS PRINTOUT. 1. For each of the following parts, pick

Amino Acids and Peptides

Amino Acids and Peptides Amino Acids Amino Acids and Peptides Amino acid a compound that contains both an amino group and a carboxyl group α-amino acid an amino acid in which the amino group is on the carbon adjacent to the carboxyl

Basic Principles of Protein Structures

Basic Principles of Protein Structures Basic Principles of Protein Structures Proteins Proteins: The Molecule of Life Proteins: Building Blocks Proteins: Secondary Structures Proteins: Tertiary and Quaternary Structure Proteins: Geometry Proteins

Protein Secondary Structure Prediction using Feed-Forward Neural Network

Protein Secondary Structure Prediction using Feed-Forward Neural Network COPYRIGHT 2010 JCIT, ISSN 2078-5828 (PRINT), ISSN 2218-5224 (ONLINE), VOLUME 01, ISSUE 01, MANUSCRIPT CODE: 100713 Protein Secondary Structure Prediction using Feed-Forward Neural Network M. A. Mottalib,

Analysis and Prediction of Protein Structure (I)

Analysis and Prediction of Protein Structure (I) Analysis and Prediction of Protein Structure (I) Jianlin Cheng, PhD School of Electrical Engineering and Computer Science University of Central Florida 2006 Free for academic use. Copyright @ Jianlin Cheng

Protein Struktur. Biologen und Chemiker dürfen mit Handys spielen (leise) go home, go to sleep. wake up at slide 39

Protein Struktur. Biologen und Chemiker dürfen mit Handys spielen (leise) go home, go to sleep. wake up at slide 39 Protein Struktur Biologen und Chemiker dürfen mit Handys spielen (leise) go home, go to sleep wake up at slide 39 Andrew Torda, Wintersemester 2016/ 2017 Andrew Torda 17.10.2016 [ 1 ] Proteins - who cares?

Supersecondary Structures (structural motifs)

Supersecondary Structures (structural motifs) Supersecondary Structures (structural motifs) Various Sources Slide 1 Supersecondary Structures (Motifs) Supersecondary Structures (Motifs): : Combinations of secondary structures in specific geometric

Peptides And Proteins

Peptides And Proteins Kevin Burgess, May 3, 2017 1 Peptides And Proteins from chapter(s) in the recommended text A. Introduction B. Nomenclature And Conventions by amide bonds. on the left, right. 2 N-terminal C-terminal triglycine

Protein Structures: Experiments and Modeling. Patrice Koehl

Protein Structures: Experiments and Modeling. Patrice Koehl Protein Structures: Experiments and Modeling Patrice Koehl Structural Bioinformatics: Proteins Proteins: Sources of Structure Information Proteins: Homology Modeling Proteins: Ab initio prediction Proteins:

Exam I Answer Key: Summer 2006, Semester C

Exam I Answer Key: Summer 2006, Semester C 1. Which of the following tripeptides would migrate most rapidly towards the negative electrode if electrophoresis is carried out at ph 3.0? a. gly-gly-gly b. glu-glu-asp c. lys-glu-lys d. val-asn-lys

Biochemistry Quiz Review 1I. 1. Of the 20 standard amino acids, only is not optically active. The reason is that its side chain.

Biochemistry Quiz Review 1I. 1. Of the 20 standard amino acids, only is not optically active. The reason is that its side chain. Biochemistry Quiz Review 1I A general note: Short answer questions are just that, short. Writing a paragraph filled with every term you can remember from class won't improve your answer; just answer clearly,

PROTEIN SECONDARY STRUCTURE PREDICTION USING NEURAL NETWORKS AND SUPPORT VECTOR MACHINES

PROTEIN SECONDARY STRUCTURE PREDICTION USING NEURAL NETWORKS AND SUPPORT VECTOR MACHINES PROTEIN SECONDARY STRUCTURE PREDICTION USING NEURAL NETWORKS AND SUPPORT VECTOR MACHINES by Lipontseng Cecilia Tsilo A thesis submitted to Rhodes University in partial fulfillment of the requirements for

Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics

Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics Jianlin Cheng, PhD Department of Computer Science University of Missouri, Columbia

Exam III. Please read through each question carefully, and make sure you provide all of the requested information.

Exam III. Please read through each question carefully, and make sure you provide all of the requested information. 09-107 Honors Chemistry Name Exam III Please read through each question carefully, and make sure you provide all of the requested information. 1. A series of octahedral metal compounds are made from 1 mol

BIOINF 4120 Bioinformatics 2 - Structures and Systems - Oliver Kohlbacher Summer Protein Structure Prediction I

BIOINF 4120 Bioinformatics 2 - Structures and Systems - Oliver Kohlbacher Summer Protein Structure Prediction I BIOINF 4120 Bioinformatics 2 - Structures and Systems - Oliver Kohlbacher Summer 2013 9. Protein Structure Prediction I Structure Prediction Overview Overview of problem variants Secondary structure prediction

Section Week 3. Junaid Malek, M.D.

Section Week 3. Junaid Malek, M.D. Section Week 3 Junaid Malek, M.D. Biological Polymers DNA 4 monomers (building blocks), limited structure (double-helix) RNA 4 monomers, greater flexibility, multiple structures Proteins 20 Amino Acids,

Central Dogma. modifications genome transcriptome proteome

Central Dogma. modifications genome transcriptome proteome Central Dogma DNA mRNA protein post-translational modifications genome transcriptome proteome 83 Hierarchy of Protein Structure 20 Amino Acids There are 20^n possible sequences for a protein of n residues!

Protein Secondary Structure Prediction

Protein Secondary Structure Prediction Protein Secondary Structure Prediction Doug Brutlag & Scott C. Schmidler Overview Goals and problem definition Existing approaches Classic methods Recent successful approaches Evaluating prediction algorithms

Solutions In each case, the chirality center has the R configuration

Solutions In each case, the chirality center has the R configuration CHAPTER 25 669 Solutions 25.1. In each case, the chirality center has the R configuration. C C 2 2 C 3 C(C 3 ) 2 D-Alanine D-Valine 25.2. 2 2 S 2 d) 2 25.3. Pro,, Trp, Tyr, and His, Trp, Tyr, and His Arg,

Bioinformatics III Structural Bioinformatics and Genome Analysis Part Protein Secondary Structure Prediction. Sepp Hochreiter

Bioinformatics III Structural Bioinformatics and Genome Analysis Part Protein Secondary Structure Prediction. Sepp Hochreiter Bioinformatics III Structural Bioinformatics and Genome Analysis Part Protein Secondary Structure Prediction Institute of Bioinformatics Johannes Kepler University, Linz, Austria Chapter 4 Protein Secondary

Basics of protein structure

Basics of protein structure Today: 1. Projects a. Requirements: i. Critical review of one paper ii. At least one computational result b. Noon, Dec. 3 rd written report and oral presentation are due; submit via email to bphys101@fas.harvard.edu

On the Structure Differences of Short Fragments and Amino Acids in Proteins with and without Disulfide Bonds

On the Structure Differences of Short Fragments and Amino Acids in Proteins with and without Disulfide Bonds On the Structure Differences of Short Fragments and Amino Acids in Proteins with and without Disulfide Bonds A thesis submitted for the degree of Doctor of Philosophy Saravanan Dayalan B.E., M.App.Sc(IT),

DATA MINING OF ELECTROSTATIC INTERACTIONS BETWEEN AMINO ACIDS IN COILED-COIL PROTEINS USING THE STABLE COIL ALGORITHM ANKUR S.

DATA MINING OF ELECTROSTATIC INTERACTIONS BETWEEN AMINO ACIDS IN COILED-COIL PROTEINS USING THE STABLE COIL ALGORITHM ANKUR S. University of Colorado at Colorado Springs i DATA MINING OF ELECTROSTATIC INTERACTIONS BETWEEN AMINO ACIDS IN COILED-COIL PROTEINS USING THE STABLE COIL ALGORITHM BY ANKUR S. DESHMUKH A project submitted

CHAPTER 29 HW: AMINO ACIDS + PROTEINS

CHAPTER 29 HW: AMINO ACIDS + PROTEINS CHAPTER 29 HW: AMINO ACIDS + PROTEINS For all problems, consult the table of 20 Amino Acids provided in lecture if an amino acid structure is needed; these will be given on exams. Use natural amino acids (L)

12/6/12. Dr. Sanjeeva Srivastava IIT Bombay. Primary Structure. Secondary Structure. Tertiary Structure. Quaternary Structure.

12/6/12. Dr. Sanjeeva Srivastava IIT Bombay. Primary Structure. Secondary Structure. Tertiary Structure. Quaternary Structure. Dr. Sanjeeva Srivastava Primary Structure Secondary Structure Tertiary Structure Quaternary Structure Amino acid residues α Helix Polypeptide chain Assembled subunits 2 1 Amino acid sequence determines 3-D structure

Protein Structure & Motifs

Protein Structure & Motifs & Motifs Biochemistry 201 Molecular Biology January 12, 2000 Doug Brutlag Introduction Proteins are more flexible than nucleic acids in structure because of both the larger number of types of residues

Number sequence representation of protein structures based on the second derivative of a folded tetrahedron sequence

Number sequence representation of protein structures based on the second derivative of a folded tetrahedron sequence Number sequence representation of protein structures based on the second derivative of a folded tetrahedron sequence Naoto Morikawa (nmorika@genocript.com) October 7, 2006. Abstract A protein is a sequence

ALL LECTURES IN SB Introduction

ALL LECTURES IN SB Introduction 1. Introduction 2. Molecular Architecture I 3. Molecular Architecture II 4. Molecular Simulation I 5. Molecular Simulation II 6. Bioinformatics I 7. Bioinformatics II 8. Prediction I 9. Prediction II ALL

LS1a Fall 2014 Problem Set #2 Due Monday 10/6 at 6 pm in the drop boxes on the Science Center 2 nd Floor

LS1a Fall 2014 Problem Set #2 Due Monday 10/6 at 6 pm in the drop boxes on the Science Center 2 nd Floor LS1a Fall 2014 Problem Set #2 Due Monday 10/6 at 6 pm in the drop boxes on the Science Center 2 nd Floor Note: Adequate space is given for each answer. Questions that require a brief explanation should

Bioinformatics. Macromolecular structure

Bioinformatics. Macromolecular structure Bioinformatics Macromolecular structure Contents Determination of protein structure Structure databases Secondary structure elements (SSE) Tertiary structure Structure analysis Structure alignment Domain

BCH 4053 Exam I Review Spring 2017

BCH 4053 Exam I Review Spring 2017 BCH 4053 SI - Spring 2017 Reed BCH 4053 Exam I Review Spring 2017 Chapter 1 1. Calculate G for the reaction A + A P + Q. Assume the following equilibrium concentrations: [A] = 20mM, [Q] = [P] = 40fM. Assume

1. What is an ångstrom unit, and why is it used to describe molecular structures?

1. What is an ångstrom unit, and why is it used to describe molecular structures? 1. What is an ångstrom unit, and why is it used to describe molecular structures? The ångstrom unit is a unit of distance suitable for measuring atomic scale objects. 1 ångstrom (Å) = 1 × 10⁻¹⁰ m. The diameter

Procheck output. Bond angles (Procheck) Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics.

Procheck output. Bond angles (Procheck) Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics. Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics Iosif Vaisman Email: ivaisman@gmu.edu Bond

Heteropolymer. Mostly in regular secondary structure

Heteropolymer. Mostly in regular secondary structure Heteropolymer - + + - Mostly in regular secondary structure 1 2 3 4 C >N trace how you go around the helix C >N C2 >N6 C1 >N5 What's the pattern? Ci>Ni+? 5 6 move around not quite 120

Overview. The peptide bond. Page 1

Overview. The peptide bond. Page 1 Overview Secondary structure: the conformation of the peptide backbone The peptide bond, steric implications Steric hindrance and sterically allowed conformations. Ramachandran diagrams Side chain conformations

B O C 4 H 2 O O. NOTE: The reaction proceeds with a carbonium ion stabilized on the C 1 of sugar A.

B O C 4 H 2 O O. NOTE: The reaction proceeds with a carbonium ion stabilized on the C 1 of sugar A. hbcse 33rd International Page 101 Chemistry Olympiad Preparatory 05/02/01 Problems d. In the hydrolysis of the glycosidic bond, the glycosidic bridge oxygen goes with C 4 of the sugar B. On cleavage, 18 O from

THE UNIVERSITY OF MANITOBA. PAPER NO: _1_ LOCATION: 173 Robert Schultz Theatre PAGE NO: 1 of 5 DEPARTMENT & COURSE NO: CHEM / MBIO 2770 TIME: 1 HOUR

THE UNIVERSITY OF MANITOBA. PAPER NO: _1_ LOCATION: 173 Robert Schultz Theatre PAGE NO: 1 of 5 DEPARTMENT & COURSE NO: CHEM / MBIO 2770 TIME: 1 HOUR THE UNIVERSITY OF MANITOBA 1 November 1, 2016 Mid-Term EXAMINATION PAPER NO: _1_ LOCATION: 173 Robert Schultz Theatre PAGE NO: 1 of 5 DEPARTMENT & COURSE NO: CHEM / MBIO 2770 TIME: 1 HOUR EXAMINATION:

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748 CAP 5510: Introduction to Bioinformatics Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs07.html 2/15/07 CAP5510 1 EM Algorithm Goal: Find θ, Z that maximize Pr

The Structure and Functions of Proteins

The Structure and Functions of Proteins Wright State University CORE Scholar Computer Science and Engineering Faculty Publications Computer Science and Engineering 2003 The Structure and Functions of Proteins Dan E. Krane Wright State University

EXAM 1 Fall 2009 BCHS3304, SECTION # 21734, GENERAL BIOCHEMISTRY I Dr. Glen B Legge

EXAM 1 Fall 2009 BCHS3304, SECTION # 21734, GENERAL BIOCHEMISTRY I Dr. Glen B Legge EXAM 1 Fall 2009 BCHS3304, SECTION # 21734, GENERAL BIOCHEMISTRY I 2009 Dr. Glen B Legge This is a Scantron exam. All answers should be transferred to the Scantron sheet using a #2 pencil. Write and bubble

SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS. Prokaryotes and Eukaryotes. DNA and RNA

SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS. Prokaryotes and Eukaryotes. DNA and RNA SEQUENCE ALIGNMENT BACKGROUND: BIOINFORMATICS 1 Prokaryotes and Eukaryotes 2 DNA and RNA 3 4 Double helix structure Codons Codons are triplets of bases from the RNA sequence. Each triplet defines an amino-acid.

Tamer Barakat. Razi Kittaneh. Mohammed Bio. Diala Abu-Hassan

Tamer Barakat. Razi Kittaneh. Mohammed Bio. Diala Abu-Hassan 14 Tamer Barakat Razi Kittaneh Mohammed Bio Diala Abu-Hassan Protein structure: We already know that when two amino acids bind, a dipeptide is formed which is considered to be an oligopeptide. When more

Can protein model accuracy be. identified? NO! CBS, BioCentrum, Morten Nielsen, DTU

Can protein model accuracy be. identified? NO! CBS, BioCentrum, Morten Nielsen, DTU Can protein model accuracy be identified? Morten Nielsen, CBS, BioCentrum, DTU NO! Identification of Protein-model accuracy Why is it important? What is accuracy RMSD, fraction correct, Protein model correctness/quality

Protein structure (and biomolecular structure more generally) CS/CME/BioE/Biophys/BMI 279 Sept. 28 and Oct. 3, 2017 Ron Dror

Protein structure (and biomolecular structure more generally) CS/CME/BioE/Biophys/BMI 279 Sept. 28 and Oct. 3, 2017 Ron Dror Protein structure (and biomolecular structure more generally) CS/CME/BioE/Biophys/BMI 279 Sept. 28 and Oct. 3, 2017 Ron Dror Please interrupt if you have questions, and especially if you're confused! Assignment

D Dobbs ISU - BCB 444/544X 1

D Dobbs ISU - BCB 444/544X 1 11/7/05 Protein Structure: Classification, Databases, Visualization Announcements BCB 544 Projects - Important Dates: Nov 2 Wed noon - Project proposals due to David/Drena Nov 4 Fri PM - Approvals/responses

Lecture 7. Protein Secondary Structure Prediction. Secondary Structure DSSP. Master Course DNA/Protein Structure-function.

Lecture 7. Protein Secondary Structure Prediction. Secondary Structure DSSP. Master Course DNA/Protein Structure-function. Centre for Integrative Bioinformatics VU Master Course DNA/Protein Structure-function Analysis and Prediction Lecture 7 Protein Secondary Structure Prediction Protein primary structure 20 amino

Studies Leading to the Development of a Highly Selective. Colorimetric and Fluorescent Chemosensor for Lysine

Studies Leading to the Development of a Highly Selective. Colorimetric and Fluorescent Chemosensor for Lysine Supporting Information for Studies Leading to the Development of a Highly Selective Colorimetric and Fluorescent Chemosensor for Lysine Ying Zhou, a Jiyeon Won, c Jin Yong Lee, c * and Juyoung Yoon a,

Biomolecules: lecture 9

Biomolecules: lecture 9 Biomolecules: lecture 9 - understanding further why amino acids are the building block for proteins - understanding the chemical properties amino acids bring to proteins - realizing that many proteins

Programme Last week's quiz results + Summary Fold recognition Break Exercise: Modelling remote homologues

Programme Last week's quiz results + Summary Fold recognition Break Exercise: Modelling remote homologues Programme 8.00-8.20 Last week's quiz results + Summary 8.20-9.00 Fold recognition 9.00-9.15 Break 9.15-11.20 Exercise: Modelling remote homologues 11.20-11.40 Summary & discussion 11.40-12.00 Quiz 1 Feedback

Supplemental Materials for. Structural Diversity of Protein Segments Follows a Power-law Distribution

Supplemental Materials for. Structural Diversity of Protein Segments Follows a Power-law Distribution Supplemental Materials for Structural Diversity of Protein Segments Follows a Power-law Distribution Yoshito SAWADA and Shinya HONDA* National Institute of Advanced Industrial Science and Technology (AIST),

IT og Sundhed 2010/11

IT og Sundhed 2010/11 IT og Sundhed 2010/11 Sequence based predictors. Secondary structure and surface accessibility Bent Petersen 13 January 2011 1 NetSurfP Real Value Solvent Accessibility predictions with amino acid associated

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder HMM applications Applications of HMMs Gene finding Pairwise alignment (pair HMMs) Characterizing protein families (profile HMMs) Predicting membrane proteins, and membrane protein topology Gene finding

Resonance assignments in proteins. Christina Redfield

Resonance assignments in proteins. Christina Redfield Resonance assignments in proteins Christina Redfield 1. Introduction The assignment of resonances in the complex NMR spectrum of a protein is the first step in any study of protein structure, function

CHEM J-9 June 2014

CHEM J-9 June 2014 CHEM1611 2014-J-9 June 2014 Alanine (ala) and lysine (lys) are two amino acids with the structures given below as Fischer projections. The pKa values of the conjugate acid forms of the different functional

Dental Biochemistry Exam The total number of unique tripeptides that can be produced using all of the common 20 amino acids is

Dental Biochemistry Exam The total number of unique tripeptides that can be produced using all of the common 20 amino acids is Exam Questions for Dental Biochemistry Monday August 27, 2007 E.J. Miller 1. The compound shown below is CH 3 -CH 2 OH A. acetoacetate B. acetic acid C. acetaldehyde D. produced by reduction of acetaldehyde

THE TANGO ALGORITHM: SECONDARY STRUCTURE PROPENSITIES, STATISTICAL MECHANICS APPROXIMATION

THE TANGO ALGORITHM: SECONDARY STRUCTURE PROPENSITIES, STATISTICAL MECHANICS APPROXIMATION THE TANGO ALGORITHM: SECONDARY STRUCTURE PROPENSITIES, STATISTICAL MECHANICS APPROXIMATION AND CALIBRATION Calculation of turn and beta intrinsic propensities. A statistical analysis of a protein structure

From Amino Acids to Proteins - in 4 Easy Steps

From Amino Acids to Proteins - in 4 Easy Steps From Amino Acids to Proteins - in 4 Easy Steps Although protein structure appears to be overwhelmingly complex, you can provide your students with a basic understanding of how proteins fold by focusing

8 Grundlagen der Bioinformatik, SoSe 11, D. Huson, April 18, 2011

8 Grundlagen der Bioinformatik, SoSe 11, D. Huson, April 18, 2011 8 Grundlagen der Bioinformatik, SoSe 11, D. Huson, April 18, 2011 2 Pairwise alignment We will discuss: 1. Strings 2. Dot matrix method for comparing sequences 3. Edit distance and alignment 4. The number

Conformational Geometry of Peptides and Proteins:

Conformational Geometry of Peptides and Proteins: Conformational Geometry of Peptides and Proteins: Before discussing secondary structure, it is important to appreciate the conformational plasticity of proteins. Each residue in a polypeptide has three

Bioinformatics: Secondary Structure Prediction

Bioinformatics: Secondary Structure Prediction Bioinformatics: Secondary Structure Prediction Prof. David Jones d.jones@cs.ucl.ac.uk LMLSTQNPALLKRNIIYWNNVALLWEAGSD The greatest unsolved problem in molecular biology: the Protein Folding Problem? Entries

7.012 Problem Set 1 Solutions

7.012 Problem Set 1 Solutions Name TA Section 7.012 Problem Set 1 Solutions Your answers to this problem set must be inserted into the large wooden box on wheels outside 68120 by 4:30 PM, Thursday, September 15. Problem sets will not

Study of Mining Protein Structural Properties and its Application

Study of Mining Protein Structural Properties and its Application Study of Mining Protein Structural Properties and its Application A Dissertation Proposal Presented to the Department of Computer Science and Information Engineering College of Electrical Engineering and

DATE A DAtabase of TIM Barrel Enzymes

DATE A DAtabase of TIM Barrel Enzymes DATE A DAtabase of TIM Barrel Enzymes 2 2.1 Introduction.. 2.2 Objective and salient features of the database 2.2.1 Choice of the dataset.. 2.3 Statistical information on the database.. 2.4 Features....

Structural Alignment of Proteins

Structural Alignment of Proteins Goal Align protein structures Structural Alignment of Proteins 1 2 3 4 5 6 7 8 9 10 11 12 13 14 PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE

Chemical Properties of Amino Acids

Chemical Properties of Amino Acids Chemical Properties of Amino Acids Protein Function Make up about 15% of the cell and have many functions in the cell 1. Catalysis: enzymes 2. Structure: muscle proteins 3. Movement: myosin, actin 4. Defense:

Protein Structure: Data Bases and Classification Ingo Ruczinski

Protein Structure: Data Bases and Classification Ingo Ruczinski Protein Structure: Data Bases and Classification Ingo Ruczinski Department of Biostatistics, Johns Hopkins University Reference Bourne and Weissig Structural Bioinformatics Wiley, 2003 More References

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche The molecular structure of a protein can be broken down hierarchically. The primary structure of a protein is simply its

Protein structure alignments

Protein structure alignments Protein structure alignments Proteins that fold in the same way, i.e. have the same fold are often homologs. Structure evolves slower than sequence Sequence is less conserved than structure If BLAST gives
