Introduction to Computational Modelling and Functional Analysis of Proteins

AG Prof. Dr. Monika Fritz, Pure and Applied Biomineralisation, Institute for Biophysics
AG Prof. Dr. Manfred Radmacher, Institute for Biophysics

Aim and scope of the lab class

The aim of this lab class is to introduce the concepts and fundamentals of protein structure modelling and of the computational functional analysis of proteins. In the course of the class you will be introduced to useful tools for protein structure modelling: protein sequence comparison and sequence alignment techniques, comparative modelling algorithms, surface potential calculators, systematic docking programs and protein structure visualizers.

Introduction

With the advent of synchrotrons as high-quality X-ray sources (see e.g. EMBL at DESY) and of NMR facilities, one might assume that computational protein structure modelling is old-fashioned. Nevertheless, it is often very laborious and/or expensive, and sometimes even impossible, to obtain the protein of interest in sufficient amounts, to purify it to a high degree and to prepare a protein crystal for X-ray investigation. The last step might become obsolete in the future when free electron lasers (like XFEL) are operational. In contrast to experimental structure determination, which requires expensive equipment and methods, computational modelling requires only a modern workstation and the appropriate software. Today structure modelling is even possible on web servers. This discrepancy is reflected in the number of available experimental protein structures (Protein Data Bank: structures) compared to the number of available models (Protein Model Portal: proteins covered by at least one model). To date (July 2012) about 24 million proteins have been sequenced. Consequently, the structures of a large number of proteins still remain to be revealed.

Proteins play a major role in every organism.
They take part in nearly every cellular process, e.g. metabolism, transport, signalling, cell structure and immune defence. Important features can be deduced from the protein structure or from a reasonable protein model. The protein surface may reveal possible binding sites for ligands. Surface charge calculations indicate how the protein behaves in different aqueous environments. Even the function of catalytic sites in enzymes can be determined.

Every modelling of an unknown protein structure requires the sequence of the protein of interest. The protein sequence is the sequence of amino acids that constitute the macromolecule and is called the primary structure. The amino acid sequence can be obtained experimentally or derived from the DNA fragment that encodes the protein, if available. Once the protein sequence is available, a database search against the sequences of all known structures may reveal relationships to known proteins. If related proteins are found, their structures can be used as templates in the modelling

process. The central assumption of every template-based modelling approach is that the protein sequence is related to the protein structure: similar sequences should result in similar structures.

Further reading

Alberts, B. et al., Molecular Biology of the Cell, Garland Publishing Inc., 3rd edition, 1994 (chapter on protein structure)
Martí-Renom, M.A. et al., Comparative Protein Structure Modeling of Genes and Genomes, Annu. Rev. Biophys. Biomol. Struct. 29 (2000)

Computational Protein Sequence Comparison

As outlined above, every modelling of an unknown structure starts with a search for protein sequences that are related or homologous to the sequence of the protein of interest (remember that their structures must be available as well). During the evolution of species parts of the DNA are modified and/or the transcription of the DNA is influenced. The first process is called mutation. Since the DNA is transcribed and translated into the amino acid sequence of proteins, these macromolecules change accordingly. Some mutations proved to be advantageous to some species while others might have led to the decline of an organism.

Computational protein sequence comparison serves two important purposes. One is the construction of phylogenetic trees, which depict the evolutionary relationship of proteins of the same class in different clades. Determining the evolutionary distance between two proteins requires a measure of similarity of two (or more) sequences. It is therefore closely related to the second purpose: the alignment of two sequences and the determination of their similarity under certain constraints.

Consider the following example. Of the following two sequences, one is an N-acetyl-D-glucosamine kinase and the other is a human glucokinase.
>sp|Q9UJ70|NAGK_HUMAN N-acetyl-D-glucosamine kinase OS=Homo sapiens GN=NAGK PE=1 SV=4
MAAIYGGVEGGGTRSEVLLVSEDGKILAEADGLSTNHWLIGTDKCVERINEMVNRAKRKA
GVDPLVPLRSLGLSLSGGDQEDAGRILIEELRDRFPYLSESYLITTDAAGSIATATPDGG
VVLISGTGSNCRLINPDGSESGCGGWGHMMGDEGSAYWIAHQAVKIVFDSIDNLEAAPHD
IGYVKQAMFHYFQVPDRLGILTHLYRDFDKCRFAGFCRKIAEGAQQGDPLSRYIFRKAGE
MLGRHIVAVLPEIDPVLFQGKIGLPILCVGSVWKSWELLKEGFLLALTQGREIQAQNFFS
SFTLMKLRHSSALGGASLGARHIGHLLPMDYSANAIAFYSYTFS
>sp|P35557|HXK4_HUMAN Glucokinase OS=Homo sapiens GN=GCK PE=1 SV=1
MLDDRARMEAAKKEKVEQILAEFQLQEEDLKKVMRRMQKEMDRGLRLETHEEASVKMLPT
YVRSTPEGSEVGDFLSLDLGGTNFRVMLVKVGEGEEGQWSVKTKHQMYSIPEDAMTGTAE
MLFDYISECISDFLDKHQMKHKKLPLGFTFSFPVRHEDIDKGILLNWTKGFKASGAEGNN
VVGLLRDAIKRRGDFEMDVVAMVNDTVATMISCYYEDHQCEVGMIVGTGCNACYMEEMQN
VELVEGDEGRMCVNTEWGAFGDSGELDEFLLEYDRLVDESSANPGQQLYEKLIGGKYMGE
LVRLVLLRLVDENLLFHGEASEQLRTRGAFETRFVSQVESDTGDRKQIYNILSTLGLRPS
TTDCDIVRRACESVSTRAAHMCSAGLAGVINRMRESRSEDVMRITVGVDGSVYKLHPSFK
ERFHASVRRLTPSCEITFIESEEGSGRGAALVSAVACKKACMLGQ

Possible tasks are to determine the relationship or the degree of similarity between these two proteins, or to align the two sequences in order to identify conserved protein motifs. The figure below shows an alignment of the two sequences above in the residue range ( : ).

P35557  EVGDFLSLDLGGTNFRVMLVKVGEGEEGQWSVKTKH--QMYSIPEDAMTGTAEMLFDYIS  127
Q9UJ70  --GGVVLISGTGSNCRLINPDGSESGCGGWGHMMGDEGSAYWIAHQA----VKIVFDSID  172
        *.: :. *:* *::..*. * *... * *.:*.:::** *.

P35557  ECISDFLDKHQMKHKKLPLGFTFSFPVRH-------EDIDKGILLNWTKGFKASGAEGNN  180
Q9UJ70  NLEAA---PHDIGYVKQAMFHYFQVPDRLGILTHLYRDFDKCRFAGFCRKIAEGAQQGDP  229
        : : *:: : * :. *..* *.*:** : : : :.. :*:

If an alignment is real, i.e. if the sequences are evolutionarily related, identities (same amino acids) and conservative substitutions (amino acids with similar physico-chemical properties) are expected to occur more often than in random alignments. Such aligned amino acid pairs should be assigned a positive score. Conversely, non-conservative changes should be observed less frequently than in random alignments and should receive a negative score.

Consider two sequences $x$ and $y$, each composed of letters $x_i, y_i \in \mathcal{A}$ of an alphabet $\mathcal{A}$ at position $i$ (here the 20 amino acids or the four nucleotides). If an amino acid $a$ occurs with frequency $q_a$, the probability of a random alignment can be stated as

$$P(x, y \mid R) = \prod_i q_{x_i} \prod_i q_{y_i}$$

The probability that the alignment occurs according to some match model, in which the aligned pair $a, b$ occurs with frequency $p_{ab}$, is

$$P(x, y \mid M) = \prod_i p_{x_i y_i}$$

Taking the logarithm of the odds ratio gives

$$S = \sum_i s(x_i, y_i), \qquad s(a, b) = \log \frac{p_{ab}}{q_a q_b}$$

$S$ is the total score of an alignment, given as the sum of the scores of the individual amino acid pairs.
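The log-odds score defined above can be illustrated with a short Python sketch. The three-letter alphabet and all frequencies below are made-up numbers chosen only for illustration (they are not taken from BLOSUM or from any database), and the function names are hypothetical.

```python
import math

# Hypothetical background frequencies q_a for a toy three-letter alphabet.
Q = {"A": 0.50, "K": 0.25, "W": 0.25}

# Hypothetical pair frequencies p_ab under the match model.
# Off-diagonal pairs are stored once and used symmetrically.
P = {
    ("A", "A"): 0.42, ("K", "K"): 0.18, ("W", "W"): 0.22,
    ("A", "K"): 0.06, ("A", "W"): 0.02, ("K", "W"): 0.01,
}


def pair_frequency(a, b):
    """Return p_ab, looking the pair up in either order."""
    return P[(a, b)] if (a, b) in P else P[(b, a)]


def log_odds(a, b, base=2.0):
    """s(a, b) = log( p_ab / (q_a * q_b) ), in bits for base 2."""
    return math.log(pair_frequency(a, b) / (Q[a] * Q[b]), base)


def alignment_score(x, y):
    """Total score S of an ungapped alignment of two equally long sequences."""
    if len(x) != len(y):
        raise ValueError("ungapped alignment needs sequences of equal length")
    return sum(log_odds(a, b) for a, b in zip(x, y))


if __name__ == "__main__":
    for pair in [("A", "A"), ("A", "K"), ("K", "W")]:
        print(pair, round(log_odds(*pair), 2))
    print("S =", round(alignment_score("AKWA", "AKAW"), 2))
```

Pairs that occur more often in the match model than expected from the background frequencies receive a positive score, all others a negative one. Published matrices such as BLOSUM62 store log-odds values of this kind scaled and rounded to integers.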
While the frequencies $q_a$ of the individual amino acids can be determined quite easily from a large database of protein sequences, the frequencies $p_{ab}$ of real amino acid substitutions are more difficult to determine. The following figure shows a (scaled) BLOSUM62 substitution matrix. This scheme assigns a score to each amino acid pair. The numbers on the diagonal are positive and relatively large. In contrast, the pair Trp-Lys has a score of -3. A negative score means that the substitution has been observed less often during evolution than expected by chance. The Lys-Arg pair has a score of 2. These

two amino acids are more likely to be substituted for each other during evolution. Exercise 2 will shed more light on the properties of this substitution matrix.

[Figure: BLOSUM62 substitution matrix; rows and columns in the order A R N D C Q E G H I L K M F P S T W Y V]

The basic steps in the construction of the BLOSUM (blocks substitution) matrices by Henikoff & Henikoff were the following. The authors started with a set of aligned ungapped regions (the so-called blocks) of protein families. The sequences inside these blocks were clustered according to a given level of percent identity. Sequences that share at least the given level of identity were treated as one sequence in the alignment, which reduces multiple contributions from closely related sequences. In these reduced blocks the relative frequencies of each amino acid ($q_a$) as well as the relative frequencies of each aligned amino acid pair ($p_{ab}$) were determined.

Sometimes not only a substitution but also an insertion or deletion has occurred during evolution. Since the inserted or deleted amino acid is not known, a gap is introduced into the alignment. Gaps are usually penalised either by a linear scheme, with a constant negative score times the gap length, or by an affine scheme, with different negative scores for opening and for extending a gap.

Several alignment algorithms are available. There are speed-optimized heuristic algorithms such as BLAST or FASTA and the more rigorous Needleman-Wunsch and Smith-Waterman algorithms. The Needleman-Wunsch algorithm searches for the optimal global alignment and the Smith-Waterman algorithm for the best local alignment. Exercise 3 clarifies the principles of the two algorithms.

Unfortunately, the raw score of an optimal alignment by itself tells us neither whether the sequences are evolutionarily related nor how statistically significant the alignment score is. Analytical solutions exist only for the problem of global and local ungapped alignments.
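The two dynamic programming algorithms named above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the FASTA package implementation: it uses a plain match/mismatch score instead of a substitution matrix such as BLOSUM62, a linear gap penalty, and the function names and default values are hypothetical.

```python
def pair_score(a, b, match=2, mismatch=-1):
    """Toy substitution score (assumption: identity scoring only)."""
    return match if a == b else mismatch


def align(x, y, gap=-2, local=False):
    """Needleman-Wunsch (local=False) or Smith-Waterman (local=True).

    Returns (score, aligned_x, aligned_y) for one optimal alignment.
    """
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        # A global alignment pays for gaps at the sequence ends.
        for i in range(1, n + 1):
            F[i][0] = i * gap
        for j in range(1, m + 1):
            F[0][j] = j * gap

    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            candidates = [
                F[i - 1][j - 1] + pair_score(x[i - 1], y[j - 1]),  # (mis)match
                F[i - 1][j] + gap,                                 # gap in y
                F[i][j - 1] + gap,                                 # gap in x
            ]
            if local:
                candidates.append(0)  # a local alignment may restart anywhere
            F[i][j] = max(candidates)
            if local and F[i][j] > best:
                best, best_pos = F[i][j], (i, j)

    # Traceback: from the last cell (global) or the best cell (local).
    i, j = best_pos if local else (n, m)
    score = F[i][j]
    ax, ay = [], []
    while i > 0 or j > 0:
        if local and F[i][j] == 0:
            break
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + pair_score(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return score, "".join(reversed(ax)), "".join(reversed(ay))


if __name__ == "__main__":
    print(align("ACGT", "AGT"))
    print(align("XXGATTACAYY", "ZZGATTACAWW", local=True))
```

The only differences between the two variants are the boundary conditions, the extra zero in the recursion and the starting point of the traceback. An affine gap penalty would need additional matrices and is left out here.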
For such ungapped alignments the probability that a random alignment has a score $X$ greater than $S$ turns out to be

$$P(X > S) = 1 - \exp\left(-E(S)\right) = 1 - \exp\left(-K m n \, e^{-\lambda S}\right)$$

$m$ and $n$ are the lengths of the aligned sequences, $K$ corrects for multiple starting points of local alignments and $\lambda$ can be thought of as a scale for the substitution scores. If a large database is searched for related proteins, the product of the probability given above and the database length gives the number of random alignments to be expected above score $S$.

Further reading

Durbin, R., Eddy, S., Krogh, A., Mitchison, G., Biological sequence analysis: Probabilistic models of proteins and nucleic acids, Cambridge University Press, 12th edition (Chapter 1, , 2.7, 2.8)
Pearson, W.R., Protein sequence comparison and Protein Evolution, Tutorial ISMB2000, Pearson Group, Dept. of Biochemistry and Molecular Genetics, University of Virginia
Pearson, W.R., Guide to the FASTA program package, Pearson Group, Dept. of Biochemistry and Molecular Genetics, University of Virginia
Pearson, W.R., Empirical Statistical Estimates for Sequence Similarity Searches, J. Mol. Biol. 276 (1998)
Henikoff, S. & Henikoff, J.G., Amino acid substitution matrices from protein blocks, PNAS 89 (1992)

Protein structure modelling
Comparative modelling with satisfaction of spatial restraints

In principle there are three classes of protein structure prediction approaches, which differ in their requirements and in the resolution/quality of the resulting model [Baker, D. and Sali, A. 2001].

a.) de novo prediction (resolution approximately 4 to 8 Å)
   - deduce functional sites
   - derive structural similarities
   o for short sequences up to 80 amino acids
   o relies solely on force fields
   o requires large computational resources
b.) threading / fold recognition (resolution 3 to 4 Å)
   - determine secondary structures
   - determine conserved domains
   o no information on less structured regions
c.) comparative modelling (resolution 2 to 3 Å)
   - no limit on sequence length
   - site-directed mutagenesis
   - docking studies
   - only modest computational resources required
   - template for X-ray or NMR structure determination

   o template structure required with at least approx. 30% sequence identity
   o alignment critical

In the following, the program MODELLER (A. Sali, salilab.org) is introduced. MODELLER performs comparative modelling by satisfaction of spatial restraints. The question answered by MODELLER reads: What is the most probable structure for a sequence given its alignment with related structures? [Sali, A. & Blundell, T.L., J. Mol. Biol. 234 (1993)]

A protein structure has several characteristic features that can be classified. A feature can be defined as a quantity associated with a certain set of atoms of the protein structure. One feature class comprises stereochemical properties on the atomic level:

- bond angles and bond lengths
- dihedral angles
- disulphide bonds and angles
- Lennard-Jones interactions
- Coulomb interactions

On the residue level the following features can be deduced from a structure:

- distance between equivalent backbone atoms
- main-chain atomic distances
- main-chain and side-chain dihedral angles
- neighbouring residues
- solvent accessibility

For each feature a probability density function (pdf) is derived. These functions, their parameters and their dependence on more fundamental protein structure properties were calculated from many homologous protein structures or are known from experiments. Once a pdf $p_i$ is known for each feature $i$ of a protein structure, a pdf for the whole molecular structure can be constructed as the product

$$p = \prod_i p_i$$

Here it is assumed that the features are independent of each other. Some feature restraints, such as bond lengths, are independent of the alignment of the target sequence with the templates. Other feature restraints depend on the alignment, for example the distance between equivalent Cα atoms. This molecular probability density function can be used to calculate the most probable structure of an amino acid sequence given its alignment with the sequence(s) of known template structures.
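The construction of a molecular pdf from independent feature pdfs can be illustrated with a toy example. The sketch below assumes simple Gaussian pdfs with made-up means and widths; the pdfs actually used by MODELLER are more elaborate, and all names and numbers here are hypothetical. Taking the negative logarithm turns the product into a sum, which is numerically more convenient.

```python
import math


def gaussian_pdf(value, mean, sigma):
    """Normalised Gaussian probability density."""
    z = (value - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Hypothetical feature restraints: name -> (mean, standard deviation).
RESTRAINTS = {
    "bond_length": (1.53, 0.02),     # Angstrom, independent of the alignment
    "ca_ca_distance": (6.0, 0.5),    # Angstrom, derived from a template
    "dihedral": (-60.0, 15.0),       # degrees
}


def molecular_pdf(features):
    """p = product of the individual feature pdfs p_i (independence assumed)."""
    p = 1.0
    for name, value in features.items():
        mean, sigma = RESTRAINTS[name]
        p *= gaussian_pdf(value, mean, sigma)
    return p


def negative_log_pdf(features):
    """-ln p, computed as a sum over features."""
    return -sum(
        math.log(gaussian_pdf(value, *RESTRAINTS[name]))
        for name, value in features.items()
    )


if __name__ == "__main__":
    relaxed = {"bond_length": 1.53, "ca_ca_distance": 6.1, "dihedral": -55.0}
    strained = {"bond_length": 1.58, "ca_ca_distance": 7.5, "dihedral": -20.0}
    for label, feats in [("relaxed", relaxed), ("strained", strained)]:
        print(label, molecular_pdf(feats), negative_log_pdf(feats))
```

A conformation whose features lie close to the restraint means has the higher molecular pdf and, equivalently, the lower negative logarithm.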

For computational reasons the objective function $F = -\ln p$ is minimized with respect to the Cartesian coordinates of the atoms during modelling, instead of maximizing the molecular pdf. A flow chart of the MODELLER comparative modelling process is given in the online manual.

Further reading

Baker, D. and Sali, A., Protein Structure Prediction and Structural Genomics, Science 294, 93 (2001)
Sali, A. and Blundell, T.L., Comparative Protein Modelling by Satisfaction of Spatial Restraints, J. Mol. Biol. 234 (1993)
Shen, M. and Sali, A., Statistical potential for assessment and prediction of protein structures, Protein Science 15 (2006)
MODELLER online manual: Introduction, Automated comparative modelling, Methods

pKa calculations
PROPKA software

The natural environment of proteins is an aqueous solution. The pH of such aqueous solutions inside an organism is regulated. Since some amino acids of a protein are titratable, the pH of the solution determines the net charge of the protein and the charge state of individual amino acid residues. The charge of the whole protein and/or of sites on the protein surface strongly affects the behaviour of the protein in solution. The charge on certain residues might control protein-ligand or protein-protein interactions. Therefore the pH is crucial to the function of the protein machinery in an organism.

The charge of a titratable group of a protein residue at a given pH can only be determined if the pKa of this particular group is known. Experimental pKa values of individual amino acids in aqueous solution are known. If the amino acid is incorporated into a different environment such as a protein structure, however, these pKa values can shift. There are several possible contributions:

- Coulomb interactions between charged groups
- H-bonds
- desolvation effects

These shifts can be calculated from a given structure with the PROPKA software.
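How a pKa value translates into a charge at a given pH can be sketched with the Henderson-Hasselbalch relation. The snippet below is only an illustration and does not reproduce the PROPKA method: the model pKa values are rounded textbook-style numbers assumed here, the chain termini are ignored, and the function names are hypothetical.

```python
# Assumed model pKa values of side chains in water (illustrative only).
MODEL_PKA = {
    "ASP": 3.8, "GLU": 4.5, "HIS": 6.5, "CYS": 9.0,
    "TYR": 10.0, "LYS": 10.5, "ARG": 12.5,
}
ACIDIC = {"ASP", "GLU", "CYS", "TYR"}  # neutral when protonated, -1 otherwise


def group_charge(residue, ph, pka=None):
    """Average charge of one titratable group at the given pH."""
    if pka is None:
        pka = MODEL_PKA[residue]
    if residue in ACIDIC:
        return -1.0 / (1.0 + 10.0 ** (pka - ph))
    return 1.0 / (1.0 + 10.0 ** (ph - pka))  # basic groups: +1 when protonated


def net_charge(residues, ph, shifted_pka=None):
    """Sum of side-chain charges; shifted_pka maps list index -> pKa in the protein."""
    shifted = shifted_pka or {}
    return sum(
        group_charge(name, ph, shifted.get(index))
        for index, name in enumerate(residues)
    )


if __name__ == "__main__":
    residues = ["ASP", "GLU", "LYS", "HIS", "ARG"]
    for ph in (3.0, 7.0, 11.0):
        print(ph, round(net_charge(residues, ph), 2))
    # A buried aspartate whose pKa is shifted upwards is far less charged at pH 5.
    print(group_charge("ASP", 5.0), group_charge("ASP", 5.0, pka=6.0))
```

The last line shows why shifted pKa values matter: the same residue can be almost fully charged or almost neutral at the same pH, depending on its environment in the structure.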

Further reading

Hui Li et al., Very Fast Empirical Prediction and Rationalization of Protein pKa Values, PROTEINS 61 (2005)
Olsson, M.H.M. et al., PROPKA3: Consistent Treatment of Internal and Surface Residues in Empirical pKa Predictions, J. Chem. Theory Comput. 7 (2011)

Surface potential calculations
Adaptive Poisson-Boltzmann Solver (APBS) software

The surface potential of proteins can reveal information about binding sites, aggregation behaviour and stability in solution. The Poisson equation for inhomogeneous materials reads

$$-\nabla \cdot \left[\varepsilon(\mathbf{r}) \nabla \varphi(\mathbf{r})\right] = \rho(\mathbf{r})$$

with the position-dependent dielectric function $\varepsilon$, the charge distribution $\rho$ and the electrostatic potential $\varphi$. To account for ions in the aqueous environment of proteins the Debye-Hückel model is used. According to this model the density $n_\pm$ of an ion species in the solvent differs from the bulk density $n_0$:

$$n_\pm = n_0 \exp\left(-U_\pm / k_B T\right)$$

where $U_\pm$ is the potential of mean force (a free energy), which is approximated by the product of the ion charge and the electrostatic potential. This approximation results in the Poisson-Boltzmann equation

$$-\nabla \cdot \left[\varepsilon(\mathbf{r}) \nabla \varphi(\mathbf{r})\right] = \rho(\mathbf{r}) - \varepsilon(\mathbf{r}) \, \kappa^2(\mathbf{r}) \sinh \varphi(\mathbf{r})$$

The electrostatic potential is now given in units of $k_B T/e$. The term $\kappa^2(\mathbf{r})$ is closely related to the Debye-Hückel screening parameter (up to a multiplicative constant):

$$\kappa^2(\mathbf{r}) \sim \frac{I e^2}{k_B T \, \varepsilon(\mathbf{r})}$$

where $I$ is the ionic strength. The inverse of the screening parameter is called the Debye length and sets the length scale over which electrostatic interactions are screened in ionic solutions. If the electrostatic potential energy is small compared to the thermal energy, the hyperbolic sine can be approximated by its argument, $\sinh \varphi \approx \varphi$, which results in the linearized Poisson-Boltzmann equation.

The equations above can be solved numerically with different techniques. APBS uses a finite difference technique.
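To get a feeling for the length scale set by the screening parameter, the Debye length can be evaluated for a given ionic strength. The sketch below assumes the standard SI form of the Debye-Hückel expression for a bulk electrolyte, $\kappa^2 = 2 N_A e^2 I / (\varepsilon_0 \varepsilon_r k_B T)$, and a relative permittivity of 78.4 for water at 25 °C; the function names are hypothetical and nothing here calls APBS.

```python
import math

# Physical constants in SI units.
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
K_B = 1.380649e-23           # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19   # elementary charge, C
N_A = 6.02214076e23          # Avogadro constant, 1/mol


def debye_length(ionic_strength_molar, temperature=298.15, eps_r=78.4):
    """Debye length in metres for an ionic strength given in mol/l."""
    ionic_strength_si = ionic_strength_molar * 1000.0  # mol/m^3
    kappa_squared = (
        2.0 * N_A * E_CHARGE ** 2 * ionic_strength_si
        / (EPS0 * eps_r * K_B * temperature)
    )
    return 1.0 / math.sqrt(kappa_squared)


def linearization_error(phi):
    """Relative error of replacing sinh(phi) by phi (phi in units of kT/e)."""
    return (math.sinh(phi) - phi) / math.sinh(phi)


if __name__ == "__main__":
    for ionic_strength in (0.01, 0.1, 1.0):
        nm = debye_length(ionic_strength) * 1e9
        print(f"I = {ionic_strength} mol/l: Debye length = {nm:.2f} nm")
    for phi in (0.1, 1.0, 3.0):
        print(f"phi = {phi}: linearization error = {linearization_error(phi):.1%}")
```

The Debye length shrinks with the square root of the ionic strength, which is why surface potential maps calculated at two different ionic strengths can look quite different. The second function indicates how quickly the linearized equation loses accuracy once the potential exceeds about one $k_B T/e$.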

Further reading

Honig, B. and Nicholls, A., Classical Electrostatics in Biology and Chemistry, Science 268 (1995)
Davis, M.E. and McCammon, J.A., Electrostatics in Biomolecular Structure and Dynamics, Chem. Rev. 90 (1990)
Baker, N.A. et al., Electrostatics of nanosystems: Application to microtubules and the ribosome, PNAS 98 (2001)

Protein-Protein Docking
ATTRACT

Many biological processes rely on the formation or decay of protein complexes. As mentioned above, preparing crystals of single proteins for X-ray structure determination is challenging. Crystallizing proteins in a particular complex state is even more challenging. The docking program ATTRACT takes the following approach to propose reasonable protein complex geometries.

In a first step the complexity of the problem is reduced. The atoms of a high-resolution protein structure are replaced by pseudo atoms. The protein backbone atoms are substituted by two pseudo atoms: one at the position of the nitrogen and the other at the position of the carbonyl oxygen. Short side chains (Ala, Ser, Thr, Val, Leu, Ile, Asn, Asp) are represented by one pseudo atom positioned at the geometric centre of all side-chain heavy atoms. Large side chains (Arg, Lys, Glu, Gln, His, Met, Phe, Tyr, Trp) are built of two pseudo atoms. The first is positioned midway between the Cβ and Cγ atoms of the side chain. The second is placed at the geometric centre of the remaining heavy atoms. Each pseudo atom is assigned a radius R and a van der Waals interaction parameter A. Additionally, the acidic and basic residues receive a charge of -1 and +1, respectively. For example: Alanine: R = Å, A = 1 [RT. ]; Tryptophan: pseudo atom 1: R = Å, A = 1.5 [RT. ]; pseudo atom 2: R = Å, A = 2.6 [RT. ]. On the one hand this simplification saves computation time; on the other hand it smooths the protein surface.
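The reduction of a side chain to pseudo atoms described above can be sketched as follows. This is an illustration only and not the ATTRACT code: it assumes that the two atoms defining the first pseudo atom of a large side chain are Cβ and Cγ (PDB atom names CB and CG), that "remaining heavy atoms" means all side-chain heavy atoms except these two, and the coordinates in the example are made up.

```python
SMALL = {"ALA", "SER", "THR", "VAL", "LEU", "ILE", "ASN", "ASP"}
LARGE = {"ARG", "LYS", "GLU", "GLN", "HIS", "MET", "PHE", "TYR", "TRP"}


def geometric_centre(points):
    """Unweighted mean position of a list of (x, y, z) tuples."""
    count = len(points)
    return tuple(sum(p[axis] for p in points) / count for axis in range(3))


def side_chain_pseudo_atoms(residue_name, side_chain):
    """Return the pseudo atom positions of one side chain.

    side_chain maps heavy-atom names (e.g. "CB") to (x, y, z) coordinates.
    """
    if residue_name in SMALL:
        # One pseudo atom at the centre of all side-chain heavy atoms.
        return [geometric_centre(list(side_chain.values()))]
    if residue_name in LARGE:
        # First pseudo atom midway between CB and CG,
        # second at the centre of the remaining heavy atoms.
        first = geometric_centre([side_chain["CB"], side_chain["CG"]])
        rest = [xyz for name, xyz in side_chain.items() if name not in ("CB", "CG")]
        return [first, geometric_centre(rest)]
    raise ValueError(f"no side-chain reduction rule given for {residue_name}")


if __name__ == "__main__":
    # Made-up coordinates along the x axis, for illustration only.
    lysine = {
        "CB": (0.0, 0.0, 0.0), "CG": (2.0, 0.0, 0.0), "CD": (3.0, 0.0, 0.0),
        "CE": (5.0, 0.0, 0.0), "NZ": (7.0, 0.0, 0.0),
    }
    print(side_chain_pseudo_atoms("LYS", lysine))
    print(side_chain_pseudo_atoms("ALA", {"CB": (1.0, 2.0, 3.0)}))
```

Residues that are not covered by the two lists in the text raise an error here, since the text gives no rule for them.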
Since the most probable binding geometry of two proteins has to be determined, many different contact orientations of the two proteins must be generated and evaluated. In the next step the position of one protein (called the receptor) is therefore kept fixed while the other protein (called the ligand) is placed around the receptor in many different orientations relative to it. This results in many (in the order of 10 ) different starting geometries. Each of these arrangements of the two proteins is used as a starting orientation for an energy minimization. The interaction energy between the two proteins is stated as

$$
V(r_{ij}) =
\begin{cases}
A_i A_j \left(\dfrac{R_i + R_j}{r_{ij}}\right)^{8} - A_i A_j \left(\dfrac{R_i + R_j}{r_{ij}}\right)^{6} + \dfrac{q_i q_j}{\varepsilon(r_{ij}) \, r_{ij}} & \text{attractive residues} \\[2ex]
-A_i A_j \left(\dfrac{R_i + R_j}{r_{ij}}\right)^{8} + A_i A_j \left(\dfrac{R_i + R_j}{r_{ij}}\right)^{6} + \dfrac{q_i q_j}{\varepsilon(r_{ij}) \, r_{ij}} & \text{repulsive residues},\ r_{ij} > r_{\min} \\[2ex]
2 V(r_{\min}) + A_i A_j \left(\dfrac{R_i + R_j}{r_{ij}}\right)^{8} - A_i A_j \left(\dfrac{R_i + R_j}{r_{ij}}\right)^{6} + \dfrac{q_i q_j}{\varepsilon(r_{ij}) \, r_{ij}} & \text{repulsive residues},\ r_{ij} \le r_{\min}
\end{cases}
$$

The attractive interaction is described as the sum of a soft (6-8) Lennard-Jones potential and a Coulomb potential. The latter is screened with a distance-dependent dielectric function $\varepsilon(r_{ij}) = 15 \, r_{ij}$. The repulsive interaction is split into two parts. Both parts lead to a repulsive interaction at any distance, but with a saddle point at $r_{\min}$, the position of the extremum of the Lennard-Jones term.

After the potential has been calculated, the energy is minimized with respect to the translational and rotational degrees of freedom of the ligand (the receptor is kept fixed). This is usually done in several consecutive minimization runs.

In a rigid docking approach the surface side chains cannot react to the approach of the ligand, although it is known that surface side chains can rearrange during complex formation. To account for this behaviour, several copies of large surface side chains with different dihedral angles are evaluated during the energy minimization. The energetically most favourable side-chain rotamer is retained in the complex structure.

Finally all calculated complexes are binned. Two complexes are assigned to the same energy minimum if their ligand RMSD is less than 0.2 Å. The complexes are then ranked according to their energy.

Further reading

Zacharias, M., Protein-protein docking with a reduced protein model accounting for side-chain flexibility, Protein Science 12 (2003)
Zacharias, M., EMBO Practical Course: Protein-protein docking with ATTRACT using a reduced protein model (2008)
Zacharias, M., ATTRACT: Protein-Protein Docking in CAPRI Using a Reduced Protein Model, PROTEINS 60 (2005)
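The pairwise interaction energy described in the ATTRACT section can be written down as a small Python function. This is a sketch of the functional form only, not the ATTRACT implementation: it assumes reduced units in which the Coulomb prefactor is 1, it interprets the constant in the short-range repulsive branch as twice the height of the repulsive Lennard-Jones term at $r_{\min}$ (the value that makes the two branches meet continuously), and all parameter values in the example are made up.

```python
import math


def soft_lj(r, a_i, a_j, radius_i, radius_j):
    """Soft (6-8) Lennard-Jones term A_i A_j [ (R/r)^8 - (R/r)^6 ], R = R_i + R_j."""
    s = (radius_i + radius_j) / r
    return a_i * a_j * (s ** 8 - s ** 6)


def lj_extremum_distance(radius_i, radius_j):
    """Distance at which the soft Lennard-Jones term has its minimum."""
    return (radius_i + radius_j) * math.sqrt(4.0 / 3.0)


def screened_coulomb(r, q_i, q_j):
    """Coulomb term with the distance-dependent dielectric eps(r) = 15 r."""
    return q_i * q_j / (15.0 * r * r)


def pair_energy(r, a_i, a_j, radius_i, radius_j, q_i=0.0, q_j=0.0, attractive=True):
    """Interaction energy of two pseudo atoms at distance r (reduced units)."""
    lj = soft_lj(r, a_i, a_j, radius_i, radius_j)
    coulomb = screened_coulomb(r, q_i, q_j)
    if attractive:
        return lj + coulomb
    r_min = lj_extremum_distance(radius_i, radius_j)
    if r > r_min:
        return -lj + coulomb
    # Height of the repulsive Lennard-Jones term at r_min (a positive number).
    height = -soft_lj(r_min, a_i, a_j, radius_i, radius_j)
    return 2.0 * height + lj + coulomb


if __name__ == "__main__":
    # Hypothetical pseudo atom parameters, for illustration only.
    for r in (3.5, 4.0, 4.6, 5.5, 7.0):
        e_att = pair_energy(r, 1.0, 1.5, 2.0, 2.0, attractive=True)
        e_rep = pair_energy(r, 1.0, 1.5, 2.0, 2.0, attractive=False)
        print(f"r = {r}: attractive {e_att:+.4f}, repulsive {e_rep:+.4f}")
```

With this choice of the constant the repulsive branch decreases monotonically with distance and has zero slope at $r_{\min}$, which matches the saddle point mentioned in the text.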

Tasks

1. Summarise the properties of the protein β-lactoglobulin (short: blg). See e.g. Kontopidis, G. et al., Invited Review: β-Lactoglobulin: Binding Properties, Structure and Function, Journal of Dairy Science 87 (2004)
2. Consider the following two groups of amino acids: (D, E, K) and (V, I, L).
   a. What are their physico-chemical differences?
   b. What is the average BLOSUM62 score within each of the two groups?
   c. What is the average BLOSUM62 score between the two groups?
   d. What might be the cause of these results?
3. Perform an alignment of the sequences GNYLW and DDGRW manually, using the Smith-Waterman algorithm and the Needleman-Wunsch algorithm as described in Durbin et al.
4. Search for structures homologous to the protein sequence of blg of the Eastern grey kangaroo (UniProt identifier: P11944) using the Smith-Waterman and Needleman-Wunsch implementations in the FASTA program package.
   a. Comment on the results.
   b. Choose a template for modelling the target sequence.
   c. Align the target and the template sequence using the Smith-Waterman algorithm and the Needleman-Wunsch algorithm. Compare the results.
5. Modelling of the protein
   a. Summarize the basic principle of MODELLER.
   b. Use the alignment from 4c) and the structure from 4b) to compute models for P11944.
   c. Analyse the models using
      i. restraint violations
      ii. DOPE potential
      iii. RMSD values
      iv. PROCHECK software (Laskowski, R.A., J. Appl. Cryst. 26 (1993))
6. Use PROPKA to predict the pKa values of the amino acids of the protein model.
   a. Summarize the basic principle of PROPKA.
   b. Comment on the results.
7. Surface potential calculations with APBS
   a. Summarize the basic principle of APBS.
   b. Perform surface potential calculations for two appropriate pH values. Choose the pH values according to the dimerization behaviour of blg.
   c. Perform surface potential calculations for two ionic strengths.

   d. Comment on the results.
8. Systematic docking with ATTRACT
   a. Summarize the basic principle of ATTRACT.
   b. Calculate possible dimer structures of your protein model.
   c. Analyse the interface with VMD. Look at the amino acid composition and the area of the interface.
   d. Compare your results with the dimer of a blg crystal structure.

last revision 15/10/12 (ml)

Introduction to Comparative Protein Modeling. Chapter 4 Part I

Introduction to Comparative Protein Modeling. Chapter 4 Part I Introduction to Comparative Protein Modeling Chapter 4 Part I 1 Information on Proteins Each modeling study depends on the quality of the known experimental data. Basis of the model Search in the literature

More information

Homology Modeling. Roberto Lins EPFL - summer semester 2005

Homology Modeling. Roberto Lins EPFL - summer semester 2005 Homology Modeling Roberto Lins EPFL - summer semester 2005 Disclaimer: course material is mainly taken from: P.E. Bourne & H Weissig, Structural Bioinformatics; C.A. Orengo, D.T. Jones & J.M. Thornton,

More information

Homology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB

Homology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB Homology Modeling (Comparative Structure Modeling) Aims of Structural Genomics High-throughput 3D structure determination and analysis To determine or predict the 3D structures of all the proteins encoded

More information

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) Bioinformática Sequence Alignment Pairwise Sequence Alignment Universidade da Beira Interior (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) 1 16/3/29 & 23/3/29 27/4/29 Outline

More information

Using Higher Calculus to Study Biologically Important Molecules Julie C. Mitchell

Using Higher Calculus to Study Biologically Important Molecules Julie C. Mitchell Using Higher Calculus to Study Biologically Important Molecules Julie C. Mitchell Mathematics and Biochemistry University of Wisconsin - Madison 0 There Are Many Kinds Of Proteins The word protein comes

More information

Protein Structures: Experiments and Modeling. Patrice Koehl

Protein Structures: Experiments and Modeling. Patrice Koehl Protein Structures: Experiments and Modeling Patrice Koehl Structural Bioinformatics: Proteins Proteins: Sources of Structure Information Proteins: Homology Modeling Proteins: Ab initio prediction Proteins:

More information

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between

More information

Similarity or Identity? When are molecules similar?

Similarity or Identity? When are molecules similar? Similarity or Identity? When are molecules similar? Mapping Identity A -> A T -> T G -> G C -> C or Leu -> Leu Pro -> Pro Arg -> Arg Phe -> Phe etc If we map similarity using identity, how similar are

More information

Single alignment: Substitution Matrix. 16 march 2017

Single alignment: Substitution Matrix. 16 march 2017 Single alignment: Substitution Matrix 16 march 2017 BLOSUM Matrix BLOSUM Matrix [2] (Blocks Amino Acid Substitution Matrices ) It is based on the amino acids substitutions observed in ~2000 conserved block

More information

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot

More information

Sequence Alignments. Dynamic programming approaches, scoring, and significance. Lucy Skrabanek ICB, WMC January 31, 2013

Sequence Alignments. Dynamic programming approaches, scoring, and significance. Lucy Skrabanek ICB, WMC January 31, 2013 Sequence Alignments Dynamic programming approaches, scoring, and significance Lucy Skrabanek ICB, WMC January 31, 213 Sequence alignment Compare two (or more) sequences to: Find regions of conservation

More information

Building 3D models of proteins

Building 3D models of proteins Building 3D models of proteins Why make a structural model for your protein? The structure can provide clues to the function through structural similarity with other proteins With a structure it is easier

More information

Programme Last week s quiz results + Summary Fold recognition Break Exercise: Modelling remote homologues

Programme Last week s quiz results + Summary Fold recognition Break Exercise: Modelling remote homologues Programme 8.00-8.20 Last week s quiz results + Summary 8.20-9.00 Fold recognition 9.00-9.15 Break 9.15-11.20 Exercise: Modelling remote homologues 11.20-11.40 Summary & discussion 11.40-12.00 Quiz 1 Feedback

More information

Week 10: Homology Modelling (II) - HHpred










