Biol Introduction to Bioinformatics

1 Biol Introduction to Bioinformatics Schedule Week Nov 15 Nov Reading: Ch Ch for next week Ch 14.4 Monday Protein energetics/dynamics Wednesday Homology-based modeling Friday Homology-based modeling, protein families 1 Biol47800/59500 Bioinformatics

2 Motivation: Secondary structure prediction, independent of its accuracy, doesn't tell you what the three-dimensional structure is. It is difficult or impossible to go even from KNOWN secondary structure elements to the three-dimensional structure. What then can one do? If a structure is known, one can often "predict" or model, with reasonable accuracy, the three-dimensional structure of homologous proteins. The 3D structure database (PDB) is growing exponentially, like the other databases; many homologous structures are available, perhaps for 50% of all sequences. Structural genomics (high-throughput structure solution) is increasing the number of sequences for which this is possible.

3 Protein Energetics: Proteins exist at or very near their minimum free energy conformations. Crystallographic structures may be slightly stressed due to crystal contacts and solution conditions. Folding is generally rapid, and often does not require any assistance (topics: Anfinsen experiments; chaperones). Proteins are not solid rocks: they exhibit thermal motions, which are important in conformational change.

4 Molecular dynamics. Examples: beta-lactamase with water; green fluorescent protein; neuraminidase + Tamiflu; GLFG backbone, 100 ps; MHC protein; glucocorticoid receptor/DNA; glucosamine deaminase; blood clotting protein binding to membrane; protein and DOPC bilayer, 50 nsec; DADME binding to polynucleotide phosphorylase.

5 Molecular Dynamics: MD simulations generally begin where experimental structure determination leaves off, if not during the structure refinement itself. MD is generally not used to predict structure from sequence, nor to model protein folding pathways. MD simulations can fold extended sequences to global potential energy minima ONLY for very small systems (peptides of length ten or so, in vacuum). MD is most commonly used to simulate the dynamics of known structures.

6 Molecular Dynamics: Proteins are flexible and rapidly fluctuating molecules. Classification of motions, with times (log sec) and distances (Å): Atomic fluctuations (vibrations of individual bonds): -15 to -11, ~1 Å. Collective motions (groups of atoms: amino acid side chains, protein motif or domain, RNA base, ...): -12 to -3, ~10 Å. Triggered conformational changes (motion in response to a stimulus): -9 to +3, ~100 Å. Correct structural template (H bonding, disulfide bridges, solvent accessibility, etc).

7 Molecular Dynamics. Energy minimization: start from atomic coordinates and a potential energy function (force field); incrementally change the coordinates according to the force field (descent to lowest energy). Molecular dynamics: include velocities; incrementally change atomic coordinates using numerical solutions of the time-dependent equations of motion for the atoms (F = ma). Result: a simulated trajectory through time of the positions and momenta of all atoms of the molecule, exploring conformational space in time.

8 Molecular Dynamics: Basic computational approach. Begin with initial atomic coordinates. Calculate the potential energy (U) of the system (force field); this gives the force on each atom, since force is the negative derivative of potential energy with respect to position, $F = -dU/dr$. The sum of forces on each atom gives its acceleration ($a = F/m$). Let the molecules move for a very short time (femtoseconds). Recalculate the energy and repeat.
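The loop on this slide can be sketched in a few lines. This is a minimal illustration only, assuming a single particle in a one-dimensional harmonic potential and the velocity-Verlet integrator; the function names, the potential, and the reduced units are assumptions made for the example and are not taken from any MD package.

```python
def potential(x, k=1.0, x0=0.0):
    """Toy potential energy U(x) = 0.5*k*(x - x0)**2 (assumed for illustration)."""
    return 0.5 * k * (x - x0) ** 2


def force(x, k=1.0, x0=0.0):
    """Force is the negative derivative of U with respect to position."""
    return -k * (x - x0)


def total_energy(x, v, mass=1.0):
    """Kinetic plus potential energy, used to check the integration."""
    return 0.5 * mass * v * v + potential(x)


def run_md(x, v, mass=1.0, dt=0.01, n_steps=1000):
    """Velocity-Verlet integration; returns the trajectory as (x, v) pairs."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # acceleration a = F/m
        x = x + dt * v_half                # move for one short time step
        f = force(x)                       # recalculate force at new coordinates
        v = v_half + 0.5 * dt * f / mass
        trajectory.append((x, v))
    return trajectory


if __name__ == "__main__":
    traj = run_md(x=1.0, v=0.0)
    print("energy drift:", total_energy(*traj[-1]) - total_energy(*traj[0]))
```

A real simulation replaces the toy potential with a full force field over thousands of atoms, but the structure of the loop (forces, short step, recompute) is the same.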

9 Molecular Dynamics: Force field = empirical energy functions. Treats large molecules essentially as spheres and springs, resulting in the following potential energy terms:
$E_{empirical} = \sum E_{bond} + \sum E_{angle} + \sum E_{dihedral} + \sum E_{VDW} + \sum E_{elec}$
where:
$E_{bond} = \sum k_b (r - r_0)^2$
$E_{angle} = \sum k_\theta (\theta - \theta_0)^2 + k (r_{13} - r_0)^2$
$E_{dihedral} = \sum \sum k_\phi (1 + \cos(n_\phi \phi + \delta))$
$E_{VDW} = A/r^{12} - B/r^{6}$ (Lennard-Jones potential)
$E_{elec} = q_i q_j / (\epsilon r)$ (Coulomb's law; with a distance-dependent dielectric $\epsilon \propto r$ this takes the form $q_i q_j / r^2$)
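As a rough illustration of how the individual terms are evaluated, the sketch below writes each one as a small Python function. The function and parameter names are assumptions for this example, units are arbitrary, and the electrostatic term is written as plain Coulomb energy with a constant dielectric `eps`, which is an assumption rather than the form used by any particular force field.

```python
import math


def bond_energy(r, r0, kb):
    """Harmonic bond stretch: kb * (r - r0)^2."""
    return kb * (r - r0) ** 2


def angle_energy(theta, theta0, ktheta):
    """Harmonic angle bend: ktheta * (theta - theta0)^2 (angles in radians)."""
    return ktheta * (theta - theta0) ** 2


def dihedral_energy(phi, kphi, n, delta):
    """Periodic torsion term: kphi * (1 + cos(n*phi + delta))."""
    return kphi * (1.0 + math.cos(n * phi + delta))


def lennard_jones_energy(r, a, b):
    """Van der Waals term: A/r^12 - B/r^6."""
    return a / r ** 12 - b / r ** 6


def coulomb_energy(qi, qj, r, eps=1.0):
    """Electrostatic term qi*qj/(eps*r); constant dielectric assumed here."""
    return qi * qj / (eps * r)


if __name__ == "__main__":
    # With A = 1 and B = 2 the Lennard-Jones minimum sits at r = 1.
    print(lennard_jones_energy(1.0, 1.0, 2.0))
```

The total empirical energy is the sum of these terms over all bonds, angles, dihedrals, and non-bonded atom pairs.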

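10 Force field terms by range. Local: $k_b (r - r_0)^2$; $k_\theta (\theta - \theta_0)^2 + k (r_{13} - r_0)^2$; $k_\phi (1 + \cos(n_\phi \phi + \delta))$. Long-range, non-local: $q_i q_j / (\epsilon r)$ (written $q_i q_j / r^2$ for a distance-dependent dielectric); $A/r^{12} - B/r^{6}$.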

11 Molecular Dynamics - Energy Minimization General Optimization Methods Iterative Descent Method Change each atomic coordinate by a small descent step size, in direction of the force acting on the atom Recalculate potential energy from the new atomic coordinates Recalculate descent step direction from the new potential energy Iterate this procedure, varying the descent step size as needed Stop when a minimum in the potential energy is reached (can not proceed in any direction without increasing potential energy) Conjugate Gradient Method Similar to Iterative Descent Method BUT Each new descent step direction is based on previous directions as well as the current force Changes in direction less abrupt Convergence is faster 11 Biol47800/59500 Bioinformatics
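The iterative descent procedure can be sketched as follows. This is only an illustration on a made-up two-variable energy surface; `energy`, `gradient`, the step-size factors, and the convergence threshold are all assumptions chosen for the example.

```python
def energy(p):
    """Toy energy surface with its minimum at (1.0, -0.5) (assumed example)."""
    x, y = p
    return (x - 1.0) ** 2 + 2.0 * (y + 0.5) ** 2


def gradient(p):
    """Gradient of the toy energy; the force is the negative of this."""
    x, y = p
    return (2.0 * (x - 1.0), 4.0 * (y + 0.5))


def steepest_descent(p, step=0.1, grad_tol=1e-6, max_iter=10000):
    """Move along the force, adapting the step size, until the force vanishes."""
    e = energy(p)
    for _ in range(max_iter):
        g = gradient(p)
        if max(abs(c) for c in g) < grad_tol:   # minimum reached
            break
        trial = tuple(c - step * gc for c, gc in zip(p, g))
        e_trial = energy(trial)                 # recalculate potential energy
        if e_trial < e:
            p, e = trial, e_trial
            step *= 1.2                         # step worked: try a larger one
        else:
            step *= 0.5                         # overshoot: shrink the step
    return p, e


if __name__ == "__main__":
    print(steepest_descent((5.0, 5.0)))
```

A conjugate gradient minimizer differs mainly in how the step direction is chosen: it mixes the previous direction into the current force rather than following the force alone.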

12 Molecular Dynamics - Energy Minimization: Energy calculations are used to solve Newton's equation of motion, i.e., $F = ma = -\nabla E_{empirical}$. These calculations yield an acceleration and velocity for each atom. Time steps are very small, about 1 femtosecond ($10^{-15}$ sec). To minimize energy, it is most common to use "simulated annealing" (SA): "heat" the molecule to get high thermal motion, which samples conformational space, then slowly "cool" to find a minimum energy, hopefully a global minimum. SA will only move a structure a small distance from the starting point, perhaps 1-2 Å.

13 Molecular Dynamics - Energy Minimization: Computationally intensive. Requires 10,000s of energy evaluations and 1000s of steps of dynamics to minimize the energy of a medium-size structure; this can require hours of supercomputer time. Difficult to correctly model solvent effects; the hydrophobic effect is important. Solvent: bulk solvent model (continuum model), or explicit solvent model (insert the model in a "box" of water; this adds thousands of additional atoms). Energy minimization is often used to refine a model or structure. Particularly useful with a good initial structure, e.g., position of sidechain or

14 Homology-based modeling (Comparative modeling): Prediction of the three-dimensional structure of a target protein from its amino acid sequence (primary structure), using a homologous (template) protein for which an X-ray or NMR structure is available. Why a model: X-ray crystallography or NMR structures are unavailable or intractable. The model provides a wealth of information on how the protein functions, down to the level of individual residue properties. This information can then be used for mutational studies or for drug design.

15 Some Applications of Comparative Modeling: Design mutants to test hypotheses about a protein's function. Identify active sites and binding interfaces. Model substrate specificity. Protein-protein docking. Effects of coding SNPs (single nucleotide polymorphisms) and other naturally occurring polymorphisms on protein structure.

16 Methods Homology-based modeling Match sequence to known structure Change sequence Optimize with MD Fragment-based modeling Match subsequences to structure fragments Optimize with MD Threading Environment based profiles Pseudo-energy fitting 16 Biol47800/59500 Bioinformatics

17 Homology Modeling Flowchart. [Flowchart boxes: query protein sequence; sequence database search; structure database search; hits / no hits; (multiple) sequence alignment; identify structurally conserved regions; iterative search (PSI-BLAST/profiles); model core SCRs; threading; model loops; similar hits; fold recognition; model sidechains; secondary structure prediction; evaluate model(s); energy minimization]

18 Quality of Known Structures: What is a good 3-dimensional structure? 6 Å resolution or so: secondary structure often clear, particularly alpha helices. Poorer than 3 Å resolution: one has many errors in side groups. 2.5 Å or better: good, BUT loops or surface regions may still be disordered; usually a structure must be at least this good for successful homology modeling. 2.0 Å or better: very good to excellent; the best structures are below 1.5 Å resolution. Portions may still be invisible. R-factor: measures X-ray crystallographic error, i.e., the difference between observed reflections and reflections predicted from the model; should be close to or below 20%. Temperature factor: measures thermal motion, lower is better; temperature factors for well-ordered residues are in the 1-15 range, and above 50 means the residue was invisible. Main-chain torsion angles reflect quality of structure (sometimes); torsion angles are restrained in refinement.
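For reference, the R-factor mentioned on this slide is conventionally defined as below. The notation ($F_{obs}$ and $F_{calc}$ for observed and model-derived structure factor amplitudes, summed over reflections $hkl$) is the standard crystallographic one and is not taken from the slide itself.

```latex
R = \frac{\sum_{hkl} \bigl|\, |F_{obs}(hkl)| - |F_{calc}(hkl)| \,\bigr|}
         {\sum_{hkl} |F_{obs}(hkl)|}
```

An R of 0.20 corresponds to the "20%" quoted above.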

19 Electron Density Maps: two-dimensional; three-dimensional

20 Known structures: Crystallographic and NMR structures are models. Models minimize the difference between observed and calculated data. Crystallography: diffraction intensities. NMR: coupling between atoms (distance restraints).

21 Protein Models: Stereo images

22 Cα trace

23 Protein Models: NMR Structural Ensemble

24 How good does structure need to be?

25 Homology Modeling Assumptions The overall 3-D structure of the target protein is similar to that of related proteins, and particularly the template structure. Regions of conserved sequence have similar structure. Residues conserved throughout a family of proteins are the most structurally conserved. Residues involved in biological activity have similar structure throughout the protein family. Loop regions (non-conserved residues) allow insertions and deletions without disrupting the core structure of the protein. Loop regions are flexible and therefore need not be constructed as strictly as the conserved regions - assuming that they play no role in biological activity. This doesn't apply to proteins whose surface loops play critical roles. 25 Biol47800/59500 Bioinformatics

26 Requirements for Homology-based modeling The query: The amino acid sequence of the protein to be built The template: The high-resolution structure of a homologous protein (AKA reference) Desirables for a Homology Project Additional sequences of related proteins (for multiple sequence alignment) Additional reference protein structures 26 Biol47800/59500 Bioinformatics

27 Steps in Homology Modeling: (1) Identify reference/template structures, one or more (the more the better); these will form the template for the target structure (model). (2) Sequence alignment: the most important step, since errors made at this point cannot be fixed; use the best alignment possible; multiple alignments are usually better than pairwise alignments; proteins with less than ~30% sequence identity with the reference can be problematic. (3) Map sequence onto template: transfer the coordinates of structurally conserved regions (SCRs) from the template(s) to the target; convert template side chains; optimize sidechain orientations with a rotamer library. (4) Model variable regions (loops and side chains): for loop insertions, search a high-resolution fragment database; for deletions, local minimizations may be sufficient. (5) Minimize free energy of model: local (especially loop-hinge regions) and global (molecular dynamics/energy minimization). (6) Evaluate model.

28 Locating and Aligning Homologs The modeling idea: extrapolate knowledge of related protein structures to a new homologous sequence Can include both related sequences and related 3D structures Approach: alignment procedures and database searches already learned in this course Extend search beyond a single sequence: Multiple alignments, profile analysis, at least consensus sequences or regular expressions Motifs via PROSITE database: regular expressions may be able to model some small regions if not the entire protein Global vs Local alignments: may be able to make separate models for independent domains and duplicated regions 28 Biol47800/59500 Bioinformatics

29 Sequence Similarity and Alignment: Homology modeling is based on using similar structures: no similar structures = no model. Need sequence similarity across the whole sequence, not just in one part. 40% amino acid identity or higher is best. Below 25% is less useful, but examples of success exist at this level. 20%-35% sequence identity is often referred to as the «twilight zone». Identify the template structure by a sequence-based search of structure database (PDB) sequences: FASTA or BLAST. Use multiple sequence comparison to improve the sensitivity of the search and to identify highly conserved regions: Muscle, HMM, profile, ClustalW2, PSI-
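The identity thresholds above refer to a percentage computed from an alignment. A minimal sketch of that calculation is shown below; it assumes the two sequences are already aligned to equal length with `-` as the gap character, and it assumes identity is counted over columns where neither sequence has a gap (other conventions exist and give different numbers).

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over aligned, gap-free columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Hypothetical aligned fragments, for illustration only.
    print(percent_identity("ACDE-G", "ACDFHG"))
```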

30 Modeling Structurally Conserved Regions Core regions must be examined for effect of indels Sequence residues are copied into positions of template residues in three-dimensional structure When the template residue is bigger, some empty space is left Nature abhors a vacuum When the template residue is smaller, there is steric conflict some atoms are too close together or maybe even interpenetrating 30 Biol47800/59500 Bioinformatics

31 Sidechain Conflicts Fixing conflicts Amino acid sidechains assume preferred positions (rotamers), which have been tabulated from known structures Computationally try all rotamers for sidechains affected in the region of a conflict Not all problems can be fixed, some require backbone movement Alternative alignments may be desirable 31 Biol47800/59500 Bioinformatics

32 Modeling Variable Regions (Loops): Search the structure database for loops with similar size and anchor points. Ab initio: use molecular dynamics/energy minimization to find a plausible (energetically reasonable) structure; structure outside of the loop region is not allowed to move; mainly used for very small loops and deletions where endpoints are close.

33 Models tend to stay close to template (1u5b/1qs0). [Figure: comparison of the experimental structure (1u5b) with models; RMSD by model template (1qs0, 2o1x); model error by position, red = high error]

34 How good are models? [Figure: cABC II-C4S tetrasaccharide complex; pink: cABC I (template); gray: cABC II (model)] Active site groove seen from the nonreducing and reducing end of the octasaccharide, respectively. The octasaccharide is readily accommodated in the active site of cABC I, but the access for the octasaccharide is constricted on the nonreducing end in cABC II. Source: "Recombinant Expression, Purification, and Biochemical Characterization of Chondroitinase ABC II from Proteus vulgaris", Prabhakar et al., J. Biol. Chem. 284 (2009).

35 Homology Modeling: Fragment assembly method (Rosetta). Start with known structures in PDB and divide them up into short fragments: a 9-residue library and a 3-residue library. For the unknown protein, find the best 200 three- and nine-residue fragments at each position (sequence match). Start with the protein in a fully extended conformation (no steric conflicts). Energies (term name in parentheses): steric repulsion (vdw); environment/solvation (env); residue pair interactions (pair); strand pairing/hydrogen bonding (SS); strand arrangement in sheets (sheet); helix-strand packing (HS); radius of gyration/compactness (rg); Cβ density/compactness (cbeta).

36 Homology Modeling Fragment assembly method (Rosetta) Iterate 28,000 times Choose random 9 residue fragment in model replace torsion angles with one of best from list evaluate energy, keep if better Energy function is very approximate version of MD energy function initially only steric overlap energy is calculated (until all initial torsion angles are replaced) next 2,000 iterations, evaluate all energy terms except compactness, strand pairing weight=0.3 next 20,000 iterations: strand pairing weight=1.0, compactness weight 0.5 last 6,000 iterations: full weights on energies Attempt to improve using 8,000 trials of 3 residue fragment library 36 Biol47800/59500 Bioinformatics
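The staged weighting described on this slide can be written down as a simple lookup. The sketch below covers only the three counted stages (2,000 + 20,000 + 6,000 = 28,000 iterations) and leaves out the initial steric-only phase; the function name and the dictionary keys are assumptions for this example and are not Rosetta's actual interface.

```python
TOTAL_ITERATIONS = 28000  # 2,000 + 20,000 + 6,000, as listed on the slide


def stage_weights(iteration):
    """Weights on the energy terms for a 0-based 9-residue fragment iteration."""
    if not 0 <= iteration < TOTAL_ITERATIONS:
        raise ValueError("iteration out of range")
    if iteration < 2000:
        # All terms except compactness; strand pairing down-weighted.
        return {"strand_pairing": 0.3, "compactness": 0.0, "other_terms": 1.0}
    if iteration < 22000:
        return {"strand_pairing": 1.0, "compactness": 0.5, "other_terms": 1.0}
    # Last 6,000 iterations: full weights on all energies.
    return {"strand_pairing": 1.0, "compactness": 1.0, "other_terms": 1.0}


if __name__ == "__main__":
    for i in (0, 1999, 2000, 21999, 22000, 27999):
        print(i, stage_weights(i))
```

Ramping the weights this way lets the chain first find sterically sensible local structure before compactness and strand pairing are enforced at full strength.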

37 Homology Modeling: Fragment assembly method (Rosetta). [Figure: correct structure; CASP5 T0135 and T0171]

38 How good are models?

39 Modeling Good or Bad? Proteins whose structure cannot be solved by NMR or X-ray crystallography can still be modeled Modeling takes only a few hours, but 3D structures often take months to years to solve experimentally Accuracy of models can be very good, nearly as good as crystal structures in the best case Can be good enough to generate lead compounds Model can be (need to be) experimentally tested: NMR In vitro mutagenesis 39 Biol47800/59500 Bioinformatics

40 Some sources of errors in comparative models. Errors due to misalignments: the largest source of error, minimized by constructing multiple alignments; no amount of MD will fix these errors. Errors in sidechain packing: as sequences diverge, the packing of sidechains in the protein core changes; backbone movements accommodate sidechain changes. Distortions and shifts in correctly aligned regions: in some correctly aligned regions, the template is locally different from the target. Errors in regions without a template: segments of the target sequence that have no equivalent region in the template structure (insertions and loops) are the most difficult regions to model; if insertions are relatively short (less than 9 residues), some methods can correctly predict the conformation of the backbone. Incorrect templates: this is a problem when distantly related proteins are used as templates. Difficult to distinguish between a model based on an incorrect template

41 Model Evaluation: If it were easy to tell a correct model from an incorrect model, the modeling process would be easy: one would simply use the "correctness" criterion as the objective function. Unfortunately, there is no completely satisfactory approach. Techniques for evaluation: Model geometry (bond lengths, bond angles, dihedral angles, van der Waals contacts, H bonds). Programs used to evaluate models: VERIFY3D, PROSAII, HARMONY, ANOLEA, and many others. Agreement with homologous sequences (multiple alignment, profile): conserved regions in core, variable regions at surface. Structural templates (3D profiles). Pair potentials (pseudo-energies).

42 Model Quality Model based on 1qs0 Model based on 2o1x QMEAN score (higher is better) torsion angles pairwise potential solvation secondary structure potential phi/psi agreement solvent accessibility agreement 42 Biol47800/59500 Bioinformatics

43 Model Quality Anolea Atomic Non-Local Environment Assessment Distance based mean force potential Model based on 1qs0 Model based on 2o1x 43 Biol47800/59500 Bioinformatics

44 Homology Modeling Threading/Inverse Folding Methods Try to determine if a sequence is compatible with a known structure Inverse folding predict sequence from 3-D structure Compare to folding predict 3-D structure from sequence Threading imagine pulling the sequence through the known structure until a best match is obtained Threading approaches Local environment methods Characterize each sequence position according to its local three dimensional environment - 3D profile Simple to calculate match Could allow flexibility on variable regions Pseudo-energy methods (Contact potential) optimize pairwise interactions between residues in 3D space Difficult calculation 44 Biol47800/59500 Bioinformatics Ensures that residue-residue interactions approximate real proteins

45 Homology Modeling Local Environment Methods Three-dimensional Profile For each residues in the three dimensional structure, look at the structure type and surrounding residues to infer spectrum of allowed substitutions Secondary structure - alpha, beta or coil Solvent accessibility - buried, partially buried or accessible Hydrogen bonding / sidechain polarity 18 total states Preferred distributions of residues calculated from known structures in PDB probabilities for each of the 20 residues in each environment (observed frequencies are presumed to be optimal) Does not take conservation into account conserved positions use the same distributions as unconserved Align to profile as discussed previously 45 Biol47800/59500 Bioinformatics

46 Homology Modeling Threading - Pseudo-energy Methods Two approaches to threading - soft and hard threading (my terms) Soft threading - move the sequence along the template structure assuming that the interacting residues are the ones in the template Equivalent to local environment method Dynamic programming works Hard threading - move the sequence through the structure, with gaps, calculating all of the interacting pairs Very time consuming (NP-complete) 46 Biol47800/59500 Bioinformatics

47 Homology Modeling Pseudo-energy methods (quasi-energy, statistical potential, empirical energy function, knowledge-based force field) Boltzmann distribution relates probability to energy Z is the partition function that describes the probabilities of all states in system Frequencies at which residue pairs are seen in real structures can be converted to a pseudo-energy Calculate the energies for all residue pairs at all different separations The energy of any three dimensional structure can then be calculated by summing up the energies of all the pairs at the observed distances 47 Biol47800/59500 Bioinformatics
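The relations referred to on this slide can be written out as follows. The first line is the Boltzmann distribution with its partition function; the second is the usual "inverse Boltzmann" step used for knowledge-based potentials. The symbols $f_{obs}$ (observed pair frequency) and $f_{ref}$ (reference or expected frequency) are assumptions introduced here for illustration, and the exact choice of reference state varies between methods.

```latex
p_i = \frac{e^{-E_i / kT}}{Z}, \qquad Z = \sum_j e^{-E_j / kT}

E(a, b, r) = -kT \, \ln \frac{f_{obs}(a, b, r)}{f_{ref}(r)}

E_{total} = \sum_{i < j} E(a_i, a_j, r_{ij})
```

Here $a$ and $b$ are residue types, $r$ is their separation, and the last sum runs over all residue pairs in the structure at their observed distances.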

48 Homology Modeling Pseudo-energy Methods (see also fig 13.6 in text) 48 Biol47800/59500 Bioinformatics

49 Homology Modeling Threading Can it find matches that sequence matching cannot? A is dihydrofolate reductase Interacts to form homodimer Contains catalytic site B is kinase SH3 Interacts with other proteins to make protein-protein interactions Structural only DHFR - thick blue Human survival motor protein - grey E. coli biotin holoenzyme - magenta Repressor KotB - green HIV integrase - orange 49 Biol47800/59500 Bioinformatics

50 Example: Swiss-model. Starting sequence: Medicago calcium-dependent protein kinase. Contains a protein kinase domain and an EF-hand Ca-binding domain.

51 Example: Swiss-model. Four templates found: 2vn (human calcium/calmodulin-dependent protein kinase); 2qg (Cryptosporidium parvum calcium-dependent protein kinase); 3hx (Toxoplasma gondii CDPK1); 2aao (Arabidopsis thaliana calcium-dependent kinase, EF-hand region).

52 Example: Swiss-model. Alignment and structure assignment for each template (reference structure). Deletions after residues 238, 260, 345. Insertion after 122, 140.

53 Example: Swiss-model. Deletions after residues 238, 260, 345. Insertion after 122, 140. Structurally conserved region; add loops; delete extra residues; rotamer optimization; energy minimization.

54 Example: Swiss-model. Structure assessment: Gromos (MD); Anolea (statistical potential).

55 Example: Swiss-model. Final model.

56 Protein Analysis: Homologs - Fructose bis-phosphate aldolase

57 Protein Analysis Homology vs Structural Similarity TIM barrel proteins One of the most common protein folds (>900 examples) Active site always at C-terminal end of beta-barrel Fructose 1,6-bisphosphate aldolase Homologs Triose phosphate isomerase Probably not a homolog 57 Biol47800/59500 Bioinformatics

58 Protein Analysis: Structurally similar? (Text, page 569.) There are many cases where a protein shares little or no sequence homology and yet is a functional homolog. While these proteins share a beta-sandwich architecture, they are connected entirely differently. Are they homologs? Examples: polycystin 1 (polycystic kidney disease protein); a cell surface glycoprotein; histone deposition protein.

59 Protein Analysis Structure Classifications SCOP - manual CATH largely automatic 59 Biol47800/59500 Bioinformatics

60 Protein Analysis Structural Similarity Structural similarity is measured by overlap of corresponding residue coordinates Most commonly used measure is RMS coordinate difference (RMSD) RMSD is very sensitive to outliers (car door effect) Problem is how to find which residues correspond DALI / FSSP Matches secondary structure elements regardless of connectivity CE (combinatorial extension) Builds up from small matching pieces, according to connectivity VAST Secondary structure orientation and connectivity Not clear which is best, not clear how to evaluate significance since completely unrelated structures are unavailable 60 Biol47800/59500 Bioinformatics
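The RMSD measure and its sensitivity to outliers can be illustrated with a short calculation. This sketch assumes the two coordinate sets are already superimposed and that the residue correspondence is known (the hard part noted above); the coordinates are invented for the example.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between matched (x, y, z) coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # Ten matched atoms; nine coincide exactly, one is displaced by 10 Å.
    reference = [(float(i), 0.0, 0.0) for i in range(10)]
    moved = list(reference)
    moved[9] = (9.0, 10.0, 0.0)
    print(rmsd(reference, moved))  # dominated by the single outlier
```

One displaced atom out of ten is enough to push the RMSD above 3 Å, which is the "car door effect": a small hinged region can make two otherwise identical structures look dissimilar.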

61 Protein Analysis Protein Folds 61 Biol47800/59500 Bioinformatics

62 Protein Analysis The protein structure universe total yearly How many protein folds are there? Are there certain kinds of folds that are more stable? How do you detect structural similarity? 62 Biol47800/59500 Bioinformatics

63 Protein Families Protein families - groups of homologous molecules superfamily, family, subfamily classification introduced by Dayhoff homeologous family families are seen both across and within species Structural classes / Folds - similar structures based on 3-dimensional coordinates may not be homologous - not clear to what extent certain structures are preferred by chance only recently becoming populated Domain Sequence or structure based independently folding unit Families are important for information mapping because they give a guide to how much variation is expected between homologous proteins that maintain similar (or have different) function. 63 Biol47800/59500 Bioinformatics

64 Protein Families: Dayhoff Protein Classification (hierarchical classification). Folds: structural similarity. Superfamilies: $P < 10^{-3}$; highly probable homology; superfamilies generally are entire sequences (homeomorphic family); a newer concept is the homology domain, covering only part of the sequence. Families: > 50% identical (~$E < 10^{-30}$); clear homology; similar function; substrates and function similar but not identical. Subfamilies: > 80% identical (~$E < 10^{-80}$); identical function; probably bind nearly identical substrates.

65 Protein Families Clusters of Orthologous Groups COGs & KOGs genomes, 38 orders, 28 classes 14 phyla (192,987 proteins) prokaryotic (COGs) 5666 eukaryotic (KOGs) 4852 Originally (1997), 3307 COGs were delineated by comparing protein sequences encoded in 43 complete genomes, representing 30 major phylogenetic lineages. Each COG consists of individual proteins or groups of paralogs from at least 3 lineages and thus corresponds to an ancient conserved domain % of the gene products from each of the complete bacterial and archaeal genomes and ~35% of those from the yeast Saccharomyces cerevisiae genome. 65 Biol47800/59500 Bioinformatics

66 Protein Families: COGs. 1. Perform the all-against-all protein sequence comparison. 2. Detect and collapse obvious paralogs, that is, proteins from the same genome that are more similar to each other than to any proteins from other species. 3. Detect triangles of mutually consistent, genome-specific best hits (BeTs), taking into account the paralogous groups detected at step 2. 4. Merge triangles with a common side to form COGs. 5. A case-by-case analysis of each COG. This analysis serves to eliminate false-positives and to identify groups that contain multidomain proteins by examining the pictorial representation of the BLAST search outputs. The sequences of detected multidomain proteins are split into single-domain segments and steps 1-4 are repeated with these sequences, which results in the assignment of individual domains to COGs in accordance with their distinct evolutionary affinities. 6. Examination of large COGs that include multiple members from all or several of the genomes using phylogenetic trees, cluster analysis and visual inspection of alignments; as a result, some of these groups are split into two or more smaller ones that are included in the final set of COGs.
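The triangle detection and merging steps can be sketched as below. This is a simplified illustration, not the COG pipeline itself: it assumes best hits are supplied as a precomputed dictionary, it requires every pair in a triangle to be reciprocal best hits, and it skips paralog collapsing and the manual curation steps. All names and the toy data are hypothetical.

```python
from itertools import combinations


def find_triangles(best_hit, genome_of):
    """Triples of proteins from three genomes that are mutual best hits.

    best_hit[p][g] is the best hit of protein p in genome g.
    """
    def mutual(p, q):
        return (best_hit.get(p, {}).get(genome_of[q]) == q
                and best_hit.get(q, {}).get(genome_of[p]) == p)

    triangles = []
    for a, b, c in combinations(sorted(genome_of), 3):
        three_genomes = len({genome_of[a], genome_of[b], genome_of[c]}) == 3
        if three_genomes and mutual(a, b) and mutual(b, c) and mutual(a, c):
            triangles.append(frozenset((a, b, c)))
    return triangles


def merge_triangles(triangles):
    """Merge triangles that share a side (two proteins), using union-find."""
    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(triangles)), 2):
        if len(triangles[i] & triangles[j]) >= 2:
            parent[find(i)] = find(j)
    groups = {}
    for i, tri in enumerate(triangles):
        groups.setdefault(find(i), set()).update(tri)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    genome_of = {"a1": "A", "b1": "B", "c1": "C", "d1": "D", "a2": "A", "b2": "B"}
    best_hit = {
        "a1": {"B": "b1", "C": "c1", "D": "d1"},
        "b1": {"A": "a1", "C": "c1", "D": "d1"},
        "c1": {"A": "a1", "B": "b1", "D": "d1"},
        "d1": {"A": "a1", "B": "b1", "C": "c1"},
        "a2": {"B": "b2"},
        "b2": {"A": "a2"},
    }
    print(merge_triangles(find_triangles(best_hit, genome_of)))
```

In the toy data the pair a2/b2 never joins a group because it has no third partner, which mirrors the requirement that a COG span at least three lineages.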

67 Protein Families COGs & KOGS How well do COGs cover complete genomes? Phyletic patterns of COGs Phyletic patterns of KOGs 67 Biol47800/59500 Bioinformatics

68 Protein Families COGs 68 Biol47800/59500 Bioinformatics

69 Protein Families: EggNOG (automatic COGs). 630 genomes: 529 bacteria, 46 archaea, 55 eukarya. 224,847 OGs; 9724 extended versions of original COG and KOG. [Figure legend: green = function annotated; orange = unannotated; gray = no match]

70 Protein Families Structural classifications SCOP Heuristic classification according to traditional crystallographic ideas Recently used as a standard for sequence comparisons v1.75, June PDB Entries Domains. CATH Systematic semi-automatic procedure with more clearly defined process Version 3.3.0, July ,625 PDB chains, 128,688 domains 70 Biol47800/59500 Bioinformatics

71 Protein Families SCOP Primarily manually curated according to traditional crystallographic ideas Family: Clear evolutionarily relationship Generally, pairwise residue identities greater than 30%. In some cases, similar functions and structures provide definitive evidence of common descent in the absence of high sequence identity; for example, many globins form a family though some members have sequence identities of only 15%. Superfamily: Probable common evolutionary origin Low sequence identity, but structural and functional features suggest a common evolutionary origin. For example, actin, the ATPase domain of the heat shock protein, and hexokinase together form a superfamily. Fold: Major structural similarity Major secondary structures in same arrangement and topology. Proteins with the same fold often have peripheral elements of secondary structure and turn regions that differ in size and conformation. Proteins with a common fold may not have a common evolutionary origin: the structural similarities could arise from physical-chemical properties of proteins that 71 Biol47800/59500 Bioinformatics

72 Protein Families: SCOP. SCOP 1.75A statistics: PDB entries (released/updated prior to ) Domains. 1 Literature reference. [Table columns: class; number of folds; number of superfamilies; number of families] Classes: a: All alpha proteins; b: All beta proteins; c: Alpha and beta proteins (a/b); d: Alpha and beta proteins (a+b); e: Multi-domain proteins (alpha and beta); f: Membrane and cell surface proteins and peptides; g: Small proteins; Totals.

73 Protein Families: SCOP Class - All Alpha Proteins. Globin-like (2) (globins and phycocyanins): core: 6 helices; folded leaf, partly opened. Long alpha-hairpin (11): 2 helices; antiparallel hairpin, left-handed twist. Cytochrome c (1): core: 3 helices; folded leaf, opened. DNA-binding 3-helical bundle (10): core: 3 helices; bundle, closed or partly opened, right-handed twist; up-and-down. Many more.

74 Protein Families CATH Classification v 3.5.0, September 2011 CATH is more formally specified and less reliant on human intervention than SCOP CATH ,536 domains 2,626 superfamilies 51,334 PDB entries 74 Biol47800/59500 Bioinformatics

75 Protein Families: CATH Classification. Class: determined according to the secondary structure composition and packing within the structure; assigned automatically using the method of Michie et al. (1996). Architecture: the overall shape of the domain structure as determined by the orientations of the secondary structures; ignores the connectivity between the secondary structures; assigned manually. Topology: fold families at this level depend on both the overall shape and connectivity of the secondary structures; this is done using the structure comparison algorithm SSAP (Taylor & Orengo, 1989). Homologous Superfamily: similarities are identified first by sequence comparisons and subsequently by structure comparison using SSAP. Criteria (any one of): (a) sequence identity >= 35%, 60% of larger structure equivalent to smaller; (b) SSAP score >= 80.0 and sequence identity >= 20%, 60% of larger structure equivalent to smaller; (c) SSAP score >= 80.0, 60% of larger structure equivalent to smaller, and domains have related functions. Sequence Families: domains clustered in the same sequence families have sequence identities
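The three homologous-superfamily criteria can be expressed as a single predicate. This sketch reads the slide as three alternatives that all require at least 60% of the larger structure to be equivalent to the smaller; that reading, the function name, and the parameter names are assumptions for illustration and do not reproduce CATH's actual software.

```python
def same_homologous_superfamily(seq_identity, overlap_percent,
                                ssap_score=None, related_function=False):
    """Apply the three alternative criteria listed on the slide.

    seq_identity and overlap_percent are percentages; ssap_score may be
    None when no structure comparison has been run.
    """
    if overlap_percent < 60.0:
        return False                      # every criterion needs 60% overlap
    if seq_identity >= 35.0:
        return True                       # criterion (a): sequence alone
    if ssap_score is not None and ssap_score >= 80.0:
        # criterion (b): moderate identity, or (c): related functions
        return seq_identity >= 20.0 or related_function
    return False


if __name__ == "__main__":
    print(same_homologous_superfamily(40.0, 75.0))
    print(same_homologous_superfamily(22.0, 65.0, ssap_score=85.0))
    print(same_homologous_superfamily(12.0, 65.0, ssap_score=85.0))
```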

76 Protein Families 2010 CATH Classification

77 Protein Docking Finding binding sites
Proteins with unknown function
- Conserved surface areas
- Hydrophobic surface area
- Highly charged areas (mostly for nucleic acid binding)
- Active sites usually in pockets
Proteins with known partners: docking
- Rotate and translate in all possible orientations
- Use scoring function to evaluate match
  - Charge
  - Shape
  - Hydrophobicity
- How should you deal with flexibility of protein/induced fit?

78 Protein Docking Conformational Search (text: Ch 14)
Given two proteins with three-dimensional structures, how do they bind?
- Hold one fixed
- Rotate and translate the other
  - 3 angles, 10° increments = 23,000 positions
  - 3 translational parameters, 100 Å at 0.5 Å intervals = 8 x 10^6 positions
  - Total = 2 x 10^11 positions to consider
- All docking methods use approximations
What is a good position?
- Electrostatic interactions
- Steric interactions
- Solvent effects
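The size of this rigid-body search space can be checked with a few lines of arithmetic. The sketch below assumes that two of the three rotation angles span 360° and the third spans 180° (which is what gives roughly 23,000 orientations at 10° steps). The function name and defaults are illustrative only.

```python
def count_rigid_body_poses(angle_step_deg=10.0, box_length=100.0, grid_step=0.5):
    """Count rotational and translational grid positions for rigid docking.

    Assumption: two rotation angles cover 0-360 degrees and the third
    covers 0-180 degrees, as for Euler angles.
    """
    full_turn = round(360.0 / angle_step_deg)   # 36 steps at 10 degrees
    half_turn = round(180.0 / angle_step_deg)   # 18 steps at 10 degrees
    rotations = full_turn * full_turn * half_turn

    per_axis = round(box_length / grid_step)    # 200 steps along each axis
    translations = per_axis ** 3

    return rotations, translations, rotations * translations


rotations, translations, total = count_rigid_body_poses()
print(f"rotations    = {rotations:,}")      # about 23,000
print(f"translations = {translations:,}")   # 8 x 10^6
print(f"total        = {total:.2e}")        # about 2 x 10^11
```

Even at one microsecond per evaluated position, a total on the order of 10^11 would take more than two days, which is why all practical docking methods approximate the search.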

79 Protein Docking Search Methods
Monte Carlo (Metropolis) methods
- Most common
- Start in a random position
- Calculate approximate energy
- Make a random move
- Accept the move probabilistically based on energy difference
Often merged with genetic algorithm
- Consider many random starting positions (each is a genome)
- Each random modification is a mutation
- Fitness is energy
Examples: GOLD (see text), AutoDock
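The Monte Carlo steps listed above can be sketched as follows. This is a toy illustration, not the procedure used by GOLD or AutoDock: the pose is a single number, the energy function is a made-up quadratic, and all names (`toy_energy`, `toy_move`, `kT`) are hypothetical.

```python
import math
import random


def metropolis_accept(delta_e, kT, rng):
    """Metropolis criterion: always accept downhill moves; accept uphill
    moves with probability exp(-delta_e / kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


def monte_carlo_search(energy, start, propose, steps, kT, seed=0):
    """Run a Metropolis Monte Carlo search and return the best pose seen."""
    rng = random.Random(seed)
    current, e_current = start, energy(start)
    best, e_best = current, e_current
    for _ in range(steps):
        candidate = propose(current, rng)          # make a random move
        e_candidate = energy(candidate)            # approximate energy
        if metropolis_accept(e_candidate - e_current, kT, rng):
            current, e_current = candidate, e_candidate
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best


def toy_energy(x):
    """Stand-in for a docking score: a single minimum at x = 3."""
    return (x - 3.0) ** 2


def toy_move(x, rng):
    """Random perturbation of the one-dimensional toy pose."""
    return x + rng.uniform(-0.5, 0.5)


best, e_best = monte_carlo_search(toy_energy, start=10.0, propose=toy_move,
                                  steps=5000, kT=0.1)
print(f"best pose = {best:.3f}, energy = {e_best:.5f}")
```

In a real docking run the pose would hold six rigid-body parameters (plus any flexible torsions), and the search would typically be repeated from many random starting positions.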

80 Protein Docking Search Methods
Other methods
- Point complementarity
- Distance Geometry
- Tabu search
CAPRI: Critical Assessment of PRedicted Interactions
- Docking contest (like CASP)

81 Protein Docking Quality
Is it a good fit (scoring function)?
- MD energy models (force fields) from MD programs such as CHARMM, AMBER, GROMOS
  - Time consuming to calculate
- Approximate models usually focusing on electrostatics and atomic overlap
- Statistical potentials (pseudo energies, knowledge-based scoring)
Problems
- Both molecules can move to accommodate binding (induced fit)
- Water
  - Water in binding site may be bound and act as a part of molecule, or
  - Water may be released, resulting in entropy increase (ΔG = ΔH − TΔS)
- Flexible docking allows molecules to move
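The sign of the water-release effect follows directly from ΔG = ΔH − TΔS. The short sketch below uses hypothetical numbers chosen only to show the direction of the effect; they are not measured values.

```python
def gibbs_free_energy(delta_h, temperature, delta_s):
    """Return delta G = delta H - T * delta S.

    delta_h     -- enthalpy change (kcal/mol)
    temperature -- absolute temperature (K)
    delta_s     -- entropy change (kcal/(mol*K))
    """
    return delta_h - temperature * delta_s


# Hypothetical case: releasing ordered water changes the enthalpy very
# little (delta_h = 0) but increases the entropy (delta_s > 0).
dg = gibbs_free_energy(delta_h=0.0, temperature=298.0, delta_s=0.005)
print(f"delta G = {dg:.2f} kcal/mol")  # negative, so binding is favored
```

A scoring function that ignores solvent entropy would miss this contribution entirely, which is one reason approximate docking scores can rank poses poorly.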

82 Protein Docking
Scoring functions are not that great
Trypsin/trypsin inhibitor
- 2PTC beta trypsin (structure with I)
- 1TPO beta trypsin (structure without I)
Bound structure is often significantly different from free structure
Even when binding site is correct, the conformation may still be wrong
2PTC vs inhibitor

83 Protein Docking


More information

A. Reaction Mechanisms and Catalysis (1) proximity effect (2) acid-base catalysts (3) electrostatic (4) functional groups (5) structural flexibility

A. Reaction Mechanisms and Catalysis (1) proximity effect (2) acid-base catalysts (3) electrostatic (4) functional groups (5) structural flexibility (P&S Ch 5; Fer Ch 2, 9; Palm Ch 10,11; Zub Ch 9) A. Reaction Mechanisms and Catalysis (1) proximity effect (2) acid-base catalysts (3) electrostatic (4) functional groups (5) structural flexibility B.

More information

Unfolding CspB by means of biased molecular dynamics

Unfolding CspB by means of biased molecular dynamics Chapter 4 Unfolding CspB by means of biased molecular dynamics 4.1 Introduction Understanding the mechanism of protein folding has been a major challenge for the last twenty years, as pointed out in the

More information

Homology Modeling I. Growth of the Protein Data Bank PDB. Basel, September 30, EMBnet course: Introduction to Protein Structure Bioinformatics

Homology Modeling I. Growth of the Protein Data Bank PDB. Basel, September 30, EMBnet course: Introduction to Protein Structure Bioinformatics Swiss Institute of Bioinformatics EMBnet course: Introduction to Protein Structure Bioinformatics Homology Modeling I Basel, September 30, 2004 Torsten Schwede Biozentrum - Universität Basel Swiss Institute

More information

Packing of Secondary Structures

Packing of Secondary Structures 7.88 Lecture Notes - 4 7.24/7.88J/5.48J The Protein Folding and Human Disease Professor Gossard Retrieving, Viewing Protein Structures from the Protein Data Base Helix helix packing Packing of Secondary

More information

Ab-initio protein structure prediction

Ab-initio protein structure prediction Ab-initio protein structure prediction Jaroslaw Pillardy Computational Biology Service Unit Cornell Theory Center, Cornell University Ithaca, NY USA Methods for predicting protein structure 1. Homology

More information

Chemogenomic: Approaches to Rational Drug Design. Jonas Skjødt Møller

Chemogenomic: Approaches to Rational Drug Design. Jonas Skjødt Møller Chemogenomic: Approaches to Rational Drug Design Jonas Skjødt Møller Chemogenomic Chemistry Biology Chemical biology Medical chemistry Chemical genetics Chemoinformatics Bioinformatics Chemoproteomics

More information

114 Grundlagen der Bioinformatik, SS 09, D. Huson, July 6, 2009

114 Grundlagen der Bioinformatik, SS 09, D. Huson, July 6, 2009 114 Grundlagen der Bioinformatik, SS 09, D. Huson, July 6, 2009 9 Protein tertiary structure Sources for this chapter, which are all recommended reading: D.W. Mount. Bioinformatics: Sequences and Genome

More information

Outline. Levels of Protein Structure. Primary (1 ) Structure. Lecture 6:Protein Architecture II: Secondary Structure or From peptides to proteins

Outline. Levels of Protein Structure. Primary (1 ) Structure. Lecture 6:Protein Architecture II: Secondary Structure or From peptides to proteins Lecture 6:Protein Architecture II: Secondary Structure or From peptides to proteins Margaret Daugherty Fall 2004 Outline Four levels of structure are used to describe proteins; Alpha helices and beta sheets

More information

Bioinformatics. Dept. of Computational Biology & Bioinformatics

Bioinformatics. Dept. of Computational Biology & Bioinformatics Bioinformatics Dept. of Computational Biology & Bioinformatics 3 Bioinformatics - play with sequences & structures Dept. of Computational Biology & Bioinformatics 4 ORGANIZATION OF LIFE ROLE OF BIOINFORMATICS

More information

Protein Structure Basics

Protein Structure Basics Protein Structure Basics Presented by Alison Fraser, Christine Lee, Pradhuman Jhala, Corban Rivera Importance of Proteins Muscle structure depends on protein-protein interactions Transport across membranes

More information

FlexPepDock In a nutshell

FlexPepDock In a nutshell FlexPepDock In a nutshell All Tutorial files are located in http://bit.ly/mxtakv FlexPepdock refinement Step 1 Step 3 - Refinement Step 4 - Selection of models Measure of fit FlexPepdock Ab-initio Step

More information

Biomolecules: lecture 10

Biomolecules: lecture 10 Biomolecules: lecture 10 - understanding in detail how protein 3D structures form - realize that protein molecules are not static wire models but instead dynamic, where in principle every atom moves (yet

More information

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University Sequence Alignment: A General Overview COMP 571 - Fall 2010 Luay Nakhleh, Rice University Life through Evolution All living organisms are related to each other through evolution This means: any pair of

More information

Table 1. Crystallographic data collection, phasing and refinement statistics. Native Hg soaked Mn soaked 1 Mn soaked 2

Table 1. Crystallographic data collection, phasing and refinement statistics. Native Hg soaked Mn soaked 1 Mn soaked 2 Table 1. Crystallographic data collection, phasing and refinement statistics Native Hg soaked Mn soaked 1 Mn soaked 2 Data collection Space group P2 1 2 1 2 1 P2 1 2 1 2 1 P2 1 2 1 2 1 P2 1 2 1 2 1 Cell

More information

SCOP. all-β class. all-α class, 3 different folds. T4 endonuclease V. 4-helical cytokines. Globin-like

SCOP. all-β class. all-α class, 3 different folds. T4 endonuclease V. 4-helical cytokines. Globin-like SCOP all-β class 4-helical cytokines T4 endonuclease V all-α class, 3 different folds Globin-like TIM-barrel fold α/β class Profilin-like fold α+β class http://scop.mrc-lmb.cam.ac.uk/scop CATH Class, Architecture,

More information