Biol Introduction to Bioinformatics
- Kenneth Watkins
- 5 years ago
1 Biol Introduction to Bioinformatics
Schedule
Week Nov 15 Nov
Reading: Ch Ch
for next week Ch 14.4
- Monday: Protein energetics/dynamics
- Wednesday: Homology-based modeling
- Friday: Homology-based modeling, protein families
1 Biol47800/59500 Bioinformatics
2 Motivation
- Secondary structure prediction, however accurate, does not tell you what the three-dimensional structure is.
- It is difficult or impossible to go even from KNOWN secondary structure elements to the three-dimensional structure.
- What, then, can one do?
- Often, if a structure is known, one can reasonably accurately "predict" or model the three-dimensional structure of homologous proteins.
- The 3D structure database (PDB) is growing exponentially, like the other databases. Homologous structures are available for many sequences, perhaps 50% of all sequences.
- Structural genomics (high-throughput structure solution) is increasing the number of sequences for which this is possible.
3 Protein Energetics
- Proteins exist at or very near their minimum free energy conformations
  - Crystallographic structures may be slightly stressed due to crystal contacts and solution conditions
- Folding is generally rapid, and often does not require any assistance
  - Anfinsen experiments
  - Chaperones
- Proteins are not solid rocks: they exhibit thermal motions, which are important in conformational change
4 Molecular dynamics
[Figure panels: simulation snapshots]
- Beta lactamase with water
- Green fluorescent protein
- Neuraminidase + Tamiflu
- GLFG backbone, 100 ps
- MHC protein
- Glucocorticoid receptor/DNA
- Glucosamine deaminase
- Blood clotting protein binding to membrane
- Protein and DOPC bilayer, 50 nsec
- DADME binding to polynucleotide phosphorylase
5 Molecular Dynamics
- MD simulation generally begins where experimental structure determination leaves off, if not during the structure refinement itself.
- MD is generally not used to predict structure from sequence, nor to model protein folding pathways.
- MD simulations can fold extended sequences to global potential energy minima ONLY for very small systems (peptides of length ten or so, in vacuum).
- MD is most commonly used to simulate the dynamics of known structures.
6 Molecular Dynamics
Proteins are flexible and rapidly fluctuating molecules.
Classification of motions, with times (log sec) and distances (Angstroms):
- Atomic fluctuations: -15 to -11, ~1 Å
  - Vibrations of individual bonds
- Collective motions: -12 to -3, ~10 Å
  - Groups of atoms (amino acid side chains, protein motif or domain, RNA base, ...)
- Triggered conformational changes: -9 to +3, ~100 Å
  - Motion in response to a stimulus
Correct structural template (H bonding, disulfide bridges, solvent accessibility, etc.)
7 Molecular Dynamics
- Energy minimization
  - Inputs: atomic coordinates and potential energy (force field)
  - Incrementally change the coordinates according to the force field (descent to lowest energy)
- Molecular dynamics
  - Includes velocities
  - Incrementally change the atomic coordinates using numerical solutions of the time-dependent equations of motion for the atoms (F = ma)
- Result: a simulated trajectory through time of the positions and momenta of all atoms of the molecule, which explores conformational space in time
8 Molecular Dynamics
Basic computational approach:
- Begin with initial atomic coordinates
- Calculate the potential energy (U) of the system (force field)
- This gives the force on each atom
  - Force is the negative derivative of the potential energy with respect to position: F = -dU/dr
- The sum of the forces on each atom gives its acceleration (a = F/m)
- Let the molecules move for a very short time (femtoseconds)
- Recalculate the energy
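The loop on this slide (forces from the potential, a short move, then recalculation) can be sketched for a single particle on a spring. This is a minimal illustration, not the scheme of any particular MD package: the velocity Verlet integrator, the 1-D harmonic potential, and all names (`harmonic_force`, `velocity_verlet`, `total_energy`) and parameter values are assumptions chosen for the example.

```python
import math


def harmonic_force(x, k=1.0, x0=0.0):
    """Force from U = 0.5 * k * (x - x0)**2, i.e. F = -dU/dx."""
    return -k * (x - x0)


def velocity_verlet(x, v, mass, dt, n_steps, force=harmonic_force):
    """Propagate one particle in 1-D and return the list of (x, v) pairs."""
    trajectory = [(x, v)]
    a = force(x) / mass                       # acceleration from F = ma
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt    # move for one short time step
        a_new = force(x) / mass               # recalculate the force at the new position
        v = v + 0.5 * (a + a_new) * dt        # update the velocity
        a = a_new
        trajectory.append((x, v))
    return trajectory


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy for the spring above (x0 = 0)."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


# Toy units: start at x = 1 with no velocity, so the total energy is 0.5.
traj = velocity_verlet(x=1.0, v=0.0, mass=1.0, dt=0.01, n_steps=1000)
final_x, final_v = traj[-1]
```

With a time step that is small relative to the oscillation period, the total energy stays close to its starting value, which is a common sanity check on an integrator.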
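9 Molecular Dynamics
Force field = empirical energy functions. The force field treats large molecules essentially as spheres and springs, which gives the following potential energy terms:

$$E_{\text{empirical}} = E_{\text{bond}} + E_{\text{angle}} + E_{\text{dihedral}} + E_{\text{VDW}} + E_{\text{elec}}$$

where each term is a sum over the relevant bonds, angles, dihedrals, or atom pairs:

$$E_{\text{bond}} = \sum k_b (r - r_0)^2$$
$$E_{\text{angle}} = \sum k_\theta (\theta - \theta_0)^2 + k (r_{13} - r_0)^2$$
$$E_{\text{dihedral}} = \sum \sum k_\phi \left(1 + \cos(n_\phi \phi + \delta)\right)$$
$$E_{\text{VDW}} = \sum \left(\frac{A}{r^{12}} - \frac{B}{r^{6}}\right) \quad \text{(Lennard-Jones potential)}$$
$$E_{\text{elec}} = \sum \frac{q_i q_j}{r^{2}}$$

(A $1/r^2$ electrostatic energy corresponds to a distance-dependent dielectric, $\varepsilon \propto r$. With a constant dielectric the Coulomb energy falls off as $1/r$.)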
10 [Figure: force field terms grouped by range]
- Local:
  - $k_b (r - r_0)^2$
  - $k_\theta (\theta - \theta_0)^2 + k (r_{13} - r_0)^2$
  - $k_\phi \left(1 + \cos(n_\phi \phi + \delta)\right)$
- Long-range, non-local:
  - $q_i q_j / r^{2}$
  - $A/r^{12} - B/r^{6}$
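As a rough illustration of how the individual force field terms behave, the sketch below evaluates the bond, dihedral, and Lennard-Jones terms for single interactions. The function names and all numeric parameters are assumptions made for the example and are not taken from any real force field.

```python
import math


def bond_energy(r, r0, kb):
    """Harmonic bond term: kb * (r - r0)**2."""
    return kb * (r - r0) ** 2


def dihedral_energy(phi, k_phi, n, delta):
    """Periodic torsion term: k_phi * (1 + cos(n * phi + delta)), angles in radians."""
    return k_phi * (1.0 + math.cos(n * phi + delta))


def lennard_jones(r, A, B):
    """Van der Waals term: A / r**12 - B / r**6."""
    return A / r ** 12 - B / r ** 6


def lj_minimum(A, B):
    """Distance of lowest Lennard-Jones energy, from dE/dr = 0: r = (2A/B)**(1/6)."""
    return (2.0 * A / B) ** (1.0 / 6.0)


# Hypothetical parameters in arbitrary units.
e_bond_at_rest = bond_energy(r=1.5, r0=1.5, kb=300.0)         # zero at the ideal length
e_bond_stretched = bond_energy(r=1.6, r0=1.5, kb=300.0)       # rises quadratically
e_torsion = dihedral_energy(phi=0.0, k_phi=2.0, n=3, delta=0.0)
r_min = lj_minimum(A=1.0, B=1.0)
e_lj_min = lennard_jones(r_min, A=1.0, B=1.0)                  # equals -B**2 / (4 * A)
```

The Lennard-Jones term is strongly repulsive at short range (the $r^{-12}$ part) and weakly attractive further out (the $r^{-6}$ part), so it has a single shallow minimum.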
11 Molecular Dynamics - Energy Minimization
General optimization methods
- Iterative descent method
  - Change each atomic coordinate by a small descent step, in the direction of the force acting on the atom
  - Recalculate the potential energy from the new atomic coordinates
  - Recalculate the descent step direction from the new potential energy
  - Iterate this procedure, varying the descent step size as needed
  - Stop when a minimum in the potential energy is reached (cannot proceed in any direction without increasing the potential energy)
- Conjugate gradient method
  - Similar to the iterative descent method, BUT each new descent step direction is based on previous directions as well as the current force
  - Changes in direction are less abrupt
  - Convergence is faster
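The iterative descent procedure can be sketched on a toy two-coordinate energy surface. This is only an illustration under stated assumptions: the quadratic `toy_energy`, its gradient, the step-size rule (lengthen after an accepted step, halve after a rejected one), and the stopping tolerance are all invented for the example.

```python
def steepest_descent(energy, gradient, coords, step=0.1, gtol=1e-6, max_iter=100000):
    """Move along the force (negative gradient) until the gradient is nearly zero."""
    e = energy(coords)
    for _ in range(max_iter):
        g = gradient(coords)
        if max(abs(gi) for gi in g) < gtol:      # no direction lowers the energy appreciably
            break
        trial = [c - step * gi for c, gi in zip(coords, g)]
        e_trial = energy(trial)
        if e_trial < e:
            coords, e = trial, e_trial           # accept the step
            step *= 1.2                          # and try a slightly longer one next time
        else:
            step *= 0.5                          # overshoot: shorten the step and retry
    return coords, e


def toy_energy(c):
    """Hypothetical energy surface with its minimum at (1, -2)."""
    x, y = c
    return (x - 1.0) ** 2 + 2.0 * (y + 2.0) ** 2


def toy_gradient(c):
    x, y = c
    return [2.0 * (x - 1.0), 4.0 * (y + 2.0)]


minimum, e_min = steepest_descent(toy_energy, toy_gradient, [0.0, 0.0])
```

A conjugate gradient version would differ only in how the search direction is chosen: it would mix the previous direction into the current gradient rather than following the gradient alone.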
12 Molecular Dynamics - Energy Minimization
- Energy calculations are used to solve Newton's equation of motion, i.e., $F = ma = -\nabla E_{\text{empirical}}$
- These calculations yield an acceleration and velocity for each atom
- Very small time steps, about 1 femtosecond ($10^{-15}$ sec)
- To minimize energy, it is most common to use "simulated annealing" (SA)
  - "Heat" the molecule to get high thermal motion, which samples conformational space
  - Slowly "cool" to find a minimum energy, hopefully a global minimum
- SA will only move a structure a small distance from the starting point, perhaps 1-2 Å
13 Molecular Dynamics - Energy Minimization
- Computationally intensive
  - Requires 10,000s of energy evaluations and 1000s of steps of dynamics to minimize the energy of a medium-size structure
  - This can require hours of supercomputer time
- Difficult to correctly model solvent effects
  - The hydrophobic effect is important
  - Solvent
    - Bulk solvent model (continuum model)
    - Explicit solvent model: insert the model in a "box" of water, which adds thousands of additional atoms
- Energy minimization is often used to refine a model or structure
  - Particularly useful with a good initial structure, e.g., position of sidechain or
14 Homology-based modeling (Comparative modeling)
Prediction of the three-dimensional structure of a target protein from its amino acid sequence (primary structure), using a homologous (template) protein for which an X-ray or NMR structure is available.
Why a model:
- X-ray crystallography or NMR structures are unavailable or intractable
- The model provides a wealth of information on how the protein functions, down to the level of individual residue properties.
- This information can then be used for mutational studies or for drug design.
15 Some Applications of Comparative Modeling:
- Design mutants to test hypotheses about a protein's function
- Identify active sites and binding interfaces
- Model substrate specificity
- Protein-protein docking
- Effects of coding SNPs (Single Nucleotide Polymorphisms) and other naturally occurring polymorphisms on protein structure
16 Methods
- Homology-based modeling
  - Match sequence to known structure
  - Change sequence
  - Optimize with MD
- Fragment-based modeling
  - Match subsequences to structure fragments
  - Optimize with MD
- Threading
  - Environment-based profiles
  - Pseudo-energy fitting
17 Homology Modeling Flowchart
[Figure: homology modeling flowchart. Node labels:]
- Query protein sequence
- Sequence database search
- Structure database search
- No hits / Hits
- (Multiple) sequence alignment
- Identify structurally conserved regions
- Iterative search: PsiBlast/profiles
- Model core SCRs
- Threading
- Model loops
- Similar hits
- Fold recognition
- Model sidechains
- Secondary structure prediction
- Evaluate model(s)
- Energy minimization
18 Quality of Known structures
What is a good 3-dimensional structure?
- 6 Å resolution or so: secondary structure is often clear, particularly alpha helices
- Poorer than 3 Å resolution: one has many errors in side groups
- 2.5 Å or better: good, BUT loops or surface regions may still be disordered
  - Usually must be at least this good for successful homology modeling
- 2.0 Å or better: very good to excellent. The best structures are below 1.5 Å resolution. Portions may still be invisible.
- R-factor measures X-ray crystallographic error
  - R measures the difference between observed reflections and reflections predicted from the model
  - Should be close to or below 20%
- Temperature factor: lower is better
  - Measures thermal motion
  - Temperature factors for well-ordered residues are in the 1-15 range. Above 50 means the residue was invisible
- Main-chain torsion angles reflect the quality of the structure (sometimes)
  - Torsion angles are restrained in refinement.
19 Electron Density Maps
[Figures: two-dimensional map, three-dimensional map]
20 Known structures
- Crystallographic and NMR structures are models
- Models minimize the difference between observed and calculated data
  - Crystallography: diffraction intensities
  - NMR: coupling between atoms (distance restraints)
21 Protein Models: Stereo images
22 Cα trace
23 Protein Models: NMR Structural Ensemble
24 How good does structure need to be?
25 Homology Modeling Assumptions
- The overall 3-D structure of the target protein is similar to that of related proteins, and particularly the template structure.
- Regions of conserved sequence have similar structure.
  - Residues conserved throughout a family of proteins are the most structurally conserved.
  - Residues involved in biological activity have similar structure throughout the protein family.
- Loop regions (non-conserved residues) allow insertions and deletions without disrupting the core structure of the protein.
- Loop regions are flexible and therefore need not be constructed as strictly as the conserved regions, assuming that they play no role in biological activity. This doesn't apply to proteins whose surface loops play critical roles.
26 Requirements for Homology-based modeling
- The query: the amino acid sequence of the protein to be built
- The template: the high-resolution structure of a homologous protein (AKA reference)
Desirables for a homology project
- Additional sequences of related proteins (for multiple sequence alignment)
- Additional reference protein structures
27 Steps in Homology Modeling
- Identify reference/template structures: one or more (the more the better)
  - These will form the template for the target structure (model).
- Sequence alignment: the most important step, because errors made at this point cannot be fixed
  - Use the best alignment possible
  - Multiple alignments are usually better than pairwise alignments
  - Proteins with less than ~30% sequence identity with the reference can be problematic
- Map sequence onto template
  - Transfer the coordinates of structurally conserved regions (SCRs) from the template(s) to the target
  - Convert template side chains
  - Optimize sidechain orientations with a rotamer library
- Model variable regions: loops and side chains
  - Loop insertions: search of a high-resolution fragment database
  - Deletions: local minimizations may be sufficient
- Minimize free energy of model
  - Local: especially loop-hinge regions
  - Global: molecular dynamics/energy minimization
- Evaluate model
28 Locating and Aligning Homologs
- The modeling idea: extrapolate knowledge of related protein structures to a new homologous sequence
  - Can include both related sequences and related 3D structures
- Approach: alignment procedures and database searches already learned in this course
- Extend the search beyond a single sequence: multiple alignments, profile analysis, at least consensus sequences or regular expressions
- Motifs via the PROSITE database: regular expressions may be able to model some small regions, if not the entire protein
- Global vs local alignments: may be able to make separate models for independent domains and duplicated regions
29 Sequence Similarity and Alignment:
- Homology modeling is based on using similar structures
  - No similar structures = no model
  - Need sequence similarity across the whole sequence, not just in one part
- 40% amino acid identity or higher is best
- Below 25% is less useful, but examples of success exist at this level
- 20%-35% sequence identity is often referred to as the "twilight zone"
- Identify the template structure by a sequence-based search of structure database (PDB) sequences
  - FASTA or BLAST
- Multiple sequence comparison improves the sensitivity of the search and identifies highly conserved regions
  - Muscle, HMM, profile, ClustalW2, PSI-
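The identity thresholds above presuppose a way of computing percent identity from an alignment. The sketch below is one simple convention, assumed for illustration: identity is counted over aligned columns where neither sequence has a gap (other denominators are also in use). The function names, the example sequences, and the category labels in `modeling_outlook` are hypothetical, and the cutoffs are only the rough figures quoted on the slide.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over the columns where both sequences have a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def modeling_outlook(identity):
    """Rough label based on the slide's thresholds (an illustrative simplification)."""
    if identity >= 40.0:
        return "favorable"
    if identity >= 25.0:
        return "borderline"       # overlaps the 20%-35% twilight zone
    return "less useful"


# Made-up aligned sequences: 6 ungapped columns, 5 of them identical.
identity = percent_identity("ACDE-FG", "ACDQHFG")
outlook = modeling_outlook(identity)
```

Identity computed this way says nothing about coverage, so it should be read together with the requirement that similarity extend across the whole sequence.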
30 Modeling Structurally Conserved Regions
- Core regions must be examined for the effect of indels
- Sequence residues are copied into the positions of template residues in the three-dimensional structure
- When the template residue is bigger, some empty space is left
  - Nature abhors a vacuum
- When the template residue is smaller, there is steric conflict: some atoms are too close together, or maybe even interpenetrating
31 Sidechain Conflicts
Fixing conflicts
- Amino acid sidechains assume preferred positions (rotamers), which have been tabulated from known structures
- Computationally try all rotamers for the sidechains affected in the region of a conflict
- Not all problems can be fixed; some require backbone movement
- Alternative alignments may be desirable
32 Modeling Variable Regions (Loops)
- Search the structure database for loops with similar size and anchor points
- Ab initio
  - Use molecular dynamics/energy minimization to find a plausible (energetically reasonable) structure
  - Structure outside of the loop region is not allowed to move
  - Mainly used for very small loops and deletions where the endpoints are close
33 Models tend to stay close to template
[Figure: 1u5b/1qs0. Comparison of experimental model (1u5b) and model template. RMSD 1qs o1x. Model error by position, red = high error]
34 How good are models?
[Figure: cabcii-c4s tetrasaccharide complex. Pink: cabc1 (template). Gray: cabcii (model)]
Caption: active site groove seen from the nonreducing and reducing end of the octasaccharide, respectively. The octasaccharide is readily accommodated in the active site of cabci, but the access for the octasaccharide is constricted on the nonreducing end in cabcii.
Source: Recombinant Expression, Purification, and Biochemical Characterization of Chondroitinase ABC II from Proteus vulgaris, Prabhakar et al., J.Biol.Chem. 284, , 2009
35 Homology Modeling
Fragment assembly method (Rosetta)
- Start with known structures in the PDB
- Divide up into short fragments
  - 9 residue library
  - 3 residue library
- For the unknown protein, find the best 200 three- and nine-residue fragments at each position (sequence match)
- Start with the protein in a fully extended conformation (no steric conflicts)
- Energies
  - steric repulsion: vdw
  - environment (solvation): env
  - residue pair interactions: pair
  - strand pairing (hydrogen bonding): SS
  - strand arrangement in sheets: sheet
  - helix-strand packing: HS
  - radius of gyration (compactness): rg
  - Cβ density (compactness): cbeta
36 Homology Modeling
Fragment assembly method (Rosetta)
- Iterate 28,000 times
  - Choose a random 9 residue fragment in the model
  - Replace its torsion angles with one of the best from the list
  - Evaluate the energy, keep if better
- The energy function is a very approximate version of the MD energy function
  - Initially only the steric overlap energy is calculated (until all initial torsion angles are replaced)
  - Next 2,000 iterations: evaluate all energy terms except compactness, strand pairing weight = 0.3
  - Next 20,000 iterations: strand pairing weight = 1.0, compactness weight = 0.5
  - Last 6,000 iterations: full weights on energies
- Attempt to improve using 8,000 trials of the 3 residue fragment library
37 Homology Modeling
Fragment assembly method (Rosetta)
[Figure: predicted vs correct structure, CASP5 T0135 and T0171]
38 How good are models?
39 Modeling: Good or Bad?
- Proteins whose structure cannot be solved by NMR or X-ray crystallography can still be modeled
- Modeling takes only a few hours, but 3D structures often take months to years to solve experimentally
- Accuracy of models can be very good, nearly as good as crystal structures in the best case
- Can be good enough to generate lead compounds
- Models can be (need to be) experimentally tested:
  - NMR
  - In vitro mutagenesis
40 Some sources of errors in comparative models:
- Errors due to misalignments: the largest source of error, minimized by constructing multiple alignments
  - No amount of MD will fix these errors
- Errors in sidechain packing: as sequences diverge, the packing of sidechains in the protein core changes
  - Backbone movements accommodate sidechain changes
- Distortions and shifts in correctly aligned regions: in some correctly aligned regions, the template is locally different from the target
- Errors in regions without a template: segments of the target sequence that have no equivalent region in the template structure are the most difficult regions to model (insertions and loops)
  - If insertions are relatively short (less than 9 residues), some methods can correctly predict the conformation of the backbone
- Incorrect templates: this is a problem when distantly related proteins are used as templates
  - Difficult to distinguish between a model based on an incorrect template
41 Model Evaluation
- If it were easy to tell a correct model from an incorrect model, the modeling process would be easy. One would simply use the "correctness" criterion as the objective function.
- Unfortunately, there is no completely satisfactory approach.
Techniques for evaluation
- Model geometry
  - Bond lengths, bond angles, dihedral angles, van der Waals contacts, H bonds
  - Programs used to evaluate the models: VERIFY3D, PROSAII, HARMONY and ANOLEA, and many others
- Agreement with homologous sequences (multiple alignment, profile)
  - Conserved regions in core, variable regions at surface
- Structural templates (3D profiles)
- Pair potentials (pseudo-energies)
42 Model Quality
[Figure: model based on 1qs0 vs model based on 2o1x]
QMEAN score (higher is better), with components:
- torsion angles
- pairwise potential
- solvation
- secondary structure potential
- phi/psi agreement
- solvent accessibility agreement
43 Model Quality
Anolea: Atomic Non-Local Environment Assessment
- Distance-based mean force potential
[Figure: model based on 1qs0 vs model based on 2o1x]
44 Homology Modeling
Threading/Inverse Folding Methods
- Try to determine if a sequence is compatible with a known structure
  - Inverse folding: predict sequence from 3-D structure
  - Compare to folding: predict 3-D structure from sequence
  - Threading: imagine pulling the sequence through the known structure until a best match is obtained
- Threading approaches
  - Local environment methods
    - Characterize each sequence position according to its local three-dimensional environment (3D profile)
    - Simple to calculate match
    - Could allow flexibility on variable regions
  - Pseudo-energy methods (contact potential)
    - Optimize pairwise interactions between residues in 3D space
    - Difficult calculation
    - Ensures that residue-residue interactions approximate real proteins
45 Homology Modeling
Local Environment Methods
- Three-dimensional profile
- For each residue in the three-dimensional structure, look at the structure type and surrounding residues to infer the spectrum of allowed substitutions
  - Secondary structure: alpha, beta or coil
  - Solvent accessibility: buried, partially buried or accessible
  - Hydrogen bonding / sidechain polarity
  - 18 total states
- Preferred distributions of residues are calculated from known structures in the PDB
  - Probabilities for each of the 20 residues in each environment (observed frequencies are presumed to be optimal)
- Does not take conservation into account: conserved positions use the same distributions as unconserved ones
- Align to profile as discussed previously
46 Homology Modeling
Threading - Pseudo-energy Methods
Two approaches to threading: soft and hard threading (my terms)
- Soft threading: move the sequence along the template structure, assuming that the interacting residues are the ones in the template
  - Equivalent to the local environment method
  - Dynamic programming works
- Hard threading: move the sequence through the structure, with gaps, calculating all of the interacting pairs
  - Very time consuming (NP-complete)
47 Homology Modeling
Pseudo-energy methods (quasi-energy, statistical potential, empirical energy function, knowledge-based force field)
- The Boltzmann distribution relates probability to energy: $p_i = e^{-E_i/kT} / Z$
  - Z is the partition function, which normalizes the probabilities over all states in the system
- Frequencies at which residue pairs are seen in real structures can be converted to a pseudo-energy
- Calculate the energies for all residue pairs at all different separations
- The energy of any three-dimensional structure can then be calculated by summing up the energies of all the pairs at the observed distances
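Converting observed pair frequencies into pseudo-energies can be sketched by inverting the Boltzmann relation, E = -kT ln(observed / expected). The code below is a minimal illustration: the counts are made up, the distance bin label and function names are hypothetical, and the choice of reference ("expected") counts, which matters a great deal in real statistical potentials, is simply assumed.

```python
import math

KT = 0.592  # kcal/mol near 298 K; an assumed unit choice for the example


def pseudo_energy(observed, expected, kT=KT):
    """Inverse Boltzmann relation: E = -kT * ln(observed / expected)."""
    if observed <= 0 or expected <= 0:
        raise ValueError("counts must be positive")
    return -kT * math.log(observed / expected)


def structure_energy(pairs, table):
    """Sum the tabulated energies of every (residue, residue, distance bin) in a model."""
    return sum(table[key] for key in pairs)


# Made-up counts: key -> (observed in known structures, expected from a reference state).
counts = {
    ("LEU", "ILE", "4-6A"): (200, 100),   # seen twice as often as expected
    ("LYS", "LYS", "4-6A"): (50, 100),    # seen half as often as expected
    ("ALA", "GLY", "4-6A"): (100, 100),   # seen exactly as often as expected
}
table = {key: pseudo_energy(obs, exp) for key, (obs, exp) in counts.items()}

model_pairs = [("LEU", "ILE", "4-6A"), ("LYS", "LYS", "4-6A")]
total = structure_energy(model_pairs, table)
```

Pairs seen more often than expected get a negative (favorable) pseudo-energy, and pairs seen less often get a positive one. In this toy case the two contributions cancel.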
48 Homology Modeling
Pseudo-energy Methods (see also fig 13.6 in text)
49 Homology Modeling
Threading: can it find matches that sequence matching cannot?
- A is dihydrofolate reductase
  - Interacts to form homodimer
  - Contains catalytic site
- B is kinase SH3
  - Interacts with other proteins to make protein-protein interactions
  - Structural only
[Figure legend: DHFR - thick blue; Human survival motor protein - grey; E. coli biotin holoenzyme - magenta; Repressor KotB - green; HIV integrase - orange]
50 Example: Swiss-model
- Starting sequence: Medicago calcium-dependent protein kinase
- Contains a protein kinase domain and an EF-hand Ca binding domain
51 Example: Swiss-model
Four templates found
- 2vn: Human calcium/calmodulin dependent protein kinase
- 2qg: Cryptosporidium parvum calcium dependent protein kinase
- 3hx: Toxoplasma gondii CDPK1
- 2aao: Arabidopsis thaliana calcium dependent kinase, EF hand region
52 Example: Swiss-model
Alignment and structure assignment for each template (reference structure)
- Deletions after residues 238, 260, 345
- Insertion after 122,
53 Example: Swiss-model
- Deletions after residues 238, 260, 345
- Insertion after 122, 140
- Structurally conserved region
- Add loops
- Delete extra residues
- Rotamer optimization
- Energy minimization
54 Example: Swiss-model
Structure assessment
- Gromos MD
- Anolea stat. potential
55 Example: Swiss-model
Final model
56 Protein Analysis
Homologs - Fructose bis-phosphate aldolase
57 Protein Analysis
Homology vs Structural Similarity
- TIM barrel proteins
  - One of the most common protein folds (>900 examples)
  - Active site always at the C-terminal end of the beta-barrel
- Fructose 1,6-bisphosphate aldolase: homologs
- Triose phosphate isomerase: probably not a homolog
58 Protein Analysis
Structurally similar?
- Text, page 569: There are many cases where a protein shares no or little sequence homology and yet is a functional homolog.
- While these proteins share a beta-sandwich architecture, they are connected entirely differently. Are they homologs?
  - Polycystin 1 (polycystic kidney disease protein)
  - a cell surface glycoprotein
  - histone deposition protein
59 Protein Analysis
Structure Classifications
- SCOP: manual
- CATH: largely automatic
60 Protein Analysis
Structural Similarity
- Structural similarity is measured by overlap of corresponding residue coordinates
  - The most commonly used measure is the RMS coordinate difference (RMSD)
  - RMSD is very sensitive to outliers (car door effect)
- The problem is how to find which residues correspond
  - DALI / FSSP: matches secondary structure elements regardless of connectivity
  - CE (combinatorial extension): builds up from small matching pieces, according to connectivity
  - VAST: secondary structure orientation and connectivity
- Not clear which is best, and not clear how to evaluate significance, since completely unrelated structures are unavailable
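The sensitivity of RMSD to outliers can be seen in a small numerical sketch. The function below assumes the two coordinate sets are already superposed and that the residue correspondence is known (the hard part, as noted above, is not addressed here). The coordinates and names are invented for the example.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Ten made-up atoms on a line, in Angstroms.
reference = [(float(i), 0.0, 0.0) for i in range(10)]

# A model identical to the reference except for one atom displaced by 10 Angstroms.
model = list(reference)
model[9] = (9.0, 10.0, 0.0)

outlier_rmsd = rmsd(reference, model)   # sqrt(100 / 10), about 3.16 Angstroms
```

Nine of the ten atoms match exactly, yet the RMSD is over 3 Å, which is why a single swung-out segment (the "car door") can dominate the score.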
61 Protein Analysis
Protein Folds
62 Protein Analysis
The protein structure universe
[Figure: total and yearly counts]
- How many protein folds are there?
- Are there certain kinds of folds that are more stable?
- How do you detect structural similarity?
63 Protein Families
- Protein families: groups of homologous molecules
  - superfamily, family, subfamily classification introduced by Dayhoff
  - homeologous family
  - families are seen both across and within species
- Structural classes / folds: similar structures based on 3-dimensional coordinates
  - may not be homologous; not clear to what extent certain structures are preferred by chance
  - only recently becoming populated
- Domain
  - Sequence or structure based
  - Independently folding unit
- Families are important for information mapping because they give a guide to how much variation is expected between homologous proteins that maintain similar (or have different) function.
64 Protein Families
Dayhoff Protein Classification: hierarchical classification
- Folds: structural similarity
- Superfamilies: P < 10^-3
  - Highly probable homology
  - Superfamilies generally are entire sequences (homeomorphic family)
  - Newer concept is the homology domain: only part of the sequence
- Families: > 50% identical (~E < 10^-30)
  - Clear homology
  - Similar function
  - Substrates and function similar but not identical
- Subfamilies: > 80% identical (~E < 10^-80)
  - Identical function
  - Probably bind nearly identical substrates
65 Protein Families
Clusters of Orthologous Groups: COGs & KOGs
- genomes, 38 orders, 28 classes, 14 phyla (192,987 proteins)
- prokaryotic (COGs): 5666
- eukaryotic (KOGs): 4852
- Originally (1997), 3307 COGs were delineated by comparing protein sequences encoded in 43 complete genomes, representing 30 major phylogenetic lineages.
- Each COG consists of individual proteins or groups of paralogs from at least 3 lineages and thus corresponds to an ancient conserved domain
- % of the gene products from each of the complete bacterial and archaeal genomes and ~35% of those from the yeast Saccharomyces cerevisiae genome.
66 Protein Families
COGs
1. Perform the all-against-all protein sequence comparison.
2. Detect and collapse obvious paralogs, that is, proteins from the same genome that are more similar to each other than to any proteins from other species.
3. Detect triangles of mutually consistent, genome-specific best hits (BeTs), taking into account the paralogous groups detected at step 2.
4. Merge triangles with a common side to form COGs.
5. A case-by-case analysis of each COG. This analysis serves to eliminate false-positives and to identify groups that contain multidomain proteins by examining the pictorial representation of the BLAST search outputs. The sequences of detected multidomain proteins are split into single-domain segments and steps 1-4 are repeated with these sequences, which results in the assignment of individual domains to COGs in accordance with their distinct evolutionary affinities.
6. Examination of large COGs that include multiple members from all or several of the genomes using phylogenetic trees, cluster analysis and visual inspection of alignments; as a result, some of these groups are split into two or more smaller ones that are included in the final set of COGs.
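Steps 3 and 4 (find triangles of best hits, then merge triangles that share a side) can be sketched on a toy data set. This is a simplified illustration, not the published COG pipeline: it assumes every side of a triangle is a symmetric best hit, ignores paralog collapsing, and uses a hypothetical `best_hit` table in place of real BLAST output. All protein and genome names are made up.

```python
from itertools import combinations


def find_triangles(best_hit, genome_of):
    """Return trios of proteins from three genomes that are pairwise mutual best hits.

    best_hit[(protein, genome)] is the best-scoring hit of `protein` in `genome`.
    """
    def mutual(p, q):
        return (best_hit.get((p, genome_of[q])) == q
                and best_hit.get((q, genome_of[p])) == p)

    triangles = []
    for trio in combinations(sorted(genome_of), 3):
        three_genomes = len({genome_of[p] for p in trio}) == 3
        if three_genomes and all(mutual(p, q) for p, q in combinations(trio, 2)):
            triangles.append(frozenset(trio))
    return triangles


def merge_triangles(triangles):
    """Merge triangles that share a side (two proteins in common) into clusters."""
    groups = [[t] for t in triangles]            # each group is a list of triangles
    changed = True
    while changed:
        changed = False
        for i, j in combinations(range(len(groups)), 2):
            if any(len(a & b) >= 2 for a in groups[i] for b in groups[j]):
                groups[i].extend(groups[j])
                del groups[j]
                changed = True
                break                            # restart the scan after every merge
    return [set().union(*g) for g in groups]


# Toy input: one protein each from genomes A-D, plus an unrelated protein a2.
genome_of = {"a1": "A", "b1": "B", "c1": "C", "d1": "D", "a2": "A"}
best_hit = {
    ("a1", "B"): "b1", ("a1", "C"): "c1", ("a1", "D"): "d1",
    ("b1", "A"): "a1", ("b1", "C"): "c1", ("b1", "D"): "d1",
    ("c1", "A"): "a1", ("c1", "B"): "b1",
    ("d1", "A"): "a1", ("d1", "B"): "b1",
}
triangles = find_triangles(best_hit, genome_of)
clusters = merge_triangles(triangles)
```

Here c1 and d1 are not best hits of each other, but the triangles a1-b1-c1 and a1-b1-d1 share the side a1-b1, so all four proteins end up in one cluster.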
67 Protein Families
COGs & KOGs
- How well do COGs cover complete genomes?
- Phyletic patterns of COGs
- Phyletic patterns of KOGs
68 Protein Families
COGs
69 Protein Families
EggNOG: automatic COGs
- 630 genomes
  - 529 bacteria
  - 46 archaea
  - 55 eukarya
- 224,847 OGs
- 9724 extended versions of original COG and KOG
[Figure legend: green = function annotated, orange = unannotated, gray = no match]
70 Protein Families
Structural classifications
- SCOP
  - Heuristic classification according to traditional crystallographic ideas
  - Recently used as a standard for sequence comparisons
  - v1.75, June PDB Entries Domains.
- CATH
  - Systematic semi-automatic procedure with a more clearly defined process
  - Version 3.3.0, July ,625 PDB chains, 128,688 domains
71 Protein Families
SCOP: primarily manually curated according to traditional crystallographic ideas
- Family: clear evolutionary relationship
  - Generally, pairwise residue identities greater than 30%. In some cases, similar functions and structures provide definitive evidence of common descent in the absence of high sequence identity; for example, many globins form a family though some members have sequence identities of only 15%.
- Superfamily: probable common evolutionary origin
  - Low sequence identity, but structural and functional features suggest a common evolutionary origin. For example, actin, the ATPase domain of the heat shock protein, and hexokinase together form a superfamily.
- Fold: major structural similarity
  - Major secondary structures in the same arrangement and topology. Proteins with the same fold often have peripheral elements of secondary structure and turn regions that differ in size and conformation. Proteins with a common fold may not have a common evolutionary origin: the structural similarities could arise from physical-chemical properties of proteins that
72 Protein Families
SCOP - SCOP 1.75A statistics: PDB entries (released/updated prior to ) Domains. 1 Literature reference.
[Table columns: Class, Number of folds, Number of superfamilies, Number of families]
- a: All alpha proteins
- b: All beta proteins
- c: Alpha and beta proteins (a/b)
- d: Alpha and beta proteins (a+b)
- e: Multi-domain proteins (alpha and beta)
- f: Membrane and cell surface proteins and peptides
- g: Small proteins
- Totals
73 Protein Families
SCOP Class - All Alpha Proteins
- Globin-like (2) (Globins and Phycocyanins): core: 6 helices; folded leaf, partly opened
- Long alpha-hairpin (11): 2 helices; antiparallel hairpin, left-handed twist
- Cytochrome c (1): core: 3 helices; folded leaf, opened
- DNA-binding 3-helical bundle (10): core: 3 helices; bundle, closed or partly opened, right-handed twist; up-and-down
- Many more
74 Protein Families
CATH Classification v 3.5.0, September 2011
- CATH is more formally specified and less reliant on human intervention than SCOP
- CATH ,536 domains
- 2,626 superfamilies
- 51,334 PDB entries
75 Protein Families
CATH Classification
- Class: determined according to the secondary structure composition and packing within the structure. Assigned automatically using the method of Michie et al. (1996).
- Architecture: the overall shape of the domain structure as determined by the orientations of the secondary structures; ignores the connectivity between the secondary structures. Assigned manually.
- Topology: fold families at this level depend on both the overall shape and connectivity of the secondary structures. This is done using the structure comparison algorithm SSAP (Taylor & Orengo, 1989).
- Homologous Superfamily: similarities are identified first by sequence comparisons and subsequently by structure comparison using SSAP. Criteria:
  - Sequence identity >= 35%, 60% of larger structure equivalent to smaller
  - SSAP score >= 80.0 and sequence identity >= 20%, 60% of larger structure equivalent to smaller
  - SSAP score >= 80.0, 60% of larger structure equivalent to smaller, and domains have related functions
- Sequence Families: domains clustered in the same sequence families have sequence identities
76 Protein Families
2010 CATH Classification
77 Protein Docking: Finding binding sites. Proteins with unknown function: conserved surface areas; hydrophobic surface area; highly charged areas (mostly for nucleic acid binding); active sites are usually in pockets. Proteins with known partners (docking): rotate and translate in all possible orientations; use a scoring function to evaluate the match (charge, shape, hydrophobicity). How should you deal with flexibility of the protein (induced fit)?
78 Protein Docking: Conformational Search (text: Ch 14). Given two proteins with three-dimensional structures, how do they bind? Hold one fixed; rotate and translate the other. 3 angles, 10º increments = 23,000 positions. 3 translational parameters, 100 Å at 0.5 Å intervals = 8 x 10^6 positions. Total = 2 x 10^11 positions to consider. All docking methods use approximations. What is a good position? Electrostatic interactions; steric interactions; solvent effects.
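The size of this rigid-body search space can be checked with a few lines of arithmetic. The sketch below assumes that the roughly 23,000 orientations come from two angles sampled over 360º and one over 180º (as for Euler angles); that split is an assumption, since the slide only says "3 angles, 10º increments". The function name is hypothetical.

```python
def count_poses(angle_step_deg=10.0, box_len=100.0, grid_step=0.5):
    """Count rigid-body poses on a regular rotation/translation grid."""
    n_full = round(360 / angle_step_deg)   # samples for a 0-360 degree angle
    n_half = round(180 / angle_step_deg)   # samples for a 0-180 degree angle
    # Assumed Euler-angle sampling: two full-range angles, one half-range.
    rotations = n_full * n_full * n_half
    # Translations: the same number of grid points along x, y and z.
    per_axis = round(box_len / grid_step)
    translations = per_axis ** 3
    return rotations, translations, rotations * translations


rotations, translations, total = count_poses()
print(rotations, translations, total)
```

Under these assumptions the rotation count is 23,328, the translation count is 8,000,000, and the product is about 1.9 x 10^11, which is why exhaustive evaluation with an expensive energy function is impractical.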
79 Protein Docking: Search Methods. Monte Carlo (Metropolis) methods (most common): start in a random position; calculate approximate energy; make a random move; accept the move probabilistically based on the energy difference. Often merged with a genetic algorithm: consider many random starting positions (each is a genome); each random modification is a mutation; fitness is energy. Examples: Gold (see text), Autodock.
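The Metropolis steps listed on this slide can be sketched as follows. This is a toy illustration, not the algorithm used by Gold or Autodock: the pose is just a list of numbers, `toy_energy` is a made-up quadratic function standing in for an approximate docking energy, and all names and parameter values are hypothetical.

```python
import math
import random


def metropolis_search(energy, start, n_steps=5000, step_size=0.5, kT=1.0, seed=0):
    """Metropolis Monte Carlo over a pose vector; returns the best pose seen."""
    rng = random.Random(seed)
    current = list(start)
    e_current = energy(current)
    best, e_best = list(current), e_current
    for _ in range(n_steps):
        # Make a random move: perturb every coordinate of the pose.
        trial = [x + rng.uniform(-step_size, step_size) for x in current]
        e_trial = energy(trial)
        delta = e_trial - e_current
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-delta / kT).
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best


def toy_energy(pose):
    """Hypothetical energy surface with its minimum at (1.0, -2.0, 0.5)."""
    target = (1.0, -2.0, 0.5)
    return sum((p - t) ** 2 for p, t in zip(pose, target))


start_pose = [5.0, 5.0, 5.0]
best_pose, best_energy = metropolis_search(toy_energy, start_pose)
```

Occasionally accepting uphill moves is what lets the search escape local minima; a genetic-algorithm variant would run many such poses in parallel and treat each random move as a mutation.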
80 Protein Docking: Search Methods. Other methods: point complementarity; distance geometry; tabu search. CAPRI (Critical Assessment of PRediction of Interactions): docking contest (like CASP).
81 Protein Docking: Quality. Is it a good fit (scoring function)? MD energy models (force fields) from MD programs such as CHARMM, AMBER, GROMOS: time consuming to calculate. Approximate models, usually focusing on electrostatics and atomic overlap. Statistical potentials (pseudo-energies, knowledge-based scoring). Problems: both molecules can move to accommodate binding (induced fit); water: water in the binding site may be bound and act as a part of the molecule, or water may be released, resulting in an entropy increase (ΔG = ΔH - TΔS). Flexible docking allows molecules to move.
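An "approximate model" focusing on electrostatics and atomic overlap might look like the sketch below. It is a deliberately crude illustration, not the scoring function of any real docking program: atoms are hypothetical (x, y, z, charge) tuples, the clash distance, clash penalty and dielectric constant are arbitrary assumed values, and 332 is the usual Coulomb conversion factor for energies in kcal/mol with distances in Å and charges in electron units.

```python
import math


def approximate_score(receptor, ligand, clash_dist=3.0,
                      clash_penalty=10.0, dielectric=4.0):
    """Sum a Coulomb term and a flat overlap penalty over all atom pairs.

    receptor, ligand -- lists of (x, y, z, charge) tuples (hypothetical format)
    Lower (more negative) scores indicate a better fit.
    """
    score = 0.0
    for x1, y1, z1, q1 in receptor:
        for x2, y2, z2, q2 in ligand:
            r = math.dist((x1, y1, z1), (x2, y2, z2))
            if r < clash_dist:
                # Atomic overlap: a fixed penalty instead of a full
                # van der Waals repulsion term.
                score += clash_penalty
            else:
                # Coulomb interaction in a uniform dielectric.
                score += 332.0 * q1 * q2 / (dielectric * r)
    return score


receptor_atoms = [(0.0, 0.0, 0.0, 1.0)]
good_fit = approximate_score(receptor_atoms, [(4.0, 0.0, 0.0, -1.0)])
clash = approximate_score(receptor_atoms, [(1.0, 0.0, 0.0, -1.0)])
```

Here the opposite charges at 4 Å give a favorable (negative) score, while the same ligand atom placed 1 Å away is penalized for overlap. Note that this kind of score ignores the water and induced-fit problems listed on the slide.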
82 Protein Docking. Scoring functions are not that great. Example: trypsin/trypsin inhibitor. 2PTC: beta trypsin (structure with inhibitor, I); 1TPO: beta trypsin (structure without I). The bound structure is often significantly different from the free structure. Even when the binding site is correct, the conformation may still be wrong. Figure: 2PTC vs inhibitor.
83 Protein Docking 83 Biol47800/59500 Bioinformatics