Large-Scale Genomic Surveys
- Monica Lester
Transcription
1 Bioinformatics Subtopics: Fold Recognition; Secondary Structure Prediction; Docking & Drug Design; Protein Geometry; Structural Informatics; Homology Modeling; Sequence Alignment; Structure Classification; Gene Prediction; Function Classification; Database Design; Genome Annotation; E-literature; Expression Clustering; Large-Scale Genomic Surveys. (c) Mark Gerstein, 1999, Yale, bioinfo.mbb.yale.edu
2 Some Specific Informatics Tools of Bioinformatics
Databases:
- NCBI GenBank - Protein and DNA sequence
- NCBI Human Map - Human Genome Viewer
- NCBI Ensembl - Genome browsers for human, mouse, zebrafish, mosquito
- TIGR - The Institute for Genomic Research
- Swissprot - Protein Sequence and Function
- ProDom - Protein Domains
- Pfam - Protein domain families
- ProSite - Protein Sequence Motifs
- Protein Data Bank (PDB) - Coordinates for Protein 3D structures
- SCOP Database - Domain structures organized into evolutionary families
- HSSP - Domain database using Dali
- FlyBase
- WormBase
- PubMed / MedLine
Sequence Alignment Tools: BLAST; Clustal; MSAs; FASTA; PSI-Blast; Hidden Markov Models
3D Structure Alignments / Classifications: Dali; VAST; PRISM; CATH; SCOP
3 Some Specific Informatics Tools of Bioinformatics
Databases:
- NCBI GenBank - Protein and DNA sequence
- NCBI Human Map - Human Genome Viewer
- NCBI Ensembl - Genome browsers for human, mouse, zebrafish, mosquito
- TIGR - The Institute for Genomic Research
- Swissprot - Protein Sequence and Function
- ProDom - Protein Domains
- Pfam - Protein domain families
- ProSite - Protein Sequence Motifs
- Protein Data Bank (PDB) - Coordinates for Protein 3D structures
- SCOP Database - Domain structures organized into evolutionary families
- HSSP - Domain database using Dali
- CATH Database
- FlyBase
- WormBase
- PubMed / MedLine
Sequence Alignment Tools: BLAST; Clustal; MSAs; FASTA; PSI-Blast; Hidden Markov Models
3D Structure Alignments / Classifications: Dali; CATH; SCOP; VAST; PRISM
4 Dynamic Programming Algorithm: Alternate Tracebacks Correspond to Alternative Alignments
A B C - N Y R Q C L C R - P M
A Y C Y N - R - C K C R B P
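The point of this slide, that ties in the dynamic-programming matrix produce several equally good tracebacks, can be illustrated with a minimal Needleman-Wunsch-style sketch. The function name `global_alignments` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration; the slide does not state a scoring scheme.

```python
def global_alignments(x, y, match=1, mismatch=-1, gap=-1):
    """Return (best_score, list of all co-optimal global alignments).

    Scoring values are illustrative assumptions, not taken from the slide.
    """
    n, m = len(x), len(y)
    # score[i][j] = best score for aligning x[:i] with y[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    def tracebacks(i, j):
        """Yield every alignment whose path reaches cell (i, j) optimally."""
        if i == 0 and j == 0:
            yield "", ""
            return
        if i > 0 and j > 0:
            sub = match if x[i - 1] == y[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                for a, b in tracebacks(i - 1, j - 1):
                    yield a + x[i - 1], b + y[j - 1]
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            for a, b in tracebacks(i - 1, j):
                yield a + x[i - 1], b + "-"
        if j > 0 and score[i][j] == score[i][j - 1] + gap:
            for a, b in tracebacks(i, j - 1):
                yield a + "-", b + y[j - 1]

    return score[n][m], list(tracebacks(n, m))
```

Each branch taken in `tracebacks` corresponds to one predecessor cell that achieves the optimum, so a cell with two optimal predecessors splits into two alignments. The number of co-optimal alignments can grow very quickly with sequence length, so full enumeration is only practical for short inputs.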
5 Sequence Similarity May Miss Functional Homologies Which Can Be Detected by 3D Structural Analysis
[Figure: % Sequence Identity vs. Residues Aligned, with regions labeled Homologous 3D Structure, Twilight zone, and Non-homologous 3D Structure. Adapted from Chris Sander]
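The quantity on the vertical axis of that plot can be computed from any pairwise alignment. The sketch below is one possible definition, assuming identity is counted over columns where neither sequence has a gap; other denominators (for example the length of the shorter sequence) are also in use, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free alignment columns with identical residues.

    Assumption: the denominator is the number of columns where both
    sequences have a residue; the slide does not specify a convention.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)
```

The number of gap-free columns (`len(pairs)`) is the "residues aligned" value on the horizontal axis, which is why short alignments need a higher identity to be convincing.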
6 Structural Validation of Homology: Adenylate Kinase vs. Guanylate Kinase, 19% Seq ID, Z = 12.2
7 CspA; Asp tRNA Synthetase; Staphylococcal Nuclease; CspB; Gene 5 ssDNA Binding Protein; Topoisomerase I
8 Protein Domains Independent Folding Units residues Mean size residues Alpha folds; Beta Folds; Alpha+Beta Folds; Alpha/Beta Folds
9 COG 272, BRCT family P. Bork et al
10 Cadherin Proteins in Caenorhabditis elegans; Cadherin Proteins in Drosophila melanogaster
[Figure: domain architectures of the cadherin proteins of both organisms. Legend: Signal peptide; Cadherin; EGF; EGF_CA; Laminin G; Transmembrane Helix; 7 Pass Transmembrane Domain; HormR; GPS; Merge Position; Classic cytoplasmic domain; Tyrosine Kinase cytoplasmic domain; Type 1 Cytoplasmic domain; Type 2 Cytoplasmic domain; Other Cytoplasmic domain. Courtesy of C. Chothia]
11 Principal Protein Fold Classes: All alpha; All beta; alpha + beta; alpha / beta
12 Classification of Protein Folds: SCOP; CATH; DALI / FSSP
13 Most proteins in biology have been produced by the duplication, divergence and recombination of the members of a small number of protein families. courtesy of C. Chothia
14 Domain Combinations in Genome Sequences. Average Domain Size: 170 residues. In bacteria, close to 1/3 of proteins consist of one domain and 2/3 consist of two or more domains. In eukaryotes, close to 1/4 of proteins consist of one domain and 3/4 consist of two or more domains. Courtesy of C. Chothia
15 SCOP - Protein Fold Hierarchy: Manually Curated Database of Domain Structures
Class - 5
Fold - ~500
Superfamily - ~700
Family - ~1000
Family: domains with common evolutionary origin
Homology: derived by evolutionary divergence
16 Five Principal Fold Classes: all α folds; all β folds; α + β folds; α / β folds; small irregular folds
17 SCOP: the Structural Classification Of Proteins database. This contains all proteins, and protein domains, of known structure, classified in terms of their structure and evolutionary relationships.
SUPERFAMILY: This database contains (a) hidden Markov models (HMMs) of all the proteins and protein domains in SCOP, and (b) a list of the matches made by these HMMs to the sequences of 56 genomes, classified by family. Courtesy of C. Chothia
18 SUPERFAMILY matches to genome sequences
[Figure: Genes plotted per Genome, one entry per genome abbreviation (hs, at, ce, dm, ...). Courtesy of C. Chothia]
19 SUPERFAMILY Results for Buchnera and Human Genome Sequences
Quantities compared (Buchnera vs. Humans): Number of sequences; Sequences matched by SUPERFAMILY; Coverage of genome (61% vs. 41%); Number of matched domains; Number of families; Mean family size; Number of large families that form half the matched domains. Courtesy of C. Chothia
20 SUPERFAMILY Results for Buchnera and Human Genome Sequences: Top Five Domain Families
Buchnera: P-loop containing nucleotide triphosphate hydrolases; Nucleic acid binding proteins; NAD-binding Rossmann domains; Nucleotidylyl transferases; Class II aaRS synthetases
Humans: Classic zinc fingers; Immunoglobulin superfamily; P-loop containing nucleotide triphosphate hydrolases; EGF/Laminin; Cadherin
Courtesy of C. Chothia
21 Eukaryotes
[Figure over the genomes sc, at, dm, ce, hs. Legend: All: 381 families; at+dm+ce+hs: 56 families; dm+ce+hs: 45 families; Other families. Courtesy of C. Chothia]
22 Bacteria
[Figure over bacterial genomes, one entry per genome abbreviation (mk, pa, eo, ec, ...). Legend: Total; EC +B. Axis: Genome. Courtesy of C. Chothia]
23 CATH Protein Domain Database: Partially Automatic Fold Classification
CATH is a hierarchical classification of protein domain structures, which clusters proteins at four major levels: Class (C), Architecture (A), Topology (T) and Homologous superfamily (H).
Orengo, C.A., Michie, A.D., Jones, S., Jones, D.T., Swindells, M.B., and Thornton, J.M. (1997) CATH - A Hierarchic Classification of Protein Domain Structures. Structure. Vol 5. No 8. p
Pearl, F.M.G., Lee, D., Bray, J.E., Sillitoe, I., Todd, A.E., Harrison, A.P., Thornton, J.M. and Orengo, C.A. (2000) Assigning genomic sequences to CATH. Nucleic Acids Research. Vol 28. No
24 CATH Protein Domain Database Partially Automatic Fold Classification Class, derived from secondary structure content, is assigned for more than 90% of protein structures automatically. Architecture, which describes the gross orientation of secondary structures, independent of connectivities, is currently assigned manually. The topology level clusters structures according to their topological connections and numbers of secondary structures. The homologous superfamilies cluster proteins with highly similar structures and functions. The assignments of structures to topology families and homologous superfamilies are made by sequence and structure comparisons.
25 Representations of Protein Structures: a - full atom; b, c - strands / helices; d - topology diagrams
26 Structural Alignment of Two Globins
27 Structure-based Sequence Alignments
Alignment of Individual Structures; Fusing into a Single Fold Template
Hb VLPADKTNVKAAWGKVGAHAGEYGAEALERMFLFPTTKTYFPHF-DL-----HGAQVKGHGKKVADALTNAV
Mb VLEGEWQLVLHVWAKVEADVAGHGQDILIRLFKHPETLEKFDRFKHLKTEAEMKAEDLKKHGVTVLTALGAIL
Hb AHVD-DMPNALALDLHAHKLRVDPVNFKLLHCLLVTLAAHLPAEFTPAVHALDKFLAVTVLTKYR
Mb KK-KGHHEAELKPLAQHATKHKIPIKYLEFIEAIIHVLHRHPGDFGADAQGAMNKALELFRKDIAAKYKELGYQG
Elements: Domain definitions; Aligned structures, collecting together Non-homologous Sequences; Core annotation
Previous work: Remington, Matthews 80; Taylor, Orengo 89, 94; Artymiuk, Rice, Willett 89; Sali, Blundell 90; Vriend, Sander 91; Russell, Barton 92; Holm, Sander 93; Godzik, Skolnick 94; Gibrat, Madej, Bryant 96; Falicov, F Cohen 96; Feng, Sippl 96; G Cohen 97; Singh & Brutlag
[Figure: structure-based multiple sequence alignment of the globin families around 2hhb, 1ecd and 1mbd, with Structure, Sequence and Core annotations and a Consensus Profile row]
28 Some Similarities are Readily Apparent, others are more Subtle
Easy: Globins, 125 res., ~1.5 Å
Tricky: Ig C & V, 85 res., ~3 Å
Very Subtle: G3P-dehydrogenase, C-term. Domain, >5 Å
29 Automatically Comparing Protein Structures
Given 2 Structures (A & B), 2 Basic Comparison Operations:
1 Find an Alignment between A and B based on their 3D coordinates
2 Given an alignment, optimally SUPERIMPOSE A onto B: Find Best R & T to move A onto B
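The second operation, finding the rotation R and translation T that best move A onto B for a given alignment, can be sketched in pure Python. The slide does not name a method; this sketch assumes a least-squares fit using Horn's unit-quaternion formulation, with the leading eigenvector found by shifted power iteration. The function names and the iteration count are illustrative assumptions, and the point sets are assumed to be already paired by an alignment and not degenerate (for example, not all collinear).

```python
import math

def centroid(points):
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))

def superimpose(a_points, b_points, iterations=1000):
    """Return (R, t) minimizing sum |R a_i + t - b_i|^2 over paired points."""
    ca, cb = centroid(a_points), centroid(b_points)
    a = [[p[k] - ca[k] for k in range(3)] for p in a_points]
    b = [[p[k] - cb[k] for k in range(3)] for p in b_points]
    # Cross-covariance S[x][y] = sum_i a_i[x] * b_i[y]
    S = [[sum(p[x] * q[y] for p, q in zip(a, b)) for y in range(3)]
         for x in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    # Horn's symmetric 4x4 matrix; its top eigenvector is the rotation quaternion.
    N = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # Shift by a Gershgorin bound so the largest eigenvalue dominates in magnitude.
    shift = max(sum(abs(v) for v in row) for row in N) + 1e-12
    q = [1.0, 0.5, 0.25, 0.125]  # arbitrary non-symmetric start vector
    for _ in range(iterations):
        q = [sum(N[r][c] * q[c] for c in range(4)) + shift * q[r]
             for r in range(4)]
        norm = math.sqrt(sum(v * v for v in q))
        q = [v / norm for v in q]
    q0, qx, qy, qz = q
    R = [
        [q0*q0 + qx*qx - qy*qy - qz*qz, 2*(qx*qy - q0*qz), 2*(qx*qz + q0*qy)],
        [2*(qy*qx + q0*qz), q0*q0 - qx*qx + qy*qy - qz*qz, 2*(qy*qz - q0*qx)],
        [2*(qz*qx - q0*qy), 2*(qz*qy + q0*qx), q0*q0 - qx*qx - qy*qy + qz*qz],
    ]
    # Translation carries the rotated centroid of A onto the centroid of B.
    t = [cb[r] - sum(R[r][c] * ca[c] for c in range(3)) for r in range(3)]
    return R, t

def transform(points, R, t):
    return [tuple(sum(R[r][c] * p[c] for c in range(3)) + t[r]
                  for r in range(3)) for p in points]

def rmsd(p, q):
    return math.sqrt(sum(math.dist(u, v) ** 2 for u, v in zip(p, q)) / len(p))
```

After `R, t = superimpose(A, B)`, the value `rmsd(transform(A, R, t), B)` is the residual that structure-comparison slides typically quote in Å. A production implementation would more likely use a linear-algebra library for the eigenvector step; power iteration is used here only to keep the sketch dependency-free.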
30 Distance Matrices Provide a 2D Representation of the 3D Structure
31 Explain Concept of Distance Matrix on Blackboard
N x N distance matrix: Antiparallel beta strands; Parallel beta strands; Helices
N dimensional space
Metric matrix: $M_{ij} = D_{ij}^2 - D_{i0}^2 - D_{j0}^2$
M Eigenvectors (M = 3 for 3D structure)
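As a concrete illustration of the N x N distance matrix, the sketch below builds one from a list of 3D coordinates (for a protein, typically one Cα position per residue). The function name is an assumption, and `math.dist` requires Python 3.8 or later.

```python
import math

def distance_matrix(coords):
    """Return the symmetric N x N matrix of pairwise Euclidean distances.

    coords: sequence of (x, y, z) tuples, e.g. one C-alpha atom per residue.
    """
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)]
            for i in range(n)]
```

Because the matrix depends only on internal distances, it is unchanged by rotating or translating the structure, which is what makes it a convenient 2D representation. In such a matrix, helices appear as a band of short distances close to the main diagonal, parallel strand pairs as stripes parallel to the diagonal, and antiparallel strand pairs as stripes perpendicular to it.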
32 DALI: Protein Structure Comparison by Alignment of Distance Matrices
L. Holm and C. Sander, J. Mol. Biol. 233: 123 (1993)
- Generate a Cα-Cα distance matrix for each protein, A and B.
- Decompose each matrix into elementary contact patterns, e.g. hexapeptide-hexapeptide submatrices.
- Systematically compare all elementary contact patterns in the 2 distance matrices; similar contact patterns are stored in a pair list.
- Assemble pairs of contact patterns into larger consistent sets of pairs (alignments), maximizing the similarity score between these local structures.
- A Monte-Carlo algorithm is used to deal with the combinatorial complexity of building up alignments from contact patterns.
Dali Z score: number of standard deviations away from the mean pairwise similarity value
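The Z score definition on this slide amounts to standardizing a similarity score against a background distribution. The sketch below shows only that generic calculation; how Dali itself chooses and normalizes its background is not described on the slide and is not reproduced here. The function name and the use of the population standard deviation are assumptions.

```python
import statistics

def z_score(score, background_scores):
    """Number of standard deviations `score` lies above the background mean.

    Assumption: background_scores is a sample of pairwise similarity values
    and the population standard deviation is the appropriate spread.
    """
    mean = statistics.fmean(background_scores)
    sd = statistics.pstdev(background_scores)
    if sd == 0:
        raise ValueError("background scores have zero spread")
    return (score - mean) / sd
```

Read this way, the Z = 12.2 quoted earlier for adenylate kinase and guanylate kinase means the structural similarity lies about twelve standard deviations above the mean, despite only 19% sequence identity.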
35 Dali Domain Dictionary
Dietmann, Park, Notredame, Heger, Lappe, and Holm, Nucleic Acids Res. 29: 55-57 (2001)
Dali Domain Dictionary is a numerical taxonomy of all known domain structures in the PDB
Evolves from Dali / FSSP Database: Holm & Sander, Nucl. Acid Res. 25: (1997)
Dali Domain Dictionary, Sept ,532 PDB entries; 17,101 protein chains; 5 supersecondary structure motifs (attractors); 1375 fold types; 2582 functional families; 3724 domain sequence families
36 Explain Concept of Distance Matrix on Blackboard
N x N distance matrix
N dimensional space
Metric matrix: $M_{ij} = D_{ij}^2 - D_{i0}^2 - D_{j0}^2$
Eigenvectors of metric matrix
Principal component analysis
37 A Global Representation of Protein Fold Space
Hou, Sims, Zhang, Kim, PNAS 100: (2003)
Database of 498 SCOP Folds or Superfamilies
The overall pair-wise comparisons of 498 folds lead to a 498 x 498 matrix of similarity scores $S_{ij}$, where $S_{ij}$ is the alignment score between the ith and jth folds. An appropriate method for handling such data matrices as a whole is metric matrix distance geometry. We first convert the similarity score matrix $[S_{ij}]$ to a distance matrix $[D_{ij}]$ by using $D_{ij} = S_{max} - S_{ij}$, where $S_{max}$ is the maximum similarity score among all pairs of folds. We then transform the distance matrix to a metric (or Gram) matrix $[M_{ij}]$ by using $M_{ij} = D_{ij}^2 - D_{i0}^2 - D_{j0}^2$, where $D_{i0}$ is the distance between the ith fold and the geometric centroid of all N = 498 folds. The eigenvalues of the metric matrix define an orthogonal system of axes, called factors. These axes pass through the geometric centroid of the points representing all observed folds and correspond to a decreasing order of the amount of information each factor represents.
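The pipeline described on this slide (similarity scores, then distances, then a metric matrix, then its leading axes) can be sketched in pure Python. Several points are assumptions rather than statements from the slide: the function names; setting the diagonal of the distance matrix to zero; computing $D_{i0}^2$ from the pairwise distances with the standard distance-geometry identity $D_{i0}^2 = \frac{1}{N}\sum_j D_{ij}^2 - \frac{1}{N^2}\sum_{j<k} D_{jk}^2$; and using the conventional Gram form $G_{ij} = (D_{i0}^2 + D_{j0}^2 - D_{ij}^2)/2$. That conventional form equals $-\frac{1}{2}$ times the expression as transcribed on the slide, so both have the same eigenvectors.

```python
import math

def similarity_to_distance(S):
    """D_ij = S_max - S_ij over distinct pairs; diagonal set to 0 (assumption)."""
    n = len(S)
    s_max = max(S[i][j] for i in range(n) for j in range(n) if i != j)
    return [[0.0 if i == j else s_max - S[i][j] for j in range(n)]
            for i in range(n)]

def gram_matrix(D):
    """Conventional metric (Gram) matrix relative to the centroid."""
    n = len(D)
    pair_term = sum(D[j][k] ** 2 for j in range(n)
                    for k in range(j + 1, n)) / n ** 2
    # Squared distance of each point to the centroid, from distances alone.
    d0_sq = [sum(D[i][j] ** 2 for j in range(n)) / n - pair_term
             for i in range(n)]
    return [[(d0_sq[i] + d0_sq[j] - D[i][j] ** 2) / 2 for j in range(n)]
            for i in range(n)]

def top_factor(G, iterations=500):
    """Leading eigenvalue of G and point coordinates along that axis.

    Uses plain power iteration, which assumes the largest-magnitude
    eigenvalue is positive (true when the distances are Euclidean).
    """
    n = len(G)
    v = [float(i + 1) for i in range(n)]  # non-constant start vector
    for _ in range(iterations):
        w = [sum(G[r][c] * v[c] for c in range(n)) for r in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("start vector lies in the null space of G")
        v = [x / norm for x in w]
    eigenvalue = sum(v[r] * sum(G[r][c] * v[c] for c in range(n))
                     for r in range(n))
    scale = math.sqrt(max(eigenvalue, 0.0))
    return eigenvalue, [scale * x for x in v]
```

For three points on a line at positions 0, 1 and 3, `top_factor(gram_matrix(D))` recovers their positions relative to the centroid, up to an overall sign. Further factors could be obtained by deflating G with each eigenvector found, which is omitted here for brevity.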
38 A Global Representation of Protein Fold Space
Hou, Sims, Zhang, Kim, PNAS 100: (2003)
More informationSCOP. all-β class. all-α class, 3 different folds. T4 endonuclease V. 4-helical cytokines. Globin-like
SCOP all-β class 4-helical cytokines T4 endonuclease V all-α class, 3 different folds Globin-like TIM-barrel fold α/β class Profilin-like fold α+β class http://scop.mrc-lmb.cam.ac.uk/scop CATH Class, Architecture,
More informationGetting To Know Your Protein
Getting To Know Your Protein Comparative Protein Analysis: Part III. Protein Structure Prediction and Comparison Robert Latek, PhD Sr. Bioinformatics Scientist Whitehead Institute for Biomedical Research
More informationSUPPLEMENTARY INFORMATION
Supplementary information S1 (box). Supplementary Methods description. Prokaryotic Genome Database Archaeal and bacterial genome sequences were downloaded from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/)
More informationBuilding a Homology Model of the Transmembrane Domain of the Human Glycine α-1 Receptor
Building a Homology Model of the Transmembrane Domain of the Human Glycine α-1 Receptor Presented by Stephanie Lee Research Mentor: Dr. Rob Coalson Glycine Alpha 1 Receptor (GlyRa1) Member of the superfamily
More informationFoldMiner: Structural motif discovery using an improved superposition algorithm
FoldMiner: Structural motif discovery using an improved superposition algorithm JESSICA SHAPIRO 1 AND DOUGLAS BRUTLAG 1,2 1 Biophysics Program and 2 Department of Biochemistry, Stanford University, Stanford,
More informationGrouping of amino acids and recognition of protein structurally conserved regions by reduced alphabets of amino acids
Science in China Series C: Life Sciences 2007 Science in China Press Springer-Verlag Grouping of amino acids and recognition of protein structurally conserved regions by reduced alphabets of amino acids
More informationPage 1. References. Hidden Markov models and multiple sequence alignment. Markov chains. Probability review. Example. Markovian sequence
Page Hidden Markov models and multiple sequence alignment Russ B Altman BMI 4 CS 74 Some slides borrowed from Scott C Schmidler (BMI graduate student) References Bioinformatics Classic: Krogh et al (994)
More informationStatistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences
Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD Department of Computer Science University of Missouri 2008 Free for Academic
More informationA General Model for Amino Acid Interaction Networks
Author manuscript, published in "N/P" A General Model for Amino Acid Interaction Networks Omar GACI and Stefan BALEV hal-43269, version - Nov 29 Abstract In this paper we introduce the notion of protein
More informationHidden Markov Models
Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training
More informationHomology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB
Homology Modeling (Comparative Structure Modeling) Aims of Structural Genomics High-throughput 3D structure determination and analysis To determine or predict the 3D structures of all the proteins encoded
More informationIMPORTANCE OF SECONDARY STRUCTURE ELEMENTS FOR PREDICTION OF GO ANNOTATIONS
IMPORTANCE OF SECONDARY STRUCTURE ELEMENTS FOR PREDICTION OF GO ANNOTATIONS Aslı Filiz 1, Eser Aygün 2, Özlem Keskin 3 and Zehra Cataltepe 2 1 Informatics Institute and 2 Computer Engineering Department,
More informationSUPPLEMENTARY INFORMATION
doi:10.1038/nature12890 Supplementary Table 1 Summary of protein components in the 39S subunit model. MW, molecular weight; aa amino acids; RP, ribosomal protein. Protein* MRP size MW (kda) Sequence accession
More informationContact map guided ab initio structure prediction
Contact map guided ab initio structure prediction S M Golam Mortuza Postdoctoral Research Fellow I-TASSER Workshop 2017 North Carolina A&T State University, Greensboro, NC Outline Ab initio structure prediction:
More informationA NEW ALGORITHM FOR THE ALIGNMENT OF MULTIPLE PROTEIN STRUCTURES USING MONTE CARLO OPTIMIZATION
A NEW ALGORITHM FOR THE ALIGNMENT OF MULTIPLE PROTEIN STRUCTURES USING MONTE CARLO OPTIMIZATION C. GUDA, E. D. SCHEEFF, P. E. BOURNE 1,2, I. N. SHINDYALOV San Diego Supercomputer Center, University of
More informationSequence analysis and comparison
The aim with sequence identification: Sequence analysis and comparison Marjolein Thunnissen Lund September 2012 Is there any known protein sequence that is homologous to mine? Are there any other species
More informationEnsembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are:
Comparative genomics and proteomics Species available Ensembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are: Vertebrates: human, chimpanzee, mouse, rat,
More informationComparing Protein Structures. Why?
7.91 Amy Keating Comparing Protein Structures Why? detect evolutionary relationships identify recurring motifs detect structure/function relationships predict function assess predicted structures classify
More informationSupporting Online Material for
www.sciencemag.org/cgi/content/full/309/5742/1868/dc1 Supporting Online Material for Toward High-Resolution de Novo Structure Prediction for Small Proteins Philip Bradley, Kira M. S. Misura, David Baker*
More informationD Dobbs ISU - BCB 444/544X 1
11/7/05 Protein Structure: Classification, Databases, Visualization Announcements BCB 544 Projects - Important Dates: Nov 2 Wed noon - Project proposals due to David/Drena Nov 4 Fri PM - Approvals/responses
More informationStatistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics
Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics Jianlin Cheng, PhD Department of Computer Science University of Missouri, Columbia
More informationProtein Structures. Sequences of amino acid residues 20 different amino acids. Quaternary. Primary. Tertiary. Secondary. 10/8/2002 Lecture 12 1
Protein Structures Sequences of amino acid residues 20 different amino acids Primary Secondary Tertiary Quaternary 10/8/2002 Lecture 12 1 Angles φ and ψ in the polypeptide chain 10/8/2002 Lecture 12 2
More informationCAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan
CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs15.html Describing & Modeling Patterns
More informationMotifs, Profiles and Domains. Michael Tress Protein Design Group Centro Nacional de Biotecnología, CSIC
Motifs, Profiles and Domains Michael Tress Protein Design Group Centro Nacional de Biotecnología, CSIC Comparing Two Proteins Sequence Alignment Determining the pattern of evolution and identifying conserved
More informationBioinformatics: Secondary Structure Prediction
Bioinformatics: Secondary Structure Prediction Prof. David Jones d.jones@cs.ucl.ac.uk LMLSTQNPALLKRNIIYWNNVALLWEAGSD The greatest unsolved problem in molecular biology:the Protein Folding Problem? Entries
More informationBIOINFORMATICS LAB AP BIOLOGY
BIOINFORMATICS LAB AP BIOLOGY Bioinformatics is the science of collecting and analyzing complex biological data. Bioinformatics combines computer science, statistics and biology to allow scientists to
More informationProtein function prediction based on sequence analysis
Performing sequence searches Post-Blast analysis, Using profiles and pattern-matching Protein function prediction based on sequence analysis Slides from a lecture on MOL204 - Applied Bioinformatics 18-Oct-2005
More informationAlpha-helical Topology and Tertiary Structure Prediction of Globular Proteins Scott R. McAllister Christodoulos A. Floudas Princeton University
Alpha-helical Topology and Tertiary Structure Prediction of Globular Proteins Scott R. McAllister Christodoulos A. Floudas Princeton University Department of Chemical Engineering Program of Applied and
More information