Large-Scale Genomic Surveys

1 Bioinformatics Subtopics: Fold Recognition; Secondary Structure Prediction; Docking & Drug Design; Protein Geometry; Structural Informatics; Homology Modeling; Sequence Alignment; Structure Classification; Gene Prediction; Function Classification; Database Design; Genome Annotation; E-literature; Expression Clustering; Large-Scale Genomic Surveys. (c) Mark Gerstein, 1999, Yale, bioinfo.mbb.yale.edu

2 Databases: Some Specific Informatics Tools of Bioinformatics
NCBI GenBank - Protein and DNA sequence
NCBI Human Map - Human Genome Viewer
NCBI Ensembl - Genome browsers for human, mouse, zebra fish, mosquito
TIGR - The Institute for Genome Research
SwissProt - Protein Sequence and Function
ProDom - Protein Domains
Pfam - Protein domain families
ProSite - Protein Sequence Motifs
Protein Data Base (PDB) - Coordinates for Protein 3D structures
SCOP Database - Domain structures organized into evolutionary families
HSSP - Domain database using Dali
FlyBase
WormBase
PubMed / MedLine
Sequence Alignment Tools: BLAST, Clustal, MSAs, FASTA, PSI-Blast, Hidden Markov Models
3D Structure Alignments / Classifications: Dali, VAST, PRISM, CATH, SCOP

3 Databases: Some Specific Informatics Tools of Bioinformatics
NCBI GenBank - Protein and DNA sequence
NCBI Human Map - Human Genome Viewer
NCBI Ensembl - Genome browsers for human, mouse, zebra fish, mosquito
TIGR - The Institute for Genome Research
SwissProt - Protein Sequence and Function
ProDom - Protein Domains
Pfam - Protein domain families
ProSite - Protein Sequence Motifs
Protein Data Base (PDB) - Coordinates for Protein 3D structures
SCOP Database - Domain structures organized into evolutionary families
HSSP - Domain database using Dali
CATH Database
FlyBase
WormBase
PubMed / MedLine
Sequence Alignment Tools: BLAST, Clustal, MSAs, FASTA, PSI-Blast, Hidden Markov Models
3D Structure Alignments / Classifications: Dali, CATH, SCOP, VAST, PRISM

4 Dynamic Programming Algorithm: Alternate Tracebacks Correspond to Alternative Alignments
A B C - N Y R Q C L C R - P M
A Y C Y N - R - C K C R B P
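The point of this slide can be sketched in a few lines of Python. The function below is a minimal illustration, not the lecture's own code: it fills a global-alignment (Needleman-Wunsch style) score matrix and then follows every traceback path that reproduces the optimal score, so each returned pair is one alternative alignment. The scoring values (match 1, mismatch -1, gap -1) and the name `global_alignments` are assumptions chosen for the example.

```python
def global_alignments(a, b, match=1, mismatch=-1, gap=-1):
    """Return (best_score, list of all co-optimal global alignments).

    Scoring values are illustrative assumptions, not taken from the slide.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,     # gap in b
                              score[i][j - 1] + gap)     # gap in a

    results = []

    def walk(i, j, top, bottom):
        # Follow every predecessor cell that explains score[i][j].
        if i == 0 and j == 0:
            results.append((top[::-1], bottom[::-1]))
            return
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                walk(i - 1, j - 1, top + a[i - 1], bottom + b[j - 1])
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            walk(i - 1, j, top + a[i - 1], bottom + "-")
        if j > 0 and score[i][j] == score[i][j - 1] + gap:
            walk(i, j - 1, top + "-", bottom + b[j - 1])

    walk(n, m, "", "")
    return score[n][m], results
```

Aligning "AA" with "A" under these scores gives two tracebacks with the same score, one placing the gap before the matched residue and one after it. The recursion is only suitable for short sequences, since the number of co-optimal paths can grow quickly.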

5 Sequence Similarity May Miss Functional Homologies Which Can Be Detected by 3D Structural Analysis
[Figure: % Sequence Identity against Residues Aligned, separating Homologous 3D Structure from Non-homologous 3D Structure, with the Twilight zone marked. Adapted from Chris Sander]

6 Structural Validation of Homology: Adenylate Kinase and Guanylate Kinase, 19% Seq ID, Z = 12.2

7 CspA; Asp tRNA Synthetase; Staphylococcal Nuclease; CspB; Gene 5 ssDNA Binding Protein; Topoisomerase I

8 Protein Domains Independent Folding Units residues Mean size residues Alpha folds; Beta Folds; Alpha+Beta Folds; Alpha/Beta Folds

9 COG 272, BRCT family P. Bork et al

10 Cadherin Proteins in Caenorhabditis elegans and Cadherin Proteins in Drosophila melanogaster
[Figure: domain architectures of the cadherin proteins in the two genomes. Labeled proteins include CDH-3, CDH-4, HMR-1a and HMR-1b, and Fat, Stan, Ds, CadN, Shg, Ret and several CG-numbered gene products.]
Legend: Signal peptide; Cadherin; EGF; EGF_CA; Laminin G; Transmembrane Helix; 7 Pass Transmembrane Domain; HormR; GPS; Merge Position; Classic cytoplasmic domain; Tyrosine Kinase cytoplasmic domain; Type 1 Cytoplasmic domain; Type 2 Cytoplasmic domain; Other Cytoplasmic domain
courtesy of C. Chothia

11 Principal Protein Fold Classes: All alpha; All beta; alpha + beta; alpha / beta

12 Classification of Protein Folds - SCOP - CATH - DALI / FSSP

13 Most proteins in biology have been produced by the duplication, divergence and recombination of the members of a small number of protein families. courtesy of C. Chothia

14 Average Domain Size: 170 residues. Domain Combinations in Genome Sequences: In bacteria close to 1/3 of proteins consist of one domain and 2/3 consist of two or more domains. In eukaryotes close to 1/4 of proteins consist of one domain and 3/4 consist of two or more domains. courtesy of C. Chothia

15 SCOP - Protein Fold Hierarchy: Manually Curated Database of Domain Structures. Class - 5; Fold - ~500; Superfamily - ~700; Family - ~1000. Family - domains with common evolutionary origin. Homology: Derived by evolutionary divergence

16 Five Principal Fold Classes All α folds All β folds α + β folds α / β folds small irregular folds

17 SCOP, the Structural Classification Of Proteins database: This contains all proteins, and protein domains, of known structure, classified in terms of their structure and evolutionary relationships. SUPERFAMILY: This database contains (a) hidden Markov models (HMMs) of all the proteins and protein domains in SCOP and (b) a list of the matches made by these HMMs to the sequences of 56 genomes, classified by family. courtesy of C. Chothia

18 SUPERFAMILY matches to genome sequences
[Figure: number of Genes matched in each Genome; genomes are labeled by two-letter codes (hs, at, ce, dm, ...). courtesy of C. Chothia]

19 SUPERFAMILY Results for Buchnera and Human Genome Sequences
Rows compared (Buchnera, Humans): Number of sequences; Sequences matched by SUPERFAMILY; Coverage of genome (61%, 41%); Number of matched domains; Number of families; Mean family size; Number of large families that form half the matched domains
courtesy of C. Chothia

20 SUPERFAMILY Results for Buchnera and Human Genome Sequences: Top Five Domain Families
Buchnera: P-loop containing nucleotide triphosphate hydrolases; Nucleic acid binding proteins; NAD-binding Rossmann domains; Nucleotidylyl transferases; Class II aaRS synthetases
Humans: Classic zinc fingers; Immunoglobulin superfamily; P-loop containing nucleotide triphosphate hydrolases; EGF/Laminin; Cadherin
courtesy of C. Chothia

21 Eukaryotes
[Figure: families shared among the eukaryote genomes sc, at, dm, ce, hs. All: 381 families; at+dm+ce+hs: 56 families; dm+ce+hs: 45 families; Other families. courtesy of C. Chothia]

22 Bacteria
[Figure: totals per Genome for the bacterial genomes, labeled by two-letter codes (mk, pa, eo, ec, ...). courtesy of C. Chothia]

23 CATH Protein Domain Database: Partially Automatic Fold Classification
CATH is a hierarchical classification of protein domain structures, which clusters proteins at four major levels: Class (C), Architecture (A), Topology (T) and Homologous superfamily (H).
Orengo, C.A., Michie, A.D., Jones, S., Jones, D.T., Swindells, M.B., and Thornton, J.M. (1997) CATH - A Hierarchic Classification of Protein Domain Structures. Structure. Vol 5. No 8. p
Pearl, F.M.G., Lee, D., Bray, J.E., Sillitoe, I., Todd, A.E., Harrison, A.P., Thornton, J.M. and Orengo, C.A. (2000) Assigning genomic sequences to CATH. Nucleic Acids Research. Vol 28. No

24 CATH Protein Domain Database: Partially Automatic Fold Classification
Class, derived from secondary structure content, is assigned for more than 90% of protein structures automatically. Architecture, which describes the gross orientation of secondary structures, independent of connectivities, is currently assigned manually. The topology level clusters structures according to their topological connections and numbers of secondary structures. The homologous superfamilies cluster proteins with highly similar structures and functions. The assignments of structures to topology families and homologous superfamilies are made by sequence and structure comparisons.
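The four-level hierarchy can be made concrete with a small data-structure sketch. It assumes the dotted four-number notation commonly used to write a CATH classification (for example 3.40.50.300); the names `CathId`, `parse_cath_id` and `deepest_shared_level` are hypothetical helpers invented for this illustration and are not part of any CATH software.

```python
from typing import NamedTuple, Optional


class CathId(NamedTuple):
    """One node address in the C.A.T.H hierarchy (hypothetical helper type)."""
    cls: int            # Class (C)
    architecture: int   # Architecture (A)
    topology: int       # Topology (T)
    superfamily: int    # Homologous superfamily (H)


LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


def parse_cath_id(text: str) -> CathId:
    """Parse a dotted identifier such as '3.40.50.300' into its four levels."""
    parts = text.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated numbers, got {text!r}")
    return CathId(*(int(p) for p in parts))


def deepest_shared_level(a: CathId, b: CathId) -> Optional[str]:
    """Name of the deepest level at which two domains are still grouped together."""
    deepest = None
    for name, x, y in zip(LEVELS, a, b):
        if x != y:
            break
        deepest = name
    return deepest
```

Two identifiers that agree in their first three numbers but differ in the fourth share a topology while belonging to different homologous superfamilies, which is the distinction the slide draws between the T and H levels.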

25 Representations of Protein Structures: a - full atom; b, c - strands / helices; d - Topology diagrams

26 Structural Alignment of Two Globins

27 Structure-based Sequence Alignments: Alignment of Individual Structures; Fusing into a Single Fold Template
Hb VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DLS-----HGSAQVKGHGKKVADALTNAV
Mb VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAIL
Hb AHVD-DMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
Mb KK-KGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRHPGDFGADAQGAMNKALELFRKDIAAKYKELGYQG
Elements: Domain definitions; Aligned structures, collecting together Non-homologous Sequences; Core annotation
Previous work: Remington, Matthews 80; Taylor, Orengo 89, 94; Artymiuk, Rice, Willett 89; Sali, Blundell 90; Vriend, Sander 91; Russell, Barton 92; Holm, Sander 93; Godzik, Skolnick 94; Gibrat, Madej, Bryant 96; Falicov, F Cohen 96; Feng, Sippl 96; G Cohen 97; Singh & Brutlag
[Figure: Structure / Sequence / Core alignment of globin sequences grouped around the structures 2hhb, 1ecd and 1mbd, with a Consensus Profile row.]

28 Some Similarities are Readily Apparent, Others are More Subtle. Easy: Globins, 125 res., ~1.5 Å. Tricky: Ig C & V, 85 res., ~3 Å. Very Subtle: G3P-dehydrogenase, C-term. Domain, >5 Å

29 Automatically Comparing Protein Structures. Given 2 Structures (A & B), 2 Basic Comparison Operations: 1. Find an Alignment between A and B based on their 3D coordinates. 2. Given an alignment, optimally SUPERIMPOSE A onto B: find the Best R & T to move A onto B
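The second operation, finding the best rotation R and translation T for a given alignment, is commonly solved with an SVD-based least-squares fit (the Kabsch method). The slide does not name a specific algorithm, so the sketch below is one standard choice rather than the lecture's method. It assumes NumPy and two (N, 3) arrays of already-aligned coordinates; the function name `superimpose` is hypothetical.

```python
import numpy as np


def superimpose(A, B):
    """Least-squares rigid superposition of A onto B.

    A, B: (N, 3) arrays of equivalent (already aligned) atom positions.
    Returns R (3x3 rotation), t (translation) and the RMSD after
    moving A with  A @ R.T + t.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    ca, cb = A.mean(axis=0), B.mean(axis=0)        # centroids
    A0, B0 = A - ca, B - cb                        # centered coordinates
    H = A0.T @ B0                                  # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cb - R @ ca
    moved = A @ R.T + t
    rmsd = np.sqrt(((moved - B) ** 2).sum(axis=1).mean())
    return R, t, rmsd
```

If B is an exact rotated and translated copy of A, the function recovers that rotation and translation and reports an RMSD of essentially zero. For real structures the RMSD is the residual that quantities such as "~1.5 Å" on the previous slide refer to.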

30 Distance Matrices Provide a 2D Representation of the 3D Structure

31 Explain Concept of Distance Matrix on Blackboard: N x N distance matrix (Antiparallel beta strands; Parallel beta strands; Helices); N dimensional space; Metric matrix $M_{ij} = D_{ij}^2 - D_{i0}^2 - D_{j0}^2$; M Eigenvectors (M = 3 for 3D structure)
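The blackboard argument can be checked numerically. The sketch below assumes NumPy and uses the conventional Gram (metric) matrix $(D_{i0}^2 + D_{j0}^2 - D_{ij}^2)/2$, where 0 denotes the centroid; the expression on the slide differs from this form only by a constant factor of -2, so both have the same eigenvectors. The function names are hypothetical.

```python
import numpy as np


def distance_matrix(X):
    """N x N matrix of Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def metric_matrix(D):
    """Gram matrix with the centroid as origin, computed from distances alone."""
    D2 = D ** 2
    n = len(D)
    # Squared distance of each point to the centroid:
    # D_i0^2 = (1/N) sum_j D_ij^2 - (1/N^2) sum_{j<k} D_jk^2
    d0 = D2.mean(axis=1) - D2.sum() / (2 * n ** 2)
    return 0.5 * (d0[:, None] + d0[None, :] - D2)


def embed(D, dims=3):
    """Coordinates (up to rotation/reflection) from the top eigenvectors."""
    w, v = np.linalg.eigh(metric_matrix(D))
    order = np.argsort(w)[::-1][:dims]
    return v[:, order] * np.sqrt(np.clip(w[order], 0.0, None))
```

For a distance matrix that comes from a real 3D structure, only three eigenvalues are non-zero, and the coordinates rebuilt from those three eigenvectors reproduce the original distances. This is the sense of "M = 3 for 3D structure" on the slide.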

32 DALI: Protein Structure Comparison by Alignment of Distance Matrices. L. Holm and C. Sander, J. Mol. Biol. 233: 123 (1993)
Generate Cα-Cα distance matrix for each protein, A and B.
Decompose into elementary contact patterns, e.g. hexapeptide-hexapeptide submatrices.
Systematic comparisons of all elementary contact patterns in the 2 distance matrices; similar contact patterns are stored in a pair list.
Assemble pairs of contact patterns into larger consistent sets of pairs (alignments), maximizing the similarity score between these local structures.
A Monte-Carlo algorithm is used to deal with the combinatorial complexity of building up alignments from contact patterns.
Dali Z score - number of standard deviations away from mean pairwise similarity value.
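The first steps and the Z score can be illustrated with a deliberately simplified sketch. It is not Dali's scoring function: the similarity test here is a plain mean absolute difference between submatrices with an assumed tolerance `tol`, and the Monte-Carlo assembly step is omitted entirely. NumPy and all function names are assumptions made for the illustration.

```python
import numpy as np


def contact_patterns(D, size=6):
    """Yield (i, j, submatrix) for each pair of non-overlapping fragments."""
    n = len(D)
    for i in range(n - size + 1):
        for j in range(i + size, n - size + 1):
            yield i, j, D[i:i + size, j:j + size]


def similar_pattern_pairs(DA, DB, size=6, tol=1.0):
    """Pair list of contact patterns from A and B that look alike.

    Simplified criterion (an assumption): mean absolute difference of the
    two size x size distance submatrices is below `tol` (in Angstrom).
    """
    patterns_b = list(contact_patterns(DB, size))
    pairs = []
    for ia, ja, pa in contact_patterns(DA, size):
        for ib, jb, pb in patterns_b:
            if np.abs(pa - pb).mean() < tol:
                pairs.append(((ia, ja), (ib, jb)))
    return pairs


def z_score(score, background_scores):
    """Standard deviations between `score` and the mean background score."""
    bg = np.asarray(background_scores, dtype=float)
    return (score - bg.mean()) / bg.std()
```

Comparing a distance matrix with itself returns, among others, every pattern paired with its own position, which is a quick sanity check. The exhaustive double loop scales poorly with chain length and is meant only to make the "pair list" idea tangible.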

35 Dali Domain Dictionary. Dietmann, Park, Notredame, Heger, Lappe, and Holm, Nucleic Acids Res. 29: 55-57 (2001)
Dali Domain Dictionary is a numerical taxonomy of all known domain structures in the PDB.
Evolves from Dali / FSSP Database: Holm & Sander, Nucl. Acid Res. 25: (1997)
Dali Domain Dictionary, Sept: ,532 PDB entries; 17,101 protein chains; 5 supersecondary structure motifs (attractors); 1375 fold types; 2582 functional families; 3724 domain sequence families

36 Explain Concept of Distance Matrix on Blackboard: N x N distance matrix; N dimensional space; Metric matrix $M_{ij} = D_{ij}^2 - D_{i0}^2 - D_{j0}^2$; Eigenvectors of metric matrix; Principal component analysis

37 A Global Representation of Protein Fold Space. Hou, Sims, Zhang, Kim, PNAS 100: (2003)
Database of 498 SCOP Folds or Superfamilies.
The overall pair-wise comparisons of 498 folds lead to a 498 x 498 matrix of similarity scores $S_{ij}$, where $S_{ij}$ is the alignment score between the ith and jth folds. An appropriate method for handling such data matrices as a whole is metric matrix distance geometry. We first convert the similarity score matrix $[S_{ij}]$ to a distance matrix $[D_{ij}]$ by using $D_{ij} = S_{max} - S_{ij}$, where $S_{max}$ is the maximum similarity score among all pairs of folds. We then transform the distance matrix to a metric (or Gram) matrix $[M_{ij}]$ by using $M_{ij} = D_{ij}^2 - D_{i0}^2 - D_{j0}^2$, where $D_{i0}$ is the distance between the ith fold and the geometric centroid of all N = 498 folds. The eigenvectors of the metric matrix define an orthogonal system of axes, called factors. These axes pass through the geometric centroid of the points representing all observed folds and are ordered by decreasing amount of information each factor represents.
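The pipeline described on this slide (similarity scores, then distances, then a metric matrix, then factors) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes NumPy, sets the self-distances on the diagonal to zero, and builds the metric matrix by double centering, which yields the conventional Gram form $(D_{i0}^2 + D_{j0}^2 - D_{ij}^2)/2$. That form differs from the expression quoted on the slide by a constant factor of -2 and has the same eigenvectors. The function name and the toy score matrix are hypothetical.

```python
import numpy as np


def fold_space_factors(S, n_factors=3):
    """Map a symmetric similarity matrix to coordinates along the top factors.

    Returns (coords, fraction), where fraction[k] is the share of the
    positive eigenvalue sum carried by factor k.
    """
    S = np.asarray(S, dtype=float)
    n = len(S)
    D = S.max() - S                       # D_ij = S_max - S_ij
    np.fill_diagonal(D, 0.0)              # assumption: zero self-distance
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    M = -0.5 * J @ (D ** 2) @ J           # Gram matrix about the centroid
    w, v = np.linalg.eigh(M)
    order = np.argsort(w)[::-1][:n_factors]
    top = np.clip(w[order], 0.0, None)    # distances from scores may be non-Euclidean
    coords = v[:, order] * np.sqrt(top)
    fraction = top / w[w > 0].sum()
    return coords, fraction


# Toy example with four "folds": 0 and 1 are similar, 2 and 3 are similar.
toy_scores = [[10, 8, 2, 1],
              [8, 10, 2, 1],
              [2, 2, 10, 7],
              [1, 1, 7, 10]]
toy_coords, toy_fraction = fold_space_factors(toy_scores, n_factors=2)
```

In the toy example the first factor separates the two pairs of similar folds, which is the kind of grouping the fold-space map is meant to reveal on the full 498 x 498 matrix.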

38 A Global Representation of Protein Fold Space. Hou, Sims, Zhang, Kim, PNAS 100: (2003)

Large-Scale Genomic Surveys

Large-Scale Genomic Surveys Bioinformatics Subtopics Fold Recognition Secondary Structure Prediction Docking & Drug Design Protein Geometry Structural Informatics Homology Modeling Sequence Alignment Structure Classification Gene

More information

Large-Scale Genomic Surveys

Large-Scale Genomic Surveys Bioinformatics Subtopics Fold Recognition Secondary Structure Prediction Docking & Drug Design Protein Geometry Protein Flexibility Homology Modeling Sequence Alignment Structure Classification Gene Prediction

More information

Protein Structure: Data Bases and Classification Ingo Ruczinski

Protein Structure: Data Bases and Classification Ingo Ruczinski Protein Structure: Data Bases and Classification Ingo Ruczinski Department of Biostatistics, Johns Hopkins University Reference Bourne and Weissig Structural Bioinformatics Wiley, 2003 More References

More information

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison CMPS 6630: Introduction to Computational Biology and Bioinformatics Structure Comparison Protein Structure Comparison Motivation Understand sequence and structure variability Understand Domain architecture

More information

Heteropolymer. Mostly in regular secondary structure

Heteropolymer. Mostly in regular secondary structure Heteropolymer - + + - Mostly in regular secondary structure 1 2 3 4 C >N trace how you go around the helix C >N C2 >N6 C1 >N5 What s the pattern? Ci>Ni+? 5 6 move around not quite 120 "#$%&'!()*(+2!3/'!4#5'!1/,#64!#6!,6!

More information

Number sequence representation of protein structures based on the second derivative of a folded tetrahedron sequence

Number sequence representation of protein structures based on the second derivative of a folded tetrahedron sequence Number sequence representation of protein structures based on the second derivative of a folded tetrahedron sequence Naoto Morikawa (nmorika@genocript.com) October 7, 2006. Abstract A protein is a sequence

More information

Week 10: Homology Modelling (II) - HHpred

Week 10: Homology Modelling (II) - HHpred Week 10: Homology Modelling (II) - HHpred Course: Tools for Structural Biology Fabian Glaser BKU - Technion 1 2 Identify and align related structures by sequence methods is not an easy task All comparative

More information

CS612 - Algorithms in Bioinformatics

CS612 - Algorithms in Bioinformatics Fall 2017 Databases and Protein Structure Representation October 2, 2017 Molecular Biology as Information Science > 12, 000 genomes sequenced, mostly bacterial (2013) > 5x10 6 unique sequences available

More information

Protein structure analysis. Risto Laakso 10th January 2005

Protein structure analysis. Risto Laakso 10th January 2005 Protein structure analysis Risto Laakso risto.laakso@hut.fi 10th January 2005 1 1 Summary Various methods of protein structure analysis were examined. Two proteins, 1HLB (Sea cucumber hemoglobin) and 1HLM

More information

2MHR. Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity.

2MHR. Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity. Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity. A global picture of the protein universe will help us to understand

More information

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder HMM applications Applications of HMMs Gene finding Pairwise alignment (pair HMMs) Characterizing protein families (profile HMMs) Predicting membrane proteins, and membrane protein topology Gene finding

More information

Amino Acid Structures from Klug & Cummings. 10/7/2003 CAP/CGS 5991: Lecture 7 1

Amino Acid Structures from Klug & Cummings. 10/7/2003 CAP/CGS 5991: Lecture 7 1 Amino Acid Structures from Klug & Cummings 10/7/2003 CAP/CGS 5991: Lecture 7 1 Amino Acid Structures from Klug & Cummings 10/7/2003 CAP/CGS 5991: Lecture 7 2 Amino Acid Structures from Klug & Cummings

More information

1. Protein Data Bank (PDB) 1. Protein Data Bank (PDB)

1. Protein Data Bank (PDB) 1. Protein Data Bank (PDB) Protein structure databases; visualization; and classifications 1. Introduction to Protein Data Bank (PDB) 2. Free graphic software for 3D structure visualization 3. Hierarchical classification of protein

More information

Sequence and Structure Alignment Z. Luthey-Schulten, UIUC Pittsburgh, 2006 VMD 1.8.5

Sequence and Structure Alignment Z. Luthey-Schulten, UIUC Pittsburgh, 2006 VMD 1.8.5 Sequence and Structure Alignment Z. Luthey-Schulten, UIUC Pittsburgh, 2006 VMD 1.8.5 Why Look at More Than One Sequence? 1. Multiple Sequence Alignment shows patterns of conservation 2. What and how many

More information

Homology Modeling. Roberto Lins EPFL - summer semester 2005

Homology Modeling. Roberto Lins EPFL - summer semester 2005 Homology Modeling Roberto Lins EPFL - summer semester 2005 Disclaimer: course material is mainly taken from: P.E. Bourne & H Weissig, Structural Bioinformatics; C.A. Orengo, D.T. Jones & J.M. Thornton,

More information

Analysis and Prediction of Protein Structure (I)

Analysis and Prediction of Protein Structure (I) Analysis and Prediction of Protein Structure (I) Jianlin Cheng, PhD School of Electrical Engineering and Computer Science University of Central Florida 2006 Free for academic use. Copyright @ Jianlin Cheng

More information

Structure to Function. Molecular Bioinformatics, X3, 2006

Structure to Function. Molecular Bioinformatics, X3, 2006 Structure to Function Molecular Bioinformatics, X3, 2006 Structural GeNOMICS Structural Genomics project aims at determination of 3D structures of all proteins: - organize known proteins into families

More information

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting.

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting. Genome Annotation Bioinformatics and Computational Biology Genome Annotation Frank Oliver Glöckner 1 Genome Analysis Roadmap Genome sequencing Assembly Gene prediction Protein targeting trna prediction

More information

EBI web resources II: Ensembl and InterPro. Yanbin Yin Spring 2013

EBI web resources II: Ensembl and InterPro. Yanbin Yin Spring 2013 EBI web resources II: Ensembl and InterPro Yanbin Yin Spring 2013 1 Outline Intro to genome annotation Protein family/domain databases InterPro, Pfam, Superfamily etc. Genome browser Ensembl Hands on Practice

More information

Intro Secondary structure Transmembrane proteins Function End. Last time. Domains Hidden Markov Models

Intro Secondary structure Transmembrane proteins Function End. Last time. Domains Hidden Markov Models Last time Domains Hidden Markov Models Today Secondary structure Transmembrane proteins Structure prediction NAD-specific glutamate dehydrogenase Hard Easy >P24295 DHE2_CLOSY MSKYVDRVIAEVEKKYADEPEFVQTVEEVL

More information

Genome Databases The CATH database

Genome Databases The CATH database Genome Databases The CATH database Michael Knudsen 1 and Carsten Wiuf 1,2* 1 Bioinformatics Research Centre, Aarhus University, DK-8000 Aarhus C, Denmark 2 Centre for Membrane Pumps in Cells and Disease

More information

Today. Last time. Secondary structure Transmembrane proteins. Domains Hidden Markov Models. Structure prediction. Secondary structure

Today. Last time. Secondary structure Transmembrane proteins. Domains Hidden Markov Models. Structure prediction. Secondary structure Last time Today Domains Hidden Markov Models Structure prediction NAD-specific glutamate dehydrogenase Hard Easy >P24295 DHE2_CLOSY MSKYVDRVIAEVEKKYADEPEFVQTVEEVL SSLGPVVDAHPEYEEVALLERMVIPERVIE FRVPWEDDNGKVHVNTGYRVQFNGAIGPYK

More information

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748 CAP 5510: Introduction to Bioinformatics Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs07.html 2/15/07 CAP5510 1 EM Algorithm Goal: Find θ, Z that maximize Pr

More information

MSAT a Multiple Sequence Alignment tool based on TOPS

MSAT a Multiple Sequence Alignment tool based on TOPS MSAT a Multiple Sequence Alignment tool based on TOPS Te Ren, Mallika Veeramalai, Aik Choon Tan and David Gilbert Bioinformatics Research Centre Department of Computer Science University of Glasgow Glasgow,

More information

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche The molecular structure of a protein can be broken down hierarchically. The primary structure of a protein is simply its

More information

Procheck output. Bond angles (Procheck) Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics.

Procheck output. Bond angles (Procheck) Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics. Structure verification and validation Bond lengths (Procheck) Introduction to Bioinformatics Iosif Vaisman Email: ivaisman@gmu.edu ----------------------------------------------------------------- Bond

More information

The CATH Database provides insights into protein structure/function relationships

The CATH Database provides insights into protein structure/function relationships 1999 Oxford University Press Nucleic Acids Research, 1999, Vol. 27, No. 1 275 279 The CATH Database provides insights into protein structure/function relationships C. A. Orengo, F. M. G. Pearl, J. E. Bray,

More information

Protein Science (1997), 6: Cambridge University Press. Printed in the USA. Copyright 1997 The Protein Society

Protein Science (1997), 6: Cambridge University Press. Printed in the USA. Copyright 1997 The Protein Society 1 of 5 1/30/00 8:08 PM Protein Science (1997), 6: 246-248. Cambridge University Press. Printed in the USA. Copyright 1997 The Protein Society FOR THE RECORD LPFC: An Internet library of protein family

More information

Homology and Information Gathering and Domain Annotation for Proteins

Homology and Information Gathering and Domain Annotation for Proteins Homology and Information Gathering and Domain Annotation for Proteins Outline Homology Information Gathering for Proteins Domain Annotation for Proteins Examples and exercises The concept of homology The

More information

Analysis on sliding helices and strands in protein structural comparisons: A case study with protein kinases

Analysis on sliding helices and strands in protein structural comparisons: A case study with protein kinases Sliding helices and strands in structural comparisons 921 Analysis on sliding helices and strands in protein structural comparisons: A case study with protein kinases V S GOWRI, K ANAMIKA, S GORE 1 and

More information

2 Dean C. Adams and Gavin J. P. Naylor the best three-dimensional ordination of the structure space is found through an eigen-decomposition (correspon

2 Dean C. Adams and Gavin J. P. Naylor the best three-dimensional ordination of the structure space is found through an eigen-decomposition (correspon A Comparison of Methods for Assessing the Structural Similarity of Proteins Dean C. Adams and Gavin J. P. Naylor? Dept. Zoology and Genetics, Iowa State University, Ames, IA 50011, U.S.A. 1 Introduction

More information

Identification of Representative Protein Sequence and Secondary Structure Prediction Using SVM Approach

Identification of Representative Protein Sequence and Secondary Structure Prediction Using SVM Approach Identification of Representative Protein Sequence and Secondary Structure Prediction Using SVM Approach Prof. Dr. M. A. Mottalib, Md. Rahat Hossain Department of Computer Science and Information Technology

More information

ALL LECTURES IN SB Introduction

ALL LECTURES IN SB Introduction 1. Introduction 2. Molecular Architecture I 3. Molecular Architecture II 4. Molecular Simulation I 5. Molecular Simulation II 6. Bioinformatics I 7. Bioinformatics II 8. Prediction I 9. Prediction II ALL

More information

Understanding Sequence, Structure and Function Relationships and the Resulting Redundancy

Understanding Sequence, Structure and Function Relationships and the Resulting Redundancy Understanding Sequence, Structure and Function Relationships and the Resulting Redundancy many slides by Philip E. Bourne Department of Pharmacology, UCSD Agenda Understand the relationship between sequence,

More information

Some Problems from Enzyme Families

Some Problems from Enzyme Families Some Problems from Enzyme Families Greg Butler Department of Computer Science Concordia University, Montreal www.cs.concordia.ca/~faculty/gregb gregb@cs.concordia.ca Abstract I will discuss some problems

More information

EBI web resources II: Ensembl and InterPro

EBI web resources II: Ensembl and InterPro EBI web resources II: Ensembl and InterPro Yanbin Yin http://www.ebi.ac.uk/training/online/course/ 1 Homework 3 Go to http://www.ebi.ac.uk/interpro/training.htmland finish the second online training course

More information

Automated Identification of Protein Structural Features

Automated Identification of Protein Structural Features Automated Identification of Protein Structural Features Chandrasekhar Mamidipally 1, Santosh B. Noronha 1, Sumantra Dutta Roy 2 1 Dept. of Chemical Engg., IIT Bombay, Powai, Mumbai - 400 076, INDIA. chandra

More information

Multiple sequence alignment

Multiple sequence alignment Multiple sequence alignment Multiple sequence alignment: today s goals to define what a multiple sequence alignment is and how it is generated; to describe profile HMMs to introduce databases of multiple

More information

The molecular functions of a protein can be inferred from

The molecular functions of a protein can be inferred from Global mapping of the protein structure space and application in structure-based inference of protein function Jingtong Hou*, Se-Ran Jun, Chao Zhang, and Sung-Hou Kim* Department of Chemistry and *Graduate

More information

Protein structure alignments

Protein structure alignments Protein structure alignments Proteins that fold in the same way, i.e. have the same fold are often homologs. Structure evolves slower than sequence Sequence is less conserved than structure If BLAST gives

More information

Bioinformatics. Proteins II. - Pattern, Profile, & Structure Database Searching. Robert Latek, Ph.D. Bioinformatics, Biocomputing

Bioinformatics. Proteins II. - Pattern, Profile, & Structure Database Searching. Robert Latek, Ph.D. Bioinformatics, Biocomputing Bioinformatics Proteins II. - Pattern, Profile, & Structure Database Searching Robert Latek, Ph.D. Bioinformatics, Biocomputing WIBR Bioinformatics Course, Whitehead Institute, 2002 1 Proteins I.-III.

More information

Structural Alignment of Proteins

Structural Alignment of Proteins Goal Align protein structures Structural Alignment of Proteins 1 2 3 4 5 6 7 8 9 10 11 12 13 14 PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE

More information

Supporting Text 1. Comparison of GRoSS sequence alignment to HMM-HMM and GPCRDB

Supporting Text 1. Comparison of GRoSS sequence alignment to HMM-HMM and GPCRDB Structure-Based Sequence Alignment of the Transmembrane Domains of All Human GPCRs: Phylogenetic, Structural and Functional Implications, Cvicek et al. Supporting Text 1 Here we compare the GRoSS alignment

More information

Computational Molecular Biology (

Computational Molecular Biology ( Computational Molecular Biology (http://cmgm cmgm.stanford.edu/biochem218/) Biochemistry 218/Medical Information Sciences 231 Douglas L. Brutlag, Lee Kozar Jimmy Huang, Josh Silverman Lecture Syllabus

More information

Introduction to Evolutionary Concepts

Introduction to Evolutionary Concepts Introduction to Evolutionary Concepts and VMD/MultiSeq - Part I Zaida (Zan) Luthey-Schulten Dept. Chemistry, Beckman Institute, Biophysics, Institute of Genomics Biology, & Physics NIH Workshop 2009 VMD/MultiSeq

More information

Protein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki.

Protein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki. Protein Bioinformatics Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet rickard.sandberg@ki.se sandberg.cmb.ki.se Outline Protein features motifs patterns profiles signals 2 Protein

More information

We used the PSI-BLAST program (http://www.ncbi.nlm.nih.gov/blast/) to search the

We used the PSI-BLAST program (http://www.ncbi.nlm.nih.gov/blast/) to search the SUPPLEMENTARY METHODS - in silico protein analysis We used the PSI-BLAST program (http://www.ncbi.nlm.nih.gov/blast/) to search the Protein Data Bank (PDB, http://www.rcsb.org/pdb/) and the NCBI non-redundant

More information

Protein Structure & Motifs

Protein Structure & Motifs & Motifs Biochemistry 201 Molecular Biology January 12, 2000 Doug Brutlag Introduction Proteins are more flexible than nucleic acids in structure because of both the larger number of types of residues

More information

Genomics and bioinformatics summary. Finding genes -- computer searches

Genomics and bioinformatics summary. Finding genes -- computer searches Genomics and bioinformatics summary 1. Gene finding: computer searches, cdnas, ESTs, 2. Microarrays 3. Use BLAST to find homologous sequences 4. Multiple sequence alignments (MSAs) 5. Trees quantify sequence

More information

proteins SHORT COMMUNICATION MALIDUP: A database of manually constructed structure alignments for duplicated domain pairs

proteins SHORT COMMUNICATION MALIDUP: A database of manually constructed structure alignments for duplicated domain pairs J_ID: Z7E Customer A_ID: 21783 Cadmus Art: PROT21783 Date: 25-SEPTEMBER-07 Stage: I Page: 1 proteins STRUCTURE O FUNCTION O BIOINFORMATICS SHORT COMMUNICATION MALIDUP: A database of manually constructed

More information

CAP 5510 Lecture 3 Protein Structures

Su-Shing Chen Bioinformatics CISE 8/19/2005 Protein Conformation Protein Conformational Structures Hydrophobicity

Markov Models & DNA Sequence Evolution

7.91 / 7.36 / BE.490 Lecture #5 Mar. 9, 2004 Markov Models & DNA Sequence Evolution Chris Burge Review of Markov & HMM Models for DNA Markov Models for splice sites Hidden Markov Models - looking under

PROTEIN FUNCTION PREDICTION WITH AMINO ACID SEQUENCE AND SECONDARY STRUCTURE ALIGNMENT SCORES

Eser Aygün 1, Caner Kömürlü 2, Zafer Aydin 3 and Zehra Çataltepe 1 1 Computer Engineering Department and 2

Homology. and. Information Gathering and Domain Annotation for Proteins

Outline WHAT IS HOMOLOGY? HOW TO GATHER KNOWN PROTEIN INFORMATION? HOW TO ANNOTATE PROTEIN DOMAINS? EXAMPLES AND EXERCISES Homology

Bioinformatics. Macromolecular structure

Contents Determination of protein structure Structure databases Secondary structure elements (SSE) Tertiary structure Structure analysis Structure alignment Domain

Comparing Genomes in terms of Protein Structure: Surveys of a Finite Parts List

Mark Gerstein * & Hedi Hegyi Department of Molecular Biophysics & Biochemistry 266 Whitney Avenue, Yale University PO Box

Automated Identification of Protein Structural Features

Chandrasekhar Mamidipally 1, Santosh B. Noronha 1, and Sumantra Dutta Roy 2 1 Dept. of Chemical Engg., IIT Bombay, Powai, Mumbai - 400 076, India

A profile-based protein sequence alignment algorithm for a domain clustering database

Lin Xu,2 Fa Zhang and Zhiyong Liu 3, Key Laboratory of Computer System and architecture, the Institute of Computing

Bioinformatics. Dept. of Computational Biology & Bioinformatics

Bioinformatics - play with sequences & structures ORGANIZATION OF LIFE ROLE OF BIOINFORMATICS

EECS730: Introduction to Bioinformatics

Lecture 07: profile Hidden Markov Model http://bibiserv.techfak.uni-bielefeld.de/sadr2/databasesearch/hmmer/profilehmm.gif Slides adapted from Dr. Shaojie Zhang

An Introduction to Bioinformatics Algorithms Hidden Markov Models

Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training

Prediction of protein function from sequence analysis

Rita Casadio BIOCOMPUTING GROUP University of Bologna, Italy The omic era Genome Sequencing Projects: Archaea: 74 species In Progress:52 Bacteria:

Goals. Structural Analysis of the EGR Family of Transcription Factors: Templates for Predicting Protein DNA Interactions

Jamie Duke 1,2 and Carlos Camacho 3 1 Bioengineering and Bioinformatics Summer Institute,

Comprehensive genome analysis of 203 genomes provides structural genomics with new insights into protein family space

Published online February 15, 2006. Nucleic Acids Research, 2006, Vol. 34, No. 3, 1066-1080. doi:10.1093/nar/gkj494

Amino Acid Structures from Klug & Cummings. Bioinformatics (Lec 12)

Amino Acid Structures from Klug & Cummings 2/17/05

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan

Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinff18.html Proteins and Protein Structure

COMP 598 Advanced Computational Biology Methods & Research. Introduction. Jérôme Waldispühl School of Computer Science McGill University

General information (1) Office hours: by appointment Office: TR3018

Domain-based computational approaches to understand the molecular basis of diseases

Dr. Maricel G. Kann Assistant Professor Dept of Biological Sciences UMBC http://bioinf.umbc.edu Research at Kann's Lab.

Lecture 7 Sequence analysis. Hidden Markov Models

Nicolas Lartillot (Université de Montréal) BIN6009 May 2012 1 Motivation 2 Examples of Hidden Markov models 3 Hidden

The PDB is a Covering Set of Small Protein Structures

doi:10.1016/j.jmb.2003.10.027 J. Mol. Biol. (2003) 334, 793-802 Daisuke Kihara and Jeffrey Skolnick* Center of Excellence in Bioinformatics, University

Secondary Structure. Bioch/BIMS 503 Lecture 2. Structure and Function of Proteins. Further Reading. Φ, Ψ angles alone determine protein structure

Bioch/BIMS 503 Lecture 2 Structure and Function of Proteins August 28, 2008 Robert Nakamoto rkn3c@virginia.edu 2-0279 Secondary Structure Φ Ψ angles determine protein structure Φ Ψ angles are restricted

Data Mining in Protein Binding Cavities

In Proc. GfKl 2004, Dortmund: Data Mining in Protein Binding Cavities Katrin Kupas and Alfred Ultsch Data Bionics Research Group, University of Marburg, D-35032 Marburg, Germany Abstract. The molecular

SCOP. all-β class. all-α class, 3 different folds. T4 endonuclease V. 4-helical cytokines. Globin-like

SCOP all-β class 4-helical cytokines T4 endonuclease V all-α class, 3 different folds Globin-like TIM-barrel fold α/β class Profilin-like fold α+β class http://scop.mrc-lmb.cam.ac.uk/scop CATH Class, Architecture,

Getting To Know Your Protein

Comparative Protein Analysis: Part III. Protein Structure Prediction and Comparison Robert Latek, PhD Sr. Bioinformatics Scientist Whitehead Institute for Biomedical Research

SUPPLEMENTARY INFORMATION

Supplementary information S1 (box). Supplementary Methods description. Prokaryotic Genome Database Archaeal and bacterial genome sequences were downloaded from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/)

Building a Homology Model of the Transmembrane Domain of the Human Glycine α-1 Receptor

Presented by Stephanie Lee Research Mentor: Dr. Rob Coalson Glycine Alpha 1 Receptor (GlyRa1) Member of the superfamily

FoldMiner: Structural motif discovery using an improved superposition algorithm

JESSICA SHAPIRO 1 AND DOUGLAS BRUTLAG 1,2 1 Biophysics Program and 2 Department of Biochemistry, Stanford University, Stanford,

Grouping of amino acids and recognition of protein structurally conserved regions by reduced alphabets of amino acids

Science in China Series C: Life Sciences 2007 Science in China Press Springer-Verlag

Page 1. References. Hidden Markov models and multiple sequence alignment. Markov chains. Probability review. Example. Markovian sequence

Hidden Markov models and multiple sequence alignment Russ B Altman BMI 4 CS 74 Some slides borrowed from Scott C Schmidler (BMI graduate student) References Bioinformatics Classic: Krogh et al (1994)

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences

Jianlin Cheng, PhD Department of Computer Science University of Missouri 2008 Free for Academic

A General Model for Amino Acid Interaction Networks

Omar GACI and Stefan BALEV hal-43269, version - Nov 29 Abstract In this paper we introduce the notion of protein

Hidden Markov Models

Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training

Homology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB

Aims of Structural Genomics High-throughput 3D structure determination and analysis To determine or predict the 3D structures of all the proteins encoded

IMPORTANCE OF SECONDARY STRUCTURE ELEMENTS FOR PREDICTION OF GO ANNOTATIONS

Aslı Filiz 1, Eser Aygün 2, Özlem Keskin 3 and Zehra Cataltepe 2 1 Informatics Institute and 2 Computer Engineering Department,

SUPPLEMENTARY INFORMATION

doi:10.1038/nature12890 Supplementary Table 1 Summary of protein components in the 39S subunit model. MW, molecular weight; aa, amino acids; RP, ribosomal protein. Protein* MRP size MW (kDa) Sequence accession

Contact map guided ab initio structure prediction

S M Golam Mortuza Postdoctoral Research Fellow I-TASSER Workshop 2017 North Carolina A&T State University, Greensboro, NC Outline Ab initio structure prediction:

A NEW ALGORITHM FOR THE ALIGNMENT OF MULTIPLE PROTEIN STRUCTURES USING MONTE CARLO OPTIMIZATION

C. GUDA, E. D. SCHEEFF, P. E. BOURNE 1,2, I. N. SHINDYALOV San Diego Supercomputer Center, University of

Sequence analysis and comparison

The aim with sequence identification: Sequence analysis and comparison Marjolein Thunnissen Lund September 2012 Is there any known protein sequence that is homologous to mine? Are there any other species

Ensembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are:

Comparative genomics and proteomics Species available Ensembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are: Vertebrates: human, chimpanzee, mouse, rat,

Comparing Protein Structures. Why?

7.91 Amy Keating Comparing Protein Structures Why? detect evolutionary relationships identify recurring motifs detect structure/function relationships predict function assess predicted structures classify

Supporting Online Material for

www.sciencemag.org/cgi/content/full/309/5742/1868/dc1 Supporting Online Material for Toward High-Resolution de Novo Structure Prediction for Small Proteins Philip Bradley, Kira M. S. Misura, David Baker*

D Dobbs ISU - BCB 444/544X 1

11/7/05 Protein Structure: Classification, Databases, Visualization Announcements BCB 544 Projects - Important Dates: Nov 2 Wed noon - Project proposals due to David/Drena Nov 4 Fri PM - Approvals/responses

Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics

Jianlin Cheng, PhD Department of Computer Science University of Missouri, Columbia

Protein Structures. Sequences of amino acid residues 20 different amino acids. Quaternary. Primary. Tertiary. Secondary. 10/8/2002 Lecture 12 1

Protein Structures Sequences of amino acid residues 20 different amino acids Primary Secondary Tertiary Quaternary 10/8/2002 Lecture 12 Angles φ and ψ in the polypeptide chain

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan

Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs15.html Describing & Modeling Patterns

Motifs, Profiles and Domains. Michael Tress Protein Design Group Centro Nacional de Biotecnología, CSIC

Comparing Two Proteins Sequence Alignment Determining the pattern of evolution and identifying conserved

Bioinformatics: Secondary Structure Prediction

Prof. David Jones d.jones@cs.ucl.ac.uk LMLSTQNPALLKRNIIYWNNVALLWEAGSD The greatest unsolved problem in molecular biology: the Protein Folding Problem? Entries

BIOINFORMATICS LAB AP BIOLOGY

Bioinformatics is the science of collecting and analyzing complex biological data. Bioinformatics combines computer science, statistics and biology to allow scientists to

Protein function prediction based on sequence analysis

Performing sequence searches Post-Blast analysis, Using profiles and pattern-matching Slides from a lecture on MOL204 - Applied Bioinformatics 18-Oct-2005

Alpha-helical Topology and Tertiary Structure Prediction of Globular Proteins Scott R. McAllister Christodoulos A. Floudas Princeton University

Department of Chemical Engineering Program of Applied and