Last time
  Domains
  Hidden Markov Models

Today
  Structure prediction

NAD-specific glutamate dehydrogenase
  Hard
  Easy

>P24295 DHE2_CLOSY
MSKYVDRVIAEVEKKYADEPEFVQTVEEVL
SSLGPVVDAHPEYEEVALLERMVIPERVIE
FRVPWEDDNGKVHVNTGYRVQFNGAIGPYK
GGLRFAPSVNLSIMKFLGFEQAFKDSLTTL
PMGGAKGGSDFDPNGKSDREVMRFCQAFMT
ELYRHIGPDIDVPAGDLGVGAREIGYMYGQ
YRKIVGGFYNGVLTGKARSFGGSLVRPEAT
GYGSVYYVEAVMKHENDTLVGKTVALAGFG
NVAWGAAKKLAELGAKAVTLSGPDGYIYDP
EGITTEEKINYMLEMRASGRNKVQDYADKF
GVQFFPGEKPWGQKVDIIMPCATQNDVDLE
QAKKIVANNVKYYIEVANMPTTNEALRFLM
QQPNMVVAPSKAVNAGGVLVSGFEMSQNSE
RLSWTAEEVDSKLHQVMTDIHDGSAAAAER
YGLGYNLVAGANIVGFQKIADAMMAQGIAW

Structure levels (columns on the slide: Structure, What it is, Diff., Qual.)
  Primary: Sequence. Easy. Precise.
  Secondary: Structure elements. Fair.
  Tertiary: Atomic coordinates. Hard.
What's in between?

Example of secondary structure

Defining secondary structure
  Elements, definitions, and DSSP codes:
    Alpha helix: The classic spiral (DSSP: H)
    Beta strand: Strands form sheets (DSSP: B, E)
    Turn, bend: Sudden change (DSSP: S, T)
    Coil, loop: Everything else (DSSP: C, L, _)
  Assigned by principles. Coded in DSSP, Stride, etc.

Prediction of secondary structure
  Principle: Structure affects the amino acid distribution.
  Bad news: No good explicit model for determining secondary structure.
  Good news: Artificial neural networks give a decent implicit model.
  To determine the secondary structure of residue i, look at a window around i:
    R_{i-7} R_{i-6} ... R_{i-1} R_i R_{i+1} ... R_{i+6} R_{i+7}
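
The window around residue i can be sketched in a few lines of Python. This is only an illustration of the input a window-based predictor would see, not any specific predictor's code; the function name `residue_windows`, the `half_width` parameter, and the use of "-" as padding at the sequence ends are assumptions made for this example.

```python
def residue_windows(sequence, half_width=7, pad="-"):
    """Yield (i, window) pairs; window covers residues i-7 .. i+7 by default.

    Positions outside the sequence are filled with `pad` (an assumption here;
    the slides do not say how sequence ends are handled).
    """
    padded = pad * half_width + sequence + pad * half_width
    for i in range(len(sequence)):
        # In `padded`, residue i sits at index i + half_width, so the slice
        # starting at i is centred on it.
        yield i, padded[i : i + 2 * half_width + 1]


# First 30 residues of DHE2_CLOSY from the slide.
sequence = "MSKYVDRVIAEVEKKYADEPEFVQTVEEVL"
windows = list(residue_windows(sequence))
```

Each window has 15 characters with residue i in the middle; a predictor would take one such window as input and output the secondary structure class of the central residue.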

Prediction trick
  Use homologs!
    1. Collect very similar sequences
    2. Build profile
    3. Use a predictor for profiles
  Good effect in secondary structure prediction
  General trick for various prediction problems

  One sequence vs. a profile
    Pos 17 has a C      vs.  Pos 17 is always a C
    Pos 18 has an A     vs.  Pos 18 is rarely an A

Prediction quality
  Predictor   Accuracy
  PHD         70%
  PSIpred     77%
  Common problem: ...EEEEHEEEE...
  Not an active research area today.

20-30% of proteins in any organism are transmembrane (TM) proteins.
70% of drug targets are TM proteins (Pestourie et al, 2006)
  Bad news: Hard to determine structure for TM proteins. Less than 1% of the structures in PDB are TM structures.
  Good news: Regular and clear structure, perfect for HMMs!

Classic structure: rhodopsin
  Sensory rhodopsin (1gue) embedded in the membrane and transducing beneath.
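
Step 2 of the homolog trick above, building a profile, can be sketched as follows. This is a minimal illustration that assumes the homologs are already aligned to equal length and contain no gaps; the function name `build_profile` and the toy sequences are made up for this example and are not taken from PHD or PSIpred.

```python
from collections import Counter


def build_profile(aligned):
    """Return one {amino acid: frequency} dict per alignment column."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for column in zip(*aligned):  # iterate over alignment columns
        counts = Counter(column)
        profile.append({aa: n / len(column) for aa, n in counts.items()})
    return profile


# Toy alignment (hypothetical data): column 1 is always C, column 0 is rarely A.
homologs = ["ACD", "GCD", "GCE", "GCD"]
profile = build_profile(homologs)
```

Here `profile[1]` says "this position is always a C", while `profile[0]` says "A is rare here", which is the extra information a single sequence cannot give.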

Modern view

Beta barrel structure
  Not studied in this course
  Image created by Opabinia Regalis.

Image from Kauko-Illergård-Elofsson, 2008

Goals
  Classify proteins: TM or not?
  Determine TM regions
  Determine TM topology

Properties of TM proteins
  Transmembrane helices are hydrophobic
  TM regions are 15-30 aa
  Loops on the cytoplasmic side are positively charged: the positive-inside rule (Gunnar von Heijne)
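
The positive-inside rule can be turned into a simple topology guess once the TM regions are known. The sketch below is an illustration only: it assumes that K and R are the positively charged residues to count, that TM regions are given as half-open (start, end) index pairs, and that loops alternate sides of the membrane. The function names and the toy sequences are hypothetical.

```python
def loops_between(sequence, tm_segments):
    """Return the loop substrings before, between, and after the TM segments."""
    loops, start = [], 0
    for tm_start, tm_end in tm_segments:  # half-open intervals, sorted
        loops.append(sequence[start:tm_start])
        start = tm_end
    loops.append(sequence[start:])
    return loops


def predict_n_terminus(sequence, tm_segments):
    """Guess whether the N-terminus is 'in' (cytoplasmic) or 'out'."""
    loops = loops_between(sequence, tm_segments)
    positives = [sum(loop.count(aa) for aa in "KR") for loop in loops]
    same_side = sum(positives[0::2])   # loops on the N-terminal side
    other_side = sum(positives[1::2])  # loops on the opposite side
    if same_side > other_side:
        return "in"
    if other_side > same_side:
        return "out"
    return "undecided"


# Toy protein with two TM segments and positive loops on the N-terminal side.
toy = "MKRK" + "L" * 20 + "DE" + "A" * 20 + "RKKR"
orientation = predict_n_terminus(toy, [(4, 24), (26, 46)])

# Toy protein with one TM segment and a positive C-terminal loop.
toy_flipped = "DE" + "L" * 20 + "KRK"
orientation_flipped = predict_n_terminus(toy_flipped, [(2, 22)])
```

The side with more K and R is taken to be cytoplasmic, so the first toy protein is predicted with its N-terminus inside and the second with its N-terminus outside.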

First attempt: TopPred
  Identify the hydrophobic regions in PSN1_HUMAN.
  Look at window of 21 aa.

TMHMM: Predictor using an HMM
  Sonnhammer, von Heijne, Krogh, 1998

Prediction quality
  Good quality
  Generally correct when 3 TM regions
  Common problems:
    Lose a TM region
    Flip in-out topology
    Problem discerning signal peptides

Signal peptides
  Short (15-30 aa?) peptide addressing protein to organelles
  16% of human proteome have a SP
  Some SP cleaved from its host protein
  One hydrophobic TM-segment, 7-15 aa
  Special predictor for SP: SignalP
  Common problem for TM predictors
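
The idea of scanning a 21 aa window for hydrophobic regions can be sketched as a sliding-window average. This is not TopPred's actual procedure: a plain rectangular window and the Kyte-Doolittle hydropathy scale are swapped in for illustration, and the threshold of 1.6 and the function names are assumptions chosen for this example.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=21):
    """Mean hydropathy of every full window; entry i covers i .. i+window-1."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence]  # KeyError on unknown letters
    return [
        sum(scores[i : i + window]) / window
        for i in range(len(sequence) - window + 1)
    ]


def hydrophobic_regions(sequence, window=21, threshold=1.6):
    """Return half-open (start, end) spans covered by windows above threshold."""
    regions, start = [], None
    for i, value in enumerate(hydropathy_profile(sequence, window)):
        if value >= threshold and start is None:
            start = i
        elif value < threshold and start is not None:
            # Last window above threshold started at i - 1.
            regions.append((start, i - 1 + window))
            start = None
    if start is not None:
        regions.append((start, len(sequence)))
    return regions


# Toy sequence (hypothetical): a 21-residue hydrophobic stretch in a polar context.
toy = "D" * 30 + "L" * 21 + "D" * 30
regions = hydrophobic_regions(toy)
```

Because every window that is mostly hydrophobic passes the threshold, the reported span is wider than the true 21-residue stretch; real predictors refine the boundaries in a further step.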

Phobius: including signal peptides
  Käll, Krogh, Sonnhammer, 2004

TM prediction example

Function prediction
  Why predict structure?
  Real goal (?): Function
  Problem 1: What is function?
  Problem 2: What data do you need? Is protein sequence enough?

What is gene/protein function?
  Chemical reactions?
  Interactions?
  Pathway activity?
  Cell localization?
  Activity details?

Enzyme Commission number
  From 1961!
  Hierarchical classification of enzymes
  Specifies reactions
  Example from Wikipedia:
    EC 3 enzymes are hydrolases
    EC 3.4 are hydrolases that act on peptide bonds
    EC 3.4.11 are those hydrolases that cleave off the amino-terminal amino acid from a polypeptide
    EC 3.4.11.4 are those that cleave off the amino-terminal end from a tripeptide
  Too limited for bioinformatics

Gene Ontology
  Controlled vocabulary for function annotation
  Non-hierarchical
  "is a" and "part of" relationships between terms

GO example

GO applications
  Facilitates enrichment studies
  "We show that gene duplication and loss is highly constrained by the functional properties and interacting partners of genes. In particular, stress-related genes exhibit many duplications and losses, whereas growth-related genes show selection against such changes." (Wapinski et al, Nature 2007)

Baldock and Burger, Genome Biology 2005
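
The hierarchical nature of EC numbers means that every prefix of an EC number is itself a class. A minimal sketch, using the Wikipedia example from the slide; the function names `ec_lineage` and `shared_ec_levels` are made up for this illustration.

```python
def ec_lineage(ec_number):
    """Return all classes an EC number belongs to, from most to least general."""
    parts = ec_number.split(".")
    return [".".join(parts[:n]) for n in range(1, len(parts) + 1)]


def shared_ec_levels(ec_a, ec_b):
    """Count how many leading levels two EC numbers have in common."""
    shared = 0
    for a, b in zip(ec_a.split("."), ec_b.split(".")):
        if a != b:
            break
        shared += 1
    return shared


lineage = ec_lineage("3.4.11.4")
common = shared_ec_levels("3.4.11.4", "3.4.21.1")
```

Here `lineage` lists EC 3, EC 3.4, EC 3.4.11 and EC 3.4.11.4 in turn, and two enzymes that agree on the first two levels are both hydrolases acting on peptide bonds.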

Predicting function?
  Given a gene/protein, can we predict a GO term?
  Approach: Expert systems
    Collect homologs
    Collect orthologs
    Domain and motif analysis
    Study other features
    Study network connections
  Examples:
    ProtFun (http://www.cbs.dtu.dk/services/protfun/)
    FunCoup (http://funcoup.sbc.su.se/)

Predict localization
  Modest goal!
  Is the target...
    mitochondria?
    peroxisome?
    endoplasmic reticulum?
    Golgi?
  Study signal peptide
  Olof Emanuelsson: TargetP (http://www.cbs.dtu.dk/services/targetp/)

Next: Computational genomics
  Introduction
  Genome sequencing and assembly
  EST analysis