Last time: Domains, Hidden Markov Models
Today: Secondary structure, Transmembrane proteins
Structure prediction: NAD-specific glutamate dehydrogenase (structure: hard; sequence: easy)
>P24295 DHE2_CLOSY
MSKYVDRVIAEVEKKYADEPEFVQTVEEVL
SSLGPVVDAHPEYEEVALLERMVIPERVIE
FRVPWEDDNGKVHVNTGYRVQFNGAIGPYK
GGLRFAPSVNLSIMKFLGFEQAFKDSLTTL
PMGGAKGGSDFDPNGKSDREVMRFCQAFMT
ELYRHIGPDIDVPAGDLGVGAREIGYMYGQ
YRKIVGGFYNGVLTGKARSFGGSLVRPEAT
GYGSVYYVEAVMKHENDTLVGKTVALAGFG
NVAWGAAKKLAELGAKAVTLSGPDGYIYDP
EGITTEEKINYMLEMRASGRNKVQDYADKF
GVQFFPGEKPWGQKVDIIMPCATQNDVDLE
QAKKIVANNVKYYIEVANMPTTNEALRFLM
QQPNMVVAPSKAVNAGGVLVSGFEMSQNSE
RLSWTAEEVDSKLHQVMTDIHDGSAAAAER
YGLGYNLVAGANIVGFQKIADAMMAQGIAW
What's in between?
Secondary structure
Structure: what it is; difficulty; quality
Primary: sequence; easy; precise
Secondary: structure elements; fair
Tertiary: atomic coordinates; hard
Example of secondary structure
Elements, definitions
Element       Definition            DSSP code
Alpha helix   The classic spiral    H
Beta strand   Strands form sheets   B, E
Turn, bend    Sudden change         S, T
Coil, loop    Everything else       C, L, _
Assigned by principles. Coded in DSSP, Stride, etc.
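
As a small illustration of the code table above, the following sketch groups a per-residue code string into runs of the four element classes. The function name `segments` and the class labels are made up for this example, and only the codes listed on the slide are handled; this is not how DSSP or Stride assign the codes in the first place.

```python
# Grouping of codes exactly as listed in the table above.
# Codes that are not in the table are deliberately left out (assumption).
ELEMENT_OF_CODE = {
    "H": "helix",
    "B": "strand", "E": "strand",
    "S": "turn", "T": "turn",
    "C": "coil", "L": "coil", "_": "coil",
}


def segments(ss_string):
    """Collapse a per-residue code string into (element, start, end) runs.

    Positions are 0-based and `end` is exclusive.
    """
    runs = []
    for pos, code in enumerate(ss_string):
        element = ELEMENT_OF_CODE[code]
        if runs and runs[-1][0] == element:
            # Extend the current run by one residue.
            runs[-1] = (element, runs[-1][1], pos + 1)
        else:
            runs.append((element, pos, pos + 1))
    return runs


print(segments("CCHHHHTTEEB_"))
```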
Defining secondary structure
Prediction of secondary structure. Principle: structure affects the amino acid distribution. Bad news: there is no good explicit model for determining secondary structure. Good news: artificial neural networks give a decent implicit model. To determine the secondary structure of residue $i$, look at a window around $i$: $R_{i-7}, R_{i-6}, \dots, R_{i-1}, R_i, R_{i+1}, \dots, R_{i+6}, R_{i+7}$
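
The window idea can be sketched as an input encoding for a network. The snippet below is only an assumption about how such an input might be built: one-hot encoding, a 21st "padding" slot for positions beyond the termini, and the name `encode_window` are choices made for this example, not something stated on the slide. The network itself is not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HALF_WINDOW = 7  # residues i-7 .. i+7, that is 15 positions


def encode_window(sequence, i, half=HALF_WINDOW):
    """One-hot encode the window around residue i (0-based).

    Each window position gets 21 slots: 20 amino acids plus one slot
    that marks padding outside the sequence or an unknown residue
    (the padding slot is an assumption of this sketch).
    """
    features = []
    for pos in range(i - half, i + half + 1):
        slot = [0] * (len(AMINO_ACIDS) + 1)
        if 0 <= pos < len(sequence) and sequence[pos] in AMINO_ACIDS:
            slot[AMINO_ACIDS.index(sequence[pos])] = 1
        else:
            slot[-1] = 1  # padding or unknown residue
        features.extend(slot)
    return features


x = encode_window("MSKYVDRVIAEVEKKYADEPEF", 0)
print(len(x))  # 15 positions times 21 slots
```

A predictor would take one such vector per residue and output a class (helix, strand, or other) for the central residue.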
Prediction trick: use homologs! 1. Collect very similar sequences. 2. Build a profile. 3. Use a predictor for profiles. Good effect in secondary structure prediction; a general trick for various prediction problems. One sequence vs. a profile: "Pos 17 has a C" vs. "Pos 17 is always a C"; "Pos 18 has an A" vs. "Pos 18 is rarely an A".
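
Step 2 can be sketched as a table of per-column residue frequencies. This is a minimal version under simplifying assumptions: the sequences are already aligned, gaps are written as `-`, and no pseudocounts or sequence weighting are applied. The toy alignment is invented for the example.

```python
from collections import Counter


def build_profile(aligned_sequences):
    """Return one dict per alignment column mapping residue -> frequency.

    Gap characters ('-') are ignored; all rows must have equal length.
    """
    length = len(aligned_sequences[0])
    assert all(len(s) == length for s in aligned_sequences)
    profile = []
    for col in range(length):
        counts = Counter(s[col] for s in aligned_sequences if s[col] != "-")
        total = sum(counts.values())
        column = {aa: n / total for aa, n in counts.items()} if total else {}
        profile.append(column)
    return profile


# Hypothetical toy alignment: column 0 is always C, column 1 is rarely A.
toy = ["CA", "CG", "CG", "CG"]
profile = build_profile(toy)
print(profile)
```

The profile row for a position then replaces the one-hot vector that a single sequence would give, which is what lets a predictor tell "has a C" apart from "is always a C".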
Prediction quality
Predictor   Accuracy
PHD         70%
PSIpred     77%
Common problem: isolated misassignments such as ...EEEEHEEEE... Not an active research area today.
Transmembrane proteins. 20-30% of the proteins in any organism are TM proteins. 70% of drug targets are TM proteins (Pestourie et al., 2006). Bad news: it is hard to determine the structure of TM proteins; less than 1% of the structures in PDB are TM structures. Good news: regular and clear structure, perfect for HMMs!
Classic structure: rhodopsin Sensory rhodopsin (1gue) embedded in the membrane and transducing beneath.
Modern view. Image from Kauko-Illergård-Elofsson, 2008.
Beta barrel structure: not studied in this course. Image created by Opabinia Regalis.
Goals. Classify proteins: TM or not? Determine TM regions. Determine TM topology.
Properties of TM proteins. Transmembrane helices are hydrophobic. TM regions are 15-30 aa long. Loops on the cytoplasmic side are positively charged: the positive-inside rule (Gunnar von Heijne).
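
The positive-inside rule can be turned into a very rough topology guess once the TM segments are known. The sketch below simply counts K and R in alternating loops; the function name, the tie-breaking towards "in", and the toy sequences are assumptions of this example, and real predictors refine the counting considerably.

```python
def predict_n_terminus(sequence, tm_segments):
    """Guess on which side of the membrane the N-terminus sits.

    tm_segments: sorted list of (start, end), 0-based, end exclusive.
    Returns 'in' if the loops on the N-terminal side carry at least as
    many K/R residues as the loops on the other side, else 'out'.
    """
    loops = []
    prev_end = 0
    for start, end in tm_segments:
        loops.append(sequence[prev_end:start])
        prev_end = end
    loops.append(sequence[prev_end:])

    positives = [sum(loop.count(aa) for aa in "KR") for loop in loops]
    same_side = sum(positives[0::2])   # loops on the N-terminal side
    other_side = sum(positives[1::2])  # loops on the opposite side
    return "in" if same_side >= other_side else "out"


# Hypothetical two-helix protein with positive residues at both ends.
toy = "MKRK" + "L" * 20 + "DSE" + "A" * 20 + "RKKR"
print(predict_n_terminus(toy, [(4, 24), (27, 47)]))
```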
First attempt: TopPred. Identify the hydrophobic regions in PSN1_HUMAN. Look at a window of 21 aa.
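
A sliding-window hydrophobicity profile of this kind can be sketched in a few lines. Note the assumptions: the Kyte-Doolittle scale is used here for illustration, which is not necessarily the scale TopPred uses, and the cutoff of 1.6 is likewise just a placeholder value for this example.

```python
# Kyte-Doolittle hydropathy values (swapped in for illustration).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(sequence, window=21):
    """Mean hydropathy of every full window; entry k covers k..k+window-1."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return [
        sum(scores[k:k + window]) / window
        for k in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(sequence, window=21, cutoff=1.6):
    """Start positions of windows whose mean hydropathy reaches the cutoff.

    The cutoff is an assumed value chosen for this sketch.
    """
    profile = window_hydropathy(sequence, window)
    return [k for k, score in enumerate(profile) if score >= cutoff]


# Hypothetical sequence: a 21-residue leucine stretch between charged tails.
toy = "D" * 10 + "L" * 21 + "K" * 10
profile = window_hydropathy(toy)
print(profile.index(max(profile)))  # start of the most hydrophobic window
```

Overlapping candidate windows would still have to be merged into distinct TM regions before any topology is assigned.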
TMHMM: Predictor using an HMM
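
To give a feeling for how an HMM can label a sequence with inside, membrane, and outside regions, here is a toy model decoded with the Viterbi algorithm. Everything in it is an assumption made for illustration: the four states, the three-letter alphabet (hydrophobic `h`, positive `+`, other `p`), and all probabilities. The real TMHMM architecture is far richer and its parameters are trained on data.

```python
import math

# Toy states: loops on either side plus one helix state per direction.
STATES = ["in", "M_out", "out", "M_in"]
LABEL = {"in": "i", "M_out": "M", "out": "o", "M_in": "M"}
START = {"in": 0.5, "out": 0.5}
TRANS = {
    "in": {"in": 0.9, "M_out": 0.1},
    "M_out": {"M_out": 0.9, "out": 0.1},
    "out": {"out": 0.9, "M_in": 0.1},
    "M_in": {"M_in": 0.9, "in": 0.1},
}
# Made-up emission probabilities; inside loops favour positive residues.
LOOP_IN = {"h": 0.2, "p": 0.5, "+": 0.3}
LOOP_OUT = {"h": 0.2, "p": 0.7, "+": 0.1}
MEMBRANE = {"h": 0.9, "p": 0.08, "+": 0.02}
EMIT = {"in": LOOP_IN, "out": LOOP_OUT, "M_out": MEMBRANE, "M_in": MEMBRANE}


def log(p):
    return math.log(p) if p > 0 else float("-inf")


def to_symbols(sequence):
    """Reduce residues to the toy alphabet (grouping is an assumption)."""
    return "".join(
        "h" if aa in "AILMFVWC" else "+" if aa in "KR" else "p"
        for aa in sequence
    )


def viterbi(symbols):
    """Most probable state path, returned as a string of i/M/o labels."""
    score = {s: log(START.get(s, 0)) + log(EMIT[s][symbols[0]]) for s in STATES}
    back = []
    for sym in symbols[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: score[r] + log(TRANS[r].get(s, 0)))
            new_score[s] = score[best] + log(TRANS[best].get(s, 0)) + log(EMIT[s][sym])
            pointers[s] = best
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(LABEL[s] for s in reversed(path))


print(viterbi(to_symbols("KKSS" + "L" * 8 + "SSSS")))
```

Because the two helix states force loops to alternate sides, the decoded path gives both the TM regions and the topology in one pass.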
Prediction quality. Good quality. Generally correct when 3 TM regions. Common problems: losing a TM region, flipping the in-out topology, problems discerning signal peptides.
Signal peptides. Short (15-30 aa?) peptide addressing a protein to organelles. 16% of the human proteome has an SP. Some SPs are cleaved from their host protein. One hydrophobic TM segment, 7-15 aa. Special predictor for SPs: SignalP. A common problem for TM predictors.
Phobius: including signal peptides (Käll, Krogh, Sonnhammer, 2004)
TM prediction example
Function prediction. Why predict structure? Real goal (?): function. Problem 1: What is function? Problem 2: What data do you need? Is protein sequence enough?
What is gene/protein function? Chemical reactions? Interactions? Pathway activity? Cell localization? Activity details?
Enzyme Commission number. From 1961! Hierarchical classification of enzymes; specifies reactions. Example from Wikipedia: EC 3 enzymes are hydrolases; EC 3.4 are hydrolases that act on peptide bonds; EC 3.4.11 are those hydrolases that cleave off the amino-terminal amino acid from a polypeptide; EC 3.4.11.4 are those that cleave off the amino-terminal end from a tripeptide. Too limited for bioinformatics.
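
Since the EC number is a strict hierarchy, its levels can be read straight off the dotted string. The helper names below are made up for this sketch, and the second EC number in the usage line is only there to show a comparison.

```python
def ec_levels(ec_number):
    """Return all prefixes of an EC number, from class down to the full number."""
    parts = ec_number.split(".")
    return [".".join(parts[:n]) for n in range(1, len(parts) + 1)]


def shared_depth(ec_a, ec_b):
    """Number of leading levels that two EC numbers have in common."""
    depth = 0
    for a, b in zip(ec_a.split("."), ec_b.split(".")):
        if a != b:
            break
        depth += 1
    return depth


print(ec_levels("3.4.11.4"))
print(shared_depth("3.4.11.4", "3.4.21.1"))
```

The shared depth gives a crude similarity between two enzymes, which also hints at the limitation: proteins that are not enzymes have no EC number at all.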
Gene Ontology. Controlled vocabulary for function annotation. Non-hierarchical: "is a" and "part of" relationships between terms.
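
Because a term can have several parents through different relationships, the vocabulary forms a graph rather than a tree. The sketch below uses a small invented ontology (the term names and edges are hypothetical, not taken from the GO files) to show how all ancestors of a term can be collected.

```python
# Hypothetical toy ontology: child -> list of (relation, parent).
PARENTS = {
    "mitochondrial membrane": [
        ("is_a", "organelle membrane"),
        ("part_of", "mitochondrion"),
    ],
    "organelle membrane": [("is_a", "membrane")],
    "mitochondrion": [("is_a", "organelle")],
    "membrane": [],
    "organelle": [],
}


def ancestors(term, parents=PARENTS):
    """All terms reachable from `term` by following is_a and part_of edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("mitochondrial membrane")))
```

A gene annotated with a term is usually treated as annotated with all of that term's ancestors as well, which is why this traversal matters for the applications that follow.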
GO example (Baldock and Burger, Genome Biology 2005)
GO applications. Facilitates enrichment studies. "We show that gene duplication and loss is highly constrained by the functional properties and interacting partners of genes. In particular, stress-related genes exhibit many duplications and losses, whereas growth-related genes show selection against such changes." (Wapinski et al., Nature 2007)
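
One common way to run an enrichment study is a hypergeometric test: given how many genes in the genome carry a GO term, how surprising is the number found in a gene set of interest? The sketch below is one such test under that assumption; it is not claimed to be the method used in the cited paper, and the numbers in the usage line are invented.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """P(X >= hits) when drawing `sample` genes without replacement.

    population: total number of genes
    annotated:  genes in the population that carry the GO term
    sample:     size of the gene set under study
    hits:       genes in the set that carry the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10 genes, 4 annotated, a set of 3 that all carry the term.
print(enrichment_p_value(10, 4, 3, 3))
```

In practice many terms are tested at once, so the resulting p-values need a multiple-testing correction.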
Predicting function? Given a gene/protein, can we predict a GO term? Approach: expert systems. Collect homologs; collect orthologs; domain and motif analysis; study other features; study network connections. Examples: ProtFun (http://www.cbs.dtu.dk/services/protfun/), FunCoup (http://funcoup.sbc.su.se/)
Predict localization. Modest goal! Is the target... mitochondria? peroxisome? endoplasmic reticulum? Golgi? Study the signal peptide. Olof Emanuelsson: TargetP (http://www.cbs.dtu.dk/services/targetp/)
Next: Computational genomics. Introduction; genome sequencing and assembly; EST analysis.