Supporting online material


Materials and Methods

Target proteins

All predicted ORFs in the E. coli genome (1) were downloaded from the Colibri database (2) (http://genolist.pasteur.fr/colibri/). A total of 737 proteins longer than 100 residues and with two or more transmembrane helices predicted by TMHMM (3) (v. 2.0) were retained. Of the corresponding genes, 23 contained restriction sites that prevented cloning into any of the three standard phoA or two gfp vectors (4) used, reducing the final collection of target proteins to 714.

Cloning

ORFs encoding the selected membrane proteins were amplified by PCR from the E. coli strain MG1655 (1). Three different combinations of primer-introduced restriction sites and correspondingly digested vectors were used: (i) 5′ XhoI / 3′ KpnI, (ii) 5′ XhoI / 3′ BamHI, (iii) 5′ NdeI / 3′ BamHI. Cloning was performed in E. coli strains MC1061 or TOP10F′. The three phoA fusion vectors contain the gene of interest, followed by a short linker sequence and the region coding for PhoA. The gfp fusion vectors contain the gene of interest, followed by a linker sequence encoding a TEV protease site, the gfp gene (S65T, F64L + Cycle 3 mutant) and a His8 tag at the 3′ end. All genes are preceded by the same ribosome binding site, and the start codon is always ATG. The vectors and enzymes used are described in detail elsewhere (4). All constructs were confirmed by sequencing from both the 5′ and 3′ ends.

Protein expression and experimental determination of C-terminal locations

PhoA and GFP assays were repeated at least three times, and in general four times. Constructs encoding PhoA fusions were transformed into the CC118 strain and assayed as described previously (4). Cell density and PhoA activity were measured using a SpectraMaxPlus384 (Molecular Devices, California). Constructs encoding GFP fusions were transformed into the BL21(DE3)pLysS strain and assayed as described (4, 5) with minor adaptations. Overnight cultures were back-diluted into 5 ml of LB medium in 24-well growth plates. Cultures were grown at 37 °C to an OD600 of approximately 0.4-0.6, then induced with 0.4 mM IPTG and grown for an additional 2 h. The cell pellet was resuspended in GFP resuspension buffer (50 mM Tris-HCl pH 8.0, 200 mM NaCl, 15 mM EDTA), incubated at room temperature for 2 h, and assayed for GFP fluorescence. Fluorescence was measured with an excitation wavelength of 485 nm, an emission wavelength of 512 nm, and a 495 nm cutoff filter in a SpectraMaxGemini EM (Molecular Devices, California).
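The target-selection criteria given under Target proteins (longer than 100 residues, two or more predicted transmembrane helices, no restriction sites that block cloning) amount to a simple filter. The following is a minimal sketch only: the `Orf` record, its field names, and the toy entries are hypothetical and are not taken from the original pipeline.

```python
from dataclasses import dataclass


@dataclass
class Orf:
    """Hypothetical record for one predicted ORF (field names are illustrative)."""
    name: str
    length: int          # protein length in residues
    n_tm_helices: int    # number of predicted transmembrane helices
    clonable: bool       # False if restriction sites prevent cloning


def select_targets(orfs):
    """Keep ORFs longer than 100 residues, with >= 2 predicted TM helices,
    whose genes can be cloned into the fusion vectors."""
    return [
        o for o in orfs
        if o.length > 100 and o.n_tm_helices >= 2 and o.clonable
    ]


# Toy input; names and values are made up for illustration.
orfs = [
    Orf("hypA", 350, 6, True),
    Orf("hypB", 90, 2, True),     # too short
    Orf("hypC", 200, 1, True),    # too few predicted helices
    Orf("hypD", 420, 12, False),  # restriction sites prevent cloning
]
targets = select_targets(orfs)
print([o.name for o in targets])
```

In the study, the first two criteria retained 737 proteins, and the cloning restriction reduced the set to 714.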

The measured values were normalized to allow a quantitative comparison between the GFP and PhoA measurements. The raw GFP activity value for each fusion was first divided by the cell density (OD600). PhoA and GFP activities were then divided by the median activity of the active PhoA or GFP fusions (347 units for PhoA, 3924 arbitrary units for GFP).

Cutoff values for C-terminal assignments were determined by the following procedure (see Fig. 1 in the main text). First, all pairs of C-terminally aligned homologues among the 573 proteins for which both PhoA and GFP clones were available were identified. Homologues were defined as proteins with a pairwise BLAST E-value < 10⁻⁴ and for which the BLAST alignment reached to within 25 residues of the C-termini, to ensure that no extra C-terminal TMHs are present in one or the other protein. To define the cutoffs, the two 45° lines in Fig. 1 were moved symmetrically towards the main diagonal; for each location of the lines (defined by the intersections (a,0) and (0,a) with the x- and y-axis), all proteins located above the upper cutoff line were assigned as Cin, and all proteins located below the lower cutoff line were assigned as Cout. The a-value was reduced from its starting value a = 1.5 until a pair of C-terminally aligned homologues (as defined above) was found where one was assigned Cout and the other Cin; this happened for a = 0.2 (the YdgQ-YdgL pair (6) and proteins in the SMR family were excluded), and the final cutoff value was set to a = 0.3. For proteins where only one fusion (PhoA or GFP) was available, the cutoff value was set to a = 0.75, as < 1% of the proteins for which both PhoA and GFP fusions were available would have been mis-assigned with this cutoff had only one of the two fusions been available (Fig. 1).

Topology prediction

Unconstrained topology predictions were done using TMHMM (3) (v. 2.0). Constrained topology predictions were done as described (7) by fixing the C-terminus of the protein to its experimentally determined location before running TMHMM (http://www.sbc.su.se/tmhmmfix/). S3 reliability scores were calculated as described (7).

Functional assignments

Proteins were grouped into one of nine functional categories (biogenesis (B), channel (C), transport/efflux (E), flagellar (F), lipid (L), bioenergetics/metabolism (M), signaling (S), transport/influx (T) or unknown (U)) depending on their known or predicted function. Initially, functional annotations were collected from the Colibri (2) (http://genolist.pasteur.fr/colibri/) and SwissProt (8) (http://www.expasy.org/sprot/) databases. Proteins whose function was still unknown were then searched against the literature and assigned to the functional categories based on published information.
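The normalization and cutoff procedure described earlier in this section can be sketched in a few lines. This is an illustration under stated assumptions, not the code used in the study: it assumes normalized PhoA activity on the x-axis and normalized GFP activity on the y-axis (the text does not state the axis assignment, but this is consistent with GFP being fluorescent only when the C-terminus is cytoplasmic), it assumes a step size of 0.1 for lowering a, and all function names and the toy homologue pair are hypothetical.

```python
PHOA_MEDIAN = 347.0   # median activity of active PhoA fusions (units), from the text
GFP_MEDIAN = 3924.0   # median activity of active GFP fusions (arbitrary units), from the text


def normalize_phoa(activity):
    """Scale a PhoA activity by the median of the active PhoA fusions."""
    return activity / PHOA_MEDIAN


def normalize_gfp(raw_fluorescence, od600):
    """Divide raw GFP fluorescence by cell density, then by the GFP median."""
    return raw_fluorescence / od600 / GFP_MEDIAN


def assign_c_terminus(phoa_norm, gfp_norm, a=0.3):
    """Assign the C-terminal location from the two 45-degree cutoff lines.

    Assumed axes: x = normalized PhoA, y = normalized GFP, so the lines are
    y = x + a (through (0, a)) and y = x - a (through (a, 0)).
    """
    diff = gfp_norm - phoa_norm
    if diff > a:        # above the upper line
        return "Cin"
    if diff < -a:       # below the lower line
        return "Cout"
    return "unassigned"


def first_conflicting_a(homolog_pairs, start=1.5, step=0.1):
    """Lower a from `start` and return the first value at which some pair of
    C-terminally aligned homologues receives opposite assignments.

    `step` is an assumption; the text does not give the step size.
    Returns None if no conflict is found down to a = 0.
    """
    n_steps = int(round(start / step))
    for i in range(n_steps + 1):
        a = round(start - i * step, 10)  # avoid floating-point drift
        for p, q in homolog_pairs:
            calls = {assign_c_terminus(*p, a=a), assign_c_terminus(*q, a=a)}
            if calls == {"Cin", "Cout"}:
                return a
    return None


# Toy homologue pair as ((phoa_norm, gfp_norm), (phoa_norm, gfp_norm)); values are made up.
toy_pairs = [((0.1, 1.0), (0.5, 0.15))]
print(assign_c_terminus(0.1, 1.0))
print(first_conflicting_a(toy_pairs))
```

In the study the first conflict appeared at a = 0.2, and the final cutoff was set one step above it, at a = 0.3; for proteins with only one fusion, a = 0.75 was used instead.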

References

1. F. R. Blattner et al., Science 277, 1453 (1997).
2. C. Medigue, A. Viari, A. Henaut, A. Danchin, Microbiol Rev 57, 623 (1993).
3. A. Krogh, B. Larsson, G. von Heijne, E. Sonnhammer, J Mol Biol 305, 567 (2001).
4. M. Rapp et al., Prot Sci 13, 937 (2004).
5. D. Drew et al., Proc Natl Acad Sci USA 99, 2690 (2002).
6. A. Sääf, M. Johansson, E. Wallin, G. von Heijne, Proc Natl Acad Sci USA 96, 8540 (1999).
7. K. Melén, A. Krogh, G. von Heijne, J Mol Biol 327, 735 (2003).
8. A. Bairoch, B. Boeckmann, Nucl Acids Res 19, 2247 (1991).

Figure S1. TMHMM topology prediction for YjfL before (top) and after (bottom) the C-terminus has been fixed to its experimentally determined location (www.sbc.su.se/tmhmmfix/). The probability for inside loop is shown in blue, for outside loop in pink, and for transmembrane helix in red. The overall reliability score (S3) is shown (K. Melén, A. Krogh, G. von Heijne, J Mol Biol 327, 735 (2003)).

Figure S2. Normalised PhoA and GFP activity data for dual topology candidates and pairs of homologous proteins with opposite topologies (connected by lines); cf. Fig. 1 in the main text. The YdgQ-YdgL pair has been described earlier (A. Sääf, M. Johansson, E. Wallin, G. von Heijne, Proc Natl Acad Sci USA 96, 8540 (1999)).

Table S2. Sequence characteristics and their correlation coefficients (R) against overexpression levels (GFP/ml) and ΔOD600.

Sequence characteristic                                              GFP/ml   ΔOD600
Sequence length                                                       0       -0.15
Number of TM helices                                                  0.09    -0.26
Reliability score                                                    -0.06     0.13
Membrane content (residues in membrane/sequence length)               0.01    -0.18
Length of N-tail (residues before first TM)                           0.05     0.14
Length of N-tail/sequence length                                      0.09     0.21
Average hydrophobicity, GES scale [a]                                 0.08     0.24
Minimum hydrophobicity (19 consecutive most hydrophilic residues)     0.15     0.21
Max hydrophobic region (41 residues)                                  0.12     0.16
Min hydrophobic region (41 residues)                                 -0.03     0.11
Average codon usage (CU) [b]                                         -0.02    -0.11
Min CU over 5 codons                                                  0.01    -0.02
Min CU over 10 codons                                                 0.03     0.06
Average CU of first 40 codons                                        -0.05    -0.11
Average CU of first 20 codons                                        -0.09    -0.07
Positive residues in N-tail                                           0.06     0.13
Negative residues in N-tail                                           0.06     0.16
Length of longest inside loop [c]                                    -0.1     -0.11
Length of longest outside loop                                       -0.2     -0.14
Length of longest inside loop/sequence length                        -0.1     -0.07
Length of longest outside loop/sequence length                       -0.25    -0.11
Average hydrophobicity of N-tail                                      0.08     0.01
Positive residues inside [d]                                          0.14     0.11
Negative residues inside                                              0.1      0.16
Pos residues inside/length                                            0.25     0.35
Neg residues inside/length                                            0.19     0.31
Positive residues outside                                            -0.18    -0.15
Negative residues outside                                            -0.19    -0.21
Pos residues outside/length                                          -0.21    -0.13
Neg residues outside/length                                          -0.24    -0.21
Pos residues in longest inside loop                                  -0.1     -0.13
Neg residues in longest inside loop                                  -0.06    -0.12
Pos residues in longest outside loop                                 -0.17    -0.09
Neg residues in longest outside loop                                 -0.18    -0.12

[a] Engelman, D.M., Steitz, T.A. & Goldman, A., Annu. Rev. Biophys. Biophys. Chem. 15, 321-353 (1986).
[b] Calculated using GenBank release 140 (Nucl Acids Res 32, 23-26 (2004)); E. coli K12 CU table from http://www.kazusa.or.jp/codon/e.html
[c] The longest loop between two TMs that is predicted to be on the inside.
[d] Number of positive amino acids predicted to be inside.
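For readers who want to reproduce a value of the kind listed in Table S2, a correlation coefficient between one sequence characteristic and one expression measure can be computed as below. This sketch assumes R is the Pearson correlation coefficient, which the table does not state explicitly; the function name and the toy numbers are hypothetical and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


# Toy values only: sequence lengths and expression levels for four made-up proteins.
lengths = [100, 200, 300, 400]
gfp_per_ml = [4.0, 3.0, 2.0, 1.0]
r = pearson_r(lengths, gfp_per_ml)
print(round(r, 2))
```

Values of R close to 0, as for most rows of the table, indicate that the characteristic has little linear relationship with the measured quantity.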