Supplementary Methods

Protein expression and purification

The full-length GlpG sequence was amplified by PCR from E. coli genomic DNA (with two sequence variations, D51E/L52V, relative to GenBank entry AAC28166) and cloned into pET-41b(+) (Novagen, Inc.) in frame with a C-terminal octahistidine tag. The recombinant protein was expressed in E. coli BL21(DE3) cells by induction at an OD600 of 0.6 with 0.2mM isopropyl-β-D-thiogalactopyranoside (IPTG). After overnight growth at 22°C, cells were harvested, homogenized and lysed by French press in a 50mM phosphate buffer (pH 7.0) with 300mM NaCl, 0.1mM phenylmethylsulphonyl fluoride (PMSF), 2µg/ml pepstatin and EDTA-free protease inhibitor cocktail (Roche, Inc.). The membrane protein was extracted by adding solid decylmaltoside (DM) (Anatrace, Inc.) directly to the lysate, at 0.1g per gram of cells, and incubating for three hours at room temperature. Unlysed cells and cell debris were removed by centrifugation. The resulting supernatant was applied to a Talon Co²⁺ affinity column (Clontech, Inc.), and the protein was eluted with 5mM DM, 50mM phosphate buffer (pH 7.0), 300mM NaCl and 200mM imidazole. The yield of the octahistidine-tagged protein was approximately 1mg per gram of cells, as determined by the Bradford method. The purified protein was incubated with α-chymotrypsin (Sigma, Inc.), at 0.5 unit of enzyme per mg of membrane protein, at room temperature for two days, which removed the octahistidine tag and the N-terminal soluble domain of GlpG (Supplementary Fig. 2a). N-terminal sequencing indicated that the resulting core
domain started from Ala-87 or Arg-90. The truncated protein was further purified on a Superdex G-200 column (Amersham Biosciences, Inc.) in 5mM DM, 50mM Tris-HCl (pH 7.6) and 150mM NaCl (Supplementary Fig. 2b). Seleno-methionine-substituted protein was purified similarly, except that 5mM β-mercaptoethanol was included in all solutions.

Proteolytic activity of GlpG

Detergent-solubilized GlpG is capable of cleaving dye-labeled casein, causing an increase in fluorescence intensity (ref. 21). The mechanism of this reaction, and whether or not it truly mimics that of intramembrane proteolysis in vivo, are not understood at this time. Despite these limitations, we used this assay to examine the proteolytic activity of the GlpG core domain in the detergent used for crystallization. To measure the overall increase in fluorescence intensity (Supplementary Fig. 2c), 5µg BODIPY FL casein (Invitrogen, Inc.) was mixed at 37°C with 1.6µM enzyme in 0.2ml of assay buffer containing 50mM Tris-HCl (pH 8.1), 16mM NG, 7.5% glycerol, 10mM β-mercaptoethanol and 100mM KCl. Fluorescence emission at 513nm was measured at 37°C with an excitation wavelength of 503nm using a Tecan Safire multi-detection microplate reader. To demonstrate directly that the increase in fluorescence intensity was due to proteolysis of the dye-labeled substrate (Supplementary Fig. 2d), the reaction mixture was resolved by SDS-PAGE and visualized on a BioImaging System (UVP, Inc.). Both methods showed that the core domain was less active than the full-length protein, which raised the possibility that removing the N-terminal
soluble domain could have slightly disturbed the structure, or modified the properties, of the transmembrane core region.

Crystallographic analysis

GlpG crystallized in space group R32. Protein occupied approximately 40% of the crystal volume. The electron density map based on experimental phases obtained from multiwavelength anomalous dispersion (MAD) clearly defined the conformation of most of the polypeptide (Supplementary Fig. 3a). We refined the model to an R-free value of 25%. The final model included residues 91 to 272 of GlpG, 12 detergent molecules and 31 water molecules (Supplementary Fig. 4). Since 86 N-terminal residues of GlpG were removed by chymotrypsin, only 4 N-terminal residues of the core domain were disordered in the crystal. The exact chymotrypsin cutting site near the C-terminus was not known. However, since clear electron density could be seen up to Ala-272, only 4 C-terminal residues of GlpG were missing from the model. The electron density for the protein main chain contained no breaks. The side chain of a surface glutamine (Gln-220) was not visible and is probably disordered. Only one residue, Ser-248, appeared in the disallowed region of the Ramachandran plot. This residue is in the cap (L5) region, between two methionines that cover the catalytic dyad, which could have a strained conformation in the closed state of the enzyme.
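The residue counts quoted above can be checked with a few lines of arithmetic. This is only a bookkeeping sketch: the variable names are illustrative, the cleavage site is assumed to be the earlier of the two reported positions (Ala-87), and the full-length value is inferred from the statement that 4 C-terminal residues are missing after Ala-272 rather than being given explicitly in the text.

```python
# Values taken from the text above; names are illustrative only.
FIRST_CORE_RESIDUE = 87          # Ala-87 (assumed cut site; Arg-90 is the alternative)
MODEL_FIRST, MODEL_LAST = 91, 272
MISSING_C_TERMINAL = 4

# Residues 1..86 precede the core domain and are removed by chymotrypsin.
removed_n_terminal = FIRST_CORE_RESIDUE - 1

# Residues 87..90 belong to the core domain but are absent from the model.
disordered_n_terminal = MODEL_FIRST - FIRST_CORE_RESIDUE

# Number of residues built into the final model (inclusive range).
modelled_residues = MODEL_LAST - MODEL_FIRST + 1

# Full-length size implied by "4 C-terminal residues missing" (an inference).
implied_full_length = MODEL_LAST + MISSING_C_TERMINAL

print(removed_n_terminal, disordered_n_terminal, modelled_residues, implied_full_length)
```

With Ala-87 as the start of the core domain, the numbers agree with the 86 removed and 4 disordered N-terminal residues stated in the text.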
Supplementary Table S1 Data collection statistics.

Data collection          Native (X6A)          Se-Met (X29)
                                               Inflection            Peak                  Remote
Space group              R32
Cell dimensions (Å)
  a                      110.8                 110.8                 110.8                 110.9
  c                      127.6                 127.9                 128.1                 128.1
Wavelength (Å)           1.0000                0.9794                0.9792                0.9500
Resolution (Å)^a         40.0-2.1 (2.18-2.10)  20.0-2.8 (2.90-2.80)  20.0-2.8 (2.90-2.80)  20.0-3.2 (3.31-3.20)
Observed reflections     195,922               96,094                238,653               135,559
Unique reflections       17,817                7,628                 7,543                 5,069
Redundancy               11.0                  12.6                  31.6                  26.7
Completeness (%)^a       99.9 (100.0)          100.0 (100.0)         99.1 (94.2)           99.9 (99.0)
<I/σ>^a                  11.6 (3.5)            10.8 (2.7)            10.2 (2.9)            8.6 (3.1)
R_merge^a,b              0.073 (0.414)         0.122 (0.295)         0.128 (0.385)         0.134 (0.424)

a Highest resolution shell is shown in parentheses.
b $R_{\mathrm{merge}} = \sum |I_i - \langle I \rangle| / \sum I_i$
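As a consistency check on the table, the redundancy row can be recomputed as observed reflections divided by unique reflections, and footnote b can be written as a small function. The sketch below is illustrative only: the function names and the grouped-list input layout for `r_merge` are assumptions, not the data-processing software used for these data sets, and the summation and absolute value follow the conventional definition of R_merge.

```python
# Observed and unique reflection counts copied from Table S1.
DATASETS = {
    "Native": (195_922, 17_817),
    "Inflection": (96_094, 7_628),
    "Peak": (238_653, 7_543),
    "Remote": (135_559, 5_069),
}


def redundancy(observed, unique):
    """Average number of measurements per unique reflection."""
    return observed / unique


def r_merge(groups):
    """R_merge = sum |I_i - <I>| / sum I_i.

    `groups` is a list of lists (hypothetical layout): each inner list holds
    the repeated intensity measurements of one unique reflection, and <I> is
    the mean of that inner list.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in groups:
        mean = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


# Redundancy per data set, rounded to one decimal place as in the table.
table_redundancy = {
    name: round(redundancy(obs, uniq), 1) for name, (obs, uniq) in DATASETS.items()
}
print(table_redundancy)
```

The recomputed redundancies round to the tabulated values of 11.0, 12.6, 31.6 and 26.7.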
Supplementary Figure Legends

Figure S1 Structure-based sequence alignment of selected rhomboid proteases. These include E. coli GlpG (EcGlpG), P. stuartii AarA (PsAarA), D. melanogaster rhomboid-1 (DmRMB1), H. sapiens RHBDL2 (HsRMB2), H. sapiens presenilin-associated rhomboid-like (HsPARL), S. cerevisiae Pcp1/Rbd1p (ScPcp1), T. gondii rhomboid-5 (TgROM5), A. thaliana rhomboid-2 (AtRBL2) and P. horikoshii rhomboid-like (PhRMBL). Residue numbers above the sequence correspond to those of GlpG; residue numbers for each species are given at the beginning of the line. Colored bars above the sequence indicate the transmembrane helices S1-S6 (same color scheme as that in Fig. 2a). The short helices on L1 are also indicated (a1-a5). The conserved residues of the active site are highlighted in red, residues that function in gating are highlighted in pink, and residues that mediate strong intramolecular helix-helix association are highlighted in green. TgROM5 and AtRBL2 have longer L1 loops than the rest: 100 residues in TgROM5 at (*) and 40 residues in AtRBL2 at (**) are omitted from the alignment.

Figure S2 Protein purification and activity. a, Chymotrypsin cut. (Left) SDS-PAGE gel stained with Coomassie. (Right) Western blot probed with a monoclonal antibody against the polyhistidine tag. The sample without chymotrypsin treatment (-) contained a tiny amount of N, obvious only by Western blot, that was probably derived from the full-length protein (missing the N-terminal region, but containing the core and the C-terminal histidine tag). Chymotrypsin trimmed the full-length
protein first to N, then to the core, where the histidine tag was also removed. Chymotrypsin did not generate any nicks in the core domain used for crystallization. b, Elution profile of the core domain from a size-exclusion column. The elution volume (13.6ml) corresponds to a molecular weight of 81kDa. c, The proteolytic activity assay. Sixty data points were collected over a period of 2 hours, and measurements were performed in triplicate with multiple readings averaged for each well. Both full-length GlpG (red) and the core domain (green) were active in NG, the detergent used for crystallization. d, The increase in fluorescence was due to proteolysis of the dye-labeled casein (S, substrate), which generated smaller and highly fluorescent peptide fragments (P, products). In this experiment, 5µg BODIPY FL casein was incubated with different amounts of enzyme in 50µl of assay buffer at 37°C for 1 hour. The reaction mixtures were resolved on a 16% Tris-Tricine gel.

Figure S3 The electron density maps. a, Experimental map calculated with observed amplitudes (Fobs) and density-modified MAD phases, contoured at the 1.5σ level. The Cα trace of the final GlpG model is also shown (yellow). b, The final 2Fo-Fc map, contoured at the 1.5σ level, at the active site of GlpG. The side chains of the four conserved residues, His-150, Asn-154, Ser-201 and His-254, from membrane-spanning segments S2, S4 and S6 are shown. The red spheres represent nearby bound water molecules.
Figure S4 A stereo diagram of the Cα trace of the GlpG core domain (the front view). The black dots mark every tenth residue. This illustration and those in Fig. S5 were generated with MOLSCRIPT (ref. 47).

Figure S5 GlpG trimer in the crystal. a, The front view of the trimer. Bound detergents are shown as ball-and-stick models. b, The top view of the trimer from the extracellular side.