Nature Structural and Molecular Biology: doi: /nsmb.2938


Supplementary Figure 1 Characterization of designed leucine-rich-repeat proteins. (a) A water-mediated hydrogen-bond network is frequently visible in the convex region of LRR crystal structures. Examples are shown for the idealized L24 (DLRR_B) and L24 L28 fusion structure (DLRR_G3). Water molecules participating in the hydrogen-bond network (yellow dots) are represented by spheres. (b) Super-helical shapes of the three idealized building block repeats. For clear visualization, dots tracing the global super-helix defined by the fitted parameters are overlaid on the LRR structures (rotation angle < 720°). The highly conserved leucine residues used for the parameter fitting are represented by spheres. See Supplementary Table 1 for the helical parameter estimation. (c) Structural alignments of the partial Ncap-L24 5 structure in DLRR_B (top) and L22 5 structure in DLRR_A (bottom) onto the crystal structure of DLRR_E. Cα r.m.s. deviations for the alignments of DLRR_B and DLRR_A are 0.4 Å and 0.3 Å, respectively. (d) Structural defects in the initial fusion model of DLRR_G3. The crystal structure (magenta) of the junction module in DLRR_G3 is aligned with the initial model structure before design (gray) and the final model structure after design (green). The initial model contains a large cavity and side-chain clashes in the junction module, which are improved by the subsequent design procedure, as shown in the final model structure (green). (e) SEC-MALS experiments for DLRR_D, DLRR_E, DLRR_I, DLRR_J, DLRR_K, and DLRR_L. Most designs are monomeric, although some soluble aggregates/oligomers are observed for DLRR_I and DLRR_K.

Supplementary Figure 2 Experimental characterization of six L22 L28 designs (DLRR_F). In the top row, structure alignment (left) and sequence alignment (right) of the six junction module designs are represented. The building block sequences (L22 + L28) are shown in the first row of the sequence alignment for comparison. Far-UV CD spectra, thermal denaturation at 218 nm, and SEC-MALS are shown from left to right for each design.

Supplementary Figure 3 Experimental characterization of six L24 L28 designs (DLRR_G). In the top row, structure alignment (left) and sequence alignment (right) of the six junction module designs are shown. The building block sequences (L24 + L28) are shown in the first row of the sequence alignment for comparison. Far-UV CD spectra, thermal denaturation at 218 nm, and SEC-MALS are shown from left to right for each design. DLRR_G6 has one fewer {L28 L29} module than the others. The crystal structure of DLRR_G3 is shown in Figure 3d.

Supplementary Figure 4 Experimental characterization of four L24 L32 L24 designs (DLRR_H). In the top row, the structural alignment of the four wedge module designs is shown. The sequence alignment of the four wedge module designs is shown with the building block and native L32 module sequence (L24 + L32 + L24) in the first row for comparison. Far-UV CD spectra, thermal denaturation at 218 nm, and SEC-MALS are shown from left to right for each design. Design DLRR_I has two identical L32 modules derived from DLRR_H1 (Supplementary Table 2). In the SEC-MALS experiments, some soluble aggregates/oligomers are observed in addition to the monomeric species. The crystal structure of DLRR_H2 is shown in Figure 3e.

Supplementary Figure 5 Characterization of designed junction modules. (a) Sequence alignments between the designed junction modules and the top three naturally occurring sequences (square block) found in a BLAST (ref. 1) search of the non-redundant (NR) database. There are numerous sequence differences between the designed modules and the closest sequences in NR. Indeed, BLAST fails to find full-length alignments for most of the junction sequences. (b) Comparison of structures of designed and naturally occurring junctions between LRR modules. Left: designed junction modules; middle: the closest structural matches found in the PDB using TM-align (ref. 2); right: structural alignment. The TM-align searches were carried out with the two-unit junction module structures (green); one or two module structures next to the junction module are shown for both designed and natural structures (yellow) to make the ideality (or lack of ideality) of the different structures clearer. Most junctions between LRR modules of different lengths in the native structures occur near the caps, where the structure becomes much less regular. This irregularity, evident on the right side of the images of native structures, makes it impossible to generate novel LRRs with controlled curvature by combining multiple different types of modules simply using junctions already existing in the PDB. (c) Structural comparison between crystal structures and model structures generated by the iterative module assembly protocol described in Methods. All model structures are highly consistent with the crystal structures (r.m.s. deviation g in Table 2). (d) Native LRR proteins, internalin A (InlA, PDB ID: 1O6S, top left) and ribonuclease inhibitor (RI, PDB ID: 1A4Y, bottom left), achieve high affinity and specificity by having shapes closely conforming to the surfaces of the target proteins (human E-cadherin and ribonuclease A, respectively). Each protein has a curvature optimized to its target, resulting in well-packed complementary protein-protein interfaces with hot-spot clusters (shown by red sticks) at both the N and C termini. In contrast, swapping the respective targets of the two LRR proteins (i.e. RI:E-cadherin, orange-cyan complex in the top right, and InlA:ribonuclease, green-yellow complex in the bottom right) produces clashes and large gaps in the binding interface.

Supplementary Table 1 Super-helical parameters of building block modules

LRR type     Rise (Å)   Radius (Å)   Rotation angle (radian)   Repeat units used for fitting   Fitted RMSD (Å)
L22          2.34       18.67        0.24                      8                               0.09
L24          1.41       24.62        0.20                      9                               0.13
{L28 L29}    0.82       16.52        0.31                      10                              0.17

The L22, L24 and {L28 L29} repeats form unique solenoid shapes that can be described by three super-helical parameters (radius: distance to the helical axis; rise: projected distance along the helical axis between adjacent units; rotation angle: rotation about the helical axis between adjacent units). The global helical shapes and parameters are estimated by fitting the three parameters to the repeat protein structures. For the parameter fitting, one of the highly conserved positions, the second Leu in the LxxLxLxxN/C motif, is used as the representative for each repeat module. The Cα coordinates of the representative positions are obtained from the crystal structures of DLRR_A (L22) and DLRR_B (L24), and from the model structure of DLRR_C ({L28 L29}). Eight to ten Cα coordinates are fitted to the same number of coordinates generated from trial values of the three helical parameters. The RMSD between the two coordinate sets is minimized using a nonlinear optimization algorithm (constrOptim.nl) in the alabama R package (refs. 3, 4). Initial helical parameters, the input to the optimization procedure, are inferred from the transformation matrix between the first two modules of the building block structures. After the optimization, the parameter set with the lowest RMSD is used to represent the global helical shape of the idealized building block structures (Supplementary Fig. 1b).
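The fitting idea described for Supplementary Table 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used for the table: the original work used constrOptim.nl in R, whereas this sketch uses a naive coordinate-wise pattern search, and it compares pairwise-distance matrices instead of superposed coordinates so that no structural superposition is needed. One consequence of that swap is that handedness is not determined. The function names (helix_points, distance_rmsd, fit_helix), step sizes and starting values are hypothetical.

```python
import math
from itertools import combinations


def helix_points(radius, rise, angle, n):
    """Ideal super-helix: unit i is rotated by i*angle about the z axis
    and shifted by i*rise along it, at distance `radius` from the axis."""
    return [(radius * math.cos(i * angle), radius * math.sin(i * angle), i * rise)
            for i in range(n)]


def distance_rmsd(a, b):
    """RMSD between the pairwise-distance matrices of two point sets.
    Assumption: used here instead of coordinate RMSD after superposition;
    it is invariant to rotation and translation (and to mirroring)."""
    pairs = list(combinations(range(len(a)), 2))
    sq = [(math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2 for i, j in pairs]
    return math.sqrt(sum(sq) / len(sq))


def fit_helix(target, start, step=(1.0, 0.2, 0.05), iters=200):
    """Naive pattern search over (radius, rise, angle); a stand-in for a
    proper constrained optimizer. Returns (parameters, final score)."""
    params, step, n = list(start), list(step), len(target)
    best = distance_rmsd(helix_points(*params, n), target)
    for _ in range(iters):
        improved = False
        for k in range(3):
            for sign in (1, -1):
                trial = params[:]
                trial[k] += sign * step[k]
                if trial[0] <= 0:  # keep the radius positive
                    continue
                score = distance_rmsd(helix_points(*trial, n), target)
                if score < best:
                    params, best, improved = trial, score, True
        if not improved:
            step = [s / 2 for s in step]  # refine the search when stuck
    return tuple(params), best


# Synthetic target built from the L24 row of Supplementary Table 1
# (radius 24.62 Å, rise 1.41 Å, rotation 0.20 rad, 9 units).
target = helix_points(24.62, 1.41, 0.20, 9)
fitted, score = fit_helix(target, start=(20.0, 1.0, 0.25))
print(fitted, score)
```

With real data, `target` would be the list of representative Cα coordinates taken from a structure file; the synthetic target here only checks that the search moves toward the parameters that generated it.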

Supplementary Table 2 Module organization and module origins of the multiple fusion designs in Figure 4c.

DLRR_I
Module organization: Ncap L24 2 JN L24 L32 L24 JN L24 L32 L24 L24 2
Individual module     Original design
Ncap L24 2            DLRR_B
JN L24 L32 L24        DLRR_H1
JN L24 L32 L24        DLRR_H1
L24 2                 DLRR_B

DLRR_J
Module organization: Ncap L22 4 L24 2 JN L24 L28 L29 [L28 L29] 2
Individual module     Original design
Ncap L22 4            DLRR_A
L24 2                 DLRR_B
JN L24 L28            DLRR_G3
L29                   DLRR_G3
[L28 L29] 2           DLRR_G3

DLRR_K
Module organization: Ncap L24 2 JN L24 L32 L24 L24 3 JN L24 L28 L29 [L28 L29] 2
Individual module     Original design
Ncap L24 2            DLRR_B
JN L24 L32 L24        DLRR_H2
L24 3                 DLRR_B
JN L24 L28            DLRR_G6
L29                   DLRR_G6
[L28 L29] 2           DLRR_G6

DLRR_L
Module organization: Ncap L22 3 L24 3 JN L24 L32 L24 L24 3 JN L24 L28 L29 [L28 L29] 2
Individual module     Original design
Ncap L22 3            DLRR_A
L24 3                 DLRR_B
JN L24 L32 L24        DLRR_H2
L24 3                 DLRR_B
JN L24 L28            DLRR_G6
L29                   DLRR_G6
[L28 L29] 2           DLRR_G6

Supplementary Table 3 Number of possible fusion LRR structures with respect to the number of repeat units.

Number of repeat units   Number of possible LRR structures   Fold change, (i) to (i+1)
5                        64
6                        145                                 2.266
7                        327                                 2.255
8                        736                                 2.251
9                        1,655                               2.249
10                       3,720                               2.248
11                       8,360                               2.247
12                       18,786                              2.247
13                       42,213                              2.247
14                       94,853                              2.247
15                       213,134                             2.247
16                       478,909                             2.247
17                       1,076,100                           2.247
18                       2,417,996                           2.247
19                       5,433,237                           2.247

LRR structures are generated by recursively following the edges of the network in Figure 4a. Module assembly generally starts from Ncap-L22 or Ncap-L24 in the network (the exception being {L28 L29} n), and each assembly step (a transition in the network) adds one repeat unit to the structure. The number of repeat units in the table counts only the internal repeat units, excluding the N-terminal capping domain.
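As a rough illustration of how such counts arise, the sketch below does two things. First, it recomputes the fold-change column directly from the structure counts listed in Supplementary Table 3. Second, it counts assemblies by following edges of a module network one repeat unit at a time. The network of Figure 4a is not reproduced in this supplement, so TOY_EDGES is a hypothetical toy network chosen only to show the counting scheme; it does not reproduce the numbers in the table. The function names are also assumptions.

```python
# Structure counts copied from Supplementary Table 3
# (number of repeat units -> number of possible LRR structures).
TABLE3_COUNTS = {
    5: 64, 6: 145, 7: 327, 8: 736, 9: 1655, 10: 3720, 11: 8360,
    12: 18786, 13: 42213, 14: 94853, 15: 213134, 16: 478909,
    17: 1076100, 18: 2417996, 19: 5433237,
}


def fold_changes(counts):
    """Ratio of each count to the count for one fewer repeat unit."""
    units = sorted(counts)
    return {n: round(counts[n] / counts[n - 1], 3) for n in units[1:]}


# HYPOTHETICAL toy network (not the network of Figure 4a): each key may be
# followed by any of the modules in its list.
TOY_EDGES = {
    "Ncap-L22": ["L22"],
    "Ncap-L24": ["L24"],
    "L22": ["L22", "L24"],
    "L24": ["L24"],
}


def count_assemblies(edges, starts, n_units):
    """Count walks that make `n_units` transitions from any start node.
    Each transition adds one repeat unit, as described for the table."""
    ways = {s: 1 for s in starts}
    for _ in range(n_units):
        nxt = {}
        for node, count in ways.items():
            for succ in edges.get(node, []):
                nxt[succ] = nxt.get(succ, 0) + count
        ways = nxt
    return sum(ways.values())


for units, ratio in fold_changes(TABLE3_COUNTS).items():
    print(units, ratio)
print(count_assemblies(TOY_EDGES, ["Ncap-L22", "Ncap-L24"], 3))
```

Counting level by level avoids enumerating every structure explicitly, which matters once the counts reach the millions.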

Supplementary Table 4 Crystallization conditions

Design name   Crystallization conditions
DLRR_A        22% PEG 3350 w/v, 0.1 M MES pH 6.0, 0.2 M NaCl
DLRR_E        20% PEG 1000 v/v, 0.1 M Na/K phosphate pH 6.2
DLRR_G3       2 M ammonium sulfate, 0.1 M Bis-Tris pH 5.5
DLRR_H2       22% PEG 3350 w/v, 300 mM ammonium sulfate, unbuffered
DLRR_I        24% PEG 3350 w/v, 0.2 M ammonium sulfate, 0.1 M HEPES pH 7.5, 0.1 M proline
DLRR_K        20% PEG-3000, 0.1 M Tris pH 7.0, 0.2 M Ca(OAc)2

Supplementary Table 5 Designed sequences > DLRR_A ETITVSTPIKQIFPDDAFAETIKANLKKKSVTDAVTQNELNSIDQIIANNSDIKSVQGIQYLPNLKTLKLSNNKITDISAL KQLNNLGWLDLSNNGITDISALKNLASLHTLDLSNNGITDISALKNLDNLHTLDLSNNGITDISALKNLDNLHTLDLS NNGITDISALKNLTSLHTLDLSNNGITDISALKNLDNLETLDLRNNGITDKSALKNLNNLKgslehhhhhh >DLRR_B KELTNLGWLNLSNNQLETLPQGVFEKLTNLTTLNLSNNQLTSLPQGVFERLASLTTLNLSNNQLTSLPQGVFERLT NLTTLNLSNNQLTSLPQGVFERLTNLTTLNLSNNQLTSLPQGVFERLTSLTTLNLSNNQLTSLPQGVFERLTNLKTL NLSNNQLQSLPTGVDEKLTQLTgshhhhhh >DLRR_C LDLSNQNKTKEDCREIARELKQATTLHELHLSNNNIGEEGAAELVEALLHPGSTLETLDLSNCNLTKEACREIARAL KQAASLHELHLSNNNIGEEGAAELVEALLHPGSTLETLDLSNCNLTKEACREIARALKQATTLHELHLSNNNIGEEG AAELVEALLHPGSTLETLDLSNCNLTKEACREIARALKQATTLHELHLSNNNIGEEGAAELVEALLHPGSTLETLDL SNCNLTKEACREIARALKQATSLHELHLSNNNIGEEGKAWLEEARRHPGSTLETgshhhhhh >DLRR_D ETITVSTPIKQIFPDDAFAETIKANLKKKSVTDAVTQNELNSIDQIIANNSDIKSVQGIQYLPNLKTLKLSNNKITDISAL KQLNNLGWLDLSNNGITDISALKNLASLHTLDLSNNGITDISALKNLDNLHTLDLSNNGITDISALKNLDNLHTLDLS NNGITDISALKNLTSLTTLNLSNNQLTSLPQGVFERLTNLTTLNLSNNQLTSLPQGVFERLTNLTTLNLSNNQLTSLP QGVFERLTSLTTLNLSNNQLTSLPQGVFERLTNLKTLNLSNNQLQSLPTGVDEKLTQLTgshhhhhh >DLRR_E KELTNLGWLNLSNNQLETLPQGVFEKLTNLTTLNLSNNQLTSLPQGVFERLASLTTLNLSNNQLTSLPQGVFERLT NLTTLNLSNNQLTSLPQGVFERLTNLTTLNLSNNQLTSLPQGVFERLTSLHTLDLSNNGITDISALKNLDNLHTLDL SNNGITDISALKNLDNLHTLDLSNNGITDISALKNLTSLHTLDLSNNGITDISALKNLDNLETLDLRNNGITDKSALKN LNNLKgslehhhhhh >DLRR_F1

ETITVSTPIKQIFPDDAFAETIKANLKKKSVTDAVTQNELNSIDQIIANNSDIKSVQGIQYLPNLKTLKLSNNKITDISAL KQLNNLGWLDLSNNGITDISALKNLASLHTLDLSNNGITDISALKNLDNLHTLDLSNNGITDISALKNLDNLHTLDLS NNGITDISALKNLTSLHTLDLSNNGIENFSAMSNLENLKTLNLSNNRVTKEACKAIAKALKRATSLHELHLSNNNIGE EGAAELVEALLHPGSTLETLDLSNCNLTKEACREIARALKQATTLHELHLSNNNIGEEGAAELVEALLHPGSTLETL DLSNCNLTKEACREIARALKQATTLHELHLSNNNIGEEGAAELVEALLHPGSTLETLDLSNCNLTKEACREIARALK QATSLHELHLSNNNIGEEGKAWLEEARRHPGSTLETgshhhhhh >DLRR_F2 ETITVSTPIKQIFPDDAFAETIKANLKKKSVTDAVTQNELNSIDQIIANNSDIKSVQGIQYLPNLKTLKLSNNKITDISAL KQLNNLGWLDLSNNGITDISALKNLASLHTLDLSNNGITDISALKNLDNLHTLDLSNNGITDISALKNLDNLHTLDLS NNGITDISALKNLTSLHTLDLSNNGIENFNALRNLENLKTLNLSNNRVTKDACEAIAEALKRATSLHELHLSNNNIGE EGAAELVEALLHPGSTLETLDLSNCNLTKEACREIARALKQATTLHELHLSNNNIGEEGAAELVEALLHPGSTLETL DLSNCNLTKEACREIARALKQATTLHELHLSNNNIGEEGAAELVEALLHPGSTLETLDLSNCNLTKEACREIARALK QATSLHELHLSNNNIGEEGKAWLEEARRHPGSTLETgshhhhhh >DLRR_F3 ETITVSTPIKQIFPDDAFAETIKANLKKKSVTDAVTQNELNSIDQIIANNSDIKSVQGIQYLPNLKTLKLSNNKITDISAL KQLNNLGWLDLSNNGITDISALKNLASLHTLDLSNNGITDISALKNLDNLHTLDLSNNGITDISALKNLDNLHTLDLS NNGITDISALKNLTSLHTLDLSNNGIENFEAMRNLENLKTLNLSNNRLTKEACKAVAEALKRATSLHELHLSNNNIG EEGAAELVEALLHPGSTLETLDLSNCNLTKEACREIARALKQATTLHELHLSNNNIGEEGAAELVEALLHPGSTLET LDLSNCNLTKEACREIARALKQATTLHELHLSNNNIGEEGAAELVEALLHPGSTLETLDLSNCNLTKEACREIARAL KQATSLHELHLSNNNIGEEGKAWLEEARRHPGSTLETgshhhhhh >DLRR_F4 ETITVSTPIKQIFPDDAFAETIKANLKKKSVTDAVTQNELNSIDQIIANNSDIKSVQGIQYLPNLKTLKLSNNKITDISAL KQLNNLGWLDLSNNGITDISALKNLASLHTLDLSNNGITDISALKNLDNLHTLDLSNNGITDISALKNLDNLHTLDLS NNGITDISALKNLTSLHTLDLSNNGITNVSALKNLENLKTLNLSNNNITKEACKAIAEALKRATSLHELHLSNNNIGEE GAAELVEALLHPGSTLETLDLSNCNLTKEACREIARALKQATTLHELHLSNNNIGEEGAAELVEALLHPGSTLETLD LSNCNLTKEACREIARALKQATTLHELHLSNNNIGEEGAAELVEALLHPGSTLETLDLSNCNLTKEACREIARALKQ ATSLHELHLSNNNIGEEGKAWLEEARRHPGSTLETgshhhhhh >DLRR_F5 ETITVSTPIKQIFPDDAFAETIKANLKKKSVTDAVTQNELNSIDQIIANNSDIKSVQGIQYLPNLKTLKLSNNKITDISAL

KQLNNLGWLDLSNNGITDISALKNLASLHTLDLSNNGITDISALKNLDNLHTLDLSNNGITDISALKNLDNLHTLDLS NNGITDISALKNLTSLHTLDLSNNGIRNLEAMRNLENLKTLNLSNNNVTKEACSALAEALKRATSLHELHLSNNNIG EEGAAELVEALLHPGSTLETLDLSNCNLTKEACREIARALKQATTLHELHLSNNNIGEEGAAELVEALLHPGSTLET LDLSNCNLTKEACREIARALKQATTLHELHLSNNNIGEEGAAELVEALLHPGSTLETLDLSNCNLTKEACREIARAL KQATSLHELHLSNNNIGEEGKAWLEEARRHPGSTLETgshhhhhh >DLRR_F6 ETITVSTPIKQIFPDDAFAETIKANLKKKSVTDAVTQNELNSIDQIIANNSDIKSVQGIQYLPNLKTLKLSNNKITDISAL KQLNNLGWLDLSNNGITDISALKNLASLHTLDLSNNGITDISALKNLDNLHTLDLSNNGITDISALKNLDNLHTLDLS NNGITDISALKNLTSLHTLDLSNNGIRNFEAMRNLENLKTLNLSNNNFTKEACSALAEALKRATSLHELHLSNNNIG EEGAAELVEALLHPGSTLETLDLSNCNLTKEACREIARALKQATTLHELHLSNNNIGEEGAAELVEALLHPGSTLET LDLSNCNLTKEACREIARALKQATTLHELHLSNNNIGEEGAAELVEALLHPGSTLETLDLSNCNLTKEACREIARAL KQATSLHELHLSNNNIGEEGKAWLEEARRHPGSTLETgshhhhhh >DLRR_G1 KELTNLGWLNLSNNQLETLPQGVFEKLTNLTTLNLSNNQLTSLPQGVFERLASLTTLNLSNNQLTSLPQGVFERLT NLTTLNLSNNQLTSLPQGVFERLTNLTTLNLSNNQLTSLPQGVFERLTSLTTLNLSNNQLTSLPDGVLERLTNLKTL NLSNNQITKEVCRHVAKILKQAASLHELHLSNNNIGEEGAAELVEALLHPGSTLETLDLSNCNLTKEACREIARALK QATTLHELHLSNNNIGEEGAAELVEALLHPGSTLETLDLSNCNLTKEACREIARALKQATTLHELHLSNNNIGEEGA AELVEALLHPGSTLETLDLSNCNLTKEACREIARALKQATSLHELHLSNNNIGEEGKAWLEEARRHPGSTLETgshh hhhh >DLRR_G2 KELTNLGWLNLSNNQLETLPQGVFEKLTNLTTLNLSNNQLTSLPQGVFERLASLTTLNLSNNQLTSLPQGVFERLT NLTTLNLSNNQLTSLPQGVFERLTNLTTLNLSNNQLTSLPQGVFERLTSLTTLNLSNNQLTSLPDGVFERLTNLKTL NLSNNQLTKEACRIVAKMLKQLASLHELHLSNNNIGEEGAAELVEALLHPGSTLETLDLSNCNLTKEACREIARALK QATTLHELHLSNNNIGEEGAAELVEALLHPGSTLETLDLSNCNLTKEACREIARALKQATTLHELHLSNNNIGEEGA AELVEALLHPGSTLETLDLSNCNLTKEACREIARALKQATSLHELHLSNNNIGEEGKAWLEEARRHPGSTLETgshh hhhh >DLRR_G3

KELTNLGWLNLSNNQLETLPQGVFEKLTNLTTLNLSNNQLTSLPQGVFERLASLTTLNLSNNQLTSLPQGVFERLT NLTTLNLSNNQLTSLPQGVFERLTNLTTLNLSNNQLTSLPQGVFERLTSLTTLNLSNNQLTSLPDGVFERLTNLKTL NLSNNQLTKEACRAVANALKQAASLHELHLSNNNIGEEGAAELVEALLHPGSTLETLDLSNCNLTKEACREIARAL KQATTLHELHLSNNNIGEEGAAELVEALLHPGSTLETLDLSNCNLTKEACREIARALKQATTLHELHLSNNNIGEEG AAELVEALLHPGSTLETLDLSNCNLTKEACREIARALKQATSLHELHLSNNNIGEEGKAWLEEARRHPGSTLETgsh hhhhh >DLRR_G4 KELTNLGWLNLSNNQLETLPQGVFEKLTNLTTLNLSNNQLTSLPQGVFERLASLTTLNLSNNQLTSLPQGVFERLT NLTTLNLSNNQLTSLPQGVFERLTNLTTLNLSNNQLTSLPQGVFERLTSLTTLNLSNNQLTSLPDGVLERLTNLKTL NLSNNQITKEVCRLVAKFLKQLASLHELHLSNNNIGEEGAAELVEALLHPGSTLETLDLSNCNLTKEACREIARALK QATTLHELHLSNNNIGEEGAAELVEALLHPGSTLETLDLSNCNLTKEACREIARALKQATTLHELHLSNNNIGEEGA AELVEALLHPGSTLETLDLSNCNLTKEACREIARALKQATSLHELHLSNNNIGEEGKAWLEEARRHPGSTLETgshh hhhh >DLRR_G5 KELTNLGWLNLSNNQLETLPQGVFEKLTNLTTLNLSNNQLTSLPQGVFERLASLTTLNLSNNQLTSLPQGVFERLT NLTTLNLSNNQLTSLPQGVFERLTNLTTLNLSNNQLTSLPQGVFERLTSLTTLNLSNNQLTSLPDGVFERLTNLKTL NLSNNQITKEVCRMVAKVLKQAASLHELHLSNNNIGEEGAAELVEALLHPGSTLETLDLSNCNLTKEACREIARAL KQATTLHELHLSNNNIGEEGAAELVEALLHPGSTLETLDLSNCNLTKEACREIARALKQATTLHELHLSNNNIGEEG AAELVEALLHPGSTLETLDLSNCNLTKEACREIARALKQATSLHELHLSNNNIGEEGKAWLEEARRHPGSTLETgsh hhhhh >DLRR_G6 KELTNLGWLNLSNNQLETLPQGVFEKLTNLTTLNLSNNQLTSLPQGVFERLASLTTLNLSNNQLTSLPQGVFERLT NLTTLNLSNNQLTSLPQGVFERLTNLTTLNLSNNQLTSLPQGVFERLTSLTTLNLSNNQLTSLPKGVLERLTNLKTL NLSNNQITKEVCRHVAELLKQAASLHELHLSNNNIGEEGAAELVEALLHPGSTLETLDLSNCNLTKEACREIARALK QATTLHELHLSNNNIGEEGAAELVEALLHPGSTLETLDLSNCNLTKEACREIARALKQATSLHELHLSNNNIGEEGK AWLEEARRHPGSTLETgshhhhhh

>DLRR_H1 KELTNLGWLNLSNNQLETLPQGVFEKLTNLTTLNLSNNQLTSLPQGVFERLASLTTLNLSNNNIANINDQMLEGLT NLTTLNLSHNNLARLWKHANPGGPIYFLKGLTNLTTLNLSSNGFDEIPREVFKDLTSLTTLNLSNNQLTSLPQGVFE RLTNLKTLNLSNNQLQSLPTGVDEKLTQLTgshhhhhh >DLRR_H2 KELTNLGWLNLSNNQLETLPQGVFEKLTNLTTLNLSNNQLTSLPQGVFERLASLTTLNLSNNNLANLNDKVFEGLT NLTTLNLSNNNLARLWKHANPGGPIYFLKGLTNLTTLNLSNNGFDEFPKEVFKDLTSLTTLNLSNNQLTSLPQGVF ERLTNLKTLNLSNNQLQSLPTGVDEKLTQLTgshhhhhh >DLRR_H3 KELTNLGWLNLSNNQLETLPQGVFEKLTNLTTLNLSNNQLTSLPQGVFERLASLTTLNLSNNNLANLNDRLLEGLT NLTTLNLSNNNLARLWKHANPGGPIYFLKGLTNLTTLNLSNNGFDEFPREVFKDLTSLTTLNLSNNQLTSLPQGVF ERLTNLKTLNLSNNQLQSLPTGVDEKLTQLTgshhhhhh >DLRR_H4 KELTNLGWLNLSNNQLETLPQGVFEKLTNLTTLNLSNNQLTSLPQGVFERLASLTTLNLSNNNLANLNDRVFEGLT NLTTLNLSNNNLARLWKHANPGGPIYFLKGLTNLTTLNLSNNGFDELPKEVFKDLTSLTTLNLSNNQLTSLPQGVF ERLTNLKTLNLSNNQLQSLPTGVDEKLTQLTgshhhhhh >DLRR_I KELTNLGWLNLSNNQLETLPQGVFEKLTNLTTLNLSNNQLTSLPQGVFERLASLTTLNLSNNNIANINDQMLEGLT NLTTLNLSHNNLARLWKHANPGGPIYFLKGLTNLTTLNLSSNGFDEIPREVFKDLTSLTTLNLSNNNIANINDQMLE GLTNLTTLNLSHNNLARLWKHANPGGPIYFLKGLTNLTTLNLSSNGFDEIPREVFKDLTSLTTLNLSNNQLTSLPQG VFERLTNLKTLNLSNNQLQSLPTGVDEKLTQLTgshhhhhh >DLRR_J ETITVSTPIKQIFPDDAFAETIKANLKKKSVTDAVTQNELNSIDQIIANNSDIKSVQGIQYLPNLKTLKLSNNKITDISAL

KQLNNLGWLDLSNNGITDISALKNLASLHTLDLSNNGITDISALKNLDNLHTLDLSNNGITDISALKNLDNLHTLDLS NNGITDISALKNLTSLTTLNLSNNQLTSLPQGVFERLTNLTTLNLSNNQLTSLPQGVFERLTNLTTLNLSNNQLTSLP QGVFERLTNLKTLNLSNNQLTKEACRAVANALKQAASLHELHLSNNNIGEEGAAELVEALLHPGSTLETLDLSNCN LTKEACREIARALKQATTLHELHLSNNNIGEEGAAELVEALLHPGSTLETLDLSNCNLTKEACREIARALKQATTLH ELHLSNNNIGEEGKAWLEEARRHPGSTLETgshhhhhh >DLRR_K KELTNLGWLNLSNNQLETLPQGVFEKLTNLTTLNLSNNQLTSLPQGVFERLASLTTLNLSNNNLANLNDRVFEGLT NLTTLNLSNNNLARLWKHANPGGPIYFLKGLTNLTTLNLSNNGFDELPKEVFKDLTSLTTLNLSNNQLTSLPQGVF ERLTNLTTLNLSNNQLTSLPQGVFERLTNLTTLNLSNNQLTSLPQGVFERLTSLTTLNLSNNQLTSLPKGVLERLTN LKTLNLSNNQITKEVCRHVAELLKQAASLHELHLSNNNIGEEGAAELVEALLHPGSTLETLDLSNCNLTKEACREIA RALKQATTLHELHLSNNNIGEEGAAELVEALLHPGSTLETLDLSNCNLTKEACREIARALKQATSLHELHLSNNNIG EEGKAWLEEARRHPGSTLETgshhhhhh >DLRR_L ETITVSTPIKQIFPDDAFAETIKANLKKKSVTDAVTQNELNSIDQIIANNSDIKSVQGIQYLPNLKTLKLSNNKITDISAL KQLNNLGWLDLSNNGITDISALKNLASLHTLDLSNNGITDISALKNLDNLHTLDLSNNGITDISALKNLTSLTTLNLSN NQLTSLPQGVFERLTNLTTLNLSNNQLTSLPQGVFERLTNLTTLNLSNNQLTSLPQGVFERLTSLTTLNLSNNNLA NLNDRVFEGLTNLTTLNLSNNNLARLWKHANPGGPIYFLKGLTNLTTLNLSNNGFDELPKEVFKDLTSLTTLNLSN NQLTSLPQGVFERLTNLTTLNLSNNQLTSLPQGVFERLTNLTTLNLSNNQLTSLPQGVFERLTSLTTLNLSNNQLT SLPKGVLERLTNLKTLNLSNNQITKEVCRHVAELLKQAASLHELHLSNNNIGEEGAAELVEALLHPGSTLETLDLSN CNLTKEACREIARALKQATTLHELHLSNNNIGEEGAAELVEALLHPGSTLETLDLSNCNLTKEACREIARALKQATS LHELHLSNNNIGEEGKAWLEEARRHPGSTLETgshhhhhh *C-terminal linkers and 6x His tags are shown in lower case. *AS or TS in the regular repeat sequences are for inserting the restriction sites (NheI and SpeI).
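Because the table marks the C-terminal linker and His tag in lower case, the designed portion of each construct can be separated from the tag programmatically. The sketch below is one possible way to do this, assuming FASTA-style input with ">" header lines; the function names parse_fasta and split_construct are hypothetical, and the example record holds only the last line of DLRR_B for brevity.

```python
def parse_fasta(text):
    """Read '>name' headers followed by (possibly wrapped) sequence lines
    into a dict mapping name -> sequence with whitespace removed."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += "".join(line.split())
    return records


def split_construct(sequence):
    """Split a construct into (designed part, lower-case C-terminal tail).
    Assumption: the tail is the run of lower-case letters at the very end."""
    seq = "".join(sequence.split())  # tolerate line-wrap whitespace
    cut = len(seq)
    while cut > 0 and seq[cut - 1].islower():
        cut -= 1
    return seq[:cut], seq[cut:]


# Illustrative record: only the final line of DLRR_B, not the full sequence.
example = """>DLRR_B_last_line
NLSNNQLQSLPTGVDEKLTQLTgshhhhhh
"""
for name, seq in parse_fasta(example).items():
    designed, tail = split_construct(seq)
    print(name, len(designed), tail)
```

Stripping whitespace before splitting means a tag that was wrapped across two lines is still recovered as a single lower-case tail.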

Supplementary References

1. Altschul, S.F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389-3402 (1997).
2. Zhang, Y. & Skolnick, J. TM-align: a protein structure alignment algorithm based on TM-score. Nucleic Acids Res. 33, 2302-2309 (2005).
3. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.r-project.org/ (2012).
4. Varadhan, R. alabama: Constrained nonlinear optimization. R package version 2011.9-1. http://cran.r-project.org/package=alabama (2012).