SUPPLEMENTARY INFORMATION

DOI: 10.1038/NCHEM.1821

Cloud-based simulations on Google Exacycle reveal ligand-modulation of GPCR activation pathways

Kai J. Kohlhoff 1,4*, Diwakar Shukla 1,2, Morgan Lawrenz 2, Gregory R. Bowman 2, David E. Konerding 4, Dan Belov 4, Russ B. Altman 1,3*, Vijay S. Pande 2*

Affiliations: Departments of 1 Bioengineering, 2 Chemistry, and 3 Genetics, Stanford University, 450 Serra Mall, Stanford, CA 94305, USA. 4 Google Inc., 1600 Amphitheatre Parkway, Mountain View, CA 94043, USA.

*Correspondence to: kohlhoff@google.com, russ.altman@stanford.edu, pande@stanford.edu.

Authors contributed equally to this work.

NATURE CHEMISTRY www.nature.com/naturechemistry

Table of Contents

Supplementary Methods
  MD simulation settings
  Markov state models of receptor dynamics
  Transition path theory
  Mutual information
  Relative helix movement analysis
  Small molecule docking to pathway states

Supplementary Discussion
  Diverse pathways for GPCR (de)activation
  Intramolecular waters
  Ionic lock formation
  Extracellular salt bridges
  Closed extracellular gate conformation
  Extracellular region transmembrane helix movement
  Core region transmembrane helix movement
  Intracellular region transmembrane helix movement
  Ion density

Supplementary Figures
  Supplementary Figure S1. Ligand modulated landscapes (H3-6 and inactive NPxxY RMSD)
  Supplementary Figure S2. Ligand modulated landscapes (H3-6 and active NPxxY RMSD)
  Supplementary Figure S3. Ligand modulated landscapes (H3-6 and inactive connector RMSD)
  Supplementary Figure S4. Ligand modulated landscapes (H3-6 and active connector RMSD)
  Supplementary Figure S5. Implied timescales
  Supplementary Figure S6. Mean first passage times for (de)activation
  Supplementary Figure S7. Metric histograms for agonist-bound 10 state MSM
  Supplementary Figure S8. Metric histograms for inverse agonist-bound 10 state MSM
  Supplementary Figure S9. Metric histograms for apo 10 state MSM
  Supplementary Figure S10. TPT pathway fluxes
  Supplementary Figure S11. Mutual information cross-correlation plots
  Supplementary Figure S12. Mutual information cutoff values
  Supplementary Figure S13. Autocorrelation functions for metrics
  Supplementary Figure S14. Time-series of structural changes along MSM trajectories
  Supplementary Figure S15. Histograms of ion lock residue distances
  Supplementary Figure S16. Histograms of H5 bulge RMSD values
  Supplementary Figure S17. Histograms of connector-interacting residue distances
  Supplementary Figure S18. Histograms of NPxxY-interacting residue distances
  Supplementary Figure S19. Histograms of extracellular region residue distances
  Supplementary Figure S20. Projections of MSM built-in metrics along TPT pathways
  Supplementary Figure S21. Projections of other metrics along TPT pathways
  Supplementary Figure S22. AUCs for small molecule docking results
  Supplementary Figure S23. Tally of agonist chemotypes found in pathways
  Supplementary Figure S24. Tally of antagonist chemotypes found in pathways
  Supplementary Figure S25. Dynamics of β2AR extracellular region
  Supplementary Figure S26. Distributions of transmembrane helix positions
  Supplementary Figure S27. Ion densities

Supplementary Tables
  Supplementary Table S1. Description of high mutual information residue pairs
  Supplementary Table S2. Experimentally validated high mutual information residues
  Supplementary Table S3. High mutual information residues not yet validated
  Supplementary Table S4. Summary of p values for MSM AUC score comparisons
  Supplementary Table S5. Salt bridge partners with extracellular K305

References

Supplementary Methods

MD simulation settings. Structures were embedded in a bilayer of POPC lipid molecules in a triclinic box with side lengths 10.0 nm × 10.0 nm × 8.5 nm. The system was solvated in TIP3P water molecules interspersed with Na+ and Cl- ions to balance charges and obtain a final ion concentration of 0.15 M. Protein, water, and ions were parameterized with the AMBER03 force field(1) and lipids with the Berger united atom force field. Ligands carazolol and BI-167107 were extracted from PDB entries 2RH1 and 3P0G, respectively, and parameterized for the General Amber force field (GAFF)(2) with acpype(3) and antechamber(4). For simulations with agonist and partial inverse agonist switched (2RH1 with BI-167107, and 3P0G with carazolol), ligand positions were swapped after superimposing the two crystal structures using all protein residues with atoms within 6 Å of either ligand. The resulting molecular dynamics systems range in size from 58406 to 59044 atoms.

The systems generated from the crystal structures were energy-minimized using steepest descent with a cut-off of 1000.0 kJ/(mol nm). All simulations were carried out with a 2 fs timestep. The system was equilibrated for 100 ps as a canonical ensemble (NVT) using the v-rescale thermostat with a time constant of 0.1 ps and a reference temperature of 300 K, followed by 100 ps as an isothermal-isobaric ensemble (NPT). In the NPT stage, temperature was controlled by a Nosé-Hoover thermostat(5, 6) at 300 K with a time constant of 0.2 ps and two temperature groups (protein, ligand, and lipids as one, solvent and ions as the other) for improved accuracy; pressure was controlled with semi-isotropic coupling to a Parrinello-Rahman barostat(7) with a time constant of 5 ps, a reference pressure of 1.0 bar, and an isothermal compressibility of 4.5 × 10^-5 per bar. For long-range electrostatics, Particle Mesh Ewald with cubic interpolation and a 0.16 nm grid spacing for FFT was applied. The LINCS algorithm was used to constrain all bonds(8).
The neighbor list was updated with a grid search using the switch algorithm with a van der Waals cut-off of 1.1 nm and short-range neighbor list and electrostatic cut-offs of 1.4 nm for equilibration, and 1.3 nm otherwise. The update was performed at an interval of 10 fs during equilibration and 20 fs during all molecular dynamics production runs. Dispersion correction was enabled to correct for effects from cut-offs. Center-of-mass motions were removed independently for two groups: solvent and ions as one; protein, lipids, and, if present, ligand as the other. Periodic boundary conditions were used for all simulations, and randomized starting velocities were assigned from a Maxwell-Boltzmann distribution.

Markov state models of receptor dynamics. MSMs were built by first clustering the simulation data at an interval of 2.5 ns along four key metrics on the GPCR: the root mean squared deviation (RMSD) of the ligand binding pocket residues within 4 Å of both carazolol and BI-167107 (residue IDs: 3, 109, 110, 113, 114, 117, 118, 191, 192, 193, 195, 199, 200, 203, 204, 207, 286, 289, 290, 293, 305, 308, 309, 312, and 316); the RMSD of the connector region (I121 and F282); the RMSD of the NPxxY region (N322, P323, L324, I325, Y326, and C327); and the distance between the Cα atoms of R131 on Helix 3 and L272 on Helix 6. The four criteria were inversely weighted according to their relative magnitudes, as the latter movement is significantly larger (6 Å) than the others (1-2 Å). The total data was then assigned to these clusters and used to construct a transition count matrix (Cij = the number of observed transitions from state i at time t to state j at time t+τ, where τ is the lag time of the model) and the corresponding transition probability matrix (Pij = the probability of transitioning from state i at time t to state j at time t+τ).
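The construction of Cij and Pij described above can be illustrated with a minimal sketch. It assumes the clustered data is available as a single integer state sequence (here a toy array named `dtraj`, which is hypothetical) and uses plain row normalization; the source does not state which estimator or software was used, so this is an illustration rather than the authors' procedure.

```python
import numpy as np

def count_matrix(dtraj, n_states, lag):
    """C[i, j] = number of observed transitions from state i at frame t
    to state j at frame t + lag."""
    C = np.zeros((n_states, n_states), dtype=float)
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        C[i, j] += 1.0
    return C

def transition_matrix(C):
    """Row-normalize counts into transition probabilities.
    Rows without any counts are left as zeros."""
    row_sums = C.sum(axis=1, keepdims=True)
    return np.divide(C, row_sums, out=np.zeros_like(C), where=row_sums > 0)

# Toy state sequence (hypothetical data, one frame per lag-time unit).
dtraj = np.array([0, 0, 1, 1, 1, 0, 2, 2, 0, 1])
C = count_matrix(dtraj, n_states=3, lag=1)
P = transition_matrix(C)
```

With real data, `lag` would be the number of saved frames corresponding to the chosen lag time τ, and counts from many independent trajectories would be summed before normalization.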

The Markov lag time, or smallest time interval at which the data can be shown to be Markovian, was determined by plotting the implied timescales k obtained from the eigenvalues µ of the transition probability matrix at varied lag times τ, as

$$ k = -\frac{\tau}{\ln \mu} $$

This equation comes from the equivalence between discrete time MSMs and continuous time master equations(9, 10). These implied timescales should be unchanged with lag time when a system is Markovian(11), satisfying the Chapman-Kolmogorov test. They were monitored for 2000, 3000, and 4000 state systems and are shown in Supplementary Figure S5, with MFPTs shown in Supplementary Figure S6. The 3000 state model with lag time τ = 7.5 ns demonstrates Markovian behavior for all systems. A 10 state macrostate model was also generated by applying the PCCA+ algorithm(12-14). Kinetic Monte Carlo sampling of the 3000 state MSM transition probability matrix, starting from a random active state, was used to create 150 µs MSM trajectories, where each step corresponds to the lag time τ.

A bootstrap approach was used to resample the transition probability matrix and obtain uncertainties on the state populations. For each row of the matrix, new counts were assigned by sampling the multinomial distribution, using the original state populations as weights, with a number of outcomes equal to the number of simulation frames in the raw data. A new matrix was computed 60 times to obtain bootstrap errors on the populations.

Transition path theory. Transition path theory (TPT) was applied to the Markov State Models (MSMs) of agonist-bound, inverse agonist-bound, and apo simulations. For this analysis, active and inactive states are defined according to the order parameters used to construct the MSMs.
The following cutoffs were used to define active states: a Helix 3-Helix 6 distance above 12.0 Å; an NPxxY RMSD below 0.8 Å to the active crystal structure 3SN6 and above 1.0 Å to the inactive crystal structure 2RH1 (a cutoff of 0.8 Å for RMSD to 2RH1 NPxxY was used for the apo simulations due to rare sampling above this cutoff, as in Supplementary Figure S1); and an I121-F282 connector RMSD below 1.0 Å to the active crystal structure 3SN6 and above 1.0 Å to 2RH1. Inactive states were defined with the following cutoffs: a Helix 3-Helix 6 distance below 9.0 Å; an NPxxY RMSD below 0.8 Å to 2RH1 and above 1.0 Å to 3SN6; and a connector RMSD below 1.0 Å to 2RH1 and above 1.0 Å to 3SN6. For further TPT analysis with residue functional group distances, additional criteria were added to restrict the end states to their reference crystal structure distances. Uncertainties on the populations of these states were computed from the transition count matrix bootstrap and are reported as the ratio of the uncertainty to the population value: agonist-bound active states 0.003, inactive 0.00006; inverse agonist-bound active states 0.009, inactive 0.0001; apo active states 0.013, inactive 0.00002.

Next, the transition matrix from the MSM is used to assign a committor to each state, that is, the probability that a given MSM state will proceed to the final (in this case active) state before returning to the initial state. These committors are used to define the flux from the initial (inactive) to final (active) states for varied activation pathways. The relative probability of each pathway is given by the magnitude of the flux along it. One can also compute the mean first passage time (MFPT) between the defined active and inactive states to approximate the rate for the transition(10). The errors reported for MFPTs are the standard deviation of the MFPTs computed for all inactive/active state combinations for a given system.
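As a hedged illustration of the two quantities named above, the sketch below computes forward committors and MFPTs from a transition probability matrix by solving the standard linear systems. The function names and the three-state toy matrix are assumptions made for this example; the source does not specify which implementation was used.

```python
import numpy as np

def forward_committor(P, source, sink):
    """Probability of reaching `sink` before `source` from each state.
    Solves q_i = sum_j P_ij q_j on intermediate states,
    with q = 0 on source states and q = 1 on sink states."""
    n = P.shape[0]
    q = np.zeros(n)
    q[sink] = 1.0
    inter = [i for i in range(n) if i not in source and i not in sink]
    if inter:
        A = np.eye(len(inter)) - P[np.ix_(inter, inter)]
        b = P[np.ix_(inter, sink)].sum(axis=1)
        q[inter] = np.linalg.solve(A, b)
    return q

def mfpt_to(P, sink, lag_time):
    """Mean first passage time into `sink` from every state.
    Solves m_i = lag_time + sum_j P_ij m_j, with m = 0 on sink states."""
    n = P.shape[0]
    rest = [i for i in range(n) if i not in sink]
    m = np.zeros(n)
    A = np.eye(len(rest)) - P[np.ix_(rest, rest)]
    m[rest] = np.linalg.solve(A, np.full(len(rest), lag_time))
    return m

# Toy 3-state chain: 0 = "inactive", 1 = intermediate, 2 = "active".
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])
q = forward_committor(P, source=[0], sink=[2])
m = mfpt_to(P, sink=[2], lag_time=7.5)  # lag time in ns
```

In this toy chain the intermediate state has a committor of 0.5, and the MFPT from state 0 to state 2 is 30 lag times (225 ns).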

Mutual information. The excess mutual information(15) was computed for all protein χ1 torsion angles throughout the simulations. This metric allows us to capture non-linear correlations of torsion angles, and imposes a noise filter by subtracting off the mutual information computed from 10 iterations of scrambled data. The mutual information values are shown in a cross-correlation plot for all residue pairs in Supplementary Figure S11. A cutoff for significant mutual information was determined by plotting the number of significant residues with a variety of cutoffs (Supplementary Figure S12). The mutual information between residues R1 and R2 is

$$ MI_{R_1,R_2} = \int_0^{2\pi}\!\!\int_0^{2\pi} p(\chi_{R_1},\chi_{R_2}) \, \ln \frac{p(\chi_{R_1},\chi_{R_2})}{p(\chi_{R_1})\, p(\chi_{R_2})} \, d\chi_{R_1}\, d\chi_{R_2} $$

where χ_R1 and χ_R2 are the χ1 torsion angles of the two residues.

Relative helix movement analysis. To compute the centers at either end of each transmembrane helix, we averaged over the Cα locations of the second to fourth residues closest to the end. Only for Helix 5 did we use residues 224-226 instead of 226-228 for the intracellular end, because of missing residues in the 3P0G crystal structure. We then averaged over the intra- and extracellular centers for all seven helices, respectively, to define two local end-of-helix centers-of-mass. Likewise, we took three adjacent residues halfway along each helix to compute helix positions and the center-of-mass for the core region. Supplementary Figure S26 shows the simulation-averaged densities for these helix positions.

Small molecule docking to pathway states. MSM states from TPT pathways with flux > 30% of the maximum flux were selected for small molecule docking. For the agonist-bound MSM, this corresponds to 20 MSM states, for the inverse agonist-bound MSM to 43 states, and for the apo MSM to 102 states. As a control, we also docked to both the active (3P0G) and inactive (2RH1) crystal structures and to 20 randomly selected snapshots from previously performed long-timescale agonist-bound GPCR deactivation simulations from D.E. Shaw Research(16).
All snapshots were aligned to the same active crystal structure (3P0G) before docking. The Surflex(17, 18) docking program was used to dock structures from the β2AR set of the GPCR Decoy and Ligand Database(19), which comprises ~200 known β2AR agonists and antagonists and ~8000 structurally similar decoys drawn from the compound library ZINC(20) by property matching the true ligands by molecular weight, formal charge, hydrogen bond donors and acceptors, rotatable bonds, and logP. Ligands were prepared using Surflex protonation tools, and OpenEye Omega(21) was used to enumerate stereoisomers up to 4 chiral centers. The Surflex docking option to perform a pre- and post-minimization of each pose was used (-pscreen), and ligands were docked and scored to 20 snapshots from each of these MSM states, with the best score reported. For computing ROC plots, the best scoring stereoisomer was assigned to the ligand.

Receiver operating characteristic (ROC) curves evaluate enrichment of true ligands over decoys by plotting the true positive rate against the false positive rate for a variety of docking score cutoffs. The area under these curves (AUC) indicates the performance, where AUC = 1.0 indicates perfect ranking of true ligands over decoys, and AUC = 0.5 is random performance. Supplementary Figure S22 shows these AUCs and their 95% confidence intervals (CI) for the TPT pathway state docking to the three MSMs, for random agonist-bound MD snapshot docking, and for active and inactive crystal docking. For the crystal docking, the CI was obtained by percentile bootstrapping, because there is a single AUC. For the MSM and random docking AUCs, we used the 95% CI for the standard normal distribution.

We see statistically significant improvement for both agonist and antagonist docking when docking to MSM states, compared to crystal structure docking and random MD snapshot docking. As seen in Supplementary Figure S22, the CI for agonist docking performance improves from (0.81, 0.84) for the active crystal to (0.86, 0.88) for the agonist-bound MSM docking, compared to a CI of (0.77, 0.82) for random snapshot docking. A substantial improvement is also seen for antagonist docking, from the (0.75, 0.79) CI for the best crystal AUC to the (0.83, 0.84) CI for antagonist-bound MSM docking, compared to the random snapshot docking CI of (0.74, 0.78). In summary, the agonist-bound MSM docking performs best for discriminating agonists from agonist decoys, and inverse agonist-bound MSM docking performs best for discriminating antagonists from antagonist decoys. Tests of proportions for AUC scores greater than the crystal AUC between the two populations were computed for all docking sets, and the p values for the corresponding Z statistics are summarized in Supplementary Table S4.

The top 10% scoring true ligands (decoys were excluded) were selected for each MSM state, resulting in 3300 compounds for each agonist and antagonist docking set. These compounds were clustered by their chemotype with a k-centers algorithm, evaluated by combined Tanimoto values from unaligned (preserving the docked conformation) shape and chemistry overlap computations with the OpenEye program ROCS(22). Tanimoto scores are computed as

$$ T_{AB} = \frac{O_{AB}}{O_{AA} + O_{BB} - O_{AB}} $$

where O_AB is the overlap between compounds A and B, and perfect overlap gives T = 1. A cutoff for the clustering was selected as the value separating the top 5% of all Tanimoto scores computed for all compound pair overlaps, 0.416 for agonists and 0.5 for antagonists.
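The Tanimoto formula and the cutoff-based k-centers clustering described above can be sketched as follows. This is a minimal illustration under stated assumptions: the overlap matrix `O` is toy data, and the greedy farthest-point variant of k-centers with a similarity cutoff as the stopping rule is one common choice, not necessarily the variant used in the study.

```python
import numpy as np

def tanimoto_matrix(O):
    """T_AB = O_AB / (O_AA + O_BB - O_AB) for a symmetric overlap matrix O."""
    self_overlap = np.diag(O)
    return O / (self_overlap[:, None] + self_overlap[None, :] - O)

def k_centers(T, cutoff, first=0):
    """Greedy k-centers on similarities: keep adding the compound least
    similar to all current centers until every compound has similarity
    >= cutoff to at least one center. Assumes cutoff <= 1."""
    centers = [first]
    best = T[first].copy()  # similarity of each compound to its nearest center
    while best.min() < cutoff:
        nxt = int(np.argmin(best))
        centers.append(nxt)
        best = np.maximum(best, T[nxt])
    assignment = np.argmax(T[:, centers], axis=1)
    return centers, assignment

# Toy overlap matrix for four compounds (hypothetical values).
O = np.array([[10.0, 8.0, 1.0, 1.0],
              [8.0, 10.0, 1.0, 2.0],
              [1.0, 1.0, 10.0, 9.0],
              [1.0, 2.0, 9.0, 10.0]])
T = tanimoto_matrix(O)
centers, assignment = k_centers(T, cutoff=0.5)
```

Here compounds 0 and 1 end up in one cluster and compounds 2 and 3 in another, because only those pairs exceed the 0.5 similarity cutoff.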
This resulted in 935 clusters for antagonist chemotypes and 497 for agonist chemotypes (including different stereoisomers). The chemotype clusters were then assigned a progress variable ξ based on the MSM state that selected (highly ranked) the chemotype cluster center. This progress variable is a linear combination of the previously described structural metrics used to build the MSM and to describe GPCR activation. The H5 bulge, connector, and NPxxY RMSD from active are each scored a point in the range [0, 2.0, 0.5] Å, and the H3-6 distance in the range [8, 12, 1] Å. The scores are then normalized by the maximum, and adjusted in Figure 4 in the main text and in Supplementary Figures S23 and S24, such that the end states correspond to the inactive (ξ = 0.0) and active (ξ = 1.0) crystal structures. In Supplementary Figures S23 and S24, chemotypes were given binary counts across all states within each progress variable ξ, with a tally of 1 if a member from the cluster was found at a given ξ. For Figure 4 in the main text, four examples of chemotypes that are enriched by select MSM states along the activation pathways are plotted as the percentage of the chemotype cluster ligands out of the total ligands discovered at a given ξ.

Supplementary Discussion

Diverse pathways for GPCR (de)activation. The simulations yield a multitude of pathways with similar flux (Supplementary Figure S10). The agonist-bound simulations have a reduced number of high flux pathways compared to the inverse agonist-bound and apo simulations, and find a predominant pathway consistent with previous simulations of agonist-bound GPCR, in which an increase in the Helix 3-Helix 6 distance precedes an NPxxY change and connector flip(16). Additionally, changes in H5 around S207 and F208, seen in the active crystal structure and referred to as the H5 bulge, are also seen to occur concomitantly with these projections during activation in the simulations, and are stabilized only by the presence of agonist (main text Figure 1)(23). This sequence of events is well described by 3-D projections along the Helix 3-Helix 6 distance, the NPxxY region, and the connector region in Figure 2 of the main text and Supplementary Figure S20. However, for different ligands, pathways of (de)activation show this helix change occurring prior to other metric changes, or more gradually, concomitant with NPxxY and connector changes. Supplementary Figure S21 gives additional structural insights into the diverse pathways found from TPT and illustrates how ligand-modulation of the receptor landscape gives rise to different predominant pathways from active to inactive states. These metrics were found using mutual information analysis of correlated residue pairs (Supplementary Tables S1-3), and plots of residue distance distributions in going from active to inactive conformations (Supplementary Figures S15-19).

Intramolecular waters. Intramolecular waters can mediate the dynamics of biomolecules through the formation of hydrogen bonds and hydrogen bond networks. In the case of β2AR, comparative analysis of several GPCRs has revealed a network of conserved water clusters capable of modulating receptor dynamics(24), of which nine (clusters 2, 3, 4, 7, 9, 11, 12, 13, and 15) were found to be present in mutant β2AR (PDB 2RH1). In our simulations, water molecules were not placed explicitly, but were allowed to migrate freely throughout the simulation system and in and out of the receptor during equilibration and production runs.
To test for the faithful reproduction of conserved water clusters, 26,000 β2AR apo structures were sampled randomly from the final data set. Applying the same cut-off of 3.8 Å and the locations listed in Table S1 of (24), we found that for eight out of nine water clusters (clusters 2, 3, 4, 7, 11, 12, 13, and 15), hydrogen-bond capable waters were present in, on average, over 97% of structures. In particular, the important hydrogen-bond network formed by clusters 2, 3, 4, 11, 12, and 13 is intact in the vast majority of structures, in agreement with the crystal structures. For water cluster 9, located at the ligand binding pocket, we found that in the absence of a ligand, water molecules were present at the suggested location in just over 23% of the structures, which is in line with findings that water cluster 9 was absent from the only unliganded receptor used in the comparative analysis(24).

Ionic lock formation. The formation of a salt bridge between intracellular residues E268 and R131 is a feature of the receptor's inactive state, and disruption of this ionic lock is involved in receptor activation(25). It has previously been shown that the inactive state shows a mixture of ionic lock formed and broken at equilibrium(26). In our simulations we see the ionic lock formed in 40.04%, 37.85%, and 46.6% of conformations for apo-2RH1, inverse agonist-bound 2RH1, and agonist-bound 2RH1, respectively. In the apo-2RH1 variant with ICL3, ionic lock formation is markedly increased, with an occurrence of 67.64%, indicating that ICL3 may have a stabilizing effect on the state with the ionic lock formed relative to the state in which it is broken. It must be noted, however, that given the limited sampling of the conformational space of the loop and the absence of a G protein, this effect might simply be a bias introduced by the loop's initial starting conformation.
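Occupancy percentages of this kind can be obtained by thresholding a per-frame distance between the charged groups. The sketch below is illustrative only: the 4.0 Å cutoff, the array layout, and the synthetic coordinates are assumptions, since the source does not state the distance criterion used to call the ionic lock formed.

```python
import numpy as np

def min_pair_distance(coords_a, coords_b):
    """Per-frame minimum distance between two atom groups.
    coords_a: (n_frames, n_atoms_a, 3); coords_b: (n_frames, n_atoms_b, 3)."""
    diff = coords_a[:, :, None, :] - coords_b[:, None, :, :]
    return np.linalg.norm(diff, axis=-1).min(axis=(1, 2))

def fraction_formed(distances, cutoff):
    """Fraction of frames in which the distance is below the cutoff."""
    return float(np.mean(np.asarray(distances) < cutoff))

# Synthetic example: one atom per group, four frames (coordinates in Å).
group_a = np.zeros((4, 1, 3))
group_b = np.zeros((4, 1, 3))
group_b[:, 0, 0] = [3.0, 5.0, 3.5, 10.0]

d = min_pair_distance(group_a, group_b)
occupancy = fraction_formed(d, cutoff=4.0)  # hypothetical 4.0 Å cutoff
```

For real trajectories the two groups would hold the charged side-chain atoms of E268 and R131, extracted with whatever trajectory library is in use.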
In contrast, for the active starting structure 3P0G, ionic lock formation occurs in a small minority of conformations: 1.11%, 1.1%, and 0.08% for apo-3P0G, inverse agonist-bound 3P0G, and agonist-bound 3P0G, respectively. Distributions for the distance between the charge groups of the ionic lock residues are summarized in Supplementary Figure S15, where in each case close distances are not consistently observed for active states, and a variety of distances are sampled in the inactive and intermediate states.

Extracellular salt bridges. A prominent feature of the extracellularly exposed ligand binding pocket is a salt bridge between K305 and D192 that crosses the cleft between helices 6 and 7 and the stable helix bundle of helices 3 through 5. This salt bridge connects extracellular loops 2 and 3. The gaps on either side of this bridge have been shown to permit the ligand carazolol to enter the binding pocket, and the bridge exhibits rates of 10^9 s^-1 and 10^8 s^-1 for bond formation and breakage, respectively(27). Our simulations confirm that the salt bridge is not stable. It persists in 75-80% of the cases for ligand-bound β2AR, depending on the simulated system. In contrast, in apo β2AR, the bridge is closed in only about 60% of conformations, suggesting greater conformational flexibility.

A study of the correlation between salt bridges in apo-2RH1 reveals that about half (12 out of 25) of the statistically significant correlations (absolute value of the Pearson correlation coefficient r > 0.1, p-value < 10^-16, with a sample size of 257,531 structures; Supplementary Table S5) involve the two residues of the K305-D192 salt bridge. In particular, there are small to medium positive correlations between the formation of the K305-D113 bridge and a salt bridge between R304 and D192 (r = 0.309) and E180 (r = 0.243), respectively. In fact, in 85.371% of the structures in which K305 binds to D113, R304 has a salt bridge formed with either one of the two. Supplementary Table S5 shows salt bridge binding partners of K305. E180 is an alternative bridge that crosses the cleft and limits the size of the opening. Binding to D300 is observed more rarely.
This residue is situated on extracellular loop 3 close to helix 6. Structures that have this bridge formed show a deformation of helix 7, with helices 6 and 7 retreating from the cleft, leaving it more accessible. Lastly, we observe formation of a salt bridge between K305 and D113. The latter is the only negatively charged residue within the ligand binding pocket, and is known to interact with and stabilize ligands(28). No such salt bridge formation is observed in the ligand-bound receptors, where the ligand's positions are tightly coupled to D113 and so prevent access to the residue. It is worth noting that the values reported in Supplementary Table S5 do not give the distribution at equilibrium. In fact, low percentages can indicate conformational changes that take place at long timescales and that we have only begun to sample. MSMs give a better picture of expected equilibrium properties.

The K305-D192 salt bridge is not a conserved feature in GPCRs. In β1AR (PDB ID 2Y02), the long lysine residue is absent from the corresponding location on ECL3. Instead, it has an arginine residue (R317) inserted at the location of D300 in β2AR that may act as a hub in a similar way. We found in a separate study of apo 2Y02 (unpublished data) that this residue forms salt bridges with surrounding residues as follows (% of total structures in brackets): D184 (29.92%), D186 (8.44%), D200 (13.79%), D322 (11.97%). The first two are located close to helix 4 on ECL2 and therefore help stabilize ECL3 towards ECL2. D200 is homologous to D192, whereas the connection between R317 and D322 is equivalent to the K305-D300 bridge, but with opposite charges, connecting ECL3 to itself and opening up the cleft.
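The two statistics used in this section, the Pearson correlation between salt-bridge formation events and the conditional frequency of one bridge given another, can be sketched as below. The per-frame boolean indicators are toy data invented for the example; how bridges were detected in each frame is not specified in the source.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two per-frame indicators."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def conditional_fraction(given, target):
    """Fraction of frames with `target` formed among frames where
    `given` is formed."""
    given = np.asarray(given, dtype=bool)
    target = np.asarray(target, dtype=bool)
    return float(target[given].mean())

# Toy indicators (1 = bridge formed in that frame), hypothetical data.
bridge_a = [1, 1, 0, 0, 1, 0, 0, 0]
bridge_b = [1, 1, 0, 0, 0, 0, 1, 0]

r = pearson_r(bridge_a, bridge_b)
frac = conditional_fraction(bridge_a, bridge_b)
```

With binary indicators the Pearson coefficient reduces to the phi coefficient, so a value such as r = 0.309 reflects how much more often two bridges co-occur than expected from their individual frequencies.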

Closed extracellular gate conformation. By analyzing salt bridges throughout the receptor, we noticed strong correlations for several salt bridges forming a network on the extracellular side. To better understand the relevance of these correlations, we built a 100-state MSM for the relevant charged residues of apo 2RH1. Two of the 10 most populated states showed the K305-D113 bridge formed, indicating a relevant motif for the apo structure of β2AR. To test this hypothesis, we carried out two molecular dynamics runs, at 310 K and 330 K, respectively, starting from a randomly selected structure with the motif formed. Both simulations show a stable conformation for 100 ns, in which the extracellular end of helix 7 is bent inwards towards helices 3 and 4, a gating motion previously indicated by principal component analysis(29), and a network of salt bridges forms across the cleft: K305-D113, R304-(D192/E180), D300-H178, and H93-(E306/D192). This network resembles a zipper motif, where charged side chains from each side of the cleft interlock to form a pattern of alternating charged residues (Supplementary Figure S25), closing the gate leading into the ligand binding pocket and preventing access. Distance plots for relevant residue pairs are shown in Supplementary Figure S19.

For better sampling, we started additional simulations on Exacycle, two for each structure in apo 2RH1 and apo 3P0G in which the K305-D113 bridge is formed, for a total of 10204 additional trajectories and 25 µs of chemical time. A second, more detailed Markov State Model with 250 states shows the dynamics taking place on the extracellular side. We randomly sampled 2000 structures from the states, with state transitions guided by the transition probability matrix.
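Sampling a state sequence guided by the transition probability matrix, as described here and in the kinetic Monte Carlo step of the Supplementary Methods, can be sketched in a few lines. The function name, seed, and three-state matrix are assumptions for illustration; the real models have 250 or 3000 states, and a structure would then be drawn from each visited state.

```python
import numpy as np

def sample_msm_trajectory(P, start, n_steps, rng):
    """Draw a synthetic state sequence from transition matrix P.
    Each step corresponds to one lag time of the model."""
    traj = np.empty(n_steps + 1, dtype=int)
    traj[0] = start
    for t in range(n_steps):
        # The next state is drawn from the row of the current state.
        traj[t + 1] = rng.choice(P.shape[0], p=P[traj[t]])
    return traj

# Toy 3-state model (hypothetical); direct 0 -> 2 jumps are forbidden.
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])
rng = np.random.default_rng(0)
traj = sample_msm_trajectory(P, start=0, n_steps=2000, rng=rng)
```

Because every step is one lag time, a trajectory of n steps represents n × τ of model time, which is how long synthetic trajectories can be generated from much shorter simulations.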
The resulting trajectory shows the extracellular end of helix 7 repeatedly flipping in and out of the cleft region, assuming and transiently maintaining the closed conformation, at a rate that indicates that this state is more meaningful than Supplementary Table S5 suggests. The histogram in Supplementary Figure S25 suggests a clear two-state behavior for open and closed gate conformations.

Extracellular region transmembrane helix movement. In terms of the spread around the average helix position, the crystal structures for the active and inactive conformations on the extracellular side are almost identical in the displacement of helices 2 and 3 (with a delta of less than 1%, or ≤ 0.11 Å), while the remaining five helices are displaced by 0.379 Å (helix 4) to 0.773 Å (helix 1), relative to one another. The active structure shows a tighter helix formation than the inactive structure. During simulation, a compaction of helices 6 and 7 takes place across all systems, whereas helices 4 and 5 move slightly outwards. Helix 1 shows the largest relative movement within simulations, in particular for the inactive structure. A density plot (Supplementary Figure S26) reveals that the extracellular end of helix 1 is the most flexible, with significant movement in the membrane plane towards or away from helices 2 and 7. Helices 2 through 5 form a stable cluster relative to helices 1, 6, and 7, which is stabilized by an aromatic network between helices 2 and 3, the extracellular loop 2 between helices 4 and 5, the two disulfide bridges between the cysteine pairs 184 and 190, and 191 and 106, and a salt bridge between residues that are located on helices 3 and 4.

Core region transmembrane helix movement. In Supplementary Figure S26, the core region shows the highest degree of stability compared to both the intra- and extracellular sides. This region is generally highly compact. The largest differences between the active and inactive structures are again found at helices 1, 6, and 7.
Helices 1 and 7 have moved inwards by 0.781 Å and 0.75 Å in the active structure, respectively, while helix 6 has moved outwards by 1.208 Å. During simulation, helix 6 shows some convergence towards the inactive state in the simulations started from the active conformation. Helix 1 moves outwards across all systems.

Intracellular region transmembrane helix movement. In Supplementary Figure S26, the intracellular side shows the most significant structural differences. Helices 6 and 7 in particular differ strongly between the inactive and active structures, with 6.951 Å and 3.47 Å relative displacement, respectively. Helices 1, 2, and 4 are further displaced from the center in the active state, while helix 3 is significantly closer (relative displacements range from 1.4 Å to 2.277 Å). During simulation, helix 6 moves inwards in the active state simulations, while helix 7 moves outwards. Surprisingly, when comparing simulations started from the active state with those started from the inactive state, the relative displacements of helices 1 through 4 between active and inactive remain almost constant, which indicates that their rearrangement is a distinguishing element of receptor activation.

Ion density. Analysis of ion distributions in our simulations of β2AR correctly identifies the known Na+-binding site on extracellular loop 2, between the C184-C190 disulfide bridge and the charged residue E188. Additional high Na+ density is found adjacent to the negatively charged D113 on helix 3, which is exposed in the ligand binding pocket, and outside of helix 3 near E107. High-density Cl- regions are exclusive to the intracellular side, and indicate interactions with residues around the intracellular end of helix 4 and the ends of helices 6 and 7. We analyzed the apo 2RH1 data set to identify high-density regions in the distribution of ions (Supplementary Figure S27). We correctly identify the known Na+-binding site on ECL2, between the C184-C190 disulfide bridge and the charged residue E188.
In addition, there is high Na + -density within the ligand binding pocket, adjacent to the negatively charged D113 on helix 3, and outside of helix 3, near E107. The latter is surprising, because of the presence of two positively charged residues HIS172 and R175 on the extracellular side of helix 4. While we find high-density Na + -regions exclusively on the extracellular side, high-density Cl - -regions are exclusive to the intracellular side, and are spread out around the intracellular end of helix 4, indicating interaction with residues K147, K149, and R151, and the ends of helix 6 and 7, indicating interaction with K270, K273, R328, and R333. NATURE CHEMISTRY www.nature.com/naturechemistry 11

Supplementary Figures

Supplementary Figure S1. Ligand-modulated landscapes obtained from the raw simulation data. Free energy (kcal/mol) landscape of β2AR without ligand and with agonist (BI-167107) and inverse agonist (carazolol) bound to the receptor. The order parameters used for generating the landscape are the distance between Helix 3 and Helix 6 (measured as the R131-L272 distance) and the root mean square deviation of the NPxxY region (N322-C327) in Helix 7 from the inactive crystal structure (2RH1).

Supplementary Figure S2. Ligand-modulated landscapes obtained from the raw simulation data. Free energy (kcal/mol) landscape of β2AR without ligand and with agonist (BI-167107) and inverse agonist (carazolol) bound to the receptor. The order parameters used for generating the landscape are the distance between Helix 3 and Helix 6 (measured as the R131-L272 distance) and the root mean square deviation of the NPxxY region (N322-C327) in Helix 7 from the active crystal structure (3P0G).

Supplementary Figure S3. Ligand-modulated landscapes obtained from the raw simulation data. Free energy (kcal/mol) landscape of β2AR without ligand and with agonist (BI-167107) and inverse agonist (carazolol) bound to the receptor. The order parameters used for generating the landscape are the distance between Helix 3 and Helix 6 (measured as the R131-L272 distance) and the root mean square deviation of the connector region (I121-F282) from the inactive crystal structure (2RH1).

Supplementary Figure S4. Ligand-modulated landscapes obtained from the raw simulation data. Free energy (kcal/mol) landscape of β2AR without ligand and with agonist (BI-167107) and inverse agonist (carazolol) bound to the receptor. The order parameters used for generating the landscape are the distance between Helix 3 and Helix 6 (measured as the R131-L272 distance) and the root mean square deviation of the connector region (I121-F282) from the active crystal structure (3P0G).
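
The captions of Supplementary Figures S1 to S4 describe free energy landscapes over two order parameters. The estimator is not stated in these captions; a common choice, assumed here, is Boltzmann inversion of a two-dimensional histogram, F = -kT ln P, shifted so that the most populated bin has F = 0. The function name, the bin widths, and the default temperature of 300 K in this minimal sketch are illustrative assumptions, not values taken from the study.

```python
import math
from collections import Counter

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def free_energy_landscape(x, y, bin_width_x, bin_width_y, temperature=300.0):
    """Boltzmann-invert a 2D histogram of two order parameters.

    x, y: equally long sequences of per-frame values, e.g. a helix 3-6
    distance and an RMSD (hypothetical inputs).
    Returns {(ix, iy): F in kcal/mol}; the most populated bin has F = 0.
    The temperature is an assumption of this sketch.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    # Assign every frame to a rectangular bin and count occupancies.
    counts = Counter(
        (math.floor(a / bin_width_x), math.floor(b / bin_width_y))
        for a, b in zip(x, y)
    )
    kt = KB_KCAL_PER_MOL_K * temperature
    max_count = max(counts.values())
    # F = -kT ln(P / P_max); unvisited bins are simply absent from the result.
    return {b: -kt * math.log(c / max_count) for b, c in counts.items()}
```

Bins that were never visited have undefined (infinite) free energy under this estimator, which is why they are left out of the returned dictionary rather than assigned a value.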

Supplementary Figure S5. Implied timescales. Shown for 2000-, 3000-, and 4000-state Markov state models for the three sets of simulations at different lag times. A lag time of 15 steps, or 7.5 ns, was chosen for the 3000-state MSM for this study.
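
For orientation, implied timescales are conventionally obtained from the eigenvalues λ_i of the MSM transition matrix estimated at lag time τ as t_i = -τ / ln|λ_i|. The sketch below applies that standard relation to a toy two-state matrix; the function name and the toy matrix are illustrative assumptions, and only the 7.5 ns lag time comes from the caption above.

```python
import numpy as np


def implied_timescales(transition_matrix, lag_time_ns):
    """Return implied timescales (ns), slowest first, for a row-stochastic matrix.

    Uses t_i = -lag / ln|lambda_i| and skips the stationary eigenvalue
    (lambda = 1), whose timescale is infinite.
    """
    t = np.asarray(transition_matrix, dtype=float)
    eigenvalues = np.linalg.eigvals(t)
    # Sort eigenvalue magnitudes in descending order; the first one is 1.
    magnitudes = np.sort(np.abs(eigenvalues))[::-1]
    return -lag_time_ns / np.log(magnitudes[1:])


# Toy two-state example (hypothetical numbers), evaluated at the 7.5 ns lag.
toy_matrix = [[0.9, 0.1],
              [0.2, 0.8]]
toy_timescales = implied_timescales(toy_matrix, lag_time_ns=7.5)
```

In practice the calculation is repeated for several lag times, and a lag time is chosen where the slowest timescales have levelled off.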

Supplementary Figure S6. Mean first passage times (MFPT) for the activation and deactivation transitions. Shown as a function of lag time for the 3000-state MSM. A lag time of 15 steps (7.5 ns) was chosen, consistent with the implied timescales in Supplementary Figure S5.

Supplementary Figure S7. Histograms of agonist-bound β2AR raw trajectories showing the key structural metrics of the inactive (I) and intermediate (R ) states identified using the 10-state Markov state models. The receptor population was found to be 93.77% in the inactive (I) state and 5.67% in the intermediate (R ) state.

Supplementary Figure S8. Histograms of inverse agonist-bound β2AR raw trajectories showing the key structural metrics of the inactive (I) and intermediate (R ) states identified using the 10-state Markov state models. The receptor population was found to be 96% in the inactive (I) state and 3.5% in the intermediate (R ) state.

Supplementary Figure S9. Histograms of apo raw trajectories showing the key structural metrics of the inactive (I) and intermediate (R ) states identified using the 10-state Markov state models. The receptor population was found to be 95.08% in the inactive (I) state and 4.52% in the intermediate (R ) state.

Supplementary Figure S10. Pathways from TPT analysis are enumerated and plotted with the corresponding flux as a percentage of the maximum. All pathways to the right of the dashed line (with >30% of the maximum flux) were used for small molecule docking analysis.

Supplementary Figure S11. Cross correlation of residue pairs using mutual information. Mutual information values (in bits) are colormapped for each residue pair, where high mutual information (red-black color) indicates high correlation of the residue pair in the MD simulations. Helical regions are labeled and key correlated structural features noted.
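
As a reminder of what a mutual information value in bits means, the sketch below computes I(A;B) = Σ p(a,b) log2[p(a,b) / (p(a) p(b))] for two equally long sequences of discrete labels. How each residue's conformation is discretized is described in the Supplementary Methods and is not reproduced here; the per-frame integer labels and the function name are assumptions of this sketch.

```python
import math
from collections import Counter


def mutual_information_bits(labels_a, labels_b):
    """Mutual information (bits) between two discrete per-frame label sequences."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    marginal_a = Counter(labels_a)
    marginal_b = Counter(labels_b)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a,b) / (p(a) p(b)) simplifies to count * n / (count_a * count_b).
        ratio = count * n / (marginal_a[a] * marginal_b[b])
        mi += (count / n) * math.log2(ratio)
    return mi
```

Two perfectly coupled two-state residues give 1 bit, and statistically independent ones give 0 bits; estimates from finite trajectories are biased upwards, which is one reason a cutoff such as the one in Supplementary Figure S12 is useful.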

Supplementary Figure S12. Mutual information (MI) values for residue pairs. The number of residue pairs is plotted against the MI cutoff value in bits. The black dashed line denotes the cutoff used for the ligand-bound systems; the red dashed line denotes the cutoff used for the apo analysis, which had significantly reduced MI values.

Supplementary Figure S13. Comparison of the autocorrelation for key observables in the MSM trajectories (red) with that from long agonist-bound trajectories taken from Dror et al. (16) (blue).
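
The autocorrelation compared in Supplementary Figure S13 can be illustrated with the usual normalized estimator C(k) = ⟨δx(t) δx(t+k)⟩ / ⟨δx²⟩, where δx is the deviation from the mean. The exact estimator used for the figure is not given in the caption, so the following is a minimal sketch with a hypothetical function name.

```python
def autocorrelation(series, max_lag):
    """Normalized autocorrelation C(0..max_lag) of a scalar time series."""
    n = len(series)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(series)")
    mean = sum(series) / n
    deviations = [value - mean for value in series]
    variance = sum(d * d for d in deviations) / n
    if variance == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    acf = []
    for lag in range(max_lag + 1):
        # Average the lagged products over the n - lag available pairs.
        covariance = sum(
            deviations[t] * deviations[t + lag] for t in range(n - lag)
        ) / (n - lag)
        acf.append(covariance / variance)
    return acf
```

By construction C(0) = 1, and the lag at which C(k) decays towards zero gives a rough measure of how long an observable retains memory of its earlier value.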

Supplementary Figure S14. Structural changes in MSM trajectories. Variation of metrics denoting key structural differences between active (3P0G) and inactive (2RH1) crystal structures along 150 µs MSM kinetic Monte Carlo trajectories (see Supplementary Methods). The trajectories are shown in the three images above, for the GPCR with inverse agonist bound to the receptor, with agonist bound to the receptor, and for the ligand-free apo receptor. All distances reported above are measured between the Cα atoms of the residues.

Supplementary Figure S15. Histograms of ion lock residue distances comparing active, inactive, and intermediate GPCR states. Distance distributions for the ionic lock residues R131-E268 are shown. Active and inactive states are determined according to the criteria described for TPT analysis (see Supplementary Methods), and intermediates include all states excluded from these criteria. Histograms are computed from a 150 µs MSM trajectory using 10 bins and normalized.

Supplementary Figure S16. Histograms of H5 bulge RMSD values comparing active, inactive, and intermediate GPCR states. Distributions of the backbone RMSD values of S207 and F208, which define the H5 bulge, are shown. Active and inactive states are determined according to the criteria described for TPT analysis (see Supplementary Methods), and intermediates include all states excluded from these criteria. Histograms are computed from a 150 µs MSM trajectory using 10 bins and normalized.
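
The 150 µs MSM trajectories referred to in these captions are kinetic Monte Carlo walks over the MSM states. At the 7.5 ns lag time stated for Supplementary Figure S5, 150 µs corresponds to 150,000 ns / 7.5 ns = 20,000 jumps. The sketch below shows the generic sampling scheme, assuming a row-stochastic transition matrix; the function name, the seed handling, and the toy matrix in the usage note are illustrative and not taken from the study.

```python
import random


def kinetic_monte_carlo(transition_matrix, start_state, n_steps, seed=0):
    """Sample a state sequence of length n_steps + 1 from a row-stochastic matrix.

    Each step draws the next state with probabilities given by the row of
    the current state; one step corresponds to one MSM lag time.
    """
    rng = random.Random(seed)
    states = list(range(len(transition_matrix)))
    trajectory = [start_state]
    for _ in range(n_steps):
        row = transition_matrix[trajectory[-1]]
        trajectory.append(rng.choices(states, weights=row)[0])
    return trajectory


# Number of jumps for a 150 µs trajectory at a 7.5 ns lag time.
N_STEPS_150_US = round(150_000 / 7.5)
```

For example, `kinetic_monte_carlo([[0.9, 0.1], [0.2, 0.8]], 0, N_STEPS_150_US)` returns a list of state indices; structural metrics are then read off from a conformation belonging to each visited state.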

Supplementary Figure S17. Histograms of connector-interacting residue distances comparing active, inactive, and intermediate GPCR states. Distance distributions are shown for residue pairs found in the mutual information analysis (Supplementary Tables S1, S2, and S3) that involve the connector residues (I121 or F282) and show a clear change in the activation transition. Active and inactive states are determined according to the criteria described for TPT analysis (see Supplementary Methods), and intermediates include all states excluded from these criteria. Histograms are computed from a 150 µs MSM trajectory using 10 bins and normalized.

Supplementary Figure S18. Histograms of NPxxY-interacting residue distances comparing active, inactive, and intermediate GPCR states. Distance distributions are shown for residue pairs found in the mutual information analysis (Supplementary Tables S1, S2, and S3) that involve NPxxY residues (N322, P323, Y326) and show a clear change in the activation transition. Active and inactive states are determined according to the criteria described for TPT analysis (see Supplementary Methods), and intermediates include all states excluded from these criteria. Histograms are computed from a 150 µs MSM trajectory using 10 bins and normalized.

Supplementary Figure S19. Histograms of extracellular region residue distances comparing active, inactive, and intermediate GPCR states. Distance distributions for residues contributing to the open/closed extracellular conformations are shown. Active and inactive states are determined according to the criteria described for TPT analysis (see Supplementary Methods), and intermediates include all states excluded from these criteria. Histograms are computed from a 150 µs MSM trajectory using 10 bins and normalized.

Supplementary Figure S20. Diverse activation mechanisms from transition path theory: MSM built-in metrics. 3D (left) and 2D (right) projections of top flux pathways for β2AR simulations. We plot the Helix 3-6 distance (x axis), the NPxxY RMSD (y axis), and the connector (I121-F282) RMSD from inactive (z axis). Pathway connections are scaled by the path flux relative to the top flux, which is shown in black: for inverse agonist (top), red 61% of the max, orange 51%; for agonist (middle), red 48%, orange 35%; for apo (bottom), red 89%, orange 72%. State coordinates are determined from the centroid of the state. RMSD values from this centroid fall within the displayed circles and are omitted (see Supplementary Figure S21 error bars).

Supplementary Figure S21. Diverse activation mechanisms from transition path theory: other metrics. 3D projections of top flux pathways for β2AR simulations, monitoring the Helix 3-6 distance (x axis), the NPxxY RMSD (y axis), and four different residue functional group distances. These residue distances were chosen from metrics in Supplementary Figures S15-S19 that demonstrate distinct changes from active to inactive receptor states along different pathways. The black curve in each graph corresponds to the top flux pathway from the TPT analysis in Supplementary Figure S20, using the four criteria used for MSM construction. The red and orange pathways impose additional criteria restricting the third distance to within 2 Å of the active and inactive crystal structure reference values for active and inactive states, respectively. Red and orange pathway connections are scaled by the path flux relative to the top flux shown in black, as follows: for R131-E268, agonist red 28% of the max flux, orange 25%; inverse agonist red 33%, orange 29%; apo red 19%, orange 18%; for I121-M215, agonist red 40%, orange 40%; inverse agonist red 57%, orange 57%; apo red 50%, orange 50%; for F208-F282, agonist red 52%, orange 44%; inverse agonist red 66%, orange 47%; apo red 19%, orange 18%; for Y219-T326, agonist red 19%, orange 17%; inverse agonist red 20%, orange 13%; apo red 90%, orange 68%. Error bars are the RMSD from the state centroid.

Supplementary Figure S22. AUCs for evaluating docking performance of the MSM states. AUC values and 95% confidence intervals from the docking results of the GPCR Decoy and Ligand Database (19). AUCs for docking to MSM states with different ligand conditions, to random agonist-bound MD snapshots (16), and to inactive (2RH1) and active (3P0G) crystal structures are shown.
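
The AUC reported for each docking set is the area under the ROC curve for separating known ligands from decoys. It equals the probability that a randomly chosen ligand is ranked above a randomly chosen decoy, with ties counted as one half. The sketch below uses that rank-based definition and assumes scores are oriented so that larger means better; docking scores for which lower is better would need to be negated first. The function name is hypothetical, and the confidence interval calculation used in the study is not reproduced.

```python
def roc_auc(ligand_scores, decoy_scores):
    """Rank-based ROC AUC: P(ligand score > decoy score), ties count 0.5.

    Assumes larger scores indicate a better rank (an assumption of this
    sketch; negate energies-like scores before calling).
    """
    if not ligand_scores or not decoy_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for ligand in ligand_scores:
        for decoy in decoy_scores:
            if ligand > decoy:
                wins += 1.0
            elif ligand == decoy:
                wins += 0.5
    return wins / (len(ligand_scores) * len(decoy_scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of compounds, which is adequate for illustration but slower than a sort-based implementation for large libraries.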

Supplementary Figure S23. Binary tally of agonist chemotype cluster centers found along activation pathways. Panels are displayed in order of inactive (ξ=0) to active (ξ=1), where MSM state progress corresponds to ξ=(0.08, 0.17, 0.33, 0.42, 0.5, 0.58, 0.67, 0.75, 0.83, 0.92). See Supplementary Methods for details. 497 chemotypes (520 including chemotypes discovered by the crystal structures) are sorted in order of increasing atom count, and are counted as 1 if a member of the cluster appears in the top 10% scoring docked compounds of a state assigned to a given progress variable.

Supplementary Figure S24. Binary tally of antagonist chemotype cluster centers found along activation pathways. Panels are displayed in order of inactive (ξ=0) to active (ξ=1), where MSM state progress corresponds to ξ=(0.08, 0.17, 0.33, 0.42, 0.5, 0.58, 0.67, 0.75, 0.83, 0.92). 935 chemotypes (946 including those discovered by the crystal structures) are sorted in order of increasing atom count, and are counted as 1 if a member of the cluster appears in the top 10% scoring docked compounds of a state assigned to a given progress variable.

Supplementary Figure S25. Dynamics of the β2AR extracellular region. Positively and negatively charged residues are colored in red and blue, respectively. Approximate locations for helices 1 through 7 are labeled. The gate leading to the ligand binding site shows a distinct two-state behavior, as indicated by the histogram of distances between helices 3 and 7 for apo simulations: 27.04% of structures show helix 7 moved inwards to a distance below 15 Å. A) A representative structure for the closed state, where helix 7 has moved inward to close the cleft, with salt bridges forming between charged residues in a zipper-like motif. B) In contrast, the β2AR crystal structure (PDB ID 2RH1) shows a cleft running across the surface. The structure is shown with the ligand carazolol (yellow) trapped in the binding pocket to indicate the accessibility of the pocket to the surrounding medium. This cleft is bridged by a single lysine residue, K305 (pointing upwards from helix 7). An overlay of the closed conformation (green) with the crystal structure (light gray) is shown on top, with side chains for residues involved in salt bridge formation. K305 is now extended into the ligand binding pocket and forms a salt bridge with the buried D113.

Supplementary Figure S26. Distributions of transmembrane helix positions. Helix positions for apo simulations started from inactive (PDB 2RH1, top) and active β2AR (PDB 3P0G, bottom) are shown for the extracellular (left), core (center), and intracellular (right) regions. Colors change for each order-of-magnitude increase in density from light yellow to red. All structures were aligned to the respective crystal structures, whose helix positions are indicated as numbered dots. For the black contours, corresponding contours for agonist (dotted green lines) and inverse-agonist structures (dashed blue lines) are shown for comparison.

Supplementary Figure S27. Ion densities. Regions of high ion density for Na⁺ (magenta) and Cl⁻ (cyan). Views correspond to 90° rotations around the horizontal with helix 1 to the right, and are from the extracellular side (A), from within the membrane plane with the extracellular side up (B), and from the intracellular side (C). Positively and negatively charged residues are colored in red and blue, respectively.

Supplementary Tables

Supplementary Table S1. Description of high mutual information (MI) residue pairs. Residue pairs with MI above the cutoffs indicated in Supplementary Figure S12 were investigated. The "Experimentally confirmed" column indicates a high-MI residue that is experimentally confirmed, or is paired with a confirmed residue. The "Other" column lists the number of residues found in the high-MI dataset but not yet implicated in the literature as important for ligand binding or G-protein activation. Experimental evidence includes (i) mutagenesis experiments (28, 30-37), (ii) fluorescence studies (38), (iii) NMR studies (39, 40), with residues having significant signal change upon activation, and (iv) protein crystallography, with residues that differ by > 5 Å when comparing inactive and G protein-bound structures (23, 41-43).

Supplementary Table S2. List of experimentally validated high mutual information residue pairs. Residue pairs with MI above the cutoffs indicated in Supplementary Figure S12 and previously found to be important in receptor function or ligand binding, as described in Supplementary Table S1, are listed.

Supplementary Table S3. List of high mutual information residue pairs not yet implicated in the literature as important for ligand binding or G-protein activation. Residue pairs with MI above the cutoffs indicated in Supplementary Figure S12 are listed.

Supplementary Table S4. Summary of p values for MSM and MD docking AUC scores greater than the crystal docking result. A one-sided Z test statistic was computed for the test of proportions of AUC values in a given docking set that are greater than the crystal structure docking AUC. The agonist-bound MSM state gives significantly higher AUC scores compared to all other docking sets, meaning superior discrimination of agonists from agonist decoys. Inverse agonist-bound MSM docking performs significantly better than all other docking sets at discriminating antagonists from antagonist decoys, with a weakly significant improvement over the agonist-bound MSM.
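
The caption of Supplementary Table S4 mentions a one-sided Z test of proportions but does not give the null proportion or state whether one or two samples were compared. Purely as an illustration, the sketch below implements a one-sample, upper-tailed test of an observed proportion against a null value p0, using the normal approximation. The function name and the default p0 = 0.5 are assumptions of this sketch, not details of the study.

```python
import math


def one_sided_proportion_z_test(n_greater, n_total, p0=0.5):
    """Upper-tailed one-sample Z test for a proportion.

    n_greater: number of AUC values exceeding the reference AUC.
    n_total:   number of AUC values in the docking set.
    p0:        null proportion (0.5 is an assumption of this sketch).
    Returns (z, p_value) with p_value = P(Z >= z) under the null.
    """
    if n_total <= 0 or not 0 <= n_greater <= n_total:
        raise ValueError("require 0 <= n_greater <= n_total and n_total > 0")
    if not 0 < p0 < 1:
        raise ValueError("p0 must lie strictly between 0 and 1")
    p_hat = n_greater / n_total
    standard_error = math.sqrt(p0 * (1 - p0) / n_total)
    z = (p_hat - p0) / standard_error
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value
```

With 60 of 100 values above the reference, for instance, the statistic is z = 2 and the one-sided p value is about 0.023. The normal approximation becomes unreliable for small sets or proportions close to 0 or 1.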