Using Dimensionality Reduction to Better Capture RNA and Protein Folding Motions


Lydia Tapia, Shawna Thomas, Nancy M. Amato
Technical Report TR8-5
Parasol Lab, Department of Computer Science, Texas A&M University
October 5, 2008

Abstract
Molecular motions of both proteins and RNA play an essential role in many biochemical processes. Simulations have attempted to study these detailed large-scale molecular motions, but they are often limited by the expense of representing complex molecular structures. For example, enumerating all possible RNA conformations with valid contacts is an exponential endeavor, and the complexity of protein motion increases with the model's detail and the protein's length. In this paper, we explore the use of dimensionality reduction techniques to better approximate protein and RNA motions. We present two new methods to study motions: (1) an evaluation technique to compare different distributions of conformations and (2) a way to identify likely local motion transitions. We combine these two methods in an existing motion framework to study large-scale motions of both proteins and RNA. We show that dimensionality reduction can be applied effectively, even to discrete conformation spaces (such as RNA secondary structure) that do not typically lend themselves to reduction techniques.

1 Introduction
Molecular motions are critical for many biological processes. For example, ribonucleic acid (RNA) motions are responsible for synthesizing proteins, catalyzing reactions, splicing introns, and regulating cellular activities [49, 24, 3]. Proteins are also well known for their conformational flexibility when binding to other proteins, ligands, sugars, or other small molecules. The different conformations (molecular shapes) that each of these molecules adopts influence its function. While experimental methods have been highly successful at determining some dynamics and many static three-dimensional structures of proteins and RNA, they do not operate at the time scales necessary to record detailed large-scale protein motions.
Methods that simulate folding in silico have attempted to fill this gap, but they are often limited by the complexity of representing detailed molecular structures. For example, enumerating all possible RNA conformations with valid contacts is an exponential endeavor and has been shown to be impractical for sequences longer than 4 nucleotides [6]. Protein conformations also lie in a complex space. Even if a protein conformation is considered only as a set of 3-D atom coordinates, a conformation with N atoms is represented by a vector of size 3N. Because of this high complexity, the study of many systems is limited. However, if the complex data can be summarized by fewer, more likely motions, it may be possible to study larger systems. Mathematical dimensionality reduction techniques provide a systematic way to reduce a high-dimensional data set to a low-dimensional representation. They have been applied in many domains, including computational biology [7, 12, 14, 15, 29, 34, 35, 47].

Our Contribution. In this paper, we explore the use of dimensionality reduction to study the motions of both RNA and proteins. We contribute two novel methods to the study of motions: a way to evaluate and compare different distributions of conformations, and a way to identify local motion transitions. Together, these two methods enable the study of biologically relevant large-scale motions. To evaluate different distributions of samples, we use dimensionality reduction to identify the principal features of a large conformational space. The coverage of different distributions on this reduced manifold can then be calculated. We demonstrate this technique on a reduction of an RNA conformation space, where many different conformation sets can be evaluated.
This is particularly interesting because RNA folding is often studied in a discrete setting, and dimensionality reduction techniques have not typically been applied to discrete problems. Small, local motions from one conformation to a nearby one are often used to piece together larger, more interesting motions. However, identifying which conformations are nearby is not always easy or inexpensive. We demonstrate how dimensionality reduction can be applied to identify candidates for such localized motions. We apply this new method to a set of 35 proteins and demonstrate a significant improvement over previous methods. Both contributions, evaluating sampling distributions and identifying local transitional motions, are general and can be applied to any conformational set. We demonstrate their use within the framework of probabilistic roadmap methods (PRMs) [22].

2 Related Work
For years, mathematical dimensionality reduction techniques have been applied to a variety of problems that exist in a complex space. Often, the data from these problems is too large and complex to analyze by hand, so these techniques approximate the complex space with a smaller representation that retains the features of interest. High-dimensional data from a variety of domains has been successfully reduced, including human subject studies [52], stellar spectra [11], and facial images [5]. Recently, dimensionality reduction has been applied to the biological problems of analyzing protein folding trajectories [7, 12, 14, 15, 29, 34, 35] and protein flexibility [46, 47].

Many approaches have been taken to reduce high-dimensional molecular data, including linear dimensionality reduction [21], non-linear dimensionality reduction [27], and Normal Mode Analysis [51]. One of the most common techniques, principal component analysis (PCA), was used to study the high-amplitude fluctuations in a molecular dynamics simulation of a small 46-residue protein [12]. It has since been applied to dynamics problems such as identifying protein conformational substates [23, 5, 36], extending the timescale of molecular dynamics simulations [1, 25], and performing conformational sampling [9, 8, 45]. PCA has also been used to compare interpretations of the reduced space against experimental data, e.g., extensive mutation data [34]. Because protein motion was shown to be generally non-linear [12], non-linear dimensionality reduction techniques have also been applied to proteins. Non-linear techniques were used to analyze hundreds of thousands of conformations generated by a statistical mechanical method in order to define the most relevant reaction coordinates for the system [7]. Techniques to speed up this analysis were later introduced in [35]. Another common approach, normal mode analysis (NMA), determines the collective modes of dynamic systems. When applied to proteins, it has given insight into vibrational motions [29, 3]. Recently, adaptations of the standard NMA approach have allowed the study of larger systems, e.g., a 12 amino acid protein in water [54].
The combination of PCA and NMA can also provide useful insight, whether the two measures agree or disagree [23]. Using information gained from the two methods, proteins such as bovine pancreatic trypsin inhibitor [15] and T4 lysozyme [14] have been studied.

3 Methods
Our analysis of protein and RNA landscapes uses two main classes of methods. First, we must be able to generate molecular conformations and connect them with local motion transitions. We refer to this process as roadmap construction because it originates from PRMs for studying robot motion [22]. The strength of PRMs lies in their ability to probabilistically approximate the conformations and local transitions required to capture critical events in the folding process. However, representing the motions of even a small protein with an atomic-level model requires thousands of sample conformations and hundreds of thousands of local motion transitions. After describing how a set of conformations can be generated, we explore the use of various dimensionality reduction techniques to determine a low-dimensional representation of our conformation sets. We discuss each of these two components in more detail below.

3.1 Roadmaps for Protein and RNA Folding
In previous work, we introduced approaches for studying protein folding [2] and RNA folding [4] based on the probabilistic roadmap approach for motion planning [22]. We successfully applied our method to a large number of structures and were able to identify subtle, experimentally determined differences in the secondary structure formation order of proteins with very similar structures [38, 48], as well as kinetic differences for mutated proteins [43] and RNA [41, 42]. Our method is simple and consists of two main steps: (1) sampling conformations on the landscape and (2) making transitions between sampled conformations.
In the first step, conformations (roadmap nodes) are sampled on the folding landscape, with a bias that increases density near the known native, folded state (Figure 1(a)). In the second step, connections (roadmap edges) are made between sampled conformations with similar structure (Figure 1(b)). Weights are assigned to directed edges to reflect the energetic feasibility of transitioning between the two endpoint conformations. This combination of nodes and weighted edges forms a roadmap that approximates the energy landscape and encodes thousands of folding pathways. The most energetically feasible pathways in the roadmap can be extracted using these weights (Figure 1(c)).

Models. To study protein motion, we model the protein as an articulated linkage. Using the standard modeling assumption for proteins that bond angles and bond lengths are fixed [39], the only degrees of freedom in our model are the backbone's φ and ψ torsional angles. These are represented as revolute joints with values in the range [0, 2π).
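The text does not name the graph search used to extract the most energetically feasible pathway. As an assumption, the minimal sketch below uses Dijkstra's algorithm, which fits because the edge weights defined under Connection later in this section (negative logarithms of transition probabilities) are non-negative. The roadmap is a plain list of (u, v, weight) directed edges, and the function name `most_feasible_path` and the toy roadmap are hypothetical.

```python
import heapq
import itertools


def most_feasible_path(edges, start, goal):
    """Dijkstra's algorithm on a roadmap given as (u, v, weight) directed edges.

    Weights must be non-negative (e.g., negative log transition probabilities).
    Returns (total_weight, [start, ..., goal]), or (inf, []) if goal is unreachable.
    """
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))

    dist = {start: 0.0}
    prev = {}
    tie = itertools.count()  # tie-breaker so node labels never need comparing
    heap = [(0.0, next(tie), start)]
    while heap:
        d, _, u = heapq.heappop(heap)
        if u == goal:
            break  # the first pop of the goal carries its final distance
        if d > dist[u]:
            continue  # stale entry superseded by a shorter path
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, next(tie), v))

    if goal not in dist:
        return float("inf"), []
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[goal], path[::-1]


# Hypothetical toy roadmap: U = unfolded, N = native, A/B = intermediates.
edges = [("U", "A", 1.0), ("A", "N", 1.0), ("U", "N", 5.0),
         ("U", "B", 0.5), ("B", "N", 3.0)]
print(most_feasible_path(edges, "U", "N"))  # (2.0, ['U', 'A', 'N'])
```

Because the weights are negative log probabilities, the minimum-weight path is the one whose product of transition probabilities is largest, which is why a shortest-path search answers the "most feasible pathway" question.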

Figure 1: A PRM roadmap for molecular folding, shown imposed on a visualization of the potential energy landscape: (a) after node generation, (b) after the connection phase, and (c) used to extract folding paths to the known native structure.

In the results shown in this paper, we use a coarse potential function similar to [28]. We use a step-function approximation of the van der Waals component of the potential and model side chains as spheres with zero degrees of freedom. If any two spheres are too close (i.e., less than 2.4 Å during sampling and 1.0 Å during connection), a very high potential is returned so that these conformations are rejected from the landscape model. Otherwise, the potential is:

$$U_{\text{tot}} = \sum_{\text{restraints}} K_d \left\{ \left[ (d_i - d_0)^2 + d_c^2 \right]^{1/2} - d_c \right\} + E_{hp} \tag{1}$$

where K_d is 1 kJ/mol and d_0 = d_c = 2 Å, as in [28]. The first term represents restraints favoring known secondary structure through main-chain hydrogen bonds and disulphide bonds, and the second term (E_hp) is the hydrophobic effect. The hydrophobic effect is computed as follows: if two hydrophobic residues are within 6 Å of each other, the potential is decreased by 2 kJ/mol. For more details, please see [37].

To study RNA motion, we focus the RNA model on the formation of secondary structure. Secondary structure is a planar representation of an RNA conformation that is commonly used to study RNA folding [55, 56, 16]. We adopt the definition in [16], which eliminates other types of contacts that are not physically favored. In the results shown in this paper, we use a common energy function called the Turner, or nearest-neighbor, rules [55]. This method determines the types of loops that exist in the molecule and looks up their free energies in a table of experimentally determined values. Intuitively, adjacent contacts typically form stable subunits (called stacks or stems) that have low energy.

Sampling. Conformation samples are retained based on their energy.
In our protein work, a sample q with potential energy E_q is accepted with probability

$$P(\text{acc. } q) = \begin{cases} 1 & \text{if } E_q < E_{\text{min}} \\ \dfrac{E_{\text{max}} - E_q}{E_{\text{max}} - E_{\text{min}}} & \text{if } E_{\text{min}} \le E_q \le E_{\text{max}} \\ 0 & \text{if } E_q > E_{\text{max}} \end{cases} \tag{2}$$

where E_min is the potential energy of the open chain and E_max is 2E_min. The roadmap produced by our technique is an approximation of the protein's energy landscape, and the quality of the approximation depends largely on the sampling distribution. Generally, we are most interested in regions near the native conformation and so seek to concentrate sampling there. In the results shown here, we use the sampling technique presented in [48], which is based on rigidity analysis [18, 19, 2, 17, 26]. We have shown that this method provides a denser distribution of samples near the native conformation, increasing the size of the proteins we can study.

In our previous work with RNA, we explored a variety of sampling methods: complete base-pair enumeration (BPE), stack-pair enumeration (SPE), and probabilistic Boltzmann sampling (PBS) [42]. While a BPE roadmap describes the complete energy landscape, it is infeasible for large RNA (e.g., more than 4 nucleotides). SPE roadmaps are smaller (one or two orders of magnitude smaller than BPE roadmaps). PBS roadmaps are the smallest (up to 10 orders of magnitude smaller than BPE roadmaps), and we have shown that they scale well to much larger RNA (e.g., with hundreds of nucleotides) [41]. PBS uses Wuchty's method [53] to enumerate suboptimal (low-energy) conformations within a given energy threshold. We take these suboptimal conformations as seeds and include additional random conformations. We then use a probabilistic filter to retain a subset of the conformations based on their Boltzmann factors. For a given conformation q with free energy E_q, the probability of accepting it is

$$P(\text{acc. } q) = \begin{cases} e^{-(E_q - E_0)/kT} & \text{if } (E_q - E_0) > 0 \\ 1 & \text{if } (E_q - E_0) \le 0 \end{cases} \tag{3}$$

where E_0 is a reference energy threshold that we can use to control the number of samples kept, k is the Boltzmann constant, and T is the temperature.

Connection. Connections between two conformations, q_i and q_j, are labeled with edge weights that reflect the energetic feasibility of transitioning between them. For proteins, this is done by first identifying all the intermediate conformations, q_i = c_0, c_1, ..., c_{n-1}, c_n = q_j, that connect q_i to q_j. For each pair of consecutive conformations c_i and c_{i+1}, the probability P_i of transitioning from c_i to c_{i+1} depends on the difference between their potential energies, ΔE_i = E(c_{i+1}) − E(c_i):

$$P_i = \begin{cases} e^{-\Delta E_i/kT} & \text{if } \Delta E_i > 0 \\ 1 & \text{if } \Delta E_i \le 0 \end{cases} \tag{4}$$

where k is the Boltzmann constant and T is the temperature. This maintains detailed balance between two adjacent states and enables the edge weight to be computed by summing the negative logarithms of the probabilities for all pairs of consecutive conformations in the sequence. With this edge weight definition, we can use simple graph search algorithms to extract the most energetically feasible pathways in the roadmap between two given states (e.g., from the unfolded state to the folded state).

Similar to the method described above for proteins, for RNA we calculate a weight w_ij for edge (q_i, q_j) that reflects the Boltzmann transition probability between q_i and q_j. First, we determine the energy barrier (the maximum energetic cost) E_b between q_i and q_j. Then, we calculate the Boltzmann transition probability k_ij (or transition rate) of moving from q_i to q_j using Metropolis rules [1]:

$$k_{ij} = \begin{cases} e^{-\Delta E/kT} & \text{if } \Delta E > 0 \\ 1 & \text{if } \Delta E \le 0 \end{cases} \tag{5}$$

where ΔE = max(E_b, E_j) − E_i, k is the Boltzmann constant, and T is the temperature. Note that the same energy barrier E_b is also used to estimate the transition probability k_ji, so the calculation satisfies detailed balance.
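Equations (3) through (5) share one Metropolis form, and Eq. (2) is a simple piecewise rule, so the sampling filters and both edge-weight definitions fit in a few small functions. This is a minimal sketch, not the authors' implementation: function names are hypothetical, energies and kT are assumed to be in consistent units, and computing the intermediate conformations c_0, ..., c_n or the RNA barrier E_b is outside its scope.

```python
import math
import random


def metropolis(delta_e, kT):
    """Shared form of Eqs. (3)-(5): 1 if delta_e <= 0, else exp(-delta_e / kT)."""
    return 1.0 if delta_e <= 0 else math.exp(-delta_e / kT)


def protein_acceptance(e_q, e_min, e_max):
    """Eq. (2): piecewise-linear acceptance; the text sets e_max = 2 * e_min."""
    if e_q < e_min:
        return 1.0
    if e_q > e_max:
        return 0.0
    return (e_max - e_q) / (e_max - e_min)


def pbs_filter(energies, e_ref, kT, rng):
    """Eq. (3): keep conformation i with probability metropolis(E_i - E_0)."""
    return [i for i, e in enumerate(energies)
            if rng.random() < metropolis(e - e_ref, kT)]


def protein_edge_weight(energies, kT):
    """Sum of -ln P_i (Eq. (4)) over consecutive intermediates c_0 ... c_n.

    -ln P_i equals max(dE_i, 0) / kT, so it is computed directly; this avoids
    exp() underflowing to 0 (and log(0) failing) for large energy increases.
    """
    return sum(max(b - a, 0.0) / kT for a, b in zip(energies, energies[1:]))


def rna_rate(e_i, e_j, e_b, kT):
    """Eq. (5): k_ij with Delta E = max(E_b, E_j) - E_i."""
    return metropolis(max(e_b, e_j) - e_i, kT)


def rna_edge_weight(e_i, e_j, e_b, kT):
    """w_ij = -ln k_ij, in closed form."""
    return max(max(e_b, e_j) - e_i, 0.0) / kT


# Toy energies (arbitrary units): two uphill steps of 2 with kT = 1.
print(protein_edge_weight([0.0, 2.0, 1.0, 3.0], kT=1.0))  # 4.0
```

Working with -ln P directly keeps edge weights finite and additive, which is what lets a shortest-path search over the roadmap find the most probable pathway.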
As with the proteins, the edge weight w_ij is the negative logarithm of the transition probability.

3.2 Dimensionality Reduction Techniques
A variety of dimensionality reduction methods have been developed that analyze a set of points (input) and produce a low-dimensional representation of each input point (output). The methods vary in computational speed and in the complexity of the data they are able to represent. As with many data mining techniques, there are two main classes of methods: those that capture data that is linearly representable and those that capture non-linear data. Two popular linear reduction methods are the classical techniques of Principal Component Analysis (PCA) [21] and Multidimensional Scaling (MDS) [4]. These methods are popular because they are easy to implement, compute solutions efficiently, and guarantee a globally optimal linear subspace for the high-dimensional data. However, if the data being studied is non-linear, more recent non-linear reduction techniques have been used to obtain better reductions [27]. In this paper, we explore two methods for dimensionality reduction: PCA (linear) and Isomap [44] (non-linear). While both methods provide a reduction of a given model (see Algorithm 3.1), they differ greatly in how this model is obtained and internally represented. In our description of these methods, we use n for the size of the original dataset (in our case, the number of RNA or protein conformations), D for the dimensionality of the original dataset, and R for the number of dimensions in the reduced space required to represent the original dataset.

PCA. Principal Component Analysis (PCA) is one of the best-known methods for dimensionality reduction. Its popularity stems from its ease of calculation and the longevity of the method [21]. The goal of PCA is to compute the D Principal Components (PCs) of the original data set.
Even though there are D resulting PCs, the variance in the data can often be fully represented by a smaller set of R PCs. The general algorithm for PCA is briefly outlined in Algorithm 3.2. The critical step of the PCA method is the calculation of the D PCs for an initial data set of dimensionality D. Each resulting PC is a vector aligned with a direction of maximal variance in the initial data set. The PCs are ordered: the first PC represents the direction of maximal variance, the second the direction of second-largest variance, and so on.

Algorithm 3.1 Dimensionality Reduction for Molecules
Input. A set of n conformations, represented in D dimensions
Output. A set of size n in R dimensions, where R << D

Algorithm 3.2 Principal Component Analysis for Molecules
Input. n × D matrix X
Output. Set of R principal components, PC
1: Center the data in X by subtracting the data mean from each point.
2: Construct the D × D covariance matrix C = X^T X / (n − 1).
3: Compute the D eigenvalues and eigenvectors of C via singular value decomposition (SVD) of C.
4: Set PC to the D eigenvectors of C, ordered by decreasing eigenvalue.
5: return the first R components of PC, where R < D is chosen so that the residual variance of the representation of the original dataset is acceptably small.

Isomap. A popular non-linear dimensionality reduction technique is Isomap [44]. It retains the efficiency and global optimality of the classical linear methods while being able to represent non-linearity in the data. Isomap has been shown to work well on large and complex data sets [44] and has been applied to proteins [7].

Algorithm 3.3 Isomap for Molecules
Input. A set of n conformations
1: Construct a neighborhood graph G. For each conformation n_i, connect it to neighbor n_j with edge length d(i, j) if n_j is one of the k nearest neighbors of n_i. If n_j is not a k nearest neighbor of n_i, set d(i, j) = ∞.
2: Compute the shortest paths in a matrix D_G. For every pair of points i, j, compute the shortest-path distance between them, e.g., by updating d(i, j) = min[d(i, j), d(i, k) + d(k, j)] for every k from 1 to n.
3: Construct an R-dimensional embedding. Apply classical multidimensional scaling to the matrix of graph distances D_G.
This constructs an embedding of the data in an R-dimensional Euclidean space while preserving its intrinsic geometry.

The general algorithm for Isomap is briefly outlined in Algorithm 3.3. The algorithm works by first obtaining a geometric representation of the input data, e.g., the distances from one conformation to another. Shortest paths through the neighborhood graph approximate geodesic distances along the underlying manifold, and by using these geodesic distances, Isomap can preserve the topology of a complex, non-linear manifold even in a low-dimensional representation (of size R).

4 Application: Capturing RNA and Protein Landscapes
In this section, we explore the application of linear and non-linear dimensionality reduction techniques to both RNA and protein conformation sets. We also investigate the parameters that affect the quality of the reduction.

4.1 Selecting Linear vs. Non-Linear Reduction
Here we compare the quality of the dimensionality reductions performed by PCA (linear) and Isomap (non-linear). For the PCA reduction, we take all the roadmap conformations as input. For example, with proteins, each conformation is the series of backbone φ and ψ torsional angles. We then run PCA in MATLAB® and plot the variance of the residuals. For the Isomap reduction, we again take all the roadmap conformations as input. We then construct a neighborhood graph (see Algorithm 3.3) using a distance measure. For the RNA shown, we use a distance measure calculated from base-pair differences [16]. For the proteins shown, we use the root mean square distance (RMSD) over all backbone atoms. The implementation of Isomap is from [44].

Figure 2 shows the variance of the residuals for both PCA and Isomap as a function of the number of reduced dimensions. For both methods, residual variance decreases rapidly with each additional dimension and then tapers off as the number of dimensions increases. To represent the data completely, both methods would require more than 6 dimensions.

[Figure 2 plot: residual variance vs. number of dimensions for GB1, Isomap and PCA]
Figure 2: Variance of the residuals from the dimensionality reduction of Protein G (PDB ID: 1GB1) by PCA and Isomap. Note that the non-linear representation given by Isomap better captures the complexity of the data (as shown by its lower and continuously decreasing residuals).

This non-linearity in protein folding landscapes is consistent with previous studies; for example, [12, 7] also demonstrated that protein folding landscapes are better represented by non-linear reduction techniques.

4.2 Parameter Setting
For the geometric representation required by the Isomap method, we need to define the k nearest neighbors of each conformation. Recall that for the protein results shown in this paper, RMSD is used as the distance, and for the RNA results, the number of differing contact pairs is used. However, the parameter k also affects the quality of the representation. Figure 3 shows the variance of the residuals of Isomap reductions of a 21-nucleotide RNA, where k is varied between 8 and 5. Note that there is little difference in the quality of the reductions.

[Figure 3 plot: residual variance for RNA 21nt, k = 8, 7, 6, 5]
Figure 3: Variance of the residuals from Isomap reductions for a 21-nucleotide RNA with varying values of k.

Similar results were seen for reductions of protein conformations (data not shown). Therefore, a value of k = 8 was used for all reductions.
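To reproduce a residual-variance comparison like Figure 2, Algorithms 3.2 and 3.3 can be written directly in NumPy. This is an independent, minimal sketch, not the code used here (PCA was run in MATLAB and Isomap came from [44]); the function names are hypothetical. Residual variance is taken as 1 − r², where r is the linear correlation between reference distances (input-space distances for PCA, graph distances D_G for Isomap) and Euclidean distances in the embedding, the convention introduced with Isomap. The Floyd-Warshall step is O(n³), so it is only practical for small n.

```python
import numpy as np


def pca_reduce(X, R):
    """Algorithm 3.2: project an n x D data matrix onto its first R PCs."""
    Xc = X - X.mean(axis=0)                  # 1: center the data
    C = Xc.T @ Xc / (X.shape[0] - 1)         # 2: D x D covariance matrix
    evals, evecs = np.linalg.eigh(C)         # 3: eigendecomposition (C is symmetric)
    order = np.argsort(evals)[::-1]          # 4: order by decreasing variance
    return Xc @ evecs[:, order[:R]]          # 5: n x R reduced coordinates


def isomap_reduce(dist, k, R):
    """Algorithm 3.3 on a precomputed n x n distance matrix (e.g., RMSD)."""
    n = dist.shape[0]
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(n):                       # 1: k-nearest-neighbor graph
        order = np.argsort(dist[i])
        nbrs = order[order != i][:k]
        G[i, nbrs] = dist[i, nbrs]
        G[nbrs, i] = dist[i, nbrs]           # keep the graph symmetric
    for m in range(n):                       # 2: Floyd-Warshall shortest paths
        G = np.minimum(G, G[:, [m]] + G[[m], :])
    if np.isinf(G).any():
        raise ValueError("neighborhood graph is disconnected; increase k")
    J = np.eye(n) - np.ones((n, n)) / n      # 3: classical MDS on D_G
    B = -0.5 * J @ (G ** 2) @ J
    evals, evecs = np.linalg.eigh(B)
    top = np.argsort(evals)[::-1][:R]
    Y = evecs[:, top] * np.sqrt(np.maximum(evals[top], 0.0))
    return Y, G


def residual_variance(D_ref, Y):
    """1 - r^2 between reference distances and Euclidean distances in Y."""
    D_low = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    r = np.corrcoef(D_ref.ravel(), D_low.ravel())[0, 1]
    return 1.0 - r ** 2
```

Evaluating `residual_variance` for R = 1, 2, ... with k = 8 gives curves of the kind plotted in Figures 2 and 3. Note that applying PCA to raw φ and ψ values, as this sketch would, ignores their 2π periodicity.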

4.3 Selecting an Appropriate Number of Dimensions
Once a reduction is performed, another question arises: how many dimensions capture the space appropriately at the lowest complexity? This is determined by the application for which the reduction is used. In the context of the results shown in this paper, we are interested in simple representations that allow us to capture the motions of RNA and proteins. We explore two measures for selecting the number of dimensions. The first, the residual variance, is standard and is often used when the highest-quality reduction is required. Ideally, a reduction would exactly capture the complexity of the space (as indicated by the residual variance reaching 0). However, in complex spaces, extremely low-dimensional representations are not always possible or necessary. The second measure we investigate, the elbow criterion, is commonly used in data clustering to evaluate how well a particular clustering represents the data and to determine an appropriate number of clusters [13, 32]. The elbow criterion monitors the percentage of the variance explained by different clusterings and selects the one beyond which this value no longer changes significantly, i.e., where adding more clusters (or, in our case, more dimensions) does not add sufficient information. Given the variance of the data set, σ², the percentage of the variance explained by the first R dimensions is

$$\frac{\sum_{i=1}^{R} \sigma_i^2}{\sigma^2}$$

In our case of principal dimensions, this measure captures the point at which the growth in the quality of the representation is maximized. Figure 4 shows an elbow calculated from a reduction of the protein Ubiquitin (PDB ID: 1UBI). For this reduction, we would select 4 dimensions to represent the data.

[Figure 4 plot: residual variance vs. dimensionality for 1UBI, with the elbow marked]
Figure 4: The elbow (star) is shown for an example reduction of the protein Ubiquitin.
The elbow indicates the point at which the growth in the quality of the representation is maximized.

4.4 Discovering Landscape Characteristics
One of the most exciting aspects of reduced landscapes is the insight they provide as an approximation of the full energy landscape. In this section, we take a full enumeration of the conformations of a 21-nucleotide RNA (5,353 conformations). The residuals clearly indicate that increasing the dimensionality represents this conformation space more accurately (see Figure 3). However, even two dimensions reduce the residuals significantly. Figure 5 shows the first two dimensions of the reduction plotted against the potential of the conformations. Despite the low-dimensional representation and the fact that potential was not used for the reduction, we see striking characteristics: conformations of similar potential are clearly grouped together (red = high potential, blue = low potential). This reduction also demonstrates the typical ruggedness of RNA landscapes.

[Figure 5 plot: first two reduction dimensions vs. potential energy]
Figure 5: The first two dimensions of the reduction for a 21-nucleotide RNA plotted against potential energy.

5 Application: Evaluating Sampling
In this section, we demonstrate how the reduced space can be used to evaluate the quality and importance of different sample sets. The 21-nucleotide RNA is an ideal test case: because of its small size, we are able to fully enumerate its conformation space. In addition to this Base Pair Enumeration (BPE) set, we can generate samples in two other ways: Stack Pair Enumeration (SPE) and Probabilistic Boltzmann Sampling (PBS) (see Section 3.1). SPE generates conformations such that all contacts in a conformation are part of a stack (a set of consecutive contacts). These conformations are a subset of the BPE conformations. The 21-nucleotide RNA has 25 SPE conformations. PBS probabilistically selects a subset of the conformations, favoring those with lower energies. We can adjust the strength of this bias by altering the reference energy threshold, E_0, which consequently determines the size of the subset. For this evaluation, we selected two reference energy thresholds: 4 and 0. The first threshold (labeled "higher") generates more conformations (213) than the second threshold (labeled "lower"), which generates only 58. In previous experiments, we have seen that our BPE, SPE, and PBS roadmaps produce similar simulated kinetics despite their drastically different roadmap sizes [42].

Figure 6 shows how the different conformation subsets cover a reduction of a full enumeration of the landscape (BPE). The two dimensions displayed here are the same two dimensions as in Figure 5. For this 21-nucleotide hairpin, BPE generated 5,353 possible conformations. In Figure 6(a), each gray dot represents a BPE conformation and the star indicates the native state. Even though there are only 25 SPE conformations, it is clear from Figure 6(b) that they cover much of the reduced space. This implies that even with only about 5% of the samples, they still capture the general characteristics and distribution of the full set.

[Figure 6 plots, panels (a) to (d): Dimension 1 vs. Dimension 2]
Figure 6: (a) First two dimensions of a reduction of the full enumeration of all possible conformations (5,353). The native state is indicated with a star. (b-d) Comparison of different subsets of conformations (black circles) overlaid on the reduction (gray dots). Subsets include: (b) 25 SPE conformations, (c) 213 PBS conformations (higher energy threshold), and (d) 58 PBS conformations (lower energy threshold).
Figure 6(c) shows a similar plot for the 213 PBS conformations generated with the higher reference energy threshold. Again, even though there are far fewer samples, much of the fully enumerated space is still captured. Interestingly, the PBS distribution with the higher threshold and the SPE distribution are not the same. Stack-based conformations have lower energies than conformations with isolated contacts, but they are not guaranteed to have low energies. This becomes apparent when we compare the SPE distribution to the PBS distribution, which is probabilistically biased towards lower-energy regions of the landscape: the PBS distribution is missing a fraction of the SPE subset (in the lower right quadrant of the reduction) that has higher energy. Finally, we plot the 58 PBS conformations generated with the lower reference energy threshold on the reduced space (Figure 6(d)). Although only 58 conformations are generated, they still cover a large portion of the primary dimensions of the reduction. Also, as expected with a low energy threshold, they cover a large portion of the space near the native state. A comparison with the higher-threshold samples (Figure 6(c)) indicates that many of the high-energy conformations are eliminated by using this lower energy threshold. However, despite this reduction, some samples remain to represent the region of higher-energy conformations.

6 Application: Capturing Motions
The previous reductions show that conformations with similar energetics and structure are grouped together, even in low-dimensional representations. One way to take advantage of this grouping is to use the reduction to identify likely motion transitions. In the past, we identified likely transitions from a conformation by using a distance metric to define nearby conformations and then making connections between them as described in Section 3.1.

6.1 Methods
We identify likely motion transitions by defining a new distance metric based on the reduction of a set of conformations C. After performing a reduction (as described in Section 3.2), we obtain a vector r_i of length R for each conformation c_i. Here, the number of dimensions R used from the reduction is computed from the elbow criterion (see Section 4.3). We then calculate the distance between two conformations c_a and c_b as their distance in the reduced space:

$$d_R(c_a, c_b) = \sqrt{\sum_{d=1}^{R} \left( r^a_d - r^b_d \right)^2} \tag{6}$$

We call this measure the reduction distance. In previous work, we defined neighbors through a metric, called rigidity distance, based on the amount of rigid structure in two conformations [48]. This metric produced results that captured experimental findings, with two major benefits: fewer required edges and low edge weights.

6.2 Experimental Setup
To compare the two ways of identifying neighbors for local motion transitions, we applied both metrics, the previously developed rigidity distance and our new reduction distance, to connect a single set of conformations. We took the proteins from our protein folding server, which includes both our previously published results and user submissions.
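Section 6.1 combines three pieces: choosing R with the elbow criterion, measuring the reduction distance, and taking each conformation's nearest neighbors in the reduced space as candidate local transitions. The minimal NumPy sketch below rests on stated assumptions. The text describes the elbow criterion but gives no formula for locating the elbow, so a common geometric stand-in is used (the point farthest from the straight line joining the first and last points of the curve). Eq. (6) is taken to be the plain Euclidean distance over the first R reduced coordinates. All function names and the toy curve are hypothetical.

```python
import numpy as np


def elbow_dimension(residual_var):
    """Return R (1-based) at the elbow of a residual-variance curve.

    Assumed heuristic (not taken from the text): the elbow is the point
    farthest from the straight line joining the curve's first and last points,
    after rescaling both axes to [0, 1]. Needs >= 2 non-constant values.
    """
    y = np.asarray(residual_var, dtype=float)
    x = np.arange(1, len(y) + 1, dtype=float)
    xn = (x - x[0]) / (x[-1] - x[0])
    yn = (y - y.min()) / (y.max() - y.min())
    dy = yn[-1] - yn[0]
    # perpendicular distance from (xn, yn) to the chord (0, yn[0])-(1, yn[-1])
    dist = np.abs(dy * xn - (yn - yn[0])) / np.hypot(1.0, dy)
    return int(x[np.argmax(dist)])


def reduction_distance(r_a, r_b):
    """Eq. (6), read as the Euclidean distance between reduced coordinates."""
    return float(np.linalg.norm(np.asarray(r_a, float) - np.asarray(r_b, float)))


def reduction_neighbors(Y, k):
    """Indices of each conformation's k nearest neighbors in the n x R space Y."""
    d = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # exclude each conformation itself
    return np.argsort(d, axis=1)[:, :k]


# Toy residual-variance curve (made-up values): the elbow falls at R = 3.
print(elbow_dimension([0.5, 0.2, 0.1, 0.08, 0.07, 0.065]))  # 3
```

In use, Y would be the first R = elbow_dimension(...) columns of the Isomap embedding, and each returned neighbor pair would be handed to the connection step of Section 3.1, which assigns the edge weight.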
The protein set consisted of 35 proteins, ranging from 46 to 153 residues, with varied secondary structure (Table 1). All proteins listed are referenced by their PDB ID except MMP19, which was a submission to our publicly available online folding server. The conformation sets varied in size from 4, to 1, conformations (as defined previously by the amount needed to maintain a stable secondary structure formation order). Isomap was run on each set of conformations as defined in Section 3.2. As discussed in Section 4.2, the nearest-neighbor parameter used by Isomap was set to k = 8. The number of dimensions used to represent the data was determined automatically by the elbow criterion (Section 4.3). Each metric was used to attempt local connections from each conformation to its 2 nearest neighbors; recall that this is the neighbor rate as defined for roadmap connection (Section 3.1).

6.3 Results

Table 1 displays the differences caused by the two distance metrics for each protein studied. Edge Number Difference is the number of edges in the reduction-connected map divided by the number of edges in the rigidity-connected map. Edge Weight Difference is the average edge weight in the reduction-connected map divided by the average edge weight in the rigidity-connected map. On average, the reduction-connected maps have 0.6 times as many edges as the rigidity-connected maps and an average edge weight 0.91 times as large, a decrease of almost 10%. Figure 7(a) shows the difference in the number of edges in the roadmaps constructed by the two metrics. Since all 35 points fall below the red line, every map connected by reduction distance was smaller than the corresponding map connected by rigidity distance. This was true even for maps with larger numbers of conformations (reflected in a larger number of edges). Since the edge weight reflects the energetic feasibility of making a local transition from one conformation to another, it is also worth examining how this new connection method changes edge weights.
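The two ratios reported in Table 1 can be computed directly from a pair of roadmaps built over the same conformations. A minimal sketch, assuming (hypothetically) that each roadmap is stored as a dictionary from an edge (u, v) to its weight; the `size_decrease` helper further assumes that the Size Decrease of Table 2 is one minus the edge-count ratio, which the text implies but does not state as a formula.

```python
from statistics import mean
from typing import Dict, Tuple

# Hypothetical roadmap representation: edge (u, v) -> edge weight.
Roadmap = Dict[Tuple[int, int], float]


def edge_number_difference(reduction_map: Roadmap, rigidity_map: Roadmap) -> float:
    """Edges in the reduction-connected map over edges in the rigidity-connected map."""
    return len(reduction_map) / len(rigidity_map)


def edge_weight_difference(reduction_map: Roadmap, rigidity_map: Roadmap) -> float:
    """Average edge weight in the reduction-connected map over that in the rigidity-connected map."""
    return mean(reduction_map.values()) / mean(rigidity_map.values())


def size_decrease(reduction_map: Roadmap, rigidity_map: Roadmap) -> float:
    """Assumed definition: fraction of edges saved, i.e. 1 - edge number difference."""
    return 1.0 - edge_number_difference(reduction_map, rigidity_map)


# Toy example: four rigidity-distance edges vs. two reduction-distance edges.
rigidity = {(0, 1): 1.0, (0, 2): 2.0, (1, 2): 3.0, (2, 3): 2.0}
reduction = {(0, 1): 1.0, (2, 3): 2.0}
print(edge_number_difference(reduction, rigidity))  # 0.5
print(edge_weight_difference(reduction, rigidity))  # 0.75
print(size_decrease(reduction, rigidity))  # 0.5
```

A ratio below 1 in either column means the reduction-connected map is smaller or cheaper than the rigidity-connected one, which is how the averages of 0.6 and 0.91 should be read.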
Figure 7(b) plots the average edge weights of the maps connected by rigidity distance against those of the maps connected by reduction distance. Overall, the average edge weights of the reduction distance maps were almost 10% smaller than those of the original maps. While not all reduction-connected maps had a smaller average edge weight, 3 of 35 maps had averages that were similar to or less than the original average edge weight.

PDB Identifier  Length  Structure  Nodes  Edge Number Difference  Edge Weight Difference
1AB1            46      2α + 2β
CCM             46      1α + 3β
RDV             52      2α + 3β
EGF             53      3β
PRB             53      5α
IY5             54      1α + 3β
SMU             54      3α + 3β
FCA             55      2α + 4β
VGH             55      1α + 4β
GB1             56      1α + 4β
MHX             57      1α + 4β
MI              57      1α + 4β
BPI             58      2α + 2β
PTI             58      2α + 2β
BDD             60      3α
TCP             60      2α + 2β
ADR             60      2α + 2β
CRS             60      6β
PTL             62      1α + 4β
COA             64      1α + 5β
SRM             64      1α + 5β
CI2             65      2α + 5β
NYF             67      5β
HOE             74      7β
AIT             74      7β
UBI             76      3α + 5β
UBQ             76      1α + 5β
O6X             81      2α + 3β
A2P             108     4α + 6β
YCC             108     5α
VYN             117     5α + 8β
RBX             124     4α + 7β
L               129     7α + 3β
AFG             140     4α + 1β
MMP19*          153     3α + 7β
Average                                   0.6                     0.91

Table 1: Comparison of reduction distance connection to previous work for 35 proteins. In all cases, reduction distance connection reduces the number of edges needed, and for many proteins it decreases the average edge weight. [* User submission without a PDB ID.]

In addition to reducing the required number of edges and the average edge weight, using the reduction distance to connect a roadmap dramatically changed the connectivity of the map. The degree of a conformation (or vertex) v in the roadmap is the number of edges incident to v. In the reduction distance maps, the average degree dropped. More striking differences are seen in the conformations of maximum degree: with the rigidity distance, the maximum degree across all roadmaps was in the range [32, 1,832], while in the reduction distance maps it was in the range [36, 47]. These changes show that the reduction distance maps are more evenly connected. In particular, the lower maximum degree implies that massive connectivity hubs are removed, and the change in average degree implies that all conformations are more equally connected. Overall, these statistics show that the choice of local motion transitions substantially changes the roadmaps.
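The degree statistics above can be illustrated with a small sketch. It assumes an undirected roadmap given as a list of edges, each listed once; the function name is hypothetical. The example contrasts a hub-like map with a chain on the same vertices: both have the same average degree, but the hub has a much larger maximum degree, which is the kind of difference the maximum-degree ranges above reveal.

```python
from collections import Counter
from typing import Iterable, Tuple


def degree_stats(edges: Iterable[Tuple[int, int]], num_vertices: int) -> Tuple[float, int]:
    """Return (average degree, maximum degree) of an undirected roadmap.

    Each undirected edge (u, v) is assumed to be listed once; isolated
    vertices still count toward the average through num_vertices.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    average = sum(degree.values()) / num_vertices  # equals 2|E| / |V|
    maximum = max(degree.values(), default=0)
    return average, maximum


# Same number of edges, very different connectivity: a hub (star) vs. a chain.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
chain = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(degree_stats(star, 5))   # (1.6, 4)
print(degree_stats(chain, 5))  # (1.6, 2)
```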
These changes appear to be for the better: smaller roadmaps, smaller edge weights, and more evenly distributed connectivity. Another, more biologically inspired, measure is the order in which secondary structure forms along the pathways in the roadmap. In previous work [48], we validated a set of roadmaps against experimental results, showing that our roadmaps connected by rigidity distance captured the same secondary structure formation orders as found in experiment. Table 2 shows the secondary structure formation orders obtained from the reduction distance roadmaps for 4 proteins with similar folding structure but differing folding behavior. It also gives the decrease in map size relative to the previously built rigidity distance roadmaps. In all cases, the reduction-connected maps predicted the secondary structure formation order seen in experiment with roughly 50% fewer edges (47% to 54%) than previously required.
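Formation orders such as those in Table 2 can be tallied by recording, along each pathway, when each secondary structure element first appears. A minimal sketch, assuming (hypothetically) that every conformation on a pathway is already labeled with the set of elements considered formed in it and that all pathways are weighted equally; the actual criteria for when an element counts as formed, and how pathways are weighted, are not given in this section. Elements that first appear at the same step are grouped, loosely mirroring the brackets used in Table 2.

```python
from collections import Counter
from typing import AbstractSet, Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical representation: a pathway is a sequence of conformations, each given
# as the set of secondary structure elements (SSEs) considered formed in it.
Pathway = Sequence[AbstractSet[str]]
Order = Tuple[FrozenSet[str], ...]


def formation_order(pathway: Pathway) -> Order:
    """SSE groups in the order they first appear; SSEs first seen at the same step share a group."""
    seen: set = set()
    order: List[FrozenSet[str]] = []
    for formed in pathway:
        new = frozenset(formed) - seen
        if new:
            order.append(new)
            seen |= new
    return tuple(order)


def order_percentages(pathways: List[Pathway]) -> Dict[Order, float]:
    """Percentage of pathways that follow each distinct formation order."""
    counts = Counter(formation_order(p) for p in pathways)
    return {order: 100.0 * n / len(pathways) for order, n in counts.items()}


# Hypothetical labels: "a" = helix, "b12" = beta1-2 hairpin, "b34" = beta3-4 hairpin.
p1 = [set(), {"b12"}, {"b12", "a"}, {"b12", "a", "b34"}]
p2 = [set(), {"b12"}, {"b12", "a", "b34"}]
print(formation_order(p1))  # three groups: b12, then a, then b34
print(order_percentages([p1, p1, p2]))  # p1's order in 2 of 3 pathways, p2's in 1 of 3
```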

Figure 7: (a) Number of edges from original maps vs. maps connected using reduction distance (panel "Number of Edges Comparison"; x-axis: number of edges in rigidity distance maps; y-axis: number of edges in reduction distance maps). (b) Average edge weights from original maps vs. maps connected using reduction distance (panel "Edge Weight Comparison"; x-axis: average edge weight in rigidity distance maps; y-axis: average edge weight in reduction distance maps).

Protein  Experimental Order            Roadmap Order (%)        Size Decrease
G        [α,β1,β3,β4], β2 ¹            α, β3-4, β1-2 (100.0)    51%
         [α,β4], [β1,β2,β3] ²
L        [α,β1,β2,β4], β3 ¹            α, β1-2, β3-4 (100.0)    50%
         [α,β1], [β2,β3,β4] ²
NuG1     β1-2, β3-4 ³                  α, β1-2, β3-4 (98.0)     47%
                                       α, β1-2, β3-4 (1.9)
NuG2     β1-2, β3-4 ³                  β1-2, α, β3-4 (99.2)     54%
                                       β1-2, α, β3-4 (1.1)
                                       β3-4, β1-2, α (1.1)

Table 2: Comparison of secondary structure formation orders and the decrease in the number of edges needed (Size Decrease) for proteins G, L, NuG1, and NuG2, which have known experimental results: ¹ hydrogen out-exchange experiments [31], ² pulsed labeling/competition experiments [31], and ³ Φ-value analysis [33]. Brackets indicate no clear order. In all cases, our new technique predicted the secondary structure formation order seen in experiment with significantly fewer edges. Only formation orders greater than 1% are shown.

7 Conclusions

In this work we proposed two new methods for studying molecular motions based on dimensionality reduction techniques. First, we demonstrated how dimensionality reduction can be used to compare different distributions of conformations. We illustrated this technique with a small RNA that could be fully enumerated, showing how to evaluate three different sampling distributions by examining the coverage and distribution of samples against the fully enumerated landscape in a reduced space. Second, we developed a new way to identify likely local motion transitions using dimensionality reduction.
We define a new distance measure, reduction distance, to select neighboring conformations for localized motions. This new metric yields a significant improvement over previous techniques, resulting in a 40% reduction in landscape model size (number of edges) for the 35 proteins studied. Both of these new methods are general and can be applied to any set of conformations; we showcase their utility in an existing motion framework.

Acknowledgments

We would like to acknowledge Mark Moll of the Physical and Biological Computing Group at Rice University for inspiring us to work with dimensionality reduction. This research was supported in part by NSF Grants EIA-13742, ACR-8151, ACR , CCR , ACI , CRI , by the DOE and HP. Computing resources were generously donated by Chevron. Tapia was supported in part by a PEO scholarship, an NIH Molecular Biophysics Training Grant (T32GM6588), and a Department of Education (GAANN) Fellowship. Thomas was supported in part by an NSF Graduate Research Fellowship, a PEO scholarship, a Dept. of Education Graduate Fellowship (GAANN), and an IBM TJ Watson PhD Fellowship.

References

[1] A. Amadei, A. Linssen, B. de Groot, D. van Aalten, and H. Berendsen. An efficient method for sampling the essential subspace of proteins. J. Biomol. Struct. Dyn., 13.
[2] N. M. Amato, K. A. Dill, and G. Song. Using motion planning to map protein folding landscapes and analyze folding kinetics of known native structures. J. Comput. Biol., 10(3-4), 2003. Special issue of Int. Conf. Comput. Molecular Biology (RECOMB) 2002.
[3] D. Bartel. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell, 116, 2004.
[4] I. Borg and P. J. Groenen. Modern Multidimensional Scaling: Theory and Applications. Springer, New York, NY, 2005.
[5] L. S. Caves, J. D. Evanseck, and M. Karplus. Locally accessible conformations of proteins: Multiple molecular dynamics simulations of crambin. Protein Sci., 7.
[6] J. Cupal, C. Flamm, A. Renner, and P. F. Stadler. Density of states, metastable states, and saddle points: exploring the energy landscape of an RNA molecule. In Proc. Int. Conf. Intelligent Systems for Molecular Biology (ISMB), pages 88-91.
[7] P. Das, M. Moll, H. Stamati, L. E. Kavraki, and C. Clementi. Low-dimensional, free-energy landscapes of protein-folding reactions by nonlinear dimensionality reduction. Proc. Natl. Acad. Sci. USA, 103(26), 2006.
[8] B. de Groot, A. Amadei, R. Scheek, N. van Nuland, and H. Berendsen. An extended sampling of the configurational space of HPr from E. coli. Proteins Struct. Funct. Genet., 26.
[9] B. de Groot, A. Amadei, D. van Aalten, and H. Berendsen. Toward an exhaustive sampling of the configurational spaces of the two forms of the peptide hormone guanylin. J. Biomol. Struct. Dyn., 13.
[10] K. A. Dill and H. S. Chan. From Levinthal to pathways to funnels. Nat. Struct. Biol., 4:10-19.
[11] P. R. Fiorentin, C. A. L. Bailer-Jones, Y. S. Lee, T. C. Beers, T. Sivarani, R. Wilhelm, C. A. Prieto, and J. E. Norris. Estimation of stellar atmospheric parameters from SDSS/SEGUE spectra. Astronomy & Astrophysics, 467, 2007.
[12] A. E. García. Large-amplitude nonlinear motions in proteins. Physical Review Letters, 68(17).
[13] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2001.
[14] S. Hayward, A. Kitao, and H. J. Berendsen. Model-free methods of analyzing domain motions in proteins from simulation: A comparison of normal mode analysis and molecular dynamics simulation of lysozyme. Proteins Struct. Funct. Genet., 27.
[15] S. Hayward, A. Kitao, and N. Gō. Harmonic and anharmonic aspects in the dynamics of BPTI: A normal mode analysis and principal component analysis. Protein Sci., 3.
[16] I. L. Hofacker. RNA secondary structures: A tractable model of biopolymer folding. J. Theor. Biol., 212:35-46.
[17] D. Jacobs. Generic rigidity in three-dimensional bond-bending networks. J. Phys. A: Math. Gen., 31.
[18] D. Jacobs and M. Thorpe. Generic rigidity percolation: The pebble game. Phys. Rev. Lett., 75(22).
[19] D. Jacobs and M. Thorpe. Generic rigidity percolation in two dimensions. Phys. Rev. E, 53(4).

[20] D. J. Jacobs and B. Hendrickson. An algorithm for two-dimensional rigidity percolation: The pebble game. J. Comp. Phys., 137.
[21] I. T. Jolliffe. Principal Component Analysis. Springer-Verlag.
[22] L. E. Kavraki, P. Švestka, J. C. Latombe, and M. H. Overmars. Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Trans. Robot. Automat., 12(4):566-580, August.
[23] A. Kitao and N. Gō. Investigating protein dynamics in collective coordinate space. Curr. Op. Str. Biol., 9.
[24] P. Klaff, D. Riesner, and G. Steger. RNA structure and the regulation of gene expression. Plant Mol. Biol., 32:89-106.
[25] M. B. Kubitzki and B. L. de Groot. Molecular dynamics simulations using temperature-enhanced essential dynamics replica exchange. Biophys. J., 92, 2007.
[26] A. Lee and I. Streinu. Pebble game algorithms and sparse graphs. European Conference on Combinatorics, Graph Theory and Applications, 2005.
[27] J. A. Lee and M. Verleysen. Nonlinear Dimensionality Reduction. Springer, New York, NY, 2007.
[28] M. Levitt. Protein folding by restrained energy minimization and molecular dynamics. J. Mol. Biol., 170.
[29] M. Levitt. Real-time interactive frequency filtering of molecular dynamics trajectories. J. Mol. Biol., 220:1-4.
[30] R. M. Levy and M. Karplus. Vibrational approach to the dynamics of an α-helix. Biopoly., 18.
[31] R. Li and C. Woodward. The hydrogen exchange core and protein folding. Protein Sci., 8(8).
[32] L. Lieu and N. Saito. Automated shapes discrimination in high dimensions. Proc. of SPIE, 671 Wavelets XII (6711W), 2007.
[33] S. Nauli, B. Kuhlman, and D. Baker. Computer-based redesign of a protein folding pathway. Nature Struct. Biol., 8(7):602-605, 2001.
[34] S. B. Nolde, A. S. Arseniev, V. Y. Orekhov, and M. Billeter. Essential domain motions in barnase revealed by MD simulations. Proteins Struct. Funct. Genet., 46:250-258, 2002.
[35] E. Plaku, H. Stamati, C. Clementi, and L. E. Kavraki. Fast and reliable analysis of molecular motion using proximity relations and dimensionality reduction. Proteins: Structure, Function, and Bioinformatics, 67(4):897-907, 2007.
[36] T. Romo, J. Clarage, D. Sorensen, and G. Phillips, Jr. Automatic identification of discrete substates in proteins: Singular value decomposition analysis of time-averaged crystallographic refinements. Proteins Struct. Funct. Genet., 22.
[37] G. Song. A Motion Planning Approach to Protein Folding. Ph.D. dissertation, Dept. of Computer Science, Texas A&M University, December 2004.
[38] G. Song, S. Thomas, K. Dill, J. Scholtz, and N. Amato. A path planning-based study of protein folding with a case study of hairpin formation in protein G and L. In Proc. Pacific Symposium of Biocomputing (PSB), pages 240-251, 2003.
[39] M. J. Sternberg. Protein Structure Prediction. IRL Press at Oxford University Press.
[40] X. Tang, B. Kirkpatrick, S. Thomas, G. Song, and N. M. Amato. Using motion planning to study RNA folding kinetics. J. Comput. Biol., 12(6), 2005. Special issue of Int. Conf. Comput. Molecular Biology (RECOMB) 2004.
[41] X. Tang, S. Thomas, L. Tapia, and N. M. Amato. Tools for simulating and analyzing RNA folding kinetics. In Proc. Int. Conf. Comput. Molecular Biology (RECOMB).

[42] X. Tang, S. Thomas, L. Tapia, D. P. Giedroc, and N. M. Amato. Simulating RNA folding kinetics on approximated energy landscapes. J. Mol. Biol., 2008. doi: 10.1016/j.jmb
[43] L. Tapia, X. Tang, S. Thomas, and N. M. Amato. Kinetics analysis methods for approximate folding landscapes. Bioinformatics, 23(13), 2007. Special issue of Int. Conf. on Intelligent Systems for Molecular Biology (ISMB) & European Conf. on Computational Biology (ECCB) 2007.
[44] J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290, 2000.
[45] M. Teodoro. Molecular conformational sampling using collective coordinate expansive spaces. Master's thesis, Dept. of Computer Science, Rice University, 2003.
[46] M. L. Teodoro, G. N. Phillips, Jr., and L. E. Kavraki. A dimensionality reduction approach to modeling protein flexibility. In Proc. Int. Conf. Comput. Molecular Biology (RECOMB), 2002.
[47] M. L. Teodoro, G. N. Phillips, Jr., and L. E. Kavraki. Understanding protein flexibility through dimensionality reduction. J. of Computational Biology, 10(3-4), 2003.
[48] S. Thomas, X. Tang, L. Tapia, and N. M. Amato. Simulating protein motions with rigidity analysis. J. Comput. Biol., 14(6), 2007. Special issue of Int. Conf. Comput. Molecular Biology (RECOMB) 2006.
[49] I. Tinoco and C. Bustamante. How RNA folds. J. Mol. Biol., 293.
[50] M. Turk and A. P. Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71-86.
[51] E. B. Wilson, J. Decius, and P. C. Cross. Molecular Vibrations: The Theory of Infrared and Raman Vibrational Spectra. McGraw-Hill, Dover, 1980.
[52] M. Wish and J. D. Carroll. Multidimensional scaling and its applications. In P. Krishnaiah and L. Kanal, editors, Handbook of Statistics 2: Classification, Pattern Recognition and Reduction of Dimensionality, chapter 14. North-Holland, Amsterdam, The Netherlands.
[53] S. Wuchty. Suboptimal secondary structures of RNA. Master's thesis, University of Vienna, Austria, March.
[54] L. Zhou and S. A. Siegelbaum. Effects of surface water on protein dynamics studied by a novel coarse-grained normal mode approach. Biophys. J., 94, 2008.
[55] M. Zuker, D. H. Mathews, and D. H. Turner. Algorithms and thermodynamics for RNA secondary structure prediction: A practical guide. In J. Barciszewski and B. F. C. Clark, editors, RNA Biochemistry and Biotechnology, NATO ASI Series. Kluwer Academic Publishers.
[56] M. Zuker and D. Sankoff. RNA secondary structure and their prediction. Bulletin of Mathematical Biology, 46.


More information

Protein folding. Today s Outline

Protein folding. Today s Outline Protein folding Today s Outline Review of previous sessions Thermodynamics of folding and unfolding Determinants of folding Techniques for measuring folding The folding process The folding problem: Prediction

More information

Lecture 18 Generalized Belief Propagation and Free Energy Approximations

Lecture 18 Generalized Belief Propagation and Free Energy Approximations Lecture 18, Generalized Belief Propagation and Free Energy Approximations 1 Lecture 18 Generalized Belief Propagation and Free Energy Approximations In this lecture we talked about graphical models and

More information

Number sequence representation of protein structures based on the second derivative of a folded tetrahedron sequence

Number sequence representation of protein structures based on the second derivative of a folded tetrahedron sequence Number sequence representation of protein structures based on the second derivative of a folded tetrahedron sequence Naoto Morikawa (nmorika@genocript.com) October 7, 2006. Abstract A protein is a sequence

More information

Understanding Protein Flexibility Through. Dimensionality Reduction

Understanding Protein Flexibility Through. Dimensionality Reduction Understanding Protein Flexibility Through Dimensionality Reduction Miguel L. Teodoro mteodoro@rice.edu tel: 713-348-3051 Department of Biochemistry and Cell Biology and Department of Computer Science Rice

More information

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison CMPS 6630: Introduction to Computational Biology and Bioinformatics Structure Comparison Protein Structure Comparison Motivation Understand sequence and structure variability Understand Domain architecture

More information

Molecular Modelling. part of Bioinformatik von RNA- und Proteinstrukturen. Sonja Prohaska. Leipzig, SS Computational EvoDevo University Leipzig

Molecular Modelling. part of Bioinformatik von RNA- und Proteinstrukturen. Sonja Prohaska. Leipzig, SS Computational EvoDevo University Leipzig part of Bioinformatik von RNA- und Proteinstrukturen Computational EvoDevo University Leipzig Leipzig, SS 2011 Protein Structure levels or organization Primary structure: sequence of amino acids (from

More information

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche The molecular structure of a protein can be broken down hierarchically. The primary structure of a protein is simply its

More information

Principal Component Analysis (PCA)

Principal Component Analysis (PCA) Principal Component Analysis (PCA) Additional reading can be found from non-assessed exercises (week 8) in this course unit teaching page. Textbooks: Sect. 6.3 in [1] and Ch. 12 in [2] Outline Introduction

More information

The protein folding problem consists of two parts:

The protein folding problem consists of two parts: Energetics and kinetics of protein folding The protein folding problem consists of two parts: 1)Creating a stable, well-defined structure that is significantly more stable than all other possible structures.

More information

Thermodynamics. Entropy and its Applications. Lecture 11. NC State University

Thermodynamics. Entropy and its Applications. Lecture 11. NC State University Thermodynamics Entropy and its Applications Lecture 11 NC State University System and surroundings Up to this point we have considered the system, but we have not concerned ourselves with the relationship

More information

Connection of Local Linear Embedding, ISOMAP, and Kernel Principal Component Analysis

Connection of Local Linear Embedding, ISOMAP, and Kernel Principal Component Analysis Connection of Local Linear Embedding, ISOMAP, and Kernel Principal Component Analysis Alvina Goh Vision Reading Group 13 October 2005 Connection of Local Linear Embedding, ISOMAP, and Kernel Principal

More information

Alpha-helical Topology and Tertiary Structure Prediction of Globular Proteins Scott R. McAllister Christodoulos A. Floudas Princeton University

Alpha-helical Topology and Tertiary Structure Prediction of Globular Proteins Scott R. McAllister Christodoulos A. Floudas Princeton University Alpha-helical Topology and Tertiary Structure Prediction of Globular Proteins Scott R. McAllister Christodoulos A. Floudas Princeton University Department of Chemical Engineering Program of Applied and

More information

Introduction to Evolutionary Concepts

Introduction to Evolutionary Concepts Introduction to Evolutionary Concepts and VMD/MultiSeq - Part I Zaida (Zan) Luthey-Schulten Dept. Chemistry, Beckman Institute, Biophysics, Institute of Genomics Biology, & Physics NIH Workshop 2009 VMD/MultiSeq

More information

What is Principal Component Analysis?

What is Principal Component Analysis? What is Principal Component Analysis? Principal component analysis (PCA) Reduce the dimensionality of a data set by finding a new set of variables, smaller than the original set of variables Retains most

More information

CHAPTER 4 PRINCIPAL COMPONENT ANALYSIS-BASED FUSION

CHAPTER 4 PRINCIPAL COMPONENT ANALYSIS-BASED FUSION 59 CHAPTER 4 PRINCIPAL COMPONENT ANALYSIS-BASED FUSION 4. INTRODUCTION Weighted average-based fusion algorithms are one of the widely used fusion methods for multi-sensor data integration. These methods

More information

Machine Learning (BSMC-GA 4439) Wenke Liu

Machine Learning (BSMC-GA 4439) Wenke Liu Machine Learning (BSMC-GA 4439) Wenke Liu 02-01-2018 Biomedical data are usually high-dimensional Number of samples (n) is relatively small whereas number of features (p) can be large Sometimes p>>n Problems

More information

Apprentissage non supervisée

Apprentissage non supervisée Apprentissage non supervisée Cours 3 Higher dimensions Jairo Cugliari Master ECD 2015-2016 From low to high dimension Density estimation Histograms and KDE Calibration can be done automacally But! Let

More information

Introduction to" Protein Structure

Introduction to Protein Structure Introduction to" Protein Structure Function, evolution & experimental methods Thomas Blicher, Center for Biological Sequence Analysis Learning Objectives Outline the basic levels of protein structure.

More information

Effects of Gap Open and Gap Extension Penalties

Effects of Gap Open and Gap Extension Penalties Brigham Young University BYU ScholarsArchive All Faculty Publications 200-10-01 Effects of Gap Open and Gap Extension Penalties Hyrum Carroll hyrumcarroll@gmail.com Mark J. Clement clement@cs.byu.edu See

More information

SPECTRAL CLUSTERING AND KERNEL PRINCIPAL COMPONENT ANALYSIS ARE PURSUING GOOD PROJECTIONS

SPECTRAL CLUSTERING AND KERNEL PRINCIPAL COMPONENT ANALYSIS ARE PURSUING GOOD PROJECTIONS SPECTRAL CLUSTERING AND KERNEL PRINCIPAL COMPONENT ANALYSIS ARE PURSUING GOOD PROJECTIONS VIKAS CHANDRAKANT RAYKAR DECEMBER 5, 24 Abstract. We interpret spectral clustering algorithms in the light of unsupervised

More information

Discriminant Uncorrelated Neighborhood Preserving Projections

Discriminant Uncorrelated Neighborhood Preserving Projections Journal of Information & Computational Science 8: 14 (2011) 3019 3026 Available at http://www.joics.com Discriminant Uncorrelated Neighborhood Preserving Projections Guoqiang WANG a,, Weijuan ZHANG a,

More information

Protein Folding. I. Characteristics of proteins. C α

Protein Folding. I. Characteristics of proteins. C α I. Characteristics of proteins Protein Folding 1. Proteins are one of the most important molecules of life. They perform numerous functions, from storing oxygen in tissues or transporting it in a blood

More information

On the Symmetric Molecular Conjectures

On the Symmetric Molecular Conjectures On the Symmetric Molecular Conjectures Josep M. Porta, Lluis Ros, Bernd Schulze, Adnan Sljoka, and Walter Whiteley Abstract A molecular linkage consists of a set of rigid bodies pairwise connected by revolute

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning Christoph Lampert Spring Semester 2015/2016 // Lecture 12 1 / 36 Unsupervised Learning Dimensionality Reduction 2 / 36 Dimensionality Reduction Given: data X = {x 1,..., x

More information

PRM Roadmap Query. PRM Roadmap after Node Generation. goal. C obst. C obst. C obst. C obst. C obst. C obst. C obst. C obst. C Space. start.

PRM Roadmap Query. PRM Roadmap after Node Generation. goal. C obst. C obst. C obst. C obst. C obst. C obst. C obst. C obst. C Space. start. Using Motion Planning to Map Protein Folding Landscapes and Analyze Folding Kinetics of Known Native Structures 1 Nancy M. Amato amato@cs.tamu.edu Guang Song gsong@cs.tamu.edu Technical Report TR1-1 PARASOL

More information

Stochastic Conformational Roadmaps for Computing Ensemble Properties of Molecular Motion

Stochastic Conformational Roadmaps for Computing Ensemble Properties of Molecular Motion Stochastic Conformational Roadmaps for Computing Ensemble Properties of Molecular Motion Mehmet Serkan Apaydın, Douglas L. Brutlag, Carlos Guestrin, David Hsu, and Jean-Claude Latombe Stanford University,

More information

Protein Dynamics. The space-filling structures of myoglobin and hemoglobin show that there are no pathways for O 2 to reach the heme iron.

Protein Dynamics. The space-filling structures of myoglobin and hemoglobin show that there are no pathways for O 2 to reach the heme iron. Protein Dynamics The space-filling structures of myoglobin and hemoglobin show that there are no pathways for O 2 to reach the heme iron. Below is myoglobin hydrated with 350 water molecules. Only a small

More information

Simulating Folding of Helical Proteins with Coarse Grained Models

Simulating Folding of Helical Proteins with Coarse Grained Models 366 Progress of Theoretical Physics Supplement No. 138, 2000 Simulating Folding of Helical Proteins with Coarse Grained Models Shoji Takada Department of Chemistry, Kobe University, Kobe 657-8501, Japan

More information

The prediction of membrane protein types with NPE

The prediction of membrane protein types with NPE The prediction of membrane protein types with NPE Lipeng Wang 1a), Zhanting Yuan 1, Xuhui Chen 1, and Zhifang Zhou 2 1 College of Electrical and Information Engineering Lanzhou University of Technology,

More information

Multiple Similarities Based Kernel Subspace Learning for Image Classification

Multiple Similarities Based Kernel Subspace Learning for Image Classification Multiple Similarities Based Kernel Subspace Learning for Image Classification Wang Yan, Qingshan Liu, Hanqing Lu, and Songde Ma National Laboratory of Pattern Recognition, Institute of Automation, Chinese

More information

Machine Learning. B. Unsupervised Learning B.2 Dimensionality Reduction. Lars Schmidt-Thieme, Nicolas Schilling

Machine Learning. B. Unsupervised Learning B.2 Dimensionality Reduction. Lars Schmidt-Thieme, Nicolas Schilling Machine Learning B. Unsupervised Learning B.2 Dimensionality Reduction Lars Schmidt-Thieme, Nicolas Schilling Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University

More information

Learning Eigenfunctions: Links with Spectral Clustering and Kernel PCA

Learning Eigenfunctions: Links with Spectral Clustering and Kernel PCA Learning Eigenfunctions: Links with Spectral Clustering and Kernel PCA Yoshua Bengio Pascal Vincent Jean-François Paiement University of Montreal April 2, Snowbird Learning 2003 Learning Modal Structures

More information

On Improving the k-means Algorithm to Classify Unclassified Patterns

On Improving the k-means Algorithm to Classify Unclassified Patterns On Improving the k-means Algorithm to Classify Unclassified Patterns Mohamed M. Rizk 1, Safar Mohamed Safar Alghamdi 2 1 Mathematics & Statistics Department, Faculty of Science, Taif University, Taif,

More information

Master equation approach to finding the rate-limiting steps in biopolymer folding

Master equation approach to finding the rate-limiting steps in biopolymer folding JOURNAL OF CHEMICAL PHYSICS VOLUME 118, NUMBER 7 15 FEBRUARY 2003 Master equation approach to finding the rate-limiting steps in biopolymer folding Wenbing Zhang and Shi-Jie Chen a) Department of Physics

More information

98 Algorithms in Bioinformatics I, WS 06, ZBIT, D. Huson, December 6, 2006

98 Algorithms in Bioinformatics I, WS 06, ZBIT, D. Huson, December 6, 2006 98 Algorithms in Bioinformatics I, WS 06, ZBIT, D. Huson, December 6, 2006 8.3.1 Simple energy minimization Maximizing the number of base pairs as described above does not lead to good structure predictions.

More information

Design of a Novel Globular Protein Fold with Atomic-Level Accuracy

Design of a Novel Globular Protein Fold with Atomic-Level Accuracy Design of a Novel Globular Protein Fold with Atomic-Level Accuracy Brian Kuhlman, Gautam Dantas, Gregory C. Ireton, Gabriele Varani, Barry L. Stoddard, David Baker Presented by Kate Stafford 4 May 05 Protein

More information

A Principled Comparative Analysis of Dimensionality Reduction Techniques on Protein Structure Decoy Data

A Principled Comparative Analysis of Dimensionality Reduction Techniques on Protein Structure Decoy Data A Principled Comparative Analysis of Dimensionality Reduction Techniques on Protein Structure Decoy Data Rohan Pandit 1 Amarda Shehu 2,3,4, 1 Thomas Jefferson High School, Alexandria, VA 2 Dept. of Computer

More information

Free Radical-Initiated Unfolding of Peptide Secondary Structure Elements

Free Radical-Initiated Unfolding of Peptide Secondary Structure Elements Free Radical-Initiated Unfolding of Peptide Secondary Structure Elements Thesis of the Ph.D. Dissertation by Michael C. Owen, M.Sc. Department of Chemical Informatics Faculty of Education University of

More information

Protein Structure Determination from Pseudocontact Shifts Using ROSETTA

Protein Structure Determination from Pseudocontact Shifts Using ROSETTA Supporting Information Protein Structure Determination from Pseudocontact Shifts Using ROSETTA Christophe Schmitz, Robert Vernon, Gottfried Otting, David Baker and Thomas Huber Table S0. Biological Magnetic

More information

Quantitative Stability/Flexibility Relationships; Donald J. Jacobs, University of North Carolina at Charlotte Page 1 of 12

Quantitative Stability/Flexibility Relationships; Donald J. Jacobs, University of North Carolina at Charlotte Page 1 of 12 Quantitative Stability/Flexibility Relationships; Donald J. Jacobs, University of North Carolina at Charlotte Page 1 of 12 The figure shows that the DCM when applied to the helix-coil transition, and solved

More information

Polypeptide Folding Using Monte Carlo Sampling, Concerted Rotation, and Continuum Solvation

Polypeptide Folding Using Monte Carlo Sampling, Concerted Rotation, and Continuum Solvation Polypeptide Folding Using Monte Carlo Sampling, Concerted Rotation, and Continuum Solvation Jakob P. Ulmschneider and William L. Jorgensen J.A.C.S. 2004, 126, 1849-1857 Presented by Laura L. Thomas and

More information

F. Piazza Center for Molecular Biophysics and University of Orléans, France. Selected topic in Physical Biology. Lecture 1

F. Piazza Center for Molecular Biophysics and University of Orléans, France. Selected topic in Physical Biology. Lecture 1 Zhou Pei-Yuan Centre for Applied Mathematics, Tsinghua University November 2013 F. Piazza Center for Molecular Biophysics and University of Orléans, France Selected topic in Physical Biology Lecture 1

More information

Local Interactions Dominate Folding in a Simple Protein Model

Local Interactions Dominate Folding in a Simple Protein Model J. Mol. Biol. (1996) 259, 988 994 Local Interactions Dominate Folding in a Simple Protein Model Ron Unger 1,2 * and John Moult 2 1 Department of Life Sciences Bar-Ilan University Ramat-Gan, 52900, Israel

More information

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Tertiary Structure Prediction

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Tertiary Structure Prediction CMPS 6630: Introduction to Computational Biology and Bioinformatics Tertiary Structure Prediction Tertiary Structure Prediction Why Should Tertiary Structure Prediction Be Possible? Molecules obey the

More information

Low-Dimensional Free Energy Landscapes of Protein Folding Reactions by Nonlinear Dimensionality Reduction

Low-Dimensional Free Energy Landscapes of Protein Folding Reactions by Nonlinear Dimensionality Reduction Appeared in PNAS 103(26):9885-9890, 2006 Low-Dimensional Free Energy Landscapes of Protein Folding Reactions by Nonlinear Dimensionality Reduction Payel Das 1, Mark Moll 2, Hernan Stamati 2, Lydia E. Kavraki

More information

Lecture 10: Dimension Reduction Techniques

Lecture 10: Dimension Reduction Techniques Lecture 10: Dimension Reduction Techniques Radu Balan Department of Mathematics, AMSC, CSCAMM and NWC University of Maryland, College Park, MD April 17, 2018 Input Data It is assumed that there is a set

More information

Multi-Scale Hierarchical Structure Prediction of Helical Transmembrane Proteins

Multi-Scale Hierarchical Structure Prediction of Helical Transmembrane Proteins Multi-Scale Hierarchical Structure Prediction of Helical Transmembrane Proteins Zhong Chen Dept. of Biochemistry and Molecular Biology University of Georgia, Athens, GA 30602 Email: zc@csbl.bmb.uga.edu

More information

Geometrical Concept-reduction in conformational space.and his Φ-ψ Map. G. N. Ramachandran

Geometrical Concept-reduction in conformational space.and his Φ-ψ Map. G. N. Ramachandran Geometrical Concept-reduction in conformational space.and his Φ-ψ Map G. N. Ramachandran Communication paths in trna-synthetase: Insights from protein structure networks and MD simulations Saraswathi Vishveshwara

More information

A Modified Incremental Principal Component Analysis for On-line Learning of Feature Space and Classifier

A Modified Incremental Principal Component Analysis for On-line Learning of Feature Space and Classifier A Modified Incremental Principal Component Analysis for On-line Learning of Feature Space and Classifier Seiichi Ozawa, Shaoning Pang, and Nikola Kasabov Graduate School of Science and Technology, Kobe

More information

BIOINFORMATICS. Markov Dynamic Models for Long-Timescale Protein Motion

BIOINFORMATICS. Markov Dynamic Models for Long-Timescale Protein Motion BIOINFORMATICS Vol. 00 no. 00 2010 Pages 1 9 Markov Dynamic Models for Long-Timescale Protein Motion Tsung-Han Chiang 1, David Hsu 1 and Jean-Claude Latombe 2 1 Department of Computer Science, National

More information

Conformational Geometry of Peptides and Proteins:

Conformational Geometry of Peptides and Proteins: Conformational Geometry of Peptides and Proteins: Before discussing secondary structure, it is important to appreciate the conformational plasticity of proteins. Each residue in a polypeptide has three

More information

Expectation Maximization

Expectation Maximization Expectation Maximization Machine Learning CSE546 Carlos Guestrin University of Washington November 13, 2014 1 E.M.: The General Case E.M. widely used beyond mixtures of Gaussians The recipe is the same

More information