6 Similarities between principal components of protein dynamics and random diffusion


B. Hess, Phys. Rev. E 62:8438-8448 (2000)

Principal component analysis, also called essential dynamics, is a powerful tool for finding global, correlated motions in atomic simulations of macromolecules. It has become an established technique for analyzing molecular dynamics simulations of proteins. The first few principal components of simulations of large proteins often resemble cosines. We derive the principal components for high-dimensional random diffusion, which are almost perfect cosines. This resemblance between protein simulations and noise implies that for many proteins the time scales of current simulations are too short to obtain convergence of collective motions.

6.1 Introduction

This chapter focuses on the use of covariance analysis as a tool to gain information about the conformational freedom of proteins. Knowledge about the conformational freedom of a protein can give insight into the stability and functional motions of the protein. Conformational freedom is related to the potential energy: in equilibrium the distribution of conformations can, in principle, be calculated from the potential. In practice this is not possible, since proteins are too complex, not only because of their high dimensionality, but also because the energy landscape usually consists of many minima. Normal mode analysis can be used to analyze any of these potential energy minima, but it does not take into account the entropy due to the occupation of several minima, which plays an important role at room temperature. For reviews on normal mode and principal component analysis applied to proteins see [55, 56]. The first application of principal component analysis to macromolecules was performed to estimate the configurational entropy [57]. More recently the application of principal component analysis to proteins has been termed molecule optimal dynamic coordinates [58, 59] and essential dynamics [60]. It has now become a standard technique for analyzing molecular dynamics trajectories. However, the effects of insufficient sampling on the results are not well understood.

Covariance analysis, also called principal component analysis, is a mathematical technique for analyzing high-dimensional data sets. Essentially, it defines a new coordinate system for the data set, with the special property that the covariance is zero for any two coordinates. In this sense these new coordinates can be called uncorrelated. The coordinates are then ordered according to the variance of the data along each coordinate. This can allow for a reduction of the dimensionality of the space by neglecting the coordinates with small variance, thus concentrating on the coordinates with larger spread or fluctuations.

The procedure for N-dimensional data x(t) is as follows. First the covariance matrix has to be constructed. The covariance C_ij of coordinate i and coordinate j is defined as:

    C_ij = \overline{(x_i - \overline{x}_i)(x_j - \overline{x}_j)}

where an overline denotes the average over all data points. The data can consist of a finite number of N-dimensional points, in which case the average is a summation; x(t) can also be an N-dimensional function, in which case the average is an integral. The symmetric N x N matrix C can be diagonalized with an orthonormal transformation matrix R:

    R^T C R = diag(λ_1, λ_2, ..., λ_N)

The columns of R are the eigenvectors or principal modes. The eigenvalues λ_i are equal to the variance in the direction of the corresponding eigenvector; they can be ordered such that λ_1 >= λ_2 >= ... >= λ_N. The original data can be projected on the eigenvectors to give the principal components p_i(t), i = 1, ..., N:

    p(t) = R^T (x(t) - \overline{x})
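This procedure maps directly onto a few lines of linear algebra. The following minimal sketch in Python/NumPy is not part of the original chapter (all names are ours); it builds the covariance matrix of a data set of n frames in N dimensions, diagonalizes it, and projects the data onto the principal modes:

```python
import numpy as np

def covariance_analysis(x):
    """Covariance (principal component) analysis of an (n_frames, N) array x:
    build C, diagonalize it, and project the data onto the eigenvectors."""
    xm = x - x.mean(axis=0)           # subtract the average structure
    C = xm.T @ xm / x.shape[0]        # C_ij = mean of (x_i - <x_i>)(x_j - <x_j>)
    lam, R = np.linalg.eigh(C)        # C is symmetric: real eigenvalues, orthonormal R
    order = np.argsort(lam)[::-1]     # sort so that lambda_1 >= lambda_2 >= ...
    lam, R = lam[order], R[:, order]
    p = xm @ R                        # principal components p = R^T (x - <x>)
    return lam, R, p
```

For a molecular system obeying Newton's laws, each coordinate would first be multiplied by the square root of its mass, as discussed in the next paragraph.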

When the data are the result of a dynamic process, the principal components are a function of time, and p_1(t) will be the principal component with the largest mean square fluctuation. Note that if the system obeys Newton's laws, every coordinate has to be weighted with the square root of its mass to obtain physically relevant dynamic principal components. For a system moving in a (quasi-)harmonic potential, the mass-weighted principal modes are similar to the normal modes, which are the eigenvectors of the mass-weighted Hessian of the potential function in the energy minimum. The advantage of normal mode analysis is that it depends only on the shape of the potential energy surface; however, its use is restricted to systems for which the motion in a single potential well is a relevant representation of the dynamics. Covariance analysis can be applied to any dynamic system, but the results will also depend on the sampling. The problem is how to separate intrinsic properties of the system from sampling artifacts. As was shown by Linssen [61], the principal components of multidimensional random diffusion are cosine shaped, with amplitudes inversely proportional to the eigenvalue ranking number. This behavior is very similar to that observed in simulations of proteins. In order to interpret protein data correctly, it is important that the behavior of random diffusion is properly understood. In the next section we will perform a theoretical analysis of random diffusion. The results for random diffusion depend only on the sampling, because there is no potential.

6.2 Analysis of random diffusion

N-dimensional diffusion is described by a system of N independent stochastic differential equations:

    dx_i(t)/dt = η_i(t),   i = 1, ..., N    (6.1)

η_i(t) is a stationary stochastic noise term, which is δ-correlated:

    ⟨η_i(t)⟩ = 0    (6.2)
    ⟨η_i(t) η_j(t+τ)⟩ = 2D δ_ij δ(τ)    (6.3)

where ⟨ ⟩ denotes the expectation, δ_ij is the Kronecker delta and δ(τ) is the Dirac delta function. The solution of this system is a set of N non-stationary stochastic functions x_i(t), which represents a Wiener process [62]. The displacements x_i(t+τ) - x_i(t) are Gaussian distributed with:

    ⟨x_i(t+τ) - x_i(t)⟩ = 0    (6.4)
    ⟨(x_i(t+τ) - x_i(t))(x_j(t+τ) - x_j(t))⟩ = 2Dτ δ_ij    (6.5)
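As a quick numerical illustration of equations (6.4) and (6.5) — our own sketch, with arbitrarily chosen parameters — a discretized realization of this process is a random walk whose displacement variance grows as 2Dτ:

```python
import numpy as np

rng = np.random.default_rng(0)
N, nsteps, dt, D = 10, 100_000, 1e-3, 0.5

# Integrate dx_i/dt = eta_i(t): each discrete step is Gaussian with variance 2*D*dt.
steps = rng.normal(0.0, np.sqrt(2 * D * dt), size=(nsteps, N))
x = np.cumsum(steps, axis=0)

# Check <(x_i(t+tau) - x_i(t))^2> = 2*D*tau for a lag of 100 steps.
lag = 100
dx = x[lag:] - x[:-lag]
print(dx.var(axis=0).mean(), 2 * D * lag * dt)   # both close to 0.1
```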

We will use this continuum description for the analysis of random diffusion. Realizations can be generated only for a finite number of time points. This is not a problem, since the results of the covariance analysis will be virtually identical when we have more than N time points. When the number of time points n is smaller than N, the results for the first n principal components will be the same; the last principal components will all be zero.

We will now prove that the first few principal components of the randomly diffusing system (6.1)-(6.3) are cosines, by expanding the solution x(t) in an appropriate series of functions. We are interested in the eigenvalues and eigenvectors of the covariance matrix C and in the functional shape of the principal components:

    C_ij = \overline{(x_i(t) - \overline{x}_i)(x_j(t) - \overline{x}_j)}    (6.6)
    p_k(t) = e_k · (x(t) - \overline{x})    (6.7)

where e_k is eigenvector k and \overline{x}_i is the average of x_i, the overline now denoting the time average (1/T) ∫_0^T ... dt over the analysis interval [0, T]:

    \overline{x}_i = \overline{x_i(t)}    (6.8)

The eigenvectors are ordered according to descending eigenvalue. The first eigenvector e_1 is the vector that produces the linear combination of the x_i with the largest mean square fluctuation. Thus e_1 is the vector with ||e|| = 1 which maximizes:

    λ_1 = max_{||e||=1} \overline{(e · (x(t) - \overline{x}))^2}    (6.9)

We can write x(t) - \overline{x} as a sum of a series of orthonormal functions f_k:

    λ_1 = max_{||e||=1} \overline{(e · Σ_k c^k f_k(t))^2}    (6.10)
    \overline{f_k(t) f_l(t)} = δ_kl    (6.11)
    c_i^k = \overline{f_k(t)(x_i(t) - \overline{x}_i)}    (6.12)

We can choose the f_k such that f_k(t) is proportional to the projection on eigenvector k: f_k(t) ∝ e_k · (x(t) - \overline{x}); thus f_k contributes only to principal component k. f_1(t) is then the function that maximizes the variance:

    λ_1 = max_{||e||=1} max_{f_1} (e · c^1)^2 = max_{f_1} max_{||e||=1} (e · c^1)^2    (6.13)

It can easily be derived that e_1 is proportional to c^1:

    e_1 = c^1 / sqrt(c^1 · c^1)    (6.14)

When N is 1, f_1(t) is proportional to x_1(t) - \overline{x}_1. As N gets larger, the set of x_i's forms a better representation of the full ensemble. For large N we can approximate the maximum with an ensemble average:

    λ_1 = max_{f_1} (c^1 · c^1) ≈ max_{f_1} ⟨c^1 · c^1⟩ = max_{f_1} N ⟨(c_i^1)^2⟩ = λ_1^*    (6.15)

The integral over f_k is zero:

    \overline{f_k(t)} = 0,   since f_k(t) ∝ e_k · (x(t) - \overline{x})    (6.16)

For large N we can approximate f_1 by the function that maximizes λ_1^*; we will call this function f_1^*. Since the integral over f_1 is zero and we want to approximate f_1, we demand that the integral over f_1^* is zero as well. Using this we can calculate the expectation of (c_i^1)^2:

    ⟨(c_i^1)^2⟩ = ⟨(\overline{f_1^*(t) x_i(t)})^2⟩ = (2D/T^2) ∫_0^T ( ∫_t^T f_1^*(u) du )^2 dt    (6.17)

The full derivation is given in appendix 6.A. When we define g(t) as:

    g(t) = ∫_t^T f_1^*(u) du    (6.18)

we can rephrase the optimization problem in terms of g and add the constraints (6.11) and (6.16) using Lagrange multipliers μ_1 and μ_2. We have to find the function g which maximizes:

    ∫_0^T L(t; g, g') dt,   L(t; g, g') = (g(t))^2 - μ_1 g'(t) - μ_2 ((g'(t))^2 - 1)    (6.19)

under the constraints

    \overline{f_1^*(t)} = 0    (6.20)
    \overline{(f_1^*(t))^2} = 1    (6.21)

which, together with the definition of g, give the boundary conditions g(0) = g(T) = 0. According to the Euler-Lagrange formalism, the optimal g satisfies ∂L/∂g = d/dt (∂L/∂g'), which gives:

    g(t) = -μ_2 g''(t)    (6.22)

The solutions are:

    g_k(t) = C sin(πkt/T),   with k = 0, 1, 2, 3, ...    (6.23)

g_0 satisfies the boundary conditions only when C = 0, so we discard this function. By differentiating the g_k and normalizing using equation (6.21) we obtain a set of orthonormal functions h_k:

    h_k(t) = dg_k(t)/dt, normalized: h_k(t) = sqrt(2) cos(πkt/T),   with k = 1, 2, 3, ...    (6.24)

We have to find the h_k that maximizes λ_1^*:

    f_1^* = h_1   =>   λ_1^* = N ⟨(c_i^1)^2⟩ = 2NDT/π^2    (6.25)

Thus the best guess for the projection on the first eigenvector, f_1^*, is h_1. We can apply the same procedure for f_k^*, with the extra restriction that equation (6.11) should hold for all 1 <= l < k. The result is:

    f_k^* = h_k    (6.26)
    λ_k^* = N ⟨(c_i^k)^2⟩ = 2NDT/(π^2 k^2)    (6.27)

The derivation of λ_k^* is based on expectations; statistical fluctuations are not taken into account. Since the difference between λ_1^* and λ_2^* is a factor of 4, we expect that the projection on the first eigenvector is very close to a half cosine. But for large k the statistical fluctuations might cause mixing of the cosines.

The projections f_k^* suggest that the covariance matrix can be approximately diagonalized with a matrix of cosine coefficients. To test this approximation we have to write each x_i as a sum of cosines:

    x_i(t) - \overline{x}_i = Σ_k c_i^k f_k^*(t) = sqrt(2) Σ_k c_i^k cos(πkt/T)    (6.28)

where c_i^k is defined as:

    c_i^k = \overline{x_i(t) f_k^*(t)} = (sqrt(2)/T) ∫_0^T x_i(t) cos(πkt/T) dt    (6.29)

The cosine coefficients are zero on average and uncorrelated:

    ⟨c_i^k⟩ = 0,   where k = 1, 2, 3, ...    (6.30)
    ⟨c_i^k c_j^l⟩ = (2DT/(π^2 k l)) δ_ij δ_kl,   where k, l = 1, 2, 3, ...    (6.31)
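Equations (6.29)-(6.31) are easy to verify numerically. The sketch below is ours, not from the chapter (parameters chosen arbitrarily); it generates many independent diffusive coordinates and checks that the variance of the cosine coefficients follows 2DT/(π^2 k^2):

```python
import numpy as np

rng = np.random.default_rng(1)
nwalks, nsteps, D, T = 2000, 1000, 0.5, 1.0
dt = T / nsteps
t = (np.arange(nsteps) + 0.5) * dt

# Independent 1-D random walks; each plays the role of one coordinate x_i(t).
x = np.cumsum(rng.normal(0, np.sqrt(2 * D * dt), (nwalks, nsteps)), axis=1)

for k in (1, 2, 3):
    fk = np.sqrt(2.0) * np.cos(np.pi * k * t / T)       # f*_k(t), eq. (6.24)
    ck = (x * fk).mean(axis=1)                          # c_i^k, eq. (6.29)
    print(k, ck.var(), 2 * D * T / (np.pi * k) ** 2)    # eq. (6.31): 2DT/(pi^2 k^2)
```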

Full derivations are given in appendix 6.A. The covariance matrix can be expressed in terms of the cosine coefficients:

    C_ij = \overline{ (Σ_k c_i^k f_k^*(t)) (Σ_l c_j^l f_l^*(t)) }    (6.32)
         = Σ_k c_i^k c_j^k    (6.33)

Defining C^* as the truncation of this sum to the first N terms,

    C^*_ij = Σ_{k=1}^{N} c_i^k c_j^k    (6.34)
    C_ij = C^*_ij + Σ_{k>N} c_i^k c_j^k    (6.35)

From (6.31) we can see that on average C^*_ij is about N times larger than the second term in equation (6.35). We can write C^* as a matrix product which involves a diagonal matrix and a matrix X with independent stochastic elements of unit variance:

    C^*_ij = Σ_k X_ik Y_kk X_jk = (X Y X^T)_ij,   X_ik = (πk/sqrt(2DT)) c_i^k    (6.36)

where Y is a diagonal matrix with Y_kk = 2DT/(π^2 k^2). Recently Bai and Silverstein proved that for large N the eigenvalues of (1/N)C^* and Y have the same separation [63]. This means that when there are intervals that contain no eigenvalues, the numbers of eigenvalues in the intervals between such empty intervals are equal for both matrices. Since Y is a diagonal matrix with separated eigenvalues Y_kk = λ_k^*/N, the eigenvalues of C^* are, for large N, identical to λ_k^*. Because C^* is a good approximation of C, the eigenvalues of C are λ_k^*.

We want to prove that the projections on the first few eigenvectors of C resemble cosines. When this is the case, we can approximate the eigenvector matrix of C with a matrix of the first N normalized cosine coefficients for each coordinate. For i, k = 1, ..., N:

    R^*_ik = c_i^k / sqrt(c^k · c^k)    (6.37)
           ≈ c_i^k / sqrt(⟨c^k · c^k⟩)    (6.38)
           = (πk/sqrt(2NDT)) c_i^k    (6.39)
           = R_ik    (6.40)

For reasonably large N, R will be a good approximation of R^*. The use of R instead of R^* simplifies the analysis. On average the rows and columns of R are uncorrelated

and of unit norm:

    ⟨ Σ_{k=1}^N R_ik R_jk ⟩ = δ_ij    (6.41)
    ⟨ Σ_{i=1}^N R_ik R_il ⟩ = δ_kl    (6.42)

However, the columns are only approximately orthogonal; for k ≠ l the mean square inner product of two columns is 1/N. We can use the matrix R to transform the covariance matrix:

    B_ij = (R^T C R)_ij    (6.43)
         = Σ_{l=1}^N Σ_{m=1}^N R_li C_lm R_mj    (6.44)
         = Σ_{l,m} (πi/sqrt(2NDT)) c_l^i C_lm (πj/sqrt(2NDT)) c_m^j    (6.45)
         = (π^2 ij/(2NDT)) Σ_{l,m} c_l^i c_m^j C_lm    (6.46)
         = (π^2 ij/(2NDT)) Σ_{l,m} c_l^i c_m^j Σ_k c_l^k c_m^k    (6.47)
         = (π^2 ij/(2NDT)) Σ_k (c^i · c^k)(c^j · c^k)    (6.48)

Since the columns of R are not completely orthogonal, matrix B will not inherit all the properties of C. We hope that the matrix B is approximately diagonal. To check this we have to calculate the expectation of each matrix element. For the diagonal elements, the Gaussian statistics of the cosine coefficients give:

    ⟨B_ii⟩ = (π^2 i^2/(2NDT)) Σ_{l,m,k} ⟨ c_l^i c_m^i c_l^k c_m^k ⟩    (6.49)
           = (π^2 i^2/(2NDT)) [ N ⟨(c^i)^2⟩ Σ_{k≠i} ⟨(c^k)^2⟩ + (N^2 + 2N) ⟨(c^i)^2⟩^2 ]    (6.50-6.51)
           = (2DT/(π^2 i^2)) ( N + 1 + π^2 i^2/6 )    (6.52)
    ⟨B_ij⟩ = 0   for i ≠ j    (6.53)

where we have used the fourth moment of a Gaussian distribution for ⟨(c_l^i)^4⟩. The trace of B is twice as large as the trace of C. Comparing ⟨B_ii⟩ with λ_i^* shows that this difference is caused by the term π^2 i^2/6, which arises from the correlations between the columns of R; this term is negligible for i << sqrt(N). The expectation of the off-diagonal elements is zero, because every term contains at least one c with an odd power. For the same reason the cross correlation between elements of B is zero:

    ⟨B_ij B_kl⟩ = 0   for i ≠ j or k ≠ l    (6.54)

The off-diagonal elements are zero on average, but they might still be large. To leading order in N, the variance of the off-diagonal elements of B is:

    ⟨(B_ij)^2⟩ = (4ND^2T^2/π^4) (1/i^2 + 1/j^2)^2 + O(1)    (6.55)

The derivation is given in appendix 6.A. When i << sqrt(N), the off-diagonal elements are of order sqrt(N), which is a factor sqrt(N) smaller than the diagonal elements. Although there are N^2 off-diagonal elements, they are completely uncorrelated. The sum of N random elements is sqrt(N) larger than each element and also random. A cumulative effect can be achieved for one inner product of an eigenvector with a row of B, but for an eigenvalue of order N the inner products with all N rows would have to be N times larger than the eigenvector elements and of the correct sign. Since this is impossible, and since for i << sqrt(N) the eigenvalues of C are B_ii + O(1), the projections will resemble cosines. When i ≈ sqrt(N), the off-diagonal elements are of the same order as the diagonal elements and the resemblance is lost.

6.3 Validation

To see how much real principal components resemble cosines, we have to compare the theory with results from Langevin dynamics simulations. An x_i(t) can only be generated for a finite number of time points. This can be done using the algorithm x_i((n+1)τ) = x_i(nτ) + Δx. The value of x_i(0) is irrelevant. Δx should be drawn from a Gaussian distribution which obeys equations (6.4) and (6.5). When τ is small, the distribution does not actually have to be Gaussian, since the convolution of many distributions will converge to a Gaussian distribution. The simulations and covariance analysis were performed with the Gromacs package [3]. For the simulations we always used a convolution of 4 uniform distributions per step, which corresponds to four times as many steps with a single uniform distribution. Figure 6.1 shows the eigenvalues obtained from a simulation and the B_ii from equation (6.46) with normalized eigenvector length, for D = 0.5 and T = 1.
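A minimal version of this validation can be sketched as follows (our own reimplementation, not the Gromacs code; step counts and parameters are illustrative). Each step is drawn as a sum of four uniform random numbers, scaled so that the displacements obey equations (6.4) and (6.5), and the covariance eigenvalues are compared with λ_k^* = 2NDT/(π^2 k^2):

```python
import numpy as np

rng = np.random.default_rng(2)
N, nsteps, D, T = 100, 10_000, 0.5, 1.0
dt = T / nsteps

# Each step is a sum of 4 uniform deviates: nearly Gaussian by the central
# limit theorem, scaled to variance 2*D*dt per step (cf. the text above).
u = rng.uniform(-0.5, 0.5, (nsteps, N, 4)).sum(axis=2)
steps = u * np.sqrt(2 * D * dt / (4 / 12.0))   # 4 uniforms have variance 4/12
x = np.cumsum(steps, axis=0)

xm = x - x.mean(axis=0)
C = xm.T @ xm / nsteps
lam = np.sort(np.linalg.eigvalsh(C))[::-1]

k = np.arange(1, 11)
print(lam[:10])                          # simulated eigenvalues
print(2 * N * D * T / (np.pi * k) ** 2)  # lambda*_k = 2NDT/(pi^2 k^2)
```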

Figure 6.1: The eigenvalues of the covariance matrix of a high-dimensional Brownian dynamics simulation, together with the eigenvalue estimates: the B_ii from equation (6.46), using the normalized vectors of equation (6.37).

Figure 6.2: The first 5 principal components of the same high-dimensional Brownian dynamics simulation.

The first 3 eigenvalues are predicted quite accurately. For higher indices the diagonal elements of B become too small compared to the off-diagonal elements to be a reasonable approximation of the eigenvalues. The predictions are close to the estimated values (equation (6.52)); the term π^2 i^2/6 in equation (6.52) starts to dominate for larger i. When this term is discarded and the eigenvalues are estimated by λ_i^* (equation (6.27)), a good correspondence is obtained over the whole range. Using λ_i^* it can be calculated that on average the first 3 eigenvalues contain just over 80% of the mean square fluctuation at the system size used here and 84% for large N. The first 5 principal components are shown in Figure 6.2; they resemble cosines. The cosine content of an eigenvector can be determined by taking inner products with the columns of R^*. Because these inner products fluctuate heavily, we calculated average inner products over many Langevin dynamics simulations at three different system sizes. Figure 6.3 shows the average inner product of eigenvector i with column i of R^*. The first eigenvector is almost a perfect cosine, with an inner product larger than 0.996 for all three system sizes. The inner products drop to one half for i ≈ sqrt(N), which is exactly the behavior we expected.
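The cosine resemblance can equivalently be measured in the time domain, directly on a principal component p_k(t). The sketch below is our formulation (the chapter itself takes inner products of eigenvectors with the columns of R^*):

```python
import numpy as np

def cosine_content(p, k=1):
    """Cosine content of a principal component time series p(t): the
    normalized squared overlap with cos(pi*k*t/T). Returns 1.0 for a
    perfect cosine with k/2 periods, and values near 0 for converged
    sampling of a bounded system."""
    n = len(p)
    t = (np.arange(n) + 0.5) / n
    ck = np.cos(np.pi * k * t)
    # discretized (2/T) * (integral of p*cos)^2 / (integral of p^2)
    return 2.0 * np.dot(p, ck) ** 2 / (n * np.dot(p, p))
```

Applied to the first few principal components of the Brownian simulation above, this measure is expected to give values close to 1.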

Figure 6.3: The average inner products of the eigenvectors of the covariance matrix with the estimated eigenvectors R^* (equation (6.37)), for three system sizes. The inner products were averaged over many simulations for each system size. Since the columns of R^* are cosine coefficients, these inner products show how much each principal component resembles a cosine with the number of periods equal to half the principal component index. The inner product drops to one half at the square root of the system size. The error bars show the standard deviation.

Figure 6.4: The mean square displacement of the first 5 principal components of the high-dimensional Brownian simulation. The principal components are ballistic (MSD ∝ t^2) over a large range of times. The MSD of the whole system is proportional to time.

The cosine-like principal components lead to strange dynamic effects. One example is the mean square displacement, which is proportional to time for a single purely diffusive coordinate. The mean square displacement along the cosine-shaped principal components is instead proportional to the time squared. This can be shown with a simple derivation. Principal component i can be approximated as:

    p_i(t) ≈ (sqrt(4NDT)/(iπ)) cos(iπt/T)    (6.56)

From this we can calculate the mean square displacement along eigenvector i:

    msd_i(t) = (1/(T-t)) ∫_0^{T-t} ( p_i(y+t) - p_i(y) )^2 dy    (6.57)
             ≈ (8NDT/(i^2 π^2)) sin^2(iπt/(2T))    (6.58)
             ≈ (2ND/T) t^2   for t < T/i    (6.59)

The mean square displacements of the principal components for the Brownian system (Figure 6.2) are shown in Figure 6.4. The whole system behaves diffusively, but the first few eigenvectors exhibit ballistic motion.
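The ballistic signature is easy to reproduce; the sketch below (ours) computes the mean square displacement as a function of lag time for any principal component time series, which for a cosine-shaped component follows the t^2 law of equation (6.59) before leveling off:

```python
import numpy as np

def msd(p, max_lag):
    """Mean square displacement <(p(t+s) - p(t))^2> for lags s = 1..max_lag
    (in frames), averaged over all available time origins t."""
    return np.array([np.mean((p[lag:] - p[:-lag]) ** 2)
                     for lag in range(1, max_lag + 1)])

# For principal component i of the diffusive system, eq. (6.59) predicts
# msd_i(t) ~ (2*N*D/T) * t**2 for t < T/i, while the msd summed over all
# coordinates grows linearly, as 2*N*D*t.
```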

6.4 Protein simulations

Molecular simulations of macromolecules are good examples of high-dimensional systems where principal component analysis can be useful. It will reveal global motions when they are present in the system, without one having to visually inspect the whole trajectory. In order to analyze the internal motions of a molecule, the overall rotation has to be removed. The simplest way to accomplish this is to least-squares fit each structure onto a reference structure. Care should be taken when applying this procedure: because different reference structures will lead to different orientations of the structures with respect to each other, the reference structure should be representative of the whole ensemble of structures. When large movements take place, the fit might not be well determined and the fitted structures might jump between different orientations. This problem is most apparent in elongated linear chains, such as polymers, where the rotational orientation around the chain axis is not well defined.

We will present principal component analysis results for three different protein systems. In all these systems the dynamics of the first few principal components is mainly diffusive, but the sampled part of the free-energy landscape has different characteristics for the different simulations. The three simulations and the covariance analyses were performed with the Gromacs package [3].

OmpF porin in a bilayer

OmpF is a trimeric protein that consists of three β-barrels of 340 residues each. The protein was simulated with molecular dynamics (MD) at pH 3 in a bilayer of DMPC lipids, surrounded by SPC water molecules and chloride ions. The system was simulated for 3 nanoseconds with a time step of 2 femtoseconds; the coordinates were written to file every few picoseconds. A twin-range cut-off was used: 1.0 nm for the Lennard-Jones and short-range Coulomb interactions and 1.8 nm for the long-range Coulomb interactions, updated every 10 steps. The temperature and pressure were coupled to 300 K and 1 bar, respectively. The system is similar to the system in [64]. A description of the force-field parameters can be found in [65].

We split the trajectory into 3 parts of one nanosecond. Covariance analyses were performed for each part on the Cα atoms of monomer one (340 atoms, 1020 dimensions). All structures were least-squares fitted on the Cα atoms of monomer one at time 0. Since the number of structures in each part is smaller than the number of dimensions, the number of nonzero eigenvalues is limited to the number of structures minus one. The eigenvalues of the covariance matrix for one of the one-nanosecond parts are shown in Figure 6.5. The eigenvalues decay approximately as a power law. The first 5 principal components are shown in Figure 6.6. The first 4 resemble cosines with the number of periods equal to half the eigenvalue index. Thus the sampling in these directions is far from complete. The principal components resemble those of the Brownian system, which suggests that the protein might be diffusing on a part of the free-energy landscape that is almost flat in a few dimensions.

Figure 6.5: Eigenvalues of the covariance matrix of subunit 1 of the OmpF trimer. The analysis was done over one nanosecond of the MD simulation.

If this is the case, then these eigenvectors do not describe relevant motions, but only give an indication in which directions the protein is more free to move. To check the relevance of the eigenvectors, we can look at the overlap between subspaces spanned by eigenvectors for different parts of the simulation. The overlap between two sets of n orthonormal vectors v_1, ..., v_n and w_1, ..., w_n can be quantified as follows:

    overlap(v, w) = (1/n) Σ_{i=1}^n Σ_{j=1}^n (v_i · w_j)^2    (6.60)

We can also look at the total overlap s of two covariance matrices, where the squared inner products of the eigenvectors are weighted with the square root of the product of the corresponding eigenvalues; this measure s is introduced in chapter 7, equations (7.8) to (7.10). When the sets v and w span the same subspace, the overlap is 1. The overlap between the subspaces spanned by the first 10 eigenvectors is about 0.3 between the first and second nanosecond, slightly lower between the second and third, and lowest between the first and third nanosecond. The weighted overlap s of the total space behaves similarly. This means that one nanosecond is not enough to get a good impression of the conformational freedom of this protein.
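Equation (6.60) translates directly into code; a minimal sketch (ours):

```python
import numpy as np

def subspace_overlap(V, W):
    """Overlap (eq. 6.60) between two sets of n orthonormal column vectors
    V and W, each of shape (dim, n): (1/n) * sum_ij (v_i . w_j)^2.
    Equals 1 when the two sets span the same subspace."""
    n = V.shape[1]
    return np.sum((V.T @ W) ** 2) / n

# e.g. subspace_overlap(R1[:, :10], R2[:, :10]) for the first 10 eigenvectors
# obtained from two different parts of a trajectory.
```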

Figure 6.6: The first 5 principal components (nm) of subunit 1 of the OmpF trimer. The analysis was done over one nanosecond of the MD simulation. Each principal component is fitted to a cosine with the number of periods equal to half its index.

Figure 6.7: The mean square displacement of the first 5 principal components of subunit 1 of the OmpF trimer, analyzed over one nanosecond of the MD simulation; the first principal component for the analysis over the full 3 nanoseconds is shown as well. These principal components are subdiffusive on short time scales. The first principal components show ballistic behavior, due to their cosine shape (see Figure 6.6).

We have also calculated the mean square displacement along the eigenvectors; these are shown in Figure 6.7. On short time scales the behavior is subdiffusive, which is reasonable for a set of connected atoms on a relatively flat part of the free-energy landscape. On longer time scales eigenvectors 1 and 2 go ballistic. This is not a property of the system, but rather an artifact of the short simulation time in combination with the covariance analysis, which filters ballistic motions out of a diffusive system. This can be illustrated by doing the same analysis for the whole 3 nanoseconds. Again the principal components look like cosines, but now the wavelength is 3 times as long. Thus the cosine behavior is linked to the analysis interval and not to inherent properties of the protein. The mean square displacement looks similar as well, but the transition to ballistic motion is shifted by approximately a factor of 3 (see Figure 6.7).

Table 6.1: Properties of the first 5 eigenvectors of the β-hairpin simulation: the eigenvector index i, the eigenvalue λ_i (nm^2), an estimate k = k_B T/λ_i (kJ mol^-1 nm^-2) of the harmonic force constant in the direction of eigenvector i, the parameters a and τ (ps) of an exponential fit a exp(-t/τ) to the initial decay of the autocorrelation function of principal component i, and a rough estimate γ = kτ (u ps^-1) of the friction coefficient.

β-hairpin in water

The second system is a 16-residue β-hairpin solvated in a box of SPC water molecules with 3 sodium ions. This system was simulated for 10 nanoseconds; structures were written to the trajectory every 0.1 ps. A complete description of the simulation setup can be found in [66]. We performed the covariance analysis on the 24 000 structures from 5.3 to 7.7 ns. We chose this period because the peptide seems to reside in one free-energy minimum for these 2.4 nanoseconds. Because the termini are very flexible, only the backbone atoms of residues 2 to 15 were included in the analysis. The first 5 eigenvalues can be found in Table 6.1; they contain 65% of the fluctuations, and the first 10 eigenvalues contain 80% of the fluctuations. The eigenvectors are well defined for this part of the simulation: the subspace overlap between the first and second 1.2-nanosecond interval is 0.89 for the first 5 eigenvectors, and the weighted overlap (expression 7.10) of the total space is 0.84.

The first five principal components are shown in Figure 6.8. These look very noisy and resemble diffusion in a harmonic potential; their distributions are almost Gaussian. The force constants k for the harmonic wells can be estimated from the eigenvalues and the temperature (see Table 6.1). Kinetic information can be obtained from the autocorrelation functions of the principal components, shown in Figure 6.9. There is a rapid drop in the first half picosecond, followed by an approximately exponential decay. After roughly 10 ps the curves level off, which is an indication that the peptide is not diffusing in a single harmonic well, but in several connected wells. The fast effect can also be found in the velocity autocorrelation functions of the eigenvectors; these have a negative peak at 0.3 to 0.4 ps. This time scale corresponds to the momentum effect, which should occur on a time scale of sqrt(m/k), where m is the effective mass of the mode. The long-time-scale motions are overdamped, since the velocity autocorrelation is almost zero after a few picoseconds. In a diffusive system the correlations decay exponentially with time constant τ = γ/k, where γ is the friction coefficient. We can estimate the friction coefficient for the first 3 eigenvectors by fitting an exponential to the initial decay of the autocorrelation function (see Table 6.1).
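This kinetic analysis can be sketched as follows (our code, not from the chapter; scipy's curve_fit stands in for whatever fitting tool was actually used, and the force constant kharm = k_B T/λ_i and the fit window (t0, t1) are user-supplied assumptions):

```python
import numpy as np
from scipy.optimize import curve_fit

def autocorr(p):
    """Normalized autocorrelation of a principal component time series."""
    p = p - p.mean()
    ac = np.correlate(p, p, mode="full")[len(p) - 1:]
    return ac / ac[0]

def friction_estimate(p, dt, kharm, t0, t1):
    """Fit a*exp(-t/tau) to the autocorrelation between t0 and t1 (ps) and
    return gamma = kharm * tau (overdamped diffusion in a harmonic well)."""
    ac = autocorr(p)
    t = np.arange(len(ac)) * dt
    sel = (t >= t0) & (t <= t1)
    (a, tau), _ = curve_fit(lambda t, a, tau: a * np.exp(-t / tau),
                            t[sel], ac[sel], p0=(1.0, 1.0))
    return kharm * tau
```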

Figure 6.8: The first 5 principal components (nm) of the β-hairpin.

Figure 6.9: The autocorrelation of the first 5 principal components of the β-hairpin.

Figure 6.10: The first 5 principal components (nm) of HPr.

HPr in water

The last system is the 85-residue protein HPr, simulated in a box of SPC water molecules. The total simulation time was 5 nanoseconds; structures were written to a trajectory file every picosecond. We left the first nanosecond out of the analysis to remove equilibration effects. The first 5 and 10 eigenvalues contain 63% and 74% of the fluctuations, respectively. The first 5 principal components show plateaus with distinct jumps at 1.5 and 4.0 nanoseconds (Figure 6.10). The global shapes of the first and second eigenvector resemble a half and a full cosine. The jumping behavior can also be observed in the root mean square deviation (RMSD) matrix (Figure 6.11), which has several light triangles along the diagonal. The structures can be clustered on RMSD with, for instance, a Jarvis-Patrick algorithm. In this algorithm a list of the 9 closest neighbors is constructed for each structure. Two structures are in the same cluster when they are in each other's list and have at least 3 list members in common. The algorithm finds 3 large clusters (Figure 6.11). The first 5 eigenvectors mainly describe the differences between the clusters. Because of the jumping behavior, the overlap between the first and the second half of the 4 analyzed nanoseconds is only 0.3 and 0.46 for the subspaces spanned by the first 5 and 10 eigenvectors, respectively.
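A compact implementation of the clustering just described (our sketch; the neighbor-list length and the shared-neighbor threshold are the adjustable Jarvis-Patrick parameters):

```python
import numpy as np

def jarvis_patrick(dist, n_neighbors=9, min_shared=3):
    """Jarvis-Patrick clustering on an (n, n) distance (e.g. RMSD) matrix:
    two structures join the same cluster when each is among the other's
    n_neighbors nearest neighbors and they share >= min_shared neighbors."""
    n = len(dist)
    # nearest-neighbor lists (position 0 is the structure itself, so skip it)
    nn = np.argsort(dist, axis=1)[:, 1:n_neighbors + 1]
    nnsets = [set(row) for row in nn]

    # union-find over the Jarvis-Patrick linking criterion
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in nn[i]:
            if i in nnsets[j] and len(nnsets[i] & nnsets[j]) >= min_shared:
                parent[find(i)] = find(int(j))
    return np.array([find(i) for i in range(n)])  # cluster label per structure
```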

Figure 6.11: Matrix showing the RMS deviation of the Cα atoms of HPr for each pair of structures, in the upper left half. The lower right half shows the result of Jarvis-Patrick clustering: a black dot means that two structures are in the same cluster. There are 3 large clusters.

The weighted overlap (expression 7.10) of the total space is 0.43. When the simulation is prolonged, jumps to new clusters might occur, which can cause large changes in the eigenvectors.

6.5 Discussion

Principal component analysis is considered to be a powerful tool for finding large-scale motions in proteins. The advantages of the analysis do not lie in the analysis itself, but in the fact that most of the fluctuations can be captured with the first few principal modes. This means that many analyses can be performed in only a few dimensions, which makes visual inspection of the results easier. This can reveal features of the free-energy landscape. Kinetic properties can also be analyzed, since all the time information is still present in the principal components. However, one cannot conclude from the eigenvalues alone that only a small subspace is important. We have shown that for high-dimensional random diffusion the first 3 eigenvalues contain 84% of the fluctuations. This does not mean that the first 3 principal modes reveal special properties of the system. In random diffusion all directions are equivalent; a second simulation will produce completely different principal modes.

How can the relevance of the principal modes be determined? One should always divide the simulation into two or more parts and compare the principal modes for each part. A simple measure is the subspace overlap of the first few principal modes (equation (6.60)). Amadei et al. extensively analyzed the overlap for MD simulations of protein L and cytochrome c [67]. They found that the overlap for the first 10 principal components is about 0.6 for two trajectory parts of a few hundred picoseconds, irrespective of the time interval between the two parts. This indicates that the protein samples only one free-energy minimum, as is the case for the β-hairpin simulation. The overlap can be lower when the protein samples multiple minima, as was shown for HPr. In that case the principal modes describe the differences between the minima, rather than the shapes of the energy minima. Although the subspace of the first 5 or 10 principal modes might not be well defined, there will always be a splitting into a subspace of diffusive modes, which contains most of the fluctuations, and a subspace of (near-)harmonic modes. This splitting will not change significantly between different simulations.

We have derived that the first few principal components of high-dimensional random diffusion are a half cosine, a full cosine, one and a half cosines, and so on. Many protein simulations with similar principal components can be found in the literature, for instance [59, 68, 69]. When the trajectory is projected on the plane of the first two principal components, the half and full cosine produce a characteristic semi-circle, as can be seen for BPTI simulations in [70]. Often these simulations are relatively short (a few nanoseconds) and of relatively large proteins (more than 100 residues), like the OmpF simulation presented in this chapter. It is very improbable that properties of the protein will produce cosine-shaped atomic displacements, consisting of exactly a half period, a full period, and so on.

It is more likely that this behavior is the result of random diffusion, since the simulation time is not long enough to repeatedly reach barriers on the free-energy landscape. The first principal component, a half cosine, can be very misleading, as it seems as if the protein moves from one well-defined conformation to another. The cosine-shaped principal components lead to ballistic behavior on longer time scales. García et al. [68] write that this behavior belongs to the Lévy flight class. However, this behavior is caused by the incomplete sampling and scales with the simulation time, as shown for OmpF, and it will disappear once a free-energy minimum has been sampled to a reasonable extent, as shown for the β-hairpin. When one sees cosine-like principal components, one should interpret the results carefully and always keep in mind that most of the fluctuations could be caused by random diffusion. The analysis is still useful, because it can separate the random-diffusion degrees of freedom from the more restricted degrees of freedom. In such cases insight into the principal modes of random diffusion in a harmonic potential will make better estimates of the conformational freedom possible. In the next chapter we present simulations of proteins of reasonable size, which are long enough to analyze the local convergence.

6.A Appendix: variances

The variance of the time average of f_1(t)x_i(t) can be calculated given that \overline{f_1} = 0. We apply partial integration, using d/dt ∫_t^T f_1(v)dv = -f_1(t); the boundary term vanishes because ∫_0^T f_1(v)dv = 0:

    ⟨ ( \overline{f_1(t) x_i(t)} )^2 ⟩
      = ⟨ ( (1/T) ∫_0^T f_1(t) x_i(t) dt )^2 ⟩    (A.1)
      = ⟨ ( (1/T) ∫_0^T ( ∫_t^T f_1(v) dv ) (dx_i(t)/dt) dt )^2 ⟩    (A.2-A.4)
      = (1/T^2) ∫_0^T ∫_0^T ( ∫_t^T f_1(v) dv )( ∫_u^T f_1(w) dw ) ⟨ (dx_i(t)/dt)(dx_i(u)/du) ⟩ dt du    (A.5-A.6)
      = (1/T^2) ∫_0^T ∫_0^T ( ∫_t^T f_1(v) dv )( ∫_u^T f_1(w) dw ) 2D δ(t-u) dt du    (A.7)
      = (2D/T^2) ∫_0^T ( ∫_t^T f_1(v) dv )^2 dt    (A.8)

The expectations of c_i^k and c_i^k c_j^l can be calculated using integration by parts; the boundary terms vanish because sin(0) = sin(πk) = 0. For k = 1, 2, 3, ...:

    ⟨c_i^k⟩ = ⟨ (sqrt(2)/T) ∫_0^T x_i(t) cos(πkt/T) dt ⟩    (A.9)
            = -⟨ (sqrt(2)/(πk)) ∫_0^T (dx_i(t)/dt) sin(πkt/T) dt ⟩    (A.10-A.11)
            = 0    (A.12)

For k, l = 1, 2, 3, ...:

    ⟨c_i^k c_j^l⟩ = (2/(π^2 kl)) ∫_0^T ∫_0^T ⟨ (dx_i(s)/ds)(dx_j(t)/dt) ⟩ sin(πks/T) sin(πlt/T) ds dt    (A.13-A.14)

      = (2/(π^2 kl)) ∫_0^T ∫_0^T 2D δ_ij δ(s-t) sin(πks/T) sin(πlt/T) ds dt    (A.15)
      = (4D/(π^2 kl)) δ_ij ∫_0^T sin(πks/T) sin(πls/T) ds    (A.16)
      = (2DT/(π^2 kl)) δ_ij δ_kl    (A.17-A.18)

For the size of the off-diagonal elements of B we write, for i ≠ j,

    B_ij = (π^2 ij/(2NDT)) ( U + V_ij + V_ji )    (A.19)

where U collects the terms of equation (6.48) with k ≠ i and k ≠ j, and V_ij is the term with k = i:

    U = Σ_k (c^i · c^k)(c^j · c^k)(1 - δ_ik)(1 - δ_jk),   V_ij = (c^i · c^i)(c^i · c^j)    (A.20)

so that

    ⟨(B_ij)^2⟩ = (π^2 ij/(2NDT))^2 ⟨ (U + V_ij + V_ji)^2 ⟩    (A.21-A.22)

The required moments follow from the Gaussian statistics of the cosine coefficients. Writing σ_k^2 = ⟨(c_i^k)^2⟩ = 2DT/(π^2 k^2), the leading-order results are:

    ⟨U^2⟩ = N^2 σ_i^2 σ_j^2 Σ_{k≠i,j} σ_k^4 + O(N)    (A.25-A.27)
    ⟨V_ij^2⟩ = N^3 σ_i^6 σ_j^2 + O(N^2),   ⟨V_ji^2⟩ = N^3 σ_i^2 σ_j^6 + O(N^2)    (A.28)
    ⟨U V_ij⟩, ⟨U V_ji⟩ = O(N^2)    (A.29-A.30)

    ⟨V_ij V_ji⟩ = N^3 σ_i^4 σ_j^4 + O(N^2)    (A.31)

Combining these terms in (A.21), the V contributions dominate for large N and give

    ⟨(B_ij)^2⟩ = (4ND^2T^2/π^4) ( 1/i^4 + 2/(i^2 j^2) + 1/j^4 ) + O(1)
               = (4ND^2T^2/π^4) ( 1/i^2 + 1/j^2 )^2 + O(1)    (A.32-A.33)

which is equation (6.55).


Principal Component Analysis -- PCA (also called Karhunen-Loeve transformation) Principal Component Analysis -- PCA (also called Karhunen-Loeve transformation) PCA transforms the original input space into a lower dimensional space, by constructing dimensions that are linear combinations

More information

4 Linear Algebra Review

4 Linear Algebra Review Linear Algebra Review For this topic we quickly review many key aspects of linear algebra that will be necessary for the remainder of the text 1 Vectors and Matrices For the context of data analysis, the

More information

Structure in Data. A major objective in data analysis is to identify interesting features or structure in the data.

Structure in Data. A major objective in data analysis is to identify interesting features or structure in the data. Structure in Data A major objective in data analysis is to identify interesting features or structure in the data. The graphical methods are very useful in discovering structure. There are basically two

More information

L26: Advanced dimensionality reduction

L26: Advanced dimensionality reduction L26: Advanced dimensionality reduction The snapshot CA approach Oriented rincipal Components Analysis Non-linear dimensionality reduction (manifold learning) ISOMA Locally Linear Embedding CSCE 666 attern

More information

An Introduction to Algebraic Multigrid (AMG) Algorithms Derrick Cerwinsky and Craig C. Douglas 1/84

An Introduction to Algebraic Multigrid (AMG) Algorithms Derrick Cerwinsky and Craig C. Douglas 1/84 An Introduction to Algebraic Multigrid (AMG) Algorithms Derrick Cerwinsky and Craig C. Douglas 1/84 Introduction Almost all numerical methods for solving PDEs will at some point be reduced to solving A

More information

Langevin Dynamics of a Single Particle

Langevin Dynamics of a Single Particle Overview Langevin Dynamics of a Single Particle We consider a spherical particle of radius r immersed in a viscous fluid and suppose that the dynamics of the particle depend on two forces: A drag force

More information

ADAPTIVE ANTENNAS. SPATIAL BF

ADAPTIVE ANTENNAS. SPATIAL BF ADAPTIVE ANTENNAS SPATIAL BF 1 1-Spatial reference BF -Spatial reference beamforming may not use of embedded training sequences. Instead, the directions of arrival (DoA) of the impinging waves are used

More information

Statistical Mechanics

Statistical Mechanics Statistical Mechanics Newton's laws in principle tell us how anything works But in a system with many particles, the actual computations can become complicated. We will therefore be happy to get some 'average'

More information

DATA MINING LECTURE 8. Dimensionality Reduction PCA -- SVD

DATA MINING LECTURE 8. Dimensionality Reduction PCA -- SVD DATA MINING LECTURE 8 Dimensionality Reduction PCA -- SVD The curse of dimensionality Real data usually have thousands, or millions of dimensions E.g., web documents, where the dimensionality is the vocabulary

More information

Gaussian Basics Random Processes Filtering of Random Processes Signal Space Concepts

Gaussian Basics Random Processes Filtering of Random Processes Signal Space Concepts White Gaussian Noise I Definition: A (real-valued) random process X t is called white Gaussian Noise if I X t is Gaussian for each time instance t I Mean: m X (t) =0 for all t I Autocorrelation function:

More information

Monte Carlo. Lecture 15 4/9/18. Harvard SEAS AP 275 Atomistic Modeling of Materials Boris Kozinsky

Monte Carlo. Lecture 15 4/9/18. Harvard SEAS AP 275 Atomistic Modeling of Materials Boris Kozinsky Monte Carlo Lecture 15 4/9/18 1 Sampling with dynamics In Molecular Dynamics we simulate evolution of a system over time according to Newton s equations, conserving energy Averages (thermodynamic properties)

More information

Introduction to Geometry Optimization. Computational Chemistry lab 2009

Introduction to Geometry Optimization. Computational Chemistry lab 2009 Introduction to Geometry Optimization Computational Chemistry lab 9 Determination of the molecule configuration H H Diatomic molecule determine the interatomic distance H O H Triatomic molecule determine

More information

Linear Algebra Exercises

Linear Algebra Exercises 9. 8.03 Linear Algebra Exercises 9A. Matrix Multiplication, Rank, Echelon Form 9A-. Which of the following matrices is in row-echelon form? 2 0 0 5 0 (i) (ii) (iii) (iv) 0 0 0 (v) [ 0 ] 0 0 0 0 0 0 0 9A-2.

More information

Asymptotic Eigenvalue Moments for Linear Multiuser Detection

Asymptotic Eigenvalue Moments for Linear Multiuser Detection Asymptotic Eigenvalue Moments for Linear Multiuser Detection Linbo Li and Antonia M Tulino and Sergio Verdú Department of Electrical Engineering Princeton University Princeton, NJ 08544, USA flinbo, atulino,

More information

PEAT SEISMOLOGY Lecture 2: Continuum mechanics

PEAT SEISMOLOGY Lecture 2: Continuum mechanics PEAT8002 - SEISMOLOGY Lecture 2: Continuum mechanics Nick Rawlinson Research School of Earth Sciences Australian National University Strain Strain is the formal description of the change in shape of a

More information

Rotational motion of a rigid body spinning around a rotational axis ˆn;

Rotational motion of a rigid body spinning around a rotational axis ˆn; Physics 106a, Caltech 15 November, 2018 Lecture 14: Rotations The motion of solid bodies So far, we have been studying the motion of point particles, which are essentially just translational. Bodies with

More information

Cartesian Tensors. e 2. e 1. General vector (formal definition to follow) denoted by components

Cartesian Tensors. e 2. e 1. General vector (formal definition to follow) denoted by components Cartesian Tensors Reference: Jeffreys Cartesian Tensors 1 Coordinates and Vectors z x 3 e 3 y x 2 e 2 e 1 x x 1 Coordinates x i, i 123,, Unit vectors: e i, i 123,, General vector (formal definition to

More information

1. Introductory Examples

1. Introductory Examples 1. Introductory Examples We introduce the concept of the deterministic and stochastic simulation methods. Two problems are provided to explain the methods: the percolation problem, providing an example

More information

MARTINI simulation details

MARTINI simulation details S1 Appendix MARTINI simulation details MARTINI simulation initialization and equilibration In this section, we describe the initialization of simulations from Main Text section Residue-based coarsegrained

More information

Clifford Algebras and Spin Groups

Clifford Algebras and Spin Groups Clifford Algebras and Spin Groups Math G4344, Spring 2012 We ll now turn from the general theory to examine a specific class class of groups: the orthogonal groups. Recall that O(n, R) is the group of

More information

Handout 10. Applications to Solids

Handout 10. Applications to Solids ME346A Introduction to Statistical Mechanics Wei Cai Stanford University Win 2011 Handout 10. Applications to Solids February 23, 2011 Contents 1 Average kinetic and potential energy 2 2 Virial theorem

More information

Supporting information for. Direct imaging of kinetic pathways of atomic diffusion in. monolayer molybdenum disulfide

Supporting information for. Direct imaging of kinetic pathways of atomic diffusion in. monolayer molybdenum disulfide Supporting information for Direct imaging of kinetic pathways of atomic diffusion in monolayer molybdenum disulfide Jinhua Hong,, Yuhao Pan,, Zhixin Hu, Danhui Lv, Chuanhong Jin, *, Wei Ji, *, Jun Yuan,,*,

More information

Chapter 6: Orthogonality

Chapter 6: Orthogonality Chapter 6: Orthogonality (Last Updated: November 7, 7) These notes are derived primarily from Linear Algebra and its applications by David Lay (4ed). A few theorems have been moved around.. Inner products

More information

PCA, Kernel PCA, ICA

PCA, Kernel PCA, ICA PCA, Kernel PCA, ICA Learning Representations. Dimensionality Reduction. Maria-Florina Balcan 04/08/2015 Big & High-Dimensional Data High-Dimensions = Lot of Features Document classification Features per

More information

Statistics for scientists and engineers

Statistics for scientists and engineers Statistics for scientists and engineers February 0, 006 Contents Introduction. Motivation - why study statistics?................................... Examples..................................................3

More information

Math 21b Final Exam Thursday, May 15, 2003 Solutions

Math 21b Final Exam Thursday, May 15, 2003 Solutions Math 2b Final Exam Thursday, May 5, 2003 Solutions. (20 points) True or False. No justification is necessary, simply circle T or F for each statement. T F (a) If W is a subspace of R n and x is not in

More information

Multiple Degree of Freedom Systems. The Millennium bridge required many degrees of freedom to model and design with.

Multiple Degree of Freedom Systems. The Millennium bridge required many degrees of freedom to model and design with. Multiple Degree of Freedom Systems The Millennium bridge required many degrees of freedom to model and design with. The first step in analyzing multiple degrees of freedom (DOF) is to look at DOF DOF:

More information

Citation for published version (APA): Feenstra, K. A. (2002). Long term dynamics of proteins and peptides. Groningen: s.n.

Citation for published version (APA): Feenstra, K. A. (2002). Long term dynamics of proteins and peptides. Groningen: s.n. University of Groningen Long term dynamics of proteins and peptides Feenstra, Klaas Antoni IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from

More information

Linear Algebra and Eigenproblems

Linear Algebra and Eigenproblems Appendix A A Linear Algebra and Eigenproblems A working knowledge of linear algebra is key to understanding many of the issues raised in this work. In particular, many of the discussions of the details

More information

Recurrences and Full-revivals in Quantum Walks

Recurrences and Full-revivals in Quantum Walks Recurrences and Full-revivals in Quantum Walks M. Štefaňák (1), I. Jex (1), T. Kiss (2) (1) Department of Physics, Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University in Prague,

More information

Tensors, and differential forms - Lecture 2

Tensors, and differential forms - Lecture 2 Tensors, and differential forms - Lecture 2 1 Introduction The concept of a tensor is derived from considering the properties of a function under a transformation of the coordinate system. A description

More information

e 2 e 1 (a) (b) (d) (c)

e 2 e 1 (a) (b) (d) (c) 2.13 Rotated principal component analysis [Book, Sect. 2.2] Fig.: PCA applied to a dataset composed of (a) 1 cluster, (b) 2 clusters, (c) and (d) 4 clusters. In (c), an orthonormal rotation and (d) an

More information

Adaptive Filtering. Squares. Alexander D. Poularikas. Fundamentals of. Least Mean. with MATLABR. University of Alabama, Huntsville, AL.

Adaptive Filtering. Squares. Alexander D. Poularikas. Fundamentals of. Least Mean. with MATLABR. University of Alabama, Huntsville, AL. Adaptive Filtering Fundamentals of Least Mean Squares with MATLABR Alexander D. Poularikas University of Alabama, Huntsville, AL CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is

More information

Q1 Q2 Q3 Q4 Tot Letr Xtra

Q1 Q2 Q3 Q4 Tot Letr Xtra Mathematics 54.1 Final Exam, 12 May 2011 180 minutes, 90 points NAME: ID: GSI: INSTRUCTIONS: You must justify your answers, except when told otherwise. All the work for a question should be on the respective

More information