Quantifying conformation fluctuation induced uncertainty in bio-molecular systems
Guang Lin, Dept. of Mathematics & School of Mechanical Engineering, Purdue University
Collaborative work with Huan Lei, Xiu Yang, Bin Zheng, Nathan Baker, PNNL
2015 IMA Hot Topics Workshop on Uncertainty Quantification in Materials Modeling, Purdue University, West Lafayette, IN, Aug. 31, 2015.
arXiv:1408.5629
This work is supported by a DOE grant for the Collaboratory on Mathematics for Mesoscopic Modeling of Materials (CM4).
Construct stochastic model of conformation fluctuation Numerical methods to construct the surrogate model Numerical Example: SASA of individual/total residues
Background & Motivation
A biomolecule in equilibrium: static or dynamic?
Target properties: deterministic or stochastic?
Figure: Tube diagram of the molecule Trypsin inhibitor (PDB code: 5pti) under equilibrium.
Quantifying the influence of conformational uncertainty in biomolecular solvation
Outline
1 Construct stochastic model of conformation fluctuation
2 Numerical methods to construct the surrogate model
3 Numerical Example: SASA of individual/total residues
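Construct the stochastic model
Approximate the potential energy of molecule fluctuation by
\[ V(\bar{R}, R) = \frac{\gamma}{2} \sum_{i<j} (r_{ij} - \bar{r}_{ij})^2 \, h(r_c - \bar{r}_{ij}) \]
$\bar{R}^T = [\bar{r}_1^T \; \bar{r}_2^T \; \cdots \; \bar{r}_N^T]$ - equilibrium position; $R^T = [r_1^T \; r_2^T \; \cdots \; r_N^T]$ - instantaneous position; $\gamma$ - elastic coefficient; $r_c$ - cut-off distance.
Define the Hessian matrix
\[ H = \begin{bmatrix} H_{11} & H_{12} & \cdots & H_{1N} \\ H_{21} & H_{22} & \cdots & H_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ H_{N1} & H_{N2} & \cdots & H_{NN} \end{bmatrix}, \quad
H_{ij} = \begin{bmatrix}
\frac{\partial^2 V}{\partial X_i \partial X_j} & \frac{\partial^2 V}{\partial X_i \partial Y_j} & \frac{\partial^2 V}{\partial X_i \partial Z_j} \\
\frac{\partial^2 V}{\partial Y_i \partial X_j} & \frac{\partial^2 V}{\partial Y_i \partial Y_j} & \frac{\partial^2 V}{\partial Y_i \partial Z_j} \\
\frac{\partial^2 V}{\partial Z_i \partial X_j} & \frac{\partial^2 V}{\partial Z_i \partial Y_j} & \frac{\partial^2 V}{\partial Z_i \partial Z_j}
\end{bmatrix} \]
Fluctuation correlation between the residues $i$ and $j$
\[ \langle \Delta R_i \, \Delta R_j^T \rangle = \frac{k_B T}{\gamma} \left[ H^{-1} \right]_{ij} \]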
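As a small illustration of the elastic-network potential on this slide, the sketch below evaluates the pairwise sum for a toy three-residue chain. It assumes that h is the Heaviside step function and that the cut-off is tested on the equilibrium distance; the function name `anm_potential` and the coordinates are hypothetical.

```python
import math
from itertools import combinations

def anm_potential(R, R_eq, gamma, r_cut):
    """V = (gamma / 2) * sum_{i<j} (r_ij - r_ij_eq)^2 * h(r_cut - r_ij_eq)."""
    V = 0.0
    for i, j in combinations(range(len(R_eq)), 2):
        d_eq = math.dist(R_eq[i], R_eq[j])     # equilibrium distance
        if d_eq < r_cut:                       # h(...) = 1 only inside the cut-off (assumed convention)
            d = math.dist(R[i], R[j])          # instantaneous distance
            V += 0.5 * gamma * (d - d_eq) ** 2
    return V

# Toy data (hypothetical): three residues on a line, middle one displaced by 0.5.
R_eq = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
R = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]

V_rest = anm_potential(R_eq, R_eq, gamma=1.0, r_cut=2.5)
V_moved = anm_potential(R, R_eq, gamma=1.0, r_cut=2.5)
print(V_rest, V_moved)
```

Only the pairs (0,1) and (1,2) lie inside the cut-off in this toy chain, so the displaced configuration picks up two equal spring contributions while the equilibrium configuration has zero energy.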
Full stochastic model
Eigenvalue decomposition
\[ H = W \Lambda W^T, \quad \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_{3N-6}) \]
$\lambda_i$ - the $i$-th nonzero eigenvalue of $H$; $w_i$ - the $i$-th eigenvector of $H$.
Correlation matrix $C$ can be determined by
\[ C_{ij} = \langle \Delta R_i \, \Delta R_j^T \rangle, \quad C = \frac{k_B T}{\gamma} W \Lambda^{-1} W^T = U U^T \]
Stochastic conformation states are generated by
\[ R(\xi) = \bar{R} + \Delta R(\xi), \quad \Delta R(\xi) = U \xi \]
$\xi$ - $(3N-6)$-dimensional standard normal random vector.
Target property $X(\xi) := X(R(\xi))$.
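A short check of why sampling with $\Delta R(\xi) = U\xi$ reproduces the correlation matrix may help here. The explicit factor $U$ below is one possible choice and is an assumption; the slide only requires $C = UU^T$.

```latex
\begin{align*}
U &= \sqrt{\tfrac{k_B T}{\gamma}}\; W \Lambda^{-1/2}
  && \text{(one possible factor, assumed here)} \\
U U^T &= \tfrac{k_B T}{\gamma}\, W \Lambda^{-1/2} \Lambda^{-1/2} W^T
       = \tfrac{k_B T}{\gamma}\, W \Lambda^{-1} W^T = C \\
\mathbb{E}\!\left[ \Delta R(\xi)\, \Delta R(\xi)^T \right]
  &= U\, \mathbb{E}[\xi \xi^T]\, U^T = U I U^T = C
\end{align*}
```

The last line uses only that $\xi$ is a standard normal vector, so any factor $U$ with $UU^T = C$ gives the same second-order statistics.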
Reduced stochastic model
For local property $X^{\{p\}}$ on residue $p$, $\nabla_R X^{\{p\}}$ could be sparse, i.e., $\partial X^{\{p\}} / \partial R_j = 0$ if $X^{\{p\}}$ on residue $p$ is independent of $R_j$.
Correlation matrix $C$ can be reduced to $\tilde{C}$
\[ \tilde{C}_{ij} = C_{ij} \, h(r_c^p - r_{ip}) \, h(r_c^p - r_{jp}), \]
$r_{ip} = |r_p - r_i|$, $r_{jp} = |r_p - r_j|$, $r_c^p$ - cut-off distance for $X^{\{p\}}$.
Figure: Sketch of a typical reduced correlation matrix ($C$ and $C^{\{p\}}$; axes: residue label).
\[ C^{\{p\}} = U^{\{p\}} U^{\{p\}T}, \quad R^{\{p\}}(\xi^{\{p\}}) = \bar{R}^{\{p\}} + U^{\{p\}} \xi^{\{p\}}, \quad d = 3 \sum_{i=1}^{N} h(r_c^p - r_{ip}), \]
\[ X^{\{p\}}(\xi^{\{p\}}) := X^{\{p\}}(R(\xi^{\{p\}})) \]
$\xi^{\{p\}}$: $d$-dimensional normal random vector.
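The reduction step can be sketched as follows: keep only the residues within the cut-off of residue p, extract the matching block of the correlation matrix, and read off the reduced dimension d. The helper names, the toy coordinates and the made-up matrix are all hypothetical, and h is assumed to be the Heaviside step function.

```python
import math

def within_cutoff(coords, p, r_cut):
    """Indices i with h(r_cut - r_ip) = 1, i.e. residues closer than r_cut to residue p."""
    return [i for i, r in enumerate(coords) if math.dist(r, coords[p]) < r_cut]

def reduced_block(C, keep):
    """Rows/columns of the 3N x 3N matrix C that belong to the kept residues (3 coordinates each)."""
    idx = [3 * i + k for i in keep for k in range(3)]
    return [[C[a][b] for b in idx] for a in idx]

# Toy data (hypothetical): 4 residues on a line and a made-up symmetric 12 x 12 matrix.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
C = [[1.0 / (1 + abs(a - b)) for b in range(12)] for a in range(12)]

keep = within_cutoff(coords, p=1, r_cut=1.5)   # residues near residue p = 1
C_p = reduced_block(C, keep)                   # block kept by the cut-off mask
d = 3 * len(keep)                              # d = 3 * sum_i h(r_cut - r_ip)
print(keep, d, len(C_p))
```

In this toy chain the far residue (index 3) drops out, so the random dimension falls from 12 to 9; for a real molecule the saving depends on the chosen cut-off.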
Generalized polynomial chaos (gpc) expansion in uncertainty quantification (UQ)
(Ghanem and Spanos 1991; Xiu and Karniadakis, 2002)
Terms labeled in the expansion: quantity of interest (e.g., force, velocity, etc.); truncation error; i.i.d. random variables; input samples.
Generalized Polynomial Chaos
Generalized Polynomial Chaos (gpc) helps to represent uncertainty.
In practice, we truncate the gpc expansion up to polynomial order $P$:
\[ X(\xi) \approx \tilde{X}(\xi) = \sum_{|\alpha|=0}^{P} c_\alpha \psi_\alpha(\xi) \]
Construct gpc expansion: probabilistic collocation methods (e.g., tensor product, sparse grid method, etc.)
\[ c_\alpha = \int X(\xi) \psi_\alpha(\xi) \, dp(\xi) \approx \sum_{i=1}^{Q} X_i \psi_\alpha(\xi_i) w_i, \]
where $\xi_i$ - collocation point, $X_i = X(\xi_i)$, $w_i$ - weight.
Major challenges for high-dimensional systems:
large number of collocation points (e.g., $d = 27$, $P = 2$, $Q = 7.6 \times 10^{12}$ tensor product points);
sensitivity to the numerical error that accompanies $X$.
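The quoted point count can be reproduced with a few lines of arithmetic. The sketch assumes that the tensor-product rule uses P + 1 = 3 points per dimension, which is consistent with the figure on the slide; the function names are hypothetical.

```python
from math import comb

def tensor_product_points(d, points_per_dim):
    """Number of nodes in a full tensor-product grid."""
    return points_per_dim ** d

def gpc_basis_size(d, P):
    """Number of multi-indices alpha with |alpha| <= P in d dimensions."""
    return comb(d + P, P)

d, P = 27, 2
Q = tensor_product_points(d, P + 1)   # 3 points per dimension (assumed)
n_basis = gpc_basis_size(d, P)        # unknown gpc coefficients at order P
print(f"Q = {Q:.2e}, basis terms = {n_basis}")
```

The contrast between roughly 7.6 x 10^12 grid points and only 406 unknown coefficients is what makes sampling-based approaches such as compressive sensing attractive here.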
Brief introduction to compressive sensing
Consider a linear system $\Psi_{M \times N} \, c_{N \times 1} = u_{M \times 1}$, i.e., $\Psi c = u$.
When $M < N$, the system is underdetermined, but it is still possible to obtain $c$ if $c$ is a sparse vector. We may obtain it by solving the following optimization problem:
\[ (P_{h,0}): \quad \min_c \|c\|_h \quad \text{subject to} \quad \Psi c = u. \]
If $\Psi c + e = u$, where $\|e\|_2 \le \epsilon$, we modify $(P_{h,0})$ as
\[ (P_{h,\epsilon}): \quad \min_c \|c\|_h \quad \text{subject to} \quad \|\Psi c - u\|_2 \le \epsilon. \]
1. E.J. Candès, J. Romberg, T. Tao, IEEE Trans. Inform. Theory, 2006.
2. D.L. Donoho, M. Elad, V.N. Temlyakov, IEEE Trans. Inform. Theory, 2006.
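To make the idea concrete, here is a minimal sketch that recovers a sparse vector from an underdetermined toy system. It does not solve the constrained problem on the slide directly; it swaps in iterative soft thresholding (ISTA) applied to the closely related penalized form min 0.5 * ||Psi c - u||_2^2 + lam * ||c||_1. The matrix, the penalty `lam` and the step size are illustrative assumptions.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal map of t * |.|)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def ista(Psi, u, lam, step, n_iter=500):
    """Iterative soft thresholding for min_c 0.5 * ||Psi c - u||_2^2 + lam * ||c||_1."""
    M, N = len(Psi), len(Psi[0])
    c = [0.0] * N
    for _ in range(n_iter):
        # residual r = Psi c - u
        r = [sum(Psi[m][n] * c[n] for n in range(N)) - u[m] for m in range(M)]
        # gradient of the smooth part: Psi^T r
        grad = [sum(Psi[m][n] * r[m] for m in range(M)) for n in range(N)]
        # gradient step followed by shrinkage
        c = [soft_threshold(c[n] - step * grad[n], step * lam) for n in range(N)]
    return c

# Toy underdetermined system (hypothetical): M = 2 equations, N = 3 unknowns.
Psi = [[1.0, 0.0, 1.0],
       [0.0, 1.0, 1.0]]
u = [2.0, 2.0]                      # consistent with the 1-sparse vector [0, 0, 2]
c = ista(Psi, u, lam=0.1, step=1.0 / 3.0)   # step = 1 / (largest eigenvalue of Psi^T Psi)
print([round(x, 3) for x in c])
```

Both [2, 2, 0] and [0, 0, 2] satisfy the equations exactly, but the l1 penalty selects the sparser one; the recovered entry is slightly below 2 (it tends to 2 - lam/2) because of the shrinkage bias.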
Application to generalized polynomial chaos
Consider a gpc expansion $X(\xi) = \sum_\alpha c_\alpha \psi_\alpha(\xi)$. By sampling $\xi$, we obtain:
\[ X(\xi_1) \approx \sum_{\alpha=0}^{N} c_\alpha \psi_\alpha(\xi_1), \quad X(\xi_2) \approx \sum_{\alpha=0}^{N} c_\alpha \psi_\alpha(\xi_2), \quad \ldots \]
which can be cast into the linear system:
\[ \begin{bmatrix}
\psi_0(\xi_1) & \psi_1(\xi_1) & \cdots & \psi_N(\xi_1) \\
\psi_0(\xi_2) & \psi_1(\xi_2) & \cdots & \psi_N(\xi_2) \\
\vdots & \vdots & \ddots & \vdots \\
\psi_0(\xi_M) & \psi_1(\xi_M) & \cdots & \psi_N(\xi_M)
\end{bmatrix}
\begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_N \end{bmatrix}
\approx
\begin{bmatrix} X(\xi_1) \\ X(\xi_2) \\ \vdots \\ X(\xi_M) \end{bmatrix}, \]
i.e., $\Psi c + e = X$, where $e$ is related to the truncation error.
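The measurement matrix above can be assembled in a few lines. The sketch below is restricted to a single Gaussian random variable and uses normalized probabilists' Hermite polynomials as the basis; the function names, the sample size and the seed are hypothetical choices for illustration.

```python
import math
import random

def hermite_normalized(n, x):
    """Probabilists' Hermite polynomial He_n(x) / sqrt(n!), orthonormal under N(0, 1)."""
    h_prev, h = 0.0, 1.0                      # He_{-1} placeholder, He_0
    for k in range(n):
        h_prev, h = h, x * h - k * h_prev     # He_{k+1} = x He_k - k He_{k-1}
    return h / math.sqrt(math.factorial(n))

def measurement_matrix(samples, max_order):
    """Psi[q][n] = psi_n(xi_q): one row per sample, one column per basis function."""
    return [[hermite_normalized(n, x) for n in range(max_order + 1)] for x in samples]

random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(5)]   # M = 5 input samples
Psi = measurement_matrix(samples, max_order=3)          # columns psi_0 .. psi_3
print(len(Psi), len(Psi[0]))
```

In several dimensions each column would be a product of one-dimensional polynomials indexed by a multi-index; that extension is left out to keep the sketch short.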
Critical Problem
The sparsity of the gpc expansion is unknown a priori.
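Sparsity
Exact sparse. Example: a vector is called $s$-sparse if it has at most $s$ nonzero entries.
Nearly sparse: a vector is nearly sparse if, in a suitable norm, it is close to a sparse vector. Example: the best $s$-sparse approximation of the vector.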
Compressive Sensing in UQ
Classical works: Donoho, Candès, Tao, Romberg, Boyd 2004-2009.
Orthonormal polynomial systems: Rauhut and Ward 2012.
Application in UQ: Doostan and Owhadi 2010.
Bayesian model uncertainty method: Karagiannis, Lin 2014.
Mixed Shrinkage Prior procedure: Karagiannis, Konomi, Lin 2014.
Sampling strategy: Rauhut and Ward 2012; Yan, Guo and Xiu 2012; Xu and Zhou 2014; Hampton and Doostan 2015.
Enhancing sparsity: Candès, Wakin and Boyd 2008; Yang and Karniadakis 2013; Peng, Hampton and Doostan 2014.
Adaptive basis selection: Jakeman, Eldred and Sargsyan 2015.
Incorporating gradient information: Jakeman, Eldred and Sargsyan 2015; Lei, Yang, Zheng, Lin and Baker 2014; Peng, Hampton and Doostan 2015.
Increase the sparsity in the optimization
Reweighted $\ell_1$ minimization (Candès, Wakin and Boyd 2008; Yang and Karniadakis 2013).
It can be achieved iteratively.
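For reference, the reweighted $\ell_1$ scheme of Candès, Wakin and Boyd (2008) is commonly written as follows. The symbols $w$, $\delta$ and the iteration index $l$ are notation assumed here rather than taken from the slide; $\Psi$, $c$, $u$ and $\epsilon$ follow the compressive sensing slide.

```latex
\begin{align*}
c^{(l)} &= \arg\min_{c} \sum_{i} w_i^{(l)} |c_i|
  \quad \text{subject to} \quad \|\Psi c - u\|_2 \le \epsilon, \\
w_i^{(l+1)} &= \frac{1}{|c_i^{(l)}| + \delta},
  \qquad w_i^{(0)} = 1, \quad \delta > 0 .
\end{align*}
```

Entries that were small in the previous iterate receive large weights and are pushed harder toward zero, which is how the iteration promotes a sparser solution than a single $\ell_1$ solve.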
Increase the sparsity intrinsically
Difficulties
How to obtain the mapping? Understanding of the physical model.
How to compute the PDF of the new random variables? They may not be independent, which may be an issue when generating a new set of orthonormal polynomials.
Does the measurement matrix still have good properties?
A Special Case
We consider a special case: the random variables are i.i.d. Gaussian and the mapping is a rotation, so the rotated variables are still i.i.d. Gaussian.
The basis functions are Hermite polynomials.
The entries of the new measurement matrix are the values of Hermite polynomials at another set of input samples generated in the same manner (e.g., randn in MATLAB).
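A two-dimensional toy case shows how a rotation can make a Hermite expansion sparser. The target function below is hypothetical and chosen so that it depends on the inputs only through their sum; after a 45-degree rotation it depends on one variable only, and its expansion drops from four nonzero terms to two. The check compares both expansions with the target at random Gaussian points.

```python
import math
import random

def he2(x):
    """Probabilists' Hermite polynomial He_2(x) = x^2 - 1."""
    return x * x - 1.0

def target(xi1, xi2):
    """Hypothetical target: depends on xi only through xi1 + xi2."""
    return 0.5 * (xi1 + xi2) ** 2

def rotate(xi1, xi2, theta=math.pi / 4):
    """Rotate the inputs; for theta = pi/4, chi1 = (xi1 + xi2) / sqrt(2)."""
    c, s = math.cos(theta), math.sin(theta)
    return c * xi1 + s * xi2, -s * xi1 + c * xi2

def expansion_in_xi(xi1, xi2):
    # four nonzero Hermite terms: 1, He_2(xi1), He_1(xi1) * He_1(xi2), He_2(xi2)
    return 1.0 + 0.5 * he2(xi1) + xi1 * xi2 + 0.5 * he2(xi2)

def expansion_in_chi(chi1, chi2):
    # two nonzero Hermite terms: 1 and He_2(chi1); chi2 does not appear
    return 1.0 + he2(chi1)

random.seed(1)
max_gap = 0.0
for _ in range(1000):
    xi = (random.gauss(0, 1), random.gauss(0, 1))
    chi = rotate(*xi)
    x = target(*xi)
    max_gap = max(max_gap, abs(x - expansion_in_xi(*xi)), abs(x - expansion_in_chi(*chi)))
print(max_gap)   # round-off sized if both expansions are exact
```

Because the rotated variables are again i.i.d. standard normal, the same Hermite basis can be reused after the rotation, which is the point of the special case.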
Example
Rotation Matrix
Active subspace method by Constantine, Dow and Wang (2014).
Define the gradient matrix $G$ (outer product of the gradient), which is symmetric.
However, the target function, and hence its gradient, is unknown!
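As a rough two-dimensional sketch of the active subspace idea, the code below estimates the gradient matrix by Monte Carlo and extracts its dominant direction. It assumes the definition G = E[grad X grad X^T] and, purely for illustration, a hypothetical target whose gradient is known in closed form; in the setting of the slides the gradient is not available and would have to come from a surrogate.

```python
import math
import random

def grad_target(xi1, xi2):
    """Gradient of the hypothetical target X = 0.5 * (xi1 + xi2)^2."""
    g = xi1 + xi2
    return g, g

def gradient_matrix(n_samples, seed=0):
    """Monte Carlo estimate of G = E[grad X grad X^T] for xi ~ N(0, I) in 2-D."""
    rng = random.Random(seed)
    g11 = g12 = g22 = 0.0
    for _ in range(n_samples):
        gx, gy = grad_target(rng.gauss(0, 1), rng.gauss(0, 1))
        g11 += gx * gx
        g12 += gx * gy
        g22 += gy * gy
    return g11 / n_samples, g12 / n_samples, g22 / n_samples

def leading_direction(g11, g12, g22):
    """Angle of the dominant eigenvector of the symmetric matrix [[g11, g12], [g12, g22]]."""
    return 0.5 * math.atan2(2.0 * g12, g11 - g22)

g11, g12, g22 = gradient_matrix(10_000)
theta = leading_direction(g11, g12, g22)
print(math.degrees(theta))
```

The dominant direction comes out along (1, 1) / sqrt(2), so rotating the inputs by this angle aligns the first new variable with the only direction in which the target varies.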
Rotation matrix
Iteratively Construct Rotation Matrix
Given the input samples and the output samples, the rotation matrix can be obtained iteratively, over a number of iterations, to (possibly) improve its performance.
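Summary of the Algorithm
1. Generate the input samples and compute the corresponding output samples.
2. Set the iteration counter to zero and set the initial random variables to the original inputs.
3. Construct the measurement matrix from the current random variables, and compute the gpc coefficients with a compressive sensing method.
4. Compute the rotation matrix based on the resulting gpc expansion.
5. If the rotation matrix is close to an identity matrix or a permutation matrix, stop. Otherwise, apply the rotation to the current random variables, increment the iteration counter, and go to step 3.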
Solvent Accessible Surface Area of individual residues
Figure: Molecule Trypsin inhibitor (PDB code: 5pti) under equilibrium (labels: $C$, $C^{\{p\}}$).
Figure: Probability density function of the solvent accessible surface area (SASA) of the 14th residue. Curves: full correlation matrix; reduced correlation matrix; zero off-diagonal elements. Axes: SASA (horizontal), probability density (vertical).
Figure: gpc coefficients $c_\alpha$ with respect to $\xi$ and $\chi$. Axes: gpc basis index (horizontal), gpc coefficient (vertical).
Figure: Eigenvalues of the gradient matrix $G$ and the correlation matrix $C$. Axes: index (horizontal), normalized eigenvalue (vertical).
Relative $L_2$ error
We compute the relative $L_2$ error by
\[ \varepsilon = \sqrt{ \sum_{i=1}^{N_s} \left| X(\xi_i) - \tilde{X}(\xi_i) \right|^2 \Big/ \sum_{i=1}^{N_s} \left| X(\xi_i) \right|^2 }, \]
where $N_s$ - the number of sampling data ($N_s = 10^6$).
Figure (two panels): relative $L_2$ error versus number of samples. Symbols - gpc expansions $\tilde{X}(\chi)$ and $\tilde{X}(\xi)$ by compressive sensing method ($p = 2$ and $p = 3$). Dashed lines - sparse grid points on level 1 (55 samples) and level 2 (1513 samples).
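The error measure can be written as a short helper. The sketch assumes the square-root form of the relative L2 error (a ratio of Euclidean norms over the test samples); the function name and the two-point data set are hypothetical.

```python
import math

def relative_l2_error(exact, approx):
    """sqrt( sum_i |X_i - X~_i|^2 / sum_i |X_i|^2 ) over the test samples."""
    num = sum((x - y) ** 2 for x, y in zip(exact, approx))
    den = sum(x ** 2 for x in exact)
    return math.sqrt(num / den)

# Toy data (hypothetical): two reference values and a surrogate that misses one by 0.5.
exact = [3.0, 4.0]
approx = [3.0, 4.5]
err = relative_l2_error(exact, approx)
print(err)   # sqrt(0.25 / 25), i.e. 0.1 up to round-off
```

In the slides the sums run over a large independent test set, so the same helper would simply be fed the reference values and the surrogate predictions at those samples.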
Probability density function
Figure (a): Probability density function of the SASA. Curves: MC ($10^6$ samples); SP level 1 (55 samples); SP level 2 (1513 samples); CS (300 samples); MC (300 samples).
Figure (b): K-L divergence versus number of samples. Curves: SP level 1 (55 samples); MC (300 samples); MC (1200 samples); CS ($\chi$).
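For readers unfamiliar with the metric in panel (b), a discrete K-L divergence between two normalized histograms can be computed as below. The slides do not state how the densities were binned or which distribution serves as the reference, so the function and the toy histograms are assumptions for illustration only.

```python
import math

def kl_divergence(p, q):
    """Discrete K-L divergence sum_k p_k * log(p_k / q_k); bins with p_k = 0 contribute 0."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0.0)

# Toy normalized histograms (hypothetical): reference p and approximation q.
p = [0.5, 0.5]
q = [0.25, 0.75]
kl_pq = kl_divergence(p, q)
kl_pp = kl_divergence(p, p)
print(kl_pq, kl_pp)
```

The divergence is zero only when the two histograms coincide, and every bin where p is positive must also have positive q for the sum to be finite.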
Total SASA
Figure: Relative $L_2$ error of the total SASA by gpc expansion. Curves: CS ($\xi$); CS ($\chi$); SP level 1 (337 samples). Horizontal axis: number of samples.
Figure: Mean error of the surrogate model for the total SASA in different regimes. Curves: CS (800 samples); CS (1600 samples). Horizontal axis: SASA.
Summary
We proposed a framework based on gpc expansion to quantify the conformation-fluctuation-induced uncertainty in bio-molecular systems.
We proposed a method to enhance the sparsity of the gpc expansion, yielding a more accurate surrogate model.
This method is well suited to UQ studies of high-dimensional bio-molecular systems, where sample points are often accompanied by numerical error.
Acknowledgement
We acknowledge helpful discussions with T. Goddard, G. Karniadakis, W. Pan, G. Schenter, X. Wan, Z. Zhang, W. Zhou.
We acknowledge financial support from the DOE grant for the new Collaboratory on Mathematics for Mesoscopic Modeling of Materials.