De Novo Protein Structure Prediction

1 De Novo Protein Structure Prediction

2 Multiple Sequence Alignment: the sequences ACT, ATT, and TGG

3 Multiple Sequence Alignment: one alignment of ACT, ATT, and TGG is
ACT--
ATT--
--TGG

4 Multiple Sequence Alignment: What about the simple extension from 2D? For three sequences u, v, w there are seven possible endings (last columns) [figure: the seven columns built from u, v, w and gaps], and in general 2^k − 1 endings for k sequences. Why?
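A minimal sketch of where the count comes from, assuming a column is described only by which sequences contribute a residue (True) and which contribute a gap (False); the function name `possible_endings` is illustrative. Every pattern except the all-gap column is a legal last column.

```python
from itertools import product


def possible_endings(k):
    """Return every possible last column for an alignment of k sequences.

    Each entry is a tuple of k booleans: True means that sequence places its
    last residue in this column, False means it contributes a gap. The all-gap
    column is excluded because it aligns nothing.
    """
    return [col for col in product((True, False), repeat=k) if any(col)]


for k in (2, 3, 4):
    print(k, len(possible_endings(k)))
```

For k = 2, 3, 4 this prints 3, 7, and 15 endings.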

5 Multiple Sequence Alignment:
$$
s_{i,j,k} = \max \begin{cases}
s_{i-1,j-1,k-1} + \delta(u_i, v_j, w_k) \\
s_{i-1,j-1,k} + \delta(u_i, v_j, -) \\
s_{i-1,j,k-1} + \delta(u_i, -, w_k) \\
s_{i,j-1,k-1} + \delta(-, v_j, w_k) \\
s_{i-1,j,k} + \delta(u_i, -, -) \\
s_{i,j-1,k} + \delta(-, v_j, -) \\
s_{i,j,k-1} + \delta(-, -, w_k)
\end{cases}
$$
where δ(x, y, z) is an entry in the 3D scoring matrix. Time and space grow exponentially with the number of sequences.
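The seven-term recurrence can be sketched directly as a 3D dynamic program. This is an illustration under stated assumptions: the match/mismatch/gap values are hypothetical, and δ is built here as a sum of pairwise scores, which is only one possible choice for the 3D scoring matrix.

```python
from itertools import product

# Hypothetical scoring scheme (an assumption, not taken from the slides).
MATCH, MISMATCH, GAP = 1, -1, -2


def pair_score(a, b):
    """Score one aligned pair of symbols; '-' is a gap, and gap-gap scores 0."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def delta(x, y, z):
    """3D scoring-matrix entry, built here as a sum of pairs (one possible choice)."""
    return pair_score(x, y) + pair_score(x, z) + pair_score(y, z)


def align3(u, v, w):
    """Optimal score of a global alignment of u, v, w via the 7-term recurrence."""
    NEG = float("-inf")
    s = [[[NEG] * (len(w) + 1) for _ in range(len(v) + 1)] for _ in range(len(u) + 1)]
    s[0][0][0] = 0
    # Lexicographic order guarantees every predecessor cell is already filled.
    for i, j, k in product(range(len(u) + 1), range(len(v) + 1), range(len(w) + 1)):
        # Each move (di, dj, dk) != (0, 0, 0) consumes one residue from every
        # sequence whose flag is 1 and places a gap in the others.
        for di, dj, dk in product((0, 1), repeat=3):
            if (di, dj, dk) == (0, 0, 0) or i < di or j < dj or k < dk:
                continue
            x = u[i - 1] if di else "-"
            y = v[j - 1] if dj else "-"
            z = w[k - 1] if dk else "-"
            s[i][j][k] = max(s[i][j][k], s[i - di][j - dj][k - dk] + delta(x, y, z))
    return s[len(u)][len(v)][len(w)]


print(align3("ACT", "ATT", "TGG"))
```

Each cell looks at 7 predecessors, so three sequences of length n cost O(7 n^3); with k sequences the same scheme visits n^k cells with 2^k − 1 moves each.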

6 Scoring: Sum-of-Pairs Scoring (SP):
$$S(A) = \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} S(\bar{s}_i, \bar{s}_j)$$
where A is a multiple alignment, $\bar{s}_i$ is the projection of A onto sequence i (s_i with gaps), and $S(\bar{s}_i, \bar{s}_j)$ is the score of the induced pairwise alignment. Idea: a good multiple alignment should contain good pairwise alignments.
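A minimal sketch of SP scoring on the alignment of ACT, ATT, and TGG shown earlier. The scoring values are hypothetical; scoring gap-gap pairs as 0 is what makes each row pair behave like its induced pairwise alignment.

```python
from itertools import combinations

# Hypothetical scoring scheme (an assumption, not taken from the slides).
MATCH, MISMATCH, GAP = 1, -1, -2


def pair_score(a, b):
    if a == "-" and b == "-":
        return 0  # gap-gap columns vanish from the induced pairwise alignment
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def sp_score(alignment):
    """Sum-of-pairs score: add up the induced pairwise score of every pair of rows."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    return sum(pair_score(a, b)
               for row1, row2 in combinations(alignment, 2)
               for a, b in zip(row1, row2))


print(sp_score(["ACT--", "ATT--", "--TGG"]))  # -13
```

The three row pairs contribute 1, −7, and −7 under these scores.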

7 Pruning the DP Matrix: The dynamic programming matrix is large, but we only want the best alignment, and most matrix elements are not on that path. Can we direct the search to avoid evaluating cells that are provably not on the best path?
S_v: score of the best path from the start to v
F_v: upper bound on the score of the best path from v to the end
K: score of the best known alignment
What if S_v + F_v < K?

8 Pruning the DP Matrix: Sequences ARSTVK, ASVK, ARTR. Let v = (3, 2, 2). Then S_v is the score of the best alignment of ARS, AS, AR, and F_v is an upper bound on the score of aligning TVK, VK, TR. If S_v + F_v < K, then mark v as dead-ending (i.e., prune v).

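9 Pruning the DP Matrix: We know the alignment score satisfies
$$S(A) \le \sum_{k=1}^{m-1} \sum_{l=k+1}^{m} S(s_k, s_l)$$
Observation: $S(A) = \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} S(\bar{s}_i, \bar{s}_j)$ and $S(\bar{s}_i, \bar{s}_j) \le S(s_i, s_j)$, where $S(s_i, s_j)$ (without bars) is the optimal pairwise alignment score. So our bound can be:
$$F_v = \sum_{k=1}^{m-1} \sum_{l=k+1}^{m} S\big(s_k[v_k+1 \,..\, n_k],\; s_l[v_l+1 \,..\, n_l]\big)$$
Runtime for computing F (using dynamic programming): $O(n^2 m^2)$.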
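A sketch of the bound F_v for the ARSTVK, ASVK, ARTR example at v = (3, 2, 2), assuming the same hypothetical match/mismatch/gap scores as the SP example and a plain Needleman-Wunsch DP for the optimal pairwise scores. The names `future_bound` and `can_prune` are illustrative. This version recomputes each suffix pair from scratch; the O(n^2 m^2) figure on the slide comes from running one backward DP per sequence pair so that all suffix pairs are available at once.

```python
from itertools import combinations

MATCH, MISMATCH, GAP = 1, -1, -2  # hypothetical scores, as in the SP sketch


def pair_score(a, b):
    return MATCH if a == b else MISMATCH


def nw_score(a, b):
    """Optimal global pairwise alignment score (Needleman-Wunsch, linear gaps)."""
    prev = [j * GAP for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * GAP] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cur[j] = max(prev[j - 1] + pair_score(a[i - 1], b[j - 1]),
                         prev[j] + GAP,
                         cur[j - 1] + GAP)
        prev = cur
    return prev[-1]


def future_bound(seqs, v):
    """F_v: sum of optimal pairwise scores of the still-unaligned suffixes.

    v[k] counts residues of seqs[k] already consumed, so the suffix
    s_k[v_k+1..n_k] is seqs[k][v[k]:] in 0-based Python slicing.
    """
    return sum(nw_score(seqs[k][v[k]:], seqs[l][v[l]:])
               for k, l in combinations(range(len(seqs)), 2))


def can_prune(s_v, seqs, v, best_known):
    """Prune cell v if even an optimistic completion cannot reach K."""
    return s_v + future_bound(seqs, v) < best_known


seqs = ["ARSTVK", "ASVK", "ARTR"]
print(future_bound(seqs, (3, 2, 2)))  # -4
```

Here the suffix pairs TVK/VK, TVK/TR, and VK/TR score 0, −2, and −2, so any cell with S_v < K + 4 at v = (3, 2, 2) can be skipped.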

10 Backbone Native State: Each point on the energy landscape defines a conformation and associated energy. How many degrees of freedom should we have? How many do we want? A protein conformation can be represented by a vector of DOF choices, θ = (..., r_i, ...), with (potential) energy
$$E(\theta) = \sum_{i \ne j} E_{i,j} + \sum_i E_i$$

11 Backbone Native State: Each point on the energy landscape defines a conformation and associated energy. How many degrees of freedom should we have? How many do we want? A protein conformation can be represented by a vector of DOF choices, θ = (..., r_i, ...), and the conformation with minimum (potential) energy is
$$\theta^* = \arg\min_\theta E(\theta)$$

12 Primary Sequence Energy Function Conformation Space In order to apply a discrete optimization technique, we need a discretized search space!

13 Primary Sequence Energy Function Algorithm Conformation Space

14 (Homologous) Backbone Energy Function Algorithm Conformation Space

15 (Homologous) Backbone X-ray Data Algorithm Conformation Space

16 (Homologous) Backbone X-ray Data Energy Function Algorithm Conformation Space

17 (Homologous) Backbone X-ray Data Energy Function PDB Statistics Algorithm Conformation Space

18 (Homologous) Backbone NMR Data Energy Function PDB Statistics Algorithm Conformation Space

19 Prior Knowledge and Observations (Sequence/Fold, Energy Function, Statistics, Experimental Data); Conformation Space; Algorithm; Best-Fit Model (3D Structure, Backbone, Sidechains, Docking, Design)

20 Prior Knowledge and Observations (Sequence/Fold, Energy Function, PDB Statistics, Experimental Data): Fast (enough)? Accurate (enough) model? Conformation Space; Algorithm: Correct (enough) solution? Best-Fit Model (3D Structure, Backbone, Sidechains, Docking, Design)

21 Prior Knowledge and Observations (Sequence/Fold, Energy Function, Statistics, Experimental Data): Fast (enough)? We want $O(n^c)$, not $O(c^n)$. Conformation Space; Algorithm: Correct objective function? Guarantees on solution quality? Best-Fit Model (3D Structure, Backbone, Sidechains, Docking, Design)

22 Discretizing Sidechains
Table 1. Published rotamer libraries.
Authors | Year | Type of library | Number of proteins in library | Resolution (Å)
Chandrasekaran and Ramachandran [2] | 1970 | BBIND | 3 | NA
Janin et al. [4] | 1978 | BBIND, SSDEP | |
Bhat et al. [3] | 1979 | BBIND | 23 | NA
James and Sielecki [5] | 1983 | BBIND | 5 | 1.8, R-factor < 0.15
Benedetti et al. [6] | 1983 | BBIND | 238 peptides | R-factor < 0.10
Ponder and Richards [7] | 1987 | BBIND | |
McGregor et al. [8] | 1987 | SSDEP | |
Tuffery et al. [9] | 1991 | BBIND | |
Dunbrack and Karplus [10] | 1993 | BBIND, BBDEP | |
Schrauber et al. [11] | 1993 | BBIND, SSDEP | |
Kono and Doi [12] | 1996 | BBIND | 103 | NA
De Maeyer et al. [13] | 1995 | BBIND | |
Dunbrack and Cohen [14] | | BBIND, BBDEP | 850* | 1.7
Lovell et al. [15] | 2000 | BBIND, SSDEP | |
*Latest update, May … NA, not available. BBIND, backbone-independent; BBDEP, backbone-dependent; SSDEP, secondary-structure-dependent.
[Dunbrack, Rotamer Libraries in the 21st Century, 2002]

23 Energy Functions: Standard approaches (e.g., Amber, CHARMM, GROMACS) model potential energy as
$$E_{\text{total}} = E_{\text{bonded}} + E_{\text{nonbonded}}$$
where
$$E_{\text{bonded}} = E_{\text{bond}} + E_{\text{angle}} + E_{\text{dihedral}}, \qquad E_{\text{nonbonded}} = E_{\text{electrostatic}} + E_{\text{vdW}}$$
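A minimal sketch of typical functional forms behind E_bond and E_nonbonded. The harmonic bond, 12-6 Lennard-Jones, Coulomb terms, and Lorentz-Berthelot combining rule are common textbook choices assumed here; all parameter values are placeholders, not Amber, CHARMM, or GROMACS parameters, and the function names are illustrative.

```python
import math

# Placeholder forms and parameters for illustration only; real force fields
# use their own functional forms, units, exclusions, and parameter tables.


def bond_energy(r, k_b, r0):
    """Harmonic bond-stretch term k_b (r - r0)^2."""
    return k_b * (r - r0) ** 2


def lennard_jones(r, epsilon, sigma):
    """12-6 van der Waals term 4*eps*((sigma/r)^12 - (sigma/r)^6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def coulomb(r, q_i, q_j, k=332.0):
    """Electrostatic term k q_i q_j / r; k is roughly 332 in kcal/mol, Angstrom, e units."""
    return k * q_i * q_j / r


def nonbonded_energy(atoms):
    """Sum E_vdW + E_electrostatic over all atom pairs.

    atoms: list of (xyz, charge, epsilon, sigma). Pair parameters use the
    Lorentz-Berthelot rule (arithmetic-mean sigma, geometric-mean epsilon).
    Excluding or scaling bonded neighbours, as real force fields do, is omitted.
    """
    total = 0.0
    for a in range(len(atoms)):
        for b in range(a + 1, len(atoms)):
            (xa, qa, ea, sa), (xb, qb, eb, sb) = atoms[a], atoms[b]
            r = math.dist(xa, xb)
            total += lennard_jones(r, math.sqrt(ea * eb), 0.5 * (sa + sb))
            total += coulomb(r, qa, qb)
    return total
```

The Lennard-Jones term crosses zero at r = σ and reaches its minimum, −ε, at r = 2^(1/6) σ.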

25 (Homologous) Backbone Energy Function Algorithm Conformation Space

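26 [Figure: phenylalanine, with its side chain R.] For a protein with n amino acids, the protein backbone has 2n − 2 degrees of freedom (the φ and ψ dihedral angles of each residue, excluding the undefined φ of the first residue and ψ of the last, with the peptide bond ω held planar). Sidechain conformations are also defined by dihedral angles, but can be discretized by rotamers. [Shapovalov, Dunbrack 11]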

27 Native State Backbone: Each point on the energy landscape defines a conformation and associated energy. For sidechain placement, we have n degrees of freedom, one rotamer choice per residue. Each amino acid has a number of states equal to the number of rotamers for its type. A sidechain conformation can be represented by a vector of rotamer choices, θ = (..., i_r, ...), and the conformation with minimum (potential) energy is
$$\theta^* = \arg\min_\theta E(\theta)$$
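A brute-force sketch of θ* = argmin E(θ) over rotamer vectors, assuming energies split into self terms E(i_r) and pairwise terms E(i_r, j_s). The data layout (`self_e[i][r]`, `pair_e[(i, j)][r][s]` with i < j) and the toy numbers are hypothetical. It evaluates every vector, which is exactly the exponential cost the following slides try to avoid.

```python
from itertools import product


def total_energy(conf, self_e, pair_e):
    """E(theta) = sum_i E(i_r) + sum over interacting pairs of E(i_r, j_s).

    conf[i] is the rotamer index chosen at residue i; self_e[i][r] is E(i_r);
    pair_e[(i, j)][r][s] is E(i_r, j_s) for interacting residues i < j.
    """
    energy = sum(self_e[i][r] for i, r in enumerate(conf))
    for (i, j), table in pair_e.items():
        energy += table[conf[i]][conf[j]]
    return energy


def gmec_brute_force(self_e, pair_e):
    """Score every rotamer vector: prod_i |rotamers(i)| energy evaluations."""
    choices = [range(len(rotamers)) for rotamers in self_e]
    best = min(product(*choices), key=lambda c: total_energy(c, self_e, pair_e))
    return best, total_energy(best, self_e, pair_e)


# Hypothetical toy instance: 3 residues with 2, 2, and 3 rotamers.
self_e = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.2, 0.1]]
pair_e = {(0, 1): [[2.0, 0.0], [0.0, 1.0]],
          (1, 2): [[0.0, 0.0, 0.0], [1.0, -1.0, 0.0]]}
print(gmec_brute_force(self_e, pair_e))
```

On this instance the minimum is the vector (0, 1, 1) with energy −0.8, found after 2 × 2 × 3 = 12 evaluations.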

28 Dead End Elimination: One of the few deterministic, non-trivial, and effective combinatorial optimization algorithms in computational structural biology. It prunes rotamers that are provably NOT part of the GMEC (global minimum energy conformation). Used for side-chain placement (tertiary structure prediction) and protein design. Original DEE

29 Dead End Elimination: [Figure: total energy, with residues 1, 2, 3.]

30 Dead End Elimination: [Figure: total energy for rotamers i_r and i_t of residue i.]

33 Dead End Elimination, Original DEE (Simplified): [Figure: rotamers i_r and i_t of residue i, with the rotamers at residues 2 and 3 still undetermined.]

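34 Dead End Elimination, Original DEE (Simplified): [Figure: the minimum energy attainable with i_r is compared with the maximum energy attainable with i_t, over the rotamers at residues 2 and 3.]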

35 Dead End Elimination Original DEE (Simplified) Pierce, Spriet, Desmet, Mayo, JCC, 2000

36 Dead End Elimination Original DEE: Pierce, Spriet, Desmet, Mayo, JCC, 2000
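The min/max comparison pictured for the simplified original DEE is commonly written as: i_r is dead-ending if E(i_r) + Σ_{j≠i} min_s E(i_r, j_s) > E(i_t) + Σ_{j≠i} max_s E(i_t, j_s), i.e., the best case for i_r is worse than the worst case for i_t. A sketch under a hypothetical data layout (`self_e[i][r]`, `pair_e[(i, j)][r][s]` with i < j); the function names are illustrative.

```python
def pair_energy(pair_e, i, r, j, s):
    """E(i_r, j_s) from a table keyed by (i, j) with i < j; 0 if not interacting."""
    if (i, j) in pair_e:
        return pair_e[(i, j)][r][s]
    if (j, i) in pair_e:
        return pair_e[(j, i)][s][r]
    return 0.0


def original_dee_prunes(i, r, t, self_e, pair_e):
    """True if the best case for i_r is worse than the worst case for i_t."""
    others = [j for j in range(len(self_e)) if j != i]
    best_r = self_e[i][r] + sum(
        min(pair_energy(pair_e, i, r, j, s) for s in range(len(self_e[j])))
        for j in others)
    worst_t = self_e[i][t] + sum(
        max(pair_energy(pair_e, i, t, j, s) for s in range(len(self_e[j])))
        for j in others)
    return best_r > worst_t


# Hypothetical two-residue instance with two rotamers each.
self_e = [[5.0, 0.0], [0.0, 0.0]]
pair_e = {(0, 1): [[1.0, 2.0], [0.0, 1.0]]}
print(original_dee_prunes(0, 0, 1, self_e, pair_e))  # True: 6.0 > 1.0
```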

37 Dead End Elimination, Goldstein Criterion:
$$E(i_r) - E(i_t) + \sum_{j \ne i} \min_s \{E(i_r, j_s) - E(i_t, j_s)\} > 0$$
Pierce, Spriet, Desmet, Mayo, JCC, 2000
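A sketch of the Goldstein criterion applied iteratively, reusing the hypothetical data layout from the simplified original DEE discussion. After a rotamer is pruned it is dropped from the min over s in later passes; this stays sound because a pruned rotamer cannot appear in any GMEC. Function names are illustrative.

```python
def pair_energy(pair_e, i, r, j, s):
    """E(i_r, j_s) from a table keyed by (i, j) with i < j; 0 if not interacting."""
    if (i, j) in pair_e:
        return pair_e[(i, j)][r][s]
    if (j, i) in pair_e:
        return pair_e[(j, i)][s][r]
    return 0.0


def goldstein_prunes(i, r, t, self_e, pair_e, alive):
    """Goldstein test: swapping i_r for i_t lowers the energy in every context."""
    total = self_e[i][r] - self_e[i][t]
    for j, rotamers in enumerate(alive):
        if j != i:
            total += min(pair_energy(pair_e, i, r, j, s) - pair_energy(pair_e, i, t, j, s)
                         for s in rotamers)
    return total > 0


def goldstein_dee(self_e, pair_e):
    """Prune until no rotamer can be eliminated; return surviving rotamers per residue."""
    alive = [set(range(len(rots))) for rots in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(len(alive)):
            for r in sorted(alive[i]):
                if any(goldstein_prunes(i, r, t, self_e, pair_e, alive)
                       for t in alive[i] if t != r):
                    alive[i].discard(r)
                    changed = True
    return [sorted(a) for a in alive]


# Hypothetical two-residue instance (same numbers as the original-DEE sketch).
self_e = [[5.0, 0.0], [0.0, 0.0]]
pair_e = {(0, 1): [[1.0, 2.0], [0.0, 1.0]]}
print(goldstein_dee(self_e, pair_e))  # [[1], [0]]
```

Here the second pass removes rotamer 1 of residue 1, which only becomes provably suboptimal once rotamer 0 of residue 0 is gone; the survivors are exactly the GMEC.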

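39 Dead End Elimination, Generalized Goldstein Criterion:
$$E(i_r) - \sum_{t=1,\, t \ne r}^{T} C_t E(i_t) + \sum_{j \ne i} \min_s \Big\{E(i_r, j_s) - \sum_{t=1,\, t \ne r}^{T} C_t E(i_t, j_s)\Big\} > 0$$
where the weights satisfy C_t ≥ 0 and Σ_t C_t = 1.
Pierce, Spriet, Desmet, Mayo, JCC, 2000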

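40 Conformation Space: [Figure: the conformation space partitioned according to the rotamers k_a, k_b, k_c, k_d, k_e of a residue k.] The idea behind split DEE is that the conformation space can be partitioned to improve pruning. If a particular rotamer can be eliminated in every partition (possibly by a different competitor rotamer in each), then it is not in the GMEC.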

41 Dead End Elimination, Simple Split DEE (for each partition v of the split residue k):
$$E(i_r) - E(i_t) + \sum_{j \ne i,\, j \ne k} \min_s \{E(i_r, j_s) - E(i_t, j_s)\} + [E(i_r, k_v) - E(i_t, k_v)] > 0$$
Pierce, Spriet, Desmet, Mayo, JCC, 2000
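A sketch of simple split DEE under the same hypothetical data layout. The caller picks the split residue k (an assumption of this sketch; choosing a good k is part of the method in practice). Rotamer i_r is eliminated only if, in every partition v, at least one competitor i_t satisfies the split inequality.

```python
def pair_energy(pair_e, i, r, j, s):
    """E(i_r, j_s) from a table keyed by (i, j) with i < j; 0 if not interacting."""
    if (i, j) in pair_e:
        return pair_e[(i, j)][r][s]
    if (j, i) in pair_e:
        return pair_e[(j, i)][s][r]
    return 0.0


def split_prunes(i, r, k, self_e, pair_e):
    """Simple split DEE: can rotamer r of residue i be eliminated using split residue k?

    For each partition (rotamer v of residue k) some competitor t must satisfy
    E(i_r) - E(i_t) + sum_{j != i, k} min_s [E(i_r, j_s) - E(i_t, j_s)]
                    + [E(i_r, k_v) - E(i_t, k_v)] > 0.
    """
    assert k != i, "the split residue must differ from residue i"

    def diff(t, j, s):
        return pair_energy(pair_e, i, r, j, s) - pair_energy(pair_e, i, t, j, s)

    others = [j for j in range(len(self_e)) if j not in (i, k)]
    competitors = [t for t in range(len(self_e[i])) if t != r]
    for v in range(len(self_e[k])):
        if not any(
            self_e[i][r] - self_e[i][t]
            + sum(min(diff(t, j, s) for s in range(len(self_e[j]))) for j in others)
            + diff(t, k, v) > 0
            for t in competitors
        ):
            return False  # no competitor eliminates i_r in this partition
    return True


# Hypothetical instance: competitor 1 wins when k = 0, competitor 2 when k = 1.
self_e = [[0.0, 0.0, 0.0], [0.0, 0.0]]
pair_e = {(0, 1): [[0.0, 0.0], [-1.0, 5.0], [5.0, -1.0]]}
print(split_prunes(0, 0, 1, self_e, pair_e))  # True
```

On this instance the Goldstein criterion fails for rotamer 0 of residue 0 against either competitor (the min term is −5), yet split DEE eliminates it because each partition has its own winning competitor.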

42 TABLE II. CPU Minutes Consumed Using Goldstein (T = 1) DEE, Split (s = 1) DEE, and Split (s = 2_mb) DEE for Each of Three Test Cases. Columns: Case, Method, (T = 1) time, (s = 1) time, (s = 2_mb) time, Doubles time, Total time; each of cases 1 to 3 is run with Goldstein (T = 1), Split (s = 1), and Split (s = 2_mb). [Numeric values not recoverable from the transcription.] a: Failed to converge due to combinatorial explosion in the number of superrotamers created by unification.
FIGURE 5. Plastocyanin core design (the two split methods are indistinguishable). FIGURE 6. Protein G core/boundary design. FIGURE 7. Protein G surface design.

43 Extensions and Results: Sidechain placement vs. design: is there a difference? DEE can be an extremely powerful pruning strategy, but what do we do in cases where the conformation space remains large? Can we do better than enumerating conformations exhaustively?

44 Conformation Space: [Figure: two residues that are far apart, each with rotamers 1, 2, 3.] Explicit: (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3). An explicit representation considers all possible conformations individually, leading to an exponentially sized conformation space.

45 Factorized Conformation Space: [Figure: the same two residues, far apart.] Implicit: (1, 2, 3), (1, 2, 3), one list of rotamer choices per residue. Rather than defining the conformation space explicitly, by considering only local interactions we can obtain a compact, factored representation of the conformation space.
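A small sketch of the size difference, assuming residues that are far apart (so no pairwise tables are needed). The explicit space is the product of rotamer counts while the factored one stores only the per-residue lists; interacting residues would additionally need r_i × r_j pairwise tables. Function names are illustrative.

```python
from math import prod


def explicit_size(rotamer_counts):
    """Number of conformations listed one by one: prod_i r_i."""
    return prod(rotamer_counts)


def factored_size(rotamer_counts):
    """Number of stored per-residue rotamer entries: sum_i r_i."""
    return sum(rotamer_counts)


print(explicit_size([3, 3]), factored_size([3, 3]))          # 9 6
print(explicit_size([3] * 20), factored_size([3] * 20))      # 3486784401 60
```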

46 Factorized Conformation Space: [Figure: residues i and j, each with rotamers 1, 2, 3, linked by the terms E(i, j), E(i, i′), and E(j, j′).] Implicit: (1, 2, 3), (1, 2, 3). Rather than defining the conformation space explicitly, by considering only local interactions we can obtain a compact, factored representation of the conformation space.

47 Factorized Conformation Space Rather than defining the conformation space explicitly, by considering only local interactions we can obtain a compact, factored representation of the conformation space.

48 Protein Interaction Graph We can construct an interaction graph of residues in which edges are defined for residues that are close enough. These define pairwise energy terms for any chosen pair of rotamers.

49 Protein Interaction Graph The sidechain placement problem is then to select rotamers at each position so as to minimize the sum over all edges of interaction energies.
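A sketch of building the interaction graph and scoring a rotamer choice over its edges only. Assumptions: each residue is represented by one point (e.g., a representative side-chain atom), "close enough" means within a distance cutoff, and the coordinates, cutoff, and energies below are hypothetical.

```python
import math
from itertools import combinations


def interaction_graph(coords, cutoff):
    """Edges (i, j), i < j, between residues whose representative points are within cutoff."""
    return [(i, j) for i, j in combinations(range(len(coords)), 2)
            if math.dist(coords[i], coords[j]) <= cutoff]


def placement_energy(conf, self_e, pair_e, edges):
    """Self energies plus pairwise energies summed over graph edges only."""
    energy = sum(self_e[i][r] for i, r in enumerate(conf))
    for i, j in edges:
        energy += pair_e[(i, j)][conf[i]][conf[j]]
    return energy


# Hypothetical coordinates (Angstrom) and an 8 Angstrom cutoff.
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0), (24.0, 0.0, 0.0)]
edges = interaction_graph(coords, cutoff=8.0)
print(edges)  # [(0, 1), (2, 3)]
```

Because residues 1 and 2 are 15 Å apart, the graph splits into two independent components, so their rotamers can be optimized separately.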

50 Linear Programming: LP Solver (input A, b, c)
Minimize $\sum_i c_i x_i$ subject to $A x \le b$, over $x_1, x_2, \dots, x_n$.
Linear programming is a general-purpose tool for optimizing a linear objective function under linear constraints. The general problem of linear programming is polynomial-time solvable.

51 Linear Programming: ILP Solver (input A, b, c)
Minimize $\sum_i c_i x_i$ subject to $A x \le b$ and $x_i \in \mathbb{Z}$, over $x_1, x_2, \dots, x_n$.
Linear programming is a general-purpose tool for optimizing a linear objective function under linear constraints. Integer linear programming is not known to be polynomial-time solvable (it is NP-hard).

52 Linear Programming: ILP Solver (input A, b, c)
Minimize $\sum_i c_i x_i$ subject to $A x \le b$ and $x_i \in \{0, 1\}$, over $x_1, x_2, \dots, x_n$.
Linear programming is a general-purpose tool for optimizing a linear objective function under linear constraints. Integer linear programming is not known to be polynomial-time solvable.

53 [Figure from Wikipedia: the feasible region of a set of linear constraints.] The feasible region of a set of constraints is the set of all points that satisfy the constraints. LP solvers search this space of solutions for a point that optimizes (minimizes or maximizes) the objective function.

54 [Figure from Wikipedia: the feasible region of a set of linear constraints.] Fact: If an optimal solution exists, some vertex of the feasible region is optimal. Fact: A vertex is optimal if there is no better neighboring vertex. Dantzig (1947) came up with the simplex algorithm:
Set v = any vertex
While a neighbor vertex v′ has better cost:
    v = v′
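A brute-force illustration of the first fact in two dimensions: intersect every pair of constraint boundaries, keep the feasible intersection points (the vertices), and return the best one. This is plain vertex enumeration, not the simplex method, which instead walks from a vertex to a better neighboring vertex. It assumes an optimum exists and the region has at least one vertex; the function name and example are hypothetical.

```python
from itertools import combinations


def lp_min_2d(c, A, b, eps=1e-9):
    """Minimize c.x subject to A x <= b for x in R^2 by checking every vertex."""
    best = None
    for (a1, b1), (a2, b2) in combinations(zip(A, b), 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < eps:
            continue  # parallel boundary lines: no unique intersection
        # Cramer's rule for the intersection of the two boundary lines.
        x = ((b1 * a2[1] - a1[1] * b2) / det, (a1[0] * b2 - b1 * a2[0]) / det)
        if all(row[0] * x[0] + row[1] * x[1] <= rhs + eps for row, rhs in zip(A, b)):
            value = c[0] * x[0] + c[1] * x[1]
            if best is None or value < best[0]:
                best = (value, x)
    return best


# Minimize -x - y subject to x <= 2, y <= 3, x + 2y <= 6, x >= 0, y >= 0.
A = [(1, 0), (0, 1), (1, 2), (-1, 0), (0, -1)]
b = [2, 3, 6, 0, 0]
print(lp_min_2d((-1, -1), A, b))  # (-4.0, (2.0, 2.0))
```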

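55 LP for Sidechain Placement
$$
\begin{aligned}
\text{minimize}\quad & E = \sum_{u \in V} E_{uu} x_{uu} + \sum_{\{u,v\} \in D} E_{uv} x_{uv} \\
\text{subject to}\quad & \sum_{u \in V_j} x_{uu} = 1, && j = 1, \dots, p \\
& \sum_{u \in V_j} x_{uv} = x_{vv}, && j = 1, \dots, p \text{ and } v \in V \setminus V_j \\
& x_{uu}, x_{uv} \in \{0, 1\} && \text{(IP1)}
\end{aligned}
$$
[Kingsford et al. 05]
Here V_j is the set of rotamers at residue j (of p residues), V is the set of all rotamers, x_uu = 1 selects rotamer u, and x_uv = 1 selects the rotamer pair {u, v}. Integer linear programming (ILP) specifies linear constraints on a set of variables and a linear cost function. The goal is to minimize the cost (determined by the variable choices) while satisfying the constraints. ILP does not care about the energy function, or about the fact that the interaction graph comes from a protein structure.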
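To see that IP1's constraints encode exactly "one rotamer per residue, with pair variables consistent", here is a brute-force check on toy instances: every 0/1 assignment of the x_uu and x_uv variables is tried, and the best feasible one should match the GMEC. Assumptions: D is taken to be every pair of rotamers at different residues, energies use the hypothetical `self_e` / `pair_e` layout from earlier sketches, and no ILP solver is involved.

```python
from itertools import product


def solve_ip1_brute_force(self_e, pair_e):
    """Minimize the IP1 objective over all 0/1 assignments (toy sizes only).

    Rotamer u = (j, r) is rotamer r of residue j, so V_j = {(j, r)}. D is taken
    to be every pair of rotamers at different residues; pair_e[(i, j)][r][s]
    (i < j) gives E_uv, and missing residue pairs have zero energy.
    """
    rotamers = [(j, r) for j, rots in enumerate(self_e) for r in range(len(rots))]
    pairs = [(u, v) for a, u in enumerate(rotamers) for v in rotamers[a + 1:]
             if u[0] != v[0]]

    def e_pair(u, v):
        table = pair_e.get((u[0], v[0]))
        return 0.0 if table is None else table[u[1]][v[1]]

    def key(u, v):
        return (u, v) if u < v else (v, u)

    def feasible(x_node, x_pair):
        for j in range(len(self_e)):
            rotamers_j = [u for u in rotamers if u[0] == j]
            if sum(x_node[u] for u in rotamers_j) != 1:      # one rotamer per residue
                return False
            for v in rotamers:
                if v[0] != j and sum(x_pair[key(u, v)] for u in rotamers_j) != x_node[v]:
                    return False                              # pairs agree with nodes
        return True

    best = None
    for bits in product((0, 1), repeat=len(rotamers) + len(pairs)):
        x_node = dict(zip(rotamers, bits[:len(rotamers)]))
        x_pair = dict(zip(pairs, bits[len(rotamers):]))
        if not feasible(x_node, x_pair):
            continue
        energy = (sum(self_e[j][r] * x_node[(j, r)] for j, r in rotamers)
                  + sum(e_pair(u, v) * x_pair[(u, v)] for u, v in pairs))
        if best is None or energy < best[0]:
            best = (energy, [u for u in rotamers if x_node[u]])
    return best


print(solve_ip1_brute_force([[0.0, 1.0], [0.5, 0.0]], {(0, 1): [[2.0, 0.0], [0.0, 1.0]]}))
```

The only feasible assignments are the ones where x_uv = x_uu · x_vv, so the minimum, (0.0, [(0, 0), (1, 1)]), coincides with the GMEC; an actual ILP solver would find the same optimum without enumeration.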

56 LP for Sidechain Placement
$$
\begin{aligned}
\text{minimize}\quad & E = \sum_{u \in V} E_{uu} x_{uu} + \sum_{\{u,v\} \in D} E_{uv} x_{uv} \\
\text{subject to}\quad & \sum_{u \in V_j} x_{uu} = 1, && j = 1, \dots, p \\
& \sum_{u \in V_j} x_{uv} = x_{vv}, && j = 1, \dots, p \text{ and } v \in N^+(V_j) \\
& \sum_{u \in V_j:\, E_{uv} < 0} x_{uv} \le x_{vv}, && j = 1, \dots, p \text{ and } v \in N^+(V_j) \\
& x_{uu}, x_{uv} \in \{0, 1\} && \text{(IP2)}
\end{aligned}
$$
[Kingsford et al. 05]
One simple optimization is to include only rotamer pairs that interact with a non-zero pairwise energy. These pairs can be precomputed ahead of time, which reduces the number of constraints.

57 LP for Sidechain Placement: IP2 with the integrality constraints $x_{uu}, x_{uv} \in \{0, 1\}$ replaced by $0 \le x_{uu}, x_{uv} \le 1$. What does it mean for the integrality constraints to be relaxed?

58 LP for Sidechain Placement: IP2 with the integrality constraints relaxed to $0 \le x_{uu}, x_{uv} \le 1$. Relaxing the integrality constraints allows the application of a polynomial-time algorithm for finding an optimal solution for the given set of constraints and objective. What does it mean to have a fractional solution?

59 LP for Sidechain Placement: IP2 (integral, or relaxed to $0 \le x_{uu}, x_{uv} \le 1$), plus the constraints
$$\sum_{u \in S_k} x_{uu} \le p - 1 \quad \text{for } k = 1, \dots, m - 1,$$
where S_k is the set of rotamers chosen in the k-th previously identified solution. We can also run this method iteratively, excluding previously identified minimum-energy conformations from being selected.

60 Table 3. Prediction of side-chain conformations on native backbones, with a comparison of the LP/ILP prediction with those of other methods and the crystal structure.
 | Core residues | All residues
(a) LP/ILP χ1/χ1+2 | …/62% | 80%/51%
(b) Scwrl χ1/χ1+2 | …/60% | 80%/49%
(c) LP/ILP rmsd | … Å | … Å
(d) Scwrl rmsd | … Å | … Å
All values are averaged over the 25 proteins of Table 1. (a) The percentage of residues over all proteins for which the LP/ILP predicted conformation has the χ1 and χ1+2 dihedral angles within 20° of the native structure; (b) these values for Scwrl; (c) the rmsd of the predicted side-chain conformations from those of the native side chains using the LP/ILP method; and (d) these values for Scwrl.
Table 4. Prediction of side-chain conformations using homology modeling, with a comparison of the LP/ILP prediction with those of other methods and the crystal structure.
 | Core residues (Å) | All residues (Å)
(a) LP/ILP rmsd | … | …
(b) Scwrl rmsd | … | …
(c) Backbone rmsd | … | …
All values are averaged over the 33 problems of Table 2. (a) The rmsd between just side-chain atoms when comparing the LP/ILP predicted structure with the crystal structure; (b) this value when comparing the Scwrl predictions with the native structure; and (c) the rmsd between template and target structures when only considering backbone atoms.

61 Table 5. Proteins for which the core was redesigned. Columns: Prot., Var, len, Rot, Size, Time (ILP), Rel gap, N. [Per-protein values are not recoverable from the transcription; the instances include 1aac, 1aho, 1b9o, 1c9o, 1cex, 1ctj, 1igd, 1mfm, and 2pth, and several rows list the relative gap as Integral.]
Fig. 2. Relative gap between the optimal solution (with value OPT) and the nine next lowest-energy solutions (where the i-th solution has value x_i), shown for instances 1aac, 1aho, 1ctj, 1igd, and 1cex. Inset shows relative gaps for the 100 lowest-energy solutions for 1aac. The relative gap at each iteration i is defined as 100 (OPT − x_i)/OPT.

62 Factorized Conformation Space: [Figure: residues i and j, each with rotamers 1, 2, 3, linked by the terms E(i, j), E(i, i′), and E(j, j′).] Implicit: (1, 2, 3), (1, 2, 3). Rather than defining the conformation space explicitly, by considering only local interactions we can obtain a compact, factored representation of the conformation space.

63 Factorized Conformation Space Rather than defining the conformation space explicitly, by considering only local interactions we can obtain a compact, factored representation of the conformation space.

64 Protein Interaction Graph We can construct an interaction graph of residues in which edges are defined for residues that are close enough. These define pairwise energy terms for any chosen pair of rotamers.

65 Factor Graphs: [Figure: a factor graph with variables u, v, x, y, z and factors f_1 to f_5.] Suppose we know that the likelihood function is g(u, v, x, y, z) = f_1(u, v) f_2(v, x) f_3(x, y) f_4(y, z) f_5(z).
MAP configuration: find the configuration of variables that maximizes g.
Marginalization: find the marginal value of g on a particular variable. For example:
$$g_z(z) = \sum_{u,v,x,y} f_1(u, v) f_2(v, x) f_3(x, y) f_4(y, z) f_5(z)$$
[Loeliger et al. 01] [Pearl 88, Jordan ...]
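A sketch of computing the marginal g_z by pushing each sum inside the product (sum out u, then v, x, y), checked against brute-force enumeration. The binary factor tables are hypothetical, and the function names are illustrative.

```python
from itertools import product


def marginal_z(f1, f2, f3, f4, f5):
    """g_z(z) for g = f1(u,v) f2(v,x) f3(x,y) f4(y,z) f5(z), summing out u, v, x, y in turn.

    Factors are nested lists indexed by state: f1[u][v], f2[v][x], ..., f5[z].
    Each step is a d x d matrix-vector product, so the cost is O(n d^2).
    """
    m_v = [sum(f1[u][v] for u in range(len(f1))) for v in range(len(f1[0]))]
    m_x = [sum(m_v[v] * f2[v][x] for v in range(len(f2))) for x in range(len(f2[0]))]
    m_y = [sum(m_x[x] * f3[x][y] for x in range(len(f3))) for y in range(len(f3[0]))]
    return [f5[z] * sum(m_y[y] * f4[y][z] for y in range(len(f4)))
            for z in range(len(f4[0]))]


def marginal_z_brute_force(f1, f2, f3, f4, f5):
    """The same marginal by explicit enumeration of all d^5 configurations."""
    g = [0.0] * len(f5)
    for u, v, x, y, z in product(range(len(f1)), range(len(f2)), range(len(f3)),
                                 range(len(f4)), range(len(f5))):
        g[z] += f1[u][v] * f2[v][x] * f3[x][y] * f4[y][z] * f5[z]
    return g


# Hypothetical binary factors.
f1 = [[1.0, 2.0], [0.5, 1.0]]
f2 = [[1.0, 0.1], [0.3, 2.0]]
f3 = [[2.0, 1.0], [1.0, 3.0]]
f4 = [[1.0, 0.5], [0.2, 1.0]]
f5 = [1.0, 2.0]
print(marginal_z(f1, f2, f3, f4, f5))
print(marginal_z_brute_force(f1, f2, f3, f4, f5))
```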

66 Factor Graphs: [Figure: the factor graph with variables u, v, x, y, z and factors f_1 to f_5.] Suppose we know that the likelihood function is g(u, v, x, y, z) = f_1(u, v) f_2(v, x) f_3(x, y) f_4(y, z) f_5(z). MAP configuration: find the configuration of variables that maximizes g. [Loeliger et al. 01] [Pearl 88, Jordan ...] Here, variables take on a fixed number of states, and factors define local interactions.

67 Factor Graphs: Suppose we know that the likelihood function is g(u, v, x, y, z) = f_1(u, v) f_2(v, x) f_3(x, y) f_4(y, z) f_5(z). MAP configuration: find the configuration of variables that maximizes g. Because g factorizes, each maximization can be pushed inside the product:
$$\max_{\langle u,v,x,y,z \rangle} g(u, v, x, y, z) = \max_z f_5(z) \max_y f_4(y, z) \max_x f_3(x, y) \max_v f_2(v, x) \max_u f_1(u, v)$$
[Loeliger et al. 01] [Pearl 88, Jordan ...] We can define likelihoods using the Boltzmann distribution: $\Pr[\theta] \propto e^{-E(\theta)}$.

68 [Figure: a protein factor graph for 1UBQ (ubiquitin).] To construct a protein factor graph, we take each amino acid in the primary sequence as a variable and its sidechain rotamers as states. Univariate and bivariate factors are defined using self- and pairwise energies (i.e., probabilities via the Boltzmann distribution $\Pr[\theta] \propto e^{-E(\theta)}$). A MAP configuration corresponds to a minimum-energy conformation. The model (with appropriate parameters) can be used to analyze protein energetics [Yanover/Weiss 02, Xu 05, Kamisetty et al 07, 11].

69 Max-Product Algorithm: Maximization can be computed by message passing:
$$\mu_{f_j \to x_i}(x_i) = \max_{X_j \setminus x_i} f_j(X_j) \prod_{x \in X_j \setminus x_i} \mu_{x \to f_j}(x)$$
where X_j is the set of variables attached to factor f_j. Once all messages have been passed, we can assign a maximizing configuration starting at leaf factors. For the example chain, the messages compute
$$\max_z f_5(z) \max_y f_4(y, z) \max_x f_3(x, y) \max_v f_2(v, x) \max_u f_1(u, v)$$
[Pearl 88]
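A sketch of max-product on the example chain, with back-pointers so the maximizing configuration can be read off after the forward pass; it is checked against exhaustive search. With factors of the form e^{−E}, the same recursion finds a minimum-energy conformation. The chain layout (`pair_factors[k][a][b]` on (x_k, x_{k+1}) plus a unary factor on the last variable) and the numbers are assumptions for illustration.

```python
from itertools import product


def chain_max_product(pair_factors, last_unary):
    """MAP value and configuration of a chain x_0 - x_1 - ... - x_n.

    pair_factors[k][a][b] is the factor on (x_k = a, x_{k+1} = b); last_unary[b]
    is a unary factor on the final variable (f_5(z) in the slides).
    """
    msg = [1.0] * len(pair_factors[0])   # nothing flows into x_0 yet
    back = []                            # back[k][b] = best x_k given x_{k+1} = b
    for f in pair_factors:
        new_msg, arg = [], []
        for b in range(len(f[0])):
            best_a = max(range(len(msg)), key=lambda a, b=b: msg[a] * f[a][b])
            new_msg.append(msg[best_a] * f[best_a][b])
            arg.append(best_a)
        msg = new_msg
        back.append(arg)
    scores = [m * w for m, w in zip(msg, last_unary)]
    last = max(range(len(scores)), key=scores.__getitem__)
    config = [last]
    for arg in reversed(back):           # backtrack from the last variable to x_0
        config.append(arg[config[-1]])
    return scores[last], config[::-1]


def chain_score(pair_factors, last_unary, config):
    score = last_unary[config[-1]]
    for k, f in enumerate(pair_factors):
        score *= f[config[k]][config[k + 1]]
    return score


def brute_force_map(pair_factors, last_unary):
    domains = [range(len(f)) for f in pair_factors] + [range(len(last_unary))]
    best = max(product(*domains), key=lambda c: chain_score(pair_factors, last_unary, c))
    return chain_score(pair_factors, last_unary, best), list(best)


# Hypothetical binary factors f_1..f_4 on (u,v), (v,x), (x,y), (y,z), and f_5 on z.
factors = [[[1.0, 2.0], [0.5, 1.0]], [[1.0, 0.1], [0.3, 2.0]],
           [[2.0, 1.0], [1.0, 3.0]], [[1.0, 0.5], [0.2, 1.0]]]
f5 = [1.0, 2.0]
print(chain_max_product(factors, f5))
print(brute_force_map(factors, f5))
```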

70 Dealing with Cycles: [Figure: the example factor graph with an added cycle.] [Pearl 88, Yedidia et al.] Computing marginals or MAP configurations exactly in a model with cycles is NP-hard in general. However, we can still use the sum-product algorithm in two ways: (1) collapse multiple variables into a single variable to eliminate cycles, or (2) run sum-product as before, but until convergence. One method is exact, while the other is approximate.

71 Dealing with Cycles: [Figure: the variables on the cycle grouped into a single variable.] [Pearl 88, Yedidia et al.] Computing marginals or MAP configurations exactly in a model with cycles is NP-hard in general. However, we can still use the sum-product algorithm in two ways: (1) collapse multiple variables into a single variable to eliminate cycles, or (2) run sum-product as before, but until convergence. Variable/factor grouping must be chosen carefully to avoid state-space explosion.

72 Unfortunately exact methods are prohibitively expensive if we consider longer-range interactions. We can approximate by stopping message passing near (or at) convergence.

73 Tree Decomposition: Fig. 1. Example of a residue interaction graph. Fig. 2. Example of the biconnected-component decomposition of a graph. The width of this decomposition is 6. Given a factor graph, we can reorganize it as long as we don't lose any dependencies. But we don't want to add too many unnecessary ones either.

74 Tree Decomposition: Fig. 3. Example of a tree decomposition of a graph with width 3 (subsets fh, abd, acd, cdem, defm, fg, clk, eij). Given a factor graph, we can reorganize it as long as we don't lose any dependencies. But we don't want to add too many unnecessary ones either. In general, which trees capture the original graph, and how can we measure how good a particular tree is?

75 Tree Decomposition: A tree decomposition is a tree on vertex subsets X_i that satisfies the following:
1. The union of all vertex subsets X_i equals the original vertex set.
2. For any edge (u, v) in the original graph, there is some X_i with u, v ∈ X_i.
3. If v ∈ X_i and v ∈ X_j, then v ∈ X_k for all X_k on the path between X_i and X_j.
The width of a tree decomposition is the size of its largest subset minus one.
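A sketch of a checker for the three properties, plus the width. Assumptions: subsets ("bags") are given as a dict from bag id to a set of vertices, and `tree_edges` is assumed to already form a tree (this is not verified). The function names and the 4-cycle example are hypothetical.

```python
from collections import deque


def is_tree_decomposition(vertices, edges, bags, tree_edges):
    """Check the three tree-decomposition properties.

    bags: dict bag_id -> set of graph vertices (the X_i);
    tree_edges: list of (bag_id, bag_id) pairs, assumed to form a tree.
    """
    # 1. The union of the bags is exactly the vertex set.
    if set().union(*bags.values()) != set(vertices):
        return False
    # 2. Every graph edge lies inside some bag.
    if not all(any({u, v} <= bag for bag in bags.values()) for u, v in edges):
        return False
    # 3. The bags containing each vertex form a connected subtree.
    adj = {i: [] for i in bags}
    for a, b in tree_edges:
        adj[a].append(b)
        adj[b].append(a)
    for v in vertices:
        holding = {i for i, bag in bags.items() if v in bag}
        start = next(iter(holding))
        seen, queue = {start}, deque([start])
        while queue:
            for nxt in adj[queue.popleft()]:
                if nxt in holding and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        if seen != holding:
            return False
    return True


def width(bags):
    """Width = size of the largest bag minus one."""
    return max(len(bag) for bag in bags.values()) - 1


# Hypothetical example: the 4-cycle a-b-c-d-a.
cycle = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
good = {0: {"a", "b", "c"}, 1: {"a", "c", "d"}}
print(is_tree_decomposition("abcd", cycle, good, [(0, 1)]), width(good))  # True 2
```

Splitting the cycle into a path of single edges {a,b}, {b,c}, {c,d}, {d,a} fails property 3, because the two bags containing a are not connected through bags that also contain a.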

76 Standard Applications Stereo Vision [Sontag 10] Signal Processing Coding [Söding 05] [McEliece et al. 98]

77 General Graphs: [Figure: a loopy graph and the junction tree obtained from it by tree decomposition (finding an optimal tree decomposition is NP-hard).] To deal with graphs with cycles, we group variables such that the original likelihood function is unchanged but we obtain a tree-structured model. If this junction tree has treewidth w and each variable has d states, sum-product requires $O(n\, d^{w+1})$ time.

78 Sum-Product is Fragile: [Figure: a factor of the tree-structured example factor graph is updated.] Updating a tree-structured factor graph can change messages in an execution of the sum-product algorithm. [Figure: the edge (u, v) is added to the input graph.] Adding a cycle to the input graph can change nodes in the junction tree (and the associated factor graph).

79 Clustering in Factor Graphs: [Figure: one round of rake and compress on the example factor graph, producing cluster functions.] In each round of clustering, we rake all leaves and compress a maximal independent set of degree-two nodes [Miller/Reif 84], while computing cluster functions.

80 Tree Contraction: [Figure: rake, compress, and finalize steps on the example factor graph; each new cluster stores a cluster function obtained by summing the absorbed factors over the eliminated variables, e.g., a function of y after u, v, and x are eliminated.] How long do intermediate cluster function computations take? How many rounds until everything is eliminated?

81 Cluster Tree: [Figure: the cluster tree produced by contracting the example factor graph, built in $O(n\, d^3)$ time.] We also keep track of the boundaries, defined as the set of edges leaving a cluster at the time of its creation during contraction.

82 Computing Marginals: [Figure: messages passed along the cluster tree to compute a marginal.] Any marginal can be computed in $O(d^2 \log n)$ time.

83 Dealing with Cycles: [Figure: the example factor graph with an added cycle.] [Pearl 88, Yedidia et al.] Computing marginals or MAP configurations exactly in a model with cycles is NP-hard in general. However, we can still use message passing in two ways: (1) collapse multiple variables into a single variable to eliminate cycles, or (2) run max-product (or sum-product) as before, but until convergence. The focus of research in approximate methods is on improving convergence times.

84 Message Passing and Free Energy: We have been trying to minimize the potential energy of a protein conformation. But given that proteins exist in an ensemble of conformations, what do we minimize? The free energy of a protein is defined as
$$G = H - TS$$
where H is the enthalpy of the system, T is the temperature, and S is the entropy of the system. How does this relate to graphical models? We can define
$$G = \sum_\theta p(\theta) E(\theta) + T \sum_\theta p(\theta) \ln p(\theta) = -T \ln Z,$$
where $p(\theta) = e^{-E(\theta)/T}/Z$ (in units with $k_B = 1$). Here, Z is the normalizing constant, or partition function.
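A small numerical check of G = −T ln Z for the Boltzmann distribution over a handful of conformations. The energies are hypothetical, k_B is set to 1, and the function names are illustrative.

```python
import math


def boltzmann(energies, T=1.0):
    """Return (p, Z) with p(theta) = exp(-E(theta)/T) / Z (units with k_B = 1)."""
    weights = [math.exp(-e / T) for e in energies]
    Z = sum(weights)
    return [w / Z for w in weights], Z


def free_energy(p, energies, T=1.0):
    """G = sum p E + T sum p ln p (0 ln 0 is taken as 0)."""
    avg_energy = sum(pi * e for pi, e in zip(p, energies))
    neg_entropy = sum(pi * math.log(pi) for pi in p if pi > 0)
    return avg_energy + T * neg_entropy


energies = [-2.0, -1.5, 0.0, 1.0]   # hypothetical conformation energies
for T in (0.5, 1.0, 2.0):
    p, Z = boltzmann(energies, T)
    print(T, free_energy(p, energies, T), -T * math.log(Z))
```

For each temperature the two printed values agree; any other distribution plugged into `free_energy` gives a value at least as large.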

85 Approximate Inference: Can we simplify the model in order to make it tractable? How do we do this? What can we say about the associated global likelihood? We'd like to relate our approximation b(θ) to the underlying global distribution p(θ). The Kullback-Leibler divergence between b and p is defined as
$$D(b; p) = \sum_\theta b(\theta) \ln \frac{b(\theta)}{p(\theta)}$$
Using that $p(\theta) = e^{-E(\theta)}/Z$ (i.e., T = 1), we get
$$D(b; p) = \sum_\theta b(\theta) \ln b(\theta) + \sum_\theta b(\theta) E(\theta) + \ln Z,$$
which is minimized (D = 0) when b = p, and we get
$$G = \sum_\theta p(\theta) E(\theta) + T \sum_\theta p(\theta) \ln p(\theta) = -\ln Z \quad (T = 1)$$

86 Variational Inference: Now, the fit of our estimate b(θ) can be measured using
$$D(b; p) = \sum_\theta b(\theta) \ln b(\theta) + \sum_\theta b(\theta) E(\theta) + \ln Z$$
The variational approach to message passing seeks to perform inference efficiently, while using bounds on ln Z to obtain a goodness of fit. Can you think of a lower bound for ln Z? An upper bound? A key area of research is to develop bounds useful for performing inference. How does all this relate back to protein structure?
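One consequence of the identity for D(b; p), checked numerically: since D ≥ 0, the quantity −(Σ b E + Σ b ln b) is a lower bound on ln Z for any distribution b, and the gap is exactly D(b; p). The energies and the approximation b are hypothetical, T = 1, and the function names are illustrative.

```python
import math


def kl(b, p):
    """D(b; p) = sum b ln(b/p), skipping terms with b = 0."""
    return sum(bi * math.log(bi / pi) for bi, pi in zip(b, p) if bi > 0)


def variational_lower_bound(b, energies):
    """-(sum b E + sum b ln b): a lower bound on ln Z for any distribution b (T = 1)."""
    return -(sum(bi * e for bi, e in zip(b, energies))
             + sum(bi * math.log(bi) for bi in b if bi > 0))


energies = [-2.0, -1.5, 0.0, 1.0]              # hypothetical energies
Z = sum(math.exp(-e) for e in energies)
p = [math.exp(-e) / Z for e in energies]
b = [0.5, 0.3, 0.15, 0.05]                      # an arbitrary approximation
print(math.log(Z), variational_lower_bound(b, energies), kl(b, p))
```

The printed gap between the first two numbers equals the third, and the bound becomes tight when b = p.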

87 [Lange Lab, TU-München] How does the potential-energy based view of protein design differ from the free-energy based view?

88 Free Energy Is it easy to compute the free energy of a given protein sequence (with fixed backbone)? Can we minimize the free energy for a particular choice of sequence for protein design? How can we use graphical models? Are there other (more efficient/accurate) approaches?


More information

Fractional Belief Propagation

Fractional Belief Propagation Fractional Belief Propagation im iegerinck and Tom Heskes S, niversity of ijmegen Geert Grooteplein 21, 6525 EZ, ijmegen, the etherlands wimw,tom @snn.kun.nl Abstract e consider loopy belief propagation

More information

Alpha-helical Topology and Tertiary Structure Prediction of Globular Proteins Scott R. McAllister Christodoulos A. Floudas Princeton University

Alpha-helical Topology and Tertiary Structure Prediction of Globular Proteins Scott R. McAllister Christodoulos A. Floudas Princeton University Alpha-helical Topology and Tertiary Structure Prediction of Globular Proteins Scott R. McAllister Christodoulos A. Floudas Princeton University Department of Chemical Engineering Program of Applied and

More information

13 : Variational Inference: Loopy Belief Propagation and Mean Field

13 : Variational Inference: Loopy Belief Propagation and Mean Field 10-708: Probabilistic Graphical Models 10-708, Spring 2012 13 : Variational Inference: Loopy Belief Propagation and Mean Field Lecturer: Eric P. Xing Scribes: Peter Schulam and William Wang 1 Introduction

More information

Examples of Protein Modeling. Protein Modeling. Primary Structure. Protein Structure Description. Protein Sequence Sources. Importing Sequences to MOE

Examples of Protein Modeling. Protein Modeling. Primary Structure. Protein Structure Description. Protein Sequence Sources. Importing Sequences to MOE Examples of Protein Modeling Protein Modeling Visualization Examination of an experimental structure to gain insight about a research question Dynamics To examine the dynamics of protein structures To

More information

Probabilistic Graphical Models

Probabilistic Graphical Models School of Computer Science Probabilistic Graphical Models Variational Inference IV: Variational Principle II Junming Yin Lecture 17, March 21, 2012 X 1 X 1 X 1 X 1 X 2 X 3 X 2 X 2 X 3 X 3 Reading: X 4

More information

Decision Procedures An Algorithmic Point of View

Decision Procedures An Algorithmic Point of View An Algorithmic Point of View ILP References: Integer Programming / Laurence Wolsey Deciding ILPs with Branch & Bound Intro. To mathematical programming / Hillier, Lieberman Daniel Kroening and Ofer Strichman

More information

Protein Threading. BMI/CS 776 Colin Dewey Spring 2015

Protein Threading. BMI/CS 776  Colin Dewey Spring 2015 Protein Threading BMI/CS 776 www.biostat.wisc.edu/bmi776/ Colin Dewey cdewey@biostat.wisc.edu Spring 2015 Goals for Lecture the key concepts to understand are the following the threading prediction task

More information

Template Free Protein Structure Modeling Jianlin Cheng, PhD

Template Free Protein Structure Modeling Jianlin Cheng, PhD Template Free Protein Structure Modeling Jianlin Cheng, PhD Associate Professor Computer Science Department Informatics Institute University of Missouri, Columbia 2013 Protein Energy Landscape & Free Sampling

More information

Introduction to Comparative Protein Modeling. Chapter 4 Part I

Introduction to Comparative Protein Modeling. Chapter 4 Part I Introduction to Comparative Protein Modeling Chapter 4 Part I 1 Information on Proteins Each modeling study depends on the quality of the known experimental data. Basis of the model Search in the literature

More information

Template Free Protein Structure Modeling Jianlin Cheng, PhD

Template Free Protein Structure Modeling Jianlin Cheng, PhD Template Free Protein Structure Modeling Jianlin Cheng, PhD Professor Department of EECS Informatics Institute University of Missouri, Columbia 2018 Protein Energy Landscape & Free Sampling http://pubs.acs.org/subscribe/archive/mdd/v03/i09/html/willis.html

More information

Molecular dynamics simulations of anti-aggregation effect of ibuprofen. Wenling E. Chang, Takako Takeda, E. Prabhu Raman, and Dmitri Klimov

Molecular dynamics simulations of anti-aggregation effect of ibuprofen. Wenling E. Chang, Takako Takeda, E. Prabhu Raman, and Dmitri Klimov Biophysical Journal, Volume 98 Supporting Material Molecular dynamics simulations of anti-aggregation effect of ibuprofen Wenling E. Chang, Takako Takeda, E. Prabhu Raman, and Dmitri Klimov Supplemental

More information

Section Notes 8. Integer Programming II. Applied Math 121. Week of April 5, expand your knowledge of big M s and logical constraints.

Section Notes 8. Integer Programming II. Applied Math 121. Week of April 5, expand your knowledge of big M s and logical constraints. Section Notes 8 Integer Programming II Applied Math 121 Week of April 5, 2010 Goals for the week understand IP relaxations be able to determine the relative strength of formulations understand the branch

More information

SOLVING INTEGER LINEAR PROGRAMS. 1. Solving the LP relaxation. 2. How to deal with fractional solutions?

SOLVING INTEGER LINEAR PROGRAMS. 1. Solving the LP relaxation. 2. How to deal with fractional solutions? SOLVING INTEGER LINEAR PROGRAMS 1. Solving the LP relaxation. 2. How to deal with fractional solutions? Integer Linear Program: Example max x 1 2x 2 0.5x 3 0.2x 4 x 5 +0.6x 6 s.t. x 1 +2x 2 1 x 1 + x 2

More information

Integer Programming ISE 418. Lecture 8. Dr. Ted Ralphs

Integer Programming ISE 418. Lecture 8. Dr. Ted Ralphs Integer Programming ISE 418 Lecture 8 Dr. Ted Ralphs ISE 418 Lecture 8 1 Reading for This Lecture Wolsey Chapter 2 Nemhauser and Wolsey Sections II.3.1, II.3.6, II.4.1, II.4.2, II.5.4 Duality for Mixed-Integer

More information

13 : Variational Inference: Loopy Belief Propagation

13 : Variational Inference: Loopy Belief Propagation 10-708: Probabilistic Graphical Models 10-708, Spring 2014 13 : Variational Inference: Loopy Belief Propagation Lecturer: Eric P. Xing Scribes: Rajarshi Das, Zhengzhong Liu, Dishan Gupta 1 Introduction

More information

Docking. GBCB 5874: Problem Solving in GBCB

Docking. GBCB 5874: Problem Solving in GBCB Docking Benzamidine Docking to Trypsin Relationship to Drug Design Ligand-based design QSAR Pharmacophore modeling Can be done without 3-D structure of protein Receptor/Structure-based design Molecular

More information

Graphical Model Inference with Perfect Graphs

Graphical Model Inference with Perfect Graphs Graphical Model Inference with Perfect Graphs Tony Jebara Columbia University July 25, 2013 joint work with Adrian Weller Graphical models and Markov random fields We depict a graphical model G as a bipartite

More information

Chapter 11. Approximation Algorithms. Slides by Kevin Wayne Pearson-Addison Wesley. All rights reserved.

Chapter 11. Approximation Algorithms. Slides by Kevin Wayne Pearson-Addison Wesley. All rights reserved. Chapter 11 Approximation Algorithms Slides by Kevin Wayne. Copyright @ 2005 Pearson-Addison Wesley. All rights reserved. 1 Approximation Algorithms Q. Suppose I need to solve an NP-hard problem. What should

More information

Inference as Optimization

Inference as Optimization Inference as Optimization Sargur Srihari srihari@cedar.buffalo.edu 1 Topics in Inference as Optimization Overview Exact Inference revisited The Energy Functional Optimizing the Energy Functional 2 Exact

More information

Reconnect 04 Introduction to Integer Programming

Reconnect 04 Introduction to Integer Programming Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, Reconnect 04 Introduction to Integer Programming Cynthia Phillips, Sandia National Laboratories Integer programming

More information

CS 273 Prof. Serafim Batzoglou Prof. Jean-Claude Latombe Spring Lecture 12 : Energy maintenance (1) Lecturer: Prof. J.C.

CS 273 Prof. Serafim Batzoglou Prof. Jean-Claude Latombe Spring Lecture 12 : Energy maintenance (1) Lecturer: Prof. J.C. CS 273 Prof. Serafim Batzoglou Prof. Jean-Claude Latombe Spring 2006 Lecture 12 : Energy maintenance (1) Lecturer: Prof. J.C. Latombe Scribe: Neda Nategh How do you update the energy function during the

More information

Part III: Traveling salesman problems

Part III: Traveling salesman problems Transportation Logistics Part III: Traveling salesman problems c R.F. Hartl, S.N. Parragh 1/282 Motivation Motivation Why do we study the TSP? c R.F. Hartl, S.N. Parragh 2/282 Motivation Motivation Why

More information

Linear Programming. Scheduling problems

Linear Programming. Scheduling problems Linear Programming Scheduling problems Linear programming (LP) ( )., 1, for 0 min 1 1 1 1 1 11 1 1 n i x b x a x a b x a x a x c x c x z i m n mn m n n n n! = + + + + + + = Extreme points x ={x 1,,x n

More information

Assignment 2 Atomic-Level Molecular Modeling

Assignment 2 Atomic-Level Molecular Modeling Assignment 2 Atomic-Level Molecular Modeling CS/BIOE/CME/BIOPHYS/BIOMEDIN 279 Due: November 3, 2016 at 3:00 PM The goal of this assignment is to understand the biological and computational aspects of macromolecular

More information

Molecular dynamics simulation. CS/CME/BioE/Biophys/BMI 279 Oct. 5 and 10, 2017 Ron Dror

Molecular dynamics simulation. CS/CME/BioE/Biophys/BMI 279 Oct. 5 and 10, 2017 Ron Dror Molecular dynamics simulation CS/CME/BioE/Biophys/BMI 279 Oct. 5 and 10, 2017 Ron Dror 1 Outline Molecular dynamics (MD): The basic idea Equations of motion Key properties of MD simulations Sample applications

More information

Lecture 23 Branch-and-Bound Algorithm. November 3, 2009

Lecture 23 Branch-and-Bound Algorithm. November 3, 2009 Branch-and-Bound Algorithm November 3, 2009 Outline Lecture 23 Modeling aspect: Either-Or requirement Special ILPs: Totally unimodular matrices Branch-and-Bound Algorithm Underlying idea Terminology Formal

More information

14 : Theory of Variational Inference: Inner and Outer Approximation

14 : Theory of Variational Inference: Inner and Outer Approximation 10-708: Probabilistic Graphical Models 10-708, Spring 2014 14 : Theory of Variational Inference: Inner and Outer Approximation Lecturer: Eric P. Xing Scribes: Yu-Hsin Kuo, Amos Ng 1 Introduction Last lecture

More information

Protein sidechain conformer prediction: a test of the energy function Robert J Petrella 1, Themis Lazaridis 1 and Martin Karplus 1,2

Protein sidechain conformer prediction: a test of the energy function Robert J Petrella 1, Themis Lazaridis 1 and Martin Karplus 1,2 Research Paper 353 Protein sidechain conformer prediction: a test of the energy function Robert J Petrella 1, Themis Lazaridis 1 and Martin Karplus 1,2 Background: Homology modeling is an important technique

More information

Lecture 20: LP Relaxation and Approximation Algorithms. 1 Introduction. 2 Vertex Cover problem. CSCI-B609: A Theorist s Toolkit, Fall 2016 Nov 8

Lecture 20: LP Relaxation and Approximation Algorithms. 1 Introduction. 2 Vertex Cover problem. CSCI-B609: A Theorist s Toolkit, Fall 2016 Nov 8 CSCI-B609: A Theorist s Toolkit, Fall 2016 Nov 8 Lecture 20: LP Relaxation and Approximation Algorithms Lecturer: Yuan Zhou Scribe: Syed Mahbub Hafiz 1 Introduction When variables of constraints of an

More information

Molecular Modeling. Prediction of Protein 3D Structure from Sequence. Vimalkumar Velayudhan. May 21, 2007

Molecular Modeling. Prediction of Protein 3D Structure from Sequence. Vimalkumar Velayudhan. May 21, 2007 Molecular Modeling Prediction of Protein 3D Structure from Sequence Vimalkumar Velayudhan Jain Institute of Vocational and Advanced Studies May 21, 2007 Vimalkumar Velayudhan Molecular Modeling 1/23 Outline

More information

Ab-initio protein structure prediction

Ab-initio protein structure prediction Ab-initio protein structure prediction Jaroslaw Pillardy Computational Biology Service Unit Cornell Theory Center, Cornell University Ithaca, NY USA Methods for predicting protein structure 1. Homology

More information

Context of the project...3. What is protein design?...3. I The algorithms...3 A Dead-end elimination procedure...4. B Monte-Carlo simulation...

Context of the project...3. What is protein design?...3. I The algorithms...3 A Dead-end elimination procedure...4. B Monte-Carlo simulation... Laidebeure Stéphane Context of the project...3 What is protein design?...3 I The algorithms...3 A Dead-end elimination procedure...4 B Monte-Carlo simulation...5 II The model...6 A The molecular model...6

More information

Distributed Distance-Bounded Network Design Through Distributed Convex Programming

Distributed Distance-Bounded Network Design Through Distributed Convex Programming Distributed Distance-Bounded Network Design Through Distributed Convex Programming OPODIS 2017 Michael Dinitz, Yasamin Nazari Johns Hopkins University December 18, 2017 Distance Bounded Network Design

More information

Recitation 9: Loopy BP

Recitation 9: Loopy BP Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 204 Recitation 9: Loopy BP General Comments. In terms of implementation,

More information

Integer Linear Programming (ILP)

Integer Linear Programming (ILP) Integer Linear Programming (ILP) Zdeněk Hanzálek, Přemysl Šůcha hanzalek@fel.cvut.cz CTU in Prague March 8, 2017 Z. Hanzálek (CTU) Integer Linear Programming (ILP) March 8, 2017 1 / 43 Table of contents

More information

Topics in Approximation Algorithms Solution for Homework 3

Topics in Approximation Algorithms Solution for Homework 3 Topics in Approximation Algorithms Solution for Homework 3 Problem 1 We show that any solution {U t } can be modified to satisfy U τ L τ as follows. Suppose U τ L τ, so there is a vertex v U τ but v L

More information

Computer Science 385 Analysis of Algorithms Siena College Spring Topic Notes: Limitations of Algorithms

Computer Science 385 Analysis of Algorithms Siena College Spring Topic Notes: Limitations of Algorithms Computer Science 385 Analysis of Algorithms Siena College Spring 2011 Topic Notes: Limitations of Algorithms We conclude with a discussion of the limitations of the power of algorithms. That is, what kinds

More information

1 Column Generation and the Cutting Stock Problem

1 Column Generation and the Cutting Stock Problem 1 Column Generation and the Cutting Stock Problem In the linear programming approach to the traveling salesman problem we used the cutting plane approach. The cutting plane approach is appropriate when

More information

Molecular Modeling Lecture 7. Homology modeling insertions/deletions manual realignment

Molecular Modeling Lecture 7. Homology modeling insertions/deletions manual realignment Molecular Modeling 2018-- Lecture 7 Homology modeling insertions/deletions manual realignment Homology modeling also called comparative modeling Sequences that have similar sequence have similar structure.

More information

CS281A/Stat241A Lecture 19

CS281A/Stat241A Lecture 19 CS281A/Stat241A Lecture 19 p. 1/4 CS281A/Stat241A Lecture 19 Junction Tree Algorithm Peter Bartlett CS281A/Stat241A Lecture 19 p. 2/4 Announcements My office hours: Tuesday Nov 3 (today), 1-2pm, in 723

More information

Using Sparsity to Design Primal Heuristics for MILPs: Two Stories

Using Sparsity to Design Primal Heuristics for MILPs: Two Stories for MILPs: Two Stories Santanu S. Dey Joint work with: Andres Iroume, Marco Molinaro, Domenico Salvagnin, Qianyi Wang MIP Workshop, 2017 Sparsity in real" Integer Programs (IPs) Real" IPs are sparse: The

More information

Protein folding. α-helix. Lecture 21. An α-helix is a simple helix having on average 10 residues (3 turns of the helix)

Protein folding. α-helix. Lecture 21. An α-helix is a simple helix having on average 10 residues (3 turns of the helix) Computat onal Biology Lecture 21 Protein folding The goal is to determine the three-dimensional structure of a protein based on its amino acid sequence Assumption: amino acid sequence completely and uniquely

More information

Inference in Graphical Models Variable Elimination and Message Passing Algorithm

Inference in Graphical Models Variable Elimination and Message Passing Algorithm Inference in Graphical Models Variable Elimination and Message Passing lgorithm Le Song Machine Learning II: dvanced Topics SE 8803ML, Spring 2012 onditional Independence ssumptions Local Markov ssumption

More information

4 : Exact Inference: Variable Elimination

4 : Exact Inference: Variable Elimination 10-708: Probabilistic Graphical Models 10-708, Spring 2014 4 : Exact Inference: Variable Elimination Lecturer: Eric P. ing Scribes: Soumya Batra, Pradeep Dasigi, Manzil Zaheer 1 Probabilistic Inference

More information

Knowledge-based structure prediction of MHC class I bound peptides: a study of 23 complexes Ora Schueler-Furman 1,2, Ron Elber 2 and Hanah Margalit 1

Knowledge-based structure prediction of MHC class I bound peptides: a study of 23 complexes Ora Schueler-Furman 1,2, Ron Elber 2 and Hanah Margalit 1 Research Paper 549 Knowledge-based structure prediction of MHC class I bound peptides: a study of 23 complexes Ora Schueler-Furman 1,2, Ron Elber 2 and Hanah Margalit 1 Background: The binding of T-cell

More information

Does Better Inference mean Better Learning?

Does Better Inference mean Better Learning? Does Better Inference mean Better Learning? Andrew E. Gelfand, Rina Dechter & Alexander Ihler Department of Computer Science University of California, Irvine {agelfand,dechter,ihler}@ics.uci.edu Abstract

More information

Machine Learning Techniques for Computer Vision

Machine Learning Techniques for Computer Vision Machine Learning Techniques for Computer Vision Part 2: Unsupervised Learning Microsoft Research Cambridge x 3 1 0.5 0.2 0 0.5 0.3 0 0.5 1 ECCV 2004, Prague x 2 x 1 Overview of Part 2 Mixture models EM

More information

Variational algorithms for marginal MAP

Variational algorithms for marginal MAP Variational algorithms for marginal MAP Alexander Ihler UC Irvine CIOG Workshop November 2011 Variational algorithms for marginal MAP Alexander Ihler UC Irvine CIOG Workshop November 2011 Work with Qiang

More information

3.7 Cutting plane methods

3.7 Cutting plane methods 3.7 Cutting plane methods Generic ILP problem min{ c t x : x X = {x Z n + : Ax b} } with m n matrix A and n 1 vector b of rationals. According to Meyer s theorem: There exists an ideal formulation: conv(x

More information

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison CMPS 6630: Introduction to Computational Biology and Bioinformatics Structure Comparison Protein Structure Comparison Motivation Understand sequence and structure variability Understand Domain architecture

More information

Exact Algorithms for Dominating Induced Matching Based on Graph Partition

Exact Algorithms for Dominating Induced Matching Based on Graph Partition Exact Algorithms for Dominating Induced Matching Based on Graph Partition Mingyu Xiao School of Computer Science and Engineering University of Electronic Science and Technology of China Chengdu 611731,

More information

Multi-Scale Hierarchical Structure Prediction of Helical Transmembrane Proteins

Multi-Scale Hierarchical Structure Prediction of Helical Transmembrane Proteins Multi-Scale Hierarchical Structure Prediction of Helical Transmembrane Proteins Zhong Chen Dept. of Biochemistry and Molecular Biology University of Georgia, Athens, GA 30602 Email: zc@csbl.bmb.uga.edu

More information

Protein Structure Analysis with Sequential Monte Carlo Method. Jinfeng Zhang Computational Biology Lab Department of Statistics Harvard University

Protein Structure Analysis with Sequential Monte Carlo Method. Jinfeng Zhang Computational Biology Lab Department of Statistics Harvard University Protein Structure Analysis with Sequential Monte Carlo Method Jinfeng Zhang Computational Biology Lab Department of Statistics Harvard University Introduction Structure Function & Interaction Protein structure

More information

CS Lecture 8 & 9. Lagrange Multipliers & Varitional Bounds

CS Lecture 8 & 9. Lagrange Multipliers & Varitional Bounds CS 6347 Lecture 8 & 9 Lagrange Multipliers & Varitional Bounds General Optimization subject to: min ff 0() R nn ff ii 0, h ii = 0, ii = 1,, mm ii = 1,, pp 2 General Optimization subject to: min ff 0()

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft

More information

Dual Decomposition for Inference

Dual Decomposition for Inference Dual Decomposition for Inference Yunshu Liu ASPITRG Research Group 2014-05-06 References: [1]. D. Sontag, A. Globerson and T. Jaakkola, Introduction to Dual Decomposition for Inference, Optimization for

More information

Computational protein design

Computational protein design Computational protein design There are astronomically large number of amino acid sequences that needs to be considered for a protein of moderate size e.g. if mutating 10 residues, 20^10 = 10 trillion sequences

More information

Applications of Hidden Markov Models

Applications of Hidden Markov Models 18.417 Introduction to Computational Molecular Biology Lecture 18: November 9, 2004 Scribe: Chris Peikert Lecturer: Ross Lippert Editor: Chris Peikert Applications of Hidden Markov Models Review of Notation

More information

Extended Formulations, Lagrangian Relaxation, & Column Generation: tackling large scale applications

Extended Formulations, Lagrangian Relaxation, & Column Generation: tackling large scale applications Extended Formulations, Lagrangian Relaxation, & Column Generation: tackling large scale applications François Vanderbeck University of Bordeaux INRIA Bordeaux-Sud-Ouest part : Defining Extended Formulations

More information

11.1 Set Cover ILP formulation of set cover Deterministic rounding

11.1 Set Cover ILP formulation of set cover Deterministic rounding CS787: Advanced Algorithms Lecture 11: Randomized Rounding, Concentration Bounds In this lecture we will see some more examples of approximation algorithms based on LP relaxations. This time we will use

More information

Y1 Y2 Y3 Y4 Y1 Y2 Y3 Y4 Z1 Z2 Z3 Z4

Y1 Y2 Y3 Y4 Y1 Y2 Y3 Y4 Z1 Z2 Z3 Z4 Inference: Exploiting Local Structure aphne Koller Stanford University CS228 Handout #4 We have seen that N inference exploits the network structure, in particular the conditional independence and the

More information

Structured Variational Inference

Structured Variational Inference Structured Variational Inference Sargur srihari@cedar.buffalo.edu 1 Topics 1. Structured Variational Approximations 1. The Mean Field Approximation 1. The Mean Field Energy 2. Maximizing the energy functional:

More information

Modeling Protein Conformational Ensembles: From Missing Loops to Equilibrium Fluctuations

Modeling Protein Conformational Ensembles: From Missing Loops to Equilibrium Fluctuations 65:164 179 (2006) Modeling Protein Conformational Ensembles: From Missing Loops to Equilibrium Fluctuations Amarda Shehu, 1 Cecilia Clementi, 2,3 * and Lydia E. Kavraki 1,3,4 * 1 Department of Computer

More information

5. Sum-product algorithm

5. Sum-product algorithm Sum-product algorithm 5-1 5. Sum-product algorithm Elimination algorithm Sum-product algorithm on a line Sum-product algorithm on a tree Sum-product algorithm 5-2 Inference tasks on graphical models consider

More information

Introduction The gramicidin A (ga) channel forms by head-to-head association of two monomers at their amino termini, one from each bilayer leaflet. Th

Introduction The gramicidin A (ga) channel forms by head-to-head association of two monomers at their amino termini, one from each bilayer leaflet. Th Abstract When conductive, gramicidin monomers are linked by six hydrogen bonds. To understand the details of dissociation and how the channel transits from a state with 6H bonds to ones with 4H bonds or

More information

Approximating the Partition Function by Deleting and then Correcting for Model Edges (Extended Abstract)

Approximating the Partition Function by Deleting and then Correcting for Model Edges (Extended Abstract) Approximating the Partition Function by Deleting and then Correcting for Model Edges (Extended Abstract) Arthur Choi and Adnan Darwiche Computer Science Department University of California, Los Angeles

More information

to work with) can be solved by solving their LP relaxations with the Simplex method I Cutting plane algorithms, e.g., Gomory s fractional cutting

to work with) can be solved by solving their LP relaxations with the Simplex method I Cutting plane algorithms, e.g., Gomory s fractional cutting Summary so far z =max{c T x : Ax apple b, x 2 Z n +} I Modeling with IP (and MIP, and BIP) problems I Formulation for a discrete set that is a feasible region of an IP I Alternative formulations for the

More information

Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation

Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation Jason K. Johnson, Dmitry M. Malioutov and Alan S. Willsky Department of Electrical Engineering and Computer Science Massachusetts Institute

More information

Distributed Optimization. Song Chong EE, KAIST

Distributed Optimization. Song Chong EE, KAIST Distributed Optimization Song Chong EE, KAIST songchong@kaist.edu Dynamic Programming for Path Planning A path-planning problem consists of a weighted directed graph with a set of n nodes N, directed links

More information

Bio nformatics. Lecture 23. Saad Mneimneh

Bio nformatics. Lecture 23. Saad Mneimneh Bio nformatics Lecture 23 Protein folding The goal is to determine the three-dimensional structure of a protein based on its amino acid sequence Assumption: amino acid sequence completely and uniquely

More information

Design of a Novel Globular Protein Fold with Atomic-Level Accuracy

Design of a Novel Globular Protein Fold with Atomic-Level Accuracy Design of a Novel Globular Protein Fold with Atomic-Level Accuracy Brian Kuhlman, Gautam Dantas, Gregory C. Ireton, Gabriele Varani, Barry L. Stoddard, David Baker Presented by Kate Stafford 4 May 05 Protein

More information

Tightness of LP Relaxations for Almost Balanced Models

Tightness of LP Relaxations for Almost Balanced Models Tightness of LP Relaxations for Almost Balanced Models Adrian Weller University of Cambridge AISTATS May 10, 2016 Joint work with Mark Rowland and David Sontag For more information, see http://mlg.eng.cam.ac.uk/adrian/

More information

12 : Variational Inference I

12 : Variational Inference I 10-708: Probabilistic Graphical Models, Spring 2015 12 : Variational Inference I Lecturer: Eric P. Xing Scribes: Fattaneh Jabbari, Eric Lei, Evan Shapiro 1 Introduction Probabilistic inference is one of

More information