Determining Optimum Path in Synthesis f Organic Cmpunds using Branch and Bund Algrithm Diastuti Utami 13514071 Prgram Studi Teknik Infrmatika Seklah Teknik Elektr dan Infrmatika Institut Teknlgi Bandung, Jl. Ganesha 10 Bandung 40132, Indnesia 13514071@std.stei.itb.ac.id Abstract Synthesis f an rganic cmpund frm a certain cmpund ffers a wide variety f pssibility. Getting the ptimum yield in a synthesis reactin has been a cncern fr many years in the field f rganic chemistry. Smetimes, the reactin can be mre cmplicated that it requires sme time t determine the mst efficient path. Nwadays, there is a number f sftware that culd assist in determining the ptimum path, such as sftware ffering a database f reactins that culd be used in path determining. In this paper, we will discuss hw branch and bund algrithm culd find ptimum path in synthesis f rganic cmpund(s). Keywrds branch and bund; cmputatinal chemistry;branch and bund algrithm; I. INTRODUCTION Organic chemistry is a field f chemistry that studies cmpunds and substances made f a chain f carbn atms. Reactins in rganic chemistry ften cnsidered unique because it have distinct mechanism and almst always nt yielding 100% prducts. The thing with rganic reactins is that it takes a lt f time. Therefre, catalysts are ften invlved in many reactins, but as we knew, the cst f catalyst ften nt cheap. Anther thing is, the reactin is nt always effective. Suppse we want t synthesize cmpund A frm cmpund B, it will nly yield 45% cmpund A, meaning that if we predict we can get, say, ne mle f A, in reality we nly get 0.45 mles f A. Organic cmpunds are ften used in industrial scales, yet the prcess t synthesize an rganic cmpund ften invlves many steps and takes a lt f time. On the ther hand, there are many ways pssible t synthesize an rganic cmpund. This results in a challenge f getting the mst efficient way t synthesize an rganic cmpund. Anther challenge is t pen all the pssibilities f reactins. This is essential t determine the ptimum path, a synthesis reactin ften has many pssibilities, but t create a cmprehensive database f rganic reactins is anther challenge. Furthermre, we have t match the cmpund we want t search with the database that s anther thing. The breakthrugh in cmputatinal chemistry allws an insight t slve the challenge. Nwadays, many sftware ffer assistant n searching the pssible reactins, as database f rganic reactins have been develped everywhere. Out f many pssibilities t determine the mst efficient way t synthesize an rganic cmpund, we will see hw the branch and bund algrithm slves the prblem. II. THEORY OF BRANCH AND BOUND ALGORITHM Branch and bund is a state space search methd in which all the children f a nde are generated befre expanding any f its children. It is similar t backtracking technique but uses BFS-like search. In branch and bund, we will use the term live-nde t describe a nde that has nt been expanded, dead nde which is a nde that has been expanded, and slutin nde. Figure 1 Synthesis f benzcaine frm tluene Makalah IF2211 Strategi Algritma, Semester II Tahun 2015/2016
Figure 2 FIFO and LIFO Branch & Bund The searching used in branch and bund is called least-cst search (LC). Least-cst search have sme attributes: The selectin rule fr the next E-nde in FIFO r LIFO branch-and-bund is smetimes blind. i.e. the selectin rule des nt give any preference t a nde that has a very gd chance f getting the search t an answer nde quickly. The search fr an answer nde can ften be speeded by using an intelligent ranking functin, als called an apprximate cst functin C Expanded-nde (E-nde): is the live nde with best C value Requirements: Branching: A set f slutins, which is represented by a nde, can be partitined int mutually exclusive sets. Each subset in the partitin is represented by a child f the riginal nde. Lwer bunding: An algrithm is available fr calculating a lwer bund n the cst f any slutin in a given subset. Example is 8-puzzle, where: Cst functin: C = g(x) +h(x), where h(x) = the number f misplaced tiles and g(x) = the number f mves s far Assumptin: mve ne tile in any directin cst 1 Figure 3 8-puzzle branch and bund Glbal branch and bund algrithm: 1. Insert rt nde t queue Q. If rt nde is a slutin nde, then stp. 2. If Q empty and n slutin fund, then stp. 3. If Q nt empty, select frm Q a nde i that have minimum cst C(i). if there is mre than 1 nde that satisfies the cnditin, chse ne randmly. 4. If nde i is a slutin nde, then stp. Else, generate all the child ndes. If there is n child fund, return t step 2. 5. Fr each child j frm nde i, cunt C(j), and insert the children t Q. 6. Return t step 2. The algrithm in this paper is similar t the knapsack prblem. Knapsack prblem is a prblem where: Input Capacity K Gal n items with weights w i and values v i Output a set f items S such that the sum f weights f items in S is at mst K and the sum f values f items in S is maximized Makalah IF2211 Strategi Algritma, Semester II Tahun 2015/2016
Where: Figure 4 Tree fr a knapsack prblem Prperties f a knapsack tree as fllws: Left child is always xi = 1 and right child is always xi = 0 Bunding functin t prune tree: At a live nde in the tree, if we can estimate the upper bund (best case) prfit at that nde, and if that upper bund is less than the prfit f an actual slutin fund already, then we dn t need t explre that nde. We can use the greedy knapsack as ur bund functin as it gives an upper bund, since the last item in the knapsack is usually fractinal Greedy algrithms are ften gd ways t cmpute upper (ptimistic) bunds n prblems, e.g., fr jb scheduling with varying jb times, we can cut each jb int equal length parts and use the greedy jb scheduler t get an upper bund Linear prgrams that treat the 0-1 variables as cntinuus between 0 and 1 are ften anther gd chice Suppse we have a knapsack prblem like in the table abve, where maximum weight is 110. This generate a slutin tree: Numbers inside a nde are prfit and weight at that nde, based n decisins frm rt t that nde Ndes withut numbers inside have same values as their parent Numbers utside the nde are upper bund calculated by greedy algrithm Upper bund fr every feasible left child (xi =1) is same as its parent s bund Chain f left children in tree is same as greedy slutin at that pint in the tree We nly recmpute the upper bund when we can t mve t a feasible left child Final prfit and final weight (lwer bund) are updated at each leaf nde reached by algrithm Slutin imprves at each leaf nde reached N further leaf ndes reached after D because lwer bund (ptimal value) is sufficient t prune all ther tree branches befre leaf is reached By using flr f upper bund at ndes E and F, we avid generating the tree belw either nde Since ptimal slutin must be integer, we can truncate upper bunds By truncating bunds at E and F t 159, we avid explring E and F Branch and bund algrithm can als use depth first search. Depth first search is used in cmbinatin with breadth first search in many prblems. Cmmn strategy is t use depth first search n ndes that have nt been pruned. This gets t a leaf nde, and a feasible slutin, which is a lwer bund that can be used t prune the tree in cnjunctin with the greedy upper bunds. If greedy upper bund is less than lwer bund, prune the tree. Once a nde has been pruned, breadth first search is used t mve t a different part f the tree. Depth first search bunds tend t be very quick t cmpute if we mve dwn the tree sequentially, e.g. the greedy bund desn t need t be recmputed. Linear prgram as bunds are ften quick t: few simplex pivts. III. BRANCH AND BOUND IN SYNTHESIS REACTIONS First ff, we shuld determine hw we will cunt the bund. The factrs related t the bund include yield, time, and cst. Because here we can t predict the cst, the bund will be determined using yield and time nly. Let s take a lk at this simple reactin diagram. We can make a table ut f it, cnsisting reagents, prducts, yield, time, and yield/time rati. In this table C stands fr cmpund, s C 1 means cmpund number 1, and s n. We will assume that the alkyl R is CH 3. Figure 5 Generated tree frm the knapsack prblem Makalah IF2211 Strategi Algritma, Semester II Tahun 2015/2016
In real life, the diagram was generated thrugh a searching in the database. A user will input a final prduct and sme pssible reagent that he/she has, and then frm the database, we will get a few pssibilities available frm reagents t prduce a certain prduct. The reactin path is represented with a directed graph. Reagent Prduct Yield Time(m) Yield/Time C 2 C 1 90% 60 1.500 C 2 C 3 82% 35 2.343 C 2 C 4 78% 75 1.040 C 2 C 5 75% 120 0.625 C 2 C 7 85% 80 1.063 C 3 C 6 62% 95 0.653 C 3 C 7 46% 120 0.383 C 5 C 2 70% 120 0.583 C 7 C 6 73% 34 3.042 C 7 C 8 80% 40 2 C 7 C 9 79% 60 1.317 C 7 C 10 60% 55 1.091 C 7 C 11 55% 25 2.200 C 9 C 12 74% 20 3.700 C 9 C 13 92% 30 3.067 Table 1 Yield and time data table f reactin map in figure 6 Figure 6 Reactin Map f Armatic Cmpunds As this prblem similar t the knapsack prblem, we can derived the bund frmula as bund = ttal yield + (( time time taken) * next best yield/time) with ttal yield = current yield * selected reactin yield There are a few differences frm the knapsack prblem, such as: The next best yield/time can nly be determined by accessible cmpund frm the reagent cmpund. The final bund calculatin use the yield/time fr final nde t replace the next best yield/time Makalah IF2211 Strategi Algritma, Semester II Tahun 2015/2016
We assume that the ndes accessible frm a certain nde have been determined when we retrieved the data frm the given database. If there was n further reactin pssible frm a nde, and the nde isn t slutin nde, then deactivate the nde. Suppse we want t synthesize cmpund 13, and we dn t have any cmpund 7 yet (and later cmpunds), there are few pssibilities. First, we can start frm cmpund 2, 3, r 5. The prcess f getting the result will be summarized n the table belw. C 0 Expanded Ndes C 3, C 2 C 27 C 279 Active Ndes C 2 = 0 + (969 * 2.343) = 2270.367 C 5 = 0 + (969 * 0.583) = 564.927 C 3 = 0 + (969 * 0.653) = 632.757 C 36 = 0.62 + ((969 95) * 0) (dead because n further reactin pssible) C 37 = 0.46 + ((969-120) * 3.042) = 2583.118 - C 21 = 0.9 + ((969 60) * 0) = 0.9 (dead because n further reactin pssible) C 24 = 0.78 + ((969 75) * 0) = 0.78 (dead because n further reactin pssible) C 27 = 0.85 + ((969 80) * 3.042) = 2705.188 (C 5 dead because reactin pssible is nly t cmpund 2 and cmpund 2 already accessible frm the start) C 37 = 2583.118 C 276 = 0.85 * 0.73 + ((969 80 34) * 0) = 0.621 (dead because n further reactin pssible) C 278 = 0.68 (dead because n further reactin pssible) C 279 = 3067.9715 C 2710 = 0.51 (dead because n further reactin pssible) C 2711 = 0.468 (dead because n further reactin pssible) C 37 = 2583.118 C 27912 = 0.497 (dead because n further reactin pssible) C 27913 Slutin fund, bund = 2451.151 C 37 C 379 C 27913 C 27913 = 2451.151 C 378 = 0.368 (dead because n further reactin pssible) C 379 = 2919.6634 C 3710 = 0.276 (dead because n further reactin pssible) C 3711 = 0.253 (dead because n further reactin pssible) C 27913 = 2451.151 C 37913 = 2328.187 Final slutin fund. Nde C 37913 dead because the bund is lwer. N ther nde active. Table 2 Branch and Bund Decmpsitin f the Armatic Reactin Frm the table abve, we can determine the slutin is C 2 C 7 C 9 C 13, which has the ptimum path cnsidering the yield and time taken. We shuld nte that the result n the table abve is generated by nly cnsidering yield (number f prduct that can be generated) and time taken. In reality, many things shuld be taken int accunt and there shuld be a way t make a quantificatin f thse things, such as cst, accessibility t the prduct, and s n. By lking at the diagram, we culd see that this result is valid because the nly pssible way t synthesize cmpund 13 is by is C 2 C 7 C 9 C 13 r is C 3 C 7 C 9 C 13, and it is clear that the yield and time taken in via C 2 is much mre effective than via C 3. In real life, the algrithm itself can be applied t a mre cmplicated reactin map, such as: Figure 7 A Mre Cmplicated Reactin Paths which wn t be discussed here because there are numerus pssibilities and it will be t lng. Makalah IF2211 Strategi Algritma, Semester II Tahun 2015/2016
IV. CONCLUSION By using branch and bund algrithm strategy, the bund can be cunted by using the frmula: bund = ttal yield + (( time time taken) * next best yield/time) with ttal yield = current yield * selected reactin yield and is prven valid as it culd generate the ptimum path f the reactin n figure 6. The branch and bund algrithm can find the ptimum path in rganic reactins, given many pssibilities, yield, and time taken. This algrithm can be imprved by cnsidering the cst r mney t underg a reactin. This is als essential because in industrial wrld, we shuld make a prductin nt nly time effective, but als cst effective. V. ACKNOWLEDGEMENT I thank Allah SWT fr his will s I can finish this paper and my parents wh had always supprted me. I als want t send my gratitude t Mr. Rinaldi Munir and Mrs. Nur Ulfa Maulidevi as IF 2211 lecturers fr bth have taught Algrithm Strategy. I wuld als like t thank all the peple wh were invlved in the prcess f writing this paper. Thank yu fr the peple wh have helped me bth directly and indirectly. I hpe this paper I wrte will help peple wh read it t gain mre knwledge like I did while writing it. REFERENCES [1] Munir, Rinaldi. Diktat Kuliah IF2211 Strategi Algritma. Bandung: Teknik Infrmatika Institut Teknlgi Bandung, 2009. [2] Slmn, Graham T.W. Organic Chemistry. Jhn Wiley & Sns, 2011. [3] J, Clayden, et al. Organic Chemistry. Ocfrd, 2001 [4] Zumdahl, Steven S, Susan A. Zumdahl. General Chemistry. Brks & Cle, 2010 [5] http://cw.mit.edu/curses/civil-and-envirnmental-engineering/1-204- cmputer-algrithms-in-systems-engineering-spring-2010/lecturentes/mit1_204s10_lec16.pdf [6] https://www.seas.gwu.edu/~bell/csci212/branch_and_bund.pdf [7] http://stanfrd.edu/class/ee364b/lectures/bb_slides.pdf STATEMENT I hereby declare that this paper is my wn wrk and nt a cpy, translatin, nr plagiarism f smebdy else s wrk. Bandung, May 9 2016 Diastuti Utami 13514071 Makalah IF2211 Strategi Algritma, Semester II Tahun 2015/2016