Combinatorial Algorithms and Computational Complexity for DNA Self-Assembly

Ming-Yang Kao, Northwestern University, Evanston, Illinois, USA. Presented at Kyoto University on December 12, 2013.

Outline of the Talk
1. examples of self-assembly
2. examples of DNA self-assembly
3. a basic model for DNA self-assembly
4. combinatorial problems for DNA self-assembly
   - use DNAs to self-assemble shapes
   - use DNAs to self-assemble circuits
   - design DNA sequences for DNA self-assembly
5. (if we have time) general research directions

What Is Self-Assembly? [adapted from a slide of Shinnosuke Seki] Self-assembly is a phenomenon in which complex structures emerge from simple components through local interactions with limited global control.

Example of Self-Assembly: Self-Assembly by Magnetic Forces [http://www.math.udel.edu/meclab, 2007]

Example of Self-Assembly: Self-Assembly of Stars into Galaxy [hubblesite.org, http://self-assembly.net, 2013]

Example of Self-Assembly: Hydrophilic and Hydrophobic Interactions [http://staff.jccc.net/pdecell/chemistry/selfassem.html]: proteins and molecules on cell membrane

Example of Self-Assembly: Human Language Development [adapted from a slide of S. Seki] Speaking similar languages leads to being socially close. Being socially close leads to similar languages.

Example of Self-Assembly: Robot Self-Assembly via Cellular Automata [Tuci et al., 2006] A group of robots physically connected to each other that (a) moves on rough terrain and (b) passes over a gap during an experiment in a closed arena with flat terrain.

Example of Self-Assembly: Robot Self-Assembly -- Kilobot Project [Self-Organizing Systems Research Group, Harvard, 2011]

Example of Self-Assembly: Crystal Formation [http://web.mit.edu/lms/www, Zhang, 2001]

Example of Self-Assembly: Insulation around Copper Wiring [http://www.technologyreview.com/biztech, 2007] This microprocessor cross section shows empty space in between the chip's copper wiring. Wires are usually insulated with a glasslike material, but IBM has used self-assembly techniques, which can be employed in chip-making facilities, to create air gaps that insulate the wires. Credit: IBM

Example of Self-Assembly: Self-Assembly of Hot Dog Slices [bradley.bradley.edu/~campbell/demopix6.html, 2013] Left: Cutting hot dogs into slices. Right: Floating them in a pan of water.

Example of Self-Assembly: Self-Assembly of Lego Pieces [http://www.math.udel.edu/meclab, 2007] LEGO Bricks + Water + Capillary Forces

Example of Self-Assembly: DNA Brick Structures Analogous to LEGO Brick Structures [Ke et al., Science 2012, 338:1177-1183, Peng Yin's Lab at Harvard]

Message: Self-assembly is everywhere and comes in many kinds! Focus of This Talk: Algorithmic DNA Self-Assembly

Algorithmic DNA Self-Assembly: Algorithms + DNA + Self-Assembly. In the intersection of Nanotechnology and Theoretical Computer Science.

Algorithmic DNA Self-Assembly: Nanotechnology + Theoretical Computer Science
Objective: Use DNA to create nanostructures.
Methodology:
Step 1: Encode a program into DNAs.
Step 2: Execute the program to guide the DNAs to self-assemble into desired nanostructures.
How to encode a program: DNA has 4 bases, A, C, G, T.
How to execute a program: A binds with T, and C binds with G. When DNAs bind, the binding executes the program.
There are other possibilities for the above!

Types of Algorithmic DNA Self-Assembly: 1-dimensional, 2-dimensional (more the focus of this talk), 3-dimensional

DNA Tiles -- Basic Unit of 2D Self-Assembly [Figure: a tile labeled TILE with the DNA sequences GCATCG and CGTAGC; the sequences encode a program, and their binding executes the program.]

Algorithmic DNA Self-Assembly: Program = Tiles + Lab Steps; Output = Shape + Pattern

Examples of DNA Tiles [Holliday, 1964]: exchange of genetic information in yeast

Examples of DNA Tiles [Figure: a DNA tile.]

Examples of DNA Tiles [Reif's Group, Duke University] [Figure: a DNA tile built from 9 DNA sequences; the base-by-base layout of the figure is not reproduced here.]

Examples of DNA Tiles [Park, Pistol, Ahn, Reif, Lebeck, Dwyer, and LaBean, 2006]

Examples of DNA Tiles [Winfree's Group, Caltech]

Examples of DNA Tiles [Sierpinski Triangle, Rothemund, Papadakis, Winfree, 2004]

Recap: Algorithmic DNA Self-Assembly
Objectives and Methodologies:
1. Use DNA to compute.
2. Use computation to guide DNAs to self-assemble.
Next, we will see
1. some examples and
2. some basic models for such computation.

Self-Assembly for Binary Counters [Winfree, 2000]

Examples of DNA Tiles [Winfree's Group, Caltech]

Self-Assembly for Binary Counters [Barish, Rothemund, Winfree, 2005]

2D Self-Assembly for Turing Machines [Winfree, Yang, and Seeman, 1998]

1D Self-Assembly for Regular Languages [Winfree, Yang, and Seeman, 1998]

Tree Self-Assembly for Context-Free Languages [Winfree, Yang, and Seeman, 1998]

Example of Self-Assembly: DNA Brick Structures Analogous to LEGO Brick Structures [Ke et al., Science 2012, 338:1177-1183, Peng Yin's Lab at Harvard]

A Basic Model of DNA Self-Assembly [Rothemund and Winfree, STOC 2000]
tile system: (T, s, G, t)
T: tile set
s: seed tile
G: glue function, assigning each pair of glues a strength in {0, 1, ..., t}
t: temperature, a positive integer
[Figure: example tiles with labeled glues.]

Example: Build a Square
Tile set T = {S, a, b, c, d, x}; temperature t = 2.
1. Positive strength between same glues.
2. Zero strength between distinct glues.
3. Start with the seed tile.
4. Add one tile at a time.
5. A tile binds if the total strength is at least t.
6. The order of attachment must not affect the final shape and pattern.
[Figure: the six tiles and their glue strengths; four glues have strength 2 and three glues have strength 1. The glue symbols are not reproduced here.]

Example: Build a Square [Figures: step-by-step assembly of the 3 x 3 square at temperature t = 2. Starting from the seed tile S in the bottom-left corner, tile a attaches to the right of S, c above S, d above c, and b to the right of a; four copies of x then fill the remaining cells one at a time. Final assembly, top row to bottom row: d x x / c x x / S a b.]
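
The six rules of the square example can be sketched as a small simulation. This is a minimal sketch, not the tile set from the slides: the glue names (`SA`, `AB`, `SC`, `CD`, `w`, `s`) and the assignment of glues to tile sides are hypothetical, chosen only so that the six tiles S, a, b, c, d, x grow the 3 x 3 square at temperature 2.

```python
# Hypothetical glue labels and strengths (equal labels bind, distinct labels do not).
STRENGTH = {"SA": 2, "AB": 2, "SC": 2, "CD": 2, "w": 1, "s": 1}

# tile name -> (north, east, south, west) glue labels; None means no glue.
TILES = {
    "S": ("SC", "SA", None, None),
    "a": ("s", "AB", None, "SA"),
    "b": ("s", None, None, "AB"),
    "c": ("CD", "w", "SC", None),
    "d": (None, "w", "CD", None),
    "x": ("s", "w", "s", "w"),
}
# Offsets for north, east, south, west; the index matches the glue tuple.
DIRS = [(0, 1), (1, 0), (0, -1), (-1, 0)]


def binding_strength(assembly, pos, tile):
    """Total strength of matching glues between `tile` at `pos` and its neighbors."""
    total = 0
    for side, (dx, dy) in enumerate(DIRS):
        neighbor = assembly.get((pos[0] + dx, pos[1] + dy))
        if neighbor is None:
            continue
        mine = TILES[tile][side]
        theirs = TILES[neighbor][(side + 2) % 4]  # the neighbor's facing side
        if mine is not None and mine == theirs:
            total += STRENGTH[mine]
    return total


def assemble(temperature=2):
    """Grow from the seed S, adding tiles until nothing more can bind."""
    assembly = {(0, 0): "S"}
    grew = True
    while grew:
        grew = False
        occupied = set(assembly)
        frontier = {(x + dx, y + dy) for (x, y) in occupied for dx, dy in DIRS}
        for pos in sorted(frontier - occupied):
            for tile in TILES:
                if binding_strength(assembly, pos, tile) >= temperature:
                    assembly[pos] = tile
                    grew = True
                    break
    return assembly


square = assemble()
for y in (2, 1, 0):  # print the top row first
    print(" ".join(square[(x, y)] for x in (0, 1, 2)))
# d x x
# c x x
# S a b
```

The strength-2 glues build the bottom row and left column one tile at a time, while each x needs two strength-1 glues at once, which is why growth stops at the boundary of the square.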

Observations
size of the 3 x 3 square = 9 cells
number of distinct tiles used = 6
Question #1: To assemble an n x n square, how many distinct tiles do we need?
Answer #1: at most n^2 distinct tiles.
Question #2: What is the smallest number of distinct tiles that we need?
Answer #2: ???

Example of Combinatorial Problems: Tile Complexity for Shapes
Input: a connected shape S
Output: a tile set with the minimum number of distinct tiles that self-assembles S.

Tile Complexity of Squares
Theorem: (Adleman et al. 2001)
1. An n x n square can be self-assembled by Θ(log n/log log n) distinct tiles at temperature 2.
2. Such a tile set can be computed in polynomial time in n.
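
To get a feel for the gap between the trivial n^2 upper bound and the Θ(log n/log log n) bound, the sketch below tabulates both quantities. It ignores the constant factors hidden by Θ and uses base-2 logarithms; both choices, and the function name `growth_rates`, are assumptions made only for illustration.

```python
import math


def growth_rates(n):
    """Return (n^2, log n / log log n) for n >= 4, using base-2 logarithms."""
    log_n = math.log2(n)
    return n * n, log_n / math.log2(log_n)


for n in (16, 1024, 2 ** 20):
    naive, optimal_order = growth_rates(n)
    print(f"n = {n}: n^2 = {naive}, log n / log log n = {optimal_order:.2f}")
```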

Tile Complexity of General Shapes
Theorem: (Adleman et al. 2002) For general shapes, it is NP-hard to compute the minimum number of distinct tiles needed to self-assemble a given shape at a fixed temperature.
Open Problem: polynomial-time approximation algorithms with good approximation ratios

Tile Complexity of Squares
Question: Can we do better than Θ(log n/log log n) for squares?
Answer: Yes, if we adjust the temperature.

Temperature Programming: the Case of Squares
Theorem: [Kao, Schweller 2006] We can self-assemble an n x n square using O(1) tiles and adjusting the temperature O(log n) times using O(1) different temperatures.
Intuition: Adjusting temperature is a form of encoding information and programs into self-assembly.

Temperature Programming for General Shapes
Theorem: [Summers 2009] There is a set of O(1) distinct tiles that can self-assemble any finite shape S by adjusting the temperature O(Kolmogorov(S)) times, using O(1) distinct temperatures, and scaling the shape S by a constant factor c, where c depends on S.
Kolmogorov(S) = Kolmogorov complexity of S

Temperature Programming for General Shapes
Theorem: [Summers 2009] There is a set of O(1) distinct tiles that can self-assemble any finite shape S by adjusting the temperature O(|S|) times, using O(1) distinct temperatures, and scaling the shape S by a constant factor 22.
trade-off: scaling factor versus # of temperature adjustments

Why Do We Want to Assemble Shapes? There are many potential science-fiction-like applications, including the following one: producing nano-circuits

A Long-Range Research Goal of This Field: DNA Self-Assembly for Nano-Circuits [adapted from a slide of Shinnosuke Seki]

How to Self-Assemble a Nano-Circuit?
Possible Methodology:
Step 1: Attach circuit components to DNA tiles.
Step 2: DNA tiles self-assemble into a pattern.
Step 3: The pattern is the desired circuit.
circuit components: AND-gate, OR-gate, NOT-gate, wire, etc.

Proof of Concept Self-Assembly for Circuit Patterns [Cook, Rothemund, and Winfree, 2003]

Proof of Concept: Attaching Gold Particles to DNA Tiles [Reif's Group, Duke University] [Figure: a DNA tile built from DNA sequences; the base-by-base layout of the figure is not reproduced here.]

Proof of Concept: Attaching Gold Particles to DNA Tiles [Park, Pistol, Ahn, Reif, Lebeck, Dwyer, and LaBean, 2006]

A Model for Self-Assembly of Circuits
Changes to the Basic Model:
1. Locations in the input shape have colors.
2. Tiles also have colors.
3. Colors correspond to circuit components.
4. The color of a tile at a location matches the color of that location.
5. L-shape seed: the assembly starts with an L-shaped border rather than a single tile.
self-assembly for circuits = self-assembly for color patterns

Self-Assembly for Circuit Patterns [Cook, Rothemund, and Winfree, 2003] component (or functionality) of a location or tile = color of that location or tile [Figure label: L-seed]

Self-Assembly for Color Patterns

Example of Combinatorial Problems: The PATS Problem (Patterned Self-Assembly Tile Synthesis)
Input: a color pattern P of a rectangular shape.
Output: a tile set with the minimum number of tiles that self-assembles P starting from an L-shape seed.

Computational Complexity of PATS
Theorem: (Czeizler, Popa 2012) If the input pattern may have an arbitrary number of colors, PATS is NP-hard.
Theorem: (Seki 2013) For 60-color patterns, PATS is NP-hard.

Computational Complexity for PATS
Theorem: (Johnsen, Kao, Seki, in ISAAC 2013)
1. For 29-color patterns, PATS is NP-hard.
2. Moreover, approximation of the minimum number of tiles within a factor of 47/46 is NP-hard as well.
Proof:
1. Reduction from Subset Sum.
2. Case analysis based on 118 color patterns.

Some Tiles and Patterns in Proof of NP-Hardness of 29-Color PATS

Further Work for Self-Assembly of Circuits
Work in Progress: For 11-color patterns, PATS is NP-hard. (Johnsen and Seki)
Work in Progress: For 4-color patterns, PATS is NP-hard. (Caltech, computer-generated case analysis)
Conjecture: For 2-color patterns, PATS is NP-hard.
Fact: For 1-color patterns, PATS only needs 1 tile in addition to the L-seed.
Open Problem (final objective): good approximation algorithms for PATS

Key Steps in Design of Tile Self-Assembly
1. Specify a shape or a pattern.
2. Design a tile system to self-assemble the shape or pattern.
3. Design DNA words (i.e., DNA sequences) to form the tiles.

DNA Tiles [Figure: a tile labeled TILE with the DNA words GCATCG and CGTAGC.]

Applications of DNA Word Design
- Information Storage at Molecular Level
- Molecular Bar Codes
- DNA Arrays
- Algorithmic DNA Self-Assembly (focus of this talk)

Example of Combinatorial Problems: DNA Word Design
Context: We are given some constraints on the desired words, and the alphabet DNA = {A,C,G,T}.
Algorithmic Problem:
Input: an integer n
Output: a code W of n words of the same length L such that W satisfies the constraints and L is minimized.

Two Types of Constraints
Binding Constraints: Such constraints are heuristics that help maximize the probability that each word X in W only binds with its Watson-Crick complement X^C.
X = A G T T A G C
X^C = T C A A T C G
Thermodynamic Constraints: Such constraints are heuristics that help maximize the probability that all words in W have similar thermodynamic properties (e.g., melting temperature).
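
The Watson-Crick complement used in the binding constraints can be sketched in a few lines. The function names below are hypothetical; the example word AGTTAGC and its complement TCAATCG are the ones shown on the slide.

```python
# Watson-Crick pairing: A with T, and C with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(word):
    """Base-by-base Watson-Crick complement of a DNA word."""
    return "".join(COMPLEMENT[base] for base in word)


def reverse_complement(word):
    """Complement of the word, read in reverse order."""
    return complement(word)[::-1]


print(complement("AGTTAGC"))          # TCAATCG
print(reverse_complement("AGTTAGC"))  # GCTAACT
```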

9 Constraints Considered for Our Work
All 9 constraints are taken from the literature.
Binding Constraints:
1. Basic Hamming Constraint C_1(k_1)
2. Reverse Complementary Constraint C_2(k_2)
3. Self Complementary Constraint C_3(k_3)
4. Shifting Hamming Constraint C_4(k_4)
5. Shifting Reverse Complementary Constraint C_5(k_5)
6. Shifting Self Complementary Constraint C_6(k_6)
7. Consecutive Base Constraint C_8(d)
Thermodynamic Constraints:
1. GC Content Constraint C_7(ϒ)
2. Free Energy Constraint C_9(σ)

Binding Constraints and Hamming Distance
Ideal Case for Binding: Two DNA words X and Y bind only when X and Y are Watson-Crick complementary.
X = A G T T A G C
Y = T C A A T C G
Non-Ideal Case for Binding: X may bind with Y even if X and Y are not 100% complementary.
Binding Constraints: To help prevent non-matched binding, we want a large Hamming distance between X and Y^C.

Basic Hamming Constraint C_1(k_1)
Mathematical Condition: For all distinct words Y and X in W, H(Y, X) ≥ k_1, where H(Y, X) is the Hamming distance between Y and X.
Biological Meaning: This constraint helps prevent X from binding with the complement of Y.
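
A check for the basic Hamming constraint can be sketched as follows, reading the condition as H(Y, X) ≥ k_1 for every pair of distinct words. The function names and the third example word are hypothetical.

```python
from itertools import combinations


def hamming(y, x):
    """Number of positions at which two equal-length words differ."""
    if len(y) != len(x):
        raise ValueError("words must have the same length")
    return sum(a != b for a, b in zip(y, x))


def satisfies_c1(words, k1):
    """True if every pair of distinct words is at Hamming distance >= k1."""
    return all(hamming(y, x) >= k1 for y, x in combinations(words, 2))


code = ["AGTTAGC", "TCAATCG", "AGTTTGC"]
print(hamming(code[0], code[1]))  # 7
print(satisfies_c1(code, 2))      # False: the first and third words differ in 1 position
```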

Reverse Complementary Constraint C_2(k_2)
Mathematical Condition: For all distinct words Y and X in W, H(Y, X^RC) ≥ k_2.
Notation: for X = X_1 X_2 ... X_L, the reverse of X is X^R = X_L ... X_2 X_1, and the reverse complement of X is X^RC = X^C_L X^C_{L-1} ... X^C_2 X^C_1.
[Figure: Y_1 Y_2 ... Y_{L-1} Y_L aligned against X^C_L X^C_{L-1} ... X^C_2 X^C_1.]
Biological Meaning: This constraint helps prevent Y from binding with the reverse of X.

Self Complementary Constraint C_3(k_3)
Same as C_2(k_2) but with X = Y.
Mathematical Condition: For each word Y in W, H(Y, Y^RC) ≥ k_3.
Biological Meaning: This constraint prevents a word Y from binding with the reverse of itself.
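
The reverse complementary and self complementary constraints can be checked with the sketch below, reading the conditions as H(Y, X^RC) ≥ k_2 for distinct words and H(Y, Y^RC) ≥ k_3 for each word. The function names and example words are hypothetical.

```python
from itertools import permutations

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(word):
    """Watson-Crick complement of the word, read in reverse order."""
    return "".join(COMPLEMENT[base] for base in reversed(word))


def hamming(y, x):
    """Number of positions at which two equal-length words differ."""
    return sum(a != b for a, b in zip(y, x))


def satisfies_c2(words, k2):
    """H(Y, X^RC) >= k2 for all ordered pairs of distinct words."""
    return all(hamming(y, reverse_complement(x)) >= k2
               for y, x in permutations(words, 2))


def satisfies_c3(words, k3):
    """H(Y, Y^RC) >= k3 for each word."""
    return all(hamming(y, reverse_complement(y)) >= k3 for y in words)


print(satisfies_c2(["AGTTAGC", "GCTAACT"], 1))  # False: the words are reverse complements
print(satisfies_c3(["ACGT"], 1))                # False: ACGT equals its own reverse complement
```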

Shifting Hamming Constraint C_4(k_4)
Mathematical Condition: For all distinct words Y and X in W, H(Y[1..i], X[(L − i + 1)..L]) ≥ k_4 − (L − i) for all L ≥ i ≥ L − k_4.
Biological Meaning: This constraint helps prevent a prefix of Y from binding with the complement of a suffix of X.
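
The shifting Hamming constraint compares each prefix of Y with the equally long suffix of X. The sketch below assumes the condition reads H(Y[1..i], X[(L − i + 1)..L]) ≥ k_4 − (L − i) for L − k_4 ≤ i ≤ L; that reading, the function names, and the example words are assumptions for illustration.

```python
from itertools import permutations


def hamming(y, x):
    """Number of positions at which two equal-length words differ."""
    return sum(a != b for a, b in zip(y, x))


def satisfies_c4(words, k4):
    """Check prefix-of-Y against suffix-of-X for all ordered pairs of distinct words."""
    for y, x in permutations(words, 2):
        length = len(y)
        for i in range(max(1, length - k4), length + 1):
            prefix = y[:i]           # Y[1..i] in the slide's 1-based notation
            suffix = x[length - i:]  # X[(L - i + 1)..L]
            if hamming(prefix, suffix) < k4 - (length - i):
                return False
    return True


print(satisfies_c4(["AAAC", "CAAA"], 2))  # False: prefix AAA matches suffix AAA exactly
print(satisfies_c4(["AAAA", "CCCC"], 2))  # True
```

Note that the unshifted case i = L reduces to the basic Hamming condition, so the first example fails only because of the shifted comparison.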

Shifting Reverse Complementary Constraint C_5(k_5)
Mathematical Condition: For all distinct words Y and X in W, H(Y[1..i], X[1..i]^RC) ≥ k_5 − (L − i), and H(Y[(L − i + 1)..L], X[(L − i + 1)..L]^RC) ≥ k_5 − (L − i) for all L ≥ i ≥ L − k_5.
[Figure: Y_1 Y_2 ... Y_{L-1} Y_L aligned against shifted copies of X^C_L X^C_{L-1} ... X^C_2 X^C_1.]
Biological Meaning: This constraint helps prevent a prefix of Y from binding with the reverse of a prefix of X and prevent a suffix of Y from binding with the reverse of a suffix of X.

Shifting Self Complementary Constraint C_6(k_6)
Same as C_5(k_5) but with X = Y.
Mathematical Condition: For each word Y in W, H(Y[1..i], Y[1..i]^RC) ≥ k_6 − (L − i), and H(Y[(L − i + 1)..L], Y[(L − i + 1)..L]^RC) ≥ k_6 − (L − i) for all L ≥ i ≥ L − k_6.
[Figure: Y_1 Y_2 ... Y_{L-1} Y_L aligned against shifted copies of Y^C_L Y^C_{L-1} ... Y^C_2 Y^C_1.]
Biological Meaning: This constraint helps prevent a prefix of Y from binding with its reverse and prevent a suffix of Y from binding with its reverse.

GC Content Constraint C_7(ϒ)
Mathematical Condition: ϒ percentage of the bases in any word Y in W are either G or C.
Examples: AGCTCCCCCCTTAAA, GGTCGCAATTTTGGC
Biological Meaning: The GC content affects the thermodynamic properties of a word. Having the same ratio of GC content for all the words helps ensure that the words in W have similar thermodynamic characteristics.
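
The GC content of a word is easy to compute exactly. In the sketch below the function names are hypothetical, and the constraint is read as requiring every word to have the same GC fraction; the two example words are the ones on the slide, and both turn out to have GC content 8/15.

```python
from fractions import Fraction


def gc_content(word):
    """Exact fraction of bases in the word that are G or C."""
    return Fraction(sum(base in "GC" for base in word), len(word))


def satisfies_c7(words, gc_fraction):
    """True if every word has exactly the given GC fraction."""
    return all(gc_content(word) == gc_fraction for word in words)


words = ["AGCTCCCCCCTTAAA", "GGTCGCAATTTTGGC"]
for word in words:
    print(word, gc_content(word))  # both print 8/15
```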

Consecutive Base Constraint C_8(d)
Mathematical Condition: No word has more than d consecutive occurrences of the same base.
Examples: AAAAAAAGGGGGGGG and TTTTTTTCCCCCCCC are two perfectly complementary words that can bind at wrong positions; AGCTCCCCCCTTAAA contains a long run of C.
Biological Meaning: In some applications, consecutive occurrences of the same base increase binding errors.
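
The consecutive base constraint amounts to bounding the longest run of a single base. A minimal sketch, assuming non-empty words and reading the condition as "no run of identical bases is longer than d"; the function names are hypothetical.

```python
from itertools import groupby


def longest_run(word):
    """Length of the longest block of identical consecutive bases (word must be non-empty)."""
    return max(len(list(group)) for _, group in groupby(word))


def satisfies_c8(words, d):
    """True if no word contains more than d consecutive copies of one base."""
    return all(longest_run(word) <= d for word in words)


print(longest_run("AGCTCCCCCCTTAAA"))  # 6
print(longest_run("AAAAAAAGGGGGGGG"))  # 8
```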

Free Energy Constraint C_9(σ)
Mathematical Condition: For all words Y and X in W, FE(Y) − FE(X) ≤ σ, where FE denotes free energy.
Biological Meaning: This constraint ensures that the words in W have similar melting temperatures, which allows the DNA words in W to bind under the same temperature.

Free Energy of a DNA Word [Breslauer et al. 1986]
Free Energy of X = x_1 x_2 ... x_L:
FE(X) = a constant + sum of pair-wise energies
      = a constant + Γ(x_1, x_2) + Γ(x_2, x_3) + ... + Γ(x_{L-1}, x_L)
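
The pairwise free energy model and the free energy constraint can be sketched as follows, reading the formula as a constant plus the sum of Γ over adjacent base pairs and the constraint as FE(Y) − FE(X) ≤ σ for all words. The function names are hypothetical, and `toy_gamma` uses placeholder numbers chosen only to make the example run; they are not measured energies from Breslauer et al.

```python
def free_energy(word, gamma, constant=0.0):
    """constant + sum of gamma(x_i, x_{i+1}) over adjacent bases of the word."""
    return constant + sum(gamma(a, b) for a, b in zip(word, word[1:]))


def satisfies_c9(words, gamma, sigma, constant=0.0):
    """True if the free energies of any two words differ by at most sigma."""
    energies = [free_energy(word, gamma, constant) for word in words]
    return max(energies) - min(energies) <= sigma


def toy_gamma(a, b):
    """Placeholder pairwise energies (NOT real data): G/C pairs bind more strongly."""
    return -2.0 if a in "GC" and b in "GC" else -1.0


print(free_energy("GCAT", toy_gamma))                  # -4.0
print(satisfies_c9(["GCAT", "GGCC"], toy_gamma, 1.0))  # False: the energies differ by 2.0
```

Passing Γ in as a function keeps the sketch independent of any particular energy table.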

Recap: Problem Formulation for DNA Self-Assembly
Context: We are given some constraints on the desired words, and the alphabet DNA = {A,C,G,T}.
Algorithmic Problem:
Input: an integer n
Output: a code W of n words of the same length L such that W satisfies the constraints and L is minimized.

Previous Results
- heuristics without performance guarantees [most of the previous works]
- NP-hardness for some variants of the problem [Phan, Garzon 2008]
- randomized algorithms [Kao, Sanghi, Schweller 2005]
  1. word length optimal to within a multiplicative constant
  2. running time polynomial in the output size
  3. satisfying the constraints with high probability

Approximation Algorithms for DNA Word Design
Theorem: (Kao, Leung, Sung, Zhang, 2010) We can construct a code C_{1,4} of n words that satisfies constraints C_1(k_1) and C_4(k_4) such that
1. the word length L is optimal to within a multiplicative constant; i.e., L = Θ(k + log n), where k = max {k_1, k_4},
2. the time complexity is polynomial in the output size, and
3. the algorithm is deterministic.

Approximation Algorithms for DNA Word Design
Theorem: (Kao, Leung, Sung, Zhang, 2010) We can construct a code C_{1~8} of n DNA words that satisfies constraints C_1(k_1), C_2(k_2), C_3(k_3), C_4(k_4), C_5(k_5), C_6(k_6), C_7(ϒ), C_8(d) such that
1. the word length L is optimal to within a multiplicative constant; i.e., L = Θ(k + log n), where k = max {k_1, k_2, k_3, k_4, k_5, k_6},
2. the time complexity is polynomial in the output size, and
3. the algorithm is deterministic.

Approximation Algorithms for DNA Word Design
Theorem: (Kao, Leung, Sung, Zhang, 2010) We can construct a code C_{1~6,9} of n DNA words that satisfies constraints C_1(k_1), C_2(k_2), C_3(k_3), C_4(k_4), C_5(k_5), C_6(k_6), C_9(σ) such that
1. the word length L is optimal to within a multiplicative constant; i.e., L = Θ(k + log n), where k = max {k_1, k_2, k_3, k_4, k_5, k_6},
2. the time complexity is polynomial in the output size, and
3. the algorithm is deterministic.

Further Research for DNA Word Design
Concrete Open Problems: Our codes can satisfy only subsets of the 9 constraints, but not all the constraints at the same time. Design codes that satisfy all 9 constraints.
General Research Direction: Adapt our randomization and derandomization techniques to other codeword design problems.

Conclusions
1. There are many research possibilities for DNA self-assembly and other kinds of self-assemblies!
2. General research directions include:
   - novel (or science-fiction-like) applications of self-assembly (especially in Medicine)
   - novel models for self-assembly
   - in-vitro implementations
   - efficient tile systems (e.g., small tile complexity)
   - computational powers of self-assembly models
   - fault-tolerant self-assembly (e.g., error correction)
   - many more

Thank you! Any questions?