Combinatorial Algorithms and Computational Complexity for DNA Self-Assembly

Ming-Yang Kao
Northwestern University, Evanston, Illinois, USA
Presented at Kyoto University on December 12, 2013

Outline of the Talk
1. examples of self-assembly
2. examples of DNA self-assembly
3. a basic model for DNA self-assembly
4. combinatorial problems for DNA self-assembly
   use DNAs to self-assemble shapes
   use DNAs to self-assemble circuits
   design DNA sequences for DNA self-assembly
If we have time:
5. general research directions

What Is Self-Assembly? [adapted from a slide of Shinnosuke Seki] Self-assembly is a phenomenon in which complex structures emerge from simple components through local interactions with limited global control.

Example of Self-Assembly: Self-Assembly by Magnetic Forces [http://www.math.udel.edu/meclab, 2007]

Example of Self-Assembly: Self-Assembly of Stars into Galaxy [hubblesite.org, http://self-assembly.net, 2013]

Example of Self-Assembly: Hydrophilic and Hydrophobic Interactions [http://staff.jccc.net/pdecell/chemistry/selfassem.html] proteins and molecules on cell membrane

Example of Self-Assembly: Human Language Development [adapted from a slide of S. Seki] Speaking similar languages leads to being socially close. Being socially close leads to similar languages.

Example of Self-Assembly: Robot Self-Assembly via Cellular Automata [Tuci et al., 2006] A group of robots physically connected to each other that (a) moves on rough terrain and (b) passes over a gap during an experiment in a closed arena with flat terrain.

Example of Self-Assembly: Robot Self-Assembly, Kilobot Project [Self-Organizing Systems Research Group, Harvard, 2011]

Example of Self-Assembly: Crystal Formation [http://web.mit.edu/lms/www, Zhang, 2001]

Example of Self-Assembly: Insulation around Copper Wiring [http://www.technologyreview.com/biztech, 2007] This microprocessor cross section shows empty space in between the chip's copper wiring. Wires are usually insulated with a glasslike material, but IBM has used self-assembly techniques, which can be employed in chip-making facilities, to create air gaps that insulate the wires. Credit: IBM

Example of Self-Assembly: Self-Assembly of Hot Dog Slices [bradley.bradley.edu/~campbell/demopix6.html, 2013] Left: Cutting hot dogs into slices. Right: Floating them in a pan of water.

Example of Self-Assembly: Self-Assembly of Lego Pieces [http://www.math.udel.edu/meclab, 2007] LEGO Bricks + Water + Capillary Forces

Example of Self-Assembly: DNA Brick Structures Analogous to LEGO Brick Structures [Ke et al., Science 2012, 338:1177-1183, Peng Yin's Lab at Harvard]

Message: Self-assembly is everywhere and comes in many kinds! Focus of This Talk: Algorithmic DNA Self-Assembly

Algorithmic DNA Self-Assembly = Algorithms + DNA + Self-Assembly. It lies at the intersection of Nanotechnology and Theoretical Computer Science.

Algorithmic DNA Self-Assembly: Nanotechnology + Theoretical Computer Science
Objective: Use DNA to create nanostructures.
Methodology:
Step 1: Encode a program into DNAs.
Step 2: Execute the program to guide the DNAs to self-assemble into desired nanostructures.
How to encode a program: DNA has 4 bases, A, C, G, T.
How to execute a program: A binds with T, and C binds with G. When DNAs bind, the binding executes the program.
There are other possibilities for the above!

Types of Algorithmic DNA Self-Assembly: 1-dimensional, 2-dimensional, 3-dimensional, and more. [slide label: focus of this talk]

DNA Tiles: Basic Unit of 2D Self-Assembly [figure: a TILE whose DNA strands, e.g. G C A T C G paired with C G T A G C, encode a program; their binding executes the program]

Algorithmic DNA Self-Assembly
Program = Tiles + Lab Steps
Output = Shape + Pattern

Examples of DNA Tiles [Holliday, 1964] exchange of genetic information in yeast [figure]

Examples of DNA Tiles [figure: the structure abstracted as a TILE]

Examples of DNA Tiles [Reif's Group, Duke University] [figure: a tile formed from 9 DNA sequences]

Examples of DNA Tiles [Park, Pistol, Ahn, Reif, Lebeck, Dwyer, and LaBean, 2006]

Examples of DNA Tiles [Winfree's Group, Caltech]

Examples of DNA Tiles [Sierpinski Triangle, Rothemund, Papadakis, Winfree, 2004]

Recap: Algorithmic DNA Self-Assembly
Objectives and Methodologies:
1. Use DNA to compute.
2. Use computation to guide DNAs to self-assemble.
Next, we will see
1. some examples and
2. some basic models for such computation.

Self-Assembly for Binary Counters [Winfree, 2000]

Self-Assembly for Binary Counters [Barish, Rothemund, Winfree, 2005]

2D Self-Assembly for Turing Machines [Winfree, Yang, and Seeman, 1998]

1D Self-Assembly for Regular Languages [Winfree, Yang, and Seeman, 1998]

Tree Self-Assembly for Context-Free Languages [Winfree, Yang, and Seeman, 1998]

A Basic Model of DNA Self-Assembly [Rothemund and Winfree, STOC 2000]
tile system: (T, s, G, t)
T: tile set
s: seed tile
G: glue function, assigning each pair of glues a strength in {0, 1, ..., t}
t: temperature, a positive integer
[figure: example tiles whose four sides carry colored glues]

Example: Build a Square
T = {S, a, b, c, d, x}
1. positive strength between same glues
2. zero strength between distinct glues
3. start with the seed tile
4. add one tile at a time
5. bind if total strength is at least t
6. order must not affect final shape and pattern
Glue strengths [glue symbols were shown as images]: four glues have strength 2, and three glues have strength 1.
temperature t = 2

Example: Build a Square [figure sequence: the assembly grows one tile per slide at temperature t = 2]
Starting from the seed S, the tiles attach in this order: a, c, d, b, and then the four copies of x. The completed 3 x 3 square is:
d x x
c x x
S a b
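The growth rule used in this example (start from the seed, add one tile at a time, bind only if the total glue strength is at least t) can be sketched in a few lines of Python. This is a minimal illustration, not the slide's exact tile set: the glue names (`sa`, `ab`, `sc`, `cd`, `h`, `v`) and their placement on the tiles are assumptions chosen so that the six tile types S, a, b, c, d, x produce the same 3 x 3 square.

```python
# Sides are indexed 0 = north, 1 = east, 2 = south, 3 = west.
# Hypothetical glue assignment (an assumption; the slide's glues were images).
TILES = {
    "S": ("sc", "sa", None, None),
    "a": ("v", "ab", None, "sa"),
    "b": ("v", None, None, "ab"),
    "c": ("cd", "h", "sc", None),
    "d": (None, "h", "cd", None),
    "x": ("v", "h", "v", "h"),
}
# Equal glues bind with this strength; distinct glues bind with strength 0.
STRENGTH = {"sa": 2, "ab": 2, "sc": 2, "cd": 2, "h": 1, "v": 1}
# (dx, dy, side of the new tile, facing side of the neighbor)
NEIGHBORS = [(0, 1, 0, 2), (1, 0, 1, 3), (0, -1, 2, 0), (-1, 0, 3, 1)]


def binding_strength(assembly, pos, name):
    """Total strength with which tile type `name` would bind at `pos`."""
    total = 0
    for dx, dy, mine, theirs in NEIGHBORS:
        other = assembly.get((pos[0] + dx, pos[1] + dy))
        if other is None:
            continue
        glue = TILES[name][mine]
        if glue is not None and glue == TILES[other][theirs]:
            total += STRENGTH[glue]
    return total


def assemble(seed="S", temperature=2):
    """Grow from the seed until no tile can attach anywhere."""
    assembly = {(0, 0): seed}
    grew = True
    while grew:
        grew = False
        frontier = sorted(
            {(x + dx, y + dy) for (x, y) in assembly for dx, dy, _, _ in NEIGHBORS}
            - set(assembly)
        )
        for pos in frontier:
            for name in sorted(TILES):
                if binding_strength(assembly, pos, name) >= temperature:
                    assembly[pos] = name
                    grew = True
                    break
    return assembly


def render(assembly):
    """Draw the assembly with the top row first."""
    xs = [x for x, _ in assembly]
    ys = [y for _, y in assembly]
    rows = []
    for y in range(max(ys), min(ys) - 1, -1):
        rows.append(" ".join(assembly.get((x, y), ".") for x in range(min(xs), max(xs) + 1)))
    return "\n".join(rows)


if __name__ == "__main__":
    print(render(assemble()))
```

In this sketch the border tiles a, b, c, d attach through a single strength-2 glue, while each x needs two strength-1 glues at once (from its west and south neighbors), which is why growth stops at the 3 x 3 boundary.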

Observations
size of the 3 x 3 square = 9 cells
number of distinct tiles used = 6
Question #1: To assemble an n x n square, how many distinct tiles do we need?
Answer #1: at most n^2 distinct tiles.
Question #2: What is the smallest number of distinct tiles that we need?
Answer #2: ???

Example of Combinatorial Problems: Tile Complexity for Shapes
Input: a connected shape S
Output: a minimum number of distinct tiles that self-assembles S.

Tile Complexity of Squares
Theorem: (Adleman et al. 2001)
1. An n x n square can be self-assembled by Θ(log n/log log n) distinct tiles at temperature 2.
2. Such a tile set can be computed in polynomial time in n.
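To get a feel for the gap between the trivial n^2 upper bound and the Θ(log n/log log n) bound, the following sketch tabulates both quantities. It only shows growth rates: the constant hidden by Θ is not stated on the slide, so the second column is the bare ratio log n/log log n (base 2, an arbitrary choice), not an actual tile count.

```python
import math


def trivial_tile_count(n):
    """One distinct tile per cell of the n x n square."""
    return n * n


def log_ratio(n):
    """log n / log log n, the growth rate in the theorem (constants ignored)."""
    return math.log2(n) / math.log2(math.log2(n))


if __name__ == "__main__":
    for n in (16, 256, 65536):
        print(n, trivial_tile_count(n), round(log_ratio(n), 2))
```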

Tile Complexity of General Shapes
Theorem: (Adleman et al. 2002) For general shapes, it is NP-hard to compute a minimum number of distinct tiles to self-assemble a given shape at a fixed temperature.
Open Problem: polynomial-time approximation algorithms with good approximation ratios

Tile Complexity of Squares
Question: Can we do better than Θ(log n/log log n) for squares?
Answer: Yes, if we adjust the temperature.

Temperature Programming: the Case of Squares
Theorem: [Kao, Schweller 2006] We can self-assemble an n x n square using O(1) tiles and adjusting the temperature O(log n) times using O(1) different temperatures.
Intuition: Adjusting temperature is a form of encoding information and programs into self-assembly.

Temperature Programming for General Shapes
Theorem: [Summers 2009] There is a set of O(1) distinct tiles that can self-assemble any finite shape S by adjusting the temperature O(Kolmogorov(S)) times, using O(1) distinct temperatures, and scaling the shape S by a constant factor c, where c depends on S.
Kolmogorov(S) = Kolmogorov complexity of S

Temperature Programming for General Shapes
Theorem: [Summers 2009] There is a set of O(1) distinct tiles that can self-assemble any finite shape S by adjusting the temperature O(|S|) times, using O(1) distinct temperatures, and scaling the shape S by a constant factor of 22.
trade-off: scaling factor versus number of temperature adjustments

Why Do We Want to Assemble Shapes? There are many potential science-fiction-like applications, including the following one: producing nano-circuits

A Long-Range Research Goal of This Field: DNA Self-Assembly for Nano-Circuits [adapted from a slide of Shinnosuke Seki]

How to Self-Assemble a Nano-Circuit?
Possible Methodology:
Step 1: Attach circuit components to DNA tiles.
Step 2: DNA tiles self-assemble into a pattern.
Step 3: The pattern is the desired circuit.
circuit components: AND-gate, OR-gate, NOT-gate, wire, etc.

Proof of Concept Self-Assembly for Circuit Patterns [Cook, Rothemund, and Winfree, 2003]

Proof of Concept: Attaching Gold Particles to DNA Tiles [Reif's Group, Duke University] [figure: the DNA sequences forming a tile]

Proof of Concept: Attaching Gold Particles to DNA Tiles [Park, Pistol, Ahn, Reif, Lebeck, Dwyer, and LaBean, 2006]

A Model for Self-Assembly of Circuits
Changes to the Basic Model:
1. Locations in the input shape have colors.
2. Tiles also have colors.
3. Colors correspond to circuit components.
4. The color of a tile at a location matches the color of that location.
5. L-shape seed: the assembly starts with an L-shape border rather than a single tile.
self-assembly for circuits = self-assembly for color patterns
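As a rough illustration of this colored model, the sketch below fills a rectangle from an L-shape seed. It assumes (this is not stated on the slide) that every tile attaches cooperatively through its south and west glues, so a tile type is fully described by a rule `(south glue, west glue) -> (color, north glue, east glue)`. The four XOR tiles and the all-ones seed are hypothetical choices that yield a 2-color pattern.

```python
def assemble_pattern(rule, width, height, bottom, left):
    """Fill a width x height rectangle from an L-shape seed.

    rule   : dict mapping (south_glue, west_glue) -> (color, north_glue, east_glue)
    bottom : glues exposed by the horizontal arm of the seed, one per column
    left   : glues exposed by the vertical arm of the seed, one per row
    Returns (rows of colors, bottom row first; set of tile types actually used).
    """
    below = list(bottom)  # glue currently exposed under each column
    colors = []
    used = set()
    for r in range(height):
        west = left[r]  # glue exposed to the left of the current cell
        row = []
        for c in range(width):
            key = (below[c], west)
            color, north_glue, east_glue = rule[key]
            used.add(key)
            row.append(color)
            below[c] = north_glue
            west = east_glue
        colors.append(row)
    return colors, used


# Hypothetical 4-tile set: each tile outputs the XOR of its two inputs
# on its north and east sides and is colored by that bit.
XOR_RULE = {(s, w): (s ^ w, s ^ w, s ^ w) for s in (0, 1) for w in (0, 1)}

if __name__ == "__main__":
    pattern, used = assemble_pattern(XOR_RULE, 8, 8, [1] * 8, [1] * 8)
    for row in reversed(pattern):  # print the top row first
        print("".join("#" if bit else "." for bit in row))
    print("distinct tiles used:", len(used))
```

Because each cell's tile is determined by the two glues it sees, the final pattern does not depend on the order in which tiles attach, which matches the requirement that order must not affect the final shape and pattern.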

Self-Assembly for Circuit Patterns [Cook, Rothemund, and Winfree, 2003] component (or functionality) of a location or tile = color of that location or tile [figure label: L-seed]

Self-Assembly for Color Patterns [figure sequence]

Example of Combinatorial Problems: The PATS Problem (Patterned Self-Assembly Tile Synthesis)
Input: a color pattern P of a rectangular shape.
Output: a minimum number of distinct tiles that self-assembles P starting from an L-shape seed.

Computational Complexity of PATS
Theorem: (Czeizler, Popa 2012) If the input pattern may have an arbitrary number of colors, PATS is NP-hard.
Theorem: (Seki 2013) For 60-color patterns, PATS is NP-hard.

Computational Complexity of PATS
Theorem: (Johnsen, Kao, Seki, in ISAAC 2013)
1. For 29-color patterns, PATS is NP-hard.
2. Moreover, approximating the minimum number of tiles within a factor of 47/46 is NP-hard as well.
Proof:
1. Reduction from Subset Sum.
2. Case analysis based on 118 color patterns.

Some Tiles and Patterns in Proof of NP-Hardness of 29-Color PATS [figure sequence]

Further Work for Self-Assembly of Circuits
Work in Progress: For 11-color patterns, PATS is NP-hard. (Johnsen and Seki)
Work in Progress: For 4-color patterns, PATS is NP-hard. (Caltech, computer-generated case analysis)
Conjecture: For 2-color patterns, PATS is NP-hard.
Fact: For 1-color patterns, PATS only needs 1 tile in addition to the L-seed.
Open Problem: good approximation algorithms for PATS
[slide label: final objective]

Key Steps in Design of Tile Self-Assembly
1. Specify a shape or a pattern.
2. Design a tile system to self-assemble the shape or pattern.
3. Design DNA words (i.e., DNA sequences) to form the tiles.

DNA Tiles [figure: a TILE whose sides carry DNA words, e.g. G C A T C G paired with C G T A G C]

Applications of DNA Word Design
Information Storage at Molecular Level
Molecular Bar Codes
DNA Arrays
Algorithmic DNA Self-Assembly (focus of this talk)

Example of Combinatorial Problems: DNA Word Design
Context: We are given some constraints on the desired words, and the alphabet DNA = {A,C,G,T}.
Algorithmic Problem:
Input: an integer n
Output: a code W of n words of the same length L such that W satisfies the constraints and L is minimized.

Two Types of Constraints
Binding Constraints: Such constraints are heuristics that help maximize the probability that each word X in W binds only with its Watson-Crick complement $X^C$.
X = A G T T A G C
$X^C$ = T C A A T C G
Thermodynamic Constraints: Such constraints are heuristics that help maximize the probability that all words in W have similar thermodynamic properties (e.g., melting temperature).
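The Watson-Crick complement used in the example above is a per-base substitution (A with T, C with G). A minimal sketch, with hypothetical helper names `complement` and `reverse_complement`:

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(word):
    """Base-by-base Watson-Crick complement, read in the same direction."""
    return "".join(PAIR[base] for base in word)


def reverse_complement(word):
    """Complement of the word, read in the opposite direction."""
    return complement(word)[::-1]


if __name__ == "__main__":
    print(complement("AGTTAGC"))          # TCAATCG, as in the example above
    print(reverse_complement("AGTTAGC"))  # GCTAACT
```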

9 Constraints Considered for Our Work
All 9 constraints are taken from the literature.
Binding Constraints:
1. Basic Hamming Constraint $C_1(k_1)$
2. Reverse Complementary Constraint $C_2(k_2)$
3. Self Complementary Constraint $C_3(k_3)$
4. Shifting Hamming Constraint $C_4(k_4)$
5. Shifting Reverse Complementary Constraint $C_5(k_5)$
6. Shifting Self Complementary Constraint $C_6(k_6)$
7. Consecutive Base Constraint $C_8(d)$
Thermodynamic Constraints:
1. GC Content Constraint $C_7(\gamma)$
2. Free Energy Constraint $C_9(\sigma)$

Binding Constraints and Hamming Distance
Ideal Case for Binding: Two DNA words X and Y bind only when X and Y are Watson-Crick complementary.
X = A G T T A G C
Y = T C A A T C G
Non-Ideal Case for Binding: X may bind with Y even if X and Y are not 100% complementary.
Binding Constraints: To help prevent non-matched binding, we want a large Hamming distance between X and $Y^C$.

Basic Hamming Constraint $C_1(k_1)$
Mathematical Condition: For all distinct words Y and X in W, $H(Y, X) \ge k_1$, where H denotes the Hamming distance.
Biological Meaning: This constraint helps prevent X from binding with the complement of Y.

Reverse Complementary Constraint $C_2(k_2)$
Mathematical Condition: For all distinct words Y and X in W, $H(Y, X^{RC}) \ge k_2$, where for $X = X_1 X_2 \cdots X_L$, $X^R = X_L \cdots X_2 X_1$ is the reverse of X and $X^{RC} = X_L^C X_{L-1}^C \cdots X_2^C X_1^C$ is its reverse complement.
Biological Meaning: This constraint helps prevent Y from binding with the reverse of X.

Self Complementary Constraint $C_3(k_3)$
Same as $C_2(k_2)$ but with X = Y.
Mathematical Condition: For each word Y in W, $H(Y, Y^{RC}) \ge k_3$.
Biological Meaning: This constraint prevents a word Y from binding with the reverse of itself.
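The first three binding constraints are direct Hamming-distance checks, reading each condition as "Hamming distance at least k". The sketch below is one possible checker; the function names and the sample words are hypothetical.

```python
from itertools import permutations

PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def hamming(u, v):
    """Number of positions at which two equal-length words differ."""
    if len(u) != len(v):
        raise ValueError("words must have the same length")
    return sum(a != b for a, b in zip(u, v))


def reverse_complement(word):
    return "".join(PAIR[base] for base in reversed(word))


def satisfies_c1(words, k1):
    """C1: H(Y, X) >= k1 for all distinct words Y, X."""
    return all(hamming(y, x) >= k1 for y, x in permutations(words, 2))


def satisfies_c2(words, k2):
    """C2: H(Y, X^RC) >= k2 for all distinct words Y, X."""
    return all(hamming(y, reverse_complement(x)) >= k2 for y, x in permutations(words, 2))


def satisfies_c3(words, k3):
    """C3: H(Y, Y^RC) >= k3 for every word Y."""
    return all(hamming(y, reverse_complement(y)) >= k3 for y in words)


if __name__ == "__main__":
    code = ["AAAC", "CACA"]  # toy code, far too short for real use
    print(satisfies_c1(code, 3), satisfies_c2(code, 4), satisfies_c3(code, 4))
    # ACGT equals its own reverse complement, so it fails C3 for any k3 >= 1.
    print(satisfies_c3(["ACGT"], 1))
```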

Shifting Hamming Constraint $C_4(k_4)$
Mathematical Condition: For all distinct words Y and X in W, $H(Y[1..i], X[(L-i+1)..L]) \ge k_4 - (L-i)$ for all $L - k_4 \le i \le L$.
Biological Meaning: This constraint helps prevent a prefix of Y from binding with the complement of a suffix of X.
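The shifting constraint compares every long-enough prefix of Y with the suffix of X of the same length. The sketch below reads the condition as H(Y[1..i], X[(L-i+1)..L]) >= k4 - (L - i) for L - k4 <= i <= L; that reading of the inequality signs is an assumption, and the function name is hypothetical. Indices are shifted to Python's 0-based slices, and i is clamped to at least 1 so that empty prefixes are skipped.

```python
from itertools import permutations


def hamming(u, v):
    """Number of positions at which two equal-length words differ."""
    if len(u) != len(v):
        raise ValueError("words must have the same length")
    return sum(a != b for a, b in zip(u, v))


def satisfies_c4(words, k4):
    """C4: every length-i prefix of Y is far from the length-i suffix of X."""
    length = len(words[0])
    for y, x in permutations(words, 2):
        for i in range(max(1, length - k4), length + 1):
            prefix = y[:i]            # Y[1..i]
            suffix = x[length - i:]   # X[(L-i+1)..L]
            if hamming(prefix, suffix) < k4 - (length - i):
                return False
    return True


if __name__ == "__main__":
    print(satisfies_c4(["AAAA", "CCCC"], 3))
    # The prefix AC of ACGT equals the suffix AC of GTAC, so this pair fails.
    print(satisfies_c4(["ACGT", "GTAC"], 3))
```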

Shifting Reverse Complementary Constraint $C_5(k_5)$
Mathematical Condition: For all distinct words Y and X in W, $H(Y[1..i], X[1..i]^{RC}) \ge k_5 - (L-i)$ and $H(Y[(L-i+1)..L], X[(L-i+1)..L]^{RC}) \ge k_5 - (L-i)$ for all $L - k_5 \le i \le L$.
Biological Meaning: This constraint helps prevent a prefix of Y from binding with the reverse of a prefix of X and prevent a suffix of Y from binding with the reverse of a suffix of X.

Shifting Self Complementary Constraint $C_6(k_6)$
Same as $C_5(k_5)$ but with X = Y.
Mathematical Condition: For each word Y in W, $H(Y[1..i], Y[1..i]^{RC}) \ge k_6 - (L-i)$ and $H(Y[(L-i+1)..L], Y[(L-i+1)..L]^{RC}) \ge k_6 - (L-i)$ for all $L - k_6 \le i \le L$.
Biological Meaning: This constraint helps prevent a prefix of Y from binding with its reverse and prevent a suffix of Y from binding with its reverse.

GC Content Constraint $C_7(\gamma)$
Mathematical Condition: A fraction $\gamma$ of the bases in every word Y in W are either G or C.
Example: AGCTCCCCCCTTAAA and GGTCGCAATTTTGGC have the same GC content.
Biological Meaning: The GC content affects the thermodynamic properties of a word. Having the same ratio of GC content for all the words helps ensure that the words in W have similar thermodynamic characteristics.
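The GC content of a word is a simple count, so the constraint is easy to check. The sketch below assumes the intended condition is that every word has exactly the same G/C fraction; it uses `fractions.Fraction` so that the comparison is exact, and the function names are hypothetical.

```python
from fractions import Fraction


def gc_count(word):
    """Number of bases in the word that are G or C."""
    return sum(base in "GC" for base in word)


def satisfies_c7(words, gamma):
    """C7: the G/C fraction of every word equals gamma (a Fraction)."""
    return all(Fraction(gc_count(w), len(w)) == gamma for w in words)


if __name__ == "__main__":
    words = ["AGCTCCCCCCTTAAA", "GGTCGCAATTTTGGC"]  # the two words on the slide
    print([gc_count(w) for w in words])
    print(satisfies_c7(words, Fraction(8, 15)))
```

Both example words contain 8 G/C bases out of 15.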

Consecutive Base Constraint $C_8(d)$
Mathematical Condition: No word has more than d consecutive occurrences of the same base.
Example: AAAAAAAGGGGGGGG and its complement TTTTTTTCCCCCCCC are perfectly complementary, yet their long runs allow them to bind at wrong positions. The word AGCTCCCCCCTTAAA contains a run of six C's.
Biological Meaning: In some applications, consecutive occurrences of the same base increase binding errors.
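Checking the consecutive base constraint amounts to finding the longest run of a repeated base in each word. A minimal sketch, with hypothetical function names, using `itertools.groupby` from the standard library:

```python
from itertools import groupby


def longest_run(word):
    """Length of the longest block of identical consecutive bases."""
    return max((len(list(group)) for _, group in groupby(word)), default=0)


def satisfies_c8(words, d):
    """C8: no word contains more than d consecutive copies of one base."""
    return all(longest_run(w) <= d for w in words)


if __name__ == "__main__":
    print(longest_run("AGCTCCCCCCTTAAA"))  # 6, from the block CCCCCC
    print(satisfies_c8(["AGCTCCCCCCTTAAA", "GGTCGCAATTTTGGC"], 4))
```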

Free Energy Constraint $C_9(\sigma)$
Mathematical Condition: For all words Y and X in W, $FE(Y) - FE(X) \le \sigma$, where FE denotes the free energy of a word.
Biological Meaning: This constraint ensures that the words in W have similar melting temperatures, which allows the DNA words in W to bind under the same temperature.

Free Energy of a DNA Word [Breslauer et al. 1986]
Free Energy of $X = x_1 x_2 \cdots x_L$:
FE(X) = a constant + sum of pair-wise energies, i.e.,
$FE(X) = c + \Gamma(x_1, x_2) + \Gamma(x_2, x_3) + \cdots + \Gamma(x_{L-1}, x_L)$, where c is the constant.
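The pair-wise sum above can be sketched as follows. The table `TOY_GAMMA` is a made-up placeholder (an assumption for illustration only, not the measured values of Breslauer et al.), and the constant term defaults to 0; only the structure of the computation is meant to carry over.

```python
# Placeholder pair-wise energies: NOT real thermodynamic data.
# Each adjacent pair gets -(2 + number of G/C bases in the pair).
TOY_GAMMA = {
    (a, b): -(2 + (a in "GC") + (b in "GC"))
    for a in "ACGT"
    for b in "ACGT"
}


def free_energy(word, gamma=TOY_GAMMA, constant=0):
    """FE(X) = constant + sum of gamma(x_i, x_{i+1}) over adjacent bases."""
    return constant + sum(gamma[(a, b)] for a, b in zip(word, word[1:]))


def satisfies_c9(words, sigma, gamma=TOY_GAMMA):
    """C9: FE(Y) - FE(X) <= sigma for all words Y, X (max minus min suffices)."""
    energies = [free_energy(w, gamma) for w in words]
    return max(energies) - min(energies) <= sigma


if __name__ == "__main__":
    print(free_energy("AGC"))
    print(satisfies_c9(["AGC", "ATA"], 2))
```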

Recap: Problem Formulation for DNA Self-Assembly
Context: We are given some constraints on the desired words, and the alphabet DNA = {A,C,G,T}.
Algorithmic Problem:
Input: an integer n
Output: a code W of n words of the same length L such that W satisfies the constraints and L is minimized.

Previous Results
heuristics without performance guarantees [most of the previous works]
NP-hardness for some variants of the problem [Phan, Garzon 2008]
randomized algorithms [Kao, Sanghi, Schweller 2005]
1. word length optimal to within a multiplicative constant
2. running time polynomial in the output size
3. satisfying the constraints with high probability
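The idea behind a randomized approach can be illustrated with a generic generate-and-test loop: draw n random words of a given length and keep them if the basic Hamming constraint holds. This is only a sketch under simplifying assumptions, not the algorithm of Kao, Sanghi, and Schweller; it handles a single constraint, and the function name and parameters are hypothetical.

```python
import random
from itertools import combinations


def hamming(u, v):
    """Number of positions at which two equal-length words differ."""
    return sum(a != b for a, b in zip(u, v))


def random_code(n, length, k1, seed=0, max_tries=1000):
    """Draw n random DNA words; retry until all pairs differ in >= k1 places.

    Returns the list of words, or None if no valid code is found in max_tries.
    """
    rng = random.Random(seed)
    for _ in range(max_tries):
        words = ["".join(rng.choice("ACGT") for _ in range(length)) for _ in range(n)]
        if all(hamming(u, v) >= k1 for u, v in combinations(words, 2)):
            return words
    return None


if __name__ == "__main__":
    print(random_code(n=4, length=12, k1=4))
```

Two uniformly random words differ in about three quarters of their positions, so when the word length is a large enough multiple of k1 + log n, a random draw is likely to pass the check; this is the intuition for why random words can meet such constraints with high probability.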

Approximation Algorithms for DNA Word Design
Theorem: (Kao, Leung, Sung, Zhang, 2010) We can construct a code $C_{1,4}$ of n words that satisfies constraints $C_1(k_1)$ and $C_4(k_4)$ such that
1. the word length L is optimal to within a multiplicative constant; i.e., L = Θ(k + log n), where $k = \max\{k_1, k_4\}$,
2. the time complexity is polynomial in the output size, and
3. the algorithm is deterministic.

Approximation Algorithms for DNA Word Design
Theorem: (Kao, Leung, Sung, Zhang, 2010) We can construct a code $C_{1\sim 8}$ of n DNA words that satisfies constraints $C_1(k_1)$, $C_2(k_2)$, $C_3(k_3)$, $C_4(k_4)$, $C_5(k_5)$, $C_6(k_6)$, $C_7(\gamma)$, $C_8(d)$ such that
1. the word length L is optimal to within a multiplicative constant; i.e., L = Θ(k + log n), where $k = \max\{k_1, k_2, k_3, k_4, k_5, k_6\}$,
2. the time complexity is polynomial in the output size, and
3. the algorithm is deterministic.

Approximation Algorithms for DNA Word Design
Theorem: (Kao, Leung, Sung, Zhang, 2010) We can construct a code $C_{1\sim 6,9}$ of n DNA words that satisfies constraints $C_1(k_1)$, $C_2(k_2)$, $C_3(k_3)$, $C_4(k_4)$, $C_5(k_5)$, $C_6(k_6)$, $C_9(\sigma)$ such that
1. the word length L is optimal to within a multiplicative constant; i.e., L = Θ(k + log n), where $k = \max\{k_1, k_2, k_3, k_4, k_5, k_6\}$,
2. the time complexity is polynomial in the output size, and
3. the algorithm is deterministic.

Further Research for DNA Word Design
Concrete Open Problem: Our codes can satisfy only subsets of the 9 constraints, but not all the constraints at the same time. Design codes that satisfy all 9 constraints.
General Research Direction: Adapt our randomization and derandomization techniques to other codeword design problems.

Conclusions
1. There are many research possibilities for DNA self-assembly and other kinds of self-assembly!
2. General research directions include:
   novel (or science-fiction-like) applications of self-assembly (especially in Medicine)
   novel models for self-assembly
   in-vitro implementations
   efficient tile systems (e.g., small tile complexity)
   computational powers of self-assembly models
   fault-tolerant self-assembly (e.g., error correction)
   many more

Thank you! Any questions?