Discrete tomography with two directions

Discrete tomography with two directions

Thesis submitted for the degree of Doctor at Leiden University, on the authority of the Rector Magnificus prof.mr. P.F. van der Heijden, in accordance with the decision of the Doctorate Board, to be defended on Tuesday 20 September 2011 at 15:00 by Birgit Ellen van Dalen, born in 's-Gravenhage in 1984

Composition of the doctoral committee
Supervisors: prof.dr. R. Tijdeman; prof.dr. K.J. Batenburg (Centrum Wiskunde & Informatica, Universiteit Antwerpen)
Other members: prof.dr. S.J. Edixhoven; prof.dr. H.W. Lenstra, Jr.; prof.dr. A. Schrijver (Centrum Wiskunde & Informatica); prof.dr. P. Stevenhagen

Discrete tomography with two directions Birgit van Dalen

ISBN/EAN 97894608803
© Birgit van Dalen, Leiden, 2011
bevandalen@gmail.com
Typeset using LaTeX
Printed by Gildeprint Drukkerijen, Enschede
Cover design by Ad van den Broek

Contents

1 Introduction
  1.1 Discrete tomography
  1.2 Applications
  1.3 Two directions
  1.4 Stability
  1.5 Difference between reconstructions
  1.6 Boundary length
  1.7 Shape of binary images
  1.8 Overview
2 Stability results for uniquely determined sets
  2.1 Introduction
  2.2 Notation and statement of the problems
  2.3 Staircases
  2.4 A new bound for the disjoint case
  2.5 Two bounds for general α
  2.6 Generalisation to unequal sizes
3 Upper bounds for the difference between reconstructions
  3.1 Introduction
  3.2 Notation
  3.3 Some lemmas
  3.4 Uniquely determined neighbours
  3.5 Sets with equal line sums
  3.6 Sets with different line sums
  3.7 Concluding remarks
4 A lower bound on the largest possible difference
  4.1 Introduction
  4.2 Definitions and notation
  4.3 Main result
  4.4 Proof
  4.5 Example
5 Minimal boundary length of a reconstruction
  5.1 Introduction
  5.2 Definitions and notation
  5.3 The main theorem
  5.4 Some examples and a corollary
  5.5 An extension
6 Reconstructions with small boundary
  6.1 Introduction
  6.2 Definitions and notation
  6.3 The construction
  6.4 Boundary length of the constructed solution
  6.5 Examples
  6.6 Generalising the results for arbitrary $c_1$ and $r_1$
7 Boundary and shape of binary images
  7.1 Introduction
  7.2 Definitions and notation
  7.3 Largest connected component
  7.4 Balls of ones in the image
Bibliography
Samenvatting (summary in Dutch)
  1 Binary images and Japanese puzzles
  2 Unsolvable puzzles
  3 Boring puzzles
  4 Puzzles with multiple solutions
  5 Boundary
Curriculum Vitae

CHAPTER 1 Introduction

In this chapter we introduce the topic of discrete tomography and explain the basic concepts. We then describe the part of discrete tomography that this thesis is focused on. We discuss the problems that are considered as well as the main results of the thesis.

1.1 Discrete tomography

Let $F$ be a finite subset of $\mathbb{Z}^2$. If a point of $\mathbb{Z}^2$ is an element of $F$, we say that the point has value one, or that there is a one in this point. If on the other hand a point of $\mathbb{Z}^2$ is not an element of $F$, we say that the point has value zero, or that there is a zero in this point. In this way we can view the set $F$ as a function that attaches a value from $\{0, 1\}$ to every point in $\mathbb{Z}^2$, where only finitely many points have value one. We also call this a binary image. Rather than considering the whole of $\mathbb{Z}^2$, we usually restrict the image to a rectangle containing all points with value one.

For integers $a$ and $b$ we can consider a line in the direction $(a, b)$, that is, all points $(x, y) \in \mathbb{Z}^2$ satisfying $ay - bx = h$ for a certain integer $h$. We can count the number of elements of $F$ on this line; this is called the line sum of $F$ along this line. We can take all lines in the direction $(a, b)$ that pass through integer points by varying $h$ over $\mathbb{Z}$. The infinite sequence of line sums we find in this way we call the projection of the binary image in the direction $(a, b)$. Instead of considering all possible lines in the direction $(a, b)$, we usually consider a finite set of consecutive lines that contains all lines that pass through points of $F$. Then the projection becomes a finite sequence of line sums containing all the nonzero line sums.

Given a binary image, the projection in any lattice direction is of course determined. If on the other hand the image is unknown, but the projections in several directions are given, it is not so clear whether the image is determined by these projections, or even whether there exists an image satisfying these projections. The problem of reconstructing binary images from given projections in several lattice directions is what discrete tomography is concerned with. An image satisfying given projections is called a reconstruction. There may be more than one reconstruction corresponding to given projections, or none at all. If there is exactly one reconstruction, then we say that the projections uniquely determine the image.

The term discrete tomography is also used for a wider scope of reconstruction problems, such as reconstructing a binary image on $\mathbb{R}^2$ rather than $\mathbb{Z}^2$. Then the domain of the function is no longer discrete, but the possible values of the function form a discrete set, which is why this is still called discrete tomography. And even if we restrict ourselves to functions on lattices, there are still some variations possible. For example, one may consider a function on $\mathbb{Z}^2$ that has a (small) discrete set of values, rather than just $\{0, 1\}$. It is also possible to do discrete tomography in more dimensions, using $\mathbb{Z}^k$ rather than $\mathbb{Z}^2$, or on a hexagonal grid rather than a square grid. A complete overview of discrete tomography is given in [4].

1.2 Applications

The most direct application of discrete tomography is the reconstruction of nanocrystals at atomic resolution. In such a crystal, the atoms usually lie on a regular grid, and only a few types of atoms occur. By electron microscopy, two-dimensional projection images are acquired from various angles by tilting the sample. Recently, new algorithms have been developed that allow a fast and accurate reconstruction from a small number of projection images [7, 7].

There are also some applications in medical imaging [5, 5]. However, much more widely used in medical imaging (among other fields) is the technique of continuous or computerised tomography [3]. Here images can have values in a continuous set rather than a discrete set, and the object that is being reconstructed does not have a lattice structure, but a continuous structure. For the reconstruction of such images, projections in very many directions are needed. The best-known application of this type of tomography is the CT scan, where CT stands for computerised tomography.
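As a concrete illustration of the line sums and projections defined earlier in this chapter, the following minimal Python sketch computes the projection of a finite point set in a direction $(a, b)$. The function name `projection` and the dictionary keyed by $h$ are illustrative choices, not notation from the thesis, and the sketch assumes that $a$ and $b$ are coprime, so that every integer $h$ corresponds to a line through integer points.

```python
from collections import Counter


def projection(points, a, b):
    """Line sums of a finite set of points (x, y) in the direction (a, b).

    Each point lies on the line a*y - b*x = h. The result maps every h in
    the smallest range of consecutive lines that contains all points of the
    set to the number of points on that line (zeros included).
    Assumption: a and b are coprime integers.
    """
    counts = Counter(a * y - b * x for x, y in points)
    if not counts:
        return {}
    lowest, highest = min(counts), max(counts)
    return {h: counts.get(h, 0) for h in range(lowest, highest + 1)}


# A small binary image given as the set of points with value one.
image = {(0, 0), (1, 0), (0, 1)}
horizontal = projection(image, 1, 0)  # lines y = h
diagonal = projection(image, 1, 1)    # lines y - x = h
```

Here `horizontal` counts the points on each horizontal line, while `diagonal` counts the points on each line parallel to $(1, 1)$.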

Further applications of discrete tomography are for example in nuclear science [9, 0] and materials science [7].

1.3 Two directions

The first discrete tomography problems arose in the literature in 1957, when Ryser published a paper on reconstructing binary images from their projections in the horizontal and vertical directions [4]. He was the first to describe an algorithm to do this, and he gave necessary and sufficient conditions on the projections for a reconstruction to exist.

Figure 1.1: A uniquely determined set. The row and column sums are indicated: $(r_1, \ldots, r_6) = (8, 6, 3, 3, 2, 1)$ and $(c_1, \ldots, c_8) = (6, 5, 4, 2, 2, 2, 1, 1)$.

Let $(r_1, r_2, \ldots, r_m)$ be the sequence of row sums (the horizontal projection) and let $(c_1, c_2, \ldots, c_n)$ be the sequence of column sums (the vertical projection). We must have
$$\sum_{i=1}^{m} r_i = \sum_{j=1}^{n} c_j,$$
since both sums are equal to the number of elements of the binary image. As long as we are only interested in the number of possible reconstructions (and not in special properties of those reconstructions) we can without loss of generality order the rows and columns such that $r_1 \geq r_2 \geq \ldots \geq r_m$ and $c_1 \geq c_2 \geq \ldots \geq c_n$. For $i = 1, 2, \ldots, m$ define $b_i = \#\{j : c_j \geq i\}$. Ryser proved that there exists a set $F$ with those row and column sums if and only if
$$\sum_{i=1}^{k} b_i \geq \sum_{i=1}^{k} r_i \quad \text{for } k = 1, 2, \ldots, m.$$
He also showed that the reconstruction is unique if and only if
$$\sum_{i=1}^{k} b_i = \sum_{i=1}^{k} r_i \quad \text{for } k = 1, 2, \ldots, m,$$
or, equivalently,
$$b_i = r_i \quad \text{for } i = 1, 2, \ldots, m.$$
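Ryser's conditions translate directly into a short test. The sketch below is a minimal illustration, not the algorithm from Ryser's paper: the names `conjugate` and `ryser_status` are hypothetical, the sums are assumed to be non-negative integers, and the example sums are chosen purely for illustration.

```python
def conjugate(col_sums, m):
    """Return (b_1, ..., b_m) with b_i = #{j : c_j >= i}."""
    return [sum(1 for c in col_sums if c >= i) for i in range(1, m + 1)]


def ryser_status(row_sums, col_sums):
    """Classify line sums as 'none', 'unique' or 'multiple'.

    'none'     : no binary image has these row and column sums,
    'unique'   : exactly one binary image has them,
    'multiple' : at least two binary images have them.
    """
    if sum(row_sums) != sum(col_sums):
        return "none"
    # Ordering the rows does not change the number of reconstructions.
    r = sorted(row_sums, reverse=True)
    b = conjugate(col_sums, len(r))
    prefix_r = prefix_b = 0
    unique = True
    for r_i, b_i in zip(r, b):
        prefix_r += r_i
        prefix_b += b_i
        if prefix_b < prefix_r:   # existence condition violated
            return "none"
        if prefix_b != prefix_r:  # a strict inequality rules out uniqueness
            unique = False
    return "unique" if unique else "multiple"


status_triangle = ryser_status([8, 6, 3, 3, 2, 1], [6, 5, 4, 2, 2, 2, 1, 1])
status_two_by_two = ryser_status([1, 1], [1, 1])
```

For the first pair of sums all prefix sums agree, so the image is uniquely determined; for the second pair the two $2 \times 2$ permutation matrices are both reconstructions.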

Such a uniquely determined image has a particular shape [6]. After all, $r_1 = b_1 = \#\{j : c_j \geq 1\}$ means that for every column $j$ with $c_j \geq 1$ there must be an element of $F$ in $(1, j)$. And then $r_2 = b_2 = \#\{j : c_j \geq 2\}$ implies that for every column $j$ with $c_j \geq 2$ there must be an element of $F$ in $(2, j)$, since any column $j$ with $c_j = 1$ contains only one element of $F$, which is $(1, j)$. By continuing this argument, we find that $(i, j) \in F$ if and only if $c_j \geq i$. This means that

in row $i$ the elements of $F$ are precisely the points $(i, 1), (i, 2), \ldots, (i, r_i)$;
in column $j$ the elements of $F$ are precisely the points $(1, j), (2, j), \ldots, (c_j, j)$.

See Figure 1.1 for an example of a uniquely determined set.

Unfortunately, in discrete tomography with three or more directions such nice properties do not exist. The problem of deciding whether an image is uniquely determined, given projections in three or more directions, is NP-hard. The same holds for the problem of reconstructing an image from its projections in three or more directions [].

The research in this thesis concerns only discrete tomography in two directions, the horizontal and vertical directions. In the remainder of this chapter we will therefore always use discrete tomography with only horizontal and vertical line sums, unless explicitly mentioned otherwise.

1.4 Stability

Suppose line sums that uniquely determine an image are given. If we slightly tweak those line sums, say by adding 1 to a few row sums and subtracting 1 from exactly as many other row sums, then the resulting line sums may no longer uniquely determine an image. A question that naturally arises from this is: do the reconstructions of the new line sums still look a lot like the original, uniquely determined image, or is it possible that an image satisfying the new line sums is completely different from the original image? This concerns what we call stability: the more the reconstructions from the new line sums have in common with the original image, the more stable the original image is.

In the case of three or more directions Alpers et al. showed that there can exist two images, both uniquely determined by their line sums, that are disjoint but have almost the same line sums [, 3]. So in the case of three or more directions, even uniquely determined images are highly unstable. However, this does not hold for discrete tomography with two directions.
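The characterisation of a uniquely determined image with ordered line sums, namely $(i, j) \in F$ if and only if $c_j \geq i$, can be turned into a direct construction. The following sketch assumes non-increasing column sums and uses matrix-style indices starting at 1; the function names `unique_image` and `row_sums` and the example column sums are illustrative assumptions.

```python
def unique_image(col_sums):
    """Build the set F with (i, j) in F exactly when c_j >= i.

    Rows and columns are numbered from 1. Column j is filled from
    the top with c_j ones, which gives the triangular shape.
    """
    return {(i, j)
            for j, c in enumerate(col_sums, start=1)
            for i in range(1, c + 1)}


def row_sums(points, m):
    """Row sums r_1, ..., r_m of a set of points (i, j)."""
    sums = [0] * m
    for i, _ in points:
        sums[i - 1] += 1
    return sums


columns = [6, 5, 4, 2, 2, 2, 1, 1]
image = unique_image(columns)
rows = row_sums(image, max(columns))
```

The resulting row sums coincide with the numbers $b_i = \#\{j : c_j \geq i\}$, as the argument above predicts.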

Consider given column sums $C = (c_1, c_2, \ldots, c_n)$, and define $B = (b_1, b_2, \ldots, b_m)$ as $b_i = \#\{j : c_j \geq i\}$ for $1 \leq i \leq m$. We have seen in the previous section that the row sums $B$ and column sums $C$ uniquely determine an image $F_1$. Now suppose we have slightly different row sums $R = (r_1, r_2, \ldots, r_m)$, such that there exists at least one binary image $F_2$ with row sums $R$ and column sums $C$. Let $N = \sum_{j=1}^{n} c_j$. Furthermore define
$$\alpha = \frac{1}{2} \sum_{i=1}^{m} |r_i - b_i|.$$
Note that $\alpha$ is an integer, since $2\alpha$ is congruent to
$$\sum_{i=1}^{m} (r_i + b_i) = \sum_{i=1}^{m} r_i + \sum_{i=1}^{m} b_i = 2N \equiv 0 \mod 2.$$
The parameter $\alpha$ measures the difference in the row sums of $F_1$ and $F_2$. The stability question now translates into: can it happen that the symmetric difference $F_1 \triangle F_2$ is large (compared to $N$, the number of elements of $F_1$), while $\alpha$ is small?

Alpers et al. [, ] proved two results related to this question. They showed that if $F_1 \cap F_2 = \emptyset$, then $N \leq \alpha^2$. So if $F_1$ and $F_2$ are disjoint, then $\alpha$ must be large compared to $N$. On the other hand, they considered the case $\alpha = 1$ and showed that
$$|F_1 \triangle F_2| \leq \sqrt{8N + 1} - 1.$$
In Chapter 2 of this thesis we consider the stability problem for general $\alpha$. We generalise the above bound to
$$|F_1 \triangle F_2| \leq \alpha \sqrt{8N + 1} - \alpha.$$
We also prove a different bound. Write $p = |F_1 \cap F_2|$, then
$$|F_1 \triangle F_2| \leq 2\alpha + 2(\alpha + p) \log(\alpha + p).$$
By using this bound with $p = 0$, we can derive that if $F_1$ and $F_2$ are disjoint, then
$$N \leq \alpha(1 + \log \alpha),$$
which improves the bound of Alpers et al. for disjoint $F_1$ and $F_2$.

1.5 Difference between reconstructions

Another interesting question, related to stability, is how much two reconstructions from the same projections can possibly differ. We already know that there exist images that are uniquely determined. On the other hand, it is not so hard to find images that are disjoint, but have the same line sums. See Figure 1.2(a) for the smallest example and Figure 1.2(b) for a more complex example. But perhaps it is possible to define a collection of almost uniquely determined images of which any two reconstructions always must have large intersection?

Figure 1.2: Each of the pictures (a) and (b) shows two disjoint sets with the same line sums. One set consists of the white points, the other set consists of the black points.

In Chapter 3 we consider this question. First we define a parameter that indicates in some sense how close an image is to being uniquely determined. For this we use the parameter $\alpha$ that we introduced before. As we have seen in the previous section, $\alpha$ measures the distance between a given set $F_2$ and a given uniquely determined set $F_1$. For a fixed $F_2$ we can characterise the sets $F_1$ that yield the smallest $\alpha$, and the $\alpha$ corresponding to such a set $F_1$ is the one we will use.

We study the difference between two sets with the same line sums and small $\alpha$, and we prove that this difference is bounded from above, using the results from Chapter 2. We also indicate a subset of points that must contain a sizeable part of any reconstruction. On the other hand, we show that $\alpha$ must be large if there exist two disjoint reconstructions. And finally, we generalise everything to reconstructions from different sets of row and column sums.

In Chapter 4 we consider the complementary problem: given line sums, find two reconstructions that are as different as possible. Again the parameter $\alpha$ plays an important role, and we show constructively that if $\alpha \geq 1$ (that is, if the projections do not uniquely determine the image) there exist two reconstructions that have a symmetric difference of at least $2\alpha + 2$.
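For very small line sums, the parameter $\alpha$ and the largest difference between two reconstructions can be checked by brute force. The sketch below is only an illustration under stated assumptions: it takes $\alpha$ to be half the sum of the differences $|r_i - b_i|$ with $b_i = \#\{j : c_j \geq i\}$, it assumes the row sums are given in non-increasing order with equal totals for rows and columns, and it enumerates all reconstructions, which is feasible only for tiny instances. The function names are hypothetical.

```python
from itertools import combinations, product


def alpha(row_sums, col_sums):
    """Half the sum of |r_i - b_i|, with b_i = #{j : c_j >= i}.

    Assumption: both sequences have the same total, so the sum is even.
    """
    b = [sum(1 for c in col_sums if c >= i)
         for i in range(1, len(row_sums) + 1)]
    return sum(abs(r - b_i) for r, b_i in zip(row_sums, b)) // 2


def reconstructions(row_sums, col_sums):
    """All binary images (as frozensets of 0-based cells) with these sums."""
    n = len(col_sums)
    choices = [list(combinations(range(n), r)) for r in row_sums]
    found = []
    for rows in product(*choices):
        counts = [0] * n
        for cols in rows:
            for j in cols:
                counts[j] += 1
        if counts == list(col_sums):
            found.append(frozenset(
                (i, j) for i, cols in enumerate(rows) for j in cols))
    return found


def max_symmetric_difference(images):
    """Largest size of a symmetric difference between two of the images."""
    return max((len(f ^ g) for f in images for g in images), default=0)


images = reconstructions([1, 1], [1, 1])
a = alpha([1, 1], [1, 1])
largest = max_symmetric_difference(images)
```

In this smallest example there are two disjoint reconstructions, $\alpha = 1$, and the largest symmetric difference is 4.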

1.6 Boundary length

Rather than viewing a binary image as consisting of points in $\mathbb{Z}^2$ that each have value zero or one, we can also view a binary image as consisting of pixels (cells of 1 by 1) that each are white or black. See also Figure 1.3. Now there is a natural way to define the boundary of the image: it consists precisely of all the line segments that separate black cells from white cells. Equivalently, the boundary is the set of pairs of points $(i, j)$ and $(i', j')$ in $\mathbb{Z}^2$ such that

the points are adjacent, that is: $i = i'$ and $|j - j'| = 1$, or $|i - i'| = 1$ and $j = j'$;
$(i, j) \in F$ and $(i', j') \notin F$.

The length of the boundary is the number of pairs of points in this set.

Figure 1.3: The same binary image represented in two different ways: (a) the image is represented by the white points; (b) the image is represented by the grey cells. The numbers in the figure indicate the row and column sums.

Recall from Section 1.3 the special shape of a uniquely determined set with monotone row and column sums. In every row and column all the points with value one (or the black cells) are connected, so each row and each column with a nonzero line sum contributes 2 to the length of the boundary. So if there are $m$ nonzero row sums and $n$ nonzero column sums, then the total length of the boundary is $2m + 2n$. This is obviously the smallest possible length of the boundary of any set with the same number of nonzero row sums and nonzero column sums.

This minimum is not only attained for uniquely determined sets with monotone line sums. There are also other sets that have this property. In general a set with $m$ nonzero row sums, $n$ nonzero column sums and a boundary of length $2m + 2n$ is called hv-convex. See Figure 1.4 for an example of an hv-convex set and another set (not hv-convex) that have the same line sums (so this hv-convex set is not uniquely determined). Deciding whether there exists an hv-convex reconstruction for given row and column sums is NP-complete [8], and hence it is also NP-complete to decide whether there exists a reconstruction with boundary length equal to $2m + 2n$.

Figure 1.4: Two binary images with the same line sums. (a) This image is hv-convex. (b) This image is not hv-convex.

However, that does not mean that it is always hard to decide from the line sums whether the boundary can have length $2m + 2n$ or not. There exist arguments that can be used in part of the cases to prove easily that a boundary of length $2m + 2n$ is impossible. Suppose for example that we have 10 columns with nonzero sum, and that the first three row sums are (in that order) 10, 2 and 10. Then all columns have a black cell in rows 1 and 3, while only two columns have a black cell in row 2. Hence it is certain that there are at most two columns in which the black cells are connected. The other eight columns must contribute at least 4 each to the length of the boundary, so the length of the boundary must be at least $2m + 2 \cdot 2 + 8 \cdot 4$.

In Chapter 5 we generalise this principle to find a new lower bound on the length of the boundary, depending not only on $m$ and $n$ but on all row and column sums. In many cases our bound gives a better result than the straightforward lower bound $2m + 2n$.

In Chapter 6 we consider the complementary problem: given line sums, can you construct an image that satisfies these line sums and has relatively small boundary? Here we restrict ourselves to the case that the line sums are monotone. In this chapter $\alpha$ makes another appearance. Above we had already seen that when a set is uniquely determined by its line sums (which is equivalent to $\alpha = 0$) the length of the boundary is equal to $2m + 2n$. One of the main results of this chapter is a generalisation of this: when for the row and column sums we have $n = r_1 \geq r_2 \geq \ldots \geq r_m$ and $m = c_1 \geq c_2 \geq \ldots \geq c_n$, and the line sums are consistent, then there exists a reconstruction for which the length of the boundary is at most $2m + 2n + 4\alpha$.
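The definition of the boundary length, and the example with row sums 10, 2 and 10 discussed above, can be checked with a few lines of code. In this sketch the names `boundary_length` and `is_hv_convex` are hypothetical; an image is a set of cells $(i, j)$, and hv-convexity is tested through the criterion used in this section, namely a boundary of length $2m + 2n$.

```python
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # the four edge-neighbours


def boundary_length(cells):
    """Number of pairs (black cell, adjacent white cell)."""
    cells = set(cells)
    return sum((i + di, j + dj) not in cells
               for i, j in cells
               for di, dj in STEPS)


def is_hv_convex(cells):
    """True when the boundary has the minimal length 2m + 2n."""
    cells = set(cells)
    m = len({i for i, _ in cells})  # rows with nonzero sum
    n = len({j for _, j in cells})  # columns with nonzero sum
    return boundary_length(cells) == 2 * m + 2 * n


# Three rows with sums 10, 2 and 10 on 10 columns: rows 1 and 3 are full,
# row 2 has black cells only in the first two columns.
example = ({(1, j) for j in range(1, 11)}
           | {(2, 1), (2, 2)}
           | {(3, j) for j in range(1, 11)})
length = boundary_length(example)
```

For this image $m = 3$, so the lower bound from the text is $2 \cdot 3 + 2 \cdot 2 + 8 \cdot 4 = 42$, and the computed boundary length attains it.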

1.7 Shape of binary images

In Chapter 7 we study the connection between the length of the boundary, the number of black cells, and the general shape of a binary image. Intuitively, it seems clear that when the number of black cells is large, but the boundary is small, the black cells must form some solid, roundish object. In this chapter, we will make this more precise.

Suppose we are given the length of the boundary and the number of black cells of an unknown binary image. We study the following question: what is the minimal size of the largest connected component in this image? Here we use 4-adjacency [] to define connected; that is, two cells are adjacent if they share an edge (and not just a vertex).

We can define the distance of a black cell to the boundary as follows: a black cell has distance 0 to the boundary if it is adjacent to a white cell, and it has distance $k + 1$ to the boundary if $k$ is the minimal distance to the boundary of the cells it is adjacent to. This distance function is also called the city block distance [3]. This leads to the second question we are interested in: what is the largest distance to the boundary that must occur in the image? A different way to phrase this: what is the minimal size of the largest ball of black cells that is contained in the image? We derive results about this question both in the case that the connected components are all simply connected (that is, they do not have any holes []) and in the general case.

Note that this chapter is only about properties of binary images, and discrete tomography plays no role here.

1.8 Overview

In Chapter 2 we prove new stability results for the reconstruction of binary images from their horizontal and vertical projections. We consider an image that is uniquely determined by its projections and possible reconstructions from slightly different projections. We show that for a given difference in the projections, the reconstruction can only be disjoint from the original image if the size of the image is not too large. We also prove an upper bound on the size of the image given the error in the projections and the size of the intersection between the image and the reconstruction.

In Chapter 3 we consider different reconstructions from the same horizontal and vertical projections. We present a condition that the projections must necessarily satisfy when there exist two disjoint reconstructions from those projections. More generally, we derive an upper bound on the symmetric difference of two reconstructions from the same projections. We also consider two reconstructions from two different sets of projections and prove an upper bound on the symmetric difference in this case.

In Chapter 4 we prove constructively that if there exists more than one reconstruction from given horizontal and vertical projections, then there exist two reconstructions that have a symmetric difference of at least $2\alpha + 2$. Here $\alpha$ is a parameter depending on the line sums and indicating how close (in some sense) the image is to being uniquely determined.

In Chapter 5 we study the following question: for given horizontal and vertical projections, what is the smallest length of the boundary that a reconstruction from those projections can have? We prove a new lower bound that, in contrast to simple bounds that have been derived previously, combines the information of both row and column sums.

In Chapter 6 we construct from given monotone row and column sums an image satisfying those line sums that has a small boundary. We prove several bounds on the length of this boundary, and we give a few examples for which we show that no smaller boundary is possible than the one of our construction.

In Chapter 7 we consider an unknown binary image, of which the length of the boundary and the area of the image are given. We derive from this some properties about the general shape of the image. First, we prove sharp lower bounds on the size of the largest connected component. Second, we derive some results about the size of the largest ball containing only ones, both in the case that the connected components of the image are all simply connected and in the general case.

Each of the chapters can be read independently of the others. When results from earlier chapters are used, these are explicitly referred to. The notation will be defined separately for each chapter. Although the notation is fairly consistent throughout the thesis, there sometimes are subtle changes from one chapter to another.
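The two notions from the section on the shape of binary images, connected components under 4-adjacency and the distance of a black cell to the boundary, can both be computed with a breadth-first search. The sketch below is a minimal illustration; the function names are hypothetical and an image is given as a finite set of black cells $(i, j)$.

```python
from collections import deque

STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-adjacency: shared edges only


def boundary_distances(cells):
    """Distance to the boundary for every black cell.

    A cell adjacent to a white cell has distance 0; any other cell has
    distance k + 1, where k is the minimal distance among its neighbours.
    """
    cells = set(cells)
    dist = {c: 0 for c in cells
            if any((c[0] + di, c[1] + dj) not in cells for di, dj in STEPS)}
    queue = deque(dist)
    while queue:
        i, j = queue.popleft()
        for di, dj in STEPS:
            nb = (i + di, j + dj)
            if nb in cells and nb not in dist:
                dist[nb] = dist[(i, j)] + 1
                queue.append(nb)
    return dist


def largest_component_size(cells):
    """Size of the largest 4-connected component of black cells."""
    remaining = set(cells)
    best = 0
    while remaining:
        queue = deque([remaining.pop()])
        size = 1
        while queue:
            i, j = queue.popleft()
            for di, dj in STEPS:
                nb = (i + di, j + dj)
                if nb in remaining:
                    remaining.remove(nb)
                    queue.append(nb)
                    size += 1
        best = max(best, size)
    return best


square = {(i, j) for i in range(5) for j in range(5)}
distances = boundary_distances(square)
```

In a $5 \times 5$ block of black cells the centre cell has distance 2 to the boundary, and the whole block forms a single component of size 25.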

CHAPTER 2 Stability results for uniquely determined sets

This chapter (with minor modifications) has been published as: Birgit van Dalen, Stability results for uniquely determined sets from two directions in discrete tomography, Discrete Mathematics 309 (2009) 3905-3916.

2.1 Introduction

An interesting problem in discrete tomography is the stability of reconstructions. This concerns the following question: for a given binary image that is uniquely determined, can there exist a second image that is very different from the first one, but has almost the same line sums? For three or more directions, the answer is yes: there even exist two disjoint, arbitrarily large, uniquely determined images of which the line sums differ only very slightly [, 3].

Here we focus on the same question, but with only two directions. Alpers et al. [, ] showed that in this case a total error of at most 2 in the projections can only cause a small difference in the reconstruction. They also obtained a lower bound on the error if the reconstruction is disjoint from the original image. In this chapter we improve this bound, and we resolve the open problem of stability with a projection error greater than 2.

2.2 Notation and statement of the problems

Let $F_1$ and $F_2$ be two finite subsets of $\mathbb{Z}^2$ with characteristic functions $\chi_1$ and $\chi_2$. (That is, $\chi_h(x, y) = 1$ if and only if $(x, y) \in F_h$, $h \in \{1, 2\}$.) For $i \in \mathbb{Z}$, we define row $i$ as the set $\{(k, l) \in \mathbb{Z}^2 : k = i\}$. We call $i$ the index of the row. For $j \in \mathbb{Z}$, we define column $j$ as the set $\{(k, l) \in \mathbb{Z}^2 : l = j\}$. We call $j$ the index of the column. Note that we follow matrix notation: we indicate a point $(i, j)$ by first its row index $i$ and then its column index $j$. Also, we use row numbers that increase when going downwards and column numbers that increase when going to the right.

The row sum $r_i^{(h)}$ is the number of elements of the set $F_h$ in row $i$, that is $r_i^{(h)} = \sum_{j \in \mathbb{Z}} \chi_h(i, j)$. The column sum $c_j^{(h)}$ of $F_h$ is the number of elements of $F_h$ in column $j$, that is $c_j^{(h)} = \sum_{i \in \mathbb{Z}} \chi_h(i, j)$. We refer to both row and column sums as the line sums of $F_h$.

Throughout this chapter, we assume that $F_1$ is uniquely determined by its row and column sums. Such sets were studied by, among others, Ryser [4] and Wang [6]. Let $a$ be the number of rows and $b$ the number of columns that contain elements of $F_1$. We renumber the rows and columns such that we have
$$r_1^{(1)} \geq r_2^{(1)} \geq \ldots \geq r_a^{(1)} > 0, \qquad c_1^{(1)} \geq c_2^{(1)} \geq \ldots \geq c_b^{(1)} > 0,$$
and such that all elements of $F_2$ are contained in rows and columns with positive indices. By [6, Theorem.3] we have the following property of $F_1$ (see Figure 2.1):

in row $i$ the elements of $F_1$ are precisely the points $(i, 1), (i, 2), \ldots, (i, r_i^{(1)})$,
in column $j$ the elements of $F_1$ are precisely the points $(1, j), (2, j), \ldots, (c_j^{(1)}, j)$.

We will refer to this property as the triangular shape of $F_1$.

Everywhere except in Section 2.6 we assume that $|F_1| = |F_2|$. Note that we do not assume $F_2$ to be uniquely determined. As $F_1$ and $F_2$ are different and $F_1$ is uniquely determined by its line sums, $F_2$ cannot have exactly the same line sums as $F_1$. Define the difference or error in the line sums as
$$\sum_{j} \left| c_j^{(1)} - c_j^{(2)} \right| + \sum_{i} \left| r_i^{(1)} - r_i^{(2)} \right|.$$
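The line sums of two point sets and the error between them, as defined above, can be computed directly. The sketch below is a minimal illustration with hypothetical function names; the two example sets are chosen for illustration only, with the first one having the triangular shape and the second one having the same number of points.

```python
from collections import Counter


def line_sums(points):
    """Row and column sums of a set of points (i, j), as two Counters."""
    rows = Counter(i for i, _ in points)
    cols = Counter(j for _, j in points)
    return rows, cols


def line_sum_error(f1, f2):
    """Sum of |c_j(1) - c_j(2)| over columns plus |r_i(1) - r_i(2)| over rows."""
    rows1, cols1 = line_sums(f1)
    rows2, cols2 = line_sums(f2)
    # A Counter returns 0 for a missing index, so empty lines are handled.
    row_error = sum(abs(rows1[i] - rows2[i]) for i in set(rows1) | set(rows2))
    col_error = sum(abs(cols1[j] - cols2[j]) for j in set(cols1) | set(cols2))
    return row_error + col_error


f1 = {(1, 1), (1, 2), (2, 1)}  # triangular shape, uniquely determined
f2 = {(1, 1), (1, 3), (2, 2)}  # same size, slightly different column sums
error = line_sum_error(f1, f2)
common = len(f1 & f2)
```

For these two sets the row sums agree and the column sums differ in two places, so the error equals 2, while the sets share a single point.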

Figure 2.1: A uniquely determined set with the assumed row and column ordering.

As in general $|t - s| \equiv t + s \bmod 2$, the above expression is congruent to
\[ \sum_{j} \left( c_j^{(1)} + c_j^{(2)} \right) + \sum_{i} \left( r_i^{(1)} + r_i^{(2)} \right) \equiv 2|F_1| + 2|F_2| \equiv 0 \bmod 2, \]
hence the error in the line sums is always even. We will denote it by $2\alpha$, where $\alpha$ is a positive integer. For notational convenience, we will often write $p$ for $|F_1 \cap F_2|$.

We consider two problems concerning stability.

Problem 2.1. Suppose $F_1 \cap F_2 = \emptyset$. How large can $|F_1|$ be in terms of $\alpha$?

Alpers et al. [, Theorem 9] proved that $|F_1| \le \alpha^2$. They also showed that there is no constant $c$ such that $|F_1| \le c\alpha$ for all $F_1$ and $F_2$. In Section 2.4 we will prove the new bound $|F_1| \le \alpha(1 + \log \alpha)$ and show that this bound is asymptotically sharp.

Problem 2.2. How small can $|F_1 \cap F_2|$ be in terms of $|F_1|$ and $\alpha$, or, equivalently, how large can $|F_1|$ be in terms of $|F_1 \cap F_2|$ and $\alpha$?

Alpers ([, Theorem 5..8]) showed in the case $\alpha = 1$ that
\[ |F_1 \cap F_2| \ge |F_1| + \tfrac{1}{2} - \sqrt{2|F_1| + \tfrac{1}{4}}. \]
This bound is sharp: if $|F_1| = \frac{1}{2} n(n+1)$ for some positive integer $n$, then there exists an example for which equality holds. A similar result is stated in [, Theorem 9].

While [, ] only deal with the case $\alpha = 1$, we will give stability results for general $\alpha$. In Section 2.5 we will give two different upper bounds for $|F_1|$. The bounds have different asymptotic behaviour. Writing $p$ for $|F_1 \cap F_2|$, the second bound (Theorem 2.8) reduces to
\[ |F_1| \le p + 1 + \sqrt{2p + 1} \]
in case $\alpha = 1$, which is equivalent to
\[ p \ge |F_1| - \sqrt{2|F_1|}. \]
Hence the second new bound can be viewed as a generalisation of Alpers' bound. The first new bound (Corollary 2.5) is different and better in the case that $\alpha$ is very large.

In Section 2.6 we will generalise the results to the case $|F_1| \ne |F_2|$.

2.3 Staircases

Alpers introduced the notion of a staircase to characterise $F_1 \triangle F_2$ in the case $\alpha = 1$. We will use a slightly different definition and then show that for general $\alpha$ the symmetric difference $F_1 \triangle F_2$ consists of $\alpha$ staircases.

Definition 2.1. A set of points $(p_1, p_2, \dots, p_n)$ in $\mathbb{Z}^2$ is called a staircase if the following two conditions are satisfied:
- for each $i$ with $1 \le i \le n - 1$ one of the points $p_i$ and $p_{i+1}$ is an element of $F_1 \setminus F_2$ and the other is an element of $F_2 \setminus F_1$;
- either for all $i$ the points $p_{2i}$ and $p_{2i+1}$ are in the same column and the points $p_{2i+1}$ and $p_{2i+2}$ are in the same row, or for all $i$ the points $p_{2i}$ and $p_{2i+1}$ are in the same row and the points $p_{2i+1}$ and $p_{2i+2}$ are in the same column.

This definition is different from [, ] in the following way. Firstly, the number of points does not need to be even. Secondly, the points $p_1$ and $p_n$ can both be either in $F_1 \setminus F_2$ or in $F_2 \setminus F_1$. So this definition is slightly more general than the one used in [, ] for the case $\alpha = 1$.

Figure 2.2: A staircase. The set $F_1$ consists of the white and the black-and-white points, while $F_2$ consists of the black and the black-and-white points. The staircase is indicated by the dashed line segments.

Consider a point $p_i \in F_1 \setminus F_2$ of a staircase $(p_1, p_2, \dots, p_n)$. Assume $p_{i-1}$ is in the same column as $p_i$ and $p_{i+1}$ is in the same row as $p_i$. Because of the triangular shape of $F_1$, the row index of $p_{i-1}$ must be larger than the row index of $p_i$, and the column index of $p_{i+1}$ must be larger than the column index of $p_i$. Therefore, the staircase looks like a real-world staircase (see Figure 2.2). From now on, we assume for all staircases that $p_1$ is the point with the largest row index and the smallest column index, while $p_n$ is the point with the smallest row index and the largest column index. We say that the staircase begins with $p_1$ and ends with $p_n$.

Lemma 2.1. Let $F_1$ and $F_2$ be finite subsets of $\mathbb{Z}^2$ such that $F_1$ is uniquely determined by its row and column sums, and $|F_1| = |F_2|$. Let $\alpha$ be defined as in Section 2.2. Then the set $F_1 \triangle F_2$ is the disjoint union of $\alpha$ staircases.

Proof. We will construct the staircases one by one and delete them from $F_1 \triangle F_2$. For a subset $A$ of $F_1 \triangle F_2$, define
\[ \rho_i(A) = |\{ j \in \mathbb{Z} : (i, j) \in A \cap F_1 \}| - |\{ j \in \mathbb{Z} : (i, j) \in A \cap F_2 \}|, \quad i \in \mathbb{Z}, \]
\[ \sigma_j(A) = |\{ i \in \mathbb{Z} : (i, j) \in A \cap F_1 \}| - |\{ i \in \mathbb{Z} : (i, j) \in A \cap F_2 \}|, \quad j \in \mathbb{Z}, \]
\[ \tau(A) = \sum_i |\rho_i(A)| + \sum_j |\sigma_j(A)|. \]
We have $2\alpha = \tau(F_1 \triangle F_2)$.

Assume that the rows and columns are ordered as in Section 2.2. Because of the triangular shape of $F_1$, for any point $(i, j) \in F_1 \setminus F_2$ and any point $(k, l) \in F_2 \setminus F_1$ we then have $k > i$ or $l > j$.

Suppose we have deleted some staircases and are now left with a non-empty subset $A$ of $F_1 \triangle F_2$. Let $(p_1, p_2, \dots, p_n)$ be a staircase of maximal length that is contained in $A$. Let $(x_1, y_1)$ and $(x_n, y_n)$ be the coordinates of the points $p_1$ and $p_n$ respectively. Each of those two points can be either in $A \cap F_1$ or in $A \cap F_2$, so there are four different cases. (If $n = 1$, so $p_1$ and $p_n$ are the same point, then there are only two cases.) We consider two cases; the other two are similar.

First suppose $p_1 \in A \cap F_1$ and $p_n \in A \cap F_2$. If $(x, y_1)$ is a point of $A \cap F_2$ in the same column as $p_1$, then $x > x_1$, so we can extend the staircase by adding this point. That contradicts the maximal length of the staircase. So there are no points of $A \cap F_2$ in column $y_1$. Therefore $\sigma_{y_1}(A) > 0$. Similarly, since $p_n \in A \cap F_2$, there are no points of $A \cap F_1$ in the same column as $p_n$. Therefore $\sigma_{y_n}(A) < 0$.

All rows and all columns that contain points of the staircase, except columns $y_1$ and $y_n$, contain exactly two points of the staircase, one in $A \cap F_1$ and one in $A \cap F_2$. Let $A' = A \setminus \{p_1, p_2, \dots, p_n\}$. Then $\rho_i(A') = \rho_i(A)$ for all $i$, and $\sigma_j(A') = \sigma_j(A)$ for all $j \ne y_1, y_n$. Furthermore, $\sigma_{y_1}(A') = \sigma_{y_1}(A) - 1$ and $\sigma_{y_n}(A') = \sigma_{y_n}(A) + 1$. Since $\sigma_{y_1}(A) > 0$ and $\sigma_{y_n}(A) < 0$, this gives $\tau(A') = \tau(A) - 2$.

Now consider the case $p_1 \in A \cap F_1$ and $p_n \in A \cap F_1$. As above, we have $\sigma_{y_1}(A) > 0$. Suppose $(x_n, y)$ is a point of $A \cap F_2$ in the same row as $p_n$. Then $y > y_n$, so we can extend the staircase by adding this point. That contradicts the maximal length of the staircase. So there are no points of $A \cap F_2$ in row $x_n$. Therefore $\rho_{x_n}(A) > 0$.

All rows and all columns that contain points of the staircase, except column $y_1$ and row $x_n$, contain exactly two points of the staircase, one in $A \cap F_1$ and one in $A \cap F_2$. Let $A' = A \setminus \{p_1, p_2, \dots, p_n\}$. Then $\rho_i(A') = \rho_i(A)$ for all $i \ne x_n$, and $\sigma_j(A') = \sigma_j(A)$ for all $j \ne y_1$. Furthermore, $\sigma_{y_1}(A') = \sigma_{y_1}(A) - 1$ and $\rho_{x_n}(A') = \rho_{x_n}(A) - 1$. Since $\sigma_{y_1}(A) > 0$ and $\rho_{x_n}(A) > 0$, this gives $\tau(A') = \tau(A) - 2$.

We can continue deleting staircases in this way until all points of $F_1 \triangle F_2$ have been deleted. Since $\tau(A) \ge 0$ for all subsets $A \subset F_1 \triangle F_2$, this must happen after deleting exactly $\alpha$ staircases.

Remark 2.1. Some remarks about the above lemma and its proof.

(i) The $\alpha$ staircases from the previous lemma have $2\alpha$ endpoints in total (where we count the same point twice in case of a staircase consisting of one point). Each endpoint contributes a difference of 1 to the line sums in one row or column. Since all these differences must add up to $2\alpha$, they cannot cancel each other.

(ii) A staircase consisting of more than one point can be split into two or more staircases. So it may be possible to write $F_1 \triangle F_2$ as the disjoint union of more than $\alpha$ staircases. However, in that case some of the contributions of the endpoints of staircases to the difference in the line sums cancel each other. On the other hand, it is impossible to decompose $F_1 \triangle F_2$ into fewer than $\alpha$ staircases.

(iii) The endpoints of a staircase can be in $F_1 \setminus F_2$ or $F_2 \setminus F_1$. For a staircase $T$ of which the two endpoints are in different sets, we have $|T \cap F_1| = |T \cap F_2|$. For a staircase $T$ of which the two endpoints are in the same set, we have $|T \cap F_1| = 1 + |T \cap F_2|$ or $|T \cap F_2| = 1 + |T \cap F_1|$. Since $|F_1 \setminus F_2| = |F_2 \setminus F_1|$, the number of staircases with two endpoints in $F_1 \setminus F_2$ must be equal to the number of staircases with two endpoints in $F_2 \setminus F_1$. This implies that of the $2\alpha$ endpoints, exactly $\alpha$ are in the set $F_1 \setminus F_2$ and $\alpha$ are in the set $F_2 \setminus F_1$.

Consider a decomposition of $F_1 \triangle F_2$ as in the proof of Lemma 2.1. We will now show that for our purposes we may assume that all these staircases begin with a point $p_1 \in F_1 \setminus F_2$ and end with a point $p_n \in F_2 \setminus F_1$.

Suppose there is a staircase beginning with a point $(x, y) \in F_2 \setminus F_1$. Then there also exists a staircase ending with a point $(x', y') \in F_1 \setminus F_2$: otherwise more than half of the $2\alpha$ endpoints would be in $F_2 \setminus F_1$, which is a contradiction to Remark 2.1(iii). Because of Remark 2.1(i) we must have $r_x^{(1)} < r_x^{(2)}$ and $r_{x'}^{(1)} > r_{x'}^{(2)}$. Let $y''$ be such that $(x', y'') \notin F_1 \cup F_2$. Delete the point $(x, y)$ from $F_2$ and add the point $(x', y'')$ to $F_2$. Then $r_x^{(2)}$ decreases by 1 and $r_{x'}^{(2)}$ increases by 1, so the difference in the row sums decreases by 2. Meanwhile, the difference in the column sums increases by at most 2. So $\alpha$ does not increase, while $|F_1|$, $|F_2|$ and $|F_1 \cap F_2|$ do not change. So the new situation is just as good or better than the old one. The staircase that began with $(x, y)$ in the old situation now begins with a point of $F_1 \setminus F_2$. The point that we added becomes the new endpoint of the staircase that previously ended with $(x', y')$.

Therefore, in our investigations we may assume that all staircases begin with a point of $F_1 \setminus F_2$ and end with a point of $F_2 \setminus F_1$. This is an important assumption that we will use in the proofs throughout the chapter. An immediate consequence of it is that $r_i^{(1)} = r_i^{(2)}$ for all $i$. The only difference between corresponding line sums occurs in the columns.

2.4 A new bound for the disjoint case

Using the concept of staircases, we can prove a new bound for Problem 2.1.

Theorem 2.2. Let $F_1$ and $F_2$ be finite subsets of $\mathbb{Z}^2$ such that $F_1$ is uniquely determined by its row and column sums, $|F_1| = |F_2|$, and $F_1 \cap F_2 = \emptyset$. Let $\alpha$ be defined as in Section 2.2. Then
\[ |F_1| \le \sum_{i=1}^{\alpha} \left\lfloor \frac{\alpha}{i} \right\rfloor. \]

Proof. Assume that the rows and columns are ordered as in Section 2.2. Let $a$ be the number of rows and $b$ the number of columns that contain elements of $F_1$. Let $(k, l) \in F_1$. Then all the points in the rectangle $\{(i, j) : 1 \le i \le k, 1 \le j \le l\}$ are elements of $F_1$. Since $F_1$ and $F_2$ are disjoint, none of the points in this rectangle is an element of $F_2$, and all the points belong to $F_1 \triangle F_2$. So all of the $kl$ points must belong to different staircases, which implies $\alpha \ge kl$.

For all $i$ with $1 \le i \le a$ we have $(i, r_i^{(1)}) \in F_1$, hence $r_i^{(1)} \le \frac{\alpha}{i}$. Since $r_i^{(1)}$ must be an integer, we have
\[ |F_1| = \sum_{i=1}^{a} r_i^{(1)} \le \sum_{i=1}^{a} \left\lfloor \frac{\alpha}{i} \right\rfloor. \]
Since $(a, 1) \in F_1$, we have $a \le \alpha$, so
\[ |F_1| \le \sum_{i=1}^{\alpha} \left\lfloor \frac{\alpha}{i} \right\rfloor. \]

Corollary 2.3. Let $F_1$, $F_2$ and $\alpha$ be defined as in Theorem 2.2. Then $|F_1| \le \alpha(1 + \log \alpha)$.

Proof. We have
\[ |F_1| \le \sum_{i=1}^{\alpha} \left\lfloor \frac{\alpha}{i} \right\rfloor \le \alpha \sum_{i=1}^{\alpha} \frac{1}{i} \le \alpha \left( 1 + \int_1^{\alpha} \frac{1}{x} \, dx \right) = \alpha (1 + \log \alpha). \]

The following example shows that the upper bound cannot even be improved by a factor $\frac{1}{2 \log 2} \approx 0.72$.

Example 2.1. (taken from []) Let $m \ge 1$ be an integer. We construct sets $F_1$ and $F_2$ as follows (see also Figure 2.3).

Row 1: $(1, j) \in F_1$ for $1 \le j \le 2^m$, $(1, j) \in F_2$ for $2^m + 1 \le j \le 2^{m+1}$.

Let $0 \le l \le m - 1$. Row $i$, where $2^l + 1 \le i \le 2^{l+1}$: $(i, j) \in F_1$ for $1 \le j \le 2^{m-l-1}$, $(i, j) \in F_2$ for $2^{m-l-1} + 1 \le j \le 2^{m-l}$.
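The construction of Example 2.1 is easy to check by machine. The sketch below assumes the reading of the example in which row 1 carries $2^m$ points of each set and each of the rows $2^l + 1, \dots, 2^{l+1}$ carries $2^{m-l-1}$ points of each set; the function names are illustrative. It builds both sets, computes $\alpha$ from the line sums, and compares $|F_1|$ with the sum in Theorem 2.2 and the bound of Corollary 2.3.

```python
import math
from collections import Counter

def example_sets(m):
    """Build F1 and F2 as sets of (row, column) pairs for an integer m >= 1."""
    f1 = {(1, j) for j in range(1, 2**m + 1)}
    f2 = {(1, j) for j in range(2**m + 1, 2**(m + 1) + 1)}
    for l in range(m):
        width = 2**(m - l - 1)  # points of each set per row in this group
        for i in range(2**l + 1, 2**(l + 1) + 1):
            f1 |= {(i, j) for j in range(1, width + 1)}
            f2 |= {(i, j) for j in range(width + 1, 2 * width + 1)}
    return f1, f2

def half_error(f1, f2):
    """Return alpha: half the total absolute difference in row and column sums."""
    total = 0
    for axis in (0, 1):  # 0 gives row sums, 1 gives column sums
        s1 = Counter(pt[axis] for pt in f1)
        s2 = Counter(pt[axis] for pt in f2)
        total += sum(abs(s1[k] - s2[k]) for k in set(s1) | set(s2))
    return total // 2

def floor_sum_bound(alpha):
    """Right-hand side of Theorem 2.2: sum of floor(alpha / i) for i = 1..alpha."""
    return sum(alpha // i for i in range(1, alpha + 1))

m = 3
f1, f2 = example_sets(m)
alpha = half_error(f1, f2)
size = len(f1)
log_bound = alpha * (1 + math.log(alpha))
```

For $m = 3$ this gives $\alpha = 8$ and $|F_1| = 20$, which in this small case coincides with the value of the sum in Theorem 2.2.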

Figure 2.3: The construction from Example 2.1 with $m = 3$.

The construction is almost completely symmetrical: if $(i, j) \in F_1$, then $(j, i) \in F_1$; and if $(i, j) \in F_2$ with $i > 1$, then $(j, i) \in F_2$. Since it is clear from the construction that each row contains exactly as many points of $F_1$ as points of $F_2$, we conclude that each column $j$ with $2 \le j \le 2^m$ contains exactly as many points of $F_1$ as points of $F_2$ as well. The only difference in the line sums occurs in the first column (which has $2^m$ points of $F_1$ and none of $F_2$) and in columns $2^m + 1$ up to $2^{m+1}$ (each of which contains one point of $F_2$ and none of $F_1$). So we have $\alpha = 2^m$. Furthermore,
\[ |F_1| = 2^m + \sum_{l=0}^{m-1} 2^l \cdot 2^{m-l-1} = 2^m + m 2^{m-1}. \]
Hence for this family of examples it holds that
\[ |F_1| = \alpha + \tfrac{1}{2} \alpha \log_2 \alpha, \]
which is very close to the bound we proved in Corollary 2.3.

2.5 Two bounds for general α

In case $F_1$ and $F_2$ are not disjoint, we can use an approach very similar to Section 2.4 in order to derive a bound for Problem 2.2.

Theorem 2.4. Let $F_1$ and $F_2$ be finite subsets of $\mathbb{Z}^2$ such that $F_1$ is uniquely determined by its row and column sums, and $|F_1| = |F_2|$. Let $\alpha$ be defined as in Section 2.2, and let $p = |F_1 \cap F_2|$. Then
\[ |F_1| \le \sum_{i=1}^{\alpha + p} \left\lfloor \frac{\alpha + p}{i} \right\rfloor. \]

Proof. Assume that the rows and columns are ordered as in Section 2.2. Let $(k, l) \in F_1$. Then all the points in the rectangle $\{(i, j) : 1 \le i \le k, 1 \le j \le l\}$ are elements of $F_1$. At most $p$ of the points in this rectangle are elements of $F_2$, so at least $kl - p$ points belong to $F_1 \triangle F_2$. None of the points in the rectangle is an element of $F_2 \setminus F_1$, so all of the $kl - p$ points of $F_1 \triangle F_2$ in the rectangle must belong to different staircases, which implies $\alpha + p \ge kl$.

For all $i$ with $1 \le i \le a$ we have $(i, r_i^{(1)}) \in F_1$, hence $r_i^{(1)} \le \frac{\alpha + p}{i}$. Since $r_i^{(1)}$ must be an integer, we have
\[ |F_1| = \sum_{i=1}^{a} r_i^{(1)} \le \sum_{i=1}^{a} \left\lfloor \frac{\alpha + p}{i} \right\rfloor. \]
Since $(a, 1) \in F_1$, we have $a \le \alpha + p$, so
\[ |F_1| \le \sum_{i=1}^{\alpha + p} \left\lfloor \frac{\alpha + p}{i} \right\rfloor. \]

Corollary 2.5. Let $F_1$, $F_2$, $\alpha$ and $p$ be defined as in Theorem 2.4. Then $|F_1| \le (\alpha + p)(1 + \log(\alpha + p))$.

Proof. Analogous to the proof of Corollary 2.3.

The following example shows that the upper bound cannot even be improved by a factor $\frac{1}{2 \log 2} \approx 0.72$, provided that $\alpha > \frac{p + 1}{2 \log 2 - 1} \log(p + 1)$.

Example 2.2. Let $k$ and $m$ be integers satisfying $k \ge 2$ and $m \ge 2k - 2$. We construct sets $F_1$ and $F_2$ as follows (see also Figures 2.4 and 2.5).

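Row 1: $(1, j) \in F_1 \cap F_2$ for $1 \le j \le 2^{k-1}$, $(1, j) \in F_1$ for $2^{k-1} + 1 \le j \le 2^m - 2^{k-1} + 1$, $(1, j) \in F_2$ for $2^m - 2^{k-1} + 2 \le j \le 2^{m+1} - 2^k - 2^{k-1} + 2$.

Let $0 \le l \le k - 2$. Row $i$, where $2^l + 1 \le i \le 2^{l+1}$: $(i, 1) \in F_1 \cap F_2$, $(i, j) \in F_1$ for $2 \le j \le 2^{m-l-1} - 2^{k-l-2} + 1$, $(i, j) \in F_2$ for $2^{m-l-1} - 2^{k-l-2} + 2 \le j \le 2^{m-l} - 2^{k-l-1} + 1$.

Let $k - 1 \le l \le m - k$. Row $i$, where $2^l + 1 \le i \le 2^{l+1}$: $(i, j) \in F_1$ for $1 \le j \le 2^{m-l-1}$, $(i, j) \in F_2$ for $2^{m-l-1} + 1 \le j \le 2^{m-l}$.

Let $m - k + 1 \le l \le m - 1$. Row $i$, where $2^l - 2^{l-m+k-1} + 2 \le i \le 2^{l+1} - 2^{l-m+k} + 1$: $(i, j) \in F_1$ for $1 \le j \le 2^{m-l-1}$, $(i, j) \in F_2$ for $2^{m-l-1} + 1 \le j \le 2^{m-l}$.

Figure 2.4: The construction from Example 2.2 with $k = 3$ and $m = 4$.

The construction is almost symmetrical: if $(i, j) \in F_1$, then $(j, i) \in F_1$; if $(i, j) \in F_1 \cap F_2$, then $(j, i) \in F_1 \cap F_2$; and if $(i, j) \in F_2$ with $i > 1$, then $(j, i) \in F_2$. Since it is clear from the construction that each row contains exactly as many points of $F_1$ as points of $F_2$, we conclude that each column $j$ with $2 \le j \le 2^m - 2^{k-1} + 1$ contains exactly as many points of $F_1$ as points of $F_2$ as well. The only difference in the line sums occurs in the first column (which has $2^m - 2^{k-1} + 1$ points of $F_1$ and only $2^{k-1}$ of $F_2$) and in columns $2^m - 2^{k-1} + 2$ up to $2^{m+1} - 2^k - 2^{k-1} + 2$ (each of which contains one point of $F_2$ and none of $F_1$). So we have
\[ \alpha = \tfrac{1}{2} \left( \left( 2^m - 2^{k-1} + 1 \right) - 2^{k-1} + \left( 2^{m+1} - 2^k - 2^{k-1} + 2 \right) - \left( 2^m - 2^{k-1} + 1 \right) \right) = 2^m - 2^k + 1. \]
It is easy to see that $p = |F_1 \cap F_2| = 2^k - 1$.

Now we count the number of elements of $F_1$.

Row 1 contains $2^m - 2^{k-1} + 1$ elements of $F_1$.

Let $0 \le l \le k - 2$. Rows $2^l + 1$ up to $2^{l+1}$ together contain $2^l \left( 2^{m-l-1} - 2^{k-l-2} + 1 \right) = 2^{m-1} - 2^{k-2} + 2^l$ elements of $F_1$.

Let $k - 1 \le l \le m - k$. Rows $2^l + 1$ up to $2^{l+1}$ together contain $2^l \cdot 2^{m-l-1} = 2^{m-1}$ elements of $F_1$.

Let $m - k + 1 \le l \le m - 1$. Rows $2^l - 2^{l-m+k-1} + 2$ up to $2^{l+1} - 2^{l-m+k} + 1$ together contain $\left( 2^l - 2^{l-m+k-1} \right) \left( 2^{m-l-1} \right) = 2^{m-1} - 2^{k-2}$ elements of $F_1$.

Figure 2.5: The construction from Example 2.2 with $k = 2$ and $m = 4$.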
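Before these counts are summed, the construction of Example 2.2 can be sanity-checked numerically. The sketch below is an illustration under stated assumptions: it reads the index ranges of the example as expressions in powers of two (row 1 shares its first $2^{k-1}$ points, rows $2, \dots, 2^{k-1}$ share their first point), and all function names are hypothetical. It builds both sets and computes $\alpha$, $p$ and $|F_1|$ directly.

```python
from collections import Counter

def example_2_2(k, m):
    """Build (F1, F2) as sets of (row, column) pairs; assumes k >= 2 and m >= 2k - 2."""
    f1, f2 = set(), set()

    def fill(row, first, last, *targets):
        # Add the points (row, first), ..., (row, last) to every target set.
        for j in range(first, last + 1):
            for target in targets:
                target.add((row, j))

    h = 2 ** (k - 1)
    fill(1, 1, h, f1, f2)                                  # shared part of row 1
    fill(1, h + 1, 2 ** m - h + 1, f1)
    fill(1, 2 ** m - h + 2, 2 ** (m + 1) - 2 ** k - h + 2, f2)

    for l in range(0, k - 1):                              # groups 0 <= l <= k - 2
        w = 2 ** (m - l - 1) - 2 ** (k - l - 2) + 1        # row sum in this group
        for i in range(2 ** l + 1, 2 ** (l + 1) + 1):
            fill(i, 1, 1, f1, f2)                          # shared first point
            fill(i, 2, w, f1)
            fill(i, w + 1, 2 * w - 1, f2)

    for l in range(k - 1, m):                              # remaining groups
        if l <= m - k:
            first_row, last_row = 2 ** l + 1, 2 ** (l + 1)
        else:
            e = l - m + k - 1
            first_row = 2 ** l - 2 ** e + 2
            last_row = 2 ** (l + 1) - 2 ** (e + 1) + 1
        w = 2 ** (m - l - 1)
        for i in range(first_row, last_row + 1):
            fill(i, 1, w, f1)
            fill(i, w + 1, 2 * w, f2)
    return f1, f2

def half_error(f1, f2):
    """Return alpha: half the total absolute difference in row and column sums."""
    total = 0
    for axis in (0, 1):  # 0 gives row sums, 1 gives column sums
        s1 = Counter(pt[axis] for pt in f1)
        s2 = Counter(pt[axis] for pt in f2)
        total += sum(abs(s1[n] - s2[n]) for n in set(s1) | set(s2))
    return total // 2

k, m = 3, 4
f1, f2 = example_2_2(k, m)
alpha = half_error(f1, f2)
p = len(f1 & f2)
size = len(f1)
```

With $k = 3$ and $m = 4$ (the parameters of Figure 2.4) this yields $\alpha = 2^m - 2^k + 1 = 9$, $p = 2^k - 1 = 7$ and $|F_1| = |F_2| = 40$.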

Hence the number of elements of $F_1$ is
\[ |F_1| = 2^m - 2^{k-1} + 1 + (k - 1)\left( 2^{m-1} - 2^{k-2} \right) + \sum_{l=0}^{k-2} 2^l + (m - 2k + 2) 2^{m-1} + (k - 1)\left( 2^{m-1} - 2^{k-2} \right) = 2^m + m 2^{m-1} + 2^{k-1} - k 2^{k-1}. \]
For this family of examples we now have
\[ |F_1| = \alpha + p + \frac{\alpha + p}{2} \log_2(\alpha + p) + \frac{p + 1}{2} - \frac{p + 1}{2} \log_2(p + 1). \]

We will now prove another bound, which is better if $p = |F_1 \cap F_2|$ is large compared to $\alpha$. Let $u$ be an integer such that $2u = |F_1 \triangle F_2|$. We will first derive an upper bound on $u$ in terms of $a$, $b$ and $\alpha$. Then we will derive a lower bound on $|F_1|$ in terms of $a$, $b$ and $\alpha$. By combining these two, we find an upper bound on $u$ in terms of $\alpha$ and $p$.

Lemma 2.6. Let $F_1$ and $F_2$ be finite subsets of $\mathbb{Z}^2$ such that $F_1$ is uniquely determined by its row and column sums, and $|F_1| = |F_2|$. Let $\alpha$, $a$ and $b$ be defined as in Section 2.2. Define $u$ as $2u = |F_1 \triangle F_2|$. Then we have
\[ u^2 \le \frac{\alpha}{4} (a + b)(a + b + \alpha - 1). \]

Proof. Decompose $F_1 \triangle F_2$ into $\alpha$ staircases as in Lemma 2.1, and let $\mathcal{T}$ be the set consisting of these staircases. Let $T \in \mathcal{T}$ be a staircase and $i \le a + 1$ a positive integer. Consider the elements of $T \cap F_2$ in rows $i, i + 1, \dots, a$. If such elements exist, then let $w_i(T)$ be the largest column index that occurs among these elements. If there are no elements of $T \cap F_2$ in those rows, then let $w_i(T)$ be equal to the smallest column index of an element of $T \cap F_1$ (no longer restricted to rows $i, \dots, a$). We have $w_i(T) \ge 1$. Define $W_i = \sum_{T \in \mathcal{T}} w_i(T)$.

Let $d_i$ be the number of elements of $F_1 \setminus F_2$ in row $i$. Let $y_1 < \dots < y_{d_i}$ be the column indices of the elements of $F_1 \setminus F_2$ in row $i$, and let $y'_1 < \dots < y'_{d_i}$ be the column indices of the elements of $F_2 \setminus F_1$ in row $i$. Let $\mathcal{T}_i \subset \mathcal{T}$ be the set of staircases with elements in row $i$. The elements in $F_2 \setminus F_1$ of these staircases are in columns $y'_1, y'_2, \dots, y'_{d_i}$, hence the set $\{w_i(T) : T \in \mathcal{T}_i\}$ is equal to the set $\{y'_1, y'_2, \dots, y'_{d_i}\}$. The elements in $F_1 \setminus F_2$ are in columns $y_1, y_2, \dots, y_{d_i}$ and are either the first element of a staircase or correspond to an element of $F_2 \setminus F_1$ in the same column but in a row with index at least $i + 1$. In either case, for a staircase $T \in \mathcal{T}_i$ we have $w_{i+1}(T) = y_j$ for some $j$. Hence the set $\{w_{i+1}(T) : T \in \mathcal{T}_i\}$ is equal to the set $\{y_1, y_2, \dots, y_{d_i}\}$.

We have
\[ \sum_{T \in \mathcal{T}_i} w_{i+1}(T) = \sum_{j=1}^{d_i} y_j \le \sum_{j=1}^{d_i} \left( y_{d_i} - j + 1 \right) = d_i y_{d_i} - \tfrac{1}{2} (d_i - 1) d_i, \]
and
\[ \sum_{T \in \mathcal{T}_i} w_i(T) = \sum_{j=1}^{d_i} y'_j \ge \sum_{j=1}^{d_i} \left( y_{d_i} + j \right) = d_i y_{d_i} + \tfrac{1}{2} (d_i + 1) d_i. \]
Hence
\[ W_i = W_{i+1} + \sum_{T \in \mathcal{T}_i} \left( w_i(T) - w_{i+1}(T) \right) \ge W_{i+1} + \tfrac{1}{2} (d_i + 1) d_i + \tfrac{1}{2} (d_i - 1) d_i = W_{i+1} + d_i^2. \]
Since $W_{a+1} \ge \alpha$, we find
\[ W_1 \ge \alpha + d_1^2 + \dots + d_a^2. \]

We may assume that if $(x, y)$ is the endpoint of a staircase, then $(x, y')$ is an element of $F_1 \cup F_2$ for $1 \le y' < y$ (i.e. there are no gaps between the endpoints and other elements of $F_1 \cup F_2$ on the same row). After all, by moving the endpoint of a staircase to another empty position on the same row, the error in the columns can only become smaller (if the new position of the endpoint happens to be in the same column as the first point of another staircase, in which case the two staircases fuse together to one) but not larger, and $u$, $a$ and $b$ do not change.

So on the other hand, as $W_1$ is the sum of the column indices of the endpoints of the staircases, we have
\[ W_1 \le (b + 1) + (b + 2) + \dots + (b + \alpha) = \alpha b + \tfrac{1}{2} \alpha (\alpha + 1). \]
We conclude
\[ \alpha + \sum_{i=1}^{a} d_i^2 \le \alpha b + \tfrac{1}{2} \alpha (\alpha + 1). \]
Note that $\sum_{i=1}^{a} d_i = u$. By the Cauchy-Schwarz inequality, we have
\[ \left( \sum_{i=1}^{a} d_i^2 \right) \left( \sum_{i=1}^{a} 1 \right) \ge \left( \sum_{i=1}^{a} d_i \right)^2 = u^2, \]
so
\[ \sum_{i=1}^{a} d_i^2 \ge \frac{u^2}{a}. \]
From this it follows that
\[ \alpha b + \tfrac{1}{2} \alpha (\alpha + 1) \ge \alpha + \frac{u^2}{a}, \]
or, equivalently,
\[ u^2 \le \alpha a b + \tfrac{1}{2} \alpha (\alpha - 1) a. \]
By symmetry we also have
\[ u^2 \le \alpha a b + \tfrac{1}{2} \alpha (\alpha - 1) b. \]
Hence
\[ u^2 \le \alpha a b + \tfrac{1}{4} \alpha (\alpha - 1)(a + b). \]
Using that $\sqrt{ab} \le \frac{a + b}{2}$, we find
\[ u^2 \le \alpha \left( \frac{(a + b)^2}{4} + \frac{1}{4} (\alpha - 1)(a + b) \right) = \frac{\alpha}{4} (a + b)(a + b + \alpha - 1). \]

Lemma 2.7. Let $F_1$ and $F_2$ be finite subsets of $\mathbb{Z}^2$ such that $F_1$ is uniquely determined by its row and column sums, and $|F_1| = |F_2|$. Let $\alpha$, $a$ and $b$ be defined as in Section 2.2. Then we have
\[ |F_1| \ge \frac{(a + b)^2}{4(\alpha + 1)}. \]

Proof. Without loss of generality, we may assume that all rows and columns that contain elements of $F_1$ also contain at least one point of $F_1 \triangle F_2$: if a row or column does not contain any points of $F_1 \triangle F_2$, we may delete it. By doing so, $F_1 \triangle F_2$ does not change, while $|F_1|$ becomes smaller, so the situation becomes better.

First consider the case $r_{i+1}^{(1)} < r_i^{(1)} - \alpha$ for some $i$. We will show that this is impossible. If a column does not contain an element of $F_2 \setminus F_1$, then by the assumption above it contains an element of $F_1 \setminus F_2$, which must then be the first point of a staircase.

Consider all points of $F_2 \setminus F_1$ and all first points of staircases in columns $r_{i+1} + 1, r_{i+1} + 2, \dots, r_i$. Since these are more than $\alpha$ columns, at least two of those points must belong to the same staircase. On the other hand, if $(x, y) \in F_1 \setminus F_2$ is the first point of a staircase with $r_{i+1} < y \le r_i$, then we have $x \le i$, so the second point $(x, y')$ in the staircase, which is in $F_2 \setminus F_1$, must satisfy $x \le i$ and therefore $y' > r_i$. So the second point cannot also be in one of the columns $r_{i+1} + 1, r_{i+1} + 2, \dots, r_i$. If two points of $F_2 \setminus F_1$ in columns $r_{i+1} + 1, r_{i+1} + 2, \dots, r_i$ belong to the same staircase, then they must be connected by a point of $F_1 \setminus F_2$ in the same columns. However, by a similar argument this forces the next point to be outside the mentioned columns, while we assumed that it was in those columns. We conclude that it is impossible for row sums of two consecutive rows to differ by more than $\alpha$. By the same argument, column sums of two consecutive columns cannot differ by more than $\alpha$.

Hence we have $r_{i+1}^{(1)} \ge r_i^{(1)} - \alpha$ for all $i$, and $c_{j+1}^{(1)} \ge c_j^{(1)} - \alpha$ for all $j$. We now have $r_2^{(1)} \ge b - \alpha$, $r_3^{(1)} \ge b - 2\alpha$, and so on. Also, $c_2^{(1)} \ge a - \alpha$, $c_3^{(1)} \ge a - 2\alpha$, and so on. Using this, we can derive a lower bound on $|F_1|$ for fixed $a$ and $b$. Consider Figure 2.6. The points of $F_1$ are indicated by black dots. The number of points is equal to the grey area in the picture, which consists of all $1 \times 1$-squares with a point of $F_1$ in the upper left corner. We can estimate this area from below by drawing a line with slope $\alpha$ through the point $(a + 1, 1)$ and a line with slope $\frac{1}{\alpha}$ through the point $(1, b + 1)$; the area closed in by these two lines and the two axes is less than or equal to the number of points of $F_1$.

Figure 2.6: The number of points of $F_1$ (indicated by small black dots) is equal to the grey area.

For $\alpha = 1$ those lines do not have a point of intersection. Under the assumption we made at the beginning of this proof, we must in this case have $a = b$ and the number of points of $F_1$ is equal to
\[ \frac{a(a + 1)}{2} \ge \frac{a^2}{\alpha + 1} = \frac{(a + b)^2}{4(\alpha + 1)}, \]
so in this case we are done.

In order to compute the area for $\alpha \ge 2$ we switch to the usual coordinates in $\mathbb{R}^2$, see Figure 2.7. The equation of the first line is $y = \alpha x - a$, and the equation of the second line is $y = \frac{1}{\alpha} x - \frac{1}{\alpha} b$. We find that the point of intersection is given by
\[ (x, y) = \left( \frac{a\alpha - b}{\alpha^2 - 1}, \frac{-b\alpha + a}{\alpha^2 - 1} \right). \]
The area of the grey part of Figure 2.7 is equal to
\[ \frac{1}{2} a \cdot \frac{a\alpha - b}{\alpha^2 - 1} + \frac{1}{2} b \cdot \frac{b\alpha - a}{\alpha^2 - 1} = \frac{a^2 \alpha + b^2 \alpha - 2ab}{2(\alpha^2 - 1)}. \]
We now have
\[ |F_1| \ge \frac{\alpha(a^2 + b^2) - 2ab}{2(\alpha^2 - 1)} \ge \frac{\alpha \frac{(a + b)^2}{2} - \frac{(a + b)^2}{2}}{2(\alpha^2 - 1)} = \frac{(a + b)^2}{4(\alpha + 1)}. \]

Figure 2.7: Computing the area bounded by the two lines and the two axes.

Theorem 2.8. Let $F_1$ and $F_2$ be finite subsets of $\mathbb{Z}^2$ such that $F_1$ is uniquely determined by its row and column sums, and $|F_1| = |F_2|$. Let $\alpha$ be defined as in Section 2.2, and let $p = |F_1 \cap F_2|$. Write $\beta = \sqrt{\alpha}(\alpha + 1)$. Then
\[ |F_1| \le p + \sqrt{ \frac{\alpha}{4} \left( \beta + \sqrt{\beta(\alpha - 1) + 4(\alpha + 1)p + \beta^2} + \frac{\alpha - 1}{2} \right)^2 - \frac{(\alpha - 1)^2 \alpha}{16} }. \]

Proof. Write $s = a + b$ for convenience of notation. From Lemma 2.6 we derive
\[ u \le \frac{\sqrt{\alpha}}{2} \left( s + \frac{\alpha - 1}{2} \right). \]

We substitute $|F_1| = u + p$ in Lemma 2.7 and use the above bound for $u$:
\[ \frac{\sqrt{\alpha}}{2} \left( s + \frac{\alpha - 1}{2} \right) + p \ge |F_1| \ge \frac{s^2}{4(\alpha + 1)}. \]
Solving for $s$, we find
\[ s \le \sqrt{\alpha}(\alpha + 1) + \sqrt{ \sqrt{\alpha}(\alpha^2 - 1) + 4(\alpha + 1)p + \alpha(\alpha + 1)^2 } = \beta + \sqrt{ \beta(\alpha - 1) + 4(\alpha + 1)p + \beta^2 }. \]
Finally we substitute this in Lemma 2.6:
\[ u \le \sqrt{ \frac{\alpha}{4} \left( \beta + \sqrt{\beta(\alpha - 1) + 4(\alpha + 1)p + \beta^2} + \frac{\alpha - 1}{2} \right)^2 - \frac{(\alpha - 1)^2 \alpha}{16} }. \]
This, together with $|F_1| = u + p$, yields the claimed result.

Remark 2.2. By a straightforward generalisation of [, Proposition 3 and Lemma 6], we find a bound very similar to the one in Theorem 2.8: F p + (α + )(α ) + (α + ) p + (α ). 4

Theorem 2.8 says that $|F_1|$ is asymptotically bounded by $p + \alpha \sqrt{p} + \alpha^2$. The next example shows that $|F_1|$ can be asymptotically as large as $p + 2\sqrt{\alpha p} + \alpha$.

Example 2.3. Let $N$ be a positive integer. We construct $F_1$ and $F_2$ with total difference in the line sums equal to $2\alpha$ as follows (see also Figure 2.8).

Let $(i, j) \in F_1 \cap F_2$ for $1 \le i \le N$, $1 \le j \le N$. Furthermore, for $1 \le i \le N$:
- Let $(i, j), (j, i) \in F_1 \cap F_2$ for $N + 1 \le j \le N + (N - i)\alpha$.
- Let $(i, j), (j, i) \in F_1$ for $N + (N - i)\alpha + 1 \le j \le N + (N - i + 1)\alpha$.
- Let $(i, j), (j, i) \in F_2$ for $N + (N - i + 1)\alpha + 1 \le j \le N + (N - i + 2)\alpha$.

Finally, for $1 \le t \le \alpha$, let $(i, j) \in F_2$ with $i = N + t$ and $j = N + \alpha + 1 - t$.

The only differences in the line sums occur in the first column (a difference of $\alpha$) and in columns $N + N\alpha + 1$ up to $N + N\alpha + \alpha$ (a difference of 1 in each column).

Figure 2.8: The construction from Example 2.3 with $N = 4$ and $\alpha = 3$.

We have
\[ p = N^2 + N(N - 1)\alpha = N^2 + N^2 \alpha - N\alpha, \]
and
\[ |F_1| = N^2 + N(N + 1)\alpha = N^2 + N^2 \alpha + N\alpha. \]
From the first equality we derive
\[ N = \frac{\alpha}{2(\alpha + 1)} + \sqrt{ \frac{p}{\alpha + 1} + \frac{\alpha^2}{4(\alpha + 1)^2} }. \]
Hence
\[ |F_1| = p + 2N\alpha = p + \frac{\alpha^2}{\alpha + 1} + \sqrt{ \frac{4\alpha^2 p}{\alpha + 1} + \frac{\alpha^4}{(\alpha + 1)^2} }. \]

2.6 Generalisation to unequal sizes

Until now, we have assumed that $|F_1| = |F_2|$. However, we can easily generalise all the results to the case $|F_1| \ne |F_2|$.

Suppose $|F_1| > |F_2|$. Then there must be a row $i$ with $r_i^{(1)} > r_i^{(2)}$. Let $j > b$ be such that $(i, j) \notin F_2$ and define $F_3 = F_2 \cup \{(i, j)\}$. We have $r_i^{(3)} = r_i^{(2)} + 1$, so the error