Lecture 9: Random sampling, ɛ-approximation and ɛ-nets

Similar documents
CSE 648: Advanced algorithms

Topological Graph Theory Lecture 4: Circle packing representations

A non-linear lower bound for planar epsilon-nets

Def. A topological space X is disconnected if it admits a non-trivial splitting: (We ll abbreviate disjoint union of two subsets A and B meaning A B =

A Simple Proof of Optimal Epsilon Nets

Geometric Separators

Conflict-Free Coloring and its Applications

Discrete Geometry. Problem 1. Austin Mohr. April 26, 2012

Lecture 23: Hausdorff and Fréchet distance

THE UNIT DISTANCE PROBLEM ON SPHERES

Conflict-Free Colorings of Rectangles Ranges

THE VAPNIK- CHERVONENKIS DIMENSION and LEARNABILITY

A Note on Smaller Fractional Helly Numbers

NATIONAL UNIVERSITY OF SINGAPORE Department of Mathematics MA4247 Complex Analysis II Lecture Notes Part II

Topological properties

On Regular Vertices of the Union of Planar Convex Objects

Quasiconformal Maps and Circle Packings

Lecture 9: Low Rank Approximation

Rose-Hulman Undergraduate Mathematics Journal

Lower Bounds on the Complexity of Simplex Range Reporting on a Pointer Machine

13 th Annual Harvard-MIT Mathematics Tournament Saturday 20 February 2010

47 EPSILON-APPROXIMATIONS & EPSILON-NETS

Generalized Ham-Sandwich Cuts

A Streaming Algorithm for 2-Center with Outliers in High Dimensions

Union of Random Minkowski Sums and Network Vulnerability Analysis

Thus, X is connected by Problem 4. Case 3: X = (a, b]. This case is analogous to Case 2. Case 4: X = (a, b). Choose ε < b a

The unit distance problem for centrally symmetric convex polygons

Partitions and Covers

Voronoi Diagrams for Oriented Spheres

Coloring translates and homothets of a convex body

Chapter 1. Preliminaries

The Vapnik-Chervonenkis Dimension

Statistics for Financial Engineering Session 2: Basic Set Theory March 19 th, 2006

Math 320-2: Midterm 2 Practice Solutions Northwestern University, Winter 2015

Lecture 5: January 30

Lecture 13: Introduction to Neural Networks

On the Beck-Fiala Conjecture for Random Set Systems

Hence C has VC-dimension d iff. π C (d) = 2 d. (4) C has VC-density l if there exists K R such that, for all

EPTAS for Maximum Clique on Disks and Unit Balls

List coloring hypergraphs

Chapter 9: Relations Relations

Learning convex bodies is hard

Computational Learning Theory: Shattering and VC Dimensions. Machine Learning. Spring The slides are mainly from Vivek Srikumar

STRONGLY CONNECTED SPACES

1 The Local-to-Global Lemma

ε-nets and VC Dimension

The 2-Center Problem in Three Dimensions

The fundamental theorem of calculus for definite integration helped us to compute If has an anti-derivative,

Euclidean Shortest Path Algorithm for Spherical Obstacles

Problems for Putnam Training

The local equivalence of two distances between clusterings: the Misclassification Error metric and the χ 2 distance

Finite Metric Spaces & Their Embeddings: Introduction and Basic Tools

Special Topics in Algorithms

""'d+ an. such that L...,

The discrepancy of permutation families

Lecture 9: Conditional Probability and Independence

A Pair in a Crowd of Unit Balls

Locality Lower Bounds

Geometry. On Galleries with No Bad Points. P. Valtr. 1. Introduction

Part IB GEOMETRY (Lent 2016): Example Sheet 1

This chapter reviews some basic geometric facts that we will need during the course.

A Decidable Class of Planar Linear Hybrid Systems

1 Matroid intersection

k-fold unions of low-dimensional concept classes

Lecture Introduction. 2 Brief Recap of Lecture 10. CS-621 Theory Gems October 24, 2012

Geometric Incidences and related problems. Shakhar Smorodinsky. (Ben-Gurion University)

Neighborly families of boxes and bipartite coverings

Tight Hardness Results for Minimizing Discrepancy

Lecture 6 Feb 5, The Lebesgue integral continued

VC Dimension Review. The purpose of this document is to review VC dimension and PAC learning for infinite hypothesis spaces.

(x x 0 ) 2 + (y y 0 ) 2 = ε 2, (2.11)

12.1 A Polynomial Bound on the Sample Size m for PAC Learning

arxiv: v1 [math.dg] 28 Jun 2008

MACHINE LEARNING - CS671 - Part 2a The Vapnik-Chervonenkis Dimension

Spanning Paths in Infinite Planar Graphs

Revisiting Poincaré's Theorem on presentations of discontinuous groups via fundamental polyhedra

Mathematics 530. Practice Problems. n + 1 }

Combinatorial Generalizations of Jung s Theorem

Coin representation. Chapter Koebe s theorem

The longest shortest piercing.

An algorithm to increase the node-connectivity of a digraph by one

VC Dimension and Sauer s Lemma

COMPUTING GENERATOR OF SECOND HOMOTOPY MODULE AND USING TIETZE TRANSFORMATION METHODS

Almost Cross-Intersecting and Almost Cross-Sperner Pairs offamilies of Sets

After taking the square and expanding, we get x + y 2 = (x + y) (x + y) = x 2 + 2x y + y 2, inequality in analysis, we obtain.

Irredundant Families of Subcubes

Bounds on parameters of minimally non-linear patterns

Math 6510 Homework 11

Off-diagonal hypergraph Ramsey numbers

Geometric variants of Hall s Theorem through Sperner s Lemma

1 Radon, Helly and Carathéodory theorems

Solutions to Exercises Chapter 10: Ramsey s Theorem

High Dimensional Geometry, Curse of Dimensionality, Dimension Reduction

FUNDAMENTAL GROUPS AND THE VAN KAMPEN S THEOREM

eigenvalue bounds and metric uniformization Punya Biswal & James R. Lee University of Washington Satish Rao U. C. Berkeley

The Pigeonhole Principle

Topics in Theoretical Computer Science April 08, Lecture 8

On Dominating Sets for Pseudo-disks

Definition We say that a topological manifold X is C p if there is an atlas such that the transition functions are C p.

Discrete Mathematics

Transcription:

CPS296.2 Geometric Optimization February 13, 2007 Lecture 9: Random sampling, -approximation and -nets Lecturer: Pankaj K. Agarwal Scribe: Shashidhara K. Ganjugunte In the next few lectures we will be looking at techniques of approximation, namely, random sampling, - approximations, -nets, and discrepancy. Random sampling methods are useful in obtaining fast algorithms which approximate the desired solution rather than exactly compute it. The notions of -approximations and -nets, and discrepancy are used to bound the error in approximation. 9.1 Random Sampling The need for sampling arises in many scenarios such as performing computations on large data sets, where a fast approximate solution is preferred over a slow exact one. For example, suppose we want to find the median of a set S = {s 1,..., s n } of n numbers. Algorithm 1 finds an element whose rank is close to n/2. Input: S = {s 1,..., s n } Output: s i S such that rank(s i ) rank n/2, where rank(α) denotes the rank of α as an element of S Choose random subset A S of size O( 1 log n) ; return the median of A Algorithm 1: Finding median of n numbers using random sampling In general, one chooses a random subset of suitable size (which depends upon the desired accuracy) and uses computations on the subset as an approximation to the solution to the original problem. Here is another example - computation of a centerpoint of a point set, a generalization of median to higher diemensions. Let P R d be a set of n points and let Γ be the set of halfspaces in R d. The depth of a point x R d is defined as: Depth(x, P ) = min γ P. γ Γ:x γ The depth of the point set P is Depth(P ) = max x R d Depth(x, P ). A centerpoint of P is any point x R d such that, Depth(x, p) always exists; see e.g [2]. n d+1. It is known that a center point We can adapt the paradigm of random sampling to compute the centerpoint of a point set in R d as follows: The general result on random sampling discussed in the next section will imply that the above algorithm returns a center point with probability at least 1/2. Claim 1 Depth(X, S) (1 ) n d+1 with probability at least 1/2. 9-1

Lecture 9: February 13, 2007 9-2 1: Choose a random subset A P, with A = O( 1 2 log 1 ) 2: Compute the centerpoint X of A 3: Return X Algorithm 2: Return a center point of A An application of the notion of centerpoint is in the proof of the Lipton-Tarjan separator theorem. Let G = (V, E) be a planar graph with n vertices. V can be partitioned into A, B, C such that A, B 3 4 n and C < 2 n and there is no edge between a vertex of A and a vertex of B. This result has been the basis of many divide-and-conquer algorithms for planar graphs. Miller and Thurston [3] gave proof of the above theorem using the notions of centerpoint and stereographic projection combined with the following theorem by Andreev (1936). Theorem 1 For a planar graph G, there exist a set D = D 1,..., D n of n-disks with pairwise disjoint interiors in R 2 such that D i and D j touch each other iff the corresponding vertices V i, V j have an edge between them We present a very high level view of this proof below. A planar graph G, corresponds to a system of disks D in R 2, as implied by the theorem above. Let P = {P 1,..., P n } be the centers of the disks as points of R 3. The plane in which the disks lie is chosen as the xy-plane. A stereographic projection is a function that maps a point q on the xy-plane to a point on to a point q on the unit sphere S centered at (0, 0, 1) such that q is the point of intersection with of the segment joining q and the north pole with the sphere S. Let σ be the center point of P. Miller and Thurston used a transformation which was the composition of an inverse stereographic projection, a dilation(scaling) of the xy-plane and a stereographic projection and such that the center-point of P becomes the center of sphere S. Under these conditions they proved that any random plane separates the system of disks. Using this key fact, they then proved the Lipton-Tarjan theorem. A more detailed proof can be found in [4]. 9.2 Approximations In the computation of centerpoint, the set of halfspaces Γ was used to generate subsets of P. Such a set is usually referred to as ranges. In this section, we formalize these notions and study range spaces, -approxmiations, and -nets. 9.2.1 Range spaces and VC-dimension Let X be a finite set of objects and R 2 X be a subset of the power set of X referred to as the family of ranges. The pair Σ = (X, R) is referred to as a range space. For a subset A X, the subspace of Σ induced by A is Σ A = (A, R A ), where R A = {r A : r R}. Furthermore, we say A is shattered by Σ if R A = 2 A. The VC-dimension of Σ, denoted by VC-dim(Σ), is an integer d such that, there is a shattered set of size d but not of size d + 1. We illustrate these notions with a few examples below. In the examples, we assume S to be a finite set of points (in general they can be objects) in R d, we denote by R i the family of ranges and by Σ i the corresponding range space.

Lecture 9: February 13, 2007 9-3 Example:R 1 = {S γ : γ is a half space }. In this example R 1 = O(n d ), and the range space is Σ d 1 = (S, R d ). The VC-dim(Σ 2 1) = 3, as illustrated in Figure 9.1. Figure 9.1: The points with filled interiors cannot be separated from those with hollow interiors by halfspaces. Example:R 2 = {S B : B is a ball} and Σ 2 = (S, R 2 ) and R 2 = O(n d+1 ). Example:S = {Set of n lines in R 2 } and R 3 = {{l S : l r φ} : r is a segment }, R 3 = O(n 4 ), Σ 3 = (S, R 3 ). This is shown in Figure 9.2 Figure 9.2: The thick segment indicates a range, while the dashed lines are the input objects. All the range spaces presented above, have finite VC-dimension. The following example shows a geometric range space whose VC-dimension is infinite. Example:R 4 = {S ρ : ρ is a convex polygon }, R 4 2 n, Σ = (R 4, ρ). The VC-dim(Σ) = as shown in Figure 9.3. Radon s theorem, stated below, gives an upper bound on the size of the family of ranges. Theorem 2 Let Σ = (X, R) be a finite range space such that VC-dim(Σ) = d then R = O( X d ) We now introduce the notion of -approximations and -nets.

Lecture 9: February 13, 2007 9-4 1 2 3 4 5 12 6 11 10 9 8 7 Figure 9.3: A family of convex polygons as ranges has infinite VC-dimension as it can separate any subset of points of arbitrary size lying on a circle. 9.2.2 -nets and -approximations Definition A set N X is an -approximation of Σ if for each r R, r r N X N Definition A set N X is an -net of Σ if for each r R, with r X the intersection r N is non-empty. Using Chernoff s bound one can prove that the size of an -approximation is bounded by O( 1 log R ), and 2 the size of an -net has an upper bound O( 1 log R ). For range spaces of bounded VC-dimension, however the bounds can be improved: Theorem 3 Let Σ = (X, R) be a range space, let VC-dim(Σ) = d, and let 0 <, δ < 1 be parameters. A random subset of size 8d (ln( 1 2 δ )) is an -approximation of Σ with probability 1 δ. Theorem 4 For a range space Σ = (X, R) of bounded VC-dimension d, a random subset of size 8d (ln( 1 δ )) is an -net of Σ with probability 1 δ, where δ, (0, 1). The upper bound on the size of -approximations is not tight, the tight bound being O( d 2 1 d log 1 ), whereas the upper bound on the size of -nets is tight. The following theorems give bounds on the size of a random subset if it to be an -approximation or an -net. The notion of -approximations and -nets have a wide range of applications such as machine learning, computation of center point etc.

Lecture 9: February 13, 2007 9-5 9.2.3 Discrepancy In this section we study discrepancy, a tool which is used to obtain good -approximations to a range space. Let Σ = (X, R) be a range space. A coloring χ of the set X is a map χ : X { 1, +1}. The discrepancy (denoted disc) of Σ is defined below: disc(r, χ) = a r χ(a) disc(σ, χ) = max disc(r, χ) r R disc(σ) = min χ disc(σ, χ) Let X be the set of n points in R 2 and R = {x r : r is an orthogonal rectangle }, then disc((x, R)) = O(log n). For the range space Σ 2 1 described above, disc(σ 2 1) = Θ(n 1 4 ). Intuitively, discrepancy measures imbalance in a set with respect to a set of ranges. A balanced set will have low discrepancy. Range spaces with low discrepancy have good -approximations. Relationship between -approximations and discrepancy can be found in [1]. References [1] B. Chazelle. The Discrepancy Method: Randomness and Complexity. Cambridge University Press, 2000. [2] J. Matou sek. Lectures on Discrete Geometry. Springer, 2001. [3] G. L. Miller and W. Thurston. Separators in two and three dimensions. In Proceedings of the twentysecond annual ACM symposium on Theory of computing, pages 300 309, New York, NY, USA, 1990. ACM Press. [4] J. Pach and P. K. Agarwal. Combinatorial Geometry. John-Wiley & Sons, Inc., 2004.