The Johnson-Lindenstrauss Lemma for Linear and Integer Feasibility

The Johnson-Lindenstrauss Lemma for Linear and Integer Feasibility
Leo Liberti, Vu Khac Ky, Pierre-Louis Poirion
CNRS LIX, École Polytechnique, France
Rutgers University, 18 November 2015

The gist
- I want to solve huge LPs in standard form min{c^T x : Ax = b, x ≥ 0}
- I am prepared to accept approximate solutions
- I want to randomly project the rows of Ax = b, x ≥ 0 to a lower-dimensional space, while keeping the accuracy with high probability
- Then I can use the projected LP as a feasibility oracle
- And I'd also like to do the same for ILPs (see the sketch below)
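As a concrete illustration of this pipeline, here is a minimal hypothetical sketch (not from the talk): build a Gaussian projector T, replace the m equations Ax = b by the k ≪ m projected equations TAx = Tb, and solve both feasibility LPs with scipy. The instance, the choice k = m/10, and all names are assumptions made for the demo.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 200, 400
A = rng.uniform(0.0, 1.0, size=(m, n))
x_feas = rng.uniform(0.0, 1.0, size=n)       # planted feasible point
b = A @ x_feas                               # guarantees feasibility

k = m // 10                                  # aggressive projection, k << m
T = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))  # entries ~ N(0, 1/k)

c = np.zeros(n)  # pure feasibility: any objective works
orig = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * n)
proj = linprog(c, A_eq=T @ A, b_eq=T @ b, bounds=[(0, None)] * n)
print("original feasible: ", orig.status == 0)
print("projected feasible:", proj.status == 0)
```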

Linear Membership Problems

Restricted Linear Membership

Restricted Linear Membership (RLM): given vectors A_1, ..., A_n, b ∈ R^m and X ⊆ R^n, is there x ∈ X s.t. b = ∑_{i≤n} x_i A_i?

RLM is a fundamental problem, which subsumes:
- Linear Feasibility Problem (LFP): given an m×n matrix A and b ∈ R^m, is there an x ∈ R^n_+ s.t. Ax = b?
- Integer Feasibility Problem (IFP): given an m×n matrix A and b ∈ R^m, is there an x ∈ Z^n_+ s.t. Ax = b?

Efficient solution methods for LFP/IFP directly yield fast bisection algorithms for (bounded) Linear Programming (LP) and Integer Linear Programming (ILP)

Big data RLM instances

Most efficient deterministic methods:
- LFP: simplex method, ellipsoid algorithm
- IFP: SAT solution algorithms, Constraint Programming

These are slow if m, n are too large, and their guarantees are useless if the data are not accurate/correct. So trade the guarantee off for efficiency: randomized algorithms.

The approach: decrease m (i.e., lose some dimensions), while making sure the geometry of the projected problem is similar to the original

The shape of a cloud of points

RLM: is b a restricted linear combination of the points A_1, ..., A_n ∈ R^m?

Lose dimensions but not too much accuracy: can we find k ≪ m and points A'_1, ..., A'_n ∈ R^k s.t. A = (A_i | i ≤ n) and A' = (A'_i | i ≤ n) have the same shape?

What is the shape of a set of points? Approximate congruence: A, A' have almost the same shape if for all i < j ≤ n

(1−ε) ‖A_i − A_j‖ ≤ ‖A'_i − A'_j‖ ≤ (1+ε) ‖A_i − A_j‖

for some small ε > 0. Assume all norms are Euclidean

Losing dimensions in the RLM

Given X ⊆ R^n and b, A_1, ..., A_n ∈ R^m, find k ≪ m and b', A'_1, ..., A'_n ∈ R^k such that, with high probability,

∃x ∈ X  b = ∑_{i≤n} x_i A_i   (high dimensional)   iff   ∃x ∈ X  b' = ∑_{i≤n} x_i A'_i   (low dimensional)

If this is possible, then solve RLM_X(b', A'). Since k ≪ m, solving RLM_X(b', A') should be faster, and RLM_X(b', A') = RLM_X(b, A) with high probability. Nothing short of a miracle!

Losing dimensions = projection

In the plane, hopeless [figure: projections of a point cloud onto two lines]. In 3D: no better

Concentration of measure and the Johnson-Lindenstrauss Lemma

Summary

[Figure: 90% of the mass of the sphere concentrates in an ever-thinner band as the dimension grows (n = 3, 11, 101); diagram of the image Tu of a point on S^{m−1}]

Thm. Let T be a k×m rectangular matrix with each component sampled from N(0, 1/k), and u ∈ R^m s.t. ‖u‖ = 1. Then E(‖Tu‖²) = 1
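A quick numerical sanity check of this theorem (a hypothetical demo, not part of the talk): sample many Gaussian matrices T with entries N(0, 1/k) and average ‖Tu‖² over a fixed unit vector u.

```python
import numpy as np

rng = np.random.default_rng(1)
m, k, trials = 500, 20, 10000
u = rng.normal(size=m)
u /= np.linalg.norm(u)                       # unit vector in R^m

# estimate E(||Tu||^2) over random T with i.i.d. N(0, 1/k) entries
vals = []
for _ in range(trials):
    T = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))
    vals.append(np.linalg.norm(T @ u) ** 2)
print("mean of ||Tu||^2:", np.mean(vals))    # close to 1
```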

Controlling the distortion of Tu

Let B be a set of unit vectors; by concentration of measure,

∀u ∈ B  ∃D, const > 0   P(1−t ≤ ‖Tu‖ ≤ 1+t) ≥ 1 − D e^{−const·t²}

Union bound over B:

P(∀u ∈ B (1−t ≤ ‖Tu‖ ≤ 1+t)) ≥ 1 − |B| D e^{−const·t²}

We want nonzero probability: |B| D e^{−const·t²} < 1. Set const·t² = ε²k (hiding lotta details here):

|B| D e^{−ε²k} < 1   ⟹   k > ε^{−2} (ln|B| + other const)   ⟹   ∃ const C s.t. k > Cε^{−2} ln|B|
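To get a feel for the numbers, here is that bound evaluated for a few set sizes; the function name is mine, and the constant C = 1.8 is the one quoted on the JLL-properties slide below, used here only for illustration.

```python
import math

def jll_dim(n_points: int, eps: float, C: float = 1.8) -> int:
    """Smallest k with k > C * eps^-2 * ln(n_points)."""
    return math.ceil(C * eps ** -2 * math.log(n_points))

for n in (10**3, 10**6, 10**9):
    print(n, jll_dim(n, eps=0.1), jll_dim(n, eps=0.2))
```

Note that k depends only on the number of points and ε, never on the ambient dimension m.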

Application to Euclidean distances

Let A ⊆ R^m with |A| = n. For all x ≠ y ∈ A we have

‖Tx − Ty‖² = ‖T(x−y)‖² = ‖x−y‖² ‖T((x−y)/‖x−y‖)‖² = ‖x−y‖² ‖Tu‖²

where ‖u‖ = 1, so by the previous results E(‖Tx − Ty‖²) = ‖x−y‖². Also, T has low distortion with high probability. Since |{x−y | x, y ∈ A}| is O(n²), we can choose k > Cε^{−2} ln(n) (the constant absorbs the square).

Thm. [Johnson-Lindenstrauss lemma] ∀ε ∈ (0,1) ∃ a k×m matrix T such that

∀x, y ∈ A   (1−ε) ‖x−y‖² ≤ ‖Tx − Ty‖² ≤ (1+ε) ‖x−y‖²   (∗)

Proof. We showed T has nonzero probability of existing, so ∃T such that (∗) holds
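The lemma is easy to test empirically. This hypothetical sketch projects n random points from R^m down to R^k, with k chosen by the bound above, and reports the worst pairwise squared-distance distortion.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
m, n, eps = 1000, 50, 0.2
k = int(np.ceil(1.8 * eps ** -2 * np.log(n)))    # k independent of m

A = rng.normal(size=(n, m))                      # n points in R^m
T = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))
P = A @ T.T                                      # projected points in R^k

ratios = [np.linalg.norm(P[i] - P[j]) ** 2 / np.linalg.norm(A[i] - A[j]) ** 2
          for i, j in combinations(range(n), 2)]
print(f"k = {k}; squared-distance ratios in [{min(ratios):.3f}, {max(ratios):.3f}]")
```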

JLL properties
- ∀ε > 0, k ≥ 1.8 ε^{−2} ln n suffices [Venkatasubramanian & Wang 2011]
- It can be shown that T must be sampled at worst O(n) times to ensure low distortion for all x, y ∈ A
- But on average ‖Tx − Ty‖ = ‖x − y‖, and the probability of large distortion decreases exponentially; in most cases, you need only sample once
- We only need a logarithmic number of dimensions as a function of the number of points
- Surprising fact: k is independent of the original number of dimensions m

Many possible random projectors for JLL
- orthogonal projection onto a random k-dimensional linear subspace of R^m (used in the original proof of the JLL) [Johnson & Lindenstrauss, 1984]
- random k×m matrices with entries drawn from N(0, 1/k) (modern treatments of the JLL), e.g. [Dasgupta & Gupta, 2003]
- random k×m matrices with entries in {−1, 1}, each with probability 1/2; and random k×m matrices with entries 1 and −1 each with probability 1/6, and 0 with probability 2/3 (used for sparse data sets) [Achlioptas 2003]
- sparser projectors are possible [Dasgupta et al., 2010; Kane & Nelson 2010; Allen-Zhu et al., 2014]
- also faster, but dense! [Liberty, 2009]

Constructors for the first families are sketched below.
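For concreteness, here are hypothetical constructors for three of the families above (Gaussian, ±1, and the sparse Achlioptas projector). Scaling conventions vary across papers; the 1/√k-style factors below are one common choice that makes E(‖Tu‖²) = ‖u‖², not the only one.

```python
import numpy as np

def gaussian_projector(k, m, rng):
    """Entries i.i.d. N(0, 1/k)."""
    return rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))

def sign_projector(k, m, rng):
    """Entries +1/-1 with probability 1/2 each, scaled by 1/sqrt(k)."""
    return rng.choice([-1.0, 1.0], size=(k, m)) / np.sqrt(k)

def achlioptas_projector(k, m, rng):
    """Entries +1, -1 with probability 1/6 each, 0 with probability 2/3,
    scaled by sqrt(3/k) so each entry has variance 1/k."""
    vals = rng.choice([-1.0, 0.0, 1.0], size=(k, m), p=[1/6, 2/3, 1/6])
    return vals * np.sqrt(3.0 / k)
```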

Applying the JLL to the RLM

The main theorem

Thm. Let T : R^m → R^k be a JLL random projection, and let b, A_1, ..., A_n ∈ R^m and X ⊆ R^n define an RLM instance. For any given vector x ∈ R^n, we have:
(i) if b = ∑_{i=1}^n x_i A_i, then Tb = ∑_{i=1}^n x_i TA_i;
(ii) if b ≠ ∑_{i=1}^n x_i A_i, then P(Tb ≠ ∑_{i=1}^n x_i TA_i) ≥ 1 − 2e^{−Ck};
(iii) if b ≠ ∑_{i=1}^n y_i A_i for all y ∈ X ⊆ R^n, where X is finite, then P(∀y ∈ X  Tb ≠ ∑_{i=1}^n y_i TA_i) ≥ 1 − 2|X| e^{−Ck};
for some constant C > 0 (independent of n, k). [VPL, arXiv:1507.00990v1/math.OC]
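A hypothetical numerical illustration of (i) and (ii): an exact linear combination survives projection exactly (by linearity), while a non-combination remains a non-combination after projection, here checked at machine precision.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, k = 300, 40, 30
A = rng.normal(size=(m, n))
T = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))
x = rng.normal(size=n)

b_eq = A @ x                          # case (i): b IS the combination
b_neq = A @ x + rng.normal(size=m)    # case (ii): b is NOT (a.s.)

print(np.allclose(T @ b_eq, (T @ A) @ x))    # True, by linearity
print(np.allclose(T @ b_neq, (T @ A) @ x))   # False with high probability
```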

Proof (i)

Since T is a matrix, (i) follows by linearity

Proof (ii)

Cor. ∀ε ∈ (0,1) and z ∈ R^m, there is a constant C such that
P((1−ε) ‖z‖ ≤ ‖Tz‖ ≤ (1+ε) ‖z‖) ≥ 1 − 2e^{−Cε²k}
Proof. By the JLL.

Lemma. If z ≠ 0, there is a constant C such that P(Tz ≠ 0) ≥ 1 − 2e^{−Ck}
Proof. Consider the events A : Tz ≠ 0 and B : (1−ε) ‖z‖ ≤ ‖Tz‖ ≤ (1+ε) ‖z‖. Then A^c ∩ B = ∅, otherwise Tz = 0 and (1−ε) ‖z‖ ≤ ‖Tz‖ = 0 would give z = 0, a contradiction. Hence B ⊆ A, so P(A) ≥ P(B) ≥ 1 − 2e^{−Cε²k} by the Corollary. This holds ∀ε ∈ (0,1), hence the result.

Now it suffices to apply the Lemma to z = Ax − b

Proof (iii)

By the union bound applied to (ii):

P(∀y ∈ X  Tb ≠ ∑_{i=1}^n y_i TA_i) = P(⋂_{y∈X} {Tb ≠ ∑_{i=1}^n y_i TA_i})
= 1 − P(⋃_{y∈X} {Tb ≠ ∑_{i=1}^n y_i TA_i}^c)
≥ 1 − ∑_{y∈X} P({Tb ≠ ∑_{i=1}^n y_i TA_i}^c)
≥ 1 − ∑_{y∈X} 2e^{−Ck}   [by (ii)]
= 1 − 2|X| e^{−Ck}

Consequences of the main theorem
- (i) and (ii) allow us to lose dimensions when checking certificates: given x, with high probability b = ∑_i x_i A_i ⟺ Tb = ∑_i x_i TA_i (see the sketch below)
- (iii) allows us to lose dimensions when solving the IFP whenever X is polynomially bounded, e.g. the knapsack set {x ∈ {0,1}^n | ∑_{i≤n} α_i x_i ≤ d} for fixed d and α > 0
- (iii) also hints at why the LFP is not so easy: LFP ≡ Cone Membership, and Tb ∈ cone(TA_1, ..., TA_n) can be written as ∃x ∈ R^n_+ (Tb = ∑_{i≤n} x_i TA_i), where |X| = |R^n_+| = 2^{ℵ₀}, certainly not polynomially bounded!
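To make the certificate-checking use concrete, the hypothetical sketch below verifies a claimed IFP certificate in the projected space only: by (i)/(ii), a true certificate always passes, and a false one fails with probability at least 1 − 2e^{−Ck}. The tolerance and instance are assumptions for the demo.

```python
import numpy as np

def check_certificate(T, A, b, x, tol=1e-9):
    """Accept x iff T b = T A x (a k-dimensional check instead of m)."""
    return np.linalg.norm(T @ b - (T @ A) @ x) <= tol

rng = np.random.default_rng(4)
m, n, k = 500, 60, 40
A = rng.integers(0, 10, size=(m, n)).astype(float)
x_true = rng.integers(0, 2, size=n).astype(float)   # a 0/1 certificate
b = A @ x_true
T = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))

x_false = x_true.copy(); x_false[0] = 1.0 - x_false[0]
print(check_certificate(T, A, b, x_true))    # True
print(check_certificate(T, A, b, x_false))   # False with high probability
```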

What to do when X is superpolynomial

Separating hyperplanes
- Project separating hyperplanes instead
- Convex C ⊆ R^m, x ∉ C: then ∃ a hyperplane c separating x and C
- In particular, this is true if C = cone(A_1, ..., A_n) for A ⊆ R^m
- We aim to show x ∉ C ⟹ Tx ∉ TC with high probability
- As above, if x ∈ C then Tx ∈ TC by linearity of T; the real issue is proving the converse

Projecting the separation

Thm. Given c, b, A_1, ..., A_n ∈ R^m of unit norm s.t. b ∉ cone{A_1, ..., A_n} (pointed), ε > 0, c ∈ R^m s.t. c·b < −ε and c·A_i ≥ ε (i ≤ n), and T a random projector:
P[Tb ∉ cone{TA_1, ..., TA_n}] ≥ 1 − 4(n+1) e^{−C(ε²−ε³)k}
for some constant C.

Proof. Let A be the event that T approximately preserves ‖c − χ‖² and ‖c + χ‖² for all χ ∈ {b, A_1, ..., A_n}. Since A consists of 2(n+1) events, by the JLL Corollary (squared version) and the union bound, we get
P(A) ≥ 1 − 4(n+1) e^{−C(ε²−ε³)k}
Now consider χ = b:
⟨Tc, Tb⟩ = (1/4)(‖T(c+b)‖² − ‖T(c−b)‖²) ≤ (1/4)(‖c+b‖² − ‖c−b‖²) + (ε/4)(‖c+b‖² + ‖c−b‖²) = c·b + ε < 0
and similarly ⟨Tc, TA_i⟩ ≥ 0. [VPL, arXiv:1507.00990v1/math.OC]
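A hypothetical numerical check of this argument: build a pointed cone and a point separated from it with margin ε by a known hyperplane c, project everything, and verify that Tc still separates. The instance is engineered so that c = e_1 separates by construction; everything here is an assumption for the demo.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, k = 200, 30, 60

def unit_with_first(first, m, rng):
    """Unit vector in R^m whose first coordinate equals `first`."""
    v = rng.normal(size=m - 1)
    v *= np.sqrt(1.0 - first ** 2) / np.linalg.norm(v)
    return np.concatenate(([first], v))

c = np.zeros(m); c[0] = 1.0                          # separator: c.b = -0.5, c.A_i = 0.5
A = np.stack([unit_with_first(0.5, m, rng) for _ in range(n)])
b = unit_with_first(-0.5, m, rng)

T = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))
Tc, Tb, TA = T @ c, T @ b, A @ T.T
print("Tc . Tb     =", Tc @ Tb)                      # < 0 with high probability
print("min Tc.TA_i =", (TA @ Tc).min())              # >= 0 with high probability
```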

Consequences of projecting separations
- The result allows solving the LFP by random projection, hence, by bisection, also LP
- The probability of being correct depends on ε (the larger the better)
- The largest ε is given by max{ε ≥ 0 | c·b ≤ −ε ∧ ∀i ≤ n (c·A_i ≥ ε)}, itself an LP, so this does not help us much
- If cone(A_1, ..., A_n) is almost non-pointed, ε is very small at best

Improving ε
- We can slightly worsen the decrease rate of P(Tb ∉ cone(TA)) in exchange for a better requirement on ε
- The proof is based on showing that the minimum distance to a cone is also approximately preserved by T
- This version also works with non-pointed cones

When X is large, infinite or uncountable

A surprising result

Thm. Given p ∈ R^m, X ⊆ R^m at most countable s.t. p ∉ X, and T : R^m → R^k a random projector for any k ≥ 1, then P(Tp ∉ TX) = 1.

Proof. We have:
P(Tp ∉ TX) = P(⋂_{x∈X} {Tp ≠ Tx}) = P(⋂_{x∈X} {T(p−x) ≠ 0})
Note that: (a) p ∉ X ⟹ ∀x ∈ X (p − x ≠ 0); (b) T has full rank k with probability 1. Hence ∀x ∈ X  P(T(p−x) ≠ 0) = 1, and the intersection of countably many almost sure events is almost sure.

Surprising since k = 1 suffices: solve the IFP on a line! [VPL, arXiv:1509.00630v1/math.OC]
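A hypothetical illustration of the k = 1 case: project a point p and a finite sample of a countable set X onto the real line. In exact arithmetic Tp never lands in TX; in floating point the projected values are still distinct, though possibly terrifyingly close, which is exactly the practical problem the next slide raises.

```python
import numpy as np

rng = np.random.default_rng(6)
m = 50
p = rng.normal(size=m)                                  # p not in X (a.s.)
X = rng.integers(-5, 6, size=(1000, m)).astype(float)   # sample of a countable set

T = rng.normal(size=(1, m))                             # k = 1: project to a line!
tp, tX = (T @ p)[0], (X @ T.T).ravel()
gap = np.min(np.abs(tX - tp))
print("Tp in TX:", gap == 0.0, "| smallest gap:", gap)
```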

Consequences
- In theory, solve any IFP by random projection to only one constraint!
- In practice, it doesn't work. Assume X = {x ∈ Z^n_+ | Ax = b}. If TAx is extremely close to Tb, a floating point comparison might report TAx = Tb even though Ax ≠ b
- You would need to compute using actual real numbers, rather than floating point

Enforcing a projected minimal distance

Instead of Tp ∉ TX, require min_{x∈X} ‖Tp − Tx‖ > τ for some τ ≥ 0.

Thm. Given τ, δ > 0 and p ∉ X ⊆ R^m where X is finite, d = min_{x∈X} ‖p − x‖ > 0, and T : R^m → R^k a Gaussian random projector with k ≥ log(|X|/δ) / log(d/τ),
P(min_{x∈X} ‖Tp − Tx‖ > τ) > 1 − δ.

If p, X are not integral, d can get very small and yield more floating point approximation errors; otherwise d ≥ 1
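The hypothetical check below measures the projected minimal distance directly: it compares min_x ‖Tp − Tx‖ against a threshold τ taken well below d, which is the regime the theorem addresses. The instance and the choice τ = d/4 are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(7)
m, k = 60, 25
p = rng.normal(size=m)
X = p + rng.normal(size=(200, m))            # finite X, p not in X (a.s.)
d = np.min(np.linalg.norm(X - p, axis=1))    # d = min distance, here > 0

T = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))
proj_gap = np.min(np.linalg.norm((X - p) @ T.T, axis=1))
tau = d / 4                                  # threshold well below d
print(f"d = {d:.3f}, min ||Tp - Tx|| = {proj_gap:.3f}, exceeds tau: {proj_gap > tau}")
```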

What if X is infinite?
- Let X = {Ax | x ∈ Z^n_+}, where A ∈ Z^{m×n} has at least one positive row
- For any b ∈ Z^m, the problem b ∈ X is equivalent, with high probability, to its projection to an O(log n)-dimensional space
- The idea is to separate the positive row and apply random projection to the others
- Not many real problems have a strictly positive row

What if there is no positive row?

λ_X = doubling dimension of X: the doubling dimension of a metric space X is the smallest positive integer λ_X such that every ball in X can be covered by 2^{λ_X} balls of half the radius. My own interpretation: it attaches a notion of dimension to any metric space.

Thm. Given p ∉ X ⊆ R^m, let T : R^m → R^k be a random projector with k ≥ C log₂(λ_X), where C is a constant. Then P(T(p) ∉ T(X)) can be made arbitrarily close to 1.

Proof sketch: cover X by balls, argue for each ball along the lines of projecting minimal distances above, and use the union bound on the ball cover. The doubling dimension λ_X turns up as a coefficient of a decreasing exponential in a probability bound, which makes it possible to derive the high probability guarantee. [VPL, arXiv:1509.00630v1/math.OC]

Computational results, a.k.a. the bad news

The dream

Project the m rows of Ax = b to k ≪ m rows TAx = Tb and solve the projected LP/IP:
1. Assumption: solving TAx = Tb, x ≥ 0 yields x' s.t. Ax' = b
2. Trust the method: solve the netlib and miplib (IP) instances, make sure the results are good
3. Showdown! Pick a huge, bad-ass LP/IP encoding some world-saving features and solve it!

Solving the netlib
- The assumption is verified on almost every instance!
- However, we need k ∈ O(m), say k = m/10 or so
- Most netlib instances are too small
- Now TA is the same size but denser than A: slow

Some results on uniform dense LPs
- The matrix product TA takes too long; we do not count it, since it can be streamlined and parallelized, and faster JL transforms can be used
- Infeasible instances (sizes from 1000×1500 to 2000×2400):

Uniform   ε      k       CPU saving   accuracy
(−1, 1)   0.1    0.5m    30%          50%
(−1, 1)   0.15   0.25m   92%          0%
(−1, 1)   0.2    0.12m   99.2%        0%
(0, 1)    0.1    0.5m    10%          100%
(0, 1)    0.15   0.25m   90%          100%
(0, 1)    0.2    0.12m   97%          100%

- Feasible instances: similar CPU, 100% accuracy
- No solution x' of TAx = Tb, x ≥ 0 satisfies Ax' = b; we proved this happens almost surely
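The last point is easy to reproduce in a hypothetical experiment: solve the projected feasibility LP and check the residual of the original system. The projected solution typically satisfies TAx = Tb to machine precision while violating Ax = b badly; the instance below is an assumption for the demo.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(8)
m, n = 300, 450
A = rng.uniform(0.0, 1.0, size=(m, n))
b = A @ rng.uniform(0.0, 1.0, size=n)            # feasible instance
k = m // 4
T = rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))

res = linprog(np.zeros(n), A_eq=T @ A, b_eq=T @ b, bounds=[(0, None)] * n)
x = res.x
print("projected residual:", np.linalg.norm(T @ A @ x - T @ b))  # ~ 0
print("original residual: ", np.linalg.norm(A @ x - b))          # typically large
```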

Some results on IP
- A similar story as for LPs
- But: every solution x' of TAx = Tb, x ∈ Z^n_+ satisfies Ax' = b; we proved this also happens almost surely

Issues
- LFP projection only ever preserves feasibility, not the certificate; IFP projection preserves feasibility and certificate
- Paradox: IPs are solved by Branch-and-Bound, which only solves LP relaxations
- Orthants occupy less volume than whole spaces, so projection works better for Uniform(0, 1)
- Low CPU savings and accuracy: we are getting killed by the constant C in k ≥ Cε^{−2} ln n
- Bet: this method will start paying off at really large m, n

Some references
- K. Vu, P.-L. Poirion, L. Liberti, Using the Johnson-Lindenstrauss lemma in linear and integer programming, arXiv report 1507.00990v1/math.OC, 2015
- K. Vu, P.-L. Poirion, L. Liberti, Gaussian random projections for Euclidean membership problems, arXiv report 1509.00630v1/math.OC, 2015
- W. Johnson, J. Lindenstrauss, Extensions of Lipschitz mappings into a Hilbert space, Contemporary Mathematics, Vol. 26, 189-206, 1984
- S. Dasgupta, A. Gupta, An elementary proof of a theorem of Johnson and Lindenstrauss, Random Structures and Algorithms, 22(1):60-65, 2003
- S. Venkatasubramanian, Q. Wang, The Johnson-Lindenstrauss transform: an empirical study, Proceedings of ALENEX, 2011
- J. Nash, C¹ isometric imbeddings, Annals of Mathematics, 60(3):383-396, 1954