Randomized Algorithms


Randomized Algorithms. Nanjing University, 尹一通 (Yitong Yin).

Martingales. Definition: A sequence of random variables $X_0, X_1, \dots$ is a martingale if for all $i > 0$,
$$E[X_i \mid X_0, \dots, X_{i-1}] = X_{i-1},$$
i.e. for all $x_0, x_1, \dots, x_{i-1}$,
$$E[X_i \mid X_0 = x_0, X_1 = x_1, \dots, X_{i-1} = x_{i-1}] = x_{i-1}.$$

Generalization. Definition: $Y_0, Y_1, \dots$ is a martingale with respect to $X_0, X_1, \dots$ if, for all $i \ge 0$, $Y_i$ is a function of $X_0, X_1, \dots, X_i$ and $E[Y_{i+1} \mid X_0, \dots, X_i] = Y_i$.

Azuma's Inequality (general version): Let $Y_0, Y_1, \dots$ be a martingale with respect to $X_0, X_1, \dots$ such that, for all $k \ge 1$, $|Y_k - Y_{k-1}| \le c_k$. Then
$$\Pr\bigl[|Y_n - Y_0| \ge t\bigr] \le 2\exp\left(-\frac{t^2}{2\sum_{k=1}^{n} c_k^2}\right).$$
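
As a quick numerical sanity check (my own sketch, not part of the lecture), take the simplest bounded-difference martingale, a $\pm 1$ random walk with $Y_0 = 0$ and $c_k = 1$, and compare its empirical tail with the Azuma bound $2e^{-t^2/(2n)}$:

```python
import numpy as np

# Minimal check of Azuma's inequality for the simplest bounded-difference martingale:
# Y_n = sum of n independent uniform +/-1 steps, so Y_0 = 0 and |Y_k - Y_{k-1}| <= c_k = 1,
# giving Pr[|Y_n - Y_0| >= t] <= 2 * exp(-t^2 / (2n)).
rng = np.random.default_rng(0)
n, trials, t = 100, 100_000, 25
steps = rng.choice([-1, 1], size=(trials, n))
Y_n = steps.sum(axis=1)
empirical = np.mean(np.abs(Y_n) >= t)
azuma = 2 * np.exp(-t**2 / (2 * n))
print(f"empirical tail Pr[|Y_n| >= {t}] = {empirical:.4f}; Azuma bound = {azuma:.4f}")
```

The empirical tail comes out well below the bound; Azuma trades tightness for generality.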

Doob Sequence. Definition (Doob sequence): The Doob sequence of a function $f$ with respect to a sequence $X_1, \dots, X_n$ is
$$Y_i = E[f(X_1,\dots,X_n) \mid X_1,\dots,X_i], \quad 0 \le i \le n,$$
so that $Y_0 = E[f(X_1,\dots,X_n)]$ and $Y_n = f(X_1,\dots,X_n)$.

Doob Martingale

Doob sequence: $Y_i = E[f(X_1,\dots,X_n) \mid X_1,\dots,X_i]$. The Doob sequence is a martingale: $E[Y_i \mid X_1,\dots,X_{i-1}] = Y_{i-1}$.
Proof: by the tower property of conditional expectation,
$$E[Y_i \mid X_1,\dots,X_{i-1}] = E\bigl[E[f(X_1,\dots,X_n) \mid X_1,\dots,X_i] \,\big|\, X_1,\dots,X_{i-1}\bigr] = E[f(X_1,\dots,X_n) \mid X_1,\dots,X_{i-1}] = Y_{i-1}.$$

The Power of Doob + Azuma. For a function $f(X_1,\dots,X_n)$ of (possibly dependent) random variables, form the Doob martingale $Y_i = E[f(X_1,\dots,X_n) \mid X_1,\dots,X_i]$, with $Y_0 = E[f(X_1,\dots,X_n)]$ and $Y_n = f(X_1,\dots,X_n)$. If the differences $|Y_i - Y_{i-1}|$ are bounded, Azuma's inequality says $|Y_n - Y_0|$ is small w.h.p., i.e. $f(X_1,\dots,X_n)$ is tightly concentrated around its mean.
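
To make the Doob construction concrete, here is a minimal sketch (my own illustration, not from the slides) using a function whose conditional expectations have a closed form: for $f(X_1,\dots,X_n) = X_1 + \dots + X_n$ with i.i.d. Bernoulli($p$) variables, $Y_i = X_1 + \dots + X_i + (n-i)p$, so the increments $Y_i - Y_{i-1} = X_i - p$ have mean zero and are bounded by $\max(p, 1-p)$:

```python
import numpy as np

# Doob martingale of f(X_1,...,X_n) = X_1 + ... + X_n for i.i.d. Bernoulli(p) variables.
# Here the conditional expectations have a closed form:
#   Y_i = E[f | X_1,...,X_i] = (X_1 + ... + X_i) + (n - i) * p.
rng = np.random.default_rng(1)
n, p, trials = 20, 0.3, 200_000
X = (rng.random((trials, n)) < p).astype(float)
prefix = np.cumsum(X, axis=1)
Y = np.hstack([np.full((trials, 1), n * p),              # Y_0 = E[f] = n*p
               prefix + (n - np.arange(1, n + 1)) * p])  # Y_1, ..., Y_n

diffs = np.diff(Y, axis=1)                               # increments Y_i - Y_{i-1} = X_i - p
print("mean increment (should be ~0):", diffs.mean())
print("max |increment| (should be <= max(p, 1-p) = 0.7):", np.abs(diffs).max())
print("Y_n equals f:", np.allclose(Y[:, -1], X.sum(axis=1)))
```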

The method of bounded differences: Let $X = (X_1,\dots,X_n)$ and let $f$ be a function of $X_1,\dots,X_n$ satisfying, for all $1 \le i \le n$,
$$\bigl|E[f(X) \mid X_1,\dots,X_i] - E[f(X) \mid X_1,\dots,X_{i-1}]\bigr| \le c_i.$$
Then
$$\Pr\bigl[|f(X) - E[f(X)]| \ge t\bigr] \le 2\exp\left(-\frac{t^2}{2\sum_{i=1}^{n} c_i^2}\right).$$
This is exactly Azuma's inequality applied to the Doob martingale $Y_i = E[f(X) \mid X_1,\dots,X_i]$: the hypothesis says $|Y_i - Y_{i-1}| \le c_i$, and the conclusion bounds $|Y_n - Y_0| = |f(X) - E[f(X)]|$.

G(n, p) chromatic number. Let $\chi(G)$ be the smallest number of colors needed to properly color $G$. Expose $G \sim G(n,p)$ vertex by vertex: $X_i$ is the subgraph of $G$ induced by the first $i$ vertices, and $Y_i = E[\chi(G) \mid X_1,\dots,X_i]$. This Doob martingale $Y_0, Y_1, \dots, Y_n$ has $Y_0 = E[\chi(G)]$ and $Y_n = \chi(G)$.

Observation: a vertex can always be given a new color, so changing the edges between one vertex and the earlier vertices changes $\chi(G)$ by at most 1; hence $|Y_i - Y_{i-1}| \le 1$. By Azuma,
$$\Pr\bigl[|\chi(G) - E[\chi(G)]| \ge t\sqrt{n}\bigr] = \Pr\bigl[|Y_n - Y_0| \ge t\sqrt{n}\bigr] \le 2e^{-t^2/2}.$$

Tight Concentration of Chromatic Number. Theorem [Shamir and Spencer 1987]: Let $G \sim G(n, p)$. Then
$$\Pr\bigl[|\chi(G) - E[\chi(G)]| \ge t\sqrt{n}\bigr] \le 2e^{-t^2/2}.$$

Hoeffding's Inequality: Let $X = \sum_{i=1}^{n} X_i$, where $X_1,\dots,X_n$ are independent random variables with $a_i \le X_i \le b_i$. Then
$$\Pr\bigl[|X - E[X]| \ge t\bigr] \le 2\exp\left(-\frac{t^2}{2\sum_{i=1}^{n} (b_i - a_i)^2}\right).$$
Proof: $X$ is a function (the sum) of $X_1,\dots,X_n$. Take the Doob martingale $Y_i = E[X \mid X_1,\dots,X_i]$; then $|Y_i - Y_{i-1}| = |X_i - E[X_i]| \le b_i - a_i$, and Azuma's inequality gives the bound.
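
A minimal empirical check (my own sketch, not from the slides) of this Azuma-derived form of the bound, taking $X_i \sim \mathrm{Uniform}[0,1]$ so that $a_i = 0$ and $b_i = 1$:

```python
import numpy as np

# Empirical check of the bound above for X = sum of n i.i.d. Uniform[0,1] variables,
# so a_i = 0, b_i = 1 and sum (b_i - a_i)^2 = n.
rng = np.random.default_rng(2)
n, trials = 50, 200_000
X = rng.random((trials, n)).sum(axis=1)
for t in (5.0, 10.0, 15.0):
    empirical = np.mean(np.abs(X - n / 2) >= t)
    bound = min(1.0, 2 * np.exp(-t**2 / (2 * n)))
    print(f"t = {t:4.1f}: empirical {empirical:.5f}   bound {bound:.5f}")
```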

The averaged condition
$$\bigl|E[f(X) \mid X_1,\dots,X_i] - E[f(X) \mid X_1,\dots,X_{i-1}]\bigr| \le c_i$$
is often hard to check directly. A more convenient, worst-case condition:

Lipschitz Condition: $f(x_1,\dots,x_n)$ satisfies the Lipschitz condition with constants $c_i$, $1 \le i \le n$, if for every $i$ and every $y_i$,
$$\bigl|f(x_1,\dots,x_{i-1}, x_i, x_{i+1},\dots,x_n) - f(x_1,\dots,x_{i-1}, y_i, x_{i+1},\dots,x_n)\bigr| \le c_i.$$

The averaged condition is an average-case requirement on conditional expectations; the Lipschitz condition is the worst-case requirement that changing any single coordinate changes $f$ by at most $c_i$.

The method of bounded differences (Lipschitz version): Let $X = (X_1,\dots,X_n)$ be $n$ independent random variables and let $f$ satisfy the Lipschitz condition with constants $c_i$, $1 \le i \le n$. Then
$$\Pr\bigl[|f(X) - E[f(X)]| \ge t\bigr] \le 2\exp\left(-\frac{t^2}{2\sum_{i=1}^{n} c_i^2}\right).$$
Proof: the Lipschitz condition together with independence implies the bounded averaged-differences condition, so the previous theorem applies.

Applications of the Method. For a function $f$ of independent random variables $X_1, X_2, \dots, X_n$: if $f$ satisfies the Lipschitz condition, i.e. changing any single variable changes the value of $f$ only a little, then $f(X_1, X_2, \dots, X_n)$ is tightly concentrated around its mean.

Occupancy Problem. Throw $m$ balls into $n$ bins uniformly and independently; how many bins are empty? Let $X_i = 1$ if bin $i$ is empty and $X_i = 0$ otherwise. The number of empty bins is $X = \sum_{i=1}^{n} X_i$, with
$$E[X] = \sum_{i=1}^{n} E[X_i] = n\left(1 - \frac{1}{n}\right)^m.$$
Deviation: what is $\Pr[|X - E[X]| \ge t]$? The $X_i$ are dependent.

Instead, let $Y_j$ be the bin chosen by ball $j$; the $Y_j$ are independent, and
$$X = f(Y_1,\dots,Y_m) = \bigl|[n] \setminus \{Y_1,\dots,Y_m\}\bigr|.$$
$f$ is Lipschitz with constants 1: moving any single ball $Y_j$ changes the number of empty bins by at most 1. The method of bounded differences gives
$$\Pr\bigl[|X - E[X]| \ge t\sqrt{m}\bigr] \le 2e^{-t^2/2}.$$
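
A minimal simulation (my own sketch) of the balls-into-bins experiment, comparing the sample mean of the number of empty bins with $n(1-1/n)^m$ and its spread with the $\sqrt{m}$ scale appearing in the bound:

```python
import numpy as np

# m balls into n bins: X = number of empty bins.
# E[X] = n * (1 - 1/n)^m; the method of bounded differences gives fluctuations
# of order at most sqrt(m) with exponentially small tail probability.
rng = np.random.default_rng(3)
n, m, trials = 100, 300, 20_000
bins = rng.integers(0, n, size=(trials, m))              # Y_j = bin of ball j (independent)
empties = np.array([n - np.unique(row).size for row in bins])
mean_theory = n * (1 - 1 / n) ** m
print(f"E[X] = {mean_theory:.2f}, sample mean = {empties.mean():.2f}, sample std = {empties.std():.2f}")
print(f"bounded-differences scale sqrt(m) = {np.sqrt(m):.1f} (the bound is conservative here)")
```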

Pattern Matching. Take a uniformly random string of length $n$ over an alphabet $\Sigma$ with $|\Sigma| = m$, and a fixed pattern of length $k$; how many substrings match the pattern? Let $X_1,\dots,X_n$ be the characters, uniform and independent, and let $Y$ be the number of occurrences of the pattern as a substring of $(X_1,\dots,X_n)$. Then
$$E[Y] = (n - k + 1)\left(\frac{1}{m}\right)^{k}.$$
What about the deviation?

Here $Y = f(X_1,\dots,X_n)$, and changing any single character $X_i$ changes $f$ by at most $k$ (only the at most $k$ windows containing position $i$ are affected). The method of bounded differences gives
$$\Pr\bigl[|Y - E[Y]| \ge tk\sqrt{n}\bigr] \le 2e^{-t^2/2}.$$
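
A rough simulation (my own sketch) of this setup: sample random binary strings, count occurrences of a fixed length-3 pattern, and compare the sample mean and spread with $E[Y]$ and the $k\sqrt{n}$ scale from the bound:

```python
import numpy as np

# Random string of length n over an alphabet of size m; Y = number of occurrences
# of a fixed pattern of length k.  E[Y] = (n - k + 1) / m^k, and changing one
# character changes Y by at most k, so deviations are O(k * sqrt(n)) w.h.p.
rng = np.random.default_rng(4)
n, m, k, trials = 1000, 2, 3, 5_000
pattern = rng.integers(0, m, size=k)
strings = rng.integers(0, m, size=(trials, n))
windows = np.lib.stride_tricks.sliding_window_view(strings, k, axis=1)  # (trials, n-k+1, k)
Y = (windows == pattern).all(axis=2).sum(axis=1)
print(f"E[Y] = {(n - k + 1) / m**k:.1f}, sample mean = {Y.mean():.1f}, sample std = {Y.std():.1f}")
print(f"bounded-differences scale k*sqrt(n) = {k * np.sqrt(n):.1f}")
```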

Metric Embedding. An embedding is a map $\phi : (X, d_X) \to (Y, d_Y)$ between metric spaces. It has low distortion if, for some small $\alpha \ge 1$ and all $x_1, x_2 \in X$,
$$\frac{1}{\alpha}\, d_X(x_1, x_2) \;\le\; d_Y\bigl(\phi(x_1), \phi(x_2)\bigr) \;\le\; d_X(x_1, x_2).$$

Why Embedding? Some problems are easier to solve in simple metrics (e.g. TSP is easy on trees), and the curse of dimensionality, the explosion of volume with dimension, hurts proximity search and learning. So we embed a graph metric into a tree metric, and a high-dimensional point set into a low-dimensional one. (The slide shows a four-point graph metric approximated by a tree metric, and a high-dimensional point cloud mapped to low dimension.)

Dimension Reduction. In Euclidean space it is always possible to embed a set of $n$ points in arbitrary dimension into $O(\log n)$ dimensions with constant distortion.

Johnson-Lindenstrauss Theorem: For any $0 < \epsilon < 1$ and any set $V$ of $n$ points in $\mathbb{R}^d$, there is a map $\phi : \mathbb{R}^d \to \mathbb{R}^k$ with $k = O(\ln n)$ (the constant depends on $\epsilon$; in general $k = O(\epsilon^{-2}\ln n)$ suffices) such that for all $u, v \in V$,
$$(1-\epsilon)\,\|u-v\|^2 \;\le\; \|\phi(u)-\phi(v)\|^2 \;\le\; (1+\epsilon)\,\|u-v\|^2.$$
Moreover $\phi$ can be taken to be linear, $\phi(v) = Av$, where $A$ is a random projection matrix.

Random Projection. Choices of the random $k \times d$ matrix $A$: projection onto a uniformly random $k$-dimensional subspace (Johnson-Lindenstrauss; Dasgupta-Gupta), where the rows $A_1, A_2, \dots, A_k$ are random orthonormal unit vectors in $\mathbb{R}^d$; i.i.d. Gaussian entries (Indyk-Motwani); or i.i.d. $\pm 1$ entries (Achlioptas).
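
Here is a small hedged sketch of the second construction listed above (i.i.d. Gaussian entries, scaled by $1/\sqrt{k}$), checking the pairwise distortion on a random point set; the constant 8 in the choice of $k$ is my own illustrative value, not one from the lecture:

```python
import numpy as np

# Johnson-Lindenstrauss with one of the constructions above: a k x d matrix with
# i.i.d. Gaussian entries, scaled by 1/sqrt(k) so that E[||Av||^2] = ||v||^2.
# The constant 8 in the choice of k is illustrative only.
rng = np.random.default_rng(5)
n, d, eps = 50, 1000, 0.5
k = int(np.ceil(8 * np.log(n) / eps**2))
V = rng.normal(size=(n, d))                 # n points in R^d
A = rng.normal(size=(k, d)) / np.sqrt(k)    # random projection matrix
P = V @ A.T                                 # phi(v) = A v for every point

# Distortion ||Au - Av||^2 / ||u - v||^2 over all pairs should lie in [1-eps, 1+eps] w.h.p.
iu = np.triu_indices(n, 1)
den = ((V[:, None, :] - V[None, :, :]) ** 2).sum(-1)[iu]
num = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)[iu]
ratios = num / den
print(f"k = {k}; distortion ratios lie in [{ratios.min():.3f}, {ratios.max():.3f}]")
```

The same experiment can be run with $\pm 1$ entries or with orthonormal rows (e.g. via a QR decomposition); the point is that the target dimension $k$ depends on $n$ and $\epsilon$ but not on $d$.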

Johnson-Lindenstrauss Theorem (the case $\epsilon = 1/2$): Let $V$ be a set of $n$ points in $\mathbb{R}^d$ and let $k = O(\ln n)$. Let $A$ be a random $k \times d$ matrix that projects onto a uniformly random $k$-dimensional subspace. W.h.p., for all $u, v \in V$,
$$\frac{1}{2}\,\|u - v\|^2 \;\le\; \left\|\sqrt{\tfrac{d}{k}}\,Au - \sqrt{\tfrac{d}{k}}\,Av\right\|^2 \;\le\; \frac{3}{2}\,\|u - v\|^2,$$
where $\sqrt{d/k}\,A$ is the (rescaled) projection.

Dasgupta-Gupta Proof Outline: (1) reduce the distortion statement to unit vectors; (2) replace a random projection of a fixed unit vector by a fixed projection of a random unit vector; (3) bound the norm of the first $k$ coordinates of a uniformly random unit vector. Setup: $V$ is a set of $n$ points in $\mathbb{R}^d$, $A$ is the projection onto a uniformly random $k$-subspace, $k = O(\ln n)$. Goal: w.h.p., for all $u, v \in V$,
$$\frac{1}{2}\,\|u-v\|^2 \;\le\; \left\|\sqrt{\tfrac{d}{k}}\,Au - \sqrt{\tfrac{d}{k}}\,Av\right\|^2 \;\le\; \frac{3}{2}\,\|u-v\|^2.$$

$A$ is the projection onto a uniformly random $k$-subspace, and there are $O(n^2)$ pairs $u, v \in V$. For one pair, the event
$$\frac{1}{2}\,\|u-v\|^2 \le \frac{d}{k}\,\|Au - Av\|^2 \le \frac{3}{2}\,\|u-v\|^2$$
is equivalent to
$$\frac{k}{2d} \;\le\; \frac{\|A(u-v)\|^2}{\|u-v\|^2} \;\le\; \frac{3k}{2d}.$$
Treating $\frac{u-v}{\|u-v\|}$ as a unit vector, the new goal is: for any unit vector $u$, with probability $o\!\left(\frac{1}{n^2}\right)$,
$$\|Au\|^2 > \frac{3k}{2d} \quad\text{or}\quad \|Au\|^2 < \frac{k}{2d}.$$

$A$ is the projection onto a uniformly random $k$-subspace. The key observation is a probabilistic equivalence: projecting a fixed unit vector onto a random $k$-subspace has the same distribution as projecting a random unit vector onto a fixed $k$-subspace. So take a uniformly random unit vector $Y = (Y_1,\dots,Y_k, Y_{k+1},\dots,Y_d)$, let $Z = (Y_1,\dots,Y_k)$ be its projection onto the fixed subspace spanned by the first $k$ coordinates, and bound the probability that
$$\|Z\|^2 > \frac{3k}{2d} \quad\text{or}\quad \|Z\|^2 < \frac{k}{2d}.$$

This looks clean; however, $\|Z\|^2 = Y_1^2 + Y_2^2 + \dots + Y_k^2$, and the coordinates $Y_i$ of a uniform random unit vector $Y$ are dependent.

Generating a uniform random unit vector: let $X_1,\dots,X_d$ be i.i.d. standard normals $N(0,1)$, set $X = (X_1,\dots,X_d)$ and $Y = \frac{X}{\|X\|}$. Then $Y$ is a uniformly random unit vector, because the density of $X$,
$$\Pr[(x_1,\dots,x_d)] = \prod_{i=1}^{d} \frac{1}{\sqrt{2\pi}}\, e^{-x_i^2/2} = (2\pi)^{-d/2}\, e^{-\|x\|^2/2},$$
is spherically symmetric.
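
A tiny sketch (mine, with illustrative values of $d$ and $k$) of this generation procedure: normalize a standard Gaussian vector and look at the squared norm of its first $k$ coordinates, the quantity analyzed in the next step:

```python
import numpy as np

# Y = X / ||X|| with X ~ N(0, I_d) is a uniform random unit vector.
# Z = first k coordinates of Y; E[||Z||^2] = k/d, and the next step bounds the
# probability that ||Z||^2 leaves the interval [k/(2d), 3k/(2d)].
rng = np.random.default_rng(6)
d, k, trials = 500, 40, 10_000
X = rng.normal(size=(trials, d))
Y = X / np.linalg.norm(X, axis=1, keepdims=True)
Z2 = (Y[:, :k] ** 2).sum(axis=1)
bad = np.mean((Z2 < k / (2 * d)) | (Z2 > 3 * k / (2 * d)))
print(f"k/d = {k/d:.4f}, sample mean of ||Z||^2 = {Z2.mean():.4f}")
print(f"Pr[||Z||^2 outside [k/(2d), 3k/(2d)]] ~ {bad:.4f}")
```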

With $X_1,\dots,X_d$ i.i.d. $N(0,1)$ and $Z = \frac{1}{\|X\|}(X_1,\dots,X_k)$,
$$\|Z\|^2 = \frac{X_1^2 + \dots + X_k^2}{X_1^2 + \dots + X_d^2}.$$
The two bad events become
$$\|Z\|^2 < \frac{k}{2d} \iff (k - 2d)\sum_{i=1}^{k} X_i^2 + k\sum_{i=k+1}^{d} X_i^2 > 0,$$
$$\|Z\|^2 > \frac{3k}{2d} \iff (3k - 2d)\sum_{i=1}^{k} X_i^2 + 3k\sum_{i=k+1}^{d} X_i^2 < 0.$$
Each is an event about a sum of independent random variables, so a Chernoff-style (moment generating function) argument applies.

Upper bound for the first bad event (for any $\lambda > 0$):
$$\Pr\left[(k-2d)\sum_{i=1}^{k}X_i^2 + k\sum_{i=k+1}^{d}X_i^2 > 0\right]
= \Pr\left[\exp\left\{\lambda\left((k-2d)\sum_{i=1}^{k}X_i^2 + k\sum_{i=k+1}^{d}X_i^2\right)\right\} > 1\right]$$
$$\le\; E\left[\exp\left\{\lambda\left((k-2d)\sum_{i=1}^{k}X_i^2 + k\sum_{i=k+1}^{d}X_i^2\right)\right\}\right] \qquad\text{(Markov)}$$
$$=\; \prod_{i=1}^{k} E\left[e^{\lambda(k-2d)X_i^2}\right] \prod_{i=k+1}^{d} E\left[e^{\lambda k X_i^2}\right] \qquad\text{(independence)}$$
$$=\; \bigl(1-2\lambda(k-2d)\bigr)^{-k/2}\,\bigl(1-2\lambda k\bigr)^{-(d-k)/2},$$
using the moment generating function of $X_i^2 \sim \chi^2_1$: $E\left[e^{sX_i^2}\right] = \frac{1}{\sqrt{1-2s}}$ for $s < \frac{1}{2}$ (so we need $\lambda k < \frac{1}{2}$).

The bound $\bigl(1-2\lambda(k-2d)\bigr)^{-k/2}\bigl(1-2\lambda k\bigr)^{-(d-k)/2}$ is minimized at $\lambda = \frac{1}{2(2d-k)}$, where it is at most $e^{-k/16}$:
$$\Pr\left[(k-2d)\sum_{i=1}^{k}X_i^2 + k\sum_{i=k+1}^{d}X_i^2 > 0\right] \le e^{-k/16}.$$
A symmetric calculation for the other event gives
$$\Pr\left[(3k-2d)\sum_{i=1}^{k}X_i^2 + 3k\sum_{i=k+1}^{d}X_i^2 < 0\right] \le e^{-k/24}.$$
Therefore, for $Z = \frac{1}{\|X\|}(X_1,\dots,X_k)$ and a suitable $k = O(\ln n)$,
$$\Pr\left[\|Z\|^2 > \frac{3k}{2d} \;\text{ or }\; \|Z\|^2 < \frac{k}{2d}\right] = o\!\left(\frac{1}{n^3}\right).$$
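
To see these numbers concretely, here is a small sketch (my own, with illustrative values $d = 500$ and $k = 40$) that plugs the optimizing $\lambda$ into the moment-generating-function bound for the lower-tail event and compares it with $e^{-k/16}$ and with a Monte Carlo estimate:

```python
import numpy as np

# Lower-tail event ||Z||^2 < k/(2d), i.e. (k-2d)*S_k + k*S_rest > 0 with
# S_k = sum_{i<=k} X_i^2, S_rest = sum_{i>k} X_i^2.  Plug the optimizing
# lambda = 1/(2(2d-k)) into the MGF bound and compare with e^{-k/16} and Monte Carlo.
d, k = 500, 40
lam = 1 / (2 * (2 * d - k))
mgf_bound = (1 - 2 * lam * (k - 2 * d)) ** (-k / 2) * (1 - 2 * lam * k) ** (-(d - k) / 2)
print(f"optimized MGF bound = {mgf_bound:.3e},  e^(-k/16) = {np.exp(-k / 16):.3e}")

rng = np.random.default_rng(7)
X2 = rng.normal(size=(20_000, d)) ** 2
lower = np.mean((k - 2 * d) * X2[:, :k].sum(axis=1) + k * X2[:, k:].sum(axis=1) > 0)
print(f"Monte Carlo estimate of Pr[||Z||^2 < k/(2d)] = {lower:.3e}")
```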

For this $k = O(\ln n)$: $Z$ is the projection of a random unit vector onto a fixed $k$-subspace, and $\Pr[\|Z\|^2 \text{ deviates from } k/d \text{ beyond the stated factors}] = o(1/n^3)$. By the probabilistic equivalence, for a fixed unit vector $u$ and the projection $A$ onto a random $k$-subspace, $\Pr[\|Au\|^2 \text{ deviates from } k/d] = o(1/n^3)$. Hence for each pair $u, v \in V$,
$$\Pr\left[\left\|\sqrt{\tfrac{d}{k}}\,A(u-v)\right\|^2 \text{ is distorted from } \|u-v\|^2\right] = o\!\left(\frac{1}{n^3}\right).$$
A union bound over the $O(n^2)$ pairs $(u, v)$ completes the proof of the Johnson-Lindenstrauss theorem.
