Randomized Algorithms (Nanjing University, Yitong Yin)
Martingales
Definition: A sequence of random variables X_0, X_1, ... is a martingale if for all i > 0,
E[X_i | X_0, ..., X_{i-1}] = X_{i-1};
that is, for all x_0, x_1, ..., x_{i-1},
E[X_i | X_0 = x_0, X_1 = x_1, ..., X_{i-1} = x_{i-1}] = x_{i-1}.
Generalization
Definition: Y_0, Y_1, ... is a martingale with respect to X_0, X_1, ... if, for all i ≥ 0:
Y_i is a function of X_0, X_1, ..., X_i;
E[Y_{i+1} | X_0, ..., X_i] = Y_i.
Azuma's Inequality (general version): Let Y_0, Y_1, ... be a martingale with respect to X_0, X_1, ... such that, for all k ≥ 1,
|Y_k - Y_{k-1}| ≤ c_k.
Then
Pr[ |Y_n - Y_0| ≥ t ] ≤ 2 exp( -t² / (2 Σ_{k=1}^n c_k²) ).
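As a quick empirical sanity check (a sketch, not part of the slides): a fair ±1 random walk is a martingale with bounded differences c_k = 1, so Azuma gives Pr[|Y_n - Y_0| ≥ t] ≤ 2 exp(-t²/(2n)). The function names below are illustrative.

```python
import math
import random

def random_walk(n, rng):
    """Fair +/-1 walk: a martingale with bounded differences c_k = 1."""
    y = 0
    for _ in range(n):
        y += rng.choice((-1, 1))
    return y

def empirical_tail(n, t, trials, seed=0):
    """Estimate Pr[|Y_n - Y_0| >= t] by simulation."""
    rng = random.Random(seed)
    hits = sum(abs(random_walk(n, rng)) >= t for _ in range(trials))
    return hits / trials

n, t = 100, 30
azuma_bound = 2 * math.exp(-t * t / (2 * n))  # 2 exp(-t^2 / (2 sum c_k^2))
print(empirical_tail(n, t, 10000), "<=", azuma_bound)
```

The observed tail frequency should fall well below the Azuma bound (the bound is not tight for this walk).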
Doob Sequence
Definition (Doob sequence): The Doob sequence of a function f with respect to a sequence X_1, ..., X_n is
Y_i = E[f(X_1, ..., X_n) | X_1, ..., X_i], 0 ≤ i ≤ n.
In particular, Y_0 = E[f(X_1, ..., X_n)] and Y_n = f(X_1, ..., X_n).
Doob Martingale
Doob sequence: Y_i = E[f(X_1, ..., X_n) | X_1, ..., X_i].
The Doob sequence is a martingale: E[Y_i | X_1, ..., X_{i-1}] = Y_{i-1}.
Proof: By the tower property of conditional expectation,
E[Y_i | X_1, ..., X_{i-1}]
= E[ E[f(X_1, ..., X_n) | X_1, ..., X_i] | X_1, ..., X_{i-1} ]
= E[f(X_1, ..., X_n) | X_1, ..., X_{i-1}]
= Y_{i-1}.
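A toy instance of the Doob sequence (an illustrative sketch, with the closed form specific to this example): take f = number of heads in n fair coin flips. Then Y_i = (heads seen so far) + (n - i)/2, so Y_0 = n/2 = E[f] and Y_n = f.

```python
import random

def doob_sequence(xs):
    """Doob sequence for f = sum of fair 0/1 coin flips xs:
    Y_i = E[f | X_1..X_i] = (observed sum) + (expected remainder)."""
    n = len(xs)
    return [sum(xs[:i]) + (n - i) * 0.5 for i in range(n + 1)]

rng = random.Random(1)
xs = [rng.randint(0, 1) for _ in range(10)]
ys = doob_sequence(xs)
print(ys[0], ys[-1], sum(xs))  # Y_0 = E[f] = 5.0, Y_n = f
```

Each revealed flip moves Y_i by exactly ±1/2, a bounded-difference sequence in miniature.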
The Power of Doob + Azuma
For a function f(X_1, ..., X_n) of (possibly dependent) random variables, form the Doob martingale
Y_i = E[f(X_1, ..., X_n) | X_1, ..., X_i], with Y_0 = E[f(X_1, ..., X_n)] and Y_n = f(X_1, ..., X_n).
If the differences |Y_i - Y_{i-1}| are bounded, Azuma's inequality shows |Y_n - Y_0| is small w.h.p.; that is, f(X_1, ..., X_n) is tightly concentrated around its mean.
The method of bounded differences: Let X = (X_1, ..., X_n) and let f be a function of X_1, ..., X_n satisfying, for all 1 ≤ i ≤ n,
| E[f(X) | X_1, ..., X_i] - E[f(X) | X_1, ..., X_{i-1}] | ≤ c_i.
Then
Pr[ |f(X) - E[f(X)]| ≥ t ] ≤ 2 exp( -t² / (2 Σ_{i=1}^n c_i²) ).
Proof: Apply Azuma's inequality to the Doob martingale Y_i = E[f(X) | X_1, ..., X_i]: the condition bounds |Y_i - Y_{i-1}| ≤ c_i, and |Y_n - Y_0| = |f(X) - E[f(X)]|.
Application: chromatic number of G(n, p)
χ(G): the smallest number of colors needed to properly color G.
X_i: the subgraph of G induced by the first i vertices (vertex exposure).
Doob martingale Y_0, Y_1, ..., Y_n:
Y_i = E[χ(G) | X_1, ..., X_i], with Y_0 = E[χ(G)] and Y_n = χ(G).
Observation: a newly exposed vertex can always be given a new color, so exposing one vertex changes the chromatic number by at most 1; hence |Y_i - Y_{i-1}| ≤ 1.
By Azuma's inequality,
Pr[ |χ(G) - E[χ(G)]| ≥ t√n ] = Pr[ |Y_n - Y_0| ≥ t√n ] ≤ 2e^{-t²/2}.
Tight Concentration of the Chromatic Number
Theorem [Shamir and Spencer, 1987]: Let G ~ G(n, p). Then
Pr[ |χ(G) - E[χ(G)]| ≥ t√n ] ≤ 2e^{-t²/2}.
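A heavily hedged illustration (not the theorem itself): computing χ(G) exactly is NP-hard, so this sketch uses the number of colors consumed by greedy coloring, an upper bound on χ(G), as a computable stand-in, and checks that this graph statistic of G(n, p) has spread far smaller than the √n scale in the theorem.

```python
import random
import statistics

def greedy_colors(n, p, rng):
    """Sample G(n,p) and return the number of colors used by greedy
    coloring in vertex order (an upper bound on chi(G); used here only
    as a computable proxy, since exact chi(G) is NP-hard)."""
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    color = {}
    for v in range(n):
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return max(color.values()) + 1

rng = random.Random(42)
samples = [greedy_colors(100, 0.5, rng) for _ in range(200)]
print(statistics.mean(samples), statistics.stdev(samples))
```

The empirical standard deviation is a small constant while √n = 10, consistent with (though of course not a proof of) tight concentration.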
Hoeffding's Inequality
Hoeffding's Inequality: Let X = Σ_{i=1}^n X_i, where X_1, ..., X_n are independent random variables with a_i ≤ X_i ≤ b_i. Then
Pr[ |X - E[X]| ≥ t ] ≤ 2 exp( -2t² / Σ_{i=1}^n (b_i - a_i)² ).
Proof: X is a function (the sum) of X_1, ..., X_n. Take the Doob martingale Y_i = E[X | X_1, ..., X_i]. By independence,
|Y_i - Y_{i-1}| = |X_i - E[X_i]| ≤ b_i - a_i,
and Azuma's inequality finishes the proof. (This argument gives the bound with a weaker constant, 2 exp(-t²/(2Σ(b_i - a_i)²)); the sharp constant above needs a finer analysis.)
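A numerical check of the inequality (an illustrative sketch): for a sum of n independent Uniform[0,1] variables we have a_i = 0, b_i = 1, so Hoeffding gives Pr[|X - n/2| ≥ t] ≤ 2 exp(-2t²/n).

```python
import math
import random

def tail_prob(n, t, trials, seed=0):
    """Estimate Pr[|X - E[X]| >= t] for X = sum of n Uniform[0,1]'s."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = sum(rng.random() for _ in range(n))
        hits += abs(x - n / 2) >= t
    return hits / trials

n, t = 100, 15
hoeffding = 2 * math.exp(-2 * t * t / n)  # a_i = 0, b_i = 1
print(tail_prob(n, t, 20000), "<=", hoeffding)
```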
The method of bounded differences: Let X = (X_1, ..., X_n) and let f be a function of X_1, ..., X_n satisfying, for all 1 ≤ i ≤ n,
| E[f(X) | X_1, ..., X_i] - E[f(X) | X_1, ..., X_{i-1}] | ≤ c_i   (hard to check!)
Then
Pr[ |f(X) - E[f(X)]| ≥ t ] ≤ 2 exp( -t² / (2 Σ_{i=1}^n c_i²) ).
Average-case condition:
| E[f(X) | X_1, ..., X_i] - E[f(X) | X_1, ..., X_{i-1}] | ≤ c_i.
Worst-case condition, the Lipschitz condition: f(x_1, ..., x_n) satisfies the Lipschitz condition with constants c_i, 1 ≤ i ≤ n, if for all i and all x_1, ..., x_n, y_i,
| f(x_1, ..., x_{i-1}, x_i, x_{i+1}, ..., x_n) - f(x_1, ..., x_{i-1}, y_i, x_{i+1}, ..., x_n) | ≤ c_i.
The method of bounded differences: Let X = (X_1, ..., X_n) be n independent random variables and let f be a function satisfying the Lipschitz condition with constants c_i, 1 ≤ i ≤ n. Then
Pr[ |f(X) - E[f(X)]| ≥ t ] ≤ 2 exp( -t² / (2 Σ_{i=1}^n c_i²) ).
Proof: Lipschitz condition + independence ⟹ bounded averaged differences.
Applications of the Method
For a function f of independent random variables X_1, X_2, ..., X_n:
Lipschitz condition of f: changing any one variable changes the value of f by little.
⟹ f(X_1, X_2, ..., X_n) is tightly concentrated around its mean.
Occupancy Problem
m balls into n bins: how many bins are empty?
X_i = 1 if bin i is empty, 0 otherwise.
Number of empty bins: X = Σ_{i=1}^n X_i, with
E[X] = Σ_{i=1}^n E[X_i] = n(1 - 1/n)^m.
Deviation: Pr[ |X - E[X]| ≥ t ]? The X_i are dependent, so Chernoff-type bounds do not apply directly.
Occupancy Problem
Let Y_j be the bin chosen by ball j; the Y_j are independent, and
X = f(Y_1, ..., Y_m) = | [n] \ {Y_1, ..., Y_m} |.
Lipschitz: changing any one Y_j changes X by at most 1.
Method of bounded differences:
Pr[ |X - E[X]| ≥ t√m ] ≤ 2e^{-t²/2}.
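A simulation sketch of the occupancy problem (parameter choices are illustrative): throw m balls into n bins, count empty bins, and compare the empirical mean and spread with E[X] = n(1 - 1/n)^m and the √m deviation scale.

```python
import random
import statistics

def empty_bins(m, n, rng):
    """Throw m balls into n bins uniformly; return the number of empty bins."""
    occupied = {rng.randrange(n) for _ in range(m)}
    return n - len(occupied)

m = n = 1000
rng = random.Random(7)
samples = [empty_bins(m, n, rng) for _ in range(300)]
expected = n * (1 - 1 / n) ** m  # E[X] = n(1 - 1/n)^m, about n/e
print(expected, statistics.mean(samples), statistics.stdev(samples))
```

The observed standard deviation stays well below √m, as the bounded-differences tail bound predicts.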
Pattern Matching
A random string of length n, a fixed pattern of length k: how many substrings match the pattern?
Alphabet Σ with |Σ| = m; X_1, ..., X_n uniform and independent over Σ.
Y = number of occurrences of the pattern as a substring of (X_1, ..., X_n):
E[Y] = (n - k + 1) (1/m)^k.
Deviation? Y = f(X_1, ..., X_n), and changing any one X_i changes f by at most k (each position lies in at most k length-k windows). Method of bounded differences:
Pr[ |Y - E[Y]| ≥ tk√n ] ≤ 2e^{-t²/2}.
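A simulation sketch of pattern matching (the alphabet and pattern below are arbitrary choices): count overlapping occurrences in uniform random strings and compare with E[Y] = (n - k + 1)(1/m)^k.

```python
import random
import statistics

def count_matches(s, pattern):
    """Number of (possibly overlapping) occurrences of pattern in s."""
    k = len(pattern)
    return sum(s[i:i + k] == pattern for i in range(len(s) - k + 1))

def sample_string(n, alphabet, rng):
    return "".join(rng.choice(alphabet) for _ in range(n))

n, alphabet, pattern = 10000, "ab", "aabb"  # m = 2, k = 4
rng = random.Random(3)
samples = [count_matches(sample_string(n, alphabet, rng), pattern)
           for _ in range(100)]
expected = (n - len(pattern) + 1) * (1 / len(alphabet)) ** len(pattern)
print(expected, statistics.mean(samples), statistics.stdev(samples))
```

The spread of Y is far below the k√n scale appearing in the tail bound.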
Metric Embedding
An embedding φ : (X, d_X) → (Y, d_Y).
Low distortion: for a small α ≥ 1 and all x_1, x_2 ∈ X,
(1/α) · d_X(x_1, x_2) ≤ d_Y(φ(x_1), φ(x_2)) ≤ d_X(x_1, x_2).
Why Embedding?
Some problems are easier to solve in simple metrics: e.g. TSP is easy on trees.
Curse of dimensionality: proximity search and learning suffer from volume explosion in high dimension.
[Figure: a graph metric embedded into a tree metric; a high-dimensional point set mapped to a low-dimensional one.]
Dimension Reduction
In Euclidean space it is always possible to embed a set of n points in arbitrary dimension into O(log n) dimensions with constant distortion.
Johnson-Lindenstrauss Theorem: For any 0 < ε < 1 and any set V of n points in R^d, there is a map φ : R^d → R^k with k = O(ε^{-2} ln n) such that for all u, v ∈ V,
(1 - ε) ‖u - v‖² ≤ ‖φ(u) - φ(v)‖² ≤ (1 + ε) ‖u - v‖².
The map is linear: φ(v) = Av, where A is a random projection matrix.
Random Projection
Random k × d matrix A:
Projection onto a uniform random k-dimensional subspace (Johnson-Lindenstrauss; Dasgupta-Gupta): rows A_1, A_2, ..., A_k are random orthogonal unit vectors in R^d.
i.i.d. Gaussian entries (Indyk-Motwani).
i.i.d. -1/+1 entries (Achlioptas).
Johnson-Lindenstrauss Theorem (for the case ε = 1/2): Let V be a set of n points in R^d and let A be a random k × d matrix that projects onto a uniform random k-dimensional subspace, for some k = O(ln n). W.h.p., for all u, v ∈ V,
(1/2) ‖u - v‖² ≤ ‖√(d/k) Au - √(d/k) Av‖² ≤ (3/2) ‖u - v‖².
Dasgupta-Gupta Proof Outline:
1. Reduce the distortion bound to unit vectors.
2. Reduce to a fixed projection of a random unit vector.
3. Bound the norm of the first k entries of a uniform random unit vector.
Setting: V is a set of n points in R^d; A projects onto a uniform random k-subspace; k = O(ln n). Goal: w.h.p., for all u, v ∈ V,
(1/2) ‖u - v‖² ≤ ‖√(d/k) Au - √(d/k) Av‖² ≤ (3/2) ‖u - v‖².
A: projection onto a uniform random k-subspace. There are O(n²) pairs u, v ∈ V. For each pair, the claim
(1/2) ‖u - v‖² ≤ ‖√(d/k) Au - √(d/k) Av‖² ≤ (3/2) ‖u - v‖²
is equivalent to
k/(2d) ≤ ‖A(u - v)‖² / ‖u - v‖² ≤ 3k/(2d).
Treating (u - v)/‖u - v‖ as a unit vector, the new goal becomes: for any unit vector u, with probability o(1/n²),
‖Au‖² > 3k/(2d) or ‖Au‖² < k/(2d).
A: projection onto a uniform random k-subspace. For a unit vector u, bound by o(1/n²) the probability that ‖Au‖² > 3k/(2d) or ‖Au‖² < k/(2d).
Probabilistically equivalent views:
a fixed unit vector projected onto a random subspace ≡ a random unit vector projected onto a fixed subspace.
So take a uniform random unit vector Y = (Y_1, ..., Y_k, Y_{k+1}, ..., Y_d) and the fixed subspace spanned by the first k coordinates; let Z = (Y_1, ..., Y_k). The goal becomes: with probability o(1/n²),
‖Z‖² > 3k/(2d) or ‖Z‖² < k/(2d).
Uniform random unit vector Y = (Y_1, ..., Y_k, Y_{k+1}, ..., Y_d), Z = (Y_1, ..., Y_k); want: with probability o(1/n²), ‖Z‖² > 3k/(2d) or ‖Z‖² < k/(2d).
Looks clean, however...
‖Z‖² = Y_1² + Y_2² + ... + Y_k², and the Y_i are dependent for a uniform unit vector Y.
Generating a uniform unit vector: let X_1, ..., X_d be i.i.d. normal distributions N(0, 1), X = (X_1, ..., X_d), and Y = X/‖X‖. Then Y is a uniform random unit vector, because the density
Pr[(x_1, ..., x_d)] = Π_{i=1}^d (1/√(2π)) e^{-x_i²/2} = (2π)^{-d/2} e^{-‖x‖²/2}
is spherically symmetric!
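The Gaussian construction above can be sketched directly (a small illustration; the dimensions are arbitrary). It also lets us check numerically that E[‖Z‖²] = k/d for Z = the first k coordinates of a uniform unit vector.

```python
import math
import random

def random_unit_vector(d, rng):
    """Normalize a vector of i.i.d. N(0,1) entries; by spherical symmetry
    of the Gaussian density, the result is uniform on the unit sphere."""
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(t * t for t in x))
    return [t / norm for t in x]

rng = random.Random(0)
d, k = 20, 4
# Average ||Z||^2 over samples, where Z = first k coordinates.
avg = sum(sum(t * t for t in random_unit_vector(d, rng)[:k])
          for _ in range(2000)) / 2000
print(avg)  # close to k/d = 0.2
```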
X_1, ..., X_d are i.i.d. normal distributions N(0, 1), and Z = (1/‖X‖)(X_1, ..., X_k), so
‖Z‖² = (X_1² + ... + X_k²) / (X_1² + ... + X_d²).
Clearing denominators,
‖Z‖² < k/(2d) ⟺ (k - 2d) Σ_{i=1}^k X_i² + k Σ_{i=k+1}^d X_i² > 0;
‖Z‖² > 3k/(2d) ⟺ (3k - 2d) Σ_{i=1}^k X_i² + 3k Σ_{i=k+1}^d X_i² < 0.
Sums of independent random variables! Chernoff?
X_1, ..., X_d are i.i.d. normal distributions N(0, 1). Upper bound (for λ > 0):
Pr[ (k - 2d) Σ_{i=1}^k X_i² + k Σ_{i=k+1}^d X_i² > 0 ]
= Pr[ exp( λ( (k - 2d) Σ_{i=1}^k X_i² + k Σ_{i=k+1}^d X_i² ) ) > 1 ]
≤ E[ exp( λ( (k - 2d) Σ_{i=1}^k X_i² + k Σ_{i=k+1}^d X_i² ) ) ]   (Markov)
= Π_{i=1}^k E[ e^{λ(k-2d)X_i²} ] · Π_{i=k+1}^d E[ e^{λk X_i²} ]   (independence)
= (1 - 2λ(k - 2d))^{-k/2} (1 - 2λk)^{-(d-k)/2},
using E[ e^{sX_i²} ] = 1/√(1 - 2s) for s < 1/2.
X_1, ..., X_d are i.i.d. normal distributions N(0, 1). The bound (1 - 2λ(k - 2d))^{-k/2} (1 - 2λk)^{-(d-k)/2} is minimized at λ = 1/(2(2d - k)), giving
Pr[ (k - 2d) Σ_{i=1}^k X_i² + k Σ_{i=k+1}^d X_i² > 0 ] ≤ e^{-k/16}.
Similarly,
Pr[ (3k - 2d) Σ_{i=1}^k X_i² + 3k Σ_{i=k+1}^d X_i² < 0 ] ≤ e^{-k/24}.
Hence for Z = (1/‖X‖)(X_1, ..., X_k) and some k = O(ln n),
Pr[ ‖Z‖² > 3k/(2d) or ‖Z‖² < k/(2d) ] = o(1/n³).
For some k = O(ln n):
Z, a fixed k-subspace of a random unit vector: Pr[ ‖Z‖² distorted from k/d ] = o(1/n³);
Au, a random k-subspace of a fixed unit vector: Pr[ ‖Au‖² distorted from k/d ] = o(1/n³);
so for each pair u, v: Pr[ ‖√(d/k) A(u - v)‖² distorted from ‖u - v‖² ] = o(1/n³).
Union bound over the O(n²) pairs (u, v) ⟹ the Johnson-Lindenstrauss theorem.
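A dimension-reduction sketch to close the loop. For simplicity it uses the i.i.d.-Gaussian variant of random projection (Indyk-Motwani style, scaled by 1/√k) rather than an exact orthogonal projection; all names and parameter choices are illustrative.

```python
import math
import random

def jl_project(points, k, rng):
    """Project points in R^d to R^k using a (1/sqrt(k))-scaled matrix
    of i.i.d. N(0,1) entries (the Gaussian variant of random projection)."""
    d = len(points[0])
    A = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(d)]
         for _ in range(k)]
    return [[sum(a[j] * p[j] for j in range(d)) for a in A] for p in points]

def dist2(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

rng = random.Random(5)
d, k = 200, 400
u = [rng.gauss(0, 1) for _ in range(d)]
v = [rng.gauss(0, 1) for _ in range(d)]
pu, pv = jl_project([u, v], k, rng)
print(dist2(u, v), dist2(pu, pv))  # squared distance nearly preserved
```

With this k the squared distance lands well inside the [1/2, 3/2] window of the ε = 1/2 case.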