Mathematical Foundation for Compressed Sensing
1 Mathematical Foundation for Compressed Sensing Jan-Olov Strömberg Royal Institute of Technology, Stockholm, Sweden Lecture 7, March 19, 2012
2-8 An outline for today

Last time we stated: a RIP estimate for structured random matrices.

Today and the next 1-2 weeks: go through a proof of it.

Today: go through some basic tools and useful lemmas for the proof, and reduce the proof to an estimate that remains to be proved.

Next week: Joel will present a proof, putting things together. About the examination.
11 Let A be a random m x N matrix. We define $\delta_s = \delta_s(A)$ as a random variable:

Definition:
$$\delta_s(A) = \sup_{x \in X(s) \cap B_2(0,1)} \left| \|Ax\|_2^2 - \|x\|_2^2 \right|.$$
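For small instances this definition can be checked by brute force: over unit vectors supported on a fixed index set $S$ with $|S| = s$, the quantity $|\|Ax\|_2^2 - \|x\|_2^2|$ is maximized by the eigenvalue of $A_S^T A_S$ farthest from 1. A minimal sketch (numpy assumed; the normalized Gaussian test matrix is only an illustration, not one of the structured matrices of the lecture):

```python
import numpy as np
from itertools import combinations

def rip_constant(A, s):
    """Exact delta_s for small N: maximize over all supports S, |S| = s,
    the largest deviation of an eigenvalue of A_S^T A_S from 1."""
    N = A.shape[1]
    delta = 0.0
    for S in combinations(range(N), s):
        G = A[:, S].T @ A[:, S]              # s x s Gram matrix of the columns in S
        eig = np.linalg.eigvalsh(G)
        delta = max(delta, np.max(np.abs(eig - 1.0)))
    return delta

rng = np.random.default_rng(0)
m, N, s = 40, 12, 2
A = rng.standard_normal((m, N)) / np.sqrt(m)  # column-normalized in expectation
print(rip_constant(A, s))
```

For the identity matrix the constant is 0, and $\delta_s$ grows with $s$, as the supremum over a larger set in the definition suggests.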
12-13 Rauhut states a RIP result for structured m x N matrices:

Theorem 6.2. There are constants $C > 0$ and $D > 0$ such that for any $K \ge 1$, $\epsilon > 0$, $0 < \delta \le 1$, and $m$ satisfying

$$\frac{m}{\ln(10m)} \ge C\,\frac{K^2 s}{\delta^2}\,\ln^2(100 s)\,\ln(N), \qquad m \ge D\,\frac{K^2 s}{\delta^2}\,\ln(1/\epsilon),$$

the following holds: for any m x N structured random matrix $A$ with coherence bound $K$, we have

$$P\{\delta_s > \delta\} \le \epsilon.$$

The constants satisfy C < … and D < 456.
14-18 In our project (Joel Anderson and S.) we will get:

Theorem 6.2*. There are constants $C > 0$ and $D > 0$ such that for any $\epsilon > 0$ and $0 < \delta \le 1$ the following holds: if

$$\frac{m}{\ln(2m)} \ge C\,\frac{K^2 s}{\delta^2}\,\ln^2\!\big(c s / \log(2m)\big)\,\ln(N), \qquad m \ge D\,\frac{K^2 s}{\delta^2}\,\ln(1/\epsilon),$$

then the m x N structured random matrix $A$ (as above) satisfies

$$P\{\delta_s > \delta\} \le \epsilon.$$

The constants: $C \approx 10^3$, $D$ somewhat smaller ($10^2$?), and $c$ is rather small. The proof is much shorter than Rauhut-Yani's proof.
19 We may also define the LEIP constant $\delta_t = \delta_t(A)$ as a stochastic variable depending on the random matrix $A$:

Definition:
$$\delta_t = \sup_{x \in B_1(0,t) \cap B_2(0,1)} \left| \|Ax\|_2^2 - \|x\|_2^2 \right|.$$

Remark: The proof of Theorem 6.2* gives almost the same estimate of $m$ when $\delta_s$ is replaced by $\delta_t$, $t = \sqrt{s}$.
20-21 Our proof is much inspired by Rauhut-Yani's paper. Most of the ideas can be found there; some of them are twisted or quite new. Our proof hopefully gives slightly better constants. But most of all, we think it is much shorter and does not require so many sophisticated arguments.
22 Basic tools. We list here the basics on which the proof and the preliminary lemmas are based. We mark with an (R) the ideas that are also used in Rauhut-Yani's paper:
23-27
(R) The triangle inequality.

(R) Jensen's inequality: $|EX|^p \le E|X|^p$ when $1 \le p < \infty$.

(R) If $X_j$, $1 \le j \le m$, are independent random variables which are symmetric ($X_j$ and $-X_j$ have the same distribution), and $J = (j_1, \dots, j_m)$ is a multi-index with non-negative integer entries and $X^J = X_1^{j_1} \cdots X_m^{j_m}$, then $E X^J = 0$ unless all $j_k$ are even.

The norms $\|X\|_p = (E|X|^p)^{1/p}$ are increasing with $p$, i.e. $\|X\|_p \le \|X\|_q$ when $p \le q$.

(R) Markov's inequality: let $X$ be a random variable; then for $\lambda > 0$,
$$P\{|X| > \lambda\} \le \frac{E|X|}{\lambda}.$$
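The listed inequalities are elementary to verify on toy distributions. As one example, Markov's inequality can be checked exactly for a small discrete distribution (the uniform five-point distribution below is an arbitrary choice, not from the lecture):

```python
# Exact check of Markov's inequality P{|X| > lam} <= E|X| / lam
# for a small discrete distribution (five equally likely values).
values = [0.1, 0.5, 1.0, 2.0, 4.0]

def markov_check(lam):
    """Return the exact tail probability and the Markov bound E|X|/lam."""
    p_exceed = sum(1 for v in values if abs(v) > lam) / len(values)
    mean_abs = sum(abs(v) for v in values) / len(values)
    return p_exceed, mean_abs / lam

for lam in (0.5, 1.0, 3.0):
    p, bound = markov_check(lam)
    assert p <= bound
    print(lam, p, bound)
```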
28-32
(R) Stirling's formula: even if not quite trivial, we set it up here as a prerequisite:

$$n! = \sqrt{2\pi n}\,\Big(\frac{n}{e}\Big)^n e^{\lambda_n}, \quad \text{for some numbers } \frac{1}{13n} \le \lambda_n \le \frac{1}{12n}.$$

If $a_j \ge 0$ is a finite set of numbers, then (trivial but yet so powerful!):

$$\max_j a_j \le \sum_j a_j.$$

The last observation is not used (at least not so extensively) in Rauhut-Yani's paper, and it is the key to simplifying the calculations in their proof. The following equivalent formulation might look less trivial:

$$\max_j a_j \le \Big(\sum_j a_j^p\Big)^{1/p}.$$
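A quick numeric illustration of the last observation (the values in `a` are arbitrary): the $\ell_p$ bound on the maximum tightens as $p$ grows, since the slack is at most a factor $M^{1/p}$ for $M$ terms.

```python
# max_j a_j <= (sum_j a_j**p)**(1/p), with slack at most M**(1/p) for M terms.
a = [0.3, 1.7, 0.9, 1.2]
M = len(a)
for p in (1, 2, 8, 32):
    lp = sum(x ** p for x in a) ** (1.0 / p)
    assert max(a) <= lp <= max(a) * M ** (1.0 / p)
    print(p, round(lp, 4))
```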
33-35 Lemma 7.1: Let $X_j$, $1 \le j \le M$, be real-valued stochastic variables, uniformly bounded in $p$-norm, i.e. let $p \ge 1$ and assume that there is a constant $B$ such that $\|X_j\|_p \le B$ for all $j$. Then

$$\Big\| \max_j |X_j| \Big\|_p \le M^{1/p} B.$$

The proof: HOMEWORK?
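The homework inequality can at least be sanity-checked by Monte Carlo. The sketch below uses i.i.d. standard normal $X_j$ (an arbitrary choice, not from the lecture) and compares the empirical $\|\max_j |X_j|\|_p$ with $M^{1/p} B$:

```python
import numpy as np

# Monte Carlo check of Lemma 7.1 for i.i.d. standard normal X_j:
# || max_j |X_j| ||_p  <=  M^(1/p) * B,  where B = max_j ||X_j||_p.
rng = np.random.default_rng(1)
M, p, n_samples = 50, 4, 200_000
X = rng.standard_normal((n_samples, M))       # rows = samples, columns = X_1..X_M

B = max(np.mean(np.abs(X[:, j]) ** p) ** (1 / p) for j in range(M))
lhs = np.mean(np.max(np.abs(X), axis=1) ** p) ** (1 / p)
print(lhs, M ** (1 / p) * B)
```

The gap is large here (the bound is crude when the $X_j$ are identically distributed), which is exactly why Lemma 7.2 trades growing exponents for the factor $M^{1/p}$.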
36-37 Lemma 7.2: Assume that the increasing sequence $p_0 < p_1 < \dots < p_k < \dots < p_M$ satisfies $p_k / p_{k-1} \le \gamma$ for some $\gamma > 1$, and assume that there are stochastic variables $X_k$, $1 \le k \le M$, and a constant $B$ such that $\|X_k\|_{p_k} \le B$ for $1 \le k \le M$. Then there is a number $A$ depending only on $\gamma$ such that

$$\Big\| \max_k |X_k| \Big\|_{p_0}^{p_0} \le A\, B^{p_0}.$$

The proof: HOMEWORK 1.
38-39 Let $\{\epsilon_j\}$ be the Rademacher set: independent random variables taking the values $\pm 1$ with equal probability. Then we have

Lemma 7.3 (Khintchine inequality): Let $a_j$, $1 \le j \le m$, be real numbers. Then for any positive integer $n$:

$$E\Big|\sum_j \epsilon_j a_j\Big|^{2n} \le \frac{(2n)!}{2^n\, n!}\, \Big(\sum_j a_j^2\Big)^n.$$

Remark: The Rademacher set may be replaced by a set of independent bounded symmetric real random variables $X_j$ with $\mathrm{Var}(X_j) \le 1$.
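Because the Rademacher average runs over finitely many sign patterns, Lemma 7.3 can be verified exactly for small $m$ by enumerating all $2^m$ patterns (the coefficient vector below is arbitrary, not from the lecture):

```python
from itertools import product
from math import factorial

def lhs(a, n):
    """E|sum_j eps_j a_j|^(2n), computed exactly by averaging
    over all 2^m Rademacher sign patterns."""
    m = len(a)
    total = sum(abs(sum(e * x for e, x in zip(eps, a))) ** (2 * n)
                for eps in product((-1, 1), repeat=m))
    return total / 2 ** m

def khintchine_bound(a, n):
    """Right-hand side of Lemma 7.3: (2n)!/(2^n n!) * (sum a_j^2)^n."""
    return factorial(2 * n) / (2 ** n * factorial(n)) * sum(x * x for x in a) ** n

a = [0.5, 1.0, -2.0, 0.25, 1.5]
for n in (1, 2, 3):
    assert lhs(a, n) <= khintchine_bound(a, n) + 1e-9
print("Lemma 7.3 verified for n = 1, 2, 3")
```

For $n = 1$ the bound holds with equality, since the cross terms vanish and $(2)!/(2 \cdot 1!) = 1$.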
40-43 Proof: After taking the expectation value on the left-hand side, only terms with even powers remain. Thus the left-hand side will be

$$\sum_{|J| = n} \frac{(2n)!}{(2j_1)! \cdots (2j_m)!}\, (a^J)^2, \quad \text{where } (a^J)^2 = a_1^{2j_1} \cdots a_m^{2j_m}.$$

The $n$th power of the sum on the right-hand side will be

$$\sum_{|J| = n} \frac{n!}{j_1! \cdots j_m!}\, (a^J)^2.$$

Comparing term by term, we get

$$\max_{|J| = n} \frac{(2n)! \big/ \big((2j_1)! \cdots (2j_m)!\big)}{n! \big/ \big(j_1! \cdots j_m!\big)} = \frac{(2n)! \big/ (2! \cdots 2!)}{n! \big/ (1! \cdots 1!)} = \frac{(2n)!}{2^n\, n!},$$

the maximum being attained when all $j_k \le 1$.
44-45 Using Stirling's formula on Lemma 7.3 we get

Lemma 7.4 (Khintchine inequality): Let $a_j$, $1 \le j \le m$, be real numbers. Then for any positive integer $n$:

$$E\Big|\sum_{j=1}^m \epsilon_j a_j\Big|^{2n} \le \sqrt{2}\,\Big(\frac{2n}{e}\Big)^n \Big(\sum_j a_j^2\Big)^n.$$

By also using the trivial Cauchy inequality:

$$E\Big|\sum_{j=1}^m \epsilon_j a_j\Big|^{2n} \le \min\Big\{ m^n,\ \sqrt{2}\,\Big(\frac{2n}{e}\Big)^n \Big\} \Big(\sum_j a_j^2\Big)^n.$$

Remark: By interpolation between the even powers $2n$, the result may be extended to any power $p \ge 2$, at the expense of replacing the factor $\sqrt{2}$ by a somewhat larger constant.
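One can check numerically that the Stirling step only loosens the constant, i.e. $(2n)!/(2^n n!) \le \sqrt{2}\,(2n/e)^n$ for every $n$; this follows from Stirling's formula above, since $\lambda_{2n} < \lambda_n$. A small deterministic check:

```python
from math import e, factorial, sqrt

# (2n)!/(2^n n!) <= sqrt(2) * (2n/e)^n, the simplification used in Lemma 7.4.
for n in range(1, 30):
    exact = factorial(2 * n) / (2 ** n * factorial(n))
    stirling = sqrt(2) * (2 * n / e) ** n
    assert exact <= stirling
print("verified for n = 1..29")
```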
46 (R) Lemma 7.5, Hoeffding's inequality for Rademacher sums: Let $a_j$ and $\epsilon_j$ be as above. Then for $u \ge 2$,

$$P\Big\{ \Big|\sum_j \epsilon_j a_j\Big| \ge u \Big(\sum_j a_j^2\Big)^{1/2} \Big\} \le e^{-u^2/2}.$$

We will not use Hoeffding's inequality! Rather, we go back to the proof of it each time.
47 Lemma 7.5, Symmetrization: Let $X = \{X_j\}$ be a finite set of independent random variables with mean values $E X_j = \bar{X}_j$, and let $p \ge 1$. Then

$$E\Big|\sum_j (X_j - \bar{X}_j)\Big|^p \le 2^p\, E_X E_\epsilon \Big|\sum_j \epsilon_j X_j\Big|^p.$$

Proof: Let $X'_j$ be an independent copy of $X_j$. Use, in order, Jensen's inequality:

$$E_X \Big|\sum_j (X_j - \bar{X}_j)\Big|^p \le E_X E_{X'} \Big|\sum_j (X_j - X'_j)\Big|^p.$$
48 Since $X_j - X'_j$ is symmetric for each $j$, we may replace the last expression by

$$E_X E_{X'} E_\epsilon \Big|\sum_j \epsilon_j (X_j - X'_j)\Big|^p.$$

By the triangle inequality:

$$\Big\|\sum_j \epsilon_j (X_j - X'_j)\Big\|_p \le \Big\|\sum_j \epsilon_j X_j\Big\|_p + \Big\|\sum_j \epsilon_j X'_j\Big\|_p = 2\,\Big\|\sum_j \epsilon_j X_j\Big\|_p.$$
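The symmetrization lemma can also be verified exactly on small discrete variables by enumerating all outcomes and all sign patterns; the Bernoulli parameters below are an arbitrary choice, not from the lecture:

```python
from itertools import product

# Exact check of the symmetrization lemma for small Bernoulli variables:
# E|sum_j (X_j - EX_j)|^p  <=  2^p * E_X E_eps |sum_j eps_j X_j|^p.
q = [0.3, 0.6, 0.5]              # P(X_j = 1); the means are q_j
p = 4

def prob(xs):
    """Probability of the outcome vector xs under independent Bernoullis."""
    pr = 1.0
    for x, qq in zip(xs, q):
        pr *= qq if x == 1 else 1 - qq
    return pr

lhs = sum(prob(xs) * abs(sum(x - m for x, m in zip(xs, q))) ** p
          for xs in product((0, 1), repeat=len(q)))

rhs = 0.0
for xs in product((0, 1), repeat=len(q)):
    inner = sum(abs(sum(e * x for e, x in zip(eps, xs))) ** p
                for eps in product((-1, 1), repeat=len(q))) / 2 ** len(q)
    rhs += prob(xs) * inner
rhs *= 2 ** p

assert lhs <= rhs
print(lhs, rhs)
```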
49-51 Let $X = \{X_i\}_{i=1}^m$ be a set of $m$ row vectors in $R^N$, with $\|X_i\|_\infty \le K$ for some constant $K > 0$. We use these vectors to define a quasi-metric on $R^N$:

Definitions:
$$d_X(x, y) = \max_i |X_i(x - y)|.$$

Define the quasi-balls $B_X(x, r) = \{ y \in R^N : d_X(y, x) \le r \}$.

Note that the unit $\ell_1$-ball is contained in $B_X(0, K)$. Note that $B_X(x, r)$ contains the $\ell_2$-ball $B_2(x, r_1)$, where $r_1 = r/(K\sqrt{s})$. Thus we inherit the covering estimate (Lemma 4.3b) from Euclidean balls.

Lemma 4.3: Let $0 < r < 1$. Then
$$N_X(s) \le \binom{N}{s}\Big(1 + \frac{2}{r}\Big)^s \le \Big(Ne\big(1 + \tfrac{2}{r}\big)/s\Big)^s \Big/ \sqrt{2\pi s}.$$
52 We get Lemma 7.5: Let $0 < r \le K\sqrt{s}$. The set $X(s)$ of $s$-sparse vectors of length less than one can be covered by $N_X^{(s)}(r)$ balls $B_X(x_i, r)$, where

$$N_X^{(s)}(r) \le \binom{N}{s}\Big(1 + \frac{2K\sqrt{s}}{r}\Big)^s \le \Big(Ne\big(1 + \tfrac{2K\sqrt{s}}{r}\big)/s\Big)^s \Big/ \sqrt{2\pi s}.$$
53 For large $r$ we can get a better covering estimate.

Covering Lemma 7.6: Let $0 < r < K$. Let $M \ge 8 K^2 r^{-2} \log\big(e^{1/8}\sqrt{2}\,m\big)$, and let $\{x_i\}$ be the set of grid points in the $\ell_1$ unit cube with mesh size $\frac{1}{M}$, i.e. the set of points satisfying $\|x\|_1 \le 1$ and $M x \in Z^N$. Then $\{x : \|x\|_1 \le 1\}$ is contained in $\bigcup_i B_X(x_i, r)$. The number of grid points is less than

$$\binom{N + M}{M}\, 2^M \le \Big(\frac{2(N + M)e}{M}\Big)^M.$$
54 Proof: The proof is probabilistic. Fix an arbitrary point $x = (x_i)$ with $\|x\|_1 \le 1$. We will construct a random vector $Z$ with $EZ = x$ by assigning to it the value $e_i\,\mathrm{sgn}(x_i)$ with probability $|x_i|$ whenever $x_i \ne 0$, and the value $0$ with probability $1 - \|x\|_1$.
55 Let $Z_k$, $1 \le k \le M$, be independent copies of the random vector $Z$ and set

$$z = \frac{1}{M} \sum_{k=1}^M Z_k.$$

Then $z$ is a random vector with values in the grid set mentioned above and with $Ez = x$.
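This construction (Maurey's empirical method) is easy to simulate (numpy assumed; the vector `x` and grid size `M` below are arbitrary): $z$ lands on the mesh-$\frac{1}{M}$ grid, keeps $\|z\|_1 \le 1$, and concentrates around $x$ as $M$ grows.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_z(x, M):
    """z = average of M i.i.d. copies of Z, where Z = sign(x_i) * e_i
    with probability |x_i|, and Z = 0 with probability 1 - ||x||_1."""
    x = np.asarray(x, dtype=float)
    probs = np.concatenate([np.abs(x), [1.0 - np.abs(x).sum()]])
    idx = rng.choice(len(x) + 1, size=M, p=probs)  # index len(x) means Z = 0
    z = np.zeros_like(x)
    hits = idx[idx < len(x)]
    np.add.at(z, hits, np.sign(x[hits]))           # accumulate signed hits
    return z / M

x = np.array([0.4, -0.3, 0.0, 0.2])                # ||x||_1 <= 1
M = 5000
z = sample_z(x, M)
print(np.max(np.abs(z - x)))                       # small: z is close to x
```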
56 It is enough to show

$$E_Z \big(d_X(z, x)\big)^p \le r^p, \qquad (*)$$

for some $p > 1$. We have

$$E_Z \big(d_X(z, x)\big)^p = E_Z \max_i |X_i(z - x)|^p \le \sum_i E_Z |X_i(z - x)|^p.$$

Now fix $i$ and note that $E_Z\, X_i(z) = X_i(x)$.
57 The symmetrization lemma and the Khintchine inequality give

$$E_Z |X_i(z) - X_i(x)|^{2n} \le 2^{2n}\, E\Big|\frac{1}{M}\sum_{k=1}^M \epsilon_k X_i(Z_k)\Big|^{2n} \le 2^{2n} \sqrt{2}\,\Big(\frac{2n}{e}\Big)^n M^{-2n}\, E\Big(\sum_{k=1}^M (X_i(Z_k))^2\Big)^n \le 2^{2n} \sqrt{2}\,\Big(\frac{2n}{e}\Big)^n M^{-n} K^{2n}.$$

To get $\sum_i E_Z |X_i(z - x)|^{2n} \le r^{2n}$, we need

$$m \cdot 2^{2n} \sqrt{2}\,\Big(\frac{2n}{e}\Big)^n M^{-n} K^{2n} \le r^{2n}.$$
58 Thus we need

$$\sqrt{2}\, m \Big(\frac{p}{e}\Big)^{p/2} \Big(\frac{2K}{r\sqrt{M}}\Big)^p \le 1, \quad \text{for } p = 2n. \qquad (**)$$

Set $u = \frac{r\sqrt{M}}{2K}$, set $p = u^2$, and choose $2n$ to be the even integer nearest to $p$. The inequality (**) holds if

$$e^{1/8} \sqrt{2}\, m\, e^{-u^2/2} \le 1.$$

Remark: The factor $e^{1/8}$ arises since the local minimum of the analytic expression is at $p = u^2$, which may not be an even integer.
59 Differentiating the logarithm of the left-hand side of (**) with respect to $p$, we get

$$\frac{1}{2}\log p - \log u,$$

and the second derivative is $\frac{1}{2p}$, which is no more than $\frac{1}{4}$ if $p \ge 2$. We conclude that at the even integer nearest the local minimum point $p = u^2$, the value is within a factor $e^{b(u)}$ of its minimum value, where $b(u) \le \frac{1}{8}$ provided $u \ge 2$. Thus

$$\frac{r\sqrt{M}}{2K} = u \ge \sqrt{2\log\big(e^{1/8}\sqrt{2}\,m\big)},$$

that is,

$$M \ge 8 K^2 r^{-2} \log\big(e^{1/8}\sqrt{2}\,m\big).$$
60 Following the idea from Rauhut-Yani, we want to prove

$$E\,\delta_s^{2n} \le C(K, N, m, s)\big(E\,\delta_s^{n} + 1\big),$$

where the constant $C(K, N, m, s)$ is small enough. This will give us an estimate of $E\,\delta_s^{2n}$, from which the probability $P(\delta_s > \delta)$ can be calculated.
More informationCase study: stochastic simulation via Rademacher bootstrap
Case study: stochastic simulation via Rademacher bootstrap Maxim Raginsky December 4, 2013 In this lecture, we will look at an application of statistical learning theory to the problem of efficient stochastic
More information1 Review of The Learning Setting
COS 5: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #8 Scribe: Changyan Wang February 28, 208 Review of The Learning Setting Last class, we moved beyond the PAC model: in the PAC model we
More information0.1 Uniform integrability
Copyright c 2009 by Karl Sigman 0.1 Uniform integrability Given a sequence of rvs {X n } for which it is known apriori that X n X, n, wp1. for some r.v. X, it is of great importance in many applications
More informationConstructing Explicit RIP Matrices and the Square-Root Bottleneck
Constructing Explicit RIP Matrices and the Square-Root Bottleneck Ryan Cinoman July 18, 2018 Ryan Cinoman Constructing Explicit RIP Matrices July 18, 2018 1 / 36 Outline 1 Introduction 2 Restricted Isometry
More informationSOME CONVERSE LIMIT THEOREMS FOR EXCHANGEABLE BOOTSTRAPS
SOME CONVERSE LIMIT THEOREMS OR EXCHANGEABLE BOOTSTRAPS Jon A. Wellner University of Washington The bootstrap Glivenko-Cantelli and bootstrap Donsker theorems of Giné and Zinn (990) contain both necessary
More informationSparse Linear Systems. Iterative Methods for Sparse Linear Systems. Motivation for Studying Sparse Linear Systems. Partial Differential Equations
Sparse Linear Systems Iterative Methods for Sparse Linear Systems Matrix Computations and Applications, Lecture C11 Fredrik Bengzon, Robert Söderlund We consider the problem of solving the linear system
More informationCS 6820 Fall 2014 Lectures, October 3-20, 2014
Analysis of Algorithms Linear Programming Notes CS 6820 Fall 2014 Lectures, October 3-20, 2014 1 Linear programming The linear programming (LP) problem is the following optimization problem. We are given
More informationLecture Learning infinite hypothesis class via VC-dimension and Rademacher complexity;
CSCI699: Topics in Learning and Game Theory Lecture 2 Lecturer: Ilias Diakonikolas Scribes: Li Han Today we will cover the following 2 topics: 1. Learning infinite hypothesis class via VC-dimension and
More informationIn particular, if A is a square matrix and λ is one of its eigenvalues, then we can find a non-zero column vector X with
Appendix: Matrix Estimates and the Perron-Frobenius Theorem. This Appendix will first present some well known estimates. For any m n matrix A = [a ij ] over the real or complex numbers, it will be convenient
More informationMath General Topology Fall 2012 Homework 1 Solutions
Math 535 - General Topology Fall 2012 Homework 1 Solutions Definition. Let V be a (real or complex) vector space. A norm on V is a function : V R satisfying: 1. Positivity: x 0 for all x V and moreover
More informationConstructive bounds for a Ramsey-type problem
Constructive bounds for a Ramsey-type problem Noga Alon Michael Krivelevich Abstract For every fixed integers r, s satisfying r < s there exists some ɛ = ɛ(r, s > 0 for which we construct explicitly an
More informationConditional distributions (discrete case)
Conditional distributions (discrete case) The basic idea behind conditional distributions is simple: Suppose (XY) is a jointly-distributed random vector with a discrete joint distribution. Then we can
More informationReal Analysis Math 125A, Fall 2012 Sample Final Questions. x 1+x y. x y x. 2 (1+x 2 )(1+y 2 ) x y
. Define f : R R by Real Analysis Math 25A, Fall 202 Sample Final Questions f() = 3 + 2. Show that f is continuous on R. Is f uniformly continuous on R? To simplify the inequalities a bit, we write 3 +
More informationconverges as well if x < 1. 1 x n x n 1 1 = 2 a nx n
Solve the following 6 problems. 1. Prove that if series n=1 a nx n converges for all x such that x < 1, then the series n=1 a n xn 1 x converges as well if x < 1. n For x < 1, x n 0 as n, so there exists
More informationMultivariate Differentiation 1
John Nachbar Washington University February 23, 2017 1 Preliminaries. Multivariate Differentiation 1 I assume that you are already familiar with standard concepts and results from univariate calculus;
More informationCompressive Sensing with Random Matrices
Compressive Sensing with Random Matrices Lucas Connell University of Georgia 9 November 017 Lucas Connell (University of Georgia) Compressive Sensing with Random Matrices 9 November 017 1 / 18 Overview
More informationAnalysis-3 lecture schemes
Analysis-3 lecture schemes (with Homeworks) 1 Csörgő István November, 2015 1 A jegyzet az ELTE Informatikai Kar 2015. évi Jegyzetpályázatának támogatásával készült Contents 1. Lesson 1 4 1.1. The Space
More informationMidterm Exam Information Theory Fall Midterm Exam. Time: 09:10 12:10 11/23, 2016
Midterm Exam Time: 09:10 12:10 11/23, 2016 Name: Student ID: Policy: (Read before You Start to Work) The exam is closed book. However, you are allowed to bring TWO A4-size cheat sheet (single-sheet, two-sided).
More informationInterpolation via weighted l 1 -minimization
Interpolation via weighted l 1 -minimization Holger Rauhut RWTH Aachen University Lehrstuhl C für Mathematik (Analysis) Mathematical Analysis and Applications Workshop in honor of Rupert Lasser Helmholtz
More informationAssignment 1: From the Definition of Convexity to Helley Theorem
Assignment 1: From the Definition of Convexity to Helley Theorem Exercise 1 Mark in the following list the sets which are convex: 1. {x R 2 : x 1 + i 2 x 2 1, i = 1,..., 10} 2. {x R 2 : x 2 1 + 2ix 1x
More informationProbabilistic Combinatorics. Jeong Han Kim
Probabilistic Combinatorics Jeong Han Kim 1 Tentative Plans 1. Basics for Probability theory Coin Tossing, Expectation, Linearity of Expectation, Probability vs. Expectation, Bool s Inequality 2. First
More informationApproximation of Minimal Functions by Extreme Functions
Approximation of Minimal Functions by Extreme Functions Teresa M. Lebair and Amitabh Basu August 14, 2017 Abstract In a recent paper, Basu, Hildebrand, and Molinaro established that the set of continuous
More informationHölder s and Minkowski s Inequality
Hölder s and Minkowski s Inequality James K. Peterson Department of Biological Sciences and Department of Mathematical Sciences Clemson University September 1, 218 Outline Conjugate Exponents Hölder s
More informationMath 61CM - Solutions to homework 6
Math 61CM - Solutions to homework 6 Cédric De Groote November 5 th, 2018 Problem 1: (i) Give an example of a metric space X such that not all Cauchy sequences in X are convergent. (ii) Let X be a metric
More informationThe Arzelà-Ascoli Theorem
John Nachbar Washington University March 27, 2016 The Arzelà-Ascoli Theorem The Arzelà-Ascoli Theorem gives sufficient conditions for compactness in certain function spaces. Among other things, it helps
More information6.842 Randomness and Computation March 3, Lecture 8
6.84 Randomness and Computation March 3, 04 Lecture 8 Lecturer: Ronitt Rubinfeld Scribe: Daniel Grier Useful Linear Algebra Let v = (v, v,..., v n ) be a non-zero n-dimensional row vector and P an n n
More information524 Jan-Olov Stromberg practice, this imposes strong restrictions both on N and on d. The current state of approximation theory is essentially useless
Doc. Math. J. DMV 523 Computation with Wavelets in Higher Dimensions Jan-Olov Stromberg 1 Abstract. In dimension d, a lattice grid of size N has N d points. The representation of a function by, for instance,
More informationECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis
ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis Lecture 3: Sparse signal recovery: A RIPless analysis of l 1 minimization Yuejie Chi The Ohio State University Page 1 Outline
More information18.175: Lecture 8 Weak laws and moment-generating/characteristic functions
18.175: Lecture 8 Weak laws and moment-generating/characteristic functions Scott Sheffield MIT 18.175 Lecture 8 1 Outline Moment generating functions Weak law of large numbers: Markov/Chebyshev approach
More information