SUPPLEMENT TO "SLOPE IS ADAPTIVE TO UNKNOWN SPARSITY AND ASYMPTOTICALLY MINIMAX"

By Weijie Su and Emmanuel Candès
Stanford University

Supported in part by a General Wang Yaowu Stanford Graduate Fellowship. Supported in part by NSF under grant CCF and by the Math + X Award from the Simons Foundation.


APPENDIX A: PROOFS OF TECHNICAL RESULTS

As is standard, we write $a_n \asymp b_n$ for two positive sequences $a_n$ and $b_n$ if there exist two constants $C_1$ and $C_2$, possibly depending on $q$, such that $C_1 a_n \le b_n \le C_2 a_n$ for all $n$. Also, we write $a_n \sim b_n$ if $a_n/b_n \to 1$.

A.1. Proofs for Section 2. We remind the reader that the proofs in this subsection rely on some lemmas to be stated later in the Appendix.

Proof of (2.1). For simplicity, denote by $\widehat\beta$ the full Lasso solution $\widehat\beta_{\mathrm{Lasso}}$, and by $\widehat b_S$ the solution to the reduced Lasso problem
\[ \operatorname*{minimize}_{b \in \mathbb{R}^k} \ \tfrac12 \|y - X_S b\|^2 + \lambda \|b\|_1, \]
where $S$ is the support of the ground truth $\beta$. We show that (i)

(A.1) $\|X_{\bar S}^\top z\|_\infty \le (1 + c/2)\sqrt{2\log p}$

and (ii)

(A.2) $\|X_{\bar S}^\top X_S(\beta_S - \widehat b_S)\|_\infty < C_1 \sqrt{k\log^2 p/n}$

for some constant $C_1$ both happen with probability tending to one. Now observe that
\[ X_{\bar S}^\top (y - X_S \widehat b_S) = X_{\bar S}^\top z + X_{\bar S}^\top X_S(\beta_S - \widehat b_S). \]
Hence, combining (A.1) and (A.2) and using the fact that $k\log p/n \to 0$ gives
\[ \|X_{\bar S}^\top (y - X_S \widehat b_S)\|_\infty \le \|X_{\bar S}^\top X_S(\beta_S - \widehat b_S)\|_\infty + \|X_{\bar S}^\top z\|_\infty \le C_1\sqrt{k\log^2 p/n} + (1+c/2)\sqrt{2\log p} = o\bigl(\sqrt{2\log p}\bigr) + (1+c/2)\sqrt{2\log p} < (1+c)\sqrt{2\log p} \]
with probability approaching one. This last inequality, together with the fact that $\widehat b_S$ obeys the KKT conditions for the reduced Lasso problem, implies that $\widehat b_S$ padded with zeros on $\bar S$ obeys the KKT conditions for the full Lasso problem and is, therefore, a solution.

We need to justify (A.1) and (A.2). First, Lemmas A.6 and A.5 imply (A.1). Next, to show (A.2), we rewrite the left-hand side of (A.2) as
\[ X_{\bar S}^\top X_S(\beta_S - \widehat b_S) = X_{\bar S}^\top X_S (X_S^\top X_S)^{-1}\bigl( X_S^\top (y - X_S\widehat b_S) - X_S^\top z \bigr). \]
By Lemma A.7, we have that
\[ \bigl\| X_S^\top(y - X_S \widehat b_S) - X_S^\top z \bigr\| \le \sqrt{k}\,\lambda + \|X_S^\top z\| \le \sqrt{k}\,\lambda + \sqrt{32 k \log(p/k)} \le C\sqrt{k\log p} \]
holds with probability at least $1 - e^{-n/2} - (2ek/p)^k$. In addition, Lemma A.11 with $t = 1/2$ gives
\[ \bigl\| X_S (X_S^\top X_S)^{-1} \bigr\| \le \frac{1}{1 - \sqrt{k/n} - 1/2} < 3 \]
with probability at least $1 - e^{-n/8}$. Hence, from the last two inequalities it follows that

(A.3) $\bigl\| X_S(X_S^\top X_S)^{-1}\bigl( X_S^\top(y - X_S\widehat b_S) - X_S^\top z \bigr) \bigr\| \le C\sqrt{k\log p}$

with probability at least $1 - e^{-n/2} - (2ek/p)^k - e^{-n/8}$. Since $X_{\bar S}$ is independent of $X_S(X_S^\top X_S)^{-1}(X_S^\top(y - X_S\widehat b_S) - X_S^\top z)$, Lemma A.6 gives
\[ \bigl\| X_{\bar S}^\top X_S(X_S^\top X_S)^{-1}\bigl(X_S^\top(y - X_S\widehat b_S) - X_S^\top z\bigr) \bigr\|_\infty \le \sqrt{\frac{2\log p}{n}}\, \bigl\| X_S(X_S^\top X_S)^{-1}\bigl(X_S^\top(y - X_S\widehat b_S) - X_S^\top z\bigr) \bigr\| \]
with probability approaching one. Combining this with (A.3) gives (A.2).

Let $\widetilde b_S$ be the solution to
\[ \operatorname*{minimize}_{b \in \mathbb{R}^k} \ \tfrac12 \bigl\| \beta_S + X_S^\top z - b \bigr\|^2 + \lambda \|b\|_1. \]
To complete the proof of (2.1), it suffices to establish (i) that for any constant $\delta > 0$,

(A.4) $\sup_{\|\beta\|_0 \le k} \mathbb{P}\Bigl( \frac{\|\widetilde b_S - \beta_S\|^2}{2(1+c)^2 k\log p} > 1 - \delta \Bigr) \to 1,$

and (ii) that

(A.5) $\|\widehat b_S - \widetilde b_S\| = o_{\mathbb{P}}\bigl( \|\widetilde b_S - \beta_S\| \bigr),$

since (A.4) and (A.5) give

(A.6) $\sup_{\|\beta\|_0 \le k} \mathbb{P}\Bigl( \frac{\|\widehat b_S - \beta_S\|^2}{2(1+c)^2 k\log p} > 1 - \delta \Bigr) \to 1$

for each $\delta > 0$. Note that taking $\delta = 1/(1+c)^2$ in (A.6) and using the fact that $\widehat b_S$ (padded with zeros) is a solution to the Lasso with probability approaching one finishes the proof.

Proof of (A.4). Let $\beta_i = M$ if $i \in S$ and otherwise zero, where $M$ is a sufficiently large positive constant. For each $i \in S$, $\widetilde b_{S,i} = \beta_i + X_i^\top z - \lambda$, and
\[ |\widetilde b_{S,i} - \beta_i| = |X_i^\top z - \lambda| \ge \lambda - X_i^\top z. \]
On the event $\{\max_{i\in S} |X_i^\top z| \le \lambda\}$, which happens with probability tending to one, this inequality gives
\[ \|\widetilde b_S - \beta_S\|^2 \ge \sum_{i\in S} (\lambda - X_i^\top z)^2 = k\lambda^2 - 2\lambda \sum_{i\in S} X_i^\top z + \sum_{i\in S} (X_i^\top z)^2 = (1 + o_{\mathbb{P}}(1))\cdot 2(1+c)^2 k\log p, \]
where we have used that both $\sum_{i\in S}(X_i^\top z)^2$ and $\sum_{i\in S} X_i^\top z$ are $O_{\mathbb{P}}(k)$. This proves the claim.

Proof of (A.5). Apply Lemma 4.2 with $T$ replaced by $S$ (here each of $\widehat b_S$, $\widetilde b_S$ and $\beta$ is supported on $S$). Since $k/n \to 0$, for any constant $\delta > 0$ all the singular values of $X_S$ lie in $(1-\delta, 1+\delta)$ with overwhelming probability (see, for example, [5]). Consequently, Lemma 4.2 ensures (A.5).
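The quantity controlled in (A.4) is simple enough to check by direct simulation: $\widetilde b_S$ is nothing but soft-thresholding of $\beta_S + X_S^\top z$ at level $\lambda = (1+c)\sqrt{2\log p}$. Below is a minimal sketch of ours (not from the paper; it assumes numpy, and the constants are arbitrary) comparing $\|\widetilde b_S - \beta_S\|^2$ with $2(1+c)^2 k\log p$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k, c = 2000, 5000, 10, 0.5
lam = (1 + c) * np.sqrt(2 * np.log(p))
X = rng.standard_normal((n, p)) / np.sqrt(n)    # i.i.d. N(0, 1/n) design, as in the paper
z = rng.standard_normal(n)
beta_S = 100.0 * np.ones(k)                     # beta_i = M on S with M large (proof of (A.4))
shifted = beta_S + X[:, :k].T @ z               # beta_S + X_S^T z, taking S = {1, ..., k}
b_tilde = np.sign(shifted) * np.maximum(np.abs(shifted) - lam, 0.0)  # soft thresholding
print(np.sum((b_tilde - beta_S) ** 2))          # close to the next number
print(2 * (1 + c) ** 2 * k * np.log(p))
```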

Proof of (2.2). We assume $\sigma = 1$ and put $\lambda = \lambda^{\mathrm{BH}}$. As in the proof of Theorem 1.1, we decompose the total loss as
\[ \|\widehat\beta_{\mathrm{Seq}} - \beta\|^2 = \|\widehat\beta_{\mathrm{Seq},S} - \beta_S\|^2 + \|\widehat\beta_{\mathrm{Seq},\bar S} - \beta_{\bar S}\|^2 = \|\widehat\beta_{\mathrm{Seq},S} - \beta_S\|^2 + \|\widehat\beta_{\mathrm{Seq},\bar S}\|^2. \]
The largest possible value of the loss off support is achieved when $y_{\bar S}$ is sequentially soft-thresholded by $\lambda^{[k]}$. Hence, by the proof of Lemma 3.3, we obtain $\mathbb{E}\|\widehat\beta_{\mathrm{Seq},\bar S}\|^2 = o(2k\log(p/k))$ for all $k$-sparse $\beta$.

Now, we turn to the loss on support. For any $i \in S$, the loss is at most
\[ \bigl( |z_i| + \lambda_{r(i)} \bigr)^2 = \lambda_{r(i)}^2 + z_i^2 + 2|z_i|\lambda_{r(i)}. \]
Summing the above over all $i \in S$ gives
\[ \mathbb{E}\|\widehat\beta_{\mathrm{Seq},S} - \beta_S\|^2 \le \sum_{i=1}^k \lambda_i^2 + \mathbb{E}\sum_{i\in S} z_i^2 + 2\,\mathbb{E}\sum_{i\in S} |z_i|\lambda_{r(i)}. \]
Note that the first term $\sum_{i=1}^k \lambda_i^2 = (1+o(1))\,2k\log(p/k)$, and the second term has expectation $\mathbb{E}\sum_{i\in S} z_i^2 = k = o(2k\log(p/k))$, so that it suffices to show that

(A.7) $\mathbb{E}\Bigl[ 2\sum_{i\in S} |z_i|\lambda_{r(i)} \Bigr] = o(2k\log(p/k)).$

We emphasize that both $z_i$ and $r(i)$ are random, so that $\{\lambda_{r(i)}\}_{i\in S}$ and $\{|z_i|\}_{i\in S}$ may not be independent. Without loss of generality, assume $S = \{1,\ldots,k\}$ and, for $1 \le i \le k$, let $r'_i$ be the rank of the $i$th observation among the first $k$. Since $\lambda$ is nonincreasing and $r'_i \le r(i)$, we have
\[ \sum_{1\le i\le k} |z_i|\lambda_{r(i)} \le \sum_{1\le i\le k} |z_i|\lambda_{r'_i} \le \sum_{1\le i\le k} |z|_{(i)}\lambda_i, \]
where $|z|_{(1)} \ge \cdots \ge |z|_{(k)}$ are the order statistics of $|z_1|, \ldots, |z_k|$. The second inequality follows from the fact that for any nonnegative sequences $\{a_i\}$ and $\{b_i\}$, $\sum_i a_i b_i \le \sum_i a_{(i)} b_{(i)}$, where both sequences on the right are sorted in nonincreasing order. Therefore, letting $\zeta_1, \ldots, \zeta_k$ be i.i.d. $\mathcal{N}(0,1)$, (A.7) follows from the estimate

(A.8) $\sum_{i=1}^k \lambda_i\, \mathbb{E}|\zeta|_{(i)} = o(2k\log(p/k)).$

To argue about (A.8), we work with the approximations $\lambda_i \asymp \sqrt{2\log(p/i)}$ and $\mathbb{E}|\zeta|_{(i)} = O\bigl(\sqrt{\log(2k/i)}\bigr)$ (see, e.g., (A.15)), so that the claim is a consequence of
\[ \sum_{i=1}^k \sqrt{\log(p/i)\,\log(2k/i)} = o(2k\log(p/k)), \]
which is justified as follows:
\[ \sum_{i=1}^k \sqrt{\log(p/i)\log(2k/i)} \le k\int_0^1 \sqrt{\log\frac{p}{kx}\,\log\frac{2}{x}}\,\mathrm{d}x \le k\sqrt{\log(p/k)}\int_0^1 \sqrt{\log\frac{2}{x}}\,\mathrm{d}x + k\int_0^1 \sqrt{\log\frac{1}{x}\log\frac{2}{x}}\,\mathrm{d}x \le C_1\, k\sqrt{\log(p/k)} + C_2\, k \]
for some absolute constants $C_1, C_2$. Since $\log(p/k) \to \infty$, it is clear that the right-hand side of the above display is $o(2k\log(p/k))$.

A.2. Proofs for Section 3. To begin with, we derive a dual formulation of the SLOPE program (1.6), which provides a nice geometrical interpretation. This dual formulation will also be used in the proof of Lemma 4.3. Our exposition largely borrows from [2]. Rewrite (1.6) as

(A.9) $\operatorname*{minimize}_{b,\,r}\ \tfrac12\|r\|^2 + \sum_i \lambda_i |b|_{(i)} \quad \text{subject to} \quad Xb + r = y,$

whose Lagrangian is
\[ L(b, r, \nu) := \tfrac12\|r\|^2 + \sum_i \lambda_i |b|_{(i)} - \nu^\top(Xb + r - y). \]
Hence, the dual objective is given by
\[ \inf_{b,r} L(b,r,\nu) = \nu^\top y - \sup_r\Bigl\{\nu^\top r - \tfrac12\|r\|^2\Bigr\} - \sup_b\Bigl\{ \nu^\top X b - \sum_i \lambda_i |b|_{(i)} \Bigr\} = \begin{cases} \nu^\top y - \tfrac12\|\nu\|^2, & \nu \in \mathcal{C}_{\lambda,X},\\ -\infty, & \text{otherwise}, \end{cases} \]
where $\mathcal{C}_{\lambda,X} := \{\nu : X^\top\nu \text{ is majorized by } \lambda\}$ is a convex polytope. It thus follows that the dual reads

(A.10) $\operatorname*{maximize}_\nu\ \nu^\top y - \tfrac12\|\nu\|^2 \quad \text{subject to} \quad \nu \in \mathcal{C}_{\lambda,X}.$

The equality $\nu^\top y - \|\nu\|^2/2 = -\|y - \nu\|^2/2 + \|y\|^2/2$ reveals that the dual solution $\widehat\nu$ is indeed the projection of $y$ onto $\mathcal{C}_{\lambda,X}$. The minimization of the Lagrangian over $r$ is attained at $\widehat r = \widehat\nu$. This implies that the primal solution $\widehat\beta$ and the dual solution $\widehat\nu$ obey

(A.11) $y - X\widehat\beta = \widehat\nu.$

We turn to proving the facts.

Proof of Fact 3.1. Without loss of generality, suppose both $a$ and $b$ are nonnegative and arranged in nonincreasing order. Denote by $T^a_k$ the sum of the first $k$ terms of $a$, with $T^a_0 \equiv 0$, and similarly for $b$. We have
\[ \sum_{k=1}^p a_k^2 = \sum_{k=1}^p a_k\bigl( T^a_k - T^a_{k-1} \bigr) = \sum_{k=1}^{p-1} T^a_k(a_k - a_{k+1}) + a_p T^a_p \le \sum_{k=1}^{p-1} T^b_k(a_k - a_{k+1}) + a_p T^b_p = \sum_{k=1}^p a_k b_k. \]
Similarly,
\[ \sum_{k=1}^p b_k^2 = \sum_{k=1}^{p-1} T^b_k(b_k - b_{k+1}) + b_p T^b_p \ge \sum_{k=1}^{p-1} T^a_k(b_k - b_{k+1}) + b_p T^a_p = \sum_{k=1}^p b_k\bigl( T^a_k - T^a_{k-1} \bigr) = \sum_{k=1}^p a_k b_k, \]
which proves the claim.

Proofs of Facts 3.2 and 3.3. Taking $X = I_p$ in the dual formulation, (A.11) immediately implies that $a - \mathrm{prox}_\lambda(a)$ is the projection of $a$ onto the polytope $\mathcal{C}_{\lambda, I_p}$. By definition, $\mathcal{C}_{\lambda,I_p}$ consists of all vectors majorized by $\lambda$. Hence, $a - \mathrm{prox}_\lambda(a)$ is always majorized by $\lambda$. In particular, if $a$ is majorized by $\lambda$, then the projection of $a$ is identical to $a$ itself. This gives $\mathrm{prox}_\lambda(a) = 0$.
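Since $a - \mathrm{prox}_\lambda(a)$ is a Euclidean projection onto $\mathcal{C}_{\lambda, I_p}$, the prox itself is computable exactly by sorting followed by one pool-adjacent-violators pass, as in Algorithm 3 (FastProxSL1) of [2]. The following sketch re-implements that idea for illustration (it assumes numpy; the function name and interface are ours, not those of [2]).

```python
import numpy as np

def prox_sorted_l1(a, lam):
    """argmin_b 0.5*||a - b||^2 + sum_i lam_i * |b|_(i), with lam nonincreasing.

    One pool-adjacent-violators pass on |a| sorted decreasingly, in the
    spirit of Algorithm 3 (FastProxSL1) of [2]; re-implemented here only
    for illustration.
    """
    order = np.argsort(-np.abs(a))
    v = np.abs(a)[order] - lam                  # differences to be averaged
    start, end, total = [], [], []              # merged blocks: [start, end], sum
    for i in range(len(v)):
        start.append(i); end.append(i); total.append(v[i])
        # merge while the block averages fail to be nonincreasing
        while len(total) > 1 and total[-2] * (end[-1] - start[-1] + 1) <= \
                total[-1] * (end[-2] - start[-2] + 1):
            total[-2] += total[-1]; end[-2] = end[-1]
            start.pop(); end.pop(); total.pop()
    x = np.zeros(len(v))
    for s, e, t in zip(start, end, total):
        x[s:e + 1] = max(t / (e - s + 1), 0.0)  # clip negative averages at zero
    out = np.zeros(len(a))
    out[order] = x
    return np.sign(a) * out

lam = np.array([3.0, 2.0, 1.0])
print(prox_sorted_l1(np.array([4.0, -1.0, 0.5]), lam))  # [1., -0., 0.]
print(prox_sorted_l1(lam, lam))                         # zero vector
```

Feeding in any $a$ majorized by $\lambda$, e.g. $a = \lambda$ itself as in the last line, returns the zero vector, which is a quick numerical sanity check of Fact 3.3.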

Proof of Fact 3.4. Assume $a$ is nonnegative without loss of generality. It is intuitively obvious that $b \succeq a$ implies $\mathrm{prox}_\lambda(b) \succeq \mathrm{prox}_\lambda(a)$, where as usual $b \succeq a$ means that $b - a \in \mathbb{R}^p_+$. In other words, if the observations increase, the fitted values do not decrease. To save time, we directly verify this claim by using Algorithm 3 (FastProxSL1) from [2]. By the averaging step of that algorithm, we can see that for each $1 \le i, j \le p$,
\[ \frac{\partial\, [\mathrm{prox}_\lambda(a)]_i}{\partial a_j} = \begin{cases} \dfrac{1}{\#\{1\le k\le p:\ [\mathrm{prox}_\lambda(a)]_k = [\mathrm{prox}_\lambda(a)]_j\}}, & [\mathrm{prox}_\lambda(a)]_j = [\mathrm{prox}_\lambda(a)]_i > 0,\\[4pt] 0, & \text{otherwise}. \end{cases} \]
This holds for all $a \in \mathbb{R}^p$ except for a set of measure zero. The nonnegativity of $\partial[\mathrm{prox}_\lambda(a)]_i/\partial a_j$, along with the Lipschitz continuity of the prox, implies the monotonicity property. A consequence is that $[\mathrm{prox}_\lambda(a)]_T$ does not decrease as we let $a_i \to \infty$ for all $i \in \bar T$. In the limit, $[\mathrm{prox}_\lambda(a)]_T$ monotonically converges to $\mathrm{prox}_{\lambda^{[p-|T|]}}(a_T)$, where $\lambda^{[p-|T|]}$ consists of the $|T|$ smallest weights. This gives the desired inequality.

As a remark, we point out that the proofs of Facts 3.2 and 3.3 suggest a very simple proof of Lemma 3.1. Since $a - \mathrm{prox}_\lambda(a)$ is the projection of $a$ onto $\mathcal{C}_{\lambda,I_p}$, $\|\mathrm{prox}_\lambda(a)\|$ is thus the distance between $a$ and the polytope $\mathcal{C}_{\lambda,I_p}$. Hence, it suffices to find a point in the polytope at a distance of $\|(|a| - \lambda)_+\|$ away from $a$. The point $b$ defined as $b_i = \min\{|a_i|, \lambda_i\}$ does the job.

Now, we proceed to prove the preparatory lemmas for Theorem 1.1, namely, Lemmas A.3 and A.4. The first two lemmas below can be found in [1].

Lemma A.1. Let $U$ be a $\mathrm{Beta}(a, b)$ random variable. Then
\[ \mathbb{E}\log U = (\log\Gamma)'(a) - (\log\Gamma)'(a+b), \]
where $\Gamma$ denotes the Gamma function and $(\log\Gamma)'(x)$ is the derivative with respect to $x$.

Lemma A.2. For any integer $m \ge 1$,
\[ (\log\Gamma)'(m) = -\gamma + \sum_{j=1}^{m-1}\frac1j = \log m + O\Bigl(\frac1m\Bigr), \]
where $\gamma = 0.5772\ldots$ is the Euler constant.

Lemma A.3. Let $\zeta \sim \mathcal{N}(0, I_{p-k})$. Under the assumptions of Theorem 1.1, for any constant $A > 0$,
\[ \sum_{i=1}^{Ak} \mathbb{E}\bigl( |\zeta|_{(i)} - \lambda^{\mathrm{BH}}_{k+i} \bigr)_+^2 = o(2k\log(p/k)). \]

Proof of Lemma A.3. Write $\lambda_i = \lambda^{\mathrm{BH}}_i$ for simplicity. It is sufficient to prove a stronger version in which the order statistics $|\zeta|_{(i)}$ come from $p$ i.i.d. $\mathcal{N}(0,1)$. The reason is that the order statistics will be stochastically larger, thus enlarging $\mathbb{E}(|\zeta|_{(i)} - \lambda^{\mathrm{BH}}_{k+i})^2_+$, since $(|\zeta| - \lambda)^2_+$ is nondecreasing in $|\zeta|$. Applying the bias-variance decomposition, we get

(A.12) $\mathbb{E}\bigl( |\zeta|_{(i)} - \lambda_{k+i} \bigr)^2_+ \le \mathbb{E}\bigl( |\zeta|_{(i)} - \lambda_{k+i} \bigr)^2 = \mathrm{Var}\bigl( |\zeta|_{(i)} \bigr) + \bigl( \mathbb{E}|\zeta|_{(i)} - \lambda_{k+i} \bigr)^2.$

We proceed to control each term separately. For the variance, a direct application of Proposition 4.2 in [3] gives

(A.13) $\mathrm{Var}\bigl( |\zeta|_{(i)} \bigr) = O\Bigl( \frac{1}{i\log(p/i)} \Bigr)$

for all $i \le p/2$. Hence,
\[ \sum_{i=1}^{Ak}\mathrm{Var}\bigl( |\zeta|_{(i)} \bigr) = O\Bigl( \sum_{i=1}^{Ak}\frac{1}{i\log(p/i)} \Bigr) = o(2k\log(p/k)), \]
where the last step makes use of $\log(p/k) \to \infty$. It remains to show that

(A.14) $\sum_{i=1}^{Ak}\bigl( \mathbb{E}|\zeta|_{(i)} - \lambda_{k+i} \bigr)^2 = o(2k\log(p/k)).$

Let $U_1, \ldots, U_p$ be i.i.d. uniform random variables on $(0,1)$ and $U_{(i)}$ be the $i$th smallest (please note that, for a change, the $U_{(i)}$ are sorted in increasing order). We know that $U_{(i)}$ is distributed as $\mathrm{Beta}(i, p+1-i)$ and that $|\zeta|_{(i)}$ has the same distribution as $\Phi^{-1}(1 - U_{(i)}/2)$. Making use of Lemmas A.1 and A.2 then gives
\[ \mathbb{E}|\zeta|^2_{(i)} = \mathbb{E}\bigl[ \Phi^{-1}(1 - U_{(i)}/2)^2 \bigr] \le \mathbb{E}\bigl[ 2\log(2/U_{(i)}) \bigr] = 2\log 2 + 2\sum_{j=i}^p \frac1j = (1+o(1))\,2\log(p/i), \]
where the second step follows from $\Phi^{-1}(1 - U_{(i)}/2)^2 \le 2\log(2/U_{(i)})$; in fact, $\Phi^{-1}(1 - U_{(i)}/2)^2 = (1+o_{\mathbb{P}}(1))\,2\log(2/U_{(i)})$ for $i = o(p)$. As a result,

(A.15) $\mathbb{E}|\zeta|_{(i)} \le \sqrt{\mathbb{E}|\zeta|^2_{(i)}} = (1+o(1))\sqrt{2\log(p/i)}, \qquad \mathbb{E}|\zeta|_{(i)} = \sqrt{\mathbb{E}|\zeta|^2_{(i)} - \mathrm{Var}(|\zeta|_{(i)})} = (1+o(1))\sqrt{2\log(p/i)}.$

Similarly, since $k + i = o(p)$ and $q$ is constant, we have the approximation $\lambda_{k+i} = (1+o(1))\sqrt{2\log(p/(k+i))}$, which together with (A.15) reveals that

(A.16) $\bigl( \mathbb{E}|\zeta|_{(i)} - \lambda_{k+i} \bigr)^2 \le (1+o(1))\cdot 2\Bigl[ \sqrt{\log(p/i)} - \sqrt{\log(p/(k+i))} \Bigr]^2 + o\bigl( \log(p/i) \bigr).$

The second term on the right-hand side contributes at most $o(Ak\log(p/(Ak))) = o(2k\log(p/k))$ to the sum (A.14). For the first term, we get
\[ \Bigl[ \sqrt{\log(p/i)} - \sqrt{\log(p/(k+i))} \Bigr]^2 = \frac{\log^2(1 + k/i)}{\bigl[ \sqrt{\log(p/i)} + \sqrt{\log(p/(k+i))} \bigr]^2} = o\bigl( \log^2(1 + k/i) \bigr). \]
Hence, it contributes at most

(A.17) $o\Bigl( \sum_{i=1}^{Ak}\log^2(1 + k/i) \Bigr) = o\Bigl( k\int_0^A \log^2(1 + 1/x)\,\mathrm{d}x \Bigr) = o(2k\log(p/k)).$

Combining (A.17), (A.16) and (A.14) concludes the proof.
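Lemmas A.3 and A.4 together bound $\sum_i \mathbb{E}(|\zeta|_{(i)} - \lambda^{\mathrm{BH}}_{k+i})^2_+$ over the full range of $i$, and the bound is easy to probe by Monte Carlo. A sketch of ours (assuming numpy, and statistics.NormalDist from the standard library for $\Phi^{-1}$; the parameter values are arbitrary):

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
nd = NormalDist()
p, k, q, reps = 20000, 20, 0.1, 200
# BH(q) critical values: lambda_i = Phi^{-1}(1 - iq/(2p)), i = 1, ..., p
lam = np.array([nd.inv_cdf(1 - (i + 1) * q / (2 * p)) for i in range(p)])

total = 0.0
for _ in range(reps):
    zeta = np.sort(np.abs(rng.standard_normal(p - k)))[::-1]  # |zeta|_(1) >= ...
    total += np.sum(np.maximum(zeta - lam[k:], 0.0) ** 2)     # (|zeta|_(i) - lam_{k+i})_+^2

print(total / reps)           # small compared with the next line
print(2 * k * np.log(p / k))  # the benchmark 2k log(p/k)
```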

Lemma A.4. Let $\zeta \sim \mathcal{N}(0, I_{p-k})$ and $A > 0$ be any constant satisfying $q(1+A)/A < 1$. Then, under the assumptions of Theorem 1.1,
\[ \sum_{i=Ak+1}^{p-k} \mathbb{E}\bigl( |\zeta|_{(i)} - \lambda^{\mathrm{BH}}_{k+i} \bigr)^2_+ = o(2k\log(p/k)). \]

Proof of Lemma A.4. Again, write $\lambda_i = \lambda^{\mathrm{BH}}_i$ for simplicity. As in the proof of Lemma A.3, we work on a stronger version by assuming $\zeta \sim \mathcal{N}(0, I_p)$. Denote by $q' = q(1+A)/A$. For any $u \ge 0$, let
\[ \alpha_u := \mathbb{P}\bigl( |\mathcal{N}(0,1)| > \lambda_{k+i} + u \bigr) = 2\Phi(-\lambda_{k+i} - u). \]
Then $\mathbb{P}(|\zeta|_{(i)} > \lambda_{k+i} + u)$ is just a tail probability of the binomial distribution with $p$ trials and success probability $\alpha_u$. By the Chernoff bound, this probability is bounded as

(A.18) $\mathbb{P}\bigl( |\zeta|_{(i)} > \lambda_{k+i} + u \bigr) \le e^{-p\,\mathrm{KL}(i/p\,\|\,\alpha_u)},$

where $\mathrm{KL}(a\|b) := a\log\frac{a}{b} + (1-a)\log\frac{1-a}{1-b}$ is the Kullback--Leibler divergence. Note that

(A.19) $-\dfrac{\partial}{\partial b}\mathrm{KL}(i/p\,\|\,b) = \dfrac{i/p - b}{b(1-b)} \ge \dfrac{i - pb}{pb}$

for all $0 < b < i/p$. Hence, from (A.19) it follows that

(A.20) $\mathrm{KL}(i/p\,\|\,\alpha_u) - \mathrm{KL}(i/p\,\|\,\alpha_0) = \displaystyle\int_{\alpha_u}^{\alpha_0}\Bigl[ -\frac{\partial}{\partial b}\mathrm{KL}(i/p\,\|\,b) \Bigr]\mathrm{d}b \ge \int_{\alpha_u}^{\alpha_0}\frac{i - pb}{pb}\,\mathrm{d}b \ge \int_{\alpha_0 e^{-u\lambda_{k+i}}}^{\alpha_0}\frac{i - pb}{pb}\,\mathrm{d}b = \frac{iu\lambda_{k+i}}{p} - \alpha_0\bigl( 1 - e^{-u\lambda_{k+i}} \bigr),$

where the second inequality makes use of $\alpha_u \le \alpha_0 e^{-u\lambda_{k+i}}$. With the proviso that $q(1+A)/A < 1$ and $i \ge Ak$, it follows that

(A.21) $\alpha_0 = q(k+i)/p \le q' i/p.$

Hence, substituting (A.21) into (A.20), we see that (A.18) yields

(A.22) $\mathbb{P}\bigl( |\zeta|_{(i)} > \lambda_{k+i} + u \bigr) \le e^{-p[\mathrm{KL}(i/p\|\alpha_u) - \mathrm{KL}(i/p\|\alpha_0)]}\, e^{-p\,\mathrm{KL}(i/p\|\alpha_0)} \le \exp\bigl( -iu\lambda_{k+i} + q' i (1 - e^{-u\lambda_{k+i}}) \bigr).$

With this preparation, we conclude the proof of our lemma as follows:
\[ \mathbb{E}\bigl( |\zeta|_{(i)} - \lambda_{k+i} \bigr)^2_+ = \int_0^\infty \mathbb{P}\bigl( (|\zeta|_{(i)} - \lambda_{k+i})^2_+ > x \bigr)\,\mathrm{d}x = 2\int_0^\infty u\,\mathbb{P}\bigl( |\zeta|_{(i)} > \lambda_{k+i} + u \bigr)\,\mathrm{d}u, \]
and plugging in (A.22) gives
\[ \mathbb{E}\bigl( |\zeta|_{(i)} - \lambda_{k+i} \bigr)^2_+ \le 2\int_0^\infty u\exp\bigl( -iu\lambda_{k+i} + q' i(1 - e^{-u\lambda_{k+i}}) \bigr)\,\mathrm{d}u = \frac{2}{\lambda^2_{k+i}}\int_0^\infty x\bigl[ e^{-x} e^{q'(1 - e^{-x})} \bigr]^i\,\mathrm{d}x \le \frac{2}{\lambda_p^2}\int_0^\infty x\bigl[ e^{-x} e^{q'(1 - e^{-x})} \bigr]^i\,\mathrm{d}x. \]
This yields the upper bound
\[ \sum_{i=Ak+1}^{p-k}\mathbb{E}\bigl( |\zeta|_{(i)} - \lambda_{k+i} \bigr)^2_+ \le \frac{2}{\lambda_p^2}\sum_{i=1}^\infty \int_0^\infty x\bigl[ e^{-x} e^{q'(1 - e^{-x})} \bigr]^i\,\mathrm{d}x = \frac{2}{\Phi^{-1}(1 - q/2)^2}\int_0^\infty \frac{x\,e^{-x} e^{q'(1 - e^{-x})}}{1 - e^{-x} e^{q'(1 - e^{-x})}}\,\mathrm{d}x, \]
where we used $\lambda_p = \Phi^{-1}(1 - q/2)$. Since the integrand obeys
\[ \lim_{x\to 0}\frac{x\,e^{-x} e^{q'(1 - e^{-x})}}{1 - e^{-x} e^{q'(1 - e^{-x})}} = \frac{1}{1 - q'} \]
and decays exponentially fast as $x \to \infty$, we conclude that $\sum_{i=Ak+1}^{p-k}\mathbb{E}(|\zeta|_{(i)} - \lambda_{k+i})^2_+$ is bounded by a constant. This is a bit more than we need since $2k\log(p/k) \to \infty$.

A.3. Proofs for Section 4. In this paper, we often use the Borell inequality to show that $\mathbb{P}\bigl( \|\mathcal{N}(0, I_n)\| > \sqrt n + t \bigr) \le \exp(-t^2/2)$.

Lemma A.5 (Borell's inequality). Let $\zeta \sim \mathcal{N}(0, I_n)$ and $f$ be an $L$-Lipschitz continuous function on $\mathbb{R}^n$. Then, for every $t > 0$,
\[ \mathbb{P}\bigl( f(\zeta) > \mathbb{E} f(\zeta) + t \bigr) \le e^{-\frac{t^2}{2L^2}}. \]

Lemma A.6. Let $\zeta_1, \ldots, \zeta_p$ be i.i.d. $\mathcal{N}(0,1)$. Then
\[ \max_i |\zeta_i| \le \sqrt{2\log p} \]
holds with probability approaching one.

The latter classical result can be proved in many different ways. Suffice it to say that it follows from a more subtle fact, namely, that
\[ \sqrt{2\log p}\,\Bigl( \max_i |\zeta_i| - \sqrt{2\log p} \Bigr) + \frac{\log\log p + \log 4\pi}{2} \]
converges weakly to a Gumbel distribution [4].
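A quick Monte Carlo illustration of Lemma A.6 (a sketch of ours, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(2)
p, reps = 10**5, 100
thresh = np.sqrt(2 * np.log(p))
exceed = sum(np.abs(rng.standard_normal(p)).max() > thresh for _ in range(reps))
print(exceed / reps)   # this frequency tends to 0 as p grows (slowly)
```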

Proof of Lemma 4.3. Let $\widehat\beta^{\mathrm{lift}}$ be the lift of $\widehat b_T$ in the sense that $\widehat\beta^{\mathrm{lift}}_T = \widehat b_T$ and $\widehat\beta^{\mathrm{lift}}_{\bar T} = 0$, and let $|T| = m$. Further, set $\nu := y - X_T\widehat b_T = y - X\widehat\beta^{\mathrm{lift}}$. Applying (A.10) and (A.11) to the reduced SLOPE program, we get that $X_T^\top\nu$ is majorized by $\lambda_{[m]}$. By the assumption, $X_{\bar T}^\top\nu$ is majorized by $\lambda^{[m]}$. Hence, $X^\top\nu$ (the concatenation of $X_T^\top\nu$ and $X_{\bar T}^\top\nu$) is majorized by $\lambda = (\lambda_{[m]}, \lambda^{[m]})$. This confirms that $\nu$ is dual feasible with respect to the full SLOPE program. If additionally we show that

(A.23) $\tfrac12\|y - X\widehat\beta^{\mathrm{lift}}\|^2 + \sum_i \lambda_i |\widehat\beta^{\mathrm{lift}}|_{(i)} = \nu^\top y - \tfrac12\|\nu\|^2,$

then strong duality claims that $\widehat\beta^{\mathrm{lift}}$ and $\nu$ must, respectively, be the optimal solutions to the full primal and dual. In fact, (A.23) is self-evident. The right-hand side is the optimal value of the reduced dual (i.e., replacing $X$ and $\lambda$ by $X_T$ and $\lambda_{[m]}$ in (A.10)), while the left-hand side agrees with the optimal value of the reduced primal since
\[ \tfrac12\|y - X\widehat\beta^{\mathrm{lift}}\|^2 = \tfrac12\|y - X_T\widehat b_T\|^2 \quad \text{and} \quad \sum_{i=1}^p \lambda_i |\widehat\beta^{\mathrm{lift}}|_{(i)} = \sum_{i=1}^m \lambda_i |\widehat b_T|_{(i)}. \]
Since the reduced primal only has linear equality constraints and is clearly feasible, strong duality holds, and (A.23) follows from this.

Lemma A.7. Let $k' < p$ be any deterministic integer. Then
\[ \sup_{|T| = k'} \|X_T^\top z\| \le \sqrt{32 k'\log(p/k')} \]
with probability at least $1 - e^{-n/2} - (2ek'/p)^{k'}$. Above, the supremum is taken over all subsets of $\{1, \ldots, p\}$ with cardinality $k'$.

Proof of Lemma A.7. Conditional on $z$, the entries of $X^\top z$ are distributed as i.i.d. centered Gaussian random variables with variance $\|z\|^2/n$. This observation enables us to write
\[ X^\top z \overset{d}{=} \frac{\|z\|}{\sqrt n}\,(\zeta_1, \ldots, \zeta_p), \]
where $\zeta := (\zeta_1, \ldots, \zeta_p)$ consists of i.i.d. $\mathcal{N}(0,1)$ independent of $z$. Hence, it is sufficient to prove that
\[ \|z\| \le 2\sqrt n, \qquad |\zeta|^2_{(1)} + \cdots + |\zeta|^2_{(k')} \le 8k'\log(p/k') \]
simultaneously with probability at least $1 - e^{-n/2} - (2ek'/p)^{k'}$. From Lemma A.5, we know that $\mathbb{P}(\|z\| > 2\sqrt n) \le e^{-n/2}$, so we just need to establish the other inequality. To this end, observe that
\[ \mathbb{P}\bigl( |\zeta|^2_{(1)} + \cdots + |\zeta|^2_{(k')} > 8k'\log(p/k') \bigr) \le \frac{\mathbb{E}\, e^{\frac14\bigl( |\zeta|^2_{(1)} + \cdots + |\zeta|^2_{(k')} \bigr)}}{e^{2k'\log\frac{p}{k'}}} \le \frac{\sum_{i_1 < \cdots < i_{k'}} \mathbb{E}\, e^{\frac14\bigl( \zeta^2_{i_1} + \cdots + \zeta^2_{i_{k'}} \bigr)}}{e^{2k'\log\frac{p}{k'}}} = \binom{p}{k'} 2^{k'/2}\, e^{-2k'\log\frac{p}{k'}} \le \Bigl( \frac{2ek'}{p} \Bigr)^{k'}. \]
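The supremum in Lemma A.7 is attained at the $k'$ largest entries of $|X^\top z|$, which makes the lemma easy to simulate. A sketch of ours (assuming numpy; parameters arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, kp, reps = 200, 2000, 10, 50
bound = np.sqrt(32 * kp * np.log(p / kp))
worst = 0.0
for _ in range(reps):
    X = rng.standard_normal((n, p)) / np.sqrt(n)
    z = rng.standard_normal(n)
    top = np.sort(np.abs(X.T @ z))[-kp:]           # k' largest |X_i^T z|
    worst = max(worst, np.sqrt(np.sum(top ** 2)))  # = sup over |T| = k' of ||X_T^T z||
print(worst, bound)                                # worst stays well below the bound
```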

We record an elementary result, which simply follows from $\Phi^{-1}(1 - c/2) \asymp \sqrt{2\log(1/c)}$ for each $0 < c < 1$.

Lemma A.8. Fix $0 < q < 1$. Then, for all $k \le p/2$,
\[ \sum_{i=1}^k \bigl( \lambda^{\mathrm{BH}}_i \bigr)^2 \le C_q\, k\log(p/k) \]
for some constant $C_q > 0$.

In the next two lemmas, we use the BH(q) critical values $\lambda^{\mathrm{BH}}$ to majorize sequences of Gaussian order statistics. Again, $a \preceq b$ means that $b$ majorizes $a$.

Lemma A.9. Given any constant $c > 1/(1-q)$, suppose $\max\{ck,\, k + d\} \le k' < p$ for a deterministic sequence $d$ that diverges to $\infty$. Let $\zeta_1, \ldots, \zeta_{p-k}$ be i.i.d. $\mathcal{N}(0,1)$. Then
\[ \bigl( |\zeta|_{(k'-k+1)}, |\zeta|_{(k'-k+2)}, \ldots, |\zeta|_{(p-k)} \bigr) \preceq \bigl( \lambda^{\mathrm{BH}}_{k'+1}, \lambda^{\mathrm{BH}}_{k'+2}, \ldots, \lambda^{\mathrm{BH}}_p \bigr) \]
with probability approaching one.

Proof of Lemma A.9. It suffices to prove the stronger case where $\zeta \sim \mathcal{N}(0, I_p)$. Let $U_1, \ldots, U_p$ be i.i.d. uniform random variables on $[0,1]$ and $U_{(1)} \le \cdots \le U_{(p)}$ the corresponding order statistics. Since
\[ \bigl( |\zeta|_{(k'-k+1)}, \ldots, |\zeta|_{(p-k)} \bigr) \overset{d}{=} \bigl( \Phi^{-1}(1 - U_{(k'-k+1)}/2), \ldots, \Phi^{-1}(1 - U_{(p-k)}/2) \bigr), \]
the conclusion would follow from
\[ \mathbb{P}\bigl( U_{(k'-k+j)} \ge q(k'+j)/p,\ \forall j \in \{1, \ldots, p-k'\} \bigr) \to 1. \]
Let $E_1, \ldots, E_{p+1}$ be i.i.d. exponential random variables with mean $1$ and denote by $T_i = E_1 + \cdots + E_i$. Then the order statistics $U_{(i)}$ have the same joint distribution as $T_i/T_{p+1}$. Fixing an arbitrary constant $q' \in (q, (c-1)/c)$, we have
\[ \mathbb{P}\bigl( U_{(k'-k+j)} \ge q(k'+j)/p,\ \forall j \bigr) \ge \mathbb{P}\bigl( T_{k'-k+j} \ge q'(k'+j),\ \forall j \bigr) - \mathbb{P}\bigl( T_{p+1} > q'p/q \bigr). \]
Since $\mathbb{P}(T_{p+1} > q'p/q) \to 0$ by the law of large numbers, it is sufficient to prove

(A.24) $\mathbb{P}\bigl( T_{k'-k+j} \ge q'(k'+j),\ \forall j \in \{1, \ldots, p-k'\} \bigr) \to 1.$

This event can be rewritten as
\[ T_{k'-k+j} - T_{k'-k} - q'j \ \ge\ q'k' - T_{k'-k} \quad \text{for all } 1 \le j \le p-k'. \]
Hence, (A.24) is reduced to proving

(A.25) $\mathbb{P}\Bigl( \min_{1\le j\le p-k'}\bigl( T_{k'-k+j} - T_{k'-k} - q'j \bigr) \ge q'k' - T_{k'-k} \Bigr) \to 1.$

As a random walk, $T_{k'-k+j} - T_{k'-k} - q'j$ has i.i.d. increments with mean $1 - q' > 0$ and variance $1$. Thus, $\min_{1\le j\le p-k'}(T_{k'-k+j} - T_{k'-k} - q'j)$ converges weakly to a bounded random variable in distribution. Consequently, (A.25) holds if one can demonstrate that $q'k' - T_{k'-k}$ diverges to $-\infty$ in probability as $p \to \infty$. To see this, observe that
\[ q'k' - T_{k'-k} \le \frac{q'c}{c-1}(k'-k) - T_{k'-k}, \]
where we use the fact that $k' \ge ck$. Under our hypothesis, $q'c/(c-1) < 1$, so the process $\{\frac{q'c}{c-1}t - T_t : t \in \mathbb{N}\}$ is a random walk drifting towards $-\infty$. Recognizing that $k' - k \ge d \to \infty$, we see that $\frac{q'c}{c-1}(k'-k) - T_{k'-k}$ weakly diverges to $-\infty$ since it corresponds to a position of the preceding random walk at $t \to \infty$. This concludes the proof.

Lemma A.10. Let $\zeta_1, \ldots, \zeta_{p-k}$ be i.i.d. $\mathcal{N}(0,1)$. Then there exists a constant $C_q$ only depending on $q$ such that
\[ \bigl( |\zeta|_{(1)}, \ldots, |\zeta|_{(p-k)} \bigr) \preceq C_q\sqrt{\frac{\log p}{\log(p/k)}}\,\bigl( \lambda^{\mathrm{BH}}_{k+1}, \ldots, \lambda^{\mathrm{BH}}_p \bigr) \]
with probability tending to one as $p \to \infty$ and $k/p \to 0$.

Proof of Lemma A.10. Let $U_1, \ldots, U_{p-k}$ be i.i.d. uniform random variables on $[0,1]$ and replace $|\zeta|_{(i)}$ by $\Phi^{-1}(1 - U_{(i)}/2)$. Note that
\[ \Phi^{-1}(1 - U_{(i)}/2) \le \sqrt{2\log(2/U_{(i)})}, \qquad \lambda^{\mathrm{BH}}_{k+i} \gtrsim \sqrt{2\log\frac{2p}{k+i}}. \]
Hence, it suffices to prove that, for some constant $\kappa_q$,

(A.26) $\log(2/U_{(i)}) \le \kappa_q\,\frac{\log p}{\log(p/k)}\,\log\frac{2p}{k+i}$

holds for all $i = 1, \ldots, p-k$ with probability approaching one. Applying the representation given in the proof of Lemma A.9 and noting that $T_{p+1} = (1 + o_{\mathbb{P}}(1))\,p$, we see that (A.26) is implied by

(A.27) $\log(3p/T_i) \le \kappa_q\,\frac{\log p}{\log(p/k)}\,\log\frac{2p}{k+i}.$

We consider $i \le \sqrt[4]{p}$ and $i > \sqrt[4]{p}$ separately. Suppose first that $i \le \sqrt[4]{p}$. In this case,
\[ \log\frac{2p}{k+i} \ge \bigl( \tfrac34 + o(1) \bigr)\log(p/k). \]
Thus, (A.27) would follow from
\[ \log(3p/T_i) = O(\log p) \quad \text{for all such } i. \]
This is, however, self-evident since $T_i \ge E_1 \ge 1/p$ with probability $e^{-1/p} = 1 - o(1)$.

Suppose now that $i > \sqrt[4]{p}$. In this case, we make use of the fact that $T_i > i/2$ for all $i > \sqrt[4]{p}$ with probability tending to one as $p \to \infty$. Then we prove a stronger result, namely,
\[ \log\frac{6p}{i} \le \kappa_q\,\frac{\log p}{\log(p/k)}\,\log\frac{2p}{k+i} \quad \text{for all } i > \sqrt[4]{p}. \]
This follows from the two observations below:
\[ \log\frac{6p}{i} \le 2\log\frac{2p}{i}, \qquad \log\frac{2p}{k+i} \ge \frac12\min\Bigl\{ \log\frac{2p}{i},\ \log\frac{2p}{k} \Bigr\}. \]
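Majorization is a statement about cumulative sums of the sorted sequences, so the conclusion of Lemma A.9 can be tested directly on a draw. A sketch of ours (assuming numpy and statistics.NormalDist; the choice of $c$ and $k'$ below merely satisfies the hypotheses):

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(4)
nd = NormalDist()
p, k, q = 50000, 20, 0.1
c = 2.0 / (1 - q)                     # any c > 1/(1-q) works
kp = int(np.ceil(c * k)) + 50         # k' >= max(ck, k + d) with d large
lam = np.array([nd.inv_cdf(1 - (i + 1) * q / (2 * p)) for i in range(p)])

zeta = np.sort(np.abs(rng.standard_normal(p - k)))[::-1]   # |zeta|_(1) >= ...
left = zeta[kp - k:]                  # (|zeta|_(k'-k+1), ..., |zeta|_(p-k))
right = lam[kp:]                      # (lambda_{k'+1}, ..., lambda_p)
# b majorizes a iff every partial sum of a is dominated by that of b
print(bool(np.all(np.cumsum(left) <= np.cumsum(right))))   # True w.h.p.
```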

In the proofs of the next two lemmas, namely, Lemma A.11 and Lemma A.12, we introduce an orthogonal matrix $Q \in \mathbb{R}^{n\times n}$ that obeys $Qz = (\|z\|, 0, \ldots, 0)^\top$. In the proofs, $Q$ is further set to be measurable with respect to $z$; hence, $Q$ is independent of $X$. There are many options available to construct such a $Q$, including the Householder transformation. Set
\[ W = \begin{bmatrix} w^\top \\ \widetilde W \end{bmatrix} := QX, \]
where $w \in \mathbb{R}^p$ and $\widetilde W \in \mathbb{R}^{(n-1)\times p}$. The independence between $Q$ and $X$ implies that $W$ is still a Gaussian random matrix consisting of i.i.d. $\mathcal{N}(0, 1/n)$ entries. Note that
\[ X_i^\top z = (QX_i)^\top (Qz) = \|z\|\,(QX_i)_1 = \|z\|\, w_i. \]
This implies that $S'$ is constructed as the union of $S$ and the $k' - k$ indices in $\{1, \ldots, p\}\setminus S$ with the largest $|w_i|$. Since $w$ and $\widetilde W$ are independent, we see that both $\widetilde W_{S'}$ and $\widetilde W_{\bar S'}$ are also Gaussian random matrices. These points are crucial in the proofs of these two lemmas.

Lemma A.11. Let $k \le k' < \min\{n, p\}$ be any deterministic integer. Denote by $\sigma_{\min}$ and $\sigma_{\max}$, respectively, the smallest and the largest singular value of $X_{S'}$. Then, for any $t > 0$,
\[ \sigma_{\min} > 1 - \sqrt{k'/n} - t \]
holds with probability at least $1 - e^{-nt^2/2}$. Furthermore,
\[ \sigma_{\max} < 1 + \sqrt{k'/n} + \sqrt{8k'\log(p/k')/n} + t \]
holds with probability at least $1 - e^{-nt^2/2} - (2ek'/p)^{k'}$.

Proof of Lemma A.11. Recall that $\widetilde W_{S'} \in \mathbb{R}^{(n-1)\times k'}$ is a Gaussian design with i.i.d. $\mathcal{N}(0, 1/n)$ entries. Since $W_{S'}$ and $X_{S'}$ have the same set of singular values, we consider $W_{S'}$. Classical theory on Wishart matrices (see [5], for example) asserts that (i) all the singular values of $\widetilde W_{S'}$ are larger than $1 - \sqrt{k'/n} - t$ with probability at least $1 - e^{-nt^2/2}$, and (ii) all are smaller than $1 + \sqrt{k'/n} + t$ with probability at least $1 - e^{-nt^2/2}$. Clearly, all the singular values of $W_{S'}$ are at least as large as $\sigma_{\min}(\widetilde W_{S'})$. Thus, (i) yields the first claim. For the other, Lemma A.7 asserts that the event $\|w_{S'}\| \le \sqrt{8k'\log(p/k')/n}$ happens with probability at least $1 - (2ek'/p)^{k'}$. On this event,
\[ \|W_{S'}\| \le \|\widetilde W_{S'}\| + \|w_{S'}\|, \]
where $\|\cdot\|$ denotes the spectral norm. Hence, (ii) gives
\[ \|W_{S'}\| \le \|\widetilde W_{S'}\| + \sqrt{8k'\log(p/k')/n} < 1 + \sqrt{k'/n} + t + \sqrt{8k'\log(p/k')/n} \]
with probability at least $1 - e^{-nt^2/2} - (2ek'/p)^{k'}$.

Lemma A.12. Denote by $\widehat b_{S'}$ the solution to the reduced SLOPE problem (4.6) with $T = S'$ and $\lambda = \lambda^\epsilon$. Keep the assumptions from Lemma A.9, and additionally assume $k'/\min\{n, p\} \to 0$. Then there exists a constant $C_q$ only depending on $q$ such that
\[ X_{\bar S'}^\top X_{S'}\bigl( \beta_{S'} - \widehat b_{S'} \bigr) \preceq C_q\sqrt{\frac{k'\log p}{n}}\,\bigl( \lambda^{\mathrm{BH}}_{k'+1}, \ldots, \lambda^{\mathrm{BH}}_p \bigr) \]
with probability tending to one.

Proof of Lemma A.12. In this proof, $C$ is a constant that only depends on $q$ and whose value may change at each occurrence. Rearrange the objective term as
\[ X_{\bar S'}^\top X_{S'}(\beta_{S'} - \widehat b_{S'}) = X_{\bar S'}^\top X_{S'}(X_{S'}^\top X_{S'})^{-1}\bigl( X_{S'}^\top(y - X_{S'}\widehat b_{S'}) - X_{S'}^\top z \bigr) = X_{\bar S'}^\top Q^\top\, QX_{S'}(X_{S'}^\top X_{S'})^{-1}\bigl( X_{S'}^\top(y - X_{S'}\widehat b_{S'}) - X_{S'}^\top z \bigr) = X_{\bar S'}^\top Q^\top \xi, \]
where $\xi := QX_{S'}(X_{S'}^\top X_{S'})^{-1}(X_{S'}^\top(y - X_{S'}\widehat b_{S'}) - X_{S'}^\top z)$. For future usage, note that $\xi$ only depends on $w$ and $\widetilde W_{S'}$ and is, therefore, independent of $\widetilde W_{\bar S'}$.

We begin by bounding $\|\xi\|$. It follows from the KKT conditions of SLOPE that $X_{S'}^\top(y - X_{S'}\widehat b_{S'})$ is majorized by $\lambda_{[k']}$. Hence, it follows from Fact 3.1 that

(A.28) $\|X_{S'}^\top(y - X_{S'}\widehat b_{S'})\| \le \|\lambda_{[k']}\|.$

Lemma A.11 with $t = 1/2$ gives

(A.29) $\|X_{S'}(X_{S'}^\top X_{S'})^{-1}\| \le \dfrac{1}{1 - \sqrt{k'/n} - 1/2} < 2.01$

with probability at least $1 - e^{-n/8}$ for sufficiently large $p$, where in the last step we have used $k'/n \to 0$. Hence, from (A.28) and (A.29) we get

(A.30) $\|\xi\| \le \|X_{S'}(X_{S'}^\top X_{S'})^{-1}\| \cdot \|X_{S'}^\top(y - X_{S'}\widehat b_{S'}) - X_{S'}^\top z\| \le 2.01\bigl( \|\lambda_{[k']}\| + \sqrt{32 k'\log(p/k')} \bigr) \le C\sqrt{k'\log(p/k')}$

with probability at least $1 - e^{-n/2} - (2ek'/p)^{k'} - e^{-n/8}$; we used Lemma A.7 in the second inequality and Lemma A.8 in the third. (A.30) will help us finish the proof. Write

(A.31) $X_{\bar S'}^\top X_{S'}(\beta_{S'} - \widehat b_{S'}) = X_{\bar S'}^\top Q^\top\xi = W_{\bar S'}^\top\xi = \xi_1\, w_{\bar S'} + \widetilde W_{\bar S'}^\top\widetilde\xi,$

where $\widetilde\xi := (\xi_2, \ldots, \xi_n)$. It follows from Lemma A.9 that $w_{\bar S'}$ is majorized by $(\lambda^{\mathrm{BH}}_{k'+1}, \lambda^{\mathrm{BH}}_{k'+2}, \ldots, \lambda^{\mathrm{BH}}_p)/\sqrt n$ in probability. As a result, the first term on the right-hand side obeys

(A.32) $\xi_1\, w_{\bar S'} \preceq \|\xi\|\, w_{\bar S'} \preceq C\sqrt{\dfrac{k'\log(p/k')}{n}}\,\bigl( \lambda^{\mathrm{BH}}_{k'+1}, \lambda^{\mathrm{BH}}_{k'+2}, \ldots, \lambda^{\mathrm{BH}}_p \bigr)$

with probability tending to one. For the second term, by exploiting the independence between $\xi$ and $\widetilde W_{\bar S'}$, we have
\[ \widetilde W_{\bar S'}^\top\widetilde\xi \overset{d}{=} \sqrt{\xi_2^2 + \cdots + \xi_n^2}\;(\zeta_1, \ldots, \zeta_{p-k'}), \]
where $\zeta_1, \ldots, \zeta_{p-k'}$ are i.i.d. $\mathcal{N}(0, 1/n)$. Since $k'/p \to 0$, applying Lemma A.10 gives
\[ (\zeta_1, \ldots, \zeta_{p-k'}) \preceq C\sqrt{\frac{\log p}{\log(p/k')}}\,\frac{\bigl( \lambda^{\mathrm{BH}}_{k'+1}, \ldots, \lambda^{\mathrm{BH}}_p \bigr)}{\sqrt n} \]
with probability approaching one. Hence, owing to (A.30),

(A.33) $\widetilde W_{\bar S'}^\top\widetilde\xi \preceq C\sqrt{\dfrac{k'\log p}{n}}\,\bigl( \lambda^{\mathrm{BH}}_{k'+1}, \ldots, \lambda^{\mathrm{BH}}_p \bigr)$

holds with probability approaching one. Finally, combining (A.32) and (A.33) gives that
\[ X_{\bar S'}^\top X_{S'}(\beta_{S'} - \widehat b_{S'}) = \xi_1\, w_{\bar S'} + \widetilde W_{\bar S'}^\top\widetilde\xi \preceq C\Bigl( \sqrt{\frac{k'\log(p/k')}{n}} + \sqrt{\frac{k'\log p}{n}} \Bigr)\bigl( \lambda^{\mathrm{BH}}_{k'+1}, \ldots, \lambda^{\mathrm{BH}}_p \bigr) \preceq C\sqrt{\frac{k'\log p}{n}}\,\bigl( \lambda^{\mathrm{BH}}_{k'+1}, \ldots, \lambda^{\mathrm{BH}}_p \bigr) \]
holds with probability tending to one.
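The Wishart part of Lemma A.11 — singular values of an i.i.d. $\mathcal{N}(0, 1/n)$ design concentrating in $1 \pm \sqrt{k'/n}$, before the $\|w_{S'}\|$ correction — is also easy to check numerically. A sketch of ours (assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(5)
n, kp, reps, t = 1000, 50, 100, 0.1
lo = 1 - np.sqrt(kp / n) - t
hi = 1 + np.sqrt(kp / n) + t
viol = 0
for _ in range(reps):
    s = np.linalg.svd(rng.standard_normal((n, kp)) / np.sqrt(n), compute_uv=False)
    viol += (s.min() <= lo) or (s.max() >= hi)
print(viol / reps)   # essentially zero, matching the e^{-nt^2/2} rates
```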

A.4. Proofs for Section 5.

Lemma A.13. Keep the assumptions from Lemma 5.1 and let $\zeta_1, \ldots, \zeta_p$ be i.i.d. $\mathcal{N}(0,1)$. Then
\[ \#\{ 2 \le i \le p : \zeta_i > \tau + \zeta_1 \} \to \infty \]
in probability.

Proof of Lemma A.13. With probability tending to one, $\tau' := \tau + \zeta_1$ also obeys $\tau' \le \sqrt{2\log p}$ and $\sqrt{2\log p} - \tau' \to \infty$. This shows that we only need to prove a simpler version of this lemma, namely,
\[ \#\{ 1 \le i \le p : \zeta_i > \tau' \} \to \infty \]
in probability. Put $\Delta = \sqrt{2\log p} - \tau' = o(\sqrt{2\log p})$ and $a = \mathbb{P}(\zeta > \tau')$. Then $\#\{1 \le i \le p : \zeta_i > \tau'\}$ is a binomial random variable with $p$ trials and success probability $a$. Hence, it suffices to demonstrate that $ap \to \infty$. To this end, note that
\[ a = \Phi(-\tau') \ge \frac{1}{\sqrt{2\pi}\,\tau'}\Bigl( 1 - \frac{1}{\tau'^2} \Bigr) e^{-\tau'^2/2} \gtrsim \frac{e^{-\log p - \Delta^2/2 + \Delta\sqrt{2\log p}}}{\sqrt{2\log p}}, \]
which gives
\[ ap \gtrsim \frac{e^{\Delta\sqrt{2\log p} - \Delta^2/2}}{\sqrt{2\log p}} = \frac{e^{(1+o(1))\,\Delta\sqrt{2\log p}}}{\sqrt{2\log p}}. \]
Since $\Delta \to \infty$ (in fact, it is sufficient to have $\Delta$ bounded away from $0$ from below), we have $e^{(1+o(1))\Delta\sqrt{2\log p}}/\sqrt{2\log p} \to \infty$, as we wish.

Proof of Lemma 5.1. For sufficiently large $p$, $2(1-\epsilon)\log p \le (1 - \epsilon/2)\tau^2$. Hence, it is sufficient to show that
\[ \mathbb{P}_\pi\bigl( \|\widehat\beta - \beta\|^2 \le (1 - \epsilon/2)\tau^2 \bigr) \to 0 \]
uniformly over all estimators $\widehat\beta$. Letting $I$ be the random coordinate,
\[ \|\widehat\beta - \beta\|^2 = \sum_{j\ne I}\widehat\beta_j^2 + (\widehat\beta_I - \tau)^2 = \|\widehat\beta\|^2 + \tau^2 - 2\tau\widehat\beta_I, \]
which is smaller than or equal to $(1-\epsilon/2)\tau^2$ if and only if
\[ \widehat\beta_I \ge \frac{2\|\widehat\beta\|^2 + \epsilon\tau^2}{4\tau}. \]
Denote by $A = A(y; \widehat\beta)$ the set of all $i \in \{1, \ldots, p\}$ such that $\widehat\beta_i \ge (2\|\widehat\beta\|^2 + \epsilon\tau^2)/(4\tau)$, and let $b$ be the minimum value of these $\widehat\beta_i$. Then
\[ b \ge \frac{2\|\widehat\beta\|^2 + \epsilon\tau^2}{4\tau} \ge \frac{2|A|\,b^2 + \epsilon\tau^2}{4\tau} \ge \frac{2\sqrt{2|A|\,b^2\cdot\epsilon\tau^2}}{4\tau} = b\sqrt{\frac{|A|\,\epsilon}{2}}, \]
which gives

(A.34) $|A| \le 2/\epsilon.$

Recall that $\|\widehat\beta - \beta\|^2 \le (1-\epsilon/2)\tau^2$ if and only if $I$ is among these $|A|$ components. Hence,

(A.35) $\mathbb{P}_\pi\bigl( \|\widehat\beta - \beta\|^2 \le (1-\epsilon/2)\tau^2 \,\big|\, y \bigr) = \mathbb{P}_\pi(I \in A \mid y) = \sum_{i\in A}\mathbb{P}_\pi(I = i \mid y) = \dfrac{\sum_{i\in A} e^{\tau y_i}}{\sum_{i=1}^p e^{\tau y_i}},$

where we use the fact that $A$ is almost surely determined by $y$. Since (A.35) is maximal if $A$ is the set of indices with the largest $y_i$'s, (A.35) and (A.34) together yield
\[ \mathbb{P}_\pi\bigl( \|\widehat\beta - \beta\|^2 \le (1-\epsilon/2)\tau^2 \bigr) \le \mathbb{P}_\pi\bigl( y_I = \tau + z_I \text{ is at least the } \lceil 2/\epsilon\rceil\text{th largest among } y_1, \ldots, y_p \bigr) \to 0, \]
where the last step is provided by Lemma A.13.
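Lemma A.13 can be visualized by simulation. In the sketch below (ours, assuming numpy), the choice $\tau = \sqrt{2\log p} - \log\log p$ is a hypothetical stand-in for the assumptions of Lemma 5.1, under which $\sqrt{2\log p} - \tau \to \infty$:

```python
import numpy as np

rng = np.random.default_rng(6)
for p in [10**3, 10**4, 10**5, 10**6]:
    tau = np.sqrt(2 * np.log(p)) - np.log(np.log(p))   # hypothetical tau with
    zeta = rng.standard_normal(p)                      # sqrt(2 log p) - tau -> infty
    print(p, int(np.sum(zeta[1:] > tau + zeta[0])))    # counts typically grow with p
```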

Proof of Lemma 5.3. To closely follow the proof of Lemma 5.1, denote by $A = A(y, X; \widehat\beta)$ the set of all $i \in \{1, \ldots, p\}$ such that $\widehat\beta_i \ge (2\|\widehat\beta\|^2 + \epsilon\alpha^2\tau^2)/(4\alpha\tau)$, and keep the same notation $b$ as before. Then $\|\widehat\beta - \beta\|^2 \le (1-\epsilon/2)\alpha^2\tau^2$ if and only if $I \in A$. Hence,

(A.36) $\mathbb{P}_\pi\bigl( \|\widehat\beta - \beta\|^2 \le (1-\epsilon/2)\alpha^2\tau^2 \,\big|\, y, X \bigr) = \mathbb{P}_\pi(I\in A \mid y, X) = \sum_{i\in A}\mathbb{P}_\pi(I = i \mid y, X) = \dfrac{\sum_{i\in A}\exp\bigl( \alpha\tau X_i^\top y - \alpha^2\tau^2\|X_i\|^2/2 \bigr)}{\sum_{i=1}^p \exp\bigl( \alpha\tau X_i^\top y - \alpha^2\tau^2\|X_i\|^2/2 \bigr)},$

and this quantity is maximal if $A$ is the set of indices $i$ with the largest values of $X_i^\top y/\alpha - \tau\|X_i\|^2/2$. As shown in the proof of Lemma 5.1, $|A| \le 2/\epsilon$, which gives

(A.37) $\mathbb{P}_\pi\bigl( \|\widehat\beta - \beta\|^2 \le (1-\epsilon/2)\alpha^2\tau^2 \bigr) \le \mathbb{P}_\pi\bigl( X_I^\top y/\alpha - \tau\|X_I\|^2/2 \text{ is at least the } \lceil 2/\epsilon\rceil\text{th largest} \bigr).$

We complete the proof by showing that the probability on the right-hand side of (A.37) is negligible uniformly over all estimators $\widehat\beta$ as $p \to \infty$. By the independence between $I$ and $(X, z)$, we can assume $I = 1$ while evaluating this probability. With this in mind, we aim to show that there are sufficiently many $i$'s such that
\[ X_i^\top y/\alpha - \tau\|X_i\|^2/2 - \bigl( X_1^\top y/\alpha - \tau\|X_1\|^2/2 \bigr) = X_i^\top(z/\alpha + \tau X_1) - \tau\|X_i\|^2/2 - X_1^\top z/\alpha - \tau\|X_1\|^2/2 \]
is positive. Since $X_1^\top z/\alpha + \tau\|X_1\|^2/2 = O_{\mathbb{P}}(1/\alpha) + \tau/2 + O_{\mathbb{P}}(\tau/\sqrt n)$, it suffices to show that

(A.38) $\#\Bigl\{ 2 \le i \le p : X_i^\top(z/\alpha + \tau X_1) - \tau\|X_i\|^2/2 > \dfrac{C_1}{\alpha} + \dfrac{\tau}{2} + \dfrac{C_2\tau}{\sqrt n} \Bigr\} \le \lceil 2/\epsilon\rceil$

holds with vanishing probability for all positive constants $C_1, C_2$. By the independence between $X_i$ and $z/\alpha + \tau X_1$, we can replace $z/\alpha + \tau X_1$ by $(\|z/\alpha + \tau X_1\|, 0, \ldots, 0)$ in (A.38). That is,
\[ X_i^\top(z/\alpha + \tau X_1) - \tau\|X_i\|^2/2 \overset{d}{=} \|z/\alpha + \tau X_1\|\, X_{i,1} - \tau X_{i,1}^2/2 - \tau\|X_{i,-1}\|^2/2, \]
where $X_{i,1}$ is the first entry of $X_i$ and $X_{i,-1} \in \mathbb{R}^{n-1}$ is $X_i$ without the first entry. To this end, we point out that the following three events all happen with probability tending to one:

(A.39) $\#\{ 2\le i\le p : \|X_{i,-1}\| \le 1 \} \ge 0.49p, \qquad \max_i X_{i,1}^2 \le \frac{2\log p}{n}, \qquad \|z/\alpha + \tau X_1\| \ge \frac{\sqrt n - \sqrt{\log p}}{\alpha}.$

Making use of this and (A.38), we only need to show that
\[ N := \#\Bigl\{ 2 \le i \le 0.49p : X_{i,1}\frac{\sqrt n - \sqrt{\log p}}{\alpha} - \frac{\tau\log p}{n} - \frac{\tau}{2} > \frac{C_1}{\alpha} + \frac{\tau}{2} + \frac{C_2\tau}{\sqrt n} \Bigr\} = \#\bigl\{ 2 \le i \le 0.49p : \sqrt n\, X_{i,1} > \tau'' \bigr\} \]
obeys

(A.40) $N \le 2/\epsilon$

with vanishing probability. The first line of (A.39) shows that there are at least $0.49p$ many $i$'s such that $\|X_{i,-1}\| \le 1$, and we assume they correspond to the indices $2 \le i \le 0.49p$ without loss of generality; note that the set of these indices is independent of all the $X_{i,1}$'s. Observe that
\[ \tau'' := \Bigl( \tau + \frac{\tau\log p}{n} + \frac{C_1}{\alpha} + \frac{C_2\tau}{\sqrt n} \Bigr)\frac{\alpha\sqrt n}{\sqrt n - \sqrt{\log p}} = \alpha\tau\bigl( 1 + O(\sqrt{\log p/n}) \bigr) + O(1) \]
for sufficiently large $p$, ensuring that $\log p/n$ is small. Hence, plugging in the specific choice of $\tau$ and the assumption on $\alpha$, we obtain
\[ \tau'' \le \sqrt{2\log p} - \log(2\log p) + O(1), \]
which reveals that $\sqrt{2\log(0.49p)} - \tau'' = \sqrt{2\log p} - \tau'' + o(1) \to \infty$. Since the $\sqrt n\, X_{i,1}$ are i.i.d. $\mathcal{N}(0,1)$, Lemma A.13 validates (A.40).

Proof of Corollary 1.5. Let $c_0 > 0$ be a sufficiently small constant to be determined later. It is sufficient to prove the claim with $p$ replaced by the possibly smaller value $p' := \min\{\lfloor c_0 n\rfloor, p\}$: if we knew that $\beta_i = 0$ for $p' + 1 \le i \le p$, the loss of any estimator $X\widehat\beta$ would not increase after projecting onto the linear space spanned by the first $p'$ columns. Hereafter, we assume $X \in \mathbb{R}^{n\times p'}$ and $\beta \in \mathbb{R}^{p'}$. Observe that $p = O(n)$ implies $p' = \Theta(p)$ and, therefore,

(A.41) $\log(p'/k) \sim \log(p/k).$

In particular, $k/p' \to 0$ and $n/\log(p'/k) \to \infty$. This suggests that we can apply Theorem 5.4 to our problem, obtaining
\[ \inf_{\widehat\beta}\ \sup_{\|\beta\|_0\le k}\ \mathbb{P}\Bigl( \frac{\|\widehat\beta - \beta\|^2}{2k\log(p'/k)} > 1 - \epsilon' \Bigr) \to 1 \]
for every constant $\epsilon' > 0$. Because of (A.41), we also have

(A.42) $\inf_{\widehat\beta}\ \sup_{\|\beta\|_0\le k}\ \mathbb{P}\Bigl( \dfrac{\|\widehat\beta - \beta\|^2}{2k\log(p/k)} > 1 - \epsilon' \Bigr) \to 1$

for any $\epsilon' > 0$. Since $p'/n \le c_0$, the smallest singular value of the Gaussian random matrix $X$ is at least $1 - \sqrt{c_0} + o_{\mathbb{P}}(1)$ (see, for example, [5]). This result, together with (A.42), yields
\[ \inf_{\widehat\beta}\ \sup_{\|\beta\|_0\le k}\ \mathbb{P}\Bigl( \frac{\|X\widehat\beta - X\beta\|^2}{2k\log(p/k)} > (1 - \sqrt{c_0})^2(1 - \epsilon') \Bigr) \to 1 \]
for each $\epsilon' > 0$. Finally, choose $c_0$ and $\epsilon'$ sufficiently small such that $(1 - \sqrt{c_0})^2(1 - \epsilon') > 1 - \epsilon$.

REFERENCES

[1] M. Abramowitz and I. Stegun. Handbook of Mathematical Functions: with Formulas, Graphs, and Mathematical Tables. Courier Corporation, 2012.
[2] M. Bogdan, E. van den Berg, C. Sabatti, W. Su, and E. J. Candès. SLOPE — adaptive variable selection via convex optimization. arXiv preprint, 2014.
[3] S. Boucheron and M. Thomas. Concentration inequalities for order statistics. Electronic Communications in Probability, 17:1–12, 2012.
[4] L. de Haan and A. Ferreira. Extreme Value Theory: An Introduction. Springer Science & Business Media, 2006.
[5] R. Vershynin. Introduction to the non-asymptotic analysis of random matrices. In Compressed Sensing: Theory and Applications. 2012.

Department of Statistics
Stanford University
Stanford, CA, USA
wjsu@stanford.edu

Departments of Statistics and Mathematics
Stanford University
Stanford, CA, USA
candes@stanford.edu


More information

Estimation of large dimensional sparse covariance matrices

Estimation of large dimensional sparse covariance matrices Estimation of large dimensional sparse covariance matrices Department of Statistics UC, Berkeley May 5, 2009 Sample covariance matrix and its eigenvalues Data: n p matrix X n (independent identically distributed)

More information

Convex Optimization Notes

Convex Optimization Notes Convex Optimization Notes Jonathan Siegel January 2017 1 Convex Analysis This section is devoted to the study of convex functions f : B R {+ } and convex sets U B, for B a Banach space. The case of B =

More information

On Backward Product of Stochastic Matrices

On Backward Product of Stochastic Matrices On Backward Product of Stochastic Matrices Behrouz Touri and Angelia Nedić 1 Abstract We study the ergodicity of backward product of stochastic and doubly stochastic matrices by introducing the concept

More information

Duality of multiparameter Hardy spaces H p on spaces of homogeneous type

Duality of multiparameter Hardy spaces H p on spaces of homogeneous type Duality of multiparameter Hardy spaces H p on spaces of homogeneous type Yongsheng Han, Ji Li, and Guozhen Lu Department of Mathematics Vanderbilt University Nashville, TN Internet Analysis Seminar 2012

More information

The Codimension of the Zeros of a Stable Process in Random Scenery

The Codimension of the Zeros of a Stable Process in Random Scenery The Codimension of the Zeros of a Stable Process in Random Scenery Davar Khoshnevisan The University of Utah, Department of Mathematics Salt Lake City, UT 84105 0090, U.S.A. davar@math.utah.edu http://www.math.utah.edu/~davar

More information

Chapter 1. Preliminaries

Chapter 1. Preliminaries Introduction This dissertation is a reading of chapter 4 in part I of the book : Integer and Combinatorial Optimization by George L. Nemhauser & Laurence A. Wolsey. The chapter elaborates links between

More information

MATH 426, TOPOLOGY. p 1.

MATH 426, TOPOLOGY. p 1. MATH 426, TOPOLOGY THE p-norms In this document we assume an extended real line, where is an element greater than all real numbers; the interval notation [1, ] will be used to mean [1, ) { }. 1. THE p

More information

CS229T/STATS231: Statistical Learning Theory. Lecturer: Tengyu Ma Lecture 11 Scribe: Jongho Kim, Jamie Kang October 29th, 2018

CS229T/STATS231: Statistical Learning Theory. Lecturer: Tengyu Ma Lecture 11 Scribe: Jongho Kim, Jamie Kang October 29th, 2018 CS229T/STATS231: Statistical Learning Theory Lecturer: Tengyu Ma Lecture 11 Scribe: Jongho Kim, Jamie Kang October 29th, 2018 1 Overview This lecture mainly covers Recall the statistical theory of GANs

More information

Stability and Robustness of Weak Orthogonal Matching Pursuits

Stability and Robustness of Weak Orthogonal Matching Pursuits Stability and Robustness of Weak Orthogonal Matching Pursuits Simon Foucart, Drexel University Abstract A recent result establishing, under restricted isometry conditions, the success of sparse recovery

More information

Agenda. Applications of semidefinite programming. 1 Control and system theory. 2 Combinatorial and nonconvex optimization

Agenda. Applications of semidefinite programming. 1 Control and system theory. 2 Combinatorial and nonconvex optimization Agenda Applications of semidefinite programming 1 Control and system theory 2 Combinatorial and nonconvex optimization 3 Spectral estimation & super-resolution Control and system theory SDP in wide use

More information

Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension. n=1

Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension. n=1 Chapter 2 Probability measures 1. Existence Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension to the generated σ-field Proof of Theorem 2.1. Let F 0 be

More information

Notes 6 : First and second moment methods

Notes 6 : First and second moment methods Notes 6 : First and second moment methods Math 733-734: Theory of Probability Lecturer: Sebastien Roch References: [Roc, Sections 2.1-2.3]. Recall: THM 6.1 (Markov s inequality) Let X be a non-negative

More information