Derivations for the Fglasso Algorithm

Size: px
Start display at page:

Download "Derivations for the Fglasso Algorithm"

Transcription

1 Supplementary Material to Functional Graphical Models Xinghao Qiao, Shaojun Guo, and Gareth M. James This supplementary material contains the details of the algorithms with derivations in Appendix B, technical proofs of Propositions 1 2, Theorems 1 4, Lemmas 1-15 in Appendix C, and further discussion in Appendix D. B Derivations for the Fglasso Algorithm In Appendix B, we provide some further details about the fglasso algorithm and the joint fglasso algorithm. B.1 Step 2b of Algorithm 1 Note 14 is equivalent to finding w j1,, w jp 1 to minimize trace p 1 p 1 p 1 p 1 S jj wjlθ T 1 j lk w + 2 s T w + 2γ n w F. l=1 k=1 k=1 k=1 B.1 Setting the derivative of B.1 with respect to w to be zero and applying Lemma 4 yields B.1 = Θ 1 w j kk w S jj + Θ 1 j T kkw S T jj + Θ 1 j T lkw jl S T jj + Θ 1 j kl w jl S jj + 2s + 2γ n ν l k = 2 Θ 1 j kk w S jj + Θ 1 j T lkw jl S jj + s + γ n ν = 0, l k where ν = w w F We define the block residual by if w 0, and ν R M M with ν F 1 otherwise, k = 1,..., p 1. r = l k Θ 1 j T lkw jl S jj + s. B.2 1

2 If w = 0, then r F = γ n ν F γ n. Otherwise we need to solve for w in the following equation Θ 1 w j kk w S jj + r + γ n = 0. B.3 w F We replace B.3 by B.4, and standard packages in R/MatLab can be used to solve the following M 2 by M 2 nonlinear equation Θ 1 j kk S jj vecw + vecr + γ n vecw w F = 0. B.4 Hence, the block coordinate descent algorithm for solving w j in 14 is summarized in Algorithm 3. Algorithm 3 Block Coordinate Descent Algorithm for Solving w j 1. Initialize ŵ j. 2. Repeat until convergence for k = 1,..., p 1. a Compute r via B.2. b Set ŵ = 0 if r F γ n ; otherwise solve for ŵ via B.4. B.2 Steps 2a and 2c of Algorithm 1 At the jth step, we need to compute Θ 1 j in 14 given current Σ = Θ 1. Then step 2a follows by the blockwise inversion formula. Next we solve for w j via Algorithm 3, and then update Θ 1 given current w j, Θ jj, and Θ 1 j, by applying the blockwise inversion formula again. Rearranging the row and column blocks such that the j, j-th block is the last one, we obtain the permuted version of Θ 1 by Θ 1 j + U j V j U T j U j V j, where U j = Θ 1 jw j V j U T j V j and V j = Θ jj wj T U j 1 = S jj. Step 2c follows as a consequence. B.3 Joint Fglasso Algorithm We put superscript q on the terms used in Section 3.1 to denote the corresponding ones for the q-th class, 1 q Q. Then, for a fixed value of Θ q j, some calculations show that 2

3 11 with the addition of the penalty 12 is minimized by setting Θ q jj = S q jj 1 + ŵ q j T Θ q j 1 ŵ q j, B.5 where ŵ 1 j,..., ŵ Q j are obtained by minimizing Q q=1 trace S q p 1 +2γ 1n l=1 jj wq j q=1 T Θ q j 1 w q j l=1 + 2s q j q=1 T w q p 1 Q w q jl F + 2γ 2n Q w q jl 2 F, j B.6 and w q jl represents the lth M M block of w q j. Analogously to the fglasso algorithm, we summarize the joint fglasso algorithm, which is developed to solve the optimization problem 11 in Algorithm 4. Algorithm 4 Joint Functional Graphical Lasso Algorithm 1. Initialize Θ q = I and Σ q = I, q = 1,..., Q. 2. Repeat until convergence for j = 1,..., p, q = 1,..., Q. q a Compute Θ j 1 j σ q q Σ jj 1 σ q j T. Σ q j b Solve for ŵ q j in B.6 using Algorithm 5. c Reconstruct U q j S q 3. Set Êq = jj Uq j Σ q using Σ q jj T, where U q j = S q jj, σq j q = Θ j 1 ŵ q j. = U q j S q jj } q j, l : Θ jl F 0, j, l V 2, j l, q = 1,..., Q. and Σ q j q = Θ j 1 + 3

4 Setting the derivative of B.6 with respect to w q to be zero and applying Lemma 4 yield B.6 w q = Θ q j 1 kk w q Sq jj + Θ 1 j q T kkw q Sq jj T + l k Θ q j 1 T lkw q jl Sq jj T + Θ q j 1 kl w q jl Sq jj = 2 +2s q + 2λνq Θ q j 1 kk w q Sq jj + l k Θ q j 1 T lkw q jl Sq jj + sq + γ 1nν q + γ 2nµ q = 0, where ν q F 1, Q ν q F 1, µ q ν q = wq w q F q=1 µq 2 F 1, = w q Q, µ q q=1 wq 2 F = w q Q q=1 wq 2 F if Q q=1 wq 2 F = 0., if Q q=1 wq 2 F, if Q q=1 wq 2 F 0 and wq = 0. 0 and wq 0. We define the qth block residual by r q = l k Θ q j 1 T lkw q jl Sq jj + sq. B.7 If w q = 0 for all Q classes, then Q q=1 rq F Q q=1 γ 1n ν q F + γ 2n µ q F γ 1n Q + γ 2n. Otherwise if w q following equation = 0, then rq F γ 1n ; if w q 0 we need to solve for wq in the Θ q j 1 kk w q Sq jj + rq + γ 1n w q w q F w q + γ 2n Q q=1 wq 2 F = 0. B.8 Hence, the block coordinate descent algorithm for solving w q j Algorithm 5. in B.6 is summarized in 4

5 Algorithm 5 Block Coordinate Descent Algorithm for Solving w q j 1. Initialize ŵ 1 j,..., ŵ Q j. 2. Repeat until convergence for k = 1,..., p 1, q = 1,..., Q. a Compute r q b Set ŵ q via B.7. = 0 for all Q classes if Q q=1 rq F γ 1n Q + γ 2n ; otherwise go to c c For q = 1,..., Q, set ŵ q = 0 if rq F γ 1n ; otherwise solve for ŵ q via B.8. C Proofs of Technical Details C.1 Proof of Proposition 1 Substituting Θ = diagθ 1,..., Θ K into 9 yields K K max log det Θ k traces k Θ k γ n Θ 1,,Θ K k=1 k=1 which is equivalent to K separate fglasso problems in 15. j l } K Θ k,jl F, C.1 k=1 C.2 Proof of Proposition 2 If Θ is block diagonal, and i and i belong to separate index sets G k and G k, then Θ ii = 0 and hence Θ 1 ii = 0. By C.12, we have S ii F γ n Z ii F γ n. This completes the proof for the sufficient condition. Next we prove the condition is necessary. We construct Θ k by solving the fglasso problem 9 applied to the symmetric submatrix of S given by index set G k for k = 1,..., K, and let Θ = diag Θ 1,..., Θ K. Since S ii F γ n for all i G k, i G k, k k, and Θ ii = 0 by construction, we have Θ 1 ii = 0 and hence the i, i -th equation of C.12 is satisfied. Moreover, the k, k-th equation of C.12 is satisfied by construction. Therefore, Θ satisfies the KKT condition C.12 and is the solution to the fglasso problem 9. 5

6 C.3 Proof of Theorem 1 We begin with some notation. For any Hs, t, s, t T 2 with the corresponding Karhunen- Loève decomposition Hs, t = j=1 λ jφ j sφ j t, define H S = 1/2. λj 2 For two square-integrable functions ft, gt, define f, g = t T ftgtdt and f 2 = f, f. Denote also a i = λ 1/2 ξ i, where ξ i N0, 1 and λ 0 = sup jp k=1 λ. j 1 We now prove Theorem 1. We first consider σ jk for j = 1,..., p and k = 1,..., M. Note that n σ jk = n â2 i nā2 and σ jk = Ea2 1 with ā = n 1 n âi, and, for each i, j, k, â i = g ij, φ = a i + g ij, φ φ. Then n σ jk σ jk is rewritten as n σ jk σ jk = λ n + ξ 2 i 1 + 2λ 1/2 n ξ i g ij, φ φ n g ij, φ φ 2 nā 2 = I 1 + I 2 + I 3 + I 4. Note that for δ > 0, P σjk σ jk 4δ 4 m=1 P Im nδ. To derive the concentration inequality of n σ jk σ jk, it suffices to derive the tail behaviors of all I m s m = 1,..., 4. a Since ξ i s are independent N0, 1, we have that all 0 < δ 1, n P ξ 2 i 1 } nδ 2 exp nδ2 2 exp δ nδ2. 36 Hence, it follows that there exists a constants C 1 such that for 0 < δ C 1, P I 1 nδ = P n ξi 2 nδ 1 2 exp C 1 nδ 2. λ 0 b First, the term I 2 can be bounded by I 2 2λ 1/2 n ξ ig ij φ φ. Let Y n1 = n 2 } 1/2, i ξ2 Yn2 = λ n jm ξ } 2 1/2. iξ ijm Then, m k n 2 n ξ i g ij = λ ξ 2 i 2 n 2 + λ jm ξ i ξ ijm = λ Yn1 2 + Yn2, 2 m k 6

7 which implies that n ξ ig ij λ 1/2 Y n1 + Y n2. By the condition λ k β and d λ = Ok, we have that d λ d 0 k and d d 0 k 1+β for some positive constant d 0. By Lemma 8, φ φ d K jj K jj S, where, w.l.o.s., φ sgn φ, φ = 1. As a result, I 2 can be further bounded by can be chosen to satisfy I 2 2d 0 ky n1 K jj K jj S + 2d 0 λ 1/2 ky n2 K jj K jj S. C.2 We first bound Y n1 and Y n2. On one hand, n } P Y n1 2n = P ξi 2 1 n exp n. C.3 36 On the other hand, since ξ ij1, ξ ij2 N0, 1 for each j, k, n E ξ ij1ξ ij2 k neξ 2k 1j1 k!n2 k. As a result, it follows that for all δ > 0 n P ξ ij1 ξ ij2 δ 2 exp δ 2 2 exp δ2 + 2 exp δ. 16n + 4δ 32n 8 Consequently, using integration by parts, there exist two positive constants L 1 and L 2 not depending on n such that E n ξ ij1 ξ ij2 2k k!nl1 k + 2k!L 2k 2, k = 1, 2, 3,.... This further implies E Y n2 EY n2 2k k!2λ0 L 1 n k + 2k!2λ 1/2 0 L 2 2k, k 1. Hence we obtain from Theorem 2.3 of Boucheron et al that for all δ > 0 and n 2L 2 2L 1 1, P Y n2 EY n2 δ exp δ 2 32λ 0 L 1 n + 8λ 1/2 0 L 2 δ. Note that EY n2 λ 0 n 1/2. Hence, for δ 2λ 0 n 1/2 and n 2L 2 2L 1 1, P Y n2 δ P Y n2 EY n2 δ/2 exp δ 2 168λ 0 L 1 n + λ 1/2 0 L 2 δ I2 Now consider P nδ. By C.2, we can bound this term by } C.4 2P K jj K jj S δ + P Y n1 2n + P 8kd 0 Y n2 2nλ 1/2. 7

8 Together with C.3, C.4 and Lemma 6, it follows that there exist two positive constants C k k = 2, 3 free of n and p such that for 0 < δk 1 C 2, I2 P nδ C 3 exp C 2 nk 2 δ 2 + exp C 2 nk β. c In a similar way to I 2, we can show that there exist three positive constants C k k = 4, 5, 6 not depending on n and p such that for 0 < δ C 5, I3 P nδ 2 exp C 4 n + C 6 exp C 5 nk 2+2β δ. d Consider the last term I 4. First, we have ā λ 1/2 ξ + ḡ j φ φ λ 1/2 ξ + d 0 k ḡ j K jj K jj S, where ḡ j 2 = m=1 λ ξ jm 2. Note that the following inequalities hold for all δ > 0: P ξ δ 2 exp C 7 nδ 2 and P ḡ j δ 2 exp C 7 nδ 2. for some positive constant C 7. Hence, together with Lemma 6, we obtain that P ā 2 δ can be bounded by P ξ δ 1/2 λ 1/2 /2 + P ḡ j K jj K jj S d 1 P ξ δ 1/2 λ 1/2 /2 + P ḡ j d 2δ 1/4 /2 } } +P K jj K jj S λ 2 δ 1/4 /2 2 exp C 7 nk β/2 δ + C 9 exp C 8 nk β δ 1/2 for all 0 < δ C 8 with some positive constants C 8 and C 9. δ1/2 /2 Combining a, b, c and d and choosing suitable constants, the inequality 16 follows consequently. For general cases of j, l, k, m with j l or m k, σ jlkm = 1 n n âiâ ilm ā ā lm and σ jlkm = Ea ia ilm. Hence n σ jlkm σ jlkm can be expressed as the sum of the following five terms: n +λ 1/2 lm ai a ilm σ lm + λ 1/2 n ξ i g il, φ lm φ lm n ξ ilm g ij, φ φ + = I 1 + I I 5. n g ij, φ φ g il, φ lm φ lm nā ā lm 8

9 Observe that I 2 O1 k β/2 m 1+β n ξ ig il Kll K ll S, I 3 O1 m β/2 k 1+β n ξ S ilmg ij Kjj K jj, and I 4 O1 km 1+β n g ij g il K S S ll K ll Kjj K jj. Hence the proof techniques for n σ jk σjk can be applied here and as a result, 17 follows. The proof is completed. C.4 Proof of Theorem 2 First we obtain the general error bound for Θ in Section C.4.1. Second in Section C.4.2 we present the general model selection consistency of fglasso in Theorem 4. Finally in Section C.4.3 we prove Theorem 2 based on the results of Lemma 3 and Theorem 4. For convenient presentation, we adopt the definition of tail condition for the random variable given in Ravikumar et al Definition 1 Tail condition The random vector a R Mp satisfies the tail condition if there exists a constant v 0, ] and a function f : N 0, 0,, such that for any i, j 1,..., Mp} 2, let S ij, Σ ij be the i, j-th entry of S, Σ respectively, then P S ij Σ ij δ 1/fn, δ for all δ 0, 1/v ]. C.5 The tail function f is required to be monotonically increasing in δ and n. functions of n and δ are respectively defined as The inverse δ f w; n = argmax δ fn, δ w} and n f δ; w = argmax n fn, δ w}, where w [1,. Then we assume that the Hessian of the negative log determinant satisfies the following general irrepresentable-type assumption. Condition 6 There exists some constant η 0, 1] such that Γ S c S Γ S S 1 M 2 1 η. C.6 9

10 C.4.1 General Error Bound In this section, we present Theorem 3 on the general error bound. We first begin with some notation. Denote by κ Γ = Γ S S 1 M 2,κ B = Θ 1 B M κ 1 Σ, where B,jl = Θ jl for j, l S c and B,jl = 0 for j, l S, and d = max l V : Θ jl F >. j V Theorem 3 Let Θ be the unique solution to the fglasso problem 9 with regularization parameter γ n = 16η 1 M δ f n, Mp τ. Suppose that Conditions 2-4 and 6 hold, the bias term satisfies B M max γ n ηκ 2 Σ /16 and the sample size n satisfies the lower bound κσ κ Γ κ n > n f 1/ 3 Σ max v, 6c η Md max, κ2 Γ c }} η 1 3κ B κ Σ 1 3κ B κ 3 Σ κ Γ c, Mp τ η with c η = η 1, then with probability at least 1 Mp 2 τ, we have i The estimate Θ satisfies the error bound C.7 Θ Θ M max 2c η κ Γ M δ f n, Mp τ ; C.8 ii The estimated edge set Ê is a subset of E. C.4.2 General Model Selection Consistency Theorem 4 Let Θ min = min Θ jl F. Under the same conditions as in Theorem 3, if the j,l E sample size n satisfies the lower bound }} n > n f 1/ max 2κ Γ c η Θ 1 min M, v, 6c η Md max then Ê = E } holds with probability at least 1 Mp 2 τ. κ Σ κ Γ κ 3 Σ, κ2 Γ c η 1 3κ B κ Σ 1 3κ B κ 3 Σ κ Γ c η, Mp τ, C.4.3 Proof of Theorem 2 By 18 in Theorem 1, the sample covariance matrix satisfies the tail condition C.5 with some constants v = C 1 1 and fn, δ = C 1 2 expc 1 n 1 2α1+β δ 2 }. Therefore, the corresponding inverse functions take the following forms δ f n, Mp τ logc 2 Mp = τ } τlogmp + logc2 =, C.9 C 1 n 1 2α1+β C 1 n 1 2α1+β 10

11 τlogmp + n f δ, Mp τ logc2 = C 1 δ 2 } 1 2α1+β} 1. C.10 It follows from Lemma 3 with = C E 2 n α1 2ν β that E = E. Thus we have S = S, d = d, B = B, κ Γ = κ Γ and κ B = κ B. By substituting these terms into Theorem 4, some calculations using C.9 and C.10 lead to the lower bound for the sample size, i.e. n > C 3 M 2 d 2 τlogmp + logc 2 /c 2 1 and n > C 4 M 2 Θ 2 min τlogmp + τlogc 2/c 2 1 and the desired regularization parameter γ n. Under Conditions 2 4, it follows from Lemma 3 that E = E. By satisfying Condition 6 and the lower bound condition, Theorem 4 indicates that E = Ê} holds with probability at least 1 1/c 1 n α p τ 2. Combining these two results completes the proof. C.5 Proof of Theorem 3 We let the sub-differential of j l jl F matrices Z R Mp Mp with M by M blocks defined by evaluated at some Θ involves all symmetric 0 if j = l Z jl = Θ jl Θ jl F if j l and Θ jl 0 Zjl R M M : Z jl F 1 } if j l and Θ jl = 0. C.11 By the Karush-Kuhn-Tucker KKT condition, a necessary and sufficient condition for Θ to maximize 9 is Θ 1 S γ n Ẑ = 0, C.12 where Ẑ belongs to the family of sub-differential of j l Θ jl F defined in C.11. The main idea of the proof is based on constructing the primal-dual witness solution Θ and Z in the following four steps. First, Θ is obtained by the following restricted fglasso problem } min tracesθ log detθ + γ n Θ jl F, C.13 Θ S c =0 j l 11

12 where Θ R Mp Mp is symmetric positive definite. Second, for each j, l S, we choose Z jl from the family of sub-differential of j l Θ jl F Third, for each j, l S c, where Θ jl F, Z jl is replaced by 1 γ n evaluated at Θ jl defined in C.11. S jl + Θ 1, C.14 jl} which satisfies the KKT condition C.12. Finally, we need to verify strict dual feasibility condition, that is, Z jl F < 1 uniformly in j, l S c. The following terms are needed in the proof of Theorem 3. Let W be the noise matrix, and the difference between the primal witness matrix Θ and the truth Θ, W = S Θ 1, = Θ Θ = Θ Θ + Θ Θ = + B, C.15 where Θ,jl = 0 for j, l S c and Θ,jl = Θ jl for j, l S. Hence for each j, l S, c jl F. Note B corresponds to the bias matrix caused by M-dimensional approximation in 5 to a larger dimensional function. The second order remainder for Θ 1 near Θ is given by R = Θ 1 Θ 1 + Θ 1 Θ 1. C.16 To prove Theorem 3, we need use Lemmas 9-15 as stated in Supplementary Material. We organize our proof in the following six steps. Step 1. It follows from the tail condition C.5 and Lemma 14 that with probability at least 1 Mp 2 τ the event W M max M δ } f n, Mp τ holds. We need to verify that the conditions in Lemma 10 hold. Choosing the regularization parameter γ n = 16η 1 M δ f n, Mp τ and applying the inequalities in Lemma 15 together with the bound condition for the bias term, we have W M max W M max+ Θ 1 B Θ 1 M max W M max+ κ 2 Σ B M max ηγ n /16 + ηγ n /16 = ηγ n /8. It remains to prove R M max is also bounded by ηγ n /8 = 2M δ f n, Mp τ. Step 2. Let r = 2κ Γ W M max + γ n 2κ Γ c η M δ f n, Mp τ. By δ f n, Mp τ 1/v and monotonicity of the inverse tail function, for any n satisfying the lower bound condition, 12

13 we have 2κ Γ c η M δ 1 f n, Mp τ 3κB min κ Σ, 3κ Σ d 1 1 min, 3κ Σ d 3κ 3 Σ κ Γ d 1 3κB κ3 Σ κ Γ c } η 3κ 3 Σ κ Γ d c η } κ B d. Then the conditions in Lemma 12 are satisfied, and hence the error bound satisfies M max = Θ Θ M max r. Step 3. The condition M max 1 3κ Σ d 11 and results in step 2, we have κ B d is satisfied by step 2. Thus by Lemma R M max 3 2 κ3 Σ M max d M max + κ B 3κ 3 Σ κ Γ c η d 2κ Γ c ηm δ } f n, Mp τ ηγ n + κ B 8 ηγ n 8, where the last inequality comes from the monotonicity of the tail function, the bound condition for the sample size n, and the fact that 2d κ Γ c η M δ f n, Mp τ 1 3κ B κ3 Σ κ Γ c η 3κ 3 Σ κ Γ c η = 1 3κ 3 Σ κ Γ c η κ B. Step 4. Steps 1 and 3 imply the strict dual feasibility in Lemma 10, and hence Θ = Θ by Lemma 9. Step 5. It follows from the results in steps 2 and 4 that the error bound C.8 holds with probability at least 1 Mp 2 τ. Step 6. For j, l S c, Θ jl F. Step 4 implies Θ S c = Θ S c. In the restricted fglasso problem C.13, we have Θ S c = Θ S c = 0. Therefore, E c Êc and part ii follows by taking the complement. C.6 Proof of Theorem 4 It follows from the proof and results of Theorem 3i that Θ Θ M max r 2c η κ Γ M δ f n, Mp τ and Θ = Θ hold with probability at least 1 Mp 2 τ. The lower bound for the sample size n in C.9 implies Θ min > 2c η κ Γ M δ f n, Mp τ r. By Lemma 13 we have Θ jl 0 for all j, l S, which entails that E Ê. Combining this result with Theorem 3ii yields E = Ê. 13

14 C.7 Proof of Lemma 1 Since both a = a T 1,..., a T p T and φ = φ T 1,..., φ T p T depend on M, we omit the corresponding superscripts to simplify the notation for readability. Let U = V \j, l} and a U, φ U denote p 2M-dimensional vectors excluding the jth and lth subvectors from a and φ, respectively. By definition 6, we have that, for any pair j, l V 2, j l, C M jl s, t = Cov a T j φ j s, a T l φ l t a T k φ k u, k j, l, u T = Cov a T j φ j s, a T l φ l t a k, k j, l = φ j s T Cova j, a l a U φ l t. C.17 The second equality comes from the following argument. For any k U and u T, g M k u = M m=1 a kmφ km u = a T k φ ku. By the orthogonality of φ km, it follows that there exists a one to one correspondence between a k } and g M k in k. u, u T }, which holds uniformly Since C.17 holds for all s, t T 2, we have that, for fixed pair j, l V 2, j l, C M jl s, t = 0 for all s, t T 2 if and only if Cova j, a l a U = 0. Let C jl = Cova j, a l a U for each pair j, l. Then it follows from multivariate normal theory that, for each j, l V 2, j l, C jl = Θ 1 jj Θ jl Θ 1 ll. Since both Θ jj and Θ ll are positive definite, we have C jl = 0 if and only if Θ jl = 0 for each pair j, l V 2, j l. This completes the proof. C.8 Lemma 2 and its Proof Lemma 2 Suppose that Conditions 2 3 hold. Then, for each j, l V 2, Θ jl Ω M O E 2 n α1 2ν β}, F jl,1 C.18 where Ω M jl,1 is the upper left M M submatrix of Ω jl. Proof. First we give some notations. For any p p matrix A = A ij 1i,jp, let tra = i A ii and A F = tra T A } 1/2. For any M1 p M 2 p block matrix A = A ij with A ij R M 1 M 2, 1 i, j p, we define A M 1,M 2 max 14 = max A ij F, and A M 1,M 2 = 1i,jp

15 max p 1ip j=1 A ij F. In a special case when M 1 = M 2 = M, denote A M 1,M 1 max and A M 1,M 1 by A max M and A M, respectively. For any block matrix A = A ij with A ij R M M, 1 i, j p, we define A M tr = max 1i,jp traii tra jj } 1/2. We now prove Lemma 2. For convenience, for j = 1,..., p, denote a ij = b T ij, c T ij T where b ij = a ij1,..., a ijm T and c ij = a ijm+1,..., a ijmn T. Define Σ to be the covariance matrix of b T 11,..., b T 1p, c T 11,..., c T 1p T. Then we can find that there exists a permutation matrix P π such that P π ΣP T π = Σ. Since P 1 π = P T π, Ω = P π Ω 1 P T π, which means that Ω is only a permutation of Θ. Let Σ = Σ 11 Σ 21 Σ12 Σ22 and Ω = Ω 11 Ω 21 Ω12 Ω22, where Ω 11 and Ω 11 are pm pm matrices and Ω 11 and Ω 22 are pm 2 pm 2 matrices with M 2 = M n M. Now we apply Lemma 5 to prove this lemma. By Condition 3, we see that Ω 12 M 1,M 2 O E n αν. Furthermore, since the diagonal entries of Σ 22 are eigenvalues λ s, we have Σ 22 M 2 tr O n α1 β}. Hence, it follows from Lemma 5 that Ω 11 Θ M max O E 2 n α1 2ν β}. As a result, for each pair j, l V 2, Θ jl Ω M F O E 2 n α1 2ν β}. This completes the proof for Lemma 2. jl,1 C.9 Lemma 3 and its Proof In general, for any 0, we define the corresponding truncated edge set E = j, l V 2 : j l, Θ jl F > }. Let S = E 1, 1,, p, p}. Denote S c to be the complement of S in V 2 with Θ jl F for j, l S. c Lemma 3 below ensures the equivalence between the true and truncated edge sets. Lemma 3 Under Conditions 2 4, let = C E 2 n α1 2ν β for some large constant C > 0, we have E = E. Proof. First, Lemma 2 implies that for each j, l V 2, Θ jl Ω M jl,1 F O E 2 n α1 2ν β. Hence, for each pair j, l E, Θ jl F Ω M jl,1 F Θ jl Ω M jl,1 F E 2 n α1 2ν β, and for j, l S c, Θ jl = Θ jl Ω M F O E 2 n α1 2ν β, since min Ω M F jl,1 15 j,l E jl,1 F

16 E 2 n α1 2ν β by Condition 4 and Ω M jl,1 = 0 if j, l S c. This means that for = F C E 2 n α1 2ν β with a large constant C, we obtain Θ jl F if j, l E but Θ jl F if j, l S c. Therefore, E = E as claimed. C.10 Lemma 4 and its Proof Lemma 4 For any A R p q, B R r r, and X R q r, we have traceax T BX X = BXA + B T XA T. C.19 Proof. Since dtraceax T BX = tracedax T BX + traceax T dbx, we have dtraceax T BX = tracedx T BXA + traceax T BdX = tracea T X T B T + AX T BdX. Hence traceaxt BX X = A T X T B T + AX T B T, which completes the proof. C.11 Lemma 5 and its Proof Lemma 5 Suppose that for a positive definite matrix H = is H11 H 12 H 21 H 22 H 11 H 12 H 21 H 22, its inverse H 1, where H 11 and H 11 are pm 1 pm 1 matrices and H 22 and H 22 are pm 2 pm 2 matrices. If H 22 M 2 tr λ and H 12 M 1,M 2 δ, then Proof. For a positive definite matrix H = H11 H 12 H 21 H 22 = H 11 H 1 11 M 1 max δ 2 λ. C.20 H 11 H 12 H T 12 H 22, its inverse H 1 is expressed as H H 1 11 H 12 D 1 H T 12H 1 11 H 1 11 H 12 D 1 D 1 H T 12H 1 11 D 1 with D = H 22 H T 12H 1 11 H 12. Since D is positive definite, D M 2 max D M 2 tr H 22 M 2 tr λ. Since H 12 = H 1 11 H 12 D 1, we have H 1 11 H 12 M 1,M 2 max = H 12 D M 1,M 2 max H 12 M 1,M 2 D M 2 max δλ. 16

17 Hence, The lemma is proved. H 11 H 1 11 M 1 max H 12 M 1,M 2 H 1 11 H 12 M 1,M 2 max δ 2 λ. C.12 Lemma 6 and its Proof Lemma 6 Suppose that Condition 1 holds. Then there exist two positive constants C k k = 1, 2 not depending on n and p such that, for 0 < δ C 1 and each j = 1,..., p, S P Kjj K jj δ C 2 exp C 1 nδ 2, where K jj and K jj are defined in Section 2.3. Proof. Without ambiguity, we drop the index j in the following. For a function Ks, t, define a functional l K φt = 1 0 Ks, tφsds and its norm l K S = k 1 l Kφ k 2 1/2. Then K K S = l K l K S. For j = 1,..., p, let X ij s, t = g ij sg ij t and D j s, t = ḡ j sḡ j t with ḡ j t = n 1 n g ijt. We know that nl K l K = n l X i l K + nl D and hence n K n K S l Xi l K + n l D S. S To prove this lemma, we are going to derive the following tail inequalities: a There exist two constants L 1 and L 2 such that for any δ > 0, n } P l Xi l K nδ S 2 exp b There exist two positive constants L 3 and L 4 such that for δ > 2λ 0 n 1, nδ 2 ; C.21 2L 1 + 2L 2 δ n 2 δ 2 P l D S δ exp. C.22 8L 3 + 8L 4 nδ 17

18 After getting the above two inequalities C.21 and C.22, we have that for all δ > λ 0 /2n, P n K K S nδ can be bounded by } n P l Xi l K nδ + P n l D S nδ 2 2 S nδ 2 n 2 δ 2 2 exp + exp. 8L 1 + 8L 2 δ 32L L 4 nδ Take C 1 = min1, L 1 L 1 2, 16L 1 1, 64L 4 1 } and C 2 = 3 expc 2 3 with C 3 = max2λ 0, L 3 L 1 4 }. As a result, we obtain for any δ with 0 < δ C 1, This lemma follows. P K } K S δ C 2 exp C 1 nδ 2. Now we turn to prove C.21. Note that E l Xi l K = 0 for each i. By Lemma 7, it suffices to show that there exist two positive constants L 1 and L 2 such that Note that l Xi l K 2 S n E l Xi l K k S 1 2 k!nl 1L k 2 2, k = 2,.... C.23 = m,m =1 Im = m. By Jensen s inequality, E l Xi l K k S = E a im a im λ mm 2 where λmm = λ m δ mm and δ mm = m,m =1 m,m =1 2 m=1 λ m λ m λ m λ m } k/2 2 ξ im ξ im δ mm k/2 1 m,m =1 k λ m Eξi1 2k + 1, k λ m λ m E ξ im ξ im δ mm where the inequality Eξi1 2 1 k 2 k Eξi1 2k k 2 is used. Since ξ i1 N0, 1, 2k + 1 E ξ i1 2k = π 1/2 2 k Γ 2 k k!. 2 Let L 2 = 4 m=1 λ m = 4λ 0 < and L 1 = 4L 2 2. Then, for k = 2, 3,..., n E l Xi l K k S L 2/2 k 2 2 k k! 1 2 k!nl 1L k

19 Next we consider to prove the inequality C.22. Suppose that we have shown E l D k S 1 2 n k k!l 3 L k 2 4, k = 2, 3,.... C.24 Then, the following inequality follows from Lemma 7: n 2 δ 2 P l D S E l D S δ exp 2L 3 + 2L 4 nδ for all δ > 0. Note that l D 2 S = n 2 m,m =1 λ mλ m ξ m ξm 2, where ξ m = n 1/2 n ξ im. Hence E l D S n 1 λ 0. As a result, for δ > 2n 1 λ 0, we have that n 2 δ 2 P l D S δ P l D S E l D S δ/2 exp. 8L 3 + 8L 4 nδ Hence, C.22 follows. Now we derive the upper bound of E l D k S for k 2 as in C.24. By Jensen s inequality, E l D k S 1 n k E 1 n k m,m =1 m,m =1 1 n k m=1 λ m λ m ξ im ξ im 2 } k/2 λ m λ m λ m k Eξ 2k i1 k/2 1 m,m =1 2 n λ m λ m E ξ im ξ im k k λ m k!. Let L 4 = 2 m=1 λ m and L 3 = 2L 2 4. Then E l D k S 2 1 n k k!l 3 L k 2 4. Lemma 6 is proved. m=1 C.13 Lemma 7 and its Proof Lemma 7 Let X 1,..., X n } be independent random variables in a separable Hilbert space with norm. If EX i = 0 i = 1,..., n and n E X i k k! 2 nl 1L k 2 2, k = 2, 3,..., for two positive constants L 1 and L 2, then for all δ > 0, n P X i nδ 2 exp nδ 2 2L 1 + 2L 2 δ Proof. This lemma can be derived directly from Theorem of Bosq 2000 and hence its proof is omitted. 19.

20 C.14 Lemma 8 and its Proof Lemma 8 Suppose that Condition 1 holds. Denote φ = sgn φ, φ φ. Then φ φ d Kjj K jj S, where d = 2 2 maxλ jk 1 λ 1, λ λ jk+1 1 } if k 2, and d j1 = 2 2λ j1 λ j2 1. Proof. omitted. This lemma can be found in Lemma 4.3 of Bosq 2000 and hence the proof is C.15 Lemma 9 and its Proof Lemma 9 For any γ n 0, the fglasso problem 9 has a unique solution that satisfies the optimal condition C.12 with Ẑ defined in C.11. Proof. The fglasso problem can be written in the constrained form min tracesθ log detθ}, C.25 j l Θ jl F Cγ n where Θ R Mp Mp is symmetric positive definite. The objective function is strictly convex in view of its Hessian and the constraint on the parameter space, if the minimum is attained then the solution is uniquely determined. We need to show that the minimum is achieved. Note the off block diagonal elements are bounded by satisfying j l Θ jl F Cλ <. By the fact that max A ij maxa ii for a positive definite matrix A, we only need to consider i,j i the possibly unbounded diagonal elements. By Hadamard s inequality for positive definite matrices, we have Mp tracesθ log detθ S ii Θ ii log detθ ii. The diagonal elements of S are positive. The objective function goes to infinity as any sequence Θ k 11,..., Θ k Mp,Mp, k 1, goes to infinity. Thus the minimum is uniquely achieved. 20

21 C.16 Lemma 10 and its Proof } Lemma 10 Suppose that max W M max, R M max Then Z S c constructed in C.14 satisfies Z S c M max < 1. ηγn 8, where W = W+Θ 1 B Θ 1. Proof. The optimal condition C.12 can be replaced by Θ 1 Θ 1 + W R + γ n Z = 0, and can be rewritten as Θ 1 Θ 1 + W R + γ n Z = 0. C.26 Note vecθ 1 Θ 1 = Θ 1 Θ 1 vec. Taking vectorization for C.26, we have Γ S S Γ S cs Γ S S c Γ S c Sc vec,s vec,s c + vecw,s vecw,s c vecr S vecr S c + γ n vec Z S vec Z S c = 0. C.27 We solve for vec,s from the first line and substitute it into the second line. vec Z S c can be represented as Then vec Z S c = 1 γ n Γ S c SΓ S S 1 vecw,s vecr S +Γ S c SΓ S S 1 vec Z S 1 γ n vecw,s c vecr S c. For any vector v = v j with v j R M 2, 1 j p, define v M 2 max = max v j 2 as the j M 2 -group version of l norm. Taking the M 2 -group l norm on both sides, it follows from C.33 and C.34 in Lemma 15 that vec Z S c M 2 max 1 Γ S γ csγ S S 1 M 2 vecw,s M 2 max + vecr S M 2 max n + Γ S csγ S S 1 M 2 vec Z S M 2 max + 1 vecw,s c γ M 2 max + vecr S c M 2 max. n 21

22 Note that vec Z S M 2 max 1 by construction. Applying C.30 in Lemma 15, the bound condition for W M max, R M max and Condition 6 yield Z S c M max 2 η γ n 2 η γ n W M max + R M ηγn 4 max + 1 η + 1 η η η < 1. C.17 Lemma 11 and its Proof Lemma 11 Suppose that M max 1 3κ Σ d κ B d, then J T 3 and 2 R M max 3 2 κ3 Σ M max d M max + κ B, where J = k=0 1k Θ 1 k and R = Θ 1 Θ 1 JΘ 1. Proof. By the fact that has at most d M M blocks whose Frobenius norm is at least for each column block, then M Lemma 15 and the bound condition for M max that Θ 1 M Θ 1 M M d M max. It follows from C.31, C.32 in + Θ 1 B M κ Σ d M max + κ B 1/3. Hence it follows from we have the convergent matrix expansion via Neumann series Θ + 1 = Θ 1 Θ 1 Θ 1 + Θ 1 Θ 1 JΘ 1. By the definitions of R and, we obtain R = Θ 1 Θ 1 JΘ 1. Let e j R Mp M with identity matrix in the jth block and zero matrix elsewhere, and x R Mp M with jth block x j R M M. Define x M max = max x j F and x M 1 = p j=1 x j F. Recall that given an M-block matrix A, we have defined M-block version of matrix -norm as A M = max p i j=1 A ij F. Define the corresponding M-block version of matrix 1-norm j 22

23 by A M 1 = max p A ij F. It follows from the inequalities in Lemma 15 that j R M max = max et i Θ 1 Θ 1 JΘ 1 e j F i,j max e T i Θ 1 M i maxmax Θ 1 JΘ 1 e j M 1 j max e T i Θ 1 M 1 M i maxmax Θ 1 JΘ 1 e j M 1 j = Θ 1 M M max Θ 1 JΘ 1 e j M 1 κ Σ M max Θ 1 J T Θ 1 M κ 2 Σ M max J T M Θ 1 M Note that J = k=0 1k Θ 1 k. It follows from C.32 in Lemma 15 that J T M k=0 Θ 1 M k 1 = 1 Θ 1 M 3 2. Hence it follows from C.28 that we can bound the second order remainder R by R M max 3 2 κ3 Σ M max d M max + κ B. C.18 Lemma 12 and its Proof Lemma 12 Suppose that r = 2κ Γ W M max + γ n min M max = Θ Θ M max r. } 1 1 3κ Σ d, 3κ 3 Σ κ Γ d κ B d. then Proof. Let GΘ S = Θ 1 S + S S + γ n ZS. We define a continuous map F : R M 2 S R M 2 S by F vec S = Γ S S 1 vecgθ S + S + vec S. C.28 Note that F vec S = vec S holds if and only if GΘ S + S = G Θ S = 0 by construction. We need to show that the function F maps the following ball Br onto itself Br = Θ S : Θ S M max r}, C.29 where r = 2κ Γ W M max +γ n. Note F is continuous and Br is convex and compact, then by Brouwer s fixed point theorem, there exists some fixed point S Br, which implies 23

24 that Θ S Θ S M max r. It remains to prove the claim F Br Br. Note that GΘ S + S = [Θ + 1 ] S + S S + γ n ZS = [Θ + 1 Θ 1 ] S + [S Θ 1 ] S + γ n ZS = [R Θ 1 Θ 1 ] S + W,S + γ n ZS. Then C.28 can be substituted by F vec S = Γ S S 1 vecr S }} T 1 Γ S S 1 vecw,s + γ n vec Z S. }} T 2 By the definition of κ Γ and C.33 in Lemma 15, T 2 can be bounded by T 2 M 2 max κ Γ W,S M max + γ n = r/2. With the assumed bound for r, we have M max r 1 3κ Σ d κ B d. Then an application of the bound for R in Lemma 11 yields T 1 M 2 max 3 2 κ Γ κ3 Σ M max d M M max max + κ B 2 r 2, where we have used the assumption M 1 max r 3κ 3 Σ κ Γ d κ B d. Therefore, we obtain which proves the claim. F vec S M 2 max T 1 M 2 max + T 2 M 2 max r, C.19 Lemma 13 and its Proof Lemma 13 Suppose that all conditions in Lemma 12 hold and Θ min = Θ min > 2κ Γ W M max + γ n, then Θ jl 0 for all j, l S. min j,l E Θ jl F satisfies Proof. From Lemma 12, we have Θ jl Θ jl F r for any j, l S. Thus Θ jl 0 for all j, l S follows immediately from the lower bound condition on Θ min. 24

25 C.20 Lemma 14 and its Proof Lemma 14 For any τ > 2 and sample size n such that δ f n, Mp τ 1/v, we have P W M max M δ f n, Mp τ Mp 2 τ. Proof. By the definition of the tail function in C.5, we have P W kl > δ 1 fn,δ, where W R Mp Mp and k, l 1,..., Mp} 2. It then follows from union bound of probability and δ = δ f n, Mp τ that P W M max M δ f n, Mp τ = P max i,j W ij F > Mδ M 2 p 2 fn, δ = Mp2 τ. C.21 Lemma 15 and its proof Lemma 15 Let A = A ij, B = B ij with A ij, B ij R M M, 1 i, j p, u = u j, v = v j with u j, v j R M, 1 j p, and x, y R Mp M with jth block x j, y j R M M, respectively. Then the following norm properties hold: A M max = veca M 2 max, C.30 A + B M AB M A M + B M, C.31 A M B M, C.32 Au M max A M u M max, u + v M max u M max + v M max, C.33 C.34 x T y M F x M max y M 1, C.35 Ax M max A M max x M 1, C.36 A M = A T M 1. C.37 Proof. Here we will only prove one inequality C.32. Other properties can be proved 25

26 using similar techniques, so we skip the details. From definition, we write which completes the proof. AB M = max i max i = max i max i p j=1 p p A ik B kj F k=1 j=1 k=1 p A ik F B kj F p A ik F k=1 p k=1 p B kj F j=1 A ik F max k = A M B M, p B kj F j=1 D Further Discussion D.1 Approximation for Multivaraite Functional Data One referee was concerned that, for multivariate functional data, the truncation approach through performing FPCA separately for each individual curve does not provide the best M- dimensional approximation. We refer to Chiou et al and Happ and Greven 2017 for some recent developments on the Karhunen-Loeve expansion for multivariate functional data with fixed p. However, this multivariate FPCA approach cannot handle high dimensional functional data when p is very large, posing additional challenges to derive the relevant concentration bounds. In contrast, our approach is easy to implement and we are able to derive the relevant concentration bounds. Under certain regularity conditions, we can prove that our truncation approach indeed can control the bias which approaches zero as M. Roughly speaking, suppose that for each j = 1,..., p, g j t = gj M t + ξ j t, t T, with ξ j 0 as M and E gj M t = E ξ j t = 0. It follows from the expansion, where Cov g j s, g k t Cov gj M s, gk Mt = Cov gj M s, ξ j t + Cov ξ j s, gk Mt + Cov ξ j s, ξ k t, and Cauchy-Schwarz inequality that E Cov g s,t T 2 j s, g k t Cov gj M s, gk Mt} 2 dsdt 9 supj g j 2 sup j ξ j 2. In 26

27 other words, if sup j g j 2 C with some positive constant C, the truncated bias can be controlled at the same order as sup j ξ j 2. D.2 Connection between the Fglasso Approach and 24 We discuss the connection between our proposed fglasso approach and the alternative method using the inverse correlation matrix discussed in Section 6. Let S M = D M R M D M, where D M is the diagonal matrix of S M with its j-th block given by D M j R M M, j = 1,..., p. We modify the penalty term in 9 and consider maximizing log detθ M traced M R M D M Θ M γ n D M j Θ M jl D M l F, C.38 over symmetric positive definite matrices Θ M R pm pm. Let Q M = D M Θ M D M, it is clear that the solution to the optimization problem C.38 is equivalent to 24 in Section 6. j l D.3 The Algorithm to Solve 24 Since the fglasso criterion in 9 and 24 discussed in Section 6 take a similar form, we develop Algorithm 6 to solve the optimization problem in 24 following an analogous procedure described in Section 3.1. Let Q j, P j and R j respectively be Mp 1 Mp 1 sub matrices excluding the jth row and column block of Q, P = Q 1 and R, and let q j, p j and r j be Mp 1 M matrices representing the jth column block after excluding the jth row block. Finally, let Q jj, P jj and R jj be the j, jth M M blocks in Q, P and R respectively. Then, for a fixed value of Q j, 24 can be solved by setting Q jj = R 1 jj + qt j Q 1 j q j, C.39 where q j = arg min q j tracer jj q T j Q 1 j q j + 2tracer T j q j + 2γ n } p 1 q jl F, C.40 where q jl represents the lth M M block of q j. The algorithm to solve 24 is summarized in Algorithm 6 below. l=1 27

28 Algorithm 6 The Algorithm to Solve Initialize Q = I Mp and P = I Mp. 2. Repeat until convergence for j = 1,..., p. a Compute Q 1 j P j p j P 1 jj pt j. b Solve for q j in C.40 using Algorithm 3 in Section B.1. c Reconstruct P using P jj = R jj, p j = V j R jj and P j = Q 1 j +V jr jj V T j, where 1 V j = Q j q j. 3. Set j, Ê = l : Q } jl F 0, j, l V 2, j l. References Bosq, D Linear Processes in Function Spaces, Springer, New York. Boucheron, S., Lugosi, G. and Massart, P Concentration Inequalities: A Nonasymptotic Theory of Independence, Oxford University Press. Chiou, J.-M., Chen, Y.-T. and Yang, Y.-F Multivariate functional principal component analysis: a normalization approach. Statistica Sinica., 24, Happ, C. and Greven, S Multivariate functional principal component analysis for data observed on different dimensional domains. Journal of the American Statistical Association, in press. Ravikumar, P., Wainwright, M., Raskutti, G. and Yu, B High-dimensional covariance estimation by minimizing l 1 -penalized log-determinant deivergence. Electronic Journal of Statistics., 5,

Functional Graphical Models

Functional Graphical Models Functional Graphical Models Xinghao Qiao 1, Shaojun Guo 2, and Gareth M. James 3 1 Department of Statistics, London School of Economics, U.K. 2 Institute of Statistics and Big Data, Renmin University of

More information

Supplementary Material for Nonparametric Operator-Regularized Covariance Function Estimation for Functional Data

Supplementary Material for Nonparametric Operator-Regularized Covariance Function Estimation for Functional Data Supplementary Material for Nonparametric Operator-Regularized Covariance Function Estimation for Functional Data Raymond K. W. Wong Department of Statistics, Texas A&M University Xiaoke Zhang Department

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57

More information

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 17

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 17 EE/ACM 150 - Applications of Convex Optimization in Signal Processing and Communications Lecture 17 Andre Tkacenko Signal Processing Research Group Jet Propulsion Laboratory May 29, 2012 Andre Tkacenko

More information

1 Regression with High Dimensional Data

1 Regression with High Dimensional Data 6.883 Learning with Combinatorial Structure ote for Lecture 11 Instructor: Prof. Stefanie Jegelka Scribe: Xuhong Zhang 1 Regression with High Dimensional Data Consider the following regression problem:

More information

High-dimensional covariance estimation based on Gaussian graphical models

High-dimensional covariance estimation based on Gaussian graphical models High-dimensional covariance estimation based on Gaussian graphical models Shuheng Zhou Department of Statistics, The University of Michigan, Ann Arbor IMA workshop on High Dimensional Phenomena Sept. 26,

More information

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions International Journal of Control Vol. 00, No. 00, January 2007, 1 10 Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions I-JENG WANG and JAMES C.

More information

On Expected Gaussian Random Determinants

On Expected Gaussian Random Determinants On Expected Gaussian Random Determinants Moo K. Chung 1 Department of Statistics University of Wisconsin-Madison 1210 West Dayton St. Madison, WI 53706 Abstract The expectation of random determinants whose

More information

A Constraint-Reduced MPC Algorithm for Convex Quadratic Programming, with a Modified Active-Set Identification Scheme

A Constraint-Reduced MPC Algorithm for Convex Quadratic Programming, with a Modified Active-Set Identification Scheme A Constraint-Reduced MPC Algorithm for Convex Quadratic Programming, with a Modified Active-Set Identification Scheme M. Paul Laiu 1 and (presenter) André L. Tits 2 1 Oak Ridge National Laboratory laiump@ornl.gov

More information

Extended Bayesian Information Criteria for Gaussian Graphical Models

Extended Bayesian Information Criteria for Gaussian Graphical Models Extended Bayesian Information Criteria for Gaussian Graphical Models Rina Foygel University of Chicago rina@uchicago.edu Mathias Drton University of Chicago drton@uchicago.edu Abstract Gaussian graphical

More information

Supplementary Materials to Convex Banding of the Covariance Matrix

Supplementary Materials to Convex Banding of the Covariance Matrix Supplementary Materials to Convex Banding of the Covariance Matrix Jacob Bien, Florentina Bunea, Luo Xiao May 25, 2015 A.1 Dual problem Define LΣ, A 1,..., A = 1 2 S Σ 2 F + λ W l A l, Σ. Observe that

More information

Submitted to the Brazilian Journal of Probability and Statistics

Submitted to the Brazilian Journal of Probability and Statistics Submitted to the Brazilian Journal of Probability and Statistics Multivariate normal approximation of the maximum likelihood estimator via the delta method Andreas Anastasiou a and Robert E. Gaunt b a

More information

Optimization Theory. A Concise Introduction. Jiongmin Yong

Optimization Theory. A Concise Introduction. Jiongmin Yong October 11, 017 16:5 ws-book9x6 Book Title Optimization Theory 017-08-Lecture Notes page 1 1 Optimization Theory A Concise Introduction Jiongmin Yong Optimization Theory 017-08-Lecture Notes page Optimization

More information

Basic Concepts in Matrix Algebra

Basic Concepts in Matrix Algebra Basic Concepts in Matrix Algebra An column array of p elements is called a vector of dimension p and is written as x p 1 = x 1 x 2. x p. The transpose of the column vector x p 1 is row vector x = [x 1

More information

Lecture: Algorithms for LP, SOCP and SDP

Lecture: Algorithms for LP, SOCP and SDP 1/53 Lecture: Algorithms for LP, SOCP and SDP Zaiwen Wen Beijing International Center For Mathematical Research Peking University http://bicmr.pku.edu.cn/~wenzw/bigdata2018.html wenzw@pku.edu.cn Acknowledgement:

More information

ELEMENTARY LINEAR ALGEBRA

ELEMENTARY LINEAR ALGEBRA ELEMENTARY LINEAR ALGEBRA K R MATTHEWS DEPARTMENT OF MATHEMATICS UNIVERSITY OF QUEENSLAND First Printing, 99 Chapter LINEAR EQUATIONS Introduction to linear equations A linear equation in n unknowns x,

More information

Kernel Method: Data Analysis with Positive Definite Kernels

Kernel Method: Data Analysis with Positive Definite Kernels Kernel Method: Data Analysis with Positive Definite Kernels 2. Positive Definite Kernel and Reproducing Kernel Hilbert Space Kenji Fukumizu The Institute of Statistical Mathematics. Graduate University

More information

2. Matrix Algebra and Random Vectors

2. Matrix Algebra and Random Vectors 2. Matrix Algebra and Random Vectors 2.1 Introduction Multivariate data can be conveniently display as array of numbers. In general, a rectangular array of numbers with, for instance, n rows and p columns

More information

Assignment 1: From the Definition of Convexity to Helley Theorem

Assignment 1: From the Definition of Convexity to Helley Theorem Assignment 1: From the Definition of Convexity to Helley Theorem Exercise 1 Mark in the following list the sets which are convex: 1. {x R 2 : x 1 + i 2 x 2 1, i = 1,..., 10} 2. {x R 2 : x 2 1 + 2ix 1x

More information

High Dimensional Covariance and Precision Matrix Estimation

High Dimensional Covariance and Precision Matrix Estimation High Dimensional Covariance and Precision Matrix Estimation Wei Wang Washington University in St. Louis Thursday 23 rd February, 2017 Wei Wang (Washington University in St. Louis) High Dimensional Covariance

More information

Noisy Streaming PCA. Noting g t = x t x t, rearranging and dividing both sides by 2η we get

Noisy Streaming PCA. Noting g t = x t x t, rearranging and dividing both sides by 2η we get Supplementary Material A. Auxillary Lemmas Lemma A. Lemma. Shalev-Shwartz & Ben-David,. Any update of the form P t+ = Π C P t ηg t, 3 for an arbitrary sequence of matrices g, g,..., g, projection Π C onto

More information

We denote the derivative at x by DF (x) = L. With respect to the standard bases of R n and R m, DF (x) is simply the matrix of partial derivatives,

We denote the derivative at x by DF (x) = L. With respect to the standard bases of R n and R m, DF (x) is simply the matrix of partial derivatives, The derivative Let O be an open subset of R n, and F : O R m a continuous function We say F is differentiable at a point x O, with derivative L, if L : R n R m is a linear transformation such that, for

More information

Linear Algebra in Computer Vision. Lecture2: Basic Linear Algebra & Probability. Vector. Vector Operations

Linear Algebra in Computer Vision. Lecture2: Basic Linear Algebra & Probability. Vector. Vector Operations Linear Algebra in Computer Vision CSED441:Introduction to Computer Vision (2017F Lecture2: Basic Linear Algebra & Probability Bohyung Han CSE, POSTECH bhhan@postech.ac.kr Mathematics in vector space Linear

More information

L. Levaggi A. Tabacco WAVELETS ON THE INTERVAL AND RELATED TOPICS

L. Levaggi A. Tabacco WAVELETS ON THE INTERVAL AND RELATED TOPICS Rend. Sem. Mat. Univ. Pol. Torino Vol. 57, 1999) L. Levaggi A. Tabacco WAVELETS ON THE INTERVAL AND RELATED TOPICS Abstract. We use an abstract framework to obtain a multilevel decomposition of a variety

More information

Gaussian Graphical Models and Graphical Lasso

Gaussian Graphical Models and Graphical Lasso ELE 538B: Sparsity, Structure and Inference Gaussian Graphical Models and Graphical Lasso Yuxin Chen Princeton University, Spring 2017 Multivariate Gaussians Consider a random vector x N (0, Σ) with pdf

More information

Geometry of log-concave Ensembles of random matrices

Geometry of log-concave Ensembles of random matrices Geometry of log-concave Ensembles of random matrices Nicole Tomczak-Jaegermann Joint work with Radosław Adamczak, Rafał Latała, Alexander Litvak, Alain Pajor Cortona, June 2011 Nicole Tomczak-Jaegermann

More information

Additive functionals of infinite-variance moving averages. Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535

Additive functionals of infinite-variance moving averages. Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535 Additive functionals of infinite-variance moving averages Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535 Departments of Statistics The University of Chicago Chicago, Illinois 60637 June

More information

SPECTRAL THEOREM FOR COMPACT SELF-ADJOINT OPERATORS

SPECTRAL THEOREM FOR COMPACT SELF-ADJOINT OPERATORS SPECTRAL THEOREM FOR COMPACT SELF-ADJOINT OPERATORS G. RAMESH Contents Introduction 1 1. Bounded Operators 1 1.3. Examples 3 2. Compact Operators 5 2.1. Properties 6 3. The Spectral Theorem 9 3.3. Self-adjoint

More information

Kernel Methods. Machine Learning A W VO

Kernel Methods. Machine Learning A W VO Kernel Methods Machine Learning A 708.063 07W VO Outline 1. Dual representation 2. The kernel concept 3. Properties of kernels 4. Examples of kernel machines Kernel PCA Support vector regression (Relevance

More information

Random matrices: Distribution of the least singular value (via Property Testing)

Random matrices: Distribution of the least singular value (via Property Testing) Random matrices: Distribution of the least singular value (via Property Testing) Van H. Vu Department of Mathematics Rutgers vanvu@math.rutgers.edu (joint work with T. Tao, UCLA) 1 Let ξ be a real or complex-valued

More information

Convex Geometry. Carsten Schütt

Convex Geometry. Carsten Schütt Convex Geometry Carsten Schütt November 25, 2006 2 Contents 0.1 Convex sets... 4 0.2 Separation.... 9 0.3 Extreme points..... 15 0.4 Blaschke selection principle... 18 0.5 Polytopes and polyhedra.... 23

More information

A matrix over a field F is a rectangular array of elements from F. The symbol

A matrix over a field F is a rectangular array of elements from F. The symbol Chapter MATRICES Matrix arithmetic A matrix over a field F is a rectangular array of elements from F The symbol M m n (F ) denotes the collection of all m n matrices over F Matrices will usually be denoted

More information

r=1 r=1 argmin Q Jt (20) After computing the descent direction d Jt 2 dt H t d + P (x + d) d i = 0, i / J

r=1 r=1 argmin Q Jt (20) After computing the descent direction d Jt 2 dt H t d + P (x + d) d i = 0, i / J 7 Appendix 7. Proof of Theorem Proof. There are two main difficulties in proving the convergence of our algorithm, and none of them is addressed in previous works. First, the Hessian matrix H is a block-structured

More information

arxiv: v5 [math.na] 16 Nov 2017

arxiv: v5 [math.na] 16 Nov 2017 RANDOM PERTURBATION OF LOW RANK MATRICES: IMPROVING CLASSICAL BOUNDS arxiv:3.657v5 [math.na] 6 Nov 07 SEAN O ROURKE, VAN VU, AND KE WANG Abstract. Matrix perturbation inequalities, such as Weyl s theorem

More information

Signal Recovery from Permuted Observations

Signal Recovery from Permuted Observations EE381V Course Project Signal Recovery from Permuted Observations 1 Problem Shanshan Wu (sw33323) May 8th, 2015 We start with the following problem: let s R n be an unknown n-dimensional real-valued signal,

More information

Nonlinear function on function additive model with multiple predictor curves. Xin Qi and Ruiyan Luo. Georgia State University

Nonlinear function on function additive model with multiple predictor curves. Xin Qi and Ruiyan Luo. Georgia State University Statistica Sinica: Supplement Nonlinear function on function additive model with multiple predictor curves Xin Qi and Ruiyan Luo Georgia State University Supplementary Material This supplementary material

More information

Schwarz Preconditioner for the Stochastic Finite Element Method

Schwarz Preconditioner for the Stochastic Finite Element Method Schwarz Preconditioner for the Stochastic Finite Element Method Waad Subber 1 and Sébastien Loisel 2 Preprint submitted to DD22 conference 1 Introduction The intrusive polynomial chaos approach for uncertainty

More information

Stein s Method and Characteristic Functions

Stein s Method and Characteristic Functions Stein s Method and Characteristic Functions Alexander Tikhomirov Komi Science Center of Ural Division of RAS, Syktyvkar, Russia; Singapore, NUS, 18-29 May 2015 Workshop New Directions in Stein s method

More information

IV. Matrix Approximation using Least-Squares

IV. Matrix Approximation using Least-Squares IV. Matrix Approximation using Least-Squares The SVD and Matrix Approximation We begin with the following fundamental question. Let A be an M N matrix with rank R. What is the closest matrix to A that

More information

(Part 1) High-dimensional statistics May / 41

(Part 1) High-dimensional statistics May / 41 Theory for the Lasso Recall the linear model Y i = p j=1 β j X (j) i + ɛ i, i = 1,..., n, or, in matrix notation, Y = Xβ + ɛ, To simplify, we assume that the design X is fixed, and that ɛ is N (0, σ 2

More information

Optimal series representations of continuous Gaussian random fields

Optimal series representations of continuous Gaussian random fields Optimal series representations of continuous Gaussian random fields Antoine AYACHE Université Lille 1 - Laboratoire Paul Painlevé A. Ayache (Lille 1) Optimality of continuous Gaussian series 04/25/2012

More information

Lecture 15 Newton Method and Self-Concordance. October 23, 2008

Lecture 15 Newton Method and Self-Concordance. October 23, 2008 Newton Method and Self-Concordance October 23, 2008 Outline Lecture 15 Self-concordance Notion Self-concordant Functions Operations Preserving Self-concordance Properties of Self-concordant Functions Implications

More information

Adaptive estimation of the copula correlation matrix for semiparametric elliptical copulas

Adaptive estimation of the copula correlation matrix for semiparametric elliptical copulas Adaptive estimation of the copula correlation matrix for semiparametric elliptical copulas Department of Mathematics Department of Statistical Science Cornell University London, January 7, 2016 Joint work

More information

Vector fields Lecture 2

Vector fields Lecture 2 Vector fields Lecture 2 Let U be an open subset of R n and v a vector field on U. We ll say that v is complete if, for every p U, there exists an integral curve, γ : R U with γ(0) = p, i.e., for every

More information

a 11 x 1 + a 12 x a 1n x n = b 1 a 21 x 1 + a 22 x a 2n x n = b 2.

a 11 x 1 + a 12 x a 1n x n = b 1 a 21 x 1 + a 22 x a 2n x n = b 2. Chapter 1 LINEAR EQUATIONS 11 Introduction to linear equations A linear equation in n unknowns x 1, x,, x n is an equation of the form a 1 x 1 + a x + + a n x n = b, where a 1, a,, a n, b are given real

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra)

AMS526: Numerical Analysis I (Numerical Linear Algebra) AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 3: Positive-Definite Systems; Cholesky Factorization Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical Analysis I 1 / 11 Symmetric

More information

VISCOSITY SOLUTIONS. We follow Han and Lin, Elliptic Partial Differential Equations, 5.

VISCOSITY SOLUTIONS. We follow Han and Lin, Elliptic Partial Differential Equations, 5. VISCOSITY SOLUTIONS PETER HINTZ We follow Han and Lin, Elliptic Partial Differential Equations, 5. 1. Motivation Throughout, we will assume that Ω R n is a bounded and connected domain and that a ij C(Ω)

More information

CONSTRUCTION OF SLICED ORTHOGONAL LATIN HYPERCUBE DESIGNS

CONSTRUCTION OF SLICED ORTHOGONAL LATIN HYPERCUBE DESIGNS Statistica Sinica 23 (2013), 1117-1130 doi:http://dx.doi.org/10.5705/ss.2012.037 CONSTRUCTION OF SLICED ORTHOGONAL LATIN HYPERCUBE DESIGNS Jian-Feng Yang, C. Devon Lin, Peter Z. G. Qian and Dennis K. J.

More information
