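Submitted to the Annals of Statistics
arXiv: arXiv:0000.0000

SUPPLEMENT TO GEOMETRIC INFERENCE FOR GENERAL HIGH-DIMENSIONAL LINEAR INVERSE PROBLEMS

By T. Tony Cai, Tengyuan Liang and Alexander Rakhlin
The Wharton School at University of Pennsylvania

APPENDIX A: ADDITIONAL PROOFS

Proof of Lemma 2. The proof uses concentration of Lipschitz functions on Gaussian space, which is illustrated in the following lemma taken from equation (1.6) in [3].

Lemma A.1 (Gaussian concentration inequality for Lipschitz functions). Let $g \in \mathbb{R}^n$ be a Gaussian vector with i.i.d. mean zero and variance one elements, and let $F : \mathbb{R}^n \to \mathbb{R}$ be a Lipschitz function with Lipschitz constant $L$, i.e. $|F(x) - F(y)| \le L\|x - y\|$ for any $x, y \in \mathbb{R}^n$, with the Euclidean metric on $\mathbb{R}^n$. Then for any $\lambda > 0$,
\[
\mathbb{P}\big(F(g) - \mathbb{E}_g F(g) \ge \lambda\big) \le \exp\Big(-\frac{\lambda^2}{2L^2}\Big).
\]

We would like to upper bound $\|\mathcal{X}^*(Z)\|_{\mathcal{A}}^*$ with high probability, where $Z \sim N(0, \frac{\sigma^2}{n} I_n)$. We have
\[
\|\mathcal{X}^*(Z)\|_{\mathcal{A}}^* = \sup_{v \in \mathcal{A}} \langle \mathcal{X}^*(Z), v\rangle = \sup_{v \in \mathcal{A}} \langle Z, \mathcal{X}(v)\rangle .
\]
Fixing $\mathcal{X}$, we can think of $\sup_{v \in \mathcal{A}} \langle \cdot, \mathcal{X}(v)\rangle : \mathbb{R}^n \to \mathbb{R}$ as a function on the Gaussian space $g \sim N(0, I_n)$ that is Lipschitz with constant $K_{\mathcal{X}}^{\mathcal{A}} := \sup_{v \in \mathcal{A}} \|\mathcal{X}(v)\|_{\ell_2}$:
\[
\Big| \sup_{v \in \mathcal{A}} \langle g_1, \mathcal{X}(v)\rangle - \sup_{v \in \mathcal{A}} \langle g_2, \mathcal{X}(v)\rangle \Big| \le K_{\mathcal{X}}^{\mathcal{A}} \, \|g_1 - g_2\|_{\ell_2}.
\]
In fact, first fixing $u_1 = \arg\sup_{v \in \mathcal{A}} \langle g_1, \mathcal{X}(v)\rangle$, we have
\[
\sup_{v \in \mathcal{A}} \langle g_1, \mathcal{X}(v)\rangle - \sup_{v \in \mathcal{A}} \langle g_2, \mathcal{X}(v)\rangle \le \langle g_1 - g_2, \mathcal{X}(u_1)\rangle \le \|\mathcal{X}(u_1)\|_{\ell_2} \|g_1 - g_2\|_{\ell_2}.
\]
The other side uses the same trick, fixing $u_2 = \arg\sup_{v \in \mathcal{A}} \langle g_2, \mathcal{X}(v)\rangle$:
\[
\sup_{v \in \mathcal{A}} \langle g_1, \mathcal{X}(v)\rangle - \sup_{v \in \mathcal{A}} \langle g_2, \mathcal{X}(v)\rangle \ge \langle g_1 - g_2, \mathcal{X}(u_2)\rangle \ge -\|\mathcal{X}(u_2)\|_{\ell_2} \|g_1 - g_2\|_{\ell_2}.
\]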
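The Lipschitz property just derived can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes, purely for illustration, that the atoms are $\pm e_j$ (so that $\mathcal{X}(v)$ ranges over plus and minus the columns of a matrix) and that the matrix has i.i.d. $N(0, 1/n)$ entries. The function names are hypothetical.

```python
import math
import random


def sup_inner(g, images):
    """Return sup_v <g, X(v)> over the finite list of images X(v)."""
    return max(sum(gi * xi for gi, xi in zip(g, xv)) for xv in images)


def lipschitz_constant(images):
    """Return K = sup_v ||X(v)||_2 over the same list."""
    return max(math.sqrt(sum(x * x for x in xv)) for xv in images)


def make_images(n, p, rng):
    """Images X(v) of the atoms v = +/- e_j: plus and minus the columns of X.

    Assumption for illustration only: X has i.i.d. N(0, 1/n) entries.
    """
    cols = [[rng.gauss(0.0, 1.0) / math.sqrt(n) for _ in range(n)]
            for _ in range(p)]
    return cols + [[-x for x in col] for col in cols]


def worst_ratio(n=20, p=50, trials=200, seed=0):
    """Largest observed |F(g1) - F(g2)| / ||g1 - g2||_2, together with K."""
    rng = random.Random(seed)
    images = make_images(n, p, rng)
    K = lipschitz_constant(images)
    worst = 0.0
    for _ in range(trials):
        g1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        g2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        gap = abs(sup_inner(g1, images) - sup_inner(g2, images))
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(g1, g2)))
        worst = max(worst, gap / dist)
    return worst, K
```

Calling `worst_ratio()` returns the largest observed difference quotient and the constant $K$; by the Cauchy-Schwarz step in the proof, the first value should never exceed the second.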
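Thus we proved that the Lipschitz constant is upper bounded by $K_{\mathcal{X}}^{\mathcal{A}}$. Now we can apply the concentration of Lipschitz functions on Gaussian space and get
\[
\mathbb{P}\Big( \|\mathcal{X}^*(Z)\|_{\mathcal{A}}^* \ge \mathbb{E}\|\mathcal{X}^*(Z)\|_{\mathcal{A}}^* + \lambda \Big) \le \exp\Big( -\frac{n\lambda^2}{2\sigma^2 (K_{\mathcal{X}}^{\mathcal{A}})^2} \Big). \tag{A.1}
\]
Thus, with probability at least $1 - \exp(-\delta^2/2)$,
\[
\|\mathcal{X}^*(Z)\|_{\mathcal{A}}^* \le \frac{\sigma}{\sqrt{n}} \Big\{ \mathbb{E}_g\Big[ \sup_{v \in \mathcal{A}} \langle g, \mathcal{X}(v)\rangle \Big] + \delta \cdot \sup_{v \in \mathcal{A}} \|\mathcal{X}(v)\|_{\ell_2} \Big\}. \tag{A.2}
\]

Proof of Lemma 3. The proof uses Gordon's method [2]. The lower bound part of this lemma is a modified version of the key lemma in [1]. First let us introduce an important lemma in Gordon's analysis.

Lemma A.2 (Corollary 1.2 in [2]). Let $\Omega$ be a closed subset of $S^{p-1}$. Let $\Phi : \mathbb{R}^p \to \mathbb{R}^n$ be a random map with i.i.d. zero-mean Gaussian entries having variance one. Then
\[
\lambda_n - w(\Omega) \le \mathbb{E}\Big[ \min_{z \in \Omega} \|\Phi z\|_{\ell_2} \Big] \le \mathbb{E}\Big[ \max_{z \in \Omega} \|\Phi z\|_{\ell_2} \Big] \le \lambda_n + w(\Omega),
\]
where $\lambda_n = \sqrt{2}\,\Gamma(\frac{n+1}{2}) / \Gamma(\frac{n}{2})$ satisfies $n/\sqrt{n+1} < \lambda_n < \sqrt{n}$.

Using the same steps as in the proof of Lemma 2: for any closed subset $\Omega \subset S^{p-1}$, the functions $\Phi \mapsto \min_{z \in \Omega} \|\Phi z\|_{\ell_2}$ and $\Phi \mapsto \max_{z \in \Omega} \|\Phi z\|_{\ell_2}$ are both Lipschitz maps on the Gaussian space $\Phi$ with Lipschitz constant 1:
\[
\Big| \min_{z \in \Omega} \|\Phi_1 z\|_{\ell_2} - \min_{z \in \Omega} \|\Phi_2 z\|_{\ell_2} \Big| \le \|\Phi_1 - \Phi_2\|_F, \qquad
\Big| \max_{z \in \Omega} \|\Phi_1 z\|_{\ell_2} - \max_{z \in \Omega} \|\Phi_2 z\|_{\ell_2} \Big| \le \|\Phi_1 - \Phi_2\|_F .
\]
Thus, using the Lipschitz concentration in Gaussian space, we have
\[
\mathbb{P}\Big( \min_{z \in \Omega} \|\mathcal{X} z\|_{\ell_2} \le \mathbb{E}\big[\min_{z \in \Omega} \|\mathcal{X} z\|_{\ell_2}\big] - t \Big) \le \exp(-n t^2/2), \qquad
\mathbb{P}\Big( \max_{z \in \Omega} \|\mathcal{X} z\|_{\ell_2} \ge \mathbb{E}\big[\max_{z \in \Omega} \|\mathcal{X} z\|_{\ell_2}\big] + t \Big) \le \exp(-n t^2/2),
\]
where $\mathcal{X}$ is a Gaussian ensemble design, so that $\sqrt{n}\,\mathcal{X}$ has the same law as $\Phi$. Combining these with Lemma A.2, we have
\[
\mathbb{P}\Big( \min_{z \in \Omega} \|\mathcal{X} z\|_{\ell_2} \le 1 - c \Big) \le \exp\Big( -\big(\lambda_n - w(\Omega) - (1-c)\sqrt{n}\big)^2 / 2 \Big),
\]
\[
\mathbb{P}\Big( \max_{z \in \Omega} \|\mathcal{X} z\|_{\ell_2} \ge 1 + c \Big) \le \exp\Big( -\big((1+c)\sqrt{n} - \lambda_n - w(\Omega)\big)^2 / 2 \Big),
\]
provided the quantities inside the squares are nonnegative.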
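Thus under the condition
\[
n \ge \frac{4[w(\Omega) + \delta]^2}{c^2}
\]
we have
\[
(1+c)\sqrt{n} - \lambda_n - w(\Omega) \ge (1+c)\sqrt{n} - \sqrt{n} - w(\Omega) \ge 2[w(\Omega) + \delta] - w(\Omega) \ge \delta \tag{A.3}
\]
and, using the lower bound $\lambda_n > n/\sqrt{n+1}$ from Lemma A.2,
\[
\lambda_n - w(\Omega) - (1-c)\sqrt{n} \ge \frac{n}{\sqrt{n+1}} - w(\Omega) - (1-c)\sqrt{n} \ge \delta .
\]
Thus
\[
\lambda_n - w(\Omega) - (1-c)\sqrt{n} \ge \delta > 0, \qquad (1+c)\sqrt{n} - \lambda_n - w(\Omega) \ge \delta > 0 .
\]
In fact, we proved a stronger result:
\[
\mathbb{P}\Big( \min_{z \in \Omega} \|\mathcal{X} z\|_{\ell_2} \ge 1 - c \Big) \ge 1 - \exp(-\delta^2/2), \qquad
\mathbb{P}\Big( \max_{z \in \Omega} \|\mathcal{X} z\|_{\ell_2} \le 1 + c \Big) \ge 1 - \exp(-\delta^2/2).
\]
Now apply our lemma to the local tangent cone $T_{\mathcal{A}}(M)$, observing that $w(B_2^p \cap T_{\mathcal{A}}(M)) = w(S^{p-1} \cap T_{\mathcal{A}}(M))$. The lemma then holds by plugging in the tangent cone.

Proof of Lemma 4. The proof requires an observation, which is the definition of the dual norm:
\[
w(\mathcal{A}) = \mathbb{E}_g \sup_{v \in \mathcal{A}} \langle g, v\rangle = \mathbb{E}_g \|g\|_{\mathcal{A}}^* . \tag{A.4}
\]
Then
\[
\gamma_{\mathcal{A}}(M)\, w(\mathcal{A}) = \mathbb{E}_g \|g\|_{\mathcal{A}}^* \cdot \sup_{h \in T_{\mathcal{A}}(M)} \frac{\|h\|_{\mathcal{A}}}{\|h\|_{\ell_2}}
= \mathbb{E}_g \Big[ \sup_{h \in T_{\mathcal{A}}(M)} \frac{\|g\|_{\mathcal{A}}^* \|h\|_{\mathcal{A}}}{\|h\|_{\ell_2}} \Big] \tag{A.5}
\]
\[
\ge \mathbb{E}_g \Big[ \sup_{h \in T_{\mathcal{A}}(M)} \frac{\langle g, h\rangle}{\|h\|_{\ell_2}} \Big] = w(B_2^p \cap T_{\mathcal{A}}(M)). \tag{A.6}
\]
The last step requires the Cauchy-Schwarz relationship (2.3).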
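As a numerical aside on the proof of Lemma 3, the constant $\lambda_n$ of Lemma A.2 and its two-sided bound $n/\sqrt{n+1} < \lambda_n < \sqrt{n}$ can be evaluated directly. The sketch below is an illustration only; the function names are hypothetical, and the log-gamma function is used simply to avoid overflow for large $n$.

```python
import math


def gordon_lambda(n):
    """lambda_n = sqrt(2) * Gamma((n + 1) / 2) / Gamma(n / 2).

    Computed through log-gamma so that large n does not overflow.
    """
    log_ratio = math.lgamma((n + 1) / 2.0) - math.lgamma(n / 2.0)
    return math.sqrt(2.0) * math.exp(log_ratio)


def bounds_hold(n):
    """Check n / sqrt(n + 1) < lambda_n < sqrt(n) for a given n >= 1."""
    lam = gordon_lambda(n)
    return n / math.sqrt(n + 1) < lam < math.sqrt(n)
```

For instance, `all(bounds_hold(n) for n in range(1, 2001))` evaluates the bound over a range of sample sizes. Both gaps are of order $1/\sqrt{n}$, which is why $\lambda_n$ can be replaced by $\sqrt{n}$ in (A.3) at the cost of a lower-order term.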
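Proof of Theorem 6 with Sudakov Entropy Estimate. The key technical tool in proving Theorem 6 is the following well-known Fano's information lemma, Lemma A.3. This version is from [4]; similar versions are provided in [6, 7, 5], and the basic ideas are essentially the same.

Lemma A.3 (Fano's Lemma). Let $(\Theta, d)$ be a pseudo-metric space and $\{P_\theta : \theta \in \Theta\}$ be a collection of probability measures. Let $r$ be an integer and let $S \subset T \subset \Theta$. Denote by $M(S, \epsilon, d)$ the $\epsilon$-packing set as well as the packing number of $S$ with respect to the metric $d$, i.e.
\[
\inf_{\theta \ne \theta' \in M(S, \epsilon, d)} d(\theta, \theta') \ge \epsilon .
\]
Suppose $\beta := \sup_{\theta, \theta' \in M(S, \epsilon, d)} D_{\mathrm{KL}}(P_\theta \,\|\, P_{\theta'}) > 0$. Then
\[
\inf_{\hat\theta} \sup_{\theta \in T} \mathbb{E}_\theta\, d^2(\hat\theta, \theta) \ge \sup_{S \subset T,\ \epsilon > 0} \frac{\epsilon^2}{4} \Big( 1 - \frac{\beta + \log 2}{\log M(S, \epsilon, d)} \Big).
\]

We use the Sudakov estimate for the lower bound. Recall the model $Y = \mathcal{X}(M) + Z$, where $Z \sim N(0, \frac{\sigma^2}{n} I_n)$. Without loss of generality, we can assume $\sigma = 1$. The Kullback-Leibler divergence between standardized linear inverse models with different parameters under the Gaussian noise is
\[
D_{\mathrm{KL}}(M \,\|\, M') = \frac{n}{2} \|\mathcal{X}(M) - \mathcal{X}(M')\|_{\ell_2}^2 .
\]
Recall the Sudakov minoration in Lemma 1, and denote the critical radius
\[
\epsilon(B_2^p \cap T) := \arg\max_{\epsilon}\ \epsilon \sqrt{\log N(B_2^p \cap T, \epsilon)} .
\]
Consider the cone intersected with the $\ell_2$ ball of radius $\delta$, $K(\delta) := B_2^p(\delta) \cap T \subset \mathbb{R}^p$, where $\delta$ will be specified later. As before, define $\psi = \sup_{v \in B_2^p \cap T} \|\mathcal{X}(v)\|_{\ell_2}$. Then
\[
\sup_{M, M' \in K(\delta)} D_{\mathrm{KL}}(M \,\|\, M') \le n\big( \|\mathcal{X}(M)\|_{\ell_2}^2 + \|\mathcal{X}(M')\|_{\ell_2}^2 \big) \le 2 n \delta^2 \psi^2 .
\]
The packing number is lower bounded by the covering number,
\[
M(K(\delta), \epsilon) \ge N(K(\delta), \epsilon) = N\big(K(1), \tfrac{\epsilon}{\delta}\big),
\]
where the last equality holds because we can scale both the set and the covering ball by $\delta$. Applying Fano's lemma, we have
\[
\inf_{\hat M} \sup_{M \in T} \mathbb{E}_{\cdot \mid \mathcal{X}} \|\hat M - M\|_{\ell_2}^2 \ge \sup_{\delta > 0,\ 0 < \epsilon < \delta} \frac{\epsilon^2}{4} \Big( 1 - \frac{2 n \delta^2 \psi^2 + \log 2}{\log N(K(1), \frac{\epsilon}{\delta})} \Big).
\]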
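Because $K(1) = B_2^p \cap T$, set
\[
\delta = \frac{1}{2\psi} \sqrt{\frac{\log N(B_2^p \cap T, \epsilon(B_2^p \cap T))}{n}}, \qquad \epsilon = \delta \cdot \epsilon(B_2^p \cap T).
\]
Then we have
\[
\inf_{\hat M} \sup_{M \in T} \mathbb{E}_{\cdot \mid \mathcal{X}} \|\hat M - M\|_{\ell_2}^2 \ge \frac{c_0}{\psi^2} \cdot \frac{\epsilon^2(B_2^p \cap T) \log N(B_2^p \cap T, \epsilon(B_2^p \cap T))}{n}
\]
with some universal constant $c_0$. Thus
\[
\inf_{\hat M} \sup_{M \in T} \mathbb{E}_{\cdot \mid \mathcal{X}} \|\hat M - M\|_{\ell_2}^2 \ge \frac{c_0 \sigma^2}{\psi^2} \cdot \frac{e^2(B_2^p \cap T)}{n} .
\]

Proof of Theorem 6 with Volume Ratio. For the lower bound using the volume ratio, recall the standardized linear inverse model $Y = \mathcal{X}(M) + Z$, where $Z \sim N(0, \frac{\sigma^2}{n} I_n)$. Without loss of generality, we can assume $\sigma = 1$. The Kullback-Leibler divergence between standardized linear inverse models with different parameters under the Gaussian noise is
\[
D_{\mathrm{KL}}(M \,\|\, M') = \frac{n}{2} \|\mathcal{X}(M) - \mathcal{X}(M')\|_{\ell_2}^2 .
\]
Consider the intersection of a cone $T$ with the $\ell_2$ ball of radius $\delta$, $K(\delta) := B_2^p(\delta) \cap T \subset \mathbb{R}^p$, where $\delta$ will be specified later. Defining $\psi = \sup_{v \in B_2^p \cap T} \|\mathcal{X}(v)\|_{\ell_2}$, we have
\[
\sup_{M, M' \in K(\delta)} D_{\mathrm{KL}}(M \,\|\, M') \le n\big( \|\mathcal{X}(M)\|_{\ell_2}^2 + \|\mathcal{X}(M')\|_{\ell_2}^2 \big) \le 2 n \delta^2 \psi^2 .
\]
The packing number is lower bounded by the covering number as follows:
\[
M(K(\delta), \epsilon) \ge N(K(\delta), \epsilon) \ge \frac{\mathrm{vol}(K(\delta))}{\mathrm{vol}(B_2^p(\epsilon))} = \Big(\frac{\delta}{\epsilon}\Big)^p \frac{\mathrm{vol}(B_2^p \cap T)}{\mathrm{vol}(B_2^p)} . \tag{A.7}
\]
Applying Fano's inequality of Lemma A.3, we have
\[
\inf_{\hat M} \sup_{M \in T} \mathbb{E}_{\cdot \mid \mathcal{X}} \|\hat M - M\|_{\ell_2}^2 \ge \sup_{\delta > 0,\ 0 < \epsilon < \delta} \frac{\epsilon^2}{4} \Bigg( 1 - \frac{2 n \delta^2 \psi^2 + \log 2}{p \log\Big[ \frac{\delta}{\epsilon} \Big( \frac{\mathrm{vol}(B_2^p \cap T)}{\mathrm{vol}(B_2^p)} \Big)^{1/p} \Big]} \Bigg).
\]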
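If for $a > 0$, $0 < b < 1$ we choose
\[
\delta = \frac{1}{\psi} \sqrt{\frac{a p}{2 n}}, \qquad \epsilon = \delta \cdot b \Big( \frac{\mathrm{vol}(B_2^p \cap T)}{\mathrm{vol}(B_2^p)} \Big)^{1/p},
\]
then we have
\[
\inf_{\hat M} \sup_{M \in T} \mathbb{E}_{\cdot \mid \mathcal{X}} \|\hat M - M\|_{\ell_2}^2 \ge \frac{p}{2 n \psi^2} \Big( \frac{\mathrm{vol}(B_2^p \cap T)}{\mathrm{vol}(B_2^p)} \Big)^{2/p} \sup_{a > 0,\ 0 < b < 1} \frac{a b^2}{4} \Big( 1 - \frac{a + \log 2}{\log \frac{1}{b}} \Big).
\]
As shown in [4, equation 9], there is a universal constant $c_0 > 0$ such that
\[
\inf_{\hat M} \sup_{M \in T} \mathbb{E}_{\cdot \mid \mathcal{X}} \|\hat M - M\|_{\ell_2}^2 \ge \frac{c_0}{\psi^2} \cdot \frac{p}{n} \Big( \frac{\mathrm{vol}(B_2^p \cap T)}{\mathrm{vol}(B_2^p)} \Big)^{2/p} . \tag{A.8}
\]
Thus
\[
\inf_{\hat M} \sup_{M \in T} \mathbb{E}_{\cdot \mid \mathcal{X}} \|\hat M - M\|_{\ell_2}^2 \ge \frac{c_0 \sigma^2}{\psi^2} \cdot \frac{v^2(B_2^p \cap T)}{n} .
\]

APPENDIX B: PROOF OF COROLLARIES

Proof of Corollary 1. Define $\Phi = \mathcal{X} \Sigma^{-1/2}$; then $\Phi$ satisfies the Gaussian ensemble design. Recall the proof of Theorem 1: the only place where we require the covariance of the design matrix to be orthogonal is in proving that the local isometry constant (LIC) is bounded on the local tangent cone. Via Lemma 3,
\[
\max_{z \in T_{\mathcal{A}}(M)} \|\mathcal{X} z\|_{\ell_2} = \max_{z \in T_{\mathcal{A}}(M)} \|\mathcal{X} \Sigma^{-1/2} \Sigma^{1/2} z\|_{\ell_2} = \max_{z \in \Sigma^{1/2} T_{\mathcal{A}}(M)} \|\Phi z\|_{\ell_2},
\]
where $\Sigma^{1/2} T_{\mathcal{A}}(M)$ denotes the image of $T_{\mathcal{A}}(M)$ under the linear transform $\Sigma^{1/2}$. The exact same calculation holds for the min as for the max. Thus, as long as $n \gtrsim w^2(B_2^p \cap \Sigma^{1/2} T_{\mathcal{A}}(M))$, Theorem 1 still holds.

Similar to Theorem 2, let us first prove that, with the choice of $\eta$, program (3.6) is feasible (non-empty) for $\Omega$ with high probability. In particular, if we plug in $\Omega = \Sigma^{-1/2}$, note that
\[
\|\mathcal{X}^* \mathcal{X} \Omega_i - \Sigma^{1/2} e_i\|_{\mathcal{A}}^* = \sup_{v \in \mathcal{A}} \langle \Phi^* \Phi e_i - e_i, \Sigma^{1/2} v\rangle \overset{\text{w.h.p.}}{\lesssim} \frac{w(\Phi \Sigma^{1/2} \mathcal{A})}{\sqrt{n}} = \frac{w(\mathcal{X} \mathcal{A})}{\sqrt{n}},
\]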
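where the last inequality follows from (6.3) in the proof of Theorem 2. Hence the following decomposition holds for the de-biased estimator $\tilde M$:
\[
\tilde M - M = \Sigma^{-1/2} \Delta + \frac{\sigma}{\sqrt{n}} \Sigma^{-1/2} \Omega \mathcal{X}^* W,
\]
where $W \sim N(0, I_n)$ is the standard Gaussian vector, and $\Delta = (\Omega \mathcal{X}^* \mathcal{X} - \Sigma^{1/2})(M - \hat M)$ has $\ell_\infty$ control. Note that we have
\[
\|\Delta\|_{\ell_\infty} \le \max_i \|\mathcal{X}^* \mathcal{X} \Omega_i - \Sigma^{1/2} e_i\|_{\mathcal{A}}^* \, \|\hat M - M\|_{\mathcal{A}} \lesssim \gamma_{\mathcal{A}}^2(M)\, \lambda \eta \lesssim \sigma \frac{\gamma_{\mathcal{A}}^2(M)\, w^2(\mathcal{X}\mathcal{A})}{n} .
\]
Moreover,
\[
\langle \Sigma^{1/2} v, \tilde M - M\rangle = \langle v, \Delta\rangle + \frac{\sigma}{\sqrt{n}} \langle v, \Omega \mathcal{X}^* W\rangle
\]
and
\[
|\langle v, \Delta\rangle| \le \|v\|_{\ell_1} \|\Delta\|_{\ell_\infty} \lesssim \rho \sigma \frac{\gamma_{\mathcal{A}}^2(M)\, w^2(\mathcal{X}\mathcal{A})}{n}, \qquad \Omega \mathcal{X}^* W \sim N(0, \Omega \mathcal{X}^* \mathcal{X} \Omega^*).
\]
The rest of the proof follows exactly as in Theorem 2.

In the following, we denote by $\hat M$ the solution to the program (2.1) and by $H = \hat M - M$ the estimation error. For simplicity of our paper, we refer to Section 3.4, Propositions 3.10-3.14 in [1] for explicit calculations of the Gaussian width of various local tangent cones.

Proof of Corollary 2. Let us calculate the rate for sparse vector recovery. We will treat the geometric terms $\gamma_{\mathcal{A}}(M)$, $\phi_{\mathcal{A}}(M, \mathcal{X})$, $\lambda_{\mathcal{A}}(\mathcal{X}, \sigma, n)$ separately.

For $\gamma_{\mathcal{A}}(M)$: We know that $H$ lives in the tangent cone $T_{\mathcal{A}}(M)$. Decompose $H = H_0 + H_c$ according to the support of $M$, where $\|H_0\|_{\ell_0} \le s$ and $H_0$ shares the same support as $M$. We have
\[
\|M\|_{\ell_1} + \|H_c\|_{\ell_1} - \|H_0\|_{\ell_1} = \|M + H_c\|_{\ell_1} - \|H_0\|_{\ell_1} \le \|M + H_0 + H_c\|_{\ell_1} \le \|M\|_{\ell_1},
\]
so that $\|H_c\|_{\ell_1} \le \|H_0\|_{\ell_1}$, which means $\|H\|_{\ell_1} \le 2\|H_0\|_{\ell_1}$ and $\|H_0\|_{\ell_2} \le \|H\|_{\ell_2}$. Thus we have the following relations:
\[
\|H\|_{\ell_1} \le 2\|H_0\|_{\ell_1} \le 2\sqrt{s}\, \|H_0\|_{\ell_2} \le 2\sqrt{s}\, \|H\|_{\ell_2} .
\]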
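The chain of inequalities above can be checked numerically on random vectors satisfying the cone condition $\|H_c\|_{\ell_1} \le \|H_0\|_{\ell_1}$. The sketch below is illustrative only: the sampling scheme and function names are hypothetical, and the support of $M$ is taken, without loss of generality, to be the first $s$ coordinates.

```python
import math
import random


def random_cone_vector(p, s, rng):
    """Draw H = (H_0, H_c) with ||H_c||_1 <= ||H_0||_1.

    H_0 lives on the first s coordinates (the assumed support of M);
    H_c is a random direction rescaled to satisfy the cone condition.
    """
    h0 = [rng.gauss(0.0, 1.0) for _ in range(s)]
    hc = [rng.gauss(0.0, 1.0) for _ in range(p - s)]
    l1_h0 = sum(abs(x) for x in h0)
    l1_hc = sum(abs(x) for x in hc)
    # rng.random() lies in [0, 1), so ||scale * hc||_1 < ||h0||_1.
    scale = rng.random() * l1_h0 / l1_hc
    return h0 + [scale * x for x in hc]


def l1_over_l2(h):
    """Ratio ||h||_1 / ||h||_2."""
    return sum(abs(x) for x in h) / math.sqrt(sum(x * x for x in h))


def worst_l1_l2_ratio(p=100, s=5, trials=500, seed=1):
    """Largest ||H||_1 / ||H||_2 observed over random cone vectors."""
    rng = random.Random(seed)
    return max(l1_over_l2(random_cone_vector(p, s, rng))
               for _ in range(trials))
```

The value returned by `worst_l1_l2_ratio(p, s)` should stay below $2\sqrt{s}$ for any choice of $p > s$, in line with the displayed relations.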
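Therefore, $\|H\|_{\ell_1} / \|H\|_{\ell_2} \le 2\sqrt{s}$ and thus $\gamma_{\mathcal{A}}(M) \le 2\sqrt{s}$.

As for $\phi_{\mathcal{A}}(M, \mathcal{X})$: By the tangent cone calculation, we can prove that $\phi_{\mathcal{A}}(M, \mathcal{X}) \ge 1 - c$ with high probability if
\[
n \ge \frac{4[w(B_2^p \cap T_{\mathcal{A}}(M)) + \delta]^2}{c^2} \asymp \frac{1}{c^2}\, s \log\frac{p}{s} .
\]
The last bound is from the Gaussian width upper bound for the local tangent cone of an $s$-sparse vector.

Lastly, for $\lambda_{\mathcal{A}}(\mathcal{X}, \sigma, n)$: We know that the operator $\mathcal{X}$ is norm preserving in the sense that
\[
\sup_{v \in \mathcal{A}} \|\mathcal{X} v\|_{\ell_2} \le 1 + c,
\]
and $w(\mathcal{X}\mathcal{A})$ is the Gaussian width of discrete points on a Euclidean ball, which is at most of order $\sqrt{\log p}$ due to the $\sqrt{\log}$ behavior of the maximum of Gaussian variables. Thus we can prove that $\lambda \asymp \sigma \sqrt{\log p / n}$ with some proper constant is enough with high probability. The corollary then follows from Theorem 1.

Proof of Corollary 3. Let us calculate the rate for low rank matrix recovery. We will bound the geometric terms $\gamma_{\mathcal{A}}(M)$, $\phi_{\mathcal{A}}(M, \mathcal{X})$, $\lambda_{\mathcal{A}}(\mathcal{X}, \sigma, n)$ separately.

For $\gamma_{\mathcal{A}}(M)$: Note that $H$ lives in the tangent cone $T_{\mathcal{A}}(M)$. We can write $H = H_0 + H_c$ according to the span of $M$ (that is, $M = U D V^T$, $H_0$ is spanned by either $U$ as the row space or $V$ as the column space, and $H_c$ is spanned by $U^\perp$ as the row space and $V^\perp$ as the column space), with the following properties:
\[
\|M\|_* + \|H_c\|_* - \|H_0\|_* = \|M + H_c\|_* - \|H_0\|_* \le \|M + H\|_* \le \|M\|_* .
\]
Thus we have $\mathrm{rank}(H_0) \le 2r$ and $\|H\|_* \le 2\|H_0\|_*$, $\|H_0\|_F \le \|H\|_F$. Thus we have the following relations:
\[
\|H\|_* \le 2\|H_0\|_* \le 2\sqrt{2r}\, \|H_0\|_F \le 2\sqrt{2r}\, \|H\|_F .
\]
We then have $\|H\|_* / \|H\|_F \le 2\sqrt{2r}$ and thus $\gamma_{\mathcal{A}}(M) \le 2\sqrt{2r}$.

As for $\phi_{\mathcal{A}}(M, \mathcal{X})$: By the tangent cone calculation for a rank-$r$ matrix, we can prove that $\phi_{\mathcal{A}}(M, \mathcal{X}) \ge 1 - c$ with high probability if
\[
n \ge \frac{4[w(B_2^p \cap T_{\mathcal{A}}(M)) + \delta]^2}{c^2} \asymp \frac{1}{c^2}\, r(p + q - r) .
\]

At last, for $\lambda_{\mathcal{A}}(\mathcal{X}, \sigma, n)$: The rank one matrix manifold is a subspace with dimension $p + q$. Thus $w(\mathcal{X}\mathcal{A})$ can be bounded by $\sqrt{p + q}$ because the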
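Gaussian width of the $(p+q)$-dimensional subspace is $\sqrt{p + q}$ and the linear transformation cannot enlarge the dimension. The rank one matrices are of unit Frobenius norm, and $\mathcal{X}$ is norm preserving in the sense that
\[
\sup_{v \in \mathcal{A}} \|\mathcal{X} v\|_{\ell_2} \le 1 + c .
\]
Putting these together, we can prove that $\lambda \asymp \sigma \sqrt{(p+q)/n}$ with some proper constant is enough with high probability. The corollary follows by putting together the geometric terms and applying Theorem 1.

Proof of Corollary 4. As usual, we will bound the geometric terms one at a time. For $\gamma_{\mathcal{A}}(M)$, it is clear that $\|H\|_{\ell_\infty} / \|H\|_{\ell_2} \le 1$ and so $\gamma_{\mathcal{A}}(M) \le 1$.

As for $\phi_{\mathcal{A}}(M, \mathcal{X})$, by the tangent cone calculation for a sign vector, we can prove that $\phi_{\mathcal{A}}(M, \mathcal{X}) \ge 1 - c$ with high probability if
\[
n \ge \frac{4[w(B_2^p \cap T_{\mathcal{A}}(M)) + \delta]^2}{c^2} \asymp \frac{1}{c^2} \cdot \frac{p}{2} .
\]

Finally, for $\lambda, \mu$: $w(\mathcal{X}\mathcal{A})$ is the Gaussian width of the image of the $\ell_\infty$ ball, which is of order $p$, and $w(\mathcal{X} B_2^p)$ is the Gaussian width of an $n$-dimensional Euclidean ball, which is $\sqrt{n}$. Thus we can prove that $\lambda \asymp \sigma p / \sqrt{n}$ and $\mu \asymp \sigma$, with some proper constant in front of the order, is enough with high probability. We know from Theorem 1 that
\[
\phi_{\mathcal{A}}(M, \mathcal{X})\, \|\hat M - M\|_{\ell_2} \le \|\mathcal{X}(\hat M - M)\|_{\ell_2} \lesssim \mu, \qquad \|\hat M - M\|_{\ell_\infty} \le \gamma_{\mathcal{A}}(M)\, \|\hat M - M\|_{\ell_2} .
\]
Thus
\[
\|\hat M - M\|_{\ell_2},\ \|\hat M - M\|_{\ell_\infty},\ \|\mathcal{X}(\hat M - M)\|_{\ell_2} \le C \cdot \sigma .
\]

Proof of Corollary 5. We bound separately the three geometric terms $\gamma_{\mathcal{A}}(M)$, $\phi_{\mathcal{A}}(M, \mathcal{X})$, and $\lambda_{\mathcal{A}}(\mathcal{X}, \sigma, n)$. For $\gamma_{\mathcal{A}}(M)$, it is clear that $\|H\| / \|H\|_F \le 1$, where $\|\cdot\|$ is the spectral norm, and thus $\gamma_{\mathcal{A}}(M) \le 1$.

As for $\phi_{\mathcal{A}}(M, \mathcal{X})$, by the tangent cone calculation for an orthogonal matrix, it is easy to show that $\phi_{\mathcal{A}}(M, \mathcal{X}) \ge 1 - c$ with high probability if
\[
n \ge \frac{4[w(B_2^p \cap T_{\mathcal{A}}(M)) + \delta]^2}{c^2} \asymp \frac{1}{c^2} \cdot \frac{m(m-1)}{2} .
\]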
At last, for $\lambda, \mu$: The orthogonal matrix manifold is a subspace with dimension $m(m-1)/2$, thus $w(\mathcal{X}\mathcal{A})$ is upper bounded by $\sqrt{m(m-1)/2} \cdot \sqrt{m}$, because the Gaussian width of the $m(m-1)/2$-dimensional manifold intersected with the Euclidean ball is $\sqrt{m(m-1)/2}$ and a linear transformation cannot enlarge the dimension. $w(\mathcal{X} B_2^p)$ can be bounded by $\sqrt{m \cdot m}$. Thus we can show that $\lambda \asymp \sigma \sqrt{m^3/n}$ and $\mu \asymp \sigma \sqrt{m^2/n}$, with some proper constant in front of the order, is enough with high probability. Recall from Theorem 1 that
\[
\phi_{\mathcal{A}}(M, \mathcal{X})\, \|\hat M - M\|_{\ell_2} \le \|\mathcal{X}(\hat M - M)\|_{\ell_2} \lesssim \mu, \qquad \|\hat M - M\| \le \gamma_{\mathcal{A}}(M)\, \|\hat M - M\|_{\ell_2} .
\]
Hence
\[
\|\hat M - M\|_F,\ \|\hat M - M\|,\ \|\mathcal{X}(\hat M - M)\|_{\ell_2} \le C \cdot \sqrt{\frac{m^2}{n}}\, \sigma .
\]

REFERENCES

[1] Chandrasekaran, V., Recht, B., Parrilo, P. A., and Willsky, A. S. (2012). The convex geometry of linear inverse problems. Foundations of Computational Mathematics, 12(6):805–849.
[2] Gordon, Y. (1988). On Milman's inequality and random subspaces which escape through a mesh in R^n. Springer.
[3] Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces: Isoperimetry and Processes, volume 23. Springer.
[4] Ma, Z. and Wu, Y. (2013). Volume ratio, sparsity, and minimaxity under unitarily invariant norms. arXiv preprint arXiv:1306.3609.
[5] Tsybakov, A. B. (2009). Introduction to Nonparametric Estimation, volume 11. Springer.
[6] Yang, Y. and Barron, A. (1999). Information-theoretic determination of minimax rates of convergence. The Annals of Statistics, 27(5):1564–1599.
[7] Yu, B. (1997). Assouad, Fano, and Le Cam. In Festschrift for Lucien Le Cam, pages 423–435. Springer.

Department of Statistics
The Wharton School
University of Pennsylvania
Philadelphia, PA 19104
USA
E-mail: tcai@wharton.upenn.edu
E-mail: tengyuan@wharton.upenn.edu
E-mail: rakhlin@wharton.upenn.edu