SUPPLEMENT TO GEOMETRIC INFERENCE FOR GENERAL HIGH-DIMENSIONAL LINEAR INVERSE PROBLEMS


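Submitted to the Annals of Statistics

By T. Tony Cai, Tengyuan Liang and Alexander Rakhlin
The Wharton School at University of Pennsylvania

APPENDIX A: ADDITIONAL PROOFS

Proof of Lemma 2. The proof uses concentration of Lipschitz functions on Gaussian space, which is illustrated in the following lemma taken from equation (1.6) in [3].

Lemma A.1 (Gaussian concentration inequality for Lipschitz functions). Let $g \in \mathbb{R}^n$ be a Gaussian vector with i.i.d. mean zero and variance one elements and let $F : \mathbb{R}^n \to \mathbb{R}$ be a Lipschitz function with Lipschitz constant $L$ (i.e. $|F(x) - F(y)| \le L\|x - y\|$ for any $x, y \in \mathbb{R}^n$, with the Euclidean metric on $\mathbb{R}^n$). Then for any $\lambda > 0$,

$$P\left(F(g) - E_g F(g) \ge \lambda\right) \le \exp\left(-\frac{\lambda^2}{2L^2}\right).$$

We would like to upper bound $\|X^* Z\|_{\mathcal{A}}^*$ with high probability, where $Z \sim N(0, \frac{\sigma^2}{n} I_n)$. We have

$$\|X^* Z\|_{\mathcal{A}}^* = \sup_{v \in \mathcal{A}} \langle X^* Z, v\rangle = \sup_{v \in \mathcal{A}} \langle Z, Xv\rangle.$$

Fixing $X$, we can think of $\sup_{v \in \mathcal{A}} \langle \cdot, Xv\rangle : \mathbb{R}^n \to \mathbb{R}$ as a function on the Gaussian space $g \sim N(0, I_n)$ that is Lipschitz with constant $K_X^{\mathcal{A}} := \sup_{v \in \mathcal{A}} \|Xv\|_{\ell_2}$:

$$\left|\sup_{v \in \mathcal{A}} \langle g_1, Xv\rangle - \sup_{v \in \mathcal{A}} \langle g_2, Xv\rangle\right| \le K_X^{\mathcal{A}} \|g_1 - g_2\|_{\ell_2}.$$

In fact, first fixing $u_1 = \arg\sup_{v \in \mathcal{A}} \langle g_1, Xv\rangle$, we get

$$\sup_{v \in \mathcal{A}} \langle g_1, Xv\rangle - \sup_{v \in \mathcal{A}} \langle g_2, Xv\rangle \le \langle g_1 - g_2, Xu_1\rangle \le \|Xu_1\|_{\ell_2} \|g_1 - g_2\|_{\ell_2}.$$

The other side uses the same trick, fixing $u_2 = \arg\sup_{v \in \mathcal{A}} \langle g_2, Xv\rangle$:

$$\sup_{v \in \mathcal{A}} \langle g_1, Xv\rangle - \sup_{v \in \mathcal{A}} \langle g_2, Xv\rangle \ge \langle g_1 - g_2, Xu_2\rangle \ge -\|Xu_2\|_{\ell_2} \|g_1 - g_2\|_{\ell_2}.$$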
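The two-sided argument above can be sanity-checked numerically. The following is only a minimal sketch under stated assumptions: the sizes `n` and `p`, the design with i.i.d. $N(0, 1/n)$ entries, and the atom set $\{\pm e_i\}$ are hypothetical choices made for illustration and are not taken from the proof. It draws pairs $(g_1, g_2)$ and compares the ratio $|\sup_v \langle g_1, Xv\rangle - \sup_v \langle g_2, Xv\rangle| / \|g_1 - g_2\|_{\ell_2}$ with $K_X^{\mathcal{A}}$.

```python
import numpy as np


def sup_inner(g, XA):
    """Return sup over atoms v of <g, Xv>; the rows of XA are the images Xv."""
    return float(np.max(XA @ g))


rng = np.random.default_rng(0)
n, p = 50, 200                                  # hypothetical sizes
X = rng.standard_normal((n, p)) / np.sqrt(n)    # hypothetical design matrix
atoms = np.vstack([np.eye(p), -np.eye(p)])      # assumed atom set {+e_i, -e_i}
XA = atoms @ X.T                                # row j is the vector X v_j
K = float(np.max(np.linalg.norm(XA, axis=1)))   # K_X^A = sup_v ||Xv||_2

worst = 0.0
for _ in range(1000):
    g1 = rng.standard_normal(n)
    g2 = rng.standard_normal(n)
    gap = abs(sup_inner(g1, XA) - sup_inner(g2, XA))
    worst = max(worst, gap / np.linalg.norm(g1 - g2))

print(f"largest observed ratio {worst:.3f}, Lipschitz constant K = {K:.3f}")
```

The observed ratio never exceeds `K`, since the inequality is deterministic; the random draws only probe how tight it is for this particular atom set.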

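Thus we proved that the Lipschitz constant is upper bounded by $K_X^{\mathcal{A}}$. Now we can apply the concentration of Lipschitz functions on Gaussian space and get

$$P\left(\|X^* Z\|_{\mathcal{A}}^* \ge E\|X^* Z\|_{\mathcal{A}}^* + \lambda\right) \le \exp\left(-\frac{n\lambda^2}{2\sigma^2 (K_X^{\mathcal{A}})^2}\right). \tag{A.1}$$

Thus we have, with probability at least $1 - \exp(-\delta^2/2)$,

$$\|X^* Z\|_{\mathcal{A}}^* \le \frac{\sigma}{\sqrt{n}} \left\{ E_g\left[\sup_{v \in \mathcal{A}} \langle g, Xv\rangle\right] + \delta \cdot \sup_{v \in \mathcal{A}} \|Xv\|_{\ell_2} \right\}. \tag{A.2}$$

Proof of Lemma 3. The proof uses Gordon's method [2]. The lower bound part of this lemma is a modified version of the key lemma in [1]. First let us introduce an important lemma in Gordon's analysis.

Lemma A.2 (Corollary 1.2 in [2]). Let $\Omega$ be a closed subset of $S^{p-1}$. Let $\Phi : \mathbb{R}^p \to \mathbb{R}^n$ be a random map with i.i.d. zero-mean Gaussian entries having variance one. Then

$$\lambda_n - w(\Omega) \le E\left[\min_{z \in \Omega} \|\Phi z\|_{\ell_2}\right] \le E\left[\max_{z \in \Omega} \|\Phi z\|_{\ell_2}\right] \le \lambda_n + w(\Omega),$$

where $\lambda_n = \sqrt{2}\,\Gamma(\frac{n+1}{2})/\Gamma(\frac{n}{2})$ satisfies $n/\sqrt{n+1} < \lambda_n < \sqrt{n}$.

Use the same step as in Lemma 2: for any closed subset $\Omega \subset S^{p-1}$, the functions $\Phi \mapsto \min_{z \in \Omega} \|\Phi z\|_{\ell_2}$ and $\Phi \mapsto \max_{z \in \Omega} \|\Phi z\|_{\ell_2}$ are both Lipschitz maps on the Gaussian space $\Phi$ with Lipschitz constant 1:

$$\left|\min_{z \in \Omega} \|\Phi_1 z\|_{\ell_2} - \min_{z \in \Omega} \|\Phi_2 z\|_{\ell_2}\right| \le \|\Phi_1 - \Phi_2\|_F, \qquad \left|\max_{z \in \Omega} \|\Phi_1 z\|_{\ell_2} - \max_{z \in \Omega} \|\Phi_2 z\|_{\ell_2}\right| \le \|\Phi_1 - \Phi_2\|_F.$$

Thus, using the Lipschitz concentration in Gaussian space, we have

$$P\left(\min_{z \in \Omega} \|Xz\|_{\ell_2} \le E\left[\min_{z \in \Omega} \|Xz\|_{\ell_2}\right] - t\right) \le \exp(-n t^2/2),$$
$$P\left(\max_{z \in \Omega} \|Xz\|_{\ell_2} \ge E\left[\max_{z \in \Omega} \|Xz\|_{\ell_2}\right] + t\right) \le \exp(-n t^2/2),$$

where $X$ is a Gaussian ensemble design (so that $\sqrt{n} X$ has the same law as $\Phi$). And we have

$$P\left(\min_{z \in \Omega} \|Xz\|_{\ell_2} \le 1 - c\right) \le \exp\left(-\left(\lambda_n - w(\Omega) - \sqrt{n}(1 - c)\right)^2/2\right),$$
$$P\left(\max_{z \in \Omega} \|Xz\|_{\ell_2} \ge 1 + c\right) \le \exp\left(-\left(\sqrt{n}(1 + c) - \lambda_n - w(\Omega)\right)^2/2\right).$$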

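Thus under the condition

$$n \ge \frac{4[w(\Omega) + \delta]^2}{c^2}$$

we have

$$\sqrt{n}(1 + c) - \lambda_n - w(\Omega) \ge \sqrt{n}(1 + c) - \sqrt{n} - w(\Omega) \ge 2[w(\Omega) + \delta] - w(\Omega) \ge \delta, \tag{A.3}$$

and the lower bound $\lambda_n > n/\sqrt{n+1}$ is used in the same way to control $\lambda_n - w(\Omega) - \sqrt{n}(1 - c)$. Thus

$$\lambda_n - w(\Omega) - \sqrt{n}(1 - c) \ge \delta > 0, \qquad \sqrt{n}(1 + c) - \lambda_n - w(\Omega) \ge \delta > 0.$$

In fact, we proved a stronger result:

$$P\left(\min_{z \in \Omega} \|Xz\|_{\ell_2} \ge 1 - c\right) \ge 1 - \exp(-\delta^2/2), \qquad P\left(\max_{z \in \Omega} \|Xz\|_{\ell_2} \le 1 + c\right) \ge 1 - \exp(-\delta^2/2).$$

Now apply our lemma to the local tangent cone $T_{\mathcal{A}}(M)$, and observe that $w(B_2^p \cap T_{\mathcal{A}}(M)) = w(S^{p-1} \cap T_{\mathcal{A}}(M))$. The lemma then holds by plugging in the tangent cone.

Proof of Lemma 4. The proof requires the observation that, by the definition of the dual norm,

$$w(\mathcal{A}) = E_g \sup_{v \in \mathcal{A}} \langle g, v\rangle = E_g \|g\|_{\mathcal{A}}^*.$$

Then

$$\gamma_{\mathcal{A}}(M)\, w(\mathcal{A}) = E_g \|g\|_{\mathcal{A}}^* \cdot \sup_{h \in T_{\mathcal{A}}(M)} \frac{\|h\|_{\mathcal{A}}}{\|h\|_{\ell_2}} = E_g\left[\sup_{h \in T_{\mathcal{A}}(M)} \frac{\|g\|_{\mathcal{A}}^* \|h\|_{\mathcal{A}}}{\|h\|_{\ell_2}}\right] \tag{A.4}$$
$$\ge E_g\left[\sup_{h \in T_{\mathcal{A}}(M)} \frac{\langle g, h\rangle}{\|h\|_{\ell_2}}\right] \tag{A.5}$$
$$= w(B_2^p \cap T_{\mathcal{A}}(M)). \tag{A.6}$$

The inequality step requires the Cauchy-Schwarz relationship (2.3).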
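Gordon's bound in Lemma A.2 can be illustrated in the simplest case $\Omega = S^{p-1}$, where $\min_{z \in \Omega} \|\Phi z\|_{\ell_2}$ and $\max_{z \in \Omega} \|\Phi z\|_{\ell_2}$ are the smallest and largest singular values of $\Phi$ and $w(S^{p-1}) = E\|g\|_{\ell_2} = \lambda_p$. The sketch below is a Monte Carlo illustration only; the sizes `n`, `p` and the number of trials are hypothetical, and the choice $\Omega = S^{p-1}$ is an assumption made so that the extrema can be computed exactly by an SVD.

```python
import math

import numpy as np


def lam(k):
    """lambda_k = sqrt(2) * Gamma((k + 1) / 2) / Gamma(k / 2), i.e. E||g||_2 for g ~ N(0, I_k)."""
    return math.sqrt(2.0) * math.exp(math.lgamma((k + 1) / 2) - math.lgamma(k / 2))


rng = np.random.default_rng(1)
n, p, trials = 200, 20, 300          # hypothetical sizes
smin, smax = [], []
for _ in range(trials):
    # Singular values are returned in decreasing order.
    s = np.linalg.svd(rng.standard_normal((n, p)), compute_uv=False)
    smax.append(s[0])
    smin.append(s[-1])

lower = lam(n) - lam(p)              # lambda_n - w(Omega) with Omega the unit sphere
upper = lam(n) + lam(p)              # lambda_n + w(Omega)
mean_min = float(np.mean(smin))
mean_max = float(np.mean(smax))
print(f"{lower:.3f} <= {mean_min:.3f} <= {mean_max:.3f} <= {upper:.3f}")
```

The empirical means are Monte Carlo estimates of the two expectations, so the comparison holds only up to sampling error; for a proper subset $\Omega$ of the sphere (for example a tangent cone intersected with the sphere) the extrema are no longer singular values and would need a different computation.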

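Proof of Theorem 6 with Sudakov Entropy Estimate. The key technical tool in proving Theorem 6 is the following well-known Fano's information lemma (Lemma A.3). This version is from [4]; similar versions are provided in [6, 7, 5], and the basic ideas are essentially the same.

Lemma A.3 (Fano's Lemma). Let $(\Theta, d)$ be a pseudo-metric space and $\{P_\theta : \theta \in \Theta\}$ be a collection of probability measures. Let $r$ be an integer and let $S \subset T \subset \Theta$. Denote by $M(S, \epsilon, d)$ the $\epsilon$-packing set as well as the packing number of $S$ with respect to the metric $d$, i.e.

$$\inf_{\theta, \theta' \in M(S, \epsilon, d),\ \theta \ne \theta'} d(\theta, \theta') \ge \epsilon.$$

Suppose $\beta := \sup_{\theta, \theta' \in M(S, \epsilon, d)} D_{KL}(P_\theta \| P_{\theta'}) > 0$. Then

$$\inf_{\hat\theta} \sup_{\theta \in T} E_\theta\, d^2(\hat\theta, \theta) \ge \sup_{S \subset T,\ \epsilon > 0} \frac{\epsilon^2}{4}\left(1 - \frac{\beta + \log 2}{\log M(S, \epsilon, d)}\right).$$

We use the Sudakov estimate for the lower bound. Recall the model $Y = X(M) + Z$, where $Z \sim N(0, \frac{\sigma^2}{n} I_n)$. Without loss of generality, we can assume $\sigma = 1$. The Kullback-Leibler divergence between standardized linear inverse models with different parameters under the Gaussian noise is

$$D_{KL}(M \| M') = \frac{n}{2} \|X(M) - X(M')\|_{\ell_2}^2.$$

Recall the Sudakov minoration in Lemma 1, and denote the critical radius

$$\epsilon(B_2^p \cap T) := \arg\max_{\epsilon}\ \epsilon \sqrt{\log N(B_2^p \cap T, \epsilon)}.$$

Consider the cone intersected with the $\ell_2$ ball of radius $\delta$, $K(\delta) := B_2^p(\delta) \cap T \subset \mathbb{R}^p$, where $\delta$ will be specified later. As before, define $\psi = \sup_{v \in B_2^p \cap T} \|X(v)\|_{\ell_2}$. Then

$$\sup_{M, M' \in K(\delta)} D_{KL}(M \| M') \le n\left(\|X(M)\|_{\ell_2}^2 + \|X(M')\|_{\ell_2}^2\right) \le 2n\delta^2\psi^2.$$

The packing number is lower bounded by the covering number,

$$M(K(\delta), \epsilon) \ge N(K(\delta), \epsilon) = N\left(K(1), \frac{\epsilon}{\delta}\right),$$

where the last equality holds because we can scale both the set and the covering ball by $\delta$. Applying Fano's lemma, we have

$$\inf_{\hat M} \sup_{M \in T} E_{\cdot|X} \|\hat M - M\|_{\ell_2}^2 \ge \sup_{\delta > 0,\ 0 < \epsilon < \delta} \frac{\epsilon^2}{4}\left(1 - \frac{2n\delta^2\psi^2 + \log 2}{\log N(K(1), \frac{\epsilon}{\delta})}\right).$$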

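Because $K(1) = B_2^p \cap T$, set

$$\delta = \frac{1}{2\psi\sqrt{n}} \sqrt{\log N(B_2^p \cap T, \epsilon(B_2^p \cap T))}, \qquad \epsilon = \delta \cdot \epsilon(B_2^p \cap T).$$

Then we have

$$\inf_{\hat M} \sup_{M \in T} E_{\cdot|X} \|\hat M - M\|_{\ell_2}^2 \ge \frac{c_0}{\psi^2 n}\, \epsilon^2(B_2^p \cap T) \log N(B_2^p \cap T, \epsilon(B_2^p \cap T))$$

with some universal constant $c_0$. Thus

$$\inf_{\hat M} \sup_{M \in T} E_{\cdot|X} \|\hat M - M\|_{\ell_2}^2 \ge \frac{c_0 \sigma^2}{\psi^2} \left(\frac{e(B_2^p \cap T)}{\sqrt{n}}\right)^2.$$

Proof of Theorem 6 with Volume Ratio. For the lower bound using the volume ratio, recall the standardized linear inverse model $Y = X(M) + Z$, where $Z \sim N(0, \frac{\sigma^2}{n} I_n)$. Without loss of generality, we can assume $\sigma = 1$. The Kullback-Leibler divergence between standardized linear inverse models with different parameters under the Gaussian noise is

$$D_{KL}(M \| M') = \frac{n}{2} \|X(M) - X(M')\|_{\ell_2}^2.$$

Consider the intersection of a cone $T$ with the $\ell_2$ ball of radius $\delta$, $K(\delta) := B_2^p(\delta) \cap T \subset \mathbb{R}^p$, where $\delta$ will be specified later. Defining $\psi = \sup_{v \in B_2^p \cap T} \|X(v)\|_{\ell_2}$, we have

$$\sup_{M, M' \in K(\delta)} D_{KL}(M \| M') \le n\left(\|X(M)\|_{\ell_2}^2 + \|X(M')\|_{\ell_2}^2\right) \le 2n\delta^2\psi^2.$$

The packing number is lower bounded by the covering number as follows:

$$M(K(\delta), \epsilon) \ge N(K(\delta), \epsilon) \ge \frac{\mathrm{vol}(K(\delta))}{\mathrm{vol}(B_2^p(\epsilon))} = \left(\frac{\delta}{\epsilon}\right)^p \frac{\mathrm{vol}(B_2^p \cap T)}{\mathrm{vol}(B_2^p)}. \tag{A.7}$$

Applying Fano's inequality of Lemma A.3, we have

$$\inf_{\hat M} \sup_{M \in T} E_{\cdot|X} \|\hat M - M\|_{\ell_2}^2 \ge \sup_{\delta > 0,\ 0 < \epsilon < \delta} \frac{\epsilon^2}{4}\left(1 - \frac{2n\delta^2\psi^2 + \log 2}{p \log\left[\frac{\delta}{\epsilon}\left(\frac{\mathrm{vol}(B_2^p \cap T)}{\mathrm{vol}(B_2^p)}\right)^{1/p}\right]}\right).$$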

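If for $a > 0$, $0 < b < 1$ we choose

$$\delta = \frac{a}{\psi}\sqrt{\frac{p}{n}}, \qquad \epsilon = \delta\, b \left(\frac{\mathrm{vol}(B_2^p \cap T)}{\mathrm{vol}(B_2^p)}\right)^{1/p},$$

then we have

$$\inf_{\hat M} \sup_{M \in T} E_{\cdot|X} \|\hat M - M\|_{\ell_2}^2 \ge \frac{p}{\psi^2 n}\left(\frac{\mathrm{vol}(B_2^p \cap T)}{\mathrm{vol}(B_2^p)}\right)^{2/p} \sup_{a > 0,\ 0 < b < 1} \frac{a^2 b^2}{4}\left(1 - \frac{2a^2 + \log 2}{\log \frac{1}{b}}\right).$$

As shown in [4, equation 9], there is a universal constant $c_0 > 0$ such that

$$\inf_{\hat M} \sup_{M \in T} E_{\cdot|X} \|\hat M - M\|_{\ell_2}^2 \ge \frac{c_0\, p}{\psi^2 n}\left(\frac{\mathrm{vol}(B_2^p \cap T)}{\mathrm{vol}(B_2^p)}\right)^{2/p}. \tag{A.8}$$

Thus

$$\inf_{\hat M} \sup_{M \in T} E_{\cdot|X} \|\hat M - M\|_{\ell_2}^2 \ge \frac{c_0 \sigma^2}{\psi^2} \left(\frac{v(B_2^p \cap T)}{\sqrt{n}}\right)^2.$$

APPENDIX B: PROOF OF COROLLARIES

Proof of Corollary 1. Define $\Phi = X\Sigma^{-1/2}$; then $\Phi$ satisfies the Gaussian ensemble design. Recall the proof of Theorem 1: the only place where we require the covariance of the design matrix to be orthogonal is in proving that the local isometry constant (LIC) is bounded on the local tangent cone. Via Lemma 3,

$$\max_{z \in T_{\mathcal{A}}(M)} \|Xz\|_{\ell_2} = \max_{z \in T_{\mathcal{A}}(M)} \|X\Sigma^{-1/2}\Sigma^{1/2} z\|_{\ell_2} = \max_{z \in \Sigma^{1/2} T_{\mathcal{A}}(M)} \|\Phi z\|_{\ell_2},$$

where $\Sigma^{1/2} T_{\mathcal{A}}(M)$ denotes the image of $T_{\mathcal{A}}(M)$ under the linear transform $\Sigma^{1/2}$. Exactly the same calculation holds for the minimum as for the maximum. Thus, as long as $n \gtrsim w^2(B_2^p \cap \Sigma^{1/2} T_{\mathcal{A}}(M))$, Theorem 1 still holds.

Similar to Theorem 2, let us first prove that, with the choice of $\eta$, program (3.6) is feasible (non-empty) for $\Omega$ with high probability. In particular, if we plug in $\Omega = \Sigma^{-1/2}$, note that

$$\|X^* X \Omega_{i\cdot}^* - \Sigma^{1/2} e_i\|_{\mathcal{A}}^* = \sup_{v \in \mathcal{A}} \langle \Phi^*\Phi e_i - e_i, \Sigma^{1/2} v\rangle \overset{\text{w.h.p.}}{\lesssim} \frac{w(\Phi\Sigma^{1/2}\mathcal{A})}{\sqrt{n}} = \frac{w(X\mathcal{A})}{\sqrt{n}},$$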

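where the last inequality follows from (6.3) in the proof of Theorem 2. Hence the following decomposition holds for the de-biased estimator $\tilde M$:

$$\tilde M - M = \Sigma^{-1/2}\Delta + \frac{\sigma}{\sqrt{n}} \Sigma^{-1/2}\Omega X^* W,$$

where $W \sim N(0, I_n)$ is the standard Gaussian vector, and $\Delta = (\Omega X^* X - \Sigma^{1/2})(M - \hat M)$ has the $\ell_\infty$ control

$$\|\Delta\|_{\ell_\infty} \le \max_i \|X^* X \Omega_{i\cdot}^* - \Sigma^{1/2} e_i\|_{\mathcal{A}}^*\, \|\hat M - M\|_{\mathcal{A}} \lesssim \gamma_{\mathcal{A}}^2(M)\, \lambda\eta \asymp \sigma \gamma_{\mathcal{A}}^2(M) \frac{w^2(X\mathcal{A})}{n}.$$

Note we have

$$\langle \Sigma^{1/2} v, \tilde M - M\rangle = \langle v, \Delta\rangle + \frac{\sigma}{\sqrt{n}} \langle v, \Omega X^* W\rangle$$

and

$$|\langle v, \Delta\rangle| \le \|v\|_{\ell_1} \|\Delta\|_{\ell_\infty} \lesssim \rho\, \sigma \gamma_{\mathcal{A}}^2(M) \frac{w^2(X\mathcal{A})}{n}, \qquad \Omega X^* W \sim N(0, \Omega X^* X \Omega^*).$$

The rest of the proof follows exactly as in Theorem 2.

In the following, we denote by $\hat M$ the solution to the program (2.1) and by $H = \hat M - M$ the estimation error. For simplicity, we refer to Section 3.4 (Propositions 3.10-3.14) in [1] for explicit calculations of the Gaussian width for various local tangent cones.

Proof of Corollary 2. Let us calculate the rate for sparse vector recovery. We will treat the geometric terms $\gamma_{\mathcal{A}}(M)$, $\phi_{\mathcal{A}}(M, X)$, $\lambda_{\mathcal{A}}(X, \sigma, n)$ separately.

For $\gamma_{\mathcal{A}}(M)$: We know that $H$ lives in the tangent cone $T_{\mathcal{A}}(M)$. Decompose $H = H_0 + H_c$ according to the support of $M$, where $\|H_0\|_{\ell_0} = s$ and $H_0$ shares the same support as $M$. We have

$$\|M\|_{\ell_1} + \|H_c\|_{\ell_1} - \|H_0\|_{\ell_1} = \|M + H_c\|_{\ell_1} - \|H_0\|_{\ell_1} \le \|M + H_0 + H_c\|_{\ell_1} \le \|M\|_{\ell_1},$$

which means $\|H\|_{\ell_1} \le 2\|H_0\|_{\ell_1}$ and $\|H_0\|_{\ell_2} \le \|H\|_{\ell_2}$. Thus we have the following relations:

$$\|H\|_{\ell_1} \le 2\|H_0\|_{\ell_1} \le 2\sqrt{s}\, \|H_0\|_{\ell_2} \le 2\sqrt{s}\, \|H\|_{\ell_2}.$$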
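The chain of inequalities above can be checked on randomly generated error vectors. This is a minimal sketch, not part of the proof: the sizes `p` and `s` and the way $H$ is sampled are assumptions made for illustration. Each $H$ is built so that $\|M + H\|_{\ell_1} \le \|M\|_{\ell_1}$, by shrinking $M$ on its support ($H_0$) and adding an off-support part $H_c$ with $\|H_c\|_{\ell_1} \le \|H_0\|_{\ell_1}$.

```python
import numpy as np

rng = np.random.default_rng(2)
p, s = 100, 5                        # hypothetical dimension and sparsity
max_ratio = 0.0
for _ in range(1000):
    support = rng.choice(p, size=s, replace=False)
    off = np.setdiff1d(np.arange(p), support)
    M = np.zeros(p)
    M[support] = rng.choice([-1.0, 1.0], size=s) * rng.uniform(1.0, 2.0, size=s)

    # H_0 shrinks M on its support, so ||M + H_0||_1 = ||M||_1 - ||H_0||_1.
    H0 = np.zeros(p)
    H0[support] = -np.sign(M[support]) * rng.uniform(0.0, 1.0, size=s)

    # H_c lives off the support and is rescaled so that ||H_c||_1 <= ||H_0||_1.
    Hc = np.zeros(p)
    Hc[off] = rng.standard_normal(p - s)
    Hc *= rng.uniform(0.0, 1.0) * np.abs(H0).sum() / np.abs(Hc).sum()

    H = H0 + Hc
    # H does not increase the l1 norm of M.
    assert np.abs(M + H).sum() <= np.abs(M).sum() + 1e-12
    max_ratio = max(max_ratio, np.abs(H).sum() / np.linalg.norm(H))

print(f"max ||H||_1 / ||H||_2 = {max_ratio:.3f}, bound 2 sqrt(s) = {2 * np.sqrt(s):.3f}")
```

The sampled vectors cover only part of the tangent cone, so the largest observed ratio indicates how conservative the factor $2\sqrt{s}$ is for this sampling scheme rather than over the whole cone.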

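Therefore, $\|H\|_{\ell_1} / \|H\|_{\ell_2} \le 2\sqrt{s}$ and thus $\gamma_{\mathcal{A}}(M) \le 2\sqrt{s}$.

As for $\phi_{\mathcal{A}}(M, X)$: By the tangent cone calculation, we can prove $\phi_{\mathcal{A}}(M, X) \ge 1 - c$ with high probability if

$$n \ge \frac{4[w(B_2^p \cap T_{\mathcal{A}}(M)) + \delta]^2}{c^2} \vee \frac{1}{c} \asymp s \log\frac{p}{s}.$$

The last bound is from the Gaussian width upper bound for the local tangent cone of an $s$-sparse vector.

Lastly, for $\lambda_{\mathcal{A}}(X, \sigma, n)$: We know the operator $X$ is norm preserving in the sense that $\sup_{v \in \mathcal{A}} \|Xv\|_{\ell_2} \le 1 + c$, and $w(X\mathcal{A})$ is the Gaussian width of discrete points on the Euclidean ball, which is at most of order $\sqrt{\log p}$ due to the logarithmic behavior of the maximum of Gaussian variables. Thus we can prove that $\lambda \asymp \sigma\sqrt{\frac{\log p}{n}}$ with some proper constant is enough with high probability. The corollary then follows from Theorem 1.

Proof of Corollary 3. Let us calculate the rate for low rank matrix recovery. We will bound the geometric terms $\gamma_{\mathcal{A}}(M)$, $\phi_{\mathcal{A}}(M, X)$, $\lambda_{\mathcal{A}}(X, \sigma, n)$ separately.

For $\gamma_{\mathcal{A}}(M)$: Note that $H$ lives in the tangent cone $T_{\mathcal{A}}(M)$. We can write $H = H_0 + H_c$ according to the span of $M$ (that is, with $M = UDV^T$, $H_0$ is spanned by either $U$ as the row space or $V$ as the column space, and $H_c$ is spanned by $U^\perp$ as the row space and $V^\perp$ as the column space), with the following properties:

$$\|M\|_* + \|H_c\|_* - \|H_0\|_* = \|M + H_c\|_* - \|H_0\|_* \le \|M + H\|_* \le \|M\|_*.$$

Thus we have $\mathrm{rank}(H_0) \le 2r$ and $\|H\|_* \le 2\|H_0\|_*$, $\|H_0\|_F \le \|H\|_F$. Thus we have the following relations:

$$\|H\|_* \le 2\|H_0\|_* \le 2\sqrt{2r}\, \|H_0\|_F \le 2\sqrt{2r}\, \|H\|_F.$$

We then have $\|H\|_* / \|H\|_F \le 2\sqrt{2r}$ and thus $\gamma_{\mathcal{A}}(M) \le 2\sqrt{2r}$.

As for $\phi_{\mathcal{A}}(M, X)$: By the tangent cone calculation for a rank-$r$ matrix, we can prove $\phi_{\mathcal{A}}(M, X) \ge 1 - c$ with high probability if

$$n \ge \frac{4[w(B_2^p \cap T_{\mathcal{A}}(M)) + \delta]^2}{c^2} \vee \frac{1}{c} \asymp r(p + q - r).$$

At last, for $\lambda_{\mathcal{A}}(X, \sigma, n)$: The rank one matrix manifold is a subspace with dimension $p + q$. Thus $w(X\mathcal{A})$ can be bounded by $\sqrt{p + q}$, because the Gaussian width of the $(p + q)$-dimensional subspace is $\sqrt{p + q}$ and the linear transformation cannot enlarge the dimension. The rank one matrices are of unit Frobenius norm and $X$ is norm preserving in the sense that $\sup_{v \in \mathcal{A}} \|Xv\|_{\ell_2} \le 1 + c$. Putting this together, we can prove that $\lambda \asymp \sigma\sqrt{\frac{p + q}{n}}$ with some proper constant is enough with high probability. The corollary follows by putting together the geometric terms and applying Theorem 1.

Proof of Corollary 4. As usual, we will bound the geometric terms one at a time. For $\gamma_{\mathcal{A}}(M)$, it is clear that $\|H\|_{\ell_\infty} / \|H\|_{\ell_2} \le 1$ and so $\gamma_{\mathcal{A}}(M) \le 1$. As for $\phi_{\mathcal{A}}(M, X)$, by the tangent cone calculation for a sign vector, we can prove $\phi_{\mathcal{A}}(M, X) \ge 1 - c$ with high probability if

$$n \ge \frac{4[w(B_2^p \cap T_{\mathcal{A}}(M)) + \delta]^2}{c^2} \vee \frac{1}{c} \asymp p.$$

Finally, for $\lambda, \mu$: $w(X\mathcal{A})$ is the Gaussian width of the $\ell_\infty$ ball, which is of order $p$, and $w(XB_2^p)$ is the Gaussian width of the $p$-dimensional Euclidean ball, which is $\sqrt{p}$. Thus we can prove that $\lambda \asymp \sigma\frac{p}{\sqrt{n}}$ and $\mu \asymp \sigma\sqrt{\frac{p}{n}}$, with some proper constant in front of the order, is enough with high probability. We know from Theorem 1 that

$$\phi_{\mathcal{A}}^2(M, X)\, \|\hat M - M\|_{\ell_2}^2 \le \|X(\hat M - M)\|_{\ell_2}^2 \le 2\mu \|\hat M - M\|_{\ell_2}, \qquad \|\hat M - M\|_{\ell_\infty} \le \gamma_{\mathcal{A}}(M)\, \|\hat M - M\|_{\ell_2}.$$

Thus

$$\|\hat M - M\|_{\ell_2},\ \|\hat M - M\|_{\ell_\infty},\ \|X(\hat M - M)\|_{\ell_2} \le C\sigma\sqrt{\frac{p}{n}}.$$

Proof of Corollary 5. We bound separately the three geometric terms $\gamma_{\mathcal{A}}(M)$, $\phi_{\mathcal{A}}(M, X)$, and $\lambda_{\mathcal{A}}(X, \sigma, n)$. For $\gamma_{\mathcal{A}}(M)$, it is clear that $\|H\| / \|H\|_F \le 1$ and thus $\gamma_{\mathcal{A}}(M) \le 1$. As for $\phi_{\mathcal{A}}(M, X)$, by the tangent cone calculation for an orthogonal matrix, it is easy to show that $\phi_{\mathcal{A}}(M, X) \ge 1 - c$ with high probability if

$$n \ge \frac{4[w(B_2^p \cap T_{\mathcal{A}}(M)) + \delta]^2}{c^2} \vee \frac{1}{c} \asymp \frac{m(m-1)}{2}.$$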

At last, for $\lambda, \mu$: The orthogonal matrix manifold is a subspace with dimension $\frac{m(m-1)}{2}$, thus $w(X\mathcal{A})$ is upper bounded by $\sqrt{\frac{m(m-1)}{2}} \cdot \sqrt{m}$, because the Gaussian width of the $\frac{m(m-1)}{2}$-dimensional manifold intersected with the Euclidean ball is $\sqrt{\frac{m(m-1)}{2}}$ and a linear transformation cannot enlarge the dimension. $w(XB_2^p)$ can be bounded by $\sqrt{m^2} = m$. Thus we can show that $\lambda \asymp \sigma\sqrt{\frac{m^3}{n}}$ and $\mu \asymp \sigma\sqrt{\frac{m^2}{n}}$, with some proper constant in front of the order, is enough with high probability. Recall from Theorem 1 that

$$\phi_{\mathcal{A}}^2(M, X)\, \|\hat M - M\|_{\ell_2}^2 \le \|X(\hat M - M)\|_{\ell_2}^2 \le 2\mu \|\hat M - M\|_{\ell_2}, \qquad \|\hat M - M\| \le \gamma_{\mathcal{A}}(M)\, \|\hat M - M\|_{\ell_2}.$$

Hence

$$\|\hat M - M\|_F,\ \|\hat M - M\|,\ \|X(\hat M - M)\|_{\ell_2} \le C\sigma\sqrt{\frac{m^2}{n}}.$$

REFERENCES

[1] Chandrasekaran, V., Recht, B., Parrilo, P. A., and Willsky, A. S. (2012). The convex geometry of linear inverse problems. Foundations of Computational Mathematics, 12(6):805-849.
[2] Gordon, Y. (1988). On Milman's inequality and random subspaces which escape through a mesh in $\mathbb{R}^n$. Springer.
[3] Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces: Isoperimetry and Processes, volume 23. Springer.
[4] Ma, Z. and Wu, Y. (2013). Volume ratio, sparsity, and minimaxity under unitarily invariant norms. arXiv preprint arXiv:1306.3609.
[5] Tsybakov, A. B. (2009). Introduction to Nonparametric Estimation, volume 11. Springer.
[6] Yang, Y. and Barron, A. (1999). Information-theoretic determination of minimax rates of convergence. The Annals of Statistics, 27:1564-1599.
[7] Yu, B. (1997). Assouad, Fano, and Le Cam. In Festschrift for Lucien Le Cam, pages 423-435. Springer.

Department of Statistics
The Wharton School
University of Pennsylvania
Philadelphia, PA 19104
USA
E-mail: tcai@wharton.upenn.edu
E-mail: tengyuan@wharton.upenn.edu
E-mail: rakhlin@wharton.upenn.edu