Supplementary Material for Limits on Sparse Support Recovery via Linear Sketching with Random Expander Matrices


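Jonathan Scarlett and Volkan Cevher

Supplementary Material for "Limits on Sparse Support Recovery via Linear Sketching with Random Expander Matrices" (AISTATS 2016, Jonathan Scarlett and Volkan Cevher)

Note that all citations here are to the bibliography in the main document, and similarly for many of the cross-references.

A Proof of Lemma 1

In the notation of Definition 1, let $\mathcal{E}_\ell$ ($\ell = 1, \dots, k$) be the event that some set $S$ of cardinality $\ell$ fails to satisfy the expansion property, i.e., $|N(S)| < (1-\epsilon) d |S|$. We start with the following non-asymptotic bound given in [8]:

$$ P[\mathcal{E}_\ell] \le \binom{p}{\ell} \binom{d\ell}{\epsilon d\ell} \left( \frac{d\ell}{n} \right)^{\epsilon d\ell}. \tag{44} $$

Applying the bounds $\log\binom{p}{\ell} \le \ell \log p$ and $\log\binom{d\ell}{\epsilon d\ell} \le d\ell H_2(\epsilon)$, we obtain

$$ \log P[\mathcal{E}_\ell] \le \ell \log p + d\ell H_2(\epsilon) + \epsilon d\ell \log\frac{d\ell}{n} \tag{45} $$
$$ = \ell \left( \log p - \epsilon d \left( \log\frac{n}{d\ell} - \frac{H_2(\epsilon)}{\epsilon} \right) \right). \tag{46} $$

Since $k = \Theta(1)$, the union bound gives $P\big[\bigcup_{\ell=1}^{k} \mathcal{E}_\ell\big] \to 0$ provided that (46) tends to $-\infty$ for all $\ell$. This is true provided that (2) holds; the dominant condition is the one with $\ell = k$.

B Proof of Theorem 3

Recall the definitions of the random variables (10)–(11), and the information densities (25)–(27). We fix the constants $\gamma_1, \dots, \gamma_k$ arbitrarily, and consider a decoder that searches for the unique set $s \in \mathcal{S}$ such that

$$ \tilde{\imath}(x_{s_{\mathrm{dif}}}; y \mid x_{s_{\mathrm{eq}}}) > \gamma_{|s_{\mathrm{dif}}|} \tag{47} $$

for all $2^k - 1$ partitions $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$ of $s$ with $s_{\mathrm{dif}} \ne \emptyset$. If no such $s$ exists, or if multiple exist, then an error is declared.

The joint distribution of $(\beta_s, X_s, Y \mid S = s)$ is the same for all $s$ in our setup (cf. Section 1.2), and the chosen decoder exhibits a similar symmetry. We can therefore condition on $S = s = \{1, \dots, k\}$. By the union bound, the error probability is upper bounded by

$$ P_e \le P\Big[ \bigcup_{(s_{\mathrm{dif}}, s_{\mathrm{eq}})} \big\{ \tilde{\imath}(X_{s_{\mathrm{dif}}}; Y \mid X_{s_{\mathrm{eq}}}) \le \gamma_{|s_{\mathrm{dif}}|} \big\} \Big] + \sum_{\bar{s} \in \mathcal{S} \setminus \{s\}} P\big[ \tilde{\imath}(X_{\bar{s} \setminus s}; Y \mid X_{\bar{s} \cap s}) > \gamma_{|\bar{s} \setminus s|} \big], \tag{48} $$

where here and subsequently we leave the condition $s_{\mathrm{dif}} \ne \emptyset$ implicit. In the summand of the second term, we have upper bounded the probability of an intersection of $2^k - 1$ events by just one such event. That event is the one whose information density corresponds to $s_{\mathrm{dif}} = \bar{s} \setminus s$ and $s_{\mathrm{eq}} = \bar{s} \cap s$.

As mentioned previously, a key tool in the proof is the following change of measure (with $\ell := |s_{\mathrm{dif}}|$):

$$ P_{Y|X_{s_{\mathrm{eq}}}}(y \mid x_{s_{\mathrm{eq}}}) = \sum_{x_{s_{\mathrm{dif}}}} \prod_{i \in s_{\mathrm{dif}}} P_X(x_i) \, P_{Y|X_{s_{\mathrm{dif}}} X_{s_{\mathrm{eq}}}}(y \mid x_{s_{\mathrm{dif}}}, x_{s_{\mathrm{eq}}}) \tag{49} $$
$$ \le \sum_{x_{s_{\mathrm{dif}}}} \prod_{i \in s_{\mathrm{dif}}} (1+\xi) \tilde{P}_X(x_i) \, P_{Y|X_{s_{\mathrm{dif}}} X_{s_{\mathrm{eq}}}}(y \mid x_{s_{\mathrm{dif}}}, x_{s_{\mathrm{eq}}}) \tag{50} $$
$$ = (1+\xi)^{\ell} \, \tilde{P}_{Y|X_{s_{\mathrm{eq}}}}(y \mid x_{s_{\mathrm{eq}}}), \tag{51} $$

where we have used the definitions (23)–(24), and (50) follows from (2).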
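The union bound in the proof of Lemma 1 is easy to evaluate numerically. The Python sketch below is not part of the original argument, and its function names are hypothetical. It computes three things:

- the logarithm of the right-hand side of (44);
- its simplified form (46);
- the resulting union bound over $\ell = 1, \dots, k$.

It assumes natural logarithms, so $H_2$ is measured in nats. It uses the Gamma-function form of the binomial coefficient so that $\epsilon d \ell$ need not be an integer.

```python
import math


def log_binom(a: float, b: float) -> float:
    """Natural log of the (Gamma-function) binomial coefficient C(a, b)."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def h2(eps: float) -> float:
    """Binary entropy function in nats."""
    return -eps * math.log(eps) - (1 - eps) * math.log(1 - eps)


def log_fail_bound(p: int, n: int, d: int, eps: float, ell: int) -> float:
    """Logarithm of the right-hand side of (44) for sets of cardinality ell."""
    m = eps * d * ell
    return log_binom(p, ell) + log_binom(d * ell, m) + m * math.log(d * ell / n)


def log_fail_bound_simplified(p: int, n: int, d: int, eps: float, ell: int) -> float:
    """Right-hand side of (46), obtained from (44) via the bounds used in (45)."""
    return ell * (math.log(p) - eps * d * (math.log(n / (d * ell)) - h2(eps) / eps))


def union_bound(p: int, n: int, d: int, eps: float, k: int) -> float:
    """Union bound on P[E_1 or ... or E_k]: the bound (44) summed over ell = 1..k."""
    return sum(math.exp(log_fail_bound(p, n, d, eps, ell)) for ell in range(1, k + 1))


if __name__ == "__main__":
    # Illustrative (hypothetical) parameters: p = 10^4, n = 10^5, d = 10, eps = 0.2, k = 3.
    for ell in range(1, 4):
        print(ell, log_fail_bound(10**4, 10**5, 10, 0.2, ell),
              log_fail_bound_simplified(10**4, 10**5, 10, 0.2, ell))
    print("union bound:", union_bound(10**4, 10**5, 10, 0.2, 3))
```

Comparing `log_fail_bound` with `log_fail_bound_simplified` shows that the step from (44) to (46) can only loosen the bound. Increasing `n` with the other parameters held fixed drives the union bound toward zero, as the proof requires.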
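By an identical argument, we have

$$ P_{Y|X_{s_{\mathrm{eq}}}\beta_s}(y \mid x_{s_{\mathrm{eq}}}, b_s) \le (1+\xi)^{\ell} \, \tilde{P}_{Y|X_{s_{\mathrm{eq}}}\beta_s}(y \mid x_{s_{\mathrm{eq}}}, b_s), \tag{52} $$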

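where $\tilde{P}_{Y|X_{s_{\mathrm{eq}}}\beta_s}$ is defined analogously to $\tilde{P}_{Y|X_{s_{\mathrm{eq}}}}$, i.e., with $X_{s_{\mathrm{dif}}}$ having an i.i.d. law.

We can weaken the second probability in (48) as follows (with $\ell := |\bar{s} \setminus s|$):

$$ P\big[\tilde{\imath}(X_{\bar{s}\setminus s}; Y \mid X_{\bar{s}\cap s}) > \gamma_\ell\big] = \sum_{x_{\bar{s}\setminus s},\, x_{\bar{s}\cap s}} P_X^{\ell}(x_{\bar{s}\setminus s}) P_X^{k-\ell}(x_{\bar{s}\cap s}) \int_{\mathbb{R}^n} dy\, P_{Y|X_{s_{\mathrm{eq}}}}(y \mid x_{\bar{s}\cap s}) \, \mathbb{1}\bigg\{ \log\frac{P_{Y|X_{s_{\mathrm{dif}}}X_{s_{\mathrm{eq}}}}(y \mid x_{\bar{s}\setminus s}, x_{\bar{s}\cap s})}{\tilde{P}_{Y|X_{s_{\mathrm{eq}}}}(y \mid x_{\bar{s}\cap s})} > \gamma_\ell \bigg\} \tag{53} $$
$$ \le (1+\xi)^{\ell} \sum_{x_{\bar{s}\setminus s},\, x_{\bar{s}\cap s}} P_X^{\ell}(x_{\bar{s}\setminus s}) P_X^{k-\ell}(x_{\bar{s}\cap s}) \int_{\mathbb{R}^n} dy\, \tilde{P}_{Y|X_{s_{\mathrm{eq}}}}(y \mid x_{\bar{s}\cap s}) \, \mathbb{1}\bigg\{ \log\frac{P_{Y|X_{s_{\mathrm{dif}}}X_{s_{\mathrm{eq}}}}(y \mid x_{\bar{s}\setminus s}, x_{\bar{s}\cap s})}{\tilde{P}_{Y|X_{s_{\mathrm{eq}}}}(y \mid x_{\bar{s}\cap s})} > \gamma_\ell \bigg\} \tag{54} $$
$$ \le (1+\xi)^{\ell} \sum_{x_{\bar{s}\setminus s},\, x_{\bar{s}\cap s}} P_X^{\ell}(x_{\bar{s}\setminus s}) P_X^{k-\ell}(x_{\bar{s}\cap s}) \int_{\mathbb{R}^n} dy\, P_{Y|X_{s_{\mathrm{dif}}}X_{s_{\mathrm{eq}}}}(y \mid x_{\bar{s}\setminus s}, x_{\bar{s}\cap s}) \, e^{-\gamma_\ell} \tag{55} $$
$$ = (1+\xi)^{\ell} e^{-\gamma_\ell}, \tag{56} $$

where:
- in (53) we used the fact that the output vector depends on $x_{\bar{s}}$ only through the columns corresponding to entries of $\bar{s}$ that are also in $s$;
- (54) follows from (51);
- (55) follows by bounding $\tilde{P}_{Y|X_{s_{\mathrm{eq}}}}$ using the event in the indicator function, and then upper bounding the indicator function by one.

Substituting (56) into (48) gives

$$ P_e \le P\Big[ \bigcup_{(s_{\mathrm{dif}}, s_{\mathrm{eq}})} \big\{ \tilde{\imath}(X_{s_{\mathrm{dif}}}; Y \mid X_{s_{\mathrm{eq}}}) \le \gamma_\ell \big\} \Big] + \sum_{\ell=1}^{k} \binom{k}{\ell}\binom{p-k}{\ell} (1+\xi)^{\ell} e^{-\gamma_\ell}, \tag{57} $$

where the combinatorial terms arise from a standard counting argument [7].

We now fix the constants $\delta_1, \dots, \delta_k$ arbitrarily, and recall the following steps from [7] (again writing $\ell := |s_{\mathrm{dif}}|$):

$$ P\Big[ \bigcup \big\{ \tilde{\imath}(X_{s_{\mathrm{dif}}}; Y \mid X_{s_{\mathrm{eq}}}) \le \gamma_\ell \big\} \Big] = P\bigg[ \bigcup \bigg\{ \log\frac{P_{Y|X_{s_{\mathrm{dif}}}X_{s_{\mathrm{eq}}}}(Y \mid X_{s_{\mathrm{dif}}}, X_{s_{\mathrm{eq}}})}{\tilde{P}_{Y|X_{s_{\mathrm{eq}}}}(Y \mid X_{s_{\mathrm{eq}}})} \le \gamma_\ell \bigg\} \bigg] \tag{58} $$
$$ \le P\bigg[ \bigcup \bigg\{ \log\frac{P_{Y|X_{s_{\mathrm{dif}}}X_{s_{\mathrm{eq}}}}(Y \mid X_{s_{\mathrm{dif}}}, X_{s_{\mathrm{eq}}})}{\tilde{P}_{Y|X_{s_{\mathrm{eq}}}}(Y \mid X_{s_{\mathrm{eq}}})} \le \gamma_\ell \bigg\} \cap \bigcap \bigg\{ \log\frac{\tilde{P}_{Y|X_{s_{\mathrm{eq}}}}(Y \mid X_{s_{\mathrm{eq}}})}{\tilde{P}_{Y|X_{s_{\mathrm{eq}}}\beta_s}(Y \mid X_{s_{\mathrm{eq}}}, \beta_s)} \le \delta_\ell \bigg\} \bigg] + P\bigg[ \bigcup \bigg\{ \log\frac{\tilde{P}_{Y|X_{s_{\mathrm{eq}}}}(Y \mid X_{s_{\mathrm{eq}}})}{\tilde{P}_{Y|X_{s_{\mathrm{eq}}}\beta_s}(Y \mid X_{s_{\mathrm{eq}}}, \beta_s)} > \delta_\ell \bigg\} \bigg] \tag{59} $$
$$ \le P\bigg[ \bigcup \bigg\{ \log\frac{P_{Y|X_{s_{\mathrm{dif}}}X_{s_{\mathrm{eq}}}}(Y \mid X_{s_{\mathrm{dif}}}, X_{s_{\mathrm{eq}}})}{\tilde{P}_{Y|X_{s_{\mathrm{eq}}}\beta_s}(Y \mid X_{s_{\mathrm{eq}}}, \beta_s)} \le \gamma_\ell + \delta_\ell \bigg\} \bigg] + P\bigg[ \bigcup \bigg\{ \log\frac{\tilde{P}_{Y|X_{s_{\mathrm{eq}}}}(Y \mid X_{s_{\mathrm{eq}}})}{\tilde{P}_{Y|X_{s_{\mathrm{eq}}}\beta_s}(Y \mid X_{s_{\mathrm{eq}}}, \beta_s)} > \delta_\ell \bigg\} \bigg]. \tag{60} $$

The second term in (60) is upper bounded as

$$ P\bigg[ \bigcup \bigg\{ \log\frac{\tilde{P}_{Y|X_{s_{\mathrm{eq}}}}(Y \mid X_{s_{\mathrm{eq}}})}{\tilde{P}_{Y|X_{s_{\mathrm{eq}}}\beta_s}(Y \mid X_{s_{\mathrm{eq}}}, \beta_s)} > \delta_\ell \bigg\} \bigg] \le \sum_{(s_{\mathrm{dif}}, s_{\mathrm{eq}})} P\bigg[ \log\frac{\tilde{P}_{Y|X_{s_{\mathrm{eq}}}}(Y \mid X_{s_{\mathrm{eq}}})}{\tilde{P}_{Y|X_{s_{\mathrm{eq}}}\beta_s}(Y \mid X_{s_{\mathrm{eq}}}, \beta_s)} > \delta_\ell \bigg] \tag{61} $$
$$ = \sum_{(s_{\mathrm{dif}}, s_{\mathrm{eq}})} \sum_{b_s, x_{s_{\mathrm{eq}}}} P_{\beta_s}(b_s) P_X^{k-\ell}(x_{s_{\mathrm{eq}}}) \int_{\mathbb{R}^n} dy\, P_{Y|X_{s_{\mathrm{eq}}}\beta_s}(y \mid x_{s_{\mathrm{eq}}}, b_s) \, \mathbb{1}\bigg\{ \log\frac{\tilde{P}_{Y|X_{s_{\mathrm{eq}}}}(y \mid x_{s_{\mathrm{eq}}})}{\tilde{P}_{Y|X_{s_{\mathrm{eq}}}\beta_s}(y \mid x_{s_{\mathrm{eq}}}, b_s)} > \delta_\ell \bigg\} \tag{62} $$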

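$$ \le \sum_{(s_{\mathrm{dif}}, s_{\mathrm{eq}})} (1+\xi)^{\ell} \sum_{b_s, x_{s_{\mathrm{eq}}}} P_{\beta_s}(b_s) P_X^{k-\ell}(x_{s_{\mathrm{eq}}}) \int_{\mathbb{R}^n} dy\, \tilde{P}_{Y|X_{s_{\mathrm{eq}}}\beta_s}(y \mid x_{s_{\mathrm{eq}}}, b_s) \, \mathbb{1}\bigg\{ \log\frac{\tilde{P}_{Y|X_{s_{\mathrm{eq}}}}(y \mid x_{s_{\mathrm{eq}}})}{\tilde{P}_{Y|X_{s_{\mathrm{eq}}}\beta_s}(y \mid x_{s_{\mathrm{eq}}}, b_s)} > \delta_\ell \bigg\} \tag{63} $$
$$ \le \sum_{(s_{\mathrm{dif}}, s_{\mathrm{eq}})} (1+\xi)^{\ell} \sum_{b_s, x_{s_{\mathrm{eq}}}} P_{\beta_s}(b_s) P_X^{k-\ell}(x_{s_{\mathrm{eq}}}) \int_{\mathbb{R}^n} dy\, \tilde{P}_{Y|X_{s_{\mathrm{eq}}}}(y \mid x_{s_{\mathrm{eq}}}) \, e^{-\delta_\ell} = \sum_{\ell=1}^{k} \binom{k}{\ell} (1+\xi)^{\ell} e^{-\delta_\ell}, \tag{64} $$

where (61) follows from the union bound. The remaining steps follow the arguments used in (53)–(56), with (52) used in place of (51).

We now upper bound the first term in (60), again following [7]. The numerator in the first term in (60) equals $P_{Y|X_s}(Y \mid X_s)$ for all $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$ (recall the definition (22)). We can thus write the overall term as

$$ P\Big[ \log P_{Y|X_s}(Y \mid X_s) \le \max_{(s_{\mathrm{dif}}, s_{\mathrm{eq}})} \Big( \log \tilde{P}_{Y|X_{s_{\mathrm{eq}}}\beta_s}(Y \mid X_{s_{\mathrm{eq}}}, \beta_s) + \gamma_\ell + \delta_\ell \Big) \Big]. \tag{65} $$

Using the same steps as those used in (58)–(60), we can upper bound this by

$$ P\Big[ \log P_{Y|X_s\beta_s}(Y \mid X_s, \beta_s) \le \max_{(s_{\mathrm{dif}}, s_{\mathrm{eq}})} \Big( \log \tilde{P}_{Y|X_{s_{\mathrm{eq}}}\beta_s}(Y \mid X_{s_{\mathrm{eq}}}, \beta_s) + \gamma_\ell + \delta_\ell + \delta_0 \Big) \Big] \tag{66} $$
$$ + P\bigg[ \log\frac{P_{Y|X_s\beta_s}(Y \mid X_s, \beta_s)}{P_{Y|X_s}(Y \mid X_s)} > \delta_0 \bigg] \tag{67} $$

for any constant $\delta_0$. Reversing the step that led to (65), this can equivalently be written as

$$ P\bigg[ \bigcup \bigg\{ \log\frac{P_{Y|X_{s_{\mathrm{dif}}}X_{s_{\mathrm{eq}}}\beta_s}(Y \mid X_{s_{\mathrm{dif}}}, X_{s_{\mathrm{eq}}}, \beta_s)}{\tilde{P}_{Y|X_{s_{\mathrm{eq}}}\beta_s}(Y \mid X_{s_{\mathrm{eq}}}, \beta_s)} \le \gamma_\ell + \delta_\ell + \delta_0 \bigg\} \bigg] + P\bigg[ \log\frac{P_{Y|X_s\beta_s}(Y \mid X_s, \beta_s)}{P_{Y|X_s}(Y \mid X_s)} > \delta_0 \bigg]. \tag{68} $$

The first logarithm in the first term is the information density (26). Moreover, consider the choices

$$ \gamma_\ell = \log\Big( (1+\xi)^{\ell} \, \frac{k}{\delta} \binom{k}{\ell}\binom{p-k}{\ell} \Big) \tag{69} $$
$$ \delta_\ell = \log\Big( (1+\xi)^{\ell} \, \frac{k}{\delta} \binom{k}{\ell} \Big). \tag{70} $$

With these choices, (64) and the second term in (57) are each upper bounded by $\delta$. Hence, combining (57) and (60) with (64) and (68), and recalling that $\ell = |s_{\mathrm{dif}}|$, we obtain (28).

C Proof of Theorem 2

Fix $0 < b_{\min} < b_{\max} < \infty$, and let $\mathcal{B} := \{ b_s : \min_i |b_i| \ge b_{\min},\ \max_i |b_i| \le b_{\max} \}$. The main step in proving Theorem 2 is extending the arguments of Section 4.5 to show that

$$ P_e \le P\bigg[ \bigg\{ n \le \max_{(s_{\mathrm{dif}}, s_{\mathrm{eq}}) : s_{\mathrm{dif}} \ne \emptyset} \frac{|s_{\mathrm{dif}}| \log p}{I_{s_{\mathrm{dif}}, s_{\mathrm{eq}}}(\beta_s)} (1+\eta) \bigg\} \cap \{ \beta_s \in \mathcal{B} \} \bigg] + P[\beta_s \notin \mathcal{B}] + o(1), \tag{71} $$

and

$$ P_e \ge P\bigg[ \bigg\{ n \le \max_{(s_{\mathrm{dif}}, s_{\mathrm{eq}}) : s_{\mathrm{dif}} \ne \emptyset} \frac{|s_{\mathrm{dif}}| \log p}{I_{s_{\mathrm{dif}}, s_{\mathrm{eq}}}(\beta_s)} (1-\eta) \bigg\} \cap \{ \beta_s \in \mathcal{B} \} \bigg] + o(1). \tag{72} $$

Before proving these, we show how they yield the theorem. Using (6), it is readily verified that each $I_{s_{\mathrm{dif}}, s_{\mathrm{eq}}}(\beta_s)$, with an i.i.d. Gaussian vector $\beta_s$, is a continuous random variable having no mass points. We take $\eta \to 0$ sufficiently slowly. We also note that $\beta_s$ is restricted to the set $\mathcal{B}$ (within which all of the $I_{s_{\mathrm{dif}}, s_{\mathrm{eq}}}(\beta_s)$ are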

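bounded away from zero and infinity). We conclude that (71)–(72) remain true when $\eta$ is replaced by zero, with the resulting contribution absorbed into the $o(1)$ terms. Hence, we obtain Theorem 2 as follows:

(i) drop the condition $\beta_s \in \mathcal{B}$ from the first probability in (71);
(ii) use the identity $P[A_1 \cap A_2] \ge P[A_1] - P[A_2^c]$ to remove the same condition from the first probability in (72);
(iii) note that the remainder term $P[\beta_s \notin \mathcal{B}]$ can be made arbitrarily small by choosing $b_{\min}$ sufficiently small and $b_{\max}$ sufficiently large.

It remains to establish (71)–(72). Recall the value of the constant given following Lemma 3. The above choice of $\mathcal{B}$ ensures that all of the non-zero entries are bounded away from $0$ and $\infty$. Consequently, the mutual informations $I_{s_{\mathrm{dif}}, s_{\mathrm{eq}}}(\beta_s)$ and variances $V_{s_{\mathrm{dif}}, s_{\mathrm{eq}}}(\beta_s)$ are bounded away from zero and infinity, and hence that constant is $\Theta(1)$.

Since $P_{\beta_s}$ is continuous, we must choose $\delta_0$ and handle the probability term in (29) differently from the above. Similarly to the analysis of Gaussian measurements in [7], we fix $\epsilon > 0$ and note that Chebyshev's inequality implies

$$ \delta_0 = I_0 + \sqrt{\frac{V_0}{\epsilon}} \implies P_0(\delta_0) \le \epsilon, \tag{73} $$

where $P_0(\delta_0) := P\Big[ \log\frac{P_{Y|X_s\beta_s}(Y \mid X_s, \beta_s)}{P_{Y|X_s}(Y \mid X_s)} > \delta_0 \Big]$ and

$$ I_0 := I(\beta_s; Y \mid X_s) \tag{74} $$
$$ V_0 := \mathrm{Var}\bigg[ \log\frac{P_{Y|X_s\beta_s}(Y \mid X_s, \beta_s)}{P_{Y|X_s}(Y \mid X_s)} \bigg]. \tag{75} $$

The following is a straightforward extension of [7, Prop. 4] to expander-based measurements.

Proposition 1. The quantities $I_0$ and $V_0$ defined in (74)–(75) satisfy

$$ I_0 \le \frac{k}{2} \log\Big( 1 + \frac{d\sigma_\beta^2}{\sigma^2} \Big) \tag{76} $$
$$ V_0 \le 2n. \tag{77} $$

Proof. See Appendix E.

We can now obtain (71)–(72) using the steps of the previous subsection.
- The condition $\beta_s \in \mathcal{B}$ arises in (35) and (39) because this condition was used to obtain a bounded variance in (32).
- The first two probabilities in (71) arise from the identity $P[A_1] \le P[A_1 \cap A_2^c] + P[A_2]$.

The only additional step is to show that, in the achievability part, we can simultaneously achieve $\delta_0 = o(\log p)$ and $P_0(\delta_0) = o(1)$ whenever $n = \Theta(\log p)$. This parallels the previous subsection, where we showed that the analogous term $2|s_{\mathrm{dif}}| \log(\cdot)$ is $o(\log p)$. The claim follows by substituting (76)–(77) into (73), along with $d = O(n) = O(\log p)$, to obtain $\delta_0 = O(\log\log p) + O(\sqrt{\log p}) = o(\log p)$ for any fixed $\epsilon > 0$. Finally, $\epsilon$ (and hence $P_0(\delta_0)$) in (73) can be made arbitrarily small.

D Proof of Lemma 3

We prove the lemma by characterizing the variance of a general function of $(X_s, Y)$ of the form $f^n(X_s, Y) := \sum_{i=1}^{n} f(X_s^{(i)}, Y^{(i)})$.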
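To see concretely why $\delta_0 = o(\log p)$, the following Python sketch plugs the bounds (76)–(77) into the Chebyshev threshold (73). It then reports $\delta_0 / \log p$ along a sequence with $n$ proportional to $\log p$.

The following choices are illustrative assumptions, not values from the paper:
- $n = \lceil c \log p \rceil$ with $c = 10$;
- $d = \max(1, \lfloor n/4 \rfloor)$;
- the SNR parameter `snr` $:= \sigma_\beta^2/\sigma^2 = 1$;
- $\epsilon = 0.05$.

Using the upper bounds in place of $I_0$ and $V_0$ only increases $\delta_0$, so the guarantee $P_0(\delta_0) \le \epsilon$ from (73) is preserved.

```python
import math


def delta0_from_bounds(k: int, d: int, n: int, snr: float, eps: float) -> float:
    """Threshold delta_0 of (73) with I_0 and V_0 replaced by their bounds (76)-(77).
    A larger threshold can only shrink the tail probability, so P_0(delta_0) <= eps still holds."""
    i0_bound = 0.5 * k * math.log(1 + d * snr)  # (76), with snr = sigma_beta^2 / sigma^2
    v0_bound = 2 * n                             # (77)
    return i0_bound + math.sqrt(v0_bound / eps)


def delta0_ratio(log_p: float, k: int = 2, c: float = 10.0,
                 snr: float = 1.0, eps: float = 0.05) -> float:
    """delta_0 / log p under the hypothetical scaling n = ceil(c log p), d = max(1, n // 4)."""
    n = math.ceil(c * log_p)
    d = max(1, n // 4)
    return delta0_from_bounds(k, d, n, snr, eps) / log_p


if __name__ == "__main__":
    for log_p in (10.0, 100.0, 1000.0, 10000.0):
        print(log_p, delta0_ratio(log_p))
```

The ratio decays roughly like $1/\sqrt{\log p}$, reflecting the dominant $\sqrt{V_0/\epsilon}$ term. For these illustrative parameters it still exceeds one at $\log p = 100$. The $o(\log p)$ behaviour is therefore genuinely asymptotic.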
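Clearly all of the quantities $\imath^n$ for the various $(s_{\mathrm{dif}}, s_{\mathrm{eq}})$ can be written in this general form. We have

$$ \mathrm{Var}[f^n(X_s, Y)] = \mathrm{Var}\Big[ \sum_{i=1}^{n} f(X_s^{(i)}, Y^{(i)}) \Big] \tag{78} $$
$$ = \sum_{i=1}^{n} \sum_{j=1}^{n} \mathrm{Cov}\big[ f(X_s^{(i)}, Y^{(i)}), f(X_s^{(j)}, Y^{(j)}) \big] \tag{79} $$
$$ = n \, \mathrm{Var}[f(X_s, Y)] + n(n-1) \, \mathrm{Cov}\big[ f(X_s, Y), f(X_s', Y') \big], \tag{80} $$

where $(X_s, Y)$ and $(X_s', Y')$ correspond to two different indices in $\{1, \dots, n\}$. Equation (80) follows from simple symmetry considerations for the cases $i = j$ and $i \ne j$.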
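The decomposition (80) and the conditional law (82) used for the covariance term can be checked exactly for small $n$. The check enumerates every placement of the $d$ ones in a column. The Python sketch below uses exact rational arithmetic, and its helper names are hypothetical. It does two things:

- it verifies (82);
- it evaluates the right-hand side of (80) for $k = 1$ and the illustrative choice $f(x) = x$. Each column always sums to $d$, so the variance on the left of (80) must be zero.

```python
from fractions import Fraction
from itertools import combinations


def pair_pmf(n: int, d: int) -> dict:
    """Exact joint pmf of the first two entries (X_i, X_i') of a uniformly random
    length-n binary column containing exactly d ones."""
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    total = 0
    for ones in combinations(range(n), d):
        counts[(int(0 in ones), int(1 in ones))] += 1
        total += 1
    return {key: Fraction(c, total) for key, c in counts.items()}


def marginal(n: int, d: int, x: int) -> Fraction:
    """Single-entry marginal: P_X(1) = d/n and P_X(0) = 1 - d/n."""
    return Fraction(d, n) if x == 1 else Fraction(n - d, n)


def conditional_82(n: int, d: int, x_prime: int, x: int) -> Fraction:
    """Right-hand side of (82): n/(n-1) * P_X(x') - 1/(n-1) * 1{x' = x}."""
    indicator = 1 if x_prime == x else 0
    return Fraction(n, n - 1) * marginal(n, d, x_prime) - Fraction(indicator, n - 1)


def check_82(n: int, d: int) -> bool:
    """Compare the enumerated conditional law with (82); requires 1 <= d <= n - 1."""
    pmf = pair_pmf(n, d)
    for x in (0, 1):
        p_x = pmf[(x, 0)] + pmf[(x, 1)]
        for x_prime in (0, 1):
            if pmf[(x, x_prime)] / p_x != conditional_82(n, d, x_prime, x):
                return False
    return True


def rhs_80_identity(n: int, d: int) -> Fraction:
    """n Var[X_i] + n(n-1) Cov[X_i, X_i'], i.e. the right-hand side of (80)
    with k = 1 and f(x) = x; it should equal Var[d] = 0."""
    pmf = pair_pmf(n, d)
    mean = marginal(n, d, 1)
    var = mean * (1 - mean)
    cov = pmf[(1, 1)] - mean * mean
    return n * var + n * (n - 1) * cov
```

For example, `check_82(8, 3)` confirms (82) for $n = 8$, $d = 3$. `rhs_80_identity(8, 3)` returns zero, showing that the negative $O(1/n)$ covariance exactly cancels the $n \,\mathrm{Var}$ term in this degenerate case.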

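To compute the covariance term in (80), we first find the joint distribution of $(X_s, Y)$ and $(X_s', Y')$. As noted in [29, Sec. IV-B], a uniform permutation of a vector with $d$ ones and $n - d$ zeros can be interpreted as successive uniform sampling without replacement ($n$ times in total) from a collection that initially contains $d$ ones and $n - d$ zeros. By considering the first two steps of this procedure, we obtain

$$ P[X_i = x_i] = P_X(x_i) \tag{81} $$
$$ P[X_i' = x_i' \mid X_i = x_i] = \frac{n}{n-1} P_X(x_i') - \frac{1}{n-1} \mathbb{1}\{x_i' = x_i\}. \tag{82} $$

Here $X_i$ and $X_i'$ denote the entries in column $i \in s$ of the two rows. $P_X$ denotes the marginal distribution of a single entry, i.e., $P_X(1) = 1 - P_X(0) = d/n$. Denote the right-hand side of (82) by $P(x_i' \mid x_i)$, and write $\mu_f := E[f(X_s, Y)]$. The covariance in (80) is then given by

$$ \mathrm{Cov}\big[ f(X_s, Y), f(X_s', Y') \big] = E\big[ (f(X_s, Y) - \mu_f)(f(X_s', Y') - \mu_f) \big] \tag{83} $$
$$ = \sum_{x_s} \sum_{x_s'} P_X^{k}(x_s) \prod_{i \in s} P(x_i' \mid x_i) \, E\big[ (f(x_s, Y) - \mu_f)(f(x_s', Y') - \mu_f) \,\big|\, X_s = x_s, X_s' = x_s' \big]. \tag{84} $$

We now consider the various terms arising by substituting (82) into (84) and performing a binomial-type expansion of the product:

• There is a single term of the form (84) in which each $P(x_i' \mid x_i)$ is replaced by $\frac{n}{n-1} P_X(x_i')$. This yields a (scaled) average of $(f(X_s, Y) - \mu_f)(f(X_s', Y') - \mu_f)$ over independent random variables $X_s$ and $X_s'$, and therefore evaluates to zero.
• There are $k$ terms in which one value $P(x_i' \mid x_i)$ in (84) is replaced by $-\frac{1}{n-1} \mathbb{1}\{x_i = x_i'\}$ and the other $k-1$ are replaced by $\frac{n}{n-1} P_X(x_i')$. Each such term can be written as $-\frac{n^{k-1}}{(n-1)^{k}} \mathrm{Var}\big[ E[f(X_s, Y) \mid X_i] \big]$. This in turn behaves as $-\frac{1}{n} \mathrm{Var}\big[ E[f(X_s, Y) \mid X_i] \big] + O\big(\frac{1}{n^2}\big)$.
• All of the remaining terms replace $P(x_i' \mid x_i)$ in (84) by $-\frac{1}{n-1} \mathbb{1}\{x_i = x_i'\}$ for at least two values of $i$. All such terms are easily verified to behave as $O\big(\frac{1}{n^2}\big)$. The number of such terms is finite and does not scale with $n$ (recall that $k$ is fixed by assumption).

Substituting these cases into (84) and recalling that $k = \Theta(1)$ and $d = \Theta(1)$, we obtain (4).

E Proof of Proposition 1

Here we characterize $I_0$ and $V_0$, defined in (74)–(75), via an extension of the analysis given in [7, App. B]. Since $Y = X_s \beta_s + Z$, we have

$$ I_0 = I(\beta_s; Y \mid X_s) = H(Y \mid X_s) - H(Y \mid X_s, \beta_s) \tag{85} $$
$$ = H(X_s \beta_s + Z \mid X_s) - H(Z). \tag{86} $$

From [25, Ch. 9], we have $H(Z) = \frac{n}{2} \log(2\pi e \sigma^2)$ and $H(X_s \beta_s + Z \mid X_s = x_s) = \frac{1}{2} \log\big( (2\pi e)^n \det(\sigma^2 I + \sigma_\beta^2 x_s x_s^T) \big)$, where $I$ is the identity matrix. Averaging the latter over $X_s$ and substituting these into (86) gives

$$ I_0 = \frac{1}{2} E\Big[ \log\det\Big( I_n + \frac{\sigma_\beta^2}{\sigma^2} X_s X_s^T \Big) \Big] \tag{87} $$
$$ = \frac{1}{2} E\Big[ \log\det\Big( I_k + \frac{\sigma_\beta^2}{\sigma^2} X_s^T X_s \Big) \Big] \tag{88} $$
$$ = \frac{1}{2} \sum_{i=1}^{k} E\Big[ \log\Big( 1 + \frac{\sigma_\beta^2}{\sigma^2} \lambda_i(X_s^T X_s) \Big) \Big] \tag{89} $$
$$ \le \frac{k}{2} \log\Big( 1 + \frac{d\sigma_\beta^2}{\sigma^2} \Big), \tag{90} $$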

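where:
- (88) follows from the identity $\det(I + AB) = \det(I + BA)$;
- (89) follows by writing the determinant as a product of eigenvalues (denoted by $\lambda_i(\cdot)$);
- (90) follows from Jensen's inequality and the following calculation:

$$ \frac{1}{k} E\Big[ \sum_{i=1}^{k} \lambda_i(X_s^T X_s) \Big] = \frac{1}{k} E\big[ \mathrm{Tr}(X_s^T X_s) \big] = E\big[ X_1^T X_1 \big] = d, \tag{91} $$

since the squared norm of each column of $X_s$ is $d$ almost surely. This concludes the proof of (76).

We now turn to bounding the variance. Again using the fact that $Y = X_s \beta_s + Z$, we have

$$ \log\frac{P_{Y|X_s\beta_s}(Y \mid X_s, \beta_s)}{P_{Y|X_s}(Y \mid X_s)} = \log\frac{P_Z(Z)}{P_{Y|X_s}(X_s \beta_s + Z \mid X_s)} \tag{92} $$
$$ = \frac{1}{2} \log\det\Big( I + \frac{\sigma_\beta^2}{\sigma^2} X_s X_s^T \Big) - \frac{1}{2\sigma^2} Z^T Z + \frac{1}{2} (X_s \beta_s + Z)^T \big( \sigma^2 I + \sigma_\beta^2 X_s X_s^T \big)^{-1} (X_s \beta_s + Z), \tag{93} $$

where $P_Z$ is the density of $Z$. Equation (93) follows by direct substitution of the densities $P_Z \sim N(0, \sigma^2 I)$ and $P_{Y|X_s}(\cdot \mid x_s) \sim N(0, \sigma^2 I + \sigma_\beta^2 x_s x_s^T)$.

Observe now that $\frac{1}{\sigma^2} Z^T Z$ is a sum of $n$ independent $\chi^2$ random variables with one degree of freedom, each having a variance of $2$. Hence the second term in (93) has a variance of $\frac{n}{2}$.

Moreover, write $M = (M^{1/2})^T M^{1/2}$ for the symmetric positive definite matrix $M = \sigma^2 I + \sigma_\beta^2 X_s X_s^T$. We similarly observe that the final term in (93) is one half of a sum of $n$ independent $\chi^2$ variables with one degree of freedom. This is true conditioned on any $X_s = x_s$, and hence also unconditionally. It again yields a variance of $\frac{n}{2}$.

We thus obtain (77) using the identity $\mathrm{Var}[A + B] \le \mathrm{Var}[A] + \mathrm{Var}[B] + 2 \max\{ \mathrm{Var}[A], \mathrm{Var}[B] \}$.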
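The identity $\mathrm{Tr}(X_s^T X_s) = kd$ behind (91) holds for every realization, not just in expectation. The concavity argument behind (90) therefore bounds $\frac{1}{2}\log\det(I_k + \frac{\sigma_\beta^2}{\sigma^2} X_s^T X_s)$ realization by realization.

The pure-Python sketch below checks this per-realization form of (88)–(90). It uses no external libraries. It draws random expander columns, forms the Gram matrix $X_s^T X_s$, and compares against the right-hand side of (90). The helper names and the parameter `snr` $:= \sigma_\beta^2/\sigma^2$ are illustrative assumptions.

```python
import math
import random


def random_expander_columns(n: int, d: int, k: int, rng: random.Random) -> list:
    """k independent columns, each uniform over length-n 0/1 vectors with exactly d ones.
    Each column is stored as the set of row indices where it equals one."""
    return [set(rng.sample(range(n), d)) for _ in range(k)]


def gram(cols: list) -> list:
    """X_s^T X_s: entry (i, j) counts the rows in which columns i and j are both one."""
    return [[len(a & b) for b in cols] for a in cols]


def log_det_spd(a: list) -> float:
    """log det of a symmetric positive definite matrix by Gaussian elimination.
    Pivots of an SPD matrix stay positive, so no row exchanges are needed."""
    m = [row[:] for row in a]
    size = len(m)
    total = 0.0
    for c in range(size):
        pivot = m[c][c]
        total += math.log(pivot)
        for r in range(c + 1, size):
            factor = m[r][c] / pivot
            for j in range(c, size):
                m[r][j] -= factor * m[c][j]
    return total


def half_log_det(cols: list, snr: float) -> float:
    """(1/2) log det(I_k + snr * X_s^T X_s), the quantity inside the expectation in (88)."""
    g = gram(cols)
    k = len(cols)
    a = [[(1.0 if i == j else 0.0) + snr * g[i][j] for j in range(k)] for i in range(k)]
    return 0.5 * log_det_spd(a)


def jensen_bound(k: int, d: int, snr: float) -> float:
    """Right-hand side of (90): (k/2) log(1 + d * snr)."""
    return 0.5 * k * math.log(1 + d * snr)


if __name__ == "__main__":
    rng = random.Random(0)
    cols = random_expander_columns(n=40, d=4, k=3, rng=rng)
    print(half_log_det(cols, snr=2.0), "<=", jensen_bound(k=3, d=4, snr=2.0))
```

Equality holds when the $k$ columns have disjoint supports, since then all eigenvalues of $X_s^T X_s$ equal $d$. Overlapping supports spread the eigenvalues and make the bound strict.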