A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers


Sahand Negahban, Department of EECS, UC Berkeley
Martin J. Wainwright, Department of Statistics and Department of EECS, UC Berkeley
Pradeep Ravikumar, Department of Computer Sciences, UT Austin
Bin Yu, Department of Statistics and Department of EECS, UC Berkeley

Abstract

High-dimensional statistical inference deals with models in which the number of parameters $p$ is comparable to or larger than the sample size $n$. Since it is usually impossible to obtain consistent procedures unless $p/n \to 0$, a line of recent work has studied models with various types of structure (e.g., sparse vectors; block-structured matrices; low-rank matrices; Markov assumptions). In such settings, a general approach to estimation is to solve a regularized convex program (known as a regularized M-estimator) which combines a loss function (measuring goodness-of-fit of the models to the data) with some regularization function that encourages the assumed structure. The goal of this paper is to provide a unified framework for establishing consistency and convergence rates for such regularized M-estimation procedures under high-dimensional scaling. We state one main theorem and show how it can be used to re-derive several existing results, and also to obtain several new results on consistency and convergence rates. Our analysis also identifies two key properties of loss and regularization functions, referred to as restricted strong convexity and decomposability, that ensure the corresponding regularized M-estimators have fast convergence rates.

1 Introduction

In many fields of science and engineering, such as genomics and natural language processing, it is of great interest to relate predictor variables (e.g., gene levels) to a response variable (e.g., cancer status). Due to the exploding size of problems, we often find ourselves in the "large $p$, small $n$" regime, that is, the number of predictor variables $p$ is comparable to or even larger than the number of observations $n$. For such high-dimensional data, successful statistical modeling is possible only if the data follow models with restrictions. For instance, the data might be sparse in a suitably chosen basis, could lie on some manifold, or the dependencies among the variables might have Markov structure specified by a graphical model. In such settings, a common approach is to use regularized M-estimators, where some loss function (e.g., the negative log-likelihood of the data) is regularized by a function appropriate to the assumed structure. Such estimators may also be interpreted from a Bayesian perspective as the maximum a posteriori (MAP) estimator, with the regularizer reflecting prior information. In this paper, we study such regularized M-estimation procedures, and attempt to provide a unifying framework that both

recovers some existing results and provides new results on consistency and convergence rates under high-dimensional scaling. As an illustration of the applications of our analysis, we work with three running examples of constrained parametric structures. The first are sparse models, both where the number of model parameters that are non-zero is small (hard-sparse), or, more generally, where the number of parameters above a certain threshold is limited (weak-sparse). The second are so-called block-sparse models, where the parameters are matrix-structured, and entire rows are either zero or not. Our third class is the estimation of low-rank matrices, which arises in system identification, collaborative filtering, and other types of matrix completion problems.

To motivate the need for a unified analysis, let us provide a brief (and hence necessarily incomplete) overview of the broad range of work on high-dimensional models. For the case of sparse regression, a popular regularizer is the $\ell_1$ norm of the parameter vector, which is the sum of the absolute values of the parameters. A number of researchers have studied the Lasso [15, 3] as well as the closely related Dantzig selector [2] and provided conditions on various aspects of its behavior, including $\ell_2$-error bounds [1, 6, 20, 21] and model selection consistency [21, 19, 15, 16]. For generalized linear models (GLMs) and exponential family models, estimators based on $\ell_1$-regularized maximum likelihood have also been studied, including results on risk consistency [18] and model selection consistency [11]. A body of work has focused on the case of estimating Gaussian graphical models, including convergence rates in Frobenius and operator norm [14], and results on operator norm and model selection consistency [12]. Motivated by inference problems involving block-sparse matrices, other researchers have proposed block-structured regularizers [17, 22], and more recently, high-dimensional consistency results have been obtained for model selection [7, 8] and parameter consistency [4]. In this paper, we derive a single main theorem, and show how we are able to rederive a wide range of known results on high-dimensional consistency, as well as some novel ones, such as estimation error rates for low-rank matrices, sparse matrices, and weakly sparse vectors.

2 Problem formulation and some key properties

In this section, we begin with a precise formulation of the problem, and then develop some key properties of the regularizer and loss function. In particular, we define a notion of decomposability for regularizing functions $r$, and then prove that when it is satisfied, the error $\Delta = \hat{\theta} - \theta^*$ of the regularized M-estimator must satisfy certain constraints. We use these constraints to define a notion of restricted strong convexity that the loss function must satisfy.

2.1 Problem set-up

Consider a random variable $Z$ with distribution $\mathbb{P}$ taking values in a set $\mathcal{Z}$. Let $Z_1^n := \{Z_1, \ldots, Z_n\}$ denote $n$ observations drawn in an i.i.d. manner from $\mathbb{P}$, and suppose $\theta^* \in \mathbb{R}^p$ is some parameter of this distribution. We consider the problem of estimating $\theta^*$ from the data $Z_1^n$. In order to do so, we consider the following class of regularized M-estimators. Let $L : \mathbb{R}^p \times \mathcal{Z}^n \to \mathbb{R}$ be some loss function that assigns a cost to any parameter $\theta \in \mathbb{R}^p$ given a set of $n$ observations. Let $r : \mathbb{R}^p \to \mathbb{R}$ denote a regularization function. We then consider the regularized M-estimator given by

$$\hat{\theta} \in \arg\min_{\theta \in \mathbb{R}^p} \bigl\{ L(\theta; Z_1^n) + \lambda_n\, r(\theta) \bigr\}, \qquad (1)$$

where $\lambda_n > 0$ is a regularization penalty. For ease of notation, in the sequel, we adopt the shorthand $L(\theta)$ for $L(\theta; Z_1^n)$. Throughout the paper, we assume that the loss function $L$ is convex and differentiable, and that the regularizer $r$ is a norm.
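The estimator (1) is stated abstractly; as a concrete illustration (not part of the paper's analysis), the following is a minimal sketch of computing one instance, the Lasso of Section 3.1, by proximal gradient descent. The function names and the step-size rule are our own illustrative choices; any convex solver for (1) would do.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (coordinatewise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_prox_grad(X, y, lam, n_iter=1000):
    """Minimize (1/(2n))||y - X theta||_2^2 + lam * ||theta||_1,
    i.e., problem (1) with the least-squares loss and the l1 regularizer."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the loss gradient
    theta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ theta - y) / n   # gradient of the least-squares loss
        theta = soft_threshold(theta - step * grad, step * lam)
    return theta
```

The same template applies to the other running examples by swapping in the appropriate loss and the proximal operator of the chosen regularizer (group soft-thresholding for the $\ell_{1,q}$ norm, singular-value soft-thresholding for the nuclear norm).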
Our goal is to provide general techniques for deriving bounds on the error $\hat{\theta} - \theta^*$ in some error metric $d$. A common example is the $\ell_2$-norm $d(\hat{\theta} - \theta^*) := \|\hat{\theta} - \theta^*\|_2$. As discussed earlier, high-dimensional parameter estimation is made possible by structural constraints on $\theta^*$ such as sparsity, and we will see that the behavior of the error is determined by how well these constraints are captured by the regularization function $r(\cdot)$. We now turn to the properties of the regularizer $r$ and the loss function $L$ that underlie our analysis.

2.2 Decomposability

Our first condition requires that the regularization function $r$ be decomposable, in a sense to be defined precisely, with respect to a family of subspaces. This notion is a formalization of the manner in which the regularization function imposes constraints on possible parameter vectors $\theta^* \in \mathbb{R}^p$. We begin with some abstract definitions, which we then illustrate with a number of concrete examples. Take some arbitrary inner product space $\mathcal{H}$, and let $\|\cdot\|_2$ denote the norm induced by the inner product. Consider a pair $(A, B)$ of subspaces of $\mathcal{H}$ such that $A \subseteq B^\perp$. For a given subspace $A$ and vector $u \in \mathcal{H}$, we let $\pi_A(u) := \arg\min_{v \in A} \|u - v\|_2$ denote the orthogonal projection of $u$ onto $A$. We let $\mathcal{V} = \{(A, B) \mid A \subseteq B^\perp\}$ denote a collection of such subspace pairs. For a given statistical model, our goal is to construct subspace collections $\mathcal{V}$ such that for any given $\theta^*$ from our model class, there exists a pair $(A, B) \in \mathcal{V}$ with $\|\pi_A(\theta^*)\|_2 \approx \|\theta^*\|_2$ and $\|\pi_B(\theta^*)\|_2 \approx 0$. Of most interest to us are subspace pairs $(A, B)$ in which this property holds but the subspace $A$ is relatively small and $B$ is relatively large. Note that $A$ represents the constraints underlying our model class, and imposed by our regularizer. In the remainder of this paper we assume that $\mathcal{H} = \mathbb{R}^p$ and use the standard Euclidean inner product, unless otherwise specified.

As a first concrete (but toy) example, consider the model class of all vectors $\theta^* \in \mathbb{R}^p$, and the subspace collection $\mathcal{T}$ that consists of a single subspace pair $(A, B) = (\mathbb{R}^p, \{0\})$. We refer to this choice ($\mathcal{V} = \mathcal{T}$) as the trivial subspace collection. In this case, for any $\theta^* \in \mathbb{R}^p$, we have $\pi_A(\theta^*) = \theta^*$ and $\pi_B(\theta^*) = 0$. Although this collection satisfies our desired property, it is not so useful since $A = \mathbb{R}^p$ is a very large subspace. As a second example, consider the class of $s$-sparse parameter vectors $\theta^* \in \mathbb{R}^p$, meaning that $\theta^*_i \neq 0$ only if $i \in S$, where $S$ is some $s$-sized subset of $\{1, 2, \ldots, p\}$. For any given subset $S$ and its complement $S^c$, let us define the subspaces $A(S) = \{\theta \in \mathbb{R}^p \mid \theta_{S^c} = 0\}$ and $B(S) = \{\theta \in \mathbb{R}^p \mid \theta_S = 0\}$, and the $s$-sparse subspace collection $\mathcal{S} = \{(A(S), B(S)) \mid S \subseteq \{1, \ldots, p\},\ |S| = s\}$. With this set-up, for any $s$-sparse parameter vector $\theta^*$, we are guaranteed that there exists some $(A, B) \in \mathcal{S}$ such that $\pi_A(\theta^*) = \theta^*$ and $\pi_B(\theta^*) = 0$. In this case, the property is more interesting, since the subspaces $A(S)$ are relatively small as long as $|S| = s \ll p$.

With this set-up, we say that the regularizer $r$ is decomposable with respect to a given subspace pair $(A, B)$ if

$$r(u + z) = r(u) + r(z) \quad \text{for all } u \in A \text{ and } z \in B. \qquad (2)$$

In our subsequent analysis, we impose the following condition on the regularizer:

Definition 1. The regularizer $r$ is decomposable with respect to a given subspace collection $\mathcal{V}$, meaning that it is decomposable for each subspace pair $(A, B) \in \mathcal{V}$.

Note that any regularizer is decomposable with respect to the trivial subspace collection $\mathcal{T} = \{(\mathbb{R}^p, \{0\})\}$. It will be of more interest to us when the regularizer decomposes with respect to a larger collection $\mathcal{V}$ that includes subspace pairs $(A, B)$ in which $A$ is relatively small and $B$ is relatively large. Let us illustrate with some examples.

Sparse vectors and $\ell_1$ norm regularization. Consider a model involving $s$-sparse regression vectors $\theta^* \in \mathbb{R}^p$, and recall the definition of the $s$-sparse subspace collection $\mathcal{S}$ discussed above. We claim that the $\ell_1$-norm regularizer $r(u) = \|u\|_1$ is decomposable with respect to $\mathcal{S}$. Indeed, for any $s$-sized subset $S$ and vectors $u \in A(S)$ and $v \in B(S)$, we have $\|u + v\|_1 = \|u\|_1 + \|v\|_1$, as required.

Group-structured sparse matrices and $\ell_{1,q}$ matrix norms. Various statistical problems involve matrix-valued parameters $\Theta \in \mathbb{R}^{k \times m}$; examples include multivariate regression problems or (inverse) covariance matrix estimation.
We can define an inner product on such matrices via $\langle \Theta, \Sigma \rangle = \mathrm{trace}(\Theta^T \Sigma)$, and the induced (Frobenius) norm $\|\Theta\|_F = \sqrt{\sum_{i=1}^k \sum_{j=1}^m \Theta_{ij}^2}$. Let us suppose that $\Theta^*$ satisfies a group sparsity condition, meaning that its $i$-th row, denoted $\Theta^*_i$, is non-zero only if $i \in S \subseteq \{1, \ldots, k\}$, and the cardinality of $S$ is controlled. For a given subset $S$, we can define the subspace pair $A(S) = \{\Theta \in \mathbb{R}^{k \times m} \mid \Theta_i = 0 \text{ for all } i \in S^c\}$ and $B(S) = (A(S))^\perp = \{\Theta \in \mathbb{R}^{k \times m} \mid \Theta_i = 0 \text{ for all } i \in S\}$. For some fixed $s \leq k$, we then consider the collection $\mathcal{V} = \{(A(S), B(S)) \mid S \subseteq \{1, \ldots, k\},\ |S| = s\}$,

which is a group-structured analog of the $s$-sparse collection $\mathcal{S}$ for vectors. For any $q \in [1, \infty]$, now suppose that the regularizer is the $\ell_1/\ell_q$ matrix norm, given by $r(\Theta) = \sum_{i=1}^k \bigl[\sum_{j=1}^m |\Theta_{ij}|^q\bigr]^{1/q}$, corresponding to applying the $\ell_q$ norm to each row and then taking the $\ell_1$-norm of the result. It can be seen that the regularizer $r(\Theta) = \|\Theta\|_{1,q}$ is decomposable with respect to the collection $\mathcal{V}$.

Low-rank matrices and nuclear norm. The estimation of low-rank matrices arises in various contexts, including principal component analysis, spectral clustering, collaborative filtering, and matrix completion. In particular, consider the class of matrices $\Theta \in \mathbb{R}^{k \times m}$ that have rank $r \ll \min\{k, m\}$. For any given matrix $\Theta$, we let $\mathrm{row}(\Theta) \subseteq \mathbb{R}^m$ and $\mathrm{col}(\Theta) \subseteq \mathbb{R}^k$ denote its row space and column space, respectively. For a given pair of $r$-dimensional subspaces $U \subseteq \mathbb{R}^k$ and $V \subseteq \mathbb{R}^m$, we define a pair of subspaces $A(U, V)$ and $B(U, V)$ of $\mathbb{R}^{k \times m}$ as follows:

$$A(U, V) := \{\Theta \in \mathbb{R}^{k \times m} \mid \mathrm{row}(\Theta) \subseteq V,\ \mathrm{col}(\Theta) \subseteq U\}, \qquad (3a)$$
$$B(U, V) := \{\Theta \in \mathbb{R}^{k \times m} \mid \mathrm{row}(\Theta) \subseteq V^\perp,\ \mathrm{col}(\Theta) \subseteq U^\perp\}. \qquad (3b)$$

Note that $A(U, V) \subseteq B^\perp(U, V)$, as is required by our construction. We then consider the collection $\mathcal{V} = \{(A(U, V), B(U, V))\}$, where $(U, V)$ range over all pairs of $r$-dimensional subspaces of $\mathbb{R}^k$ and $\mathbb{R}^m$. Now suppose that we regularize with the nuclear norm $r(\Theta) = \|\Theta\|_*$, corresponding to the sum of the singular values of the matrix $\Theta$. It can be shown that the nuclear norm is decomposable with respect to $\mathcal{V}$. Indeed, since any pair of matrices $M \in A(U, V)$ and $M' \in B(U, V)$ have orthogonal row and column spaces, we have $\|M + M'\|_* = \|M\|_* + \|M'\|_*$ (e.g., see the paper [13]).

Thus, we have demonstrated various models and regularizers in which decomposability is satisfied with interesting subspace collections $\mathcal{V}$. We now show that decomposability has important consequences for the error $\Delta = \hat{\theta} - \theta^*$, where $\hat{\theta} \in \mathbb{R}^p$ is any optimal solution of the regularized M-estimation procedure (1). In order to state a lemma that captures this fact, we need to define the dual norm of the regularizer, given by $r^*(v) := \sup_{u \in \mathbb{R}^p \setminus \{0\}} \frac{u^T v}{r(u)}$. For the regularizers of interest, the dual norm can be obtained via some easy calculations. For instance, given a vector $\theta \in \mathbb{R}^p$ and $r(\theta) = \|\theta\|_1$, we have $r^*(\theta) = \|\theta\|_\infty$. Similarly, given a matrix $\Theta \in \mathbb{R}^{k \times m}$ and the nuclear norm regularizer $r(\Theta) = \|\Theta\|_*$, we have $r^*(\Theta) = \|\Theta\|_2$, corresponding to the operator norm (or maximal singular value).

Lemma 1. Suppose $\hat{\theta}$ is an optimal solution of the regularized M-estimation procedure (1), with associated error $\Delta = \hat{\theta} - \theta^*$. Furthermore, suppose that the regularization penalty is strictly positive with $\lambda_n \geq 2\, r^*(\nabla L(\theta^*))$. Then for any $(A, B) \in \mathcal{V}$,

$$r(\pi_B(\Delta)) \leq 3\, r(\pi_{B^\perp}(\Delta)) + 4\, r(\pi_{A^\perp}(\theta^*)).$$

This property plays an essential role in our definition of restricted strong convexity and subsequent analysis.

2.3 Restricted Strong Convexity

Next we state our assumption on the loss function $L$. In general, guaranteeing that $L(\hat{\theta}) - L(\theta^*)$ is small is not sufficient to show that $\hat{\theta}$ and $\theta^*$ are close. (As a trivial example, consider a loss function that is identically zero.) The standard way to ensure that a function is not too flat is via the notion of strong convexity; in particular, by requiring that there exist some constant $\gamma > 0$ such that $L(\theta^* + \Delta) - L(\theta^*) - \nabla L(\theta^*)^T \Delta \geq \gamma\, d^2(\Delta)$ for all $\Delta \in \mathbb{R}^p$. In the high-dimensional setting, where the number of parameters $p$ may be much larger than the sample size $n$, the strong convexity assumption need not be satisfied. As a simple example, consider the usual linear regression model $y = X\theta^* + w$, where $y \in \mathbb{R}^n$ is the response vector, $\theta^* \in \mathbb{R}^p$ is the unknown parameter vector, $X \in \mathbb{R}^{n \times p}$ is the design matrix, and $w \in \mathbb{R}^n$ is a noise vector with i.i.d. zero-mean elements. The least-squares loss is given by $L(\theta) = \frac{1}{2n}\|y - X\theta\|_2^2$, and has the Hessian $H(\theta) = \frac{1}{n} X^T X$.
It is easy to check that the $p \times p$ matrix $H(\theta)$ will be rank-deficient whenever $p > n$, showing that the least-squares loss cannot be strongly convex (with respect to $d(\cdot) = \|\cdot\|_2$) when $p > n$. Herein lies the utility of Lemma 1: it guarantees that the error $\Delta$ must lie within a restricted set, so that we only need the loss function to be strongly convex for a limited set of directions.
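A quick numerical illustration of this rank deficiency (our own sketch, with a generic Gaussian design) follows: the Hessian has rank at most $n$, and the loss is exactly flat along any null-space direction of $X$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 200                          # high-dimensional regime: p > n
X = rng.standard_normal((n, p))
H = X.T @ X / n                         # Hessian of the least-squares loss

print(np.linalg.matrix_rank(H))         # at most n = 50 < p, so H is rank-deficient

_, _, Vt = np.linalg.svd(X)             # rows n..p-1 of Vt span the null space of X
delta = Vt[-1]
print(np.allclose(X @ delta, 0.0))      # True: zero curvature along delta
```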

More precisely, we have:

Definition 2. Given some subset $C \subseteq \mathbb{R}^p$ and error norm $d(\cdot)$, we say that the loss function $L$ satisfies restricted strong convexity (RSC) with respect to $d(\cdot)$ with parameter $\gamma > 0$ over $C$ if

$$L(\theta^* + \Delta) - L(\theta^*) - \nabla L(\theta^*)^T \Delta \geq \gamma\, d^2(\Delta) \quad \text{for all } \Delta \in C. \qquad (4)$$

In the statement of our results, we will be interested in loss functions that satisfy RSC over sets $C(A, B, \epsilon)$ that are indexed by a subspace pair $(A, B)$ and a tolerance $\epsilon \geq 0$ as follows:

$$C(A, B, \epsilon) := \bigl\{\Delta \in \mathbb{R}^p \mid r(\pi_B(\Delta)) \leq 3\, r(\pi_{B^\perp}(\Delta)) + 4\, r(\pi_{A^\perp}(\theta^*)),\ d(\Delta) \geq \epsilon\bigr\}. \qquad (5)$$

In the special case of least-squares regression with hard sparsity constraints, the RSC condition corresponds to a lower bound on the sparse eigenvalues of the Hessian matrix $X^T X$, and is essentially equivalent to a restricted eigenvalue condition introduced by Bickel et al. [1].

3 Convergence rates

We are now ready to state a general result that provides bounds, and hence convergence rates, for the error $d(\hat{\theta} - \theta^*)$. Although it may appear somewhat abstract at first sight, we illustrate that this result has a number of concrete consequences for specific models. In particular, we recover some known results about estimation in $s$-sparse models [1, 6], as well as a number of new results, including convergence rates for estimation under $\ell_q$-sparsity constraints, estimation in sparse generalized linear models, estimation of block-structured sparse matrices, and estimation of low-rank matrices. In addition to the regularization parameter $\lambda_n$ and the RSC constant $\gamma$ of the loss function, our general result involves a quantity that relates the error metric $d$ to the regularizer $r$; in particular, for any set $A \subseteq \mathbb{R}^p$, we define

$$\Psi(A) := \sup_{u \in A,\ d(u) = 1} r(u), \qquad (6)$$

so that $r(u) \leq \Psi(A)\, d(u)$ for $u \in A$.

Theorem 1 (Bounds for general models). For a given subspace collection $\mathcal{V}$, suppose that the regularizer $r$ is decomposable, and consider the regularized M-estimator (1) with $\lambda_n \geq 2\, r^*(\nabla L(\theta^*))$. Then, for any pair of subspaces $(A, B) \in \mathcal{V}$ and tolerance $\epsilon \geq 0$ such that the loss function $L$ satisfies restricted strong convexity over $C(A, B, \epsilon)$, we have

$$d(\hat{\theta} - \theta^*) \leq \max\Bigl\{\epsilon,\ \frac{1}{\gamma}\Bigl[2\, \Psi(B^\perp)\, \lambda_n + 2\sqrt{\gamma\, \lambda_n\, r(\pi_{A^\perp}(\theta^*))}\Bigr]\Bigr\}. \qquad (7)$$

The proof is motivated by arguments used in past work on high-dimensional estimation (e.g., [9, 14]); we provide the details in the full-length version. In the remainder of this paper, we illustrate the consequences of Theorem 1 for specific models. The parameter $\lambda_n$ will be selected as small as possible while satisfying the lower bound $\lambda_n \geq 2\, r^*(\nabla L(\theta^*))$. For the sake of clarity, the error metric $d(\cdot)$ is taken to be $\|\cdot\|_2$. For all models $\epsilon = 0$, apart from the weak-sparse model in Section 3.1.2.

3.1 Bounds for linear regression

Consider the standard linear regression model $y = X\theta^* + w$, where $\theta^* \in \mathbb{R}^p$ is the regression vector, $X \in \mathbb{R}^{n \times p}$ is the design matrix, and $w \in \mathbb{R}^n$ is a noise vector. Given the observations $(y, X)$, our goal is to estimate the regression vector $\theta^*$. Without any structural constraints on $\theta^*$, we can apply Theorem 1 with the trivial subspace collection $\mathcal{T} = \{(\mathbb{R}^p, \{0\})\}$ to establish a rate $\|\hat{\theta} - \theta^*\|_2 = O(\sigma\sqrt{p/n})$ for ridge regression (see Appendix A). Note that the RSC condition requires that $X$ be full-rank, so that $n > p$. Here we consider bounds for linear regression where $\theta^*$ is an $s$-sparse vector.

3.1.1 Lasso estimates of hard sparse models

More precisely, let us consider estimating an $s$-sparse regression vector $\theta^*$ by solving the Lasso program

$$\hat{\theta} \in \arg\min_{\theta \in \mathbb{R}^p} \Bigl\{\frac{1}{2n}\|y - X\theta\|_2^2 + \lambda_n \|\theta\|_1\Bigr\}. \qquad (8)$$

The Lasso is a special case of our M-estimator (1) with $r(\theta) = \|\theta\|_1$ and $L(\theta) = \frac{1}{2n}\|y - X\theta\|_2^2$. Recall the definition of the $s$-sparse subspace collection $\mathcal{S}$ from Section 2.2. For this problem, we set $\epsilon = 0$, so that the restricted strong convexity set (5) becomes $C(A, B, 0) = \{\Delta \in \mathbb{R}^p \mid \|\Delta_{S^c}\|_1 \leq 3\|\Delta_S\|_1\}$. Establishing restricted strong convexity for the least-squares loss is equivalent to ensuring the following bound on the design matrix:

$$\frac{\|X\theta\|_2^2}{n} \geq \gamma\, \|\theta\|_2^2 \quad \text{for all } \theta \in \mathbb{R}^p \text{ such that } \|\theta_{S^c}\|_1 \leq 3\|\theta_S\|_1. \qquad (9)$$

As mentioned previously, this condition is essentially the same as the restricted eigenvalue condition developed by Bickel et al. [1]. Moreover, we note that Raskutti et al. [10] have shown that condition (9) will hold with high probability for various random ensembles of Gaussian matrices. Each column $X_i$ of $X$ is assumed to satisfy the constraint $\|X_i\|_2 \leq \sqrt{n}$. Finally, we assume that the elements of $w$ are zero-mean and have sub-Gaussian tails, meaning that there exists some constant $\sigma > 0$ such that $\mathbb{P}[|w_i| > t] \leq \exp(-t^2/(2\sigma^2))$ for all $t > 0$. Under these conditions, we recover as a corollary of Theorem 1 the following known result [1, 6].

Corollary 1. Suppose that the true vector $\theta^* \in \mathbb{R}^p$ is exactly $s$-sparse with support $S$, and that the design matrix $X$ satisfies condition (9). If we solve the Lasso with $\lambda_n^2 = \frac{16\sigma^2 \log p}{n}$, then with probability at least $1 - c_1\exp(-c_2 n \lambda_n^2)$, the solution satisfies

$$\|\hat{\theta} - \theta^*\|_2 \leq \frac{8\sigma}{\gamma}\sqrt{\frac{s \log p}{n}}. \qquad (10)$$

Proof. As noted previously, the $\ell_1$-regularizer is decomposable for the sparse subspace collection $\mathcal{S}$, while condition (9) ensures that RSC holds for all sets $C(A, B, 0)$ with $(A, B) \in \mathcal{S}$. We must verify that the given choice of regularization satisfies $\lambda_n \geq 2\, r^*(\nabla L(\theta^*))$. Note that $r^*(\cdot) = \|\cdot\|_\infty$, and moreover that $\nabla L(\theta^*) = -X^T w / n$. Under the column normalization condition on the design matrix $X$ and the sub-Gaussian nature of the noise, it follows that $\|X^T w / n\|_\infty \leq 2\sigma\sqrt{(\log p)/n}$ with high probability. The bound in Theorem 1 is thus applicable, and it remains to compute the form that its different terms take in this special case. For the $\ell_1$-regularizer and the $\ell_2$ error metric, we have $\Psi(A(S)) = \sqrt{s}$. Given the hard sparsity assumption, $r(\theta^*_{S^c}) = 0$, so that Theorem 1 implies that $\|\hat{\theta} - \theta^*\|_2 \leq \frac{2\sqrt{s}\,\lambda_n}{\gamma} = \frac{8\sigma}{\gamma}\sqrt{\frac{s \log p}{n}}$, as claimed.

3.1.2 Lasso estimates of weak sparse models

We now consider models that satisfy a weak sparsity assumption. More concretely, suppose that $\theta^*$ lies in the $\ell_q$-ball of radius $R_q$, namely, the set $B_q(R_q) := \{\theta \in \mathbb{R}^p \mid \sum_{i=1}^p |\theta_i|^q \leq R_q\}$ for some $q \in (0, 1]$. Our analysis exploits the fact that any $\theta^* \in B_q(R_q)$ can be well approximated by an $s$-sparse vector (for an appropriately chosen sparsity index $s$). It is natural to approximate $\theta^*$ by a vector supported on the set $S = \{i \mid |\theta^*_i| \geq \tau\}$. For any choice of threshold $\tau > 0$, it can be shown that $|S| \leq R_q \tau^{-q}$, and as shown in the full-length version, the optimal choice is to set $\tau = \lambda_n$, using the same regularization parameter as in Corollary 1. Accordingly, we consider the $s$-sparse subspace collection $\mathcal{S}$ with subsets of size $s = R_q \lambda_n^{-q}$. We assume that the noise vector $w \in \mathbb{R}^n$ is as defined above and that the columns are normalized as in the previous section. We also assume that the matrix $X$ satisfies the condition

$$\frac{\|Xv\|_2}{\sqrt{n}} \geq \kappa_1 \|v\|_2 - \kappa_2 \sqrt{\frac{\log p}{n}}\, \|v\|_1 \qquad (11)$$

for constants $\kappa_1, \kappa_2 > 0$. Raskutti et al. [10] show that this property holds with high probability for suitable Gaussian random matrices. Under this condition, it can be verified that RSC holds, with a constant $\gamma$ depending only on $\kappa_1$, over the set $C(A(S), B(S), \epsilon)$, where the tolerance $\epsilon$ is of the order $\sqrt{R_q}\,\bigl(\frac{16\sigma^2 \log p}{n}\bigr)^{\frac{1}{2} - \frac{q}{4}}$. The following result, which we obtain by applying Theorem 1 in this setting, is new to the best of our knowledge:

Corollary 2. Suppose that the true vector $\theta^* \in B_q(R_q)$, and the design matrix $X$ satisfies condition (11).
If we solve the Lasso with $\lambda_n^2 = \frac{16\sigma^2 \log p}{n}$, then with probability at least $1 - c_1\exp(-c_2 n \lambda_n^2)$, the solution satisfies

$$\|\hat{\theta} - \theta^*\|_2^2 \leq c\, R_q \Bigl(\frac{16\sigma^2 \log p}{n}\Bigr)^{1 - \frac{q}{2}}, \qquad (12)$$

where the constant $c$ depends only on $\kappa_1$.
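As a purely illustrative companion to Corollary 1 (this simulation is our own and not part of the paper; it uses scikit-learn's Lasso solver, and the constants are not tuned to match the corollary), the $\sqrt{s \log p / n}$ scaling of the $\ell_2$ error can be checked empirically:

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_l2_error(n, p, s, sigma=1.0, seed=0):
    """One draw of the hard-sparse linear model and the resulting l2 error,
    with lambda_n of the order sigma * sqrt(log p / n)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    theta_star = np.zeros(p)
    theta_star[:s] = 1.0
    y = X @ theta_star + sigma * rng.standard_normal(n)
    lam = 2.0 * sigma * np.sqrt(np.log(p) / n)
    # sklearn's objective is (1/(2n))||y - X theta||_2^2 + alpha * ||theta||_1
    theta_hat = Lasso(alpha=lam, fit_intercept=False, max_iter=5000).fit(X, y).coef_
    return np.linalg.norm(theta_hat - theta_star)

for n in [200, 400, 800]:
    print(n, lasso_l2_error(n, p=1000, s=10),
          np.sqrt(10 * np.log(1000) / n))   # observed error vs. predicted scaling
```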

We note that both of the rates, for hard sparsity in Corollary 1 and weak sparsity in Corollary 2, are known to be optimal in a minimax sense [10]. In [10], the authors also show that (12) is achievable by solving the computationally intractable problem of minimizing $L(\theta)$ over the $\ell_q$-ball.

3.2 Bounds for generalized linear models

Next, consider any generalized linear model with canonical link function, where the distribution of the response $y \in \mathcal{Y}$, given a predictor $X \in \mathbb{R}^p$, is given by

$$p(y \mid X; \theta^*) = \exp\bigl(y\, \theta^{*T} X - a(\theta^{*T} X) + d(y)\bigr),$$

for some fixed functions $a : \mathbb{R} \to \mathbb{R}$ and $d : \mathcal{Y} \to \mathbb{R}$, where $\|X\|_\infty \leq A$ and $|y| \leq B$. We consider estimating $\theta^*$ from observations $\{(X_i, y_i)\}_{i=1}^n$ by $\ell_1$-regularized maximum likelihood:

$$\hat{\theta} \in \arg\min_{\theta \in \mathbb{R}^p} \Bigl\{-\frac{1}{n}\,\theta^T\Bigl(\sum_{i=1}^n y_i X_i\Bigr) + \frac{1}{n}\sum_{i=1}^n a(\theta^T X_i) + \lambda_n\|\theta\|_1\Bigr\}, \qquad (13)$$

so that $L(\theta) = -\frac{1}{n}\theta^T\bigl(\sum_{i=1}^n y_i X_i\bigr) + \frac{1}{n}\sum_{i=1}^n a(\theta^T X_i)$ and $r(\theta) = \|\theta\|_1$. Let $X \in \mathbb{R}^{n \times p}$ denote the matrix with $X_i$ as its $i$-th row. Again we use the $s$-sparse subspace collection $\mathcal{S}$ and $\epsilon = 0$; it can then be verified that it suffices for the restricted strong convexity condition to hold that, for some $c > 0$, $\ddot{a}(\theta^T x) > c$ for all covariate vectors $x$ and all $\theta \in \{\theta^* + \Delta \mid \|\Delta\|_2 \leq \frac{16AB}{\gamma}\sqrt{\frac{s \log p}{n}}\}$, and that the design matrix $X$ satisfies the restricted eigenvalue bound

$$\frac{\|X\theta\|_2^2}{n} \geq \frac{\gamma}{c}\, \|\theta\|_2^2 \quad \text{for all } \theta \in \mathbb{R}^p \text{ such that } \|\theta_{S^c}\|_1 \leq 3\|\theta_S\|_1. \qquad (14)$$

Corollary 3. Suppose that the true vector $\theta^* \in \mathbb{R}^p$ is exactly $s$-sparse with support $S$, and the design matrix $X$ satisfies condition (14). Suppose that we solve the $\ell_1$-regularized M-estimator (13) with $\lambda_n^2 = \frac{32 A^2 B^2 \log p}{n}$. Then with probability at least $1 - c_1\exp(-c_2 n \lambda_n^2)$, the solution satisfies

$$\|\hat{\theta} - \theta^*\|_2 \leq \frac{16AB}{\gamma}\sqrt{\frac{s \log p}{n}}. \qquad (15)$$

We defer the proof to the full-length version due to space constraints.

3.3 Bounds for sparse matrices

In this section, we consider some extensions of our results to estimation of regression matrices. Various authors have proposed extensions of the Lasso based on regularizers that have more structure than the $\ell_1$ norm [17, 22]. Such regularizers allow one to impose various types of block-sparsity constraints, in which groups of parameters are assumed to be active (or inactive) simultaneously. We assume that the observation model takes the form $Y = X\Theta^* + W$, where $\Theta^* \in \mathbb{R}^{k \times m}$ is the unknown fixed set of parameters, $X \in \mathbb{R}^{n \times k}$ is the design matrix, and $W \in \mathbb{R}^{n \times m}$ is the noise matrix. As a loss function, we use the squared Frobenius norm $L(\Theta) = \frac{1}{2n}\|Y - X\Theta\|_F^2$, and as a regularizer, we use the $\ell_{1,q}$ matrix norm for some $q \geq 1$, which takes the form $\|\Theta\|_{1,q} = \sum_{i=1}^k \|(\Theta_{i1}, \ldots, \Theta_{im})\|_q$. We refer to the resulting estimator as the $q$-group Lasso. We define the quantity $\eta(m; q) = 1$ if $q \in (1, 2]$, and $\eta(m; q) = m^{1/2 - 1/q}$ if $q > 2$. We then set the regularization parameter as follows:

$$\lambda_n = \begin{cases} 4\sigma\Bigl[\eta(m; q)\sqrt{\frac{\log k}{n}} + C_q\, \frac{m^{1 - 1/q}}{\sqrt{n}}\Bigr] & \text{if } q > 1, \\[4pt] 4\sigma\sqrt{\frac{\log(km)}{n}} & \text{for } q = 1. \end{cases}$$

Corollary 4. Suppose that the true parameter matrix $\Theta^*$ has non-zero rows only for indices $i \in S \subseteq \{1, \ldots, k\}$ where $|S| = s$, and that the design matrix $X \in \mathbb{R}^{n \times k}$ satisfies condition (9). Then with probability at least $1 - c_1\exp(-c_2 n \lambda_n^2)$, the $q$-group Lasso solution satisfies

$$\|\hat{\Theta} - \Theta^*\|_F \leq \frac{2}{\gamma}\, \Psi(S)\, \lambda_n. \qquad (16)$$

Proof. We simply need to establish that the regularization parameter satisfies $\lambda_n \geq 2\, r^*(\nabla L(\Theta^*))$. We note that for a matrix $U$, $r^*(U) = \max_{i = 1, \ldots, k} \|U_i\|_{q'}$, where $1/q + 1/q' = 1$. Moreover, we have $\nabla L(\Theta^*) = -\frac{1}{n}X^T W$. Concentration results on the $\ell_{q'}$ norm and the union bound yield that $r^*\bigl(\frac{1}{n}X^T W\bigr) \leq 2\sigma\bigl[\eta(m; q)\sqrt{\frac{\log k}{n}} + C_q\, \frac{m^{1 - 1/q}}{\sqrt{n}}\bigr]$, as required.
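To make the norms in this proof concrete, here is a short sketch (our own notation, not from the paper) of the $\ell_{1,q}$ block norm, its dual $\max_i \|U_i\|_{q'}$, and a numerical check of the Hölder-type inequality $|\langle U, \Theta\rangle| \leq \|\Theta\|_{1,q}\, r^*(U)$ that underlies the choice of $\lambda_n$:

```python
import numpy as np

def l1q_norm(Theta, q):
    """l_{1,q} block norm: sum over rows of the l_q norm of each row."""
    return np.sum(np.linalg.norm(Theta, ord=q, axis=1))

def l1q_dual_norm(U, q):
    """Dual norm: max over rows of the l_{q'} norm, with 1/q + 1/q' = 1."""
    q_dual = np.inf if q == 1 else q / (q - 1.0)
    return np.max(np.linalg.norm(U, ord=q_dual, axis=1))

rng = np.random.default_rng(1)
Theta = rng.standard_normal((5, 3))
U = rng.standard_normal((5, 3))
for q in [1, 2, 3]:
    lhs = abs(np.sum(U * Theta))                    # |<U, Theta>| (trace inner product)
    rhs = l1q_norm(Theta, q) * l1q_dual_norm(U, q)
    print(q, lhs <= rhs + 1e-12)                    # Hölder-type duality holds
```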

We will now consider three special cases of the above result. A simple argument shows that $\Psi(S) = \sqrt{s}$ if $q \geq 2$, and $\Psi(S) = m^{1/q - 1/2}\sqrt{s}$ if $q \in [1, 2]$. First, we consider $q = 1$, and note that solving the group Lasso with $q = 1$ is identical to solving a Lasso problem with sparsity $sm$ and ambient dimension $km$. The resulting upper bound on the Frobenius norm reflects this fact: more specifically, for $q = 1$, the bound is $\frac{8\sigma}{\gamma}\sqrt{\frac{sm\log(km)}{n}}$. For the case $q = 2$, Corollary 4 implies that the Frobenius error $\|\hat{\Theta} - \Theta^*\|_F$ is upper bounded as $\frac{8\sigma}{\gamma}\bigl[\sqrt{\frac{s\log k}{n}} + \sqrt{\frac{sm}{n}}\bigr]$. This is also a very natural result: the term $\sqrt{\frac{s\log k}{n}}$ captures the difficulty of finding the $s$ non-zero rows out of the total $k$, whereas the term $\sqrt{\frac{sm}{n}}$ captures the difficulty of estimating the $sm$ free parameters in the matrix (once the non-zero rows have been determined). We note that recent work by Lounici et al. [4] established a bound of order $O\bigl(\frac{\sigma}{c}\bigl[\sqrt{\frac{ms\log k}{n}} + \sqrt{\frac{sm}{n}}\bigr]\bigr)$, which is equivalent apart from a factor $\sqrt{m}$ in the first term. Finally, for $q = \infty$, we obtain an upper bound of the order $\frac{\sigma}{\gamma}\bigl[\sqrt{\frac{s\log k}{n}} + m\sqrt{\frac{s}{n}}\bigr]$, which is a novel result.

3.4 Bounds for estimating low rank matrices

Finally, we consider the implications of our main result for the problem of estimating low-rank matrices. This structural assumption is a natural generalization of sparsity, and has been studied by various authors (see the paper [13] and references therein). To illustrate our main theorem in this context, let us consider the following instance of low-rank matrix learning. Given a low-rank matrix $\Theta^* \in \mathbb{R}^{k \times m}$, suppose that we are given $n$ noisy observations of the form $Y_i = \langle X_i, \Theta^*\rangle + W_i$, where $W_i \sim N(0, 1)$. Such an observation model arises, for instance, in system identification settings in control theory [13]. The following regularized M-estimator can be considered in order to estimate the desired low-rank matrix $\Theta^*$:

$$\min_{\Theta \in \mathbb{R}^{k \times m}}\ \frac{1}{2n}\sum_{i=1}^n \bigl(Y_i - \langle X_i, \Theta\rangle\bigr)^2 + \lambda_n\|\Theta\|_*, \qquad (17)$$

where the regularizer $\|\Theta\|_*$ is the nuclear norm, or the sum of the singular values of $\Theta$. Recall the rank-$r$ collection $\mathcal{V}$ defined for low-rank matrices in Section 2.2. Let $\Theta^* = U\Sigma W^T$ be the singular value decomposition (SVD) of $\Theta^*$, so that $U \in \mathbb{R}^{k \times r}$ and $W \in \mathbb{R}^{m \times r}$ are orthogonal, and $\Sigma \in \mathbb{R}^{r \times r}$ is a diagonal matrix. If we let $A = A(U, W)$ and $B = B(U, W)$, then $\pi_{A^\perp}(\Theta^*) = 0$, so that by Lemma 1 we have $\|\pi_B(\Delta)\|_* \leq 3\|\pi_{B^\perp}(\Delta)\|_*$. Thus, for restricted strong convexity to hold, it can be shown that the design matrices $X_i$ must satisfy

$$\frac{1}{n}\sum_{i=1}^n \langle X_i, \Delta\rangle^2 \geq \gamma\, \|\Delta\|_F^2 \quad \text{for all } \Delta \text{ such that } \|\pi_B(\Delta)\|_* \leq 3\|\pi_{B^\perp}(\Delta)\|_*. \qquad (18)$$

As with the analogous conditions for sparse vectors and linear regression, this condition can be shown to hold with high probability for Gaussian random matrices.

Corollary 5. Suppose that the true matrix $\Theta^*$ has rank $r \ll \min(k, m)$, and that the design matrices $\{X_i\}$ satisfy condition (18). If we solve the regularized M-estimator (17) with $\lambda_n = \frac{4(\sqrt{k} + \sqrt{m})}{\sqrt{n}}$, then with probability at least $1 - c_1\exp(-c_2(k + m))$, we have

$$\|\hat{\Theta} - \Theta^*\|_F \leq \frac{16}{\gamma}\Bigl[\sqrt{\frac{rk}{n}} + \sqrt{\frac{rm}{n}}\Bigr]. \qquad (19)$$

Proof. Note that if $\mathrm{rank}(\Theta) = r$, then $\|\Theta\|_* \leq \sqrt{r}\,\|\Theta\|_F$, so that $\Psi(B^\perp) = \sqrt{2r}$, since the subspace $B^\perp(U, W)$ consists of matrices with rank at most $2r$. All that remains is to show that $\lambda_n \geq 2\, r^*(\nabla L(\Theta^*))$. Standard analysis gives that the dual norm to $\|\cdot\|_*$ is the operator norm $\|\cdot\|_2$. Applying this observation and the fact that $\nabla L(\Theta^*) = -\frac{1}{n}\sum_{i=1}^n X_i W_i$, we can construct a bound on the operator norm of $\frac{1}{n}\sum_{i=1}^n X_i W_i$. We assume that the entries of each $X_i$ are i.i.d. $N(0, 1)$. Then, conditioned on $W$, the entries of the matrix $\frac{1}{n}\sum_{i=1}^n X_i W_i$ are i.i.d. $N(0, \|W\|_2^2/n^2)$, from which it can be shown that, with probability at least $1 - c_1\exp(-c_2 n)$, $\|W\|_2^2/n \leq 2$. Coupled with results from random matrix theory, we have that $\bigl\|\frac{1}{n}\sum_{i=1}^n X_i W_i\bigr\|_2 \leq \frac{2(\sqrt{k} + \sqrt{m})}{\sqrt{n}}$ with probability at least $1 - c_1\exp(-c_2(k + m))$, verifying that $\lambda_n \geq 2\, r^*(\nabla L(\Theta^*))$.
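The two facts used in this proof, decomposability of the nuclear norm over matrices with orthogonal row and column spaces and the operator norm as its dual, can be checked numerically; the following small sketch (our own, with generic random matrices) does so:

```python
import numpy as np

def nuclear_norm(M):
    return np.linalg.norm(M, ord='nuc')        # sum of singular values

rng = np.random.default_rng(2)
k, m, r = 6, 5, 2
U = np.linalg.qr(rng.standard_normal((k, r)))[0]   # orthonormal basis for the column space
V = np.linalg.qr(rng.standard_normal((m, r)))[0]   # orthonormal basis for the row space

# M1 lies in A(U, V); M2 has row/column spaces in the orthogonal complements, i.e. B(U, V)
M1 = U @ rng.standard_normal((r, r)) @ V.T
M2 = (np.eye(k) - U @ U.T) @ rng.standard_normal((k, m)) @ (np.eye(m) - V @ V.T)

print(np.isclose(nuclear_norm(M1 + M2), nuclear_norm(M1) + nuclear_norm(M2)))   # decomposability
print(np.isclose(np.linalg.norm(M1, ord=2),
                 np.linalg.svd(M1, compute_uv=False)[0]))                        # dual norm = top singular value
```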

References

[1] P. Bickel, Y. Ritov, and A. Tsybakov. Simultaneous analysis of Lasso and Dantzig selector. Submitted to Annals of Statistics.
[2] E. Candes and T. Tao. The Dantzig selector: Statistical estimation when p is much larger than n. Annals of Statistics, 35(6):2313–2351, 2007.
[3] S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1):33–61, 1998.
[4] K. Lounici, M. Pontil, A. B. Tsybakov, and S. van de Geer. Taking advantage of sparsity in multi-task learning. Arxiv, 2009.
[5] N. Meinshausen and P. Bühlmann. High-dimensional graphs and variable selection with the Lasso. Annals of Statistics, 34:1436–1462, 2006.
[6] N. Meinshausen and B. Yu. Lasso-type recovery of sparse representations for high-dimensional data. Annals of Statistics, 37(1):246–270, 2009.
[7] S. Negahban and M. J. Wainwright. Simultaneous support recovery in high-dimensional regression: Benefits and perils of $\ell_{1,\infty}$-regularization. Technical report, Department of Statistics, UC Berkeley, April 2008.
[8] G. Obozinski, M. J. Wainwright, and M. I. Jordan. Union support recovery in high-dimensional multivariate regression. Technical report, Department of Statistics, UC Berkeley, August 2008.
[9] S. Portnoy. Asymptotic behavior of M-estimators of p regression parameters when p²/n is large: I. Consistency. Annals of Statistics, 12(4):1298–1309, 1984.
[10] G. Raskutti, M. J. Wainwright, and B. Yu. Minimax rates of estimation for high-dimensional linear regression over $\ell_q$-balls. Technical report, Department of Statistics, UC Berkeley, 2009.
[11] P. Ravikumar, M. J. Wainwright, and J. Lafferty. High-dimensional Ising model selection using $\ell_1$-regularized logistic regression. Annals of Statistics, to appear.
[12] P. Ravikumar, M. J. Wainwright, G. Raskutti, and B. Yu. High-dimensional covariance estimation by minimizing $\ell_1$-penalized log-determinant divergence. Technical Report 767, Department of Statistics, UC Berkeley, September 2008.
[13] B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. Allerton Conference, Allerton House, Illinois, 2007.
[14] A. J. Rothman, P. J. Bickel, E. Levina, and J. Zhu. Sparse permutation invariant covariance estimation. Electronic Journal of Statistics, 2:494–515, 2008.
[15] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(1):267–288, 1996.
[16] J. Tropp. Just relax: Convex programming methods for identifying sparse signals in noise. IEEE Transactions on Information Theory, 52(3):1030–1051, March 2006.
[17] B. Turlach, W. N. Venables, and S. J. Wright. Simultaneous variable selection. Technometrics, 47(3):349–363, 2005.
[18] S. van de Geer. High-dimensional generalized linear models and the lasso. Annals of Statistics, 36(2):614–645, 2008.
[19] M. J. Wainwright. Sharp thresholds for high-dimensional and noisy sparsity recovery using $\ell_1$-constrained quadratic programming (Lasso). IEEE Transactions on Information Theory, 55:2183–2202, May 2009.
[20] C. Zhang and J. Huang. Model selection consistency of the lasso selection in high-dimensional linear regression. Annals of Statistics, 36:1567–1594, 2008.
[21] P. Zhao and B. Yu. On model selection consistency of Lasso. Journal of Machine Learning Research, 7:2541–2563, 2006.
[22] P. Zhao, G. Rocha, and B. Yu. Grouped and hierarchical model selection through composite absolute penalties. Annals of Statistics, 37(6A):3468–3497, 2009.

A Ridge-Regression

In this section, we apply Theorem 1 to ridge-regression. Consider solving the program

$$\hat{\theta} \in \arg\min_{\theta \in \mathbb{R}^p} \Bigl\{\frac{1}{2n}\|y - X\theta\|_2^2 + \lambda_n\|\theta\|_2\Bigr\}.$$

Assume that the underlying structure enforces $\|\theta^*\|_2 \leq M$ for some constant $M > 0$. As a result, the restricted strong convexity assumption reduces to $\lambda_{\min}\bigl(\frac{1}{n}X^T X\bigr) \geq \gamma > 0$. We may now present the following trivial corollary to Theorem 1. Note that the result is not new, and provides exactly the same bound as the ordinary least-squares solution to the problem.

Corollary 6. Suppose that the true vector $\theta^* \in \mathbb{R}^p$, and that the matrix $\frac{1}{n}X^T X$ has its smallest eigenvalue bounded below by $\gamma$. Suppose that we solve the ridge-regression program with $\lambda_n^2 = \frac{16\sigma^2 p}{n}$. Then, with probability at least $1 - c_1\exp(-c_2 n\lambda_n^2)$, the solution satisfies

$$\|\hat{\theta} - \theta^*\|_2 \leq \frac{8\sigma}{\gamma}\sqrt{\frac{p}{n}}. \qquad (20)$$

Proof. The restricted strong convexity condition clearly holds. Furthermore, let $\mathcal{V}$ be the collection of all subspace pairs. Therefore, we can apply the bound in Theorem 1. First note that $\Psi(A) = 1$ for any set $A$, since $d(v) = r(v)$ for all $v \in \mathbb{R}^p$. The dual norm $r^*(\cdot)$ is $r(\cdot)$ itself (the $\ell_2$ norm is self-dual). Thus, we must bound the $\ell_2$ norm of $\nabla L(\theta^*) = -X^T w/n$. The column normalization bound yields that $\|X^T w/n\|_2 \leq 2\sigma\sqrt{p/n}$ with probability $1 - c_1\exp(-c_2 p)$. Therefore, letting $\lambda_n = 2\|X^T w/n\|_2$, we have by Theorem 1 that

$$d(\hat{\theta} - \theta^*) \leq \frac{1}{\gamma}\Bigl[8\sigma\sqrt{\frac{p}{n}} + 2\sqrt{\gamma\,\lambda_n\, r(\pi_{A^\perp}(\theta^*))}\Bigr].$$

The bound is clearly minimized as long as $r(\pi_{A^\perp}(\theta^*)) = 0$, which is the case if we let $A = \mathbb{R}^p$, verifying the result.

B Proof of Theorem 1

The argument is motivated by the methods of Rothman et al. [14], in their analysis of an $\ell_1$-regularized log-determinant program. Consider the function

$$g(\Delta) := L(\theta^* + \Delta) - L(\theta^*) + \lambda_n\bigl\{r(\theta^* + \Delta) - r(\theta^*)\bigr\}. \qquad (21)$$

The convexity of $L(\cdot)$ and $r(\cdot)$ implies that $g$ is a convex function. Here, we have that $\hat{\Delta} = \hat{\theta} - \theta^*$. Observe that $g(0) = 0$, so that $g(\hat{\Delta}) \leq 0$. From Lemma 1, we know that $\hat{\Delta} \in C$, where $C := \{\Delta \in \mathbb{R}^p : r(\pi_B(\Delta)) \leq 3\, r(\pi_{B^\perp}(\Delta)) + 4\, r(\pi_{A^\perp}(\theta^*))\}$. We also have that if $\Delta \in C$, then $t\Delta \in C$ for any $t \in [0, 1]$. Now suppose that $d(\hat{\Delta}) > M$. Then there exists a $t \in (0, 1)$ such that $d(t\hat{\Delta}) = M$ and $t\hat{\Delta} \in C$. Now suppose that $g(t\hat{\Delta}) > 0$. Then, by the convexity of $g$,

$$g\bigl((1 - t)\cdot 0 + t\hat{\Delta}\bigr) \leq (1 - t)\, g(0) + t\, g(\hat{\Delta}).$$

We know $g(0) = 0$ and $t > 0$. Thus, $g(\hat{\Delta}) > 0$, which is a contradiction. Therefore, $d(\hat{\Delta}) \leq M$. Hence, it suffices to show that for any $\Delta \in C$ such that $d(\Delta) = M$, $g(\Delta) > 0$, which we now prove.

Proof. Fix any arbitrary vector $\Delta \in \mathbb{R}^p$ such that $\Delta \in C$ and $d(\Delta) = M$. We assume that restricted strong convexity holds for all such vectors. Therefore,

$$g(\Delta) = L(\theta^* + \Delta) - L(\theta^*) + \lambda_n\bigl\{r(\theta^* + \Delta) - r(\theta^*)\bigr\} \geq \nabla L(\theta^*)^T\Delta + \gamma\, d(\Delta)^2 + \lambda_n\bigl\{r(\theta^* + \Delta) - r(\theta^*)\bigr\}. \qquad (22)$$

Recall that $\lambda_n \geq 2\, r^*(\nabla L(\theta^*))$, so that by the bounds established in the proof of Lemma 1,

$$\nabla L(\theta^*)^T\Delta + \lambda_n\bigl\{r(\theta^* + \Delta) - r(\theta^*)\bigr\} \geq \frac{\lambda_n}{2}\bigl\{r(\pi_B(\Delta)) - 3r(\pi_{B^\perp}(\Delta)) - 4r(\pi_{A^\perp}(\theta^*))\bigr\} \geq -\frac{\lambda_n}{2}\bigl\{3r(\pi_{B^\perp}(\Delta)) + 4r(\pi_{A^\perp}(\theta^*))\bigr\}.$$

Substituting the latter inequality into equation (22) yields

$$g(\Delta) \geq \gamma\, d(\Delta)^2 - \frac{\lambda_n}{2}\bigl\{3\, r(\pi_{B^\perp}(\Delta)) + 4\, r(\pi_{A^\perp}(\theta^*))\bigr\}.$$

Noting that $r(\pi_{B^\perp}(\Delta)) \leq \Psi(B^\perp)\, d(\pi_{B^\perp}(\Delta)) \leq \Psi(B^\perp)\, d(\Delta)$, this establishes that

$$g(\Delta) \geq \gamma\, d(\Delta)^2 - \frac{\lambda_n}{2}\bigl\{3\, \Psi(B^\perp)\, d(\Delta) + 4\, r(\pi_{A^\perp}(\theta^*))\bigr\}.$$

Finally, substituting $d(\Delta) = M = \frac{1}{\gamma}\bigl[2\, \Psi(B^\perp)\, \lambda_n + 2\sqrt{\gamma\,\lambda_n\, r(\pi_{A^\perp}(\theta^*))}\bigr]$ proves that $g(\Delta) > 0$.

C Proofs and Auxiliary Results

Proof of Lemma 1. Recall the function

$$g(\Delta) := L(\theta^* + \Delta) - L(\theta^*) + \lambda_n\bigl\{r(\theta^* + \Delta) - r(\theta^*)\bigr\}. \qquad (23)$$

We will start off by obtaining a lower bound for this function.

Loss deviation: Using the convexity of the loss function $L$, we have

$$L(\theta^* + \hat{\Delta}) - L(\theta^*) \geq \nabla L(\theta^*)^T \hat{\Delta}. \qquad (24)$$

By the Cauchy-Schwarz inequality (applied with the norm $r$ and its dual), we have

$$\bigl|\nabla L(\theta^*)^T \hat{\Delta}\bigr| \leq r^*(\nabla L(\theta^*))\, r(\hat{\Delta}) \leq \frac{\lambda_n}{2}\bigl[r(\pi_{B^\perp}(\hat{\Delta})) + r(\pi_B(\hat{\Delta}))\bigr],$$

where we have used the assumption on $r^*(\nabla L(\theta^*))$ and the triangle inequality. Substituting in (24),

$$L(\theta^* + \hat{\Delta}) - L(\theta^*) \geq -\frac{\lambda_n}{2}\bigl[r(\pi_{B^\perp}(\hat{\Delta})) + r(\pi_B(\hat{\Delta}))\bigr]. \qquad (25)$$

Regularization deviation: By the triangle inequality,

$$r(\theta^* + \hat{\Delta}) \geq r(\pi_A(\theta^*) + \pi_B(\hat{\Delta})) - r(\pi_{A^\perp}(\theta^*)) - r(\pi_{B^\perp}(\hat{\Delta})).$$

By the decomposition property,

$$r(\pi_A(\theta^*) + \pi_B(\hat{\Delta})) = r(\pi_A(\theta^*)) + r(\pi_B(\hat{\Delta})),$$

so that by another application of the triangle inequality,

$$r(\theta^* + \hat{\Delta}) - r(\theta^*) \geq r(\pi_B(\hat{\Delta})) - r(\pi_{B^\perp}(\hat{\Delta})) - 2\, r(\pi_{A^\perp}(\theta^*)). \qquad (26)$$

Substituting the lower bounds (25) and (26) for the loss and regularization function deviations in (23),

$$g(\hat{\Delta}) \geq \frac{\lambda_n}{2}\bigl[r(\pi_B(\hat{\Delta})) - 3\, r(\pi_{B^\perp}(\hat{\Delta})) - 4\, r(\pi_{A^\perp}(\theta^*))\bigr]. \qquad (27)$$

By construction $g(0) = 0$, and hence the deviation at the optimum satisfies $g(\hat{\Delta}) \leq 0$. Using this in (27) and dividing by $\lambda_n/2 > 0$ yields

$$r(\pi_B(\hat{\Delta})) \leq 3\, r(\pi_{B^\perp}(\hat{\Delta})) + 4\, r(\pi_{A^\perp}(\theta^*)),$$

as required.
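To make the conclusion of Lemma 1 tangible in the hard-sparse Lasso setting (where $\pi_{A^\perp}(\theta^*) = 0$, so the lemma reduces to the cone condition $\|\hat{\Delta}_{S^c}\|_1 \leq 3\|\hat{\Delta}_S\|_1$), the following sketch checks it numerically. It is our own illustration: it uses scikit-learn's Lasso and sets $\lambda_n = 2\|X^T w/n\|_\infty$ using the true noise vector, which would not be available in practice.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
n, p, s, sigma = 200, 500, 5, 0.5
X = rng.standard_normal((n, p))
theta_star = np.zeros(p)
theta_star[:s] = 1.0
w = sigma * rng.standard_normal(n)
y = X @ theta_star + w

lam = 2.0 * np.max(np.abs(X.T @ w)) / n       # lambda_n >= 2 r*(grad L(theta*)) holds exactly
theta_hat = Lasso(alpha=lam, fit_intercept=False, max_iter=10000).fit(X, y).coef_
delta = theta_hat - theta_star

# Cone condition from Lemma 1: ||delta_{S^c}||_1 <= 3 ||delta_S||_1
print(np.sum(np.abs(delta[s:])), "<=", 3 * np.sum(np.abs(delta[:s])))
```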

D Proof of Corollary 2

Proof. The subset of the $s$-sparse subspace collection that we use in this corollary consists of the pairs $(A(S), B(S))$ for sets $S \in \mathbb{S} := \{S \mid |S| \leq R_q\, \lambda_n^{-q}\}$. As in the proof of Corollary 1, the assumptions of Theorem 1 are satisfied, so that we can use the bound in the theorem; its terms can be simplified as follows. Again, for the $\ell_1$-regularizer and the $\ell_2$ error metric, we have $\Psi(A(S)) = \sqrt{|S|}$. Now $|S|$ can be bounded as follows:

$$R_q \geq \sum_{i=1}^p |\theta^*_i|^q \geq \sum_{i \in S} |\theta^*_i|^q \geq \tau^q\, |S|,$$

so that $|S| \leq \tau^{-q} R_q$. Further, given the soft sparsity assumption, $r(\theta^*_{S^c})$ can be bounded as follows:

$$\|\theta^*_{S^c}\|_1 = \sum_{i \in S^c} |\theta^*_i| = \sum_{i \in S^c} |\theta^*_i|^q\, |\theta^*_i|^{1-q} \leq \tau^{1-q} \sum_{i \in S^c} |\theta^*_i|^q \leq R_q\, \tau^{1-q}.$$

We thus obtain from Theorem 1 that

$$\|\hat{\theta} - \theta^*\|_2 \leq \frac{1}{\gamma}\Bigl[2\sqrt{|S|}\,\lambda_n + 2\sqrt{\gamma\,\lambda_n\,\|\theta^*_{S^c}\|_1}\Bigr] \leq \frac{1}{\gamma}\Bigl[2\sqrt{R_q}\,\tau^{-q/2}\,\lambda_n + 2\sqrt{\gamma\,\lambda_n\, R_q\, \tau^{1-q}}\Bigr].$$

From the settings of $\tau$ and $\lambda_n$, it can be seen that $\lambda_n = \tau$, which when substituted in the previous expression yields

$$\|\hat{\theta} - \theta^*\|_2 \leq c\,\sqrt{R_q}\,\lambda_n^{1 - q/2},$$

for a constant $c$ depending only on $\gamma$. Substituting for the value of $\lambda_n$, we thus obtain the bound in the Corollary.

D.1 Restricted Strong Convexity for Weak-Sparse Models

One sufficient condition for the restricted strong convexity condition to hold is that the design matrices $X \in \mathbb{R}^{n \times p}$ satisfy, for some constants $c_1 > 0$ and $c_2 > 0$, the condition

$$\frac{\|Xv\|_2}{\sqrt{n}} \geq c_1\|v\|_2 - c_2\sqrt{\frac{\log p}{n}}\,\|v\|_1.$$

In our setting, $\|v_{S^c}\|_1 \leq 3\|v_S\|_1 + 4\|\theta^*_{S^c}\|_1$, so that $\|v\|_1 \leq 4\bigl[\|v_S\|_1 + \|\theta^*_{S^c}\|_1\bigr]$, which further implies that $\|v\|_1 \leq 4\bigl[\sqrt{|S|}\,\|v\|_2 + \|\theta^*_{S^c}\|_1\bigr]$. Therefore, it immediately follows that

$$\frac{\|Xv\|_2}{\sqrt{n}} \geq \Bigl(c_1 - 4c_2\sqrt{\frac{|S|\log p}{n}}\Bigr)\|v\|_2 - 4c_2\sqrt{\frac{\log p}{n}}\,\|\theta^*_{S^c}\|_1.$$

Recall from the arguments above that $\|\theta^*_{S^c}\|_1 \leq R_q\,\tau^{1-q}$, where $\tau \asymp \sqrt{(\log p)/n}$, and that we are only concerned with sets such that $|S| \leq R_q\,\tau^{-q}$, so that

$$\frac{\|Xv\|_2}{\sqrt{n}} \geq \bigl(c_1 - 4c_2\sqrt{R_q\,\tau^{2-q}}\bigr)\|v\|_2 - 4c_2\, R_q\,\tau^{2-q}.$$

For the applications of restricted strong convexity above, we only need it to hold for vectors $v$ such that $\|v\|_2$ is at least of the order $\sqrt{R_q\,\tau^{2-q}}$, where we recall that $\tau \asymp \lambda_n$, justifying the swap. Finally, applying this lower bound on $\|v\|_2$ yields that

$$\frac{\|Xv\|_2}{\sqrt{n}} \geq c_1\bigl(1 - 4\bar{c}\sqrt{R_q\,\tau^{2-q}}\bigr)\|v\|_2 - 4c_2\sqrt{R_q\,\tau^{2-q}}\,\|v\|_2 = c_1\bigl(1 - 8\bar{c}\sqrt{R_q\,\tau^{2-q}}\bigr)\|v\|_2,$$

where $\bar{c} = c_2/c_1$. The constants $c_1$ and $c_2$ are independent of everything else, and by the scaling of $n$, the term in the parentheses can be made arbitrarily close to 1 by taking $n$ sufficiently large; in particular, it is eventually at least $1/2$. Therefore, we have that $\frac{\|Xv\|_2}{\sqrt{n}} \geq \frac{c_1}{2}\|v\|_2$, which immediately implies that restricted strong convexity holds with $\gamma = c_1^2/8$ for such $v$. Note, in fact, that the bound holds for any $v$ such that $\|v\|_2 \geq c\sqrt{R_q\,\tau^{2-q}}$, which implies that the bound established in Corollary 2 is valid, since the tolerance $\epsilon$ there is of exactly this order.

E Restricted Strong Convexity for the Trace Observation Model

Recall that the low-rank matrix observation model is $Y_i = \mathrm{trace}(X_i^T\Theta^*) + W_i$, where $X_i, \Theta^* \in \mathbb{R}^{k \times m}$. Note that we can convert each $X_i$ and $\Theta$ to a vector to yield the usual linear regression observation model $y = \mathbb{X}\theta + w$, where $\mathbb{X} \in \mathbb{R}^{n \times km}$ and $\theta \in \mathbb{R}^{km}$. We establish RSC for the simple case where the observation matrices $X_i$ are drawn from the i.i.d. Gaussian ensemble. We appeal to the Gordon-Slepian lemma to establish that, with high probability,

$$\inf_{\Delta}\ \|\mathbb{X}(\Delta)\|_2 \geq c_1\sqrt{n} - c_2\bigl(\sqrt{k} + \sqrt{m}\bigr)$$

over the relevant set of directions $\Delta$ with $\|\Delta\|_F = 1$, where $\|\cdot\|_*$ is the nuclear norm and $\|\cdot\|_F$ is the Frobenius norm. The Gordon-Slepian comparison lower bounds the expected value of the random variable $\inf_\Delta \|\mathbb{X}(\Delta)\|_2$, while concentration results then yield the above bound with high probability; we leave that step as an exercise. We know that

$$\inf_{\Delta}\|\mathbb{X}(\Delta)\|_2 = \inf_{\Delta}\ \sup_{\|u\|_2 = 1} u^T\,\mathbb{X}(\Delta) =: \inf_{\Delta}\ \sup_{u} X_{u,\Delta}.$$

Now, $X_{u,\Delta}$ is a centered Gaussian random process indexed by $u$ and $\Delta$. We may construct a second centered Gaussian random process indexed by $u$ and $\Delta$ by defining

$$Y_{u,\Delta} = u^T W' + \langle \Delta, Z\rangle,$$

where $W' \in \mathbb{R}^n$ and $Z \in \mathbb{R}^{k \times m}$ are independent with i.i.d. standard Gaussian entries. We then have

$$\mathbb{E}\bigl[(X_{u,\Delta} - X_{u',\Delta'})^2\bigr] = \|\Delta u^T - \Delta'(u')^T\|_F^2, \qquad (28)$$

and

$$\mathbb{E}\bigl[\bigl((u - u')^T W' + \langle \Delta - \Delta', Z\rangle\bigr)^2\bigr] = \|u - u'\|_2^2 + \|\Delta - \Delta'\|_F^2. \qquad (29)$$

Equation (28) is upper bounded by equation (29). On the other hand, if $\Delta = \Delta'$, then equation (28) equals equation (29), thus verifying the conditions of the Gordon-Slepian lemma. Therefore, by the lemma, it immediately follows that

$$\mathbb{E}\,\inf_{\Delta}\sup_{u} X_{u,\Delta} \geq \mathbb{E}\,\inf_{\Delta}\sup_{u}\bigl[u^T W' + \langle\Delta, Z\rangle\bigr] = \mathbb{E}\|W'\|_2 - \mathbb{E}\|Z\|_2 \geq \sqrt{n} - 2\bigl(\sqrt{k} + \sqrt{m}\bigr),$$

as desired.


More information

Lecture 13: Maximum Likelihood Estimation

Lecture 13: Maximum Likelihood Estimation ECE90 Sprig 007 Statistical Learig Theory Istructor: R. Nowak Lecture 3: Maximum Likelihood Estimatio Summary of Lecture I the last lecture we derived a risk (MSE) boud for regressio problems; i.e., select

More information

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1 EECS564 Estimatio, Filterig, ad Detectio Hwk 2 Sols. Witer 25 4. Let Z be a sigle observatio havig desity fuctio where. p (z) = (2z + ), z (a) Assumig that is a oradom parameter, fid ad plot the maximum

More information

Basics of Probability Theory (for Theory of Computation courses)

Basics of Probability Theory (for Theory of Computation courses) Basics of Probability Theory (for Theory of Computatio courses) Oded Goldreich Departmet of Computer Sciece Weizma Istitute of Sciece Rehovot, Israel. oded.goldreich@weizma.ac.il November 24, 2008 Preface.

More information

Feedback in Iterative Algorithms

Feedback in Iterative Algorithms Feedback i Iterative Algorithms Charles Byre (Charles Byre@uml.edu), Departmet of Mathematical Scieces, Uiversity of Massachusetts Lowell, Lowell, MA 01854 October 17, 2005 Abstract Whe the oegative system

More information

Empirical Process Theory and Oracle Inequalities

Empirical Process Theory and Oracle Inequalities Stat 928: Statistical Learig Theory Lecture: 10 Empirical Process Theory ad Oracle Iequalities Istructor: Sham Kakade 1 Risk vs Risk See Lecture 0 for a discussio o termiology. 2 The Uio Boud / Boferoi

More information

Rademacher Complexity

Rademacher Complexity EECS 598: Statistical Learig Theory, Witer 204 Topic 0 Rademacher Complexity Lecturer: Clayto Scott Scribe: Ya Deg, Kevi Moo Disclaimer: These otes have ot bee subjected to the usual scrutiy reserved for

More information

Minimax rates of estimation for high-dimensional linear regression over l q -balls

Minimax rates of estimation for high-dimensional linear regression over l q -balls Miimax rates of estimatio for high-dimesioal liear regressio over l q -balls Garvesh Raskutti Marti J. Waiwright, garveshr@stat.berkeley.edu waiwrig@stat.berkeley.edu Bi Yu, biyu@stat.berkeley.edu arxiv:090.04v

More information

Bull. Korean Math. Soc. 36 (1999), No. 3, pp. 451{457 THE STRONG CONSISTENCY OF NONLINEAR REGRESSION QUANTILES ESTIMATORS Seung Hoe Choi and Hae Kyung

Bull. Korean Math. Soc. 36 (1999), No. 3, pp. 451{457 THE STRONG CONSISTENCY OF NONLINEAR REGRESSION QUANTILES ESTIMATORS Seung Hoe Choi and Hae Kyung Bull. Korea Math. Soc. 36 (999), No. 3, pp. 45{457 THE STRONG CONSISTENCY OF NONLINEAR REGRESSION QUANTILES ESTIMATORS Abstract. This paper provides suciet coditios which esure the strog cosistecy of regressio

More information

High-dimensional support union recovery in multivariate regression

High-dimensional support union recovery in multivariate regression High-dimesioal support uio recovery i multivariate regressio Guillaume Oboziski Departmet of Statistics UC Berkeley gobo@stat.berkeley.edu Marti J. Waiwright Departmet of Statistics Dept. of Electrical

More information

THE ASYMPTOTIC COMPLEXITY OF MATRIX REDUCTION OVER FINITE FIELDS

THE ASYMPTOTIC COMPLEXITY OF MATRIX REDUCTION OVER FINITE FIELDS THE ASYMPTOTIC COMPLEXITY OF MATRIX REDUCTION OVER FINITE FIELDS DEMETRES CHRISTOFIDES Abstract. Cosider a ivertible matrix over some field. The Gauss-Jorda elimiatio reduces this matrix to the idetity

More information

Math 155 (Lecture 3)

Math 155 (Lecture 3) Math 55 (Lecture 3) September 8, I this lecture, we ll cosider the aswer to oe of the most basic coutig problems i combiatorics Questio How may ways are there to choose a -elemet subset of the set {,,,

More information

Linear Support Vector Machines

Linear Support Vector Machines Liear Support Vector Machies David S. Roseberg The Support Vector Machie For a liear support vector machie (SVM), we use the hypothesis space of affie fuctios F = { f(x) = w T x + b w R d, b R } ad evaluate

More information

Supplementary Material for Fast Stochastic AUC Maximization with O(1/n)-Convergence Rate

Supplementary Material for Fast Stochastic AUC Maximization with O(1/n)-Convergence Rate Supplemetary Material for Fast Stochastic AUC Maximizatio with O/-Covergece Rate Migrui Liu Xiaoxua Zhag Zaiyi Che Xiaoyu Wag 3 iabao Yag echical Lemmas ized versio of Hoeffdig s iequality, ote that We

More information

Problem Set 4 Due Oct, 12

Problem Set 4 Due Oct, 12 EE226: Radom Processes i Systems Lecturer: Jea C. Walrad Problem Set 4 Due Oct, 12 Fall 06 GSI: Assae Gueye This problem set essetially reviews detectio theory ad hypothesis testig ad some basic otios

More information

Riesz-Fischer Sequences and Lower Frame Bounds

Riesz-Fischer Sequences and Lower Frame Bounds Zeitschrift für Aalysis ud ihre Aweduge Joural for Aalysis ad its Applicatios Volume 1 (00), No., 305 314 Riesz-Fischer Sequeces ad Lower Frame Bouds P. Casazza, O. Christese, S. Li ad A. Lider Abstract.

More information

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Discrete Mathematics for CS Spring 2008 David Wagner Note 22 CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig

More information

Cov(aX, cy ) Var(X) Var(Y ) It is completely invariant to affine transformations: for any a, b, c, d R, ρ(ax + b, cy + d) = a.s. X i. as n.

Cov(aX, cy ) Var(X) Var(Y ) It is completely invariant to affine transformations: for any a, b, c, d R, ρ(ax + b, cy + d) = a.s. X i. as n. CS 189 Itroductio to Machie Learig Sprig 218 Note 11 1 Caoical Correlatio Aalysis The Pearso Correlatio Coefficiet ρ(x, Y ) is a way to measure how liearly related (i other words, how well a liear model

More information

Binary classification, Part 1

Binary classification, Part 1 Biary classificatio, Part 1 Maxim Ragisky September 25, 2014 The problem of biary classificatio ca be stated as follows. We have a radom couple Z = (X,Y ), where X R d is called the feature vector ad Y

More information

CHAPTER I: Vector Spaces

CHAPTER I: Vector Spaces CHAPTER I: Vector Spaces Sectio 1: Itroductio ad Examples This first chapter is largely a review of topics you probably saw i your liear algebra course. So why cover it? (1) Not everyoe remembers everythig

More information

Sparse Estimation with Strongly Correlated Variables using Ordered Weighted l 1 Regularization

Sparse Estimation with Strongly Correlated Variables using Ordered Weighted l 1 Regularization Sparse Estimatio with Strogly Correlated Variables usig Ordered Weighted l Regularizatio Mário A. T. Figueiredo Istituto de Telecomuicaçõe ad Istituto Superior Técico, Uiversidade de Lisboa, Portugal Robert

More information

Sequences. Notation. Convergence of a Sequence

Sequences. Notation. Convergence of a Sequence Sequeces A sequece is essetially just a list. Defiitio (Sequece of Real Numbers). A sequece of real umbers is a fuctio Z (, ) R for some real umber. Do t let the descriptio of the domai cofuse you; it

More information

Random Matrices with Blocks of Intermediate Scale Strongly Correlated Band Matrices

Random Matrices with Blocks of Intermediate Scale Strongly Correlated Band Matrices Radom Matrices with Blocks of Itermediate Scale Strogly Correlated Bad Matrices Jiayi Tog Advisor: Dr. Todd Kemp May 30, 07 Departmet of Mathematics Uiversity of Califoria, Sa Diego Cotets Itroductio Notatio

More information

Lecture 7: Density Estimation: k-nearest Neighbor and Basis Approach

Lecture 7: Density Estimation: k-nearest Neighbor and Basis Approach STAT 425: Itroductio to Noparametric Statistics Witer 28 Lecture 7: Desity Estimatio: k-nearest Neighbor ad Basis Approach Istructor: Ye-Chi Che Referece: Sectio 8.4 of All of Noparametric Statistics.

More information

ECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015

ECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015 ECE 8527: Itroductio to Machie Learig ad Patter Recogitio Midterm # 1 Vaishali Ami Fall, 2015 tue39624@temple.edu Problem No. 1: Cosider a two-class discrete distributio problem: ω 1 :{[0,0], [2,0], [2,2],

More information

Supplementary material to Non-negative least squares for high-dimensional linear models: consistency and sparse recovery without regularization

Supplementary material to Non-negative least squares for high-dimensional linear models: consistency and sparse recovery without regularization Electroic Joural of Statistics ISSN: 1935-754 Supplemetary material to No-egative least squares for high-dimesioal liear models: cosistecy ad sparse recovery without regularizatio Marti Slawski ad Matthias

More information

Rates of Convergence by Moduli of Continuity

Rates of Convergence by Moduli of Continuity Rates of Covergece by Moduli of Cotiuity Joh Duchi: Notes for Statistics 300b March, 017 1 Itroductio I this ote, we give a presetatio showig the importace, ad relatioship betwee, the modulis of cotiuity

More information

Optimization Methods MIT 2.098/6.255/ Final exam

Optimization Methods MIT 2.098/6.255/ Final exam Optimizatio Methods MIT 2.098/6.255/15.093 Fial exam Date Give: December 19th, 2006 P1. [30 pts] Classify the followig statemets as true or false. All aswers must be well-justified, either through a short

More information

Information-based Feature Selection

Information-based Feature Selection Iformatio-based Feature Selectio Farza Faria, Abbas Kazeroui, Afshi Babveyh Email: {faria,abbask,afshib}@staford.edu 1 Itroductio Feature selectio is a topic of great iterest i applicatios dealig with

More information

Notes for Lecture 11

Notes for Lecture 11 U.C. Berkeley CS78: Computatioal Complexity Hadout N Professor Luca Trevisa 3/4/008 Notes for Lecture Eigevalues, Expasio, ad Radom Walks As usual by ow, let G = (V, E) be a udirected d-regular graph with

More information

arxiv: v1 [stat.ml] 5 Aug 2008

arxiv: v1 [stat.ml] 5 Aug 2008 Uio support recovery i high-dimesioal multivariate regressio Guillaume Oboziski Marti J. Waiwright, Michael I. Jorda, {gobo, waiwrig, jorda}@stat.berkeley.edu Departmet of Statistics, ad Departmet of Electrical

More information

Spectral Partitioning in the Planted Partition Model

Spectral Partitioning in the Planted Partition Model Spectral Graph Theory Lecture 21 Spectral Partitioig i the Plated Partitio Model Daiel A. Spielma November 11, 2009 21.1 Itroductio I this lecture, we will perform a crude aalysis of the performace of

More information

Lecture 8: October 20, Applications of SVD: least squares approximation

Lecture 8: October 20, Applications of SVD: least squares approximation Mathematical Toolkit Autum 2016 Lecturer: Madhur Tulsiai Lecture 8: October 20, 2016 1 Applicatios of SVD: least squares approximatio We discuss aother applicatio of sigular value decompositio (SVD) of

More information

4 The Sperner property.

4 The Sperner property. 4 The Sperer property. I this sectio we cosider a surprisig applicatio of certai adjacecy matrices to some problems i extremal set theory. A importat role will also be played by fiite groups. I geeral,

More information

The random version of Dvoretzky s theorem in l n

The random version of Dvoretzky s theorem in l n The radom versio of Dvoretzky s theorem i l Gideo Schechtma Abstract We show that with high probability a sectio of the l ball of dimesio k cε log c > 0 a uiversal costat) is ε close to a multiple of the

More information

Definition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4.

Definition 4.2. (a) A sequence {x n } in a Banach space X is a basis for X if. unique scalars a n (x) such that x = n. a n (x) x n. (4. 4. BASES I BAACH SPACES 39 4. BASES I BAACH SPACES Sice a Baach space X is a vector space, it must possess a Hamel, or vector space, basis, i.e., a subset {x γ } γ Γ whose fiite liear spa is all of X ad

More information

Lecture 12: September 27

Lecture 12: September 27 36-705: Itermediate Statistics Fall 207 Lecturer: Siva Balakrisha Lecture 2: September 27 Today we will discuss sufficiecy i more detail ad the begi to discuss some geeral strategies for costructig estimators.

More information

Bounds for the Extreme Eigenvalues Using the Trace and Determinant

Bounds for the Extreme Eigenvalues Using the Trace and Determinant ISSN 746-7659, Eglad, UK Joural of Iformatio ad Computig Sciece Vol 4, No, 9, pp 49-55 Bouds for the Etreme Eigevalues Usig the Trace ad Determiat Qi Zhog, +, Tig-Zhu Huag School of pplied Mathematics,

More information