Information-theoretic bounds on model selection for Gaussian Markov random fields

Wei Wang, Martin J. Wainwright, and Kannan Ramchandran
Department of Electrical Engineering and Computer Sciences, and Department of Statistics
UC Berkeley, Berkeley, CA 94720
{wangwei, wainwrig, kannanr}@eecs.berkeley.edu

ISIT 2010, Austin, Texas, U.S.A., June 13-18, 2010

Abstract. The problem of graphical model selection is to estimate the graph structure of an unknown Markov random field based on observed samples from the graphical model. For Gaussian Markov random fields, this problem is closely related to the problem of estimating the inverse covariance matrix of the underlying Gaussian distribution. This paper focuses on the information-theoretic limitations of Gaussian graphical model selection and inverse covariance estimation in the high-dimensional setting, in which the graph size $p$ and maximum node degree $d$ are allowed to grow as a function of the sample size $n$. Our first result establishes a set of necessary conditions on $n(p, d)$ for any recovery method to consistently estimate the underlying graph. Our second result provides necessary conditions for any decoder to produce an estimate $\widehat{\Theta}$ of the true inverse covariance matrix $\Theta$ satisfying $\|\widehat{\Theta} - \Theta\|_\infty < \delta$ in the elementwise $\ell_\infty$-norm (which implies analogous results in the Frobenius norm as well). Combined with previously known sufficient conditions for polynomial-time algorithms, these results yield sharp characterizations in several regimes of interest.

I. INTRODUCTION

Markov random fields or undirected graphical models are families of multivariate probability distributions whose factorization and conditional independence properties are characterized by the structure of an underlying graph [1]. Graphical model selection refers to the problem of estimating the graph structure based on observed samples from a Markov random field. This problem arises in a wide variety of settings, including statistical image analysis, natural language processing, and computational biology.
In many applications, this problem is of interest under high-dimensional scaling, meaning both the graph size $p$ and the number of samples $n$ are large. Classical methods, such as those based directly on the sample covariance, are known (via random matrix theory [2]) to break down when $p/n$ does not go to zero. Consequently, in the high-dimensional regime where $p \gg n$, additional structure is required in order to obtain consistent estimators. Accordingly, a line of recent work has focused on developing computationally efficient methods to solve this problem by imposing sparsity on the underlying graph. In particular, methods based on $\ell_1$-regularization (e.g. [3], [4], [5], [6], [7]) have been shown to yield consistent estimators for Gaussian graphical models, or the associated inverse covariance matrices.

Complementary in nature to such achievable results are the information-theoretic limits associated with any procedure for graphical model selection. Such analysis can serve two purposes. First, it can demonstrate when known polynomial-time algorithms achieve the information-theoretic bounds. Second, it can reveal regimes in which there exists a gap between the performance of current methods and the fundamental limits. With this motivation, some previous work ([8], [9]) has studied both necessary and sufficient conditions for graphical model selection in discrete Markov random fields.

The focus of this paper is on the information-theoretic limits of Gaussian graphical model selection, in which the observed random vector has a multivariate Gaussian distribution. For Gaussian Markov random fields, by the Hammersley-Clifford theorem [1], the model selection problem is equivalent to estimating the off-diagonal sparsity pattern of the inverse covariance matrix. In this paper, we study the ensemble $\mathcal{G}_{p,d}$ of graphs on $p$ vertices with maximum degree at most $d$, and derive two main results. Our first result is to derive conditions on the sample size $n$, graph size $p$, and maximum node degree $d$ that are necessary for any method to correctly recover the underlying graph with probability of error going to zero.
Our second result addresses the problem of estimating the inverse covariance matrix $\Theta$, and establishes necessary conditions for any method to produce an estimate $\widehat{\Theta}$ satisfying $\|\widehat{\Theta} - \Theta\|_\infty < \delta$. Our results can be compared against known sufficient conditions for graph selection and inverse covariance estimation using $\ell_1$-penalized maximum likelihood [7], and reveal regimes in which this polynomial-time algorithm achieves the information-theoretic scaling. One consequence of our results is a set of conditions under which the scaling $n = \Omega(d^2 \log p)$ of the sample size is sharp.

This paper is organized as follows. In Section II, we begin with some background and a precise formulation of the problem. Section III provides the statements of our main results and a discussion of their consequences. Section IV describes a general framework for deriving information-theoretic lower bounds and discusses several approaches for bounding the mutual information that arises in Fano's inequality. Subsections IV-B and IV-C are devoted to the proofs of the necessary conditions for graphical model selection and inverse covariance estimation. Given space constraints, this paper only provides statements and high-level proof ideas; we refer the reader to the technical report [10] for details. We conclude in Section V with a discussion of open directions.

Figure 1. Illustration of Gaussian Markov random fields. (a) Given an undirected graph, associate a random variable $X_i$ with each vertex $i$ in the graph. A GMRF is the collection of Gaussian distributions over the vector $X$ that respect the structure of the graph. (b) Sparsity pattern of the inverse covariance matrix $\Theta$ associated with the GMRF in (a).

II. BACKGROUND AND PROBLEM FORMULATION

We begin with some background on Gaussian Markov random fields. We then formulate the graphical model selection problem, which for Gaussian models is directly related to estimation of the inverse covariance matrix. Our goal is to derive information-theoretic lower bounds on the number of samples required for recovery, which apply to any procedure regardless of its computational complexity.

A. Gaussian Markov random fields

Let $X = (X_1, \ldots, X_p)$ be a multivariate Gaussian random vector with zero mean and covariance matrix $\Sigma$. Accordingly, its density is determined completely by the inverse covariance matrix $\Theta = \Sigma^{-1}$, and has the form

\[ \phi(x; 0, \Sigma) = \frac{1}{\sqrt{(2\pi)^p \det(\Theta^{-1})}} \exp\left\{ -\tfrac{1}{2} x^T \Theta x \right\}. \tag{1} \]

For a given undirected graph $G = (V, E)$ with vertex set $V$ and edge set $E \subset V \times V$, we associate a random variable $X_i$ with each vertex $i \in V$. The Gaussian Markov random field associated with the graph $G$ is the family of Gaussian distributions that respect the Markov properties of $G$. In particular, the off-diagonal sparsity pattern of the inverse covariance matrix $\Theta$ is specified by the edge structure of the graph, such that $\Theta_{ij} = 0$ if $(i, j) \notin E$ (see Figure 1). Given $n$ i.i.d. samples from an unknown Markov random field, the problem of estimating the inverse covariance matrix $\Theta$ corresponds to recovering the graphical model instance, while the problem of estimating the underlying graph $G$ corresponds to graphical model selection. We define the maximum degree of the graph as

\[ d := \max_{i \in V} \left| \{ j \in V \mid (i, j) \in E \} \right|, \tag{2} \]

which is equal to the maximum number of non-zeros per row of the inverse covariance matrix $\Theta$.
Note that we are not including self-loops at each vertex in the degree count, corresponding to the diagonal entries $\Theta_{ii}$. We often write $\Theta(G)$ to emphasize the graph-based structure of $\Theta$.

B. Classes of graphical models

Let $\mathcal{G}_{p,d}$ be a family of undirected graphs on $p$ vertices with edge sets that have degree at most $d$. For a given graph $G \in \mathcal{G}_{p,d}$, let $\Sigma(G)$ be the covariance matrix of a Gaussian Markov random field (GMRF) defined by the graph $G$. By definition, the inverse covariance matrix $\Theta(G)$ must have non-zeros only in positions corresponding to edges in $E$. In addition to graph structure, the difficulty of graphical model selection also depends on properties of the inverse covariance matrix entries. We measure the minimum value of each matrix $\Theta(G)$ by the function

\[ \lambda^*(\Theta(G)) := \min_{(s,t) \in E} \frac{|\Theta_{st}|}{\sqrt{\Theta_{ss} \Theta_{tt}}}, \tag{3} \]

so that it is invariant to rescaling of the data. We study the class $\mathcal{G}_{p,d}(\lambda)$ of Gaussian Markov random fields parameterized by a lower bound $\lambda$ on the minimum value, defined as the set of probability distributions $\phi_{\Theta(G)} = \phi(0, \Sigma(G))$ where the underlying graph $G \in \mathcal{G}_{p,d}$, the inverse covariance matrix satisfies $\Theta_{st} = 0$ if $(s, t) \notin E$, and $\lambda^*(\Theta(G)) \geq \lambda$.

C. Decoders and error metrics

Suppose we are given $n$ i.i.d. vector samples $X = (X^{(1)}, \ldots, X^{(n)}) \in \mathbb{R}^{n \times p}$ from an unknown distribution $\phi_{\Theta(G)}$ in the class $\mathcal{G}_{p,d}(\lambda)$. Graphical model selection refers to the problem of estimating the underlying graph $G$ based on the observations $X$. A decoder $\psi : \mathbb{R}^{n \times p} \to \mathcal{G}_{p,d}$ maps the observations $X$ to an estimated graph $\widehat{G} = \psi(X)$. We define the error metric between the estimate $\widehat{G}$ and the true underlying graph $G$ using the 0-1 loss function $\mathbb{I}[\psi(X) \neq G]$. For any decoder $\psi$, we define the maximal probability of error over the class $\mathcal{G}_{p,d}(\lambda)$ as

\[ p_{\mathrm{err}}(\psi) := \max_{\phi_{\Theta(G)} \in \mathcal{G}_{p,d}(\lambda)} \mathbb{P}_{\Theta(G)}\left[ \psi(X) \neq G \right], \tag{4} \]

where the error probability $\mathbb{P}_{\Theta(G)}[\psi(X) \neq G] = \mathbb{E}_{\Theta(G)}\left[ \mathbb{I}[\psi(X) \neq G] \right]$ is taken with respect to the product distribution $\mathbb{P}_{\Theta(G)}(\cdot) = \phi^n(\cdot\,; 0, \Sigma(G))$ over $n$ i.i.d. samples.
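As an illustration of the maximum degree in (2) and the minimum value in (3), the following minimal sketch computes both quantities from a given inverse covariance matrix. The function names, the tolerance `tol`, and the 3-vertex chain example are assumptions made for illustration; they do not come from the paper.

```python
import numpy as np

def max_degree(theta, tol=1e-12):
    """Largest number of off-diagonal non-zeros in any row of theta, as in (2)."""
    support = np.abs(theta) > tol
    np.fill_diagonal(support, False)  # diagonal entries (self-loops) are not counted
    return int(support.sum(axis=1).max())

def min_value(theta, tol=1e-12):
    """Smallest normalized edge weight |theta_st| / sqrt(theta_ss * theta_tt), as in (3)."""
    diag = np.diag(theta)
    normalized = np.abs(theta) / np.sqrt(np.outer(diag, diag))
    support = np.abs(theta) > tol
    np.fill_diagonal(support, False)
    return float(normalized[support].min())

# Hypothetical example: a chain graph 0 - 1 - 2 on p = 3 vertices.
theta = np.array([[2.0, 0.5, 0.0],
                  [0.5, 2.0, 0.5],
                  [0.0, 0.5, 2.0]])
d = max_degree(theta)
lam = min_value(theta)

# Rescaling the data maps theta to D @ theta @ D for a positive diagonal D;
# the normalized minimum value should not change.
D = np.diag([1.0, 3.0, 0.5])
lam_rescaled = min_value(D @ theta @ D)
```

In this example the middle vertex has two neighbors, and the normalization by the diagonal entries is what makes the minimum value insensitive to the rescaling.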
In contrast to graphical model selection (in which the goal is to recover the support set of $\Theta(G)$), the goal of inverse covariance estimation is to estimate the numerical values of the inverse covariance matrix. More precisely, a decoder $\widetilde{\psi} : \mathbb{R}^{n \times p} \to \mathcal{G}_{p,d}(\lambda)$ maps the samples $X$ to an estimate $\widehat{\Theta} = \widetilde{\psi}(X)$. We measure the error between the estimate $\widehat{\Theta}$ and the true inverse covariance matrix $\Theta$ using the elementwise $\ell_\infty$-norm $\|\widehat{\Theta} - \Theta\|_\infty := \max_{s,t} |\widehat{\Theta}_{st} - \Theta_{st}|$, and define the probability of error $\mathbb{P}_{\Theta(G)}\left[ \|\widehat{\Theta} - \Theta\|_\infty \geq \delta/2 \right]$. The maximal probability of error over the model class $\mathcal{G}_{p,d}(\lambda)$ is then defined as

\[ p_{\mathrm{err}}(\widetilde{\psi}) := \max_{\phi_{\Theta(G)} \in \mathcal{G}_{p,d}(\lambda)} \mathbb{P}_{\Theta(G)}\left[ \|\widehat{\Theta} - \Theta\|_\infty \geq \delta/2 \right]. \tag{5} \]

Although the error metrics for graphical model selection and inverse covariance estimation are closely related, neither recovery guarantee is strictly stronger than the other. In particular, it is possible to recover the true graph (i.e. $\widehat{G} = G$) even when $\|\widehat{\Theta} - \Theta\|_\infty \geq \delta/2$, since the graph structure is determined only by which entries are zero. Conversely, it is also possible to recover an estimate satisfying $\|\widehat{\Theta} - \Theta\|_\infty < \delta/2$ and still fail to recover the true graph, if for instance there is a non-zero edge weight less than $\delta/2$.

With this set-up, our goal is to derive necessary conditions on the sample size $n(p, d, \lambda)$ for any decoder to reliably recover the underlying graph (or estimate the inverse covariance matrix). We say that recovery is asymptotically reliable over the graphical model class $\mathcal{G}_{p,d}(\lambda)$ if $p_{\mathrm{err}} \to 0$ as $n \to \infty$. Our analysis is high-dimensional in nature, in which the graph size $p$, maximum degree $d$, and minimum value $\lambda$ are all allowed to scale arbitrarily as the number of samples $n$ tends to infinity.

III. MAIN RESULTS AND CONSEQUENCES

In this section, we state our main results on the information-theoretic limits of Gaussian graphical model selection and inverse covariance estimation, and then discuss some of their consequences.

A. Graphical model selection

We begin with a set of necessary conditions for graphical model selection, applicable to any recovery method regardless of its computational complexity.

Theorem 1. Consider the class $\mathcal{G}_{p,d}(\lambda)$ of Gaussian Markov random fields with $\lambda \in [0, \tfrac{1}{2}]$. A necessary condition for asymptotically reliable graphical model selection over the class $\mathcal{G}_{p,d}(\lambda)$ is

\[ n > \max\left\{ \frac{\log\binom{p-d}{2} - 1}{4\lambda^2},\; \frac{2\left( \log\binom{p}{d} - 1 \right)}{\log\left( 1 + \frac{d\lambda}{1-\lambda} \right) - \frac{d\lambda}{1 + (d-1)\lambda}} \right\}. \tag{6} \]

The proof of Theorem 1 (given in Section IV-B) constructs restricted ensembles of graphical models and then, viewing the observation process as a communication channel, applies Fano's inequality [11] in order to bound the probability of error.

The bounds in Theorem 1 capture how the sample size must grow with graph size $p$ and minimum value $\lambda$. In particular, in order for the sum of the edge weights in each neighborhood of the graph to stay bounded, the minimum value must scale as $\lambda = \Theta(1/d)$.
In this regime, the first bound in Theorem 1 implies that the sample size must scale as $n = \Omega(d^2 \log(p - d))$. For any constant $\lambda \in (0, 1/2]$, the second bound in Theorem 1 scales as $n = \Omega\left( \frac{d \log(p/d)}{\log(1 + d\lambda)} \right)$. Moreover, it implies that $n = \Omega(d^{1-\epsilon} \log(p/d))$ for any $\epsilon > 0$.

The information-theoretic bounds in Theorem 1 can be compared with previous work on polynomial-time methods for consistent graph selection. In particular, Ravikumar et al. [7] showed that a sufficient condition for $\ell_1$-regularized maximum likelihood to consistently estimate the underlying graph is $n = \Omega((d^2 + \lambda^{-2}) \log p)$. In the regime in which $\lambda = \Theta(1/d)$, this scaling matches the information-theoretic bounds in Theorem 1, showing that a polynomial-time method achieves the optimal rates (up to constant factors).

B. Inverse covariance estimation

We now state some necessary conditions for the closely related problem of inverse covariance estimation. Recall that $\|A\|_\infty := \max_{i,j} |A_{ij}|$ denotes the elementwise $\ell_\infty$-norm applied to a matrix.

Theorem 2. Consider the class of Gaussian Markov random fields $\mathcal{G}_{p,d}(\lambda)$. If there exists an estimator such that $\mathbb{P}\left[ \|\widehat{\Theta} - \Theta\|_\infty < \delta/2 \right] \geq 1/2$ uniformly over choices from $\mathcal{G}_{p,d}(\lambda)$, then we must have

\[ n > \frac{\log(\,\cdots\,)\,\cdots}{\cdots\,\delta^{2}}. \tag{7} \]

The proof of Theorem 2, given in Section IV-C, is based on constructing restricted ensembles of graphical models with minimum separation $\delta$, and then applying Fano's inequality [11] to bound the probability of decoding error in distinguishing between such models.

Theorem 2 captures how the sample size must grow with the minimum separation $\delta$ between models. A consequence of Theorem 2 is that if the recovery error decays at rate $\delta = 1/d$, then the sample size must grow at least in proportion to $d^2$ times the logarithmic factor in (7). Furthermore, Theorem 2 implies that the same necessary condition holds for inverse covariance estimation with other error metrics as well. In particular, let $\|A\|_F := \left( \sum_{i,j} A_{ij}^2 \right)^{1/2}$ denote the Frobenius norm.

Corollary 1. A necessary condition for asymptotically reliable inverse covariance estimation, with recovery error at most $\delta/2$ measured in the Frobenius norm, is the condition (7).
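The step from the elementwise result to the Frobenius-norm corollary rests on the inequality $\|A\|_\infty \leq \|A\|_F$: any estimate within $\delta/2$ of $\Theta$ in Frobenius norm is also within $\delta/2$ elementwise, so a necessary condition for the elementwise guarantee is also necessary for the Frobenius one. The sketch below checks this inequality numerically on random symmetric error matrices; the helper names, matrix size, and random seed are illustrative assumptions.

```python
import numpy as np

def elementwise_max_norm(a):
    """Elementwise l_infinity norm: the largest absolute entry of a."""
    return float(np.max(np.abs(a)))

def frobenius_norm(a):
    """Frobenius norm: square root of the sum of squared entries of a."""
    return float(np.sqrt(np.sum(a ** 2)))

rng = np.random.default_rng(0)
gaps = []
for _ in range(100):
    m = rng.standard_normal((6, 6))
    err = (m + m.T) / 2  # a symmetric stand-in for an estimation error matrix
    gaps.append(frobenius_norm(err) - elementwise_max_norm(err))

# The Frobenius norm should never fall below the elementwise norm.
min_gap = min(gaps)
```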
The necessary condition in Theorem 2 can be compared to known sufficient conditions for $\ell_1$-regularized maximum likelihood to consistently estimate the inverse covariance matrix. Ravikumar et al. [7] showed that if the sample size satisfies $n > c\, d^2 \log p$ for some constant $c > 0$, then with probability going to one, the $\ell_1$-regularized maximum likelihood method produces an estimate $\widehat{\Theta}$ satisfying $\|\widehat{\Theta} - \Theta\|_\infty = O\left( \sqrt{\frac{\log p}{n}} \right)$. Consequently, the performance of the polynomial-time algorithm in [7] matches the scaling of the information-theoretic bound in Theorem 2.

IV. PROOF SKETCHES

In this section, we describe our general framework for deriving necessary conditions for consistent graphical model selection and inverse covariance estimation. Our methods are information-theoretic in nature, inspired by techniques that have been used to derive minimax bounds in nonparametric estimation (e.g., [12], [13]).

A. Fano's method

Our general approach is to construct restricted ensembles of graphical models, and then use Fano's method to lower bound the probability of error in each restricted ensemble. Consider a restricted ensemble $\widetilde{\mathcal{G}}$ consisting of $M = |\widetilde{\mathcal{G}}|$ models, and let the model index $\widetilde{\theta}$ be chosen uniformly at random from $\{1, \ldots, M\}$. Given the observations $\widetilde{X} \in \mathbb{R}^{n \times \nu}$, the decoder $\widetilde{\psi}$ estimates the underlying graph structure with maximal probability of decoding error defined as

\[ p_{\mathrm{err}}(\widetilde{\psi}) = \max_{j = 1, \ldots, M} \mathbb{P}_{\Theta(\widetilde{G}_j)}\left[ \widetilde{\psi}(\widetilde{X}) \neq \widetilde{G}_j \right]. \tag{8} \]

By Fano's inequality [11], the maximal probability of error over $\widetilde{\mathcal{G}}$ can be lower bounded as

\[ p_{\mathrm{err}}(\widetilde{\psi}) \geq 1 - \frac{I(\widetilde{\theta}; \widetilde{X}) + 1}{\log M}. \tag{9} \]

In order to make use of the Fano bound, the key is to design ensembles of models for which $\log M$ is large, while the mutual information $I(\widetilde{\theta}; \widetilde{X})$ is relatively small. Since it is typically difficult to evaluate the mutual information exactly, we discuss some upper bounds on it.

Entropy-based bound: Define the averaged covariance matrix

\[ \bar{\Sigma} := \frac{1}{M} \sum_{j=1}^{M} \Sigma(\widetilde{G}_j). \tag{10} \]

The mutual information is upper bounded by $I(\widetilde{\theta}; \widetilde{X}) \leq \frac{n}{2} F(\widetilde{\mathcal{G}})$, where

\[ F(\widetilde{\mathcal{G}}) := \log\det \bar{\Sigma} - \frac{1}{M} \sum_{j=1}^{M} \log\det \Sigma(\widetilde{G}_j). \tag{11} \]

KL-based bound: Let $P_j = f(\widetilde{X} \mid \widetilde{\theta} = j) = \phi^n(0, \Sigma(\widetilde{G}_j))$ for $j = 1, \ldots, M$. An alternative bound on the mutual information is given by

\[ I(\widetilde{\theta}; \widetilde{X}) \leq \mathbb{E}_{\widetilde{\theta}}\left[ D(P_{\widetilde{\theta}} \,\|\, Q) \right] \tag{12} \]

for any distribution $Q$ over $\widetilde{X}$. Setting $Q = \phi^n(0, I_{\nu \times \nu})$, the KL distance can be expressed as

\[ D(P_j \,\|\, Q) = \frac{n}{2} \left\{ \log\det \Theta(\widetilde{G}_j) + \mathrm{trace}\left( \Sigma(\widetilde{G}_j) \right) - \nu \right\}. \tag{13} \]

Note that we use natural logarithms ($\log_e$) throughout this paper.

B. Analysis of graphical model selection

We now briefly outline the proofs of the necessary conditions in Theorem 1 on the sample size $n$ as a function of the number of vertices $p$, maximum degree $d$ and minimum value $\lambda$. We obtain two necessary conditions, which can be seen as end points of an entire family of bounds, by analyzing ensembles of graphs in which a subset $S$ of up to $d$ nodes form a clique (i.e. fully connected subset), and the remaining nodes are all isolated.

Restricted ensemble A: We begin by deriving the first bound in Theorem 1, which captures how the sample size must grow with the minimum value $\lambda$. Consider a family of graphs on $p$ vertices, in which each edge set $E(S, T) = \{ (s, t) \mid s, t \in S \text{ or } s, t \in T \}$ defines a clique over a subset $S$ of size 2, and another clique over a disjoint subset $T$ of size $d$.
For a given graph $G = (V, E(S, T))$ and a parameter $a \geq 0$, we define the inverse covariance matrix $\Theta(G) := I + a\, 1_S 1_S^T + a\, 1_T 1_T^T$, where $1_S$ and $1_T$ are the indicator vectors of sets $S$ and $T$, respectively. The covariance matrix can then be computed as

\[ \Sigma(G) = I - \frac{a}{1 + 2a}\, 1_S 1_S^T - \frac{a}{1 + da}\, 1_T 1_T^T. \tag{14} \]

The resulting class of graphical models is a subset of $\mathcal{G}_{p,d}(\lambda)$ if $\lambda^*(\Theta(G)) = \frac{a}{1 + a} \geq \lambda$.

Suppose the decoder is given the indices of the $d$ vertices in $T$, and the parameter value $a$. Estimating the underlying graph structure $G$ now amounts to finding the remaining pair of nodes in $S$, out of $\binom{p-d}{2}$ possibilities. More precisely, given $(T, a)$, the decoder can extract the submatrix of observations $\widetilde{X} := (X_{T^C}) \in \mathbb{R}^{n \times (p-d)}$. When the original observations are sampled i.i.d. from the distribution $X^{(i)} \sim N(0, \Sigma)$, the modified observations are distributed according to $\widetilde{X}^{(i)} \sim N(0, \Sigma_{T^C T^C})$. Since the modified covariance matrix is of the form

\[ \Sigma(\widetilde{G}) := \Sigma_{T^C T^C} = I - \frac{a}{1 + 2a}\, 1_S 1_S^T, \tag{15} \]

the inverse covariance matrix becomes

\[ \Theta(\widetilde{G}) = \left( \Sigma(\widetilde{G}) \right)^{-1} = I + a\, 1_S 1_S^T. \tag{16} \]

Note that the underlying graph associated with $\Theta(\widetilde{G})$ is $\widetilde{G} := G \setminus T$ (i.e. the graph obtained by removing the vertices in set $T$ and all edges connected to $T$ from graph $G$). The remaining sub-problem is to determine, given the observations $\widetilde{X}$, the single-edge graph on $(p - d)$ vertices. Let $\widetilde{\mathcal{G}}$ denote the set of graphs on $(p - d)$ vertices with a single edge, and let $\widetilde{\mathcal{G}}(\lambda)$ denote the associated class of Gaussian Markov random fields with inverse covariance matrices defined as in (16). The proof then applies the Fano bound (9) over this restricted ensemble using the entropy-based bound on mutual information (11).

Restricted ensemble B: We now derive the second lower bound in Theorem 1 using an ensemble of $d$-clique graphs and the entropy-based bound on mutual information (11). Consider the ensemble of graphs consisting of edge sets $E(S) = \{ (s, t) \mid s, t \in S \}$ with $|S| = d$. For a given edge set $E(S)$ and parameter $a \geq 0$, define the inverse covariance matrix $\Theta(G) := I + a\, 1_S 1_S^T$, and its associated covariance matrix $\Sigma(G) = (\Theta(G))^{-1} = I - \frac{a}{1 + da}\, 1_S 1_S^T$. The cardinality of this restricted ensemble is $\binom{p}{d}$.
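The ingredients of restricted ensemble B can be checked numerically on a small instance. The sketch below is written under stated assumptions: the values `p`, `d`, `a`, and `n` are arbitrary, the Fano bound is taken in the form $1 - (I + 1)/\log M$ with natural logarithms, and the mutual information is bounded by $\frac{n}{2} F$ with $F$ the difference of log-determinants between the averaged covariance and the ensemble members. The closed-form ceiling `F_upper` is our own derivation (Hadamard's inequality applied to the averaged covariance), not a formula quoted from the text.

```python
import numpy as np
from itertools import combinations
from math import comb, log

p, d, a = 6, 3, 0.4  # hypothetical small instance

def indicator(S):
    one = np.zeros(p)
    one[list(S)] = 1.0
    return one

def clique_precision(S):
    """Theta(G) = I + a * 1_S 1_S^T for a clique on the vertex subset S."""
    return np.eye(p) + a * np.outer(indicator(S), indicator(S))

def clique_covariance(S):
    """Closed-form inverse: I - a / (1 + d a) * 1_S 1_S^T."""
    return np.eye(p) - a / (1 + d * a) * np.outer(indicator(S), indicator(S))

subsets = list(combinations(range(p), d))
M = len(subsets)  # number of models in the ensemble

# Check the closed-form covariance against a numerical inverse.
max_inverse_error = max(
    np.max(np.abs(np.linalg.inv(clique_precision(S)) - clique_covariance(S)))
    for S in subsets
)

# Entropy-based functional: log det of the averaged covariance minus the average log det.
covs = [clique_covariance(S) for S in subsets]
sigma_bar = sum(covs) / M
F = np.linalg.slogdet(sigma_bar)[1] - np.mean([np.linalg.slogdet(c)[1] for c in covs])
F_upper = log(1 + d * a) - d * a / (1 + d * a)  # assumed ceiling, see lead-in

def fano_lower_bound(mutual_info, num_models):
    """Lower bound on the maximal error probability (may be negative, i.e. vacuous)."""
    return 1.0 - (mutual_info + 1.0) / log(num_models)

n = 5  # hypothetical sample size
error_lower_bound = fano_lower_bound(0.5 * n * F, M)
```

Substituting $a = \lambda/(1-\lambda)$ into `F_upper` gives $\log(1 + \frac{d\lambda}{1-\lambda}) - \frac{d\lambda}{1 + (d-1)\lambda}$, which is one way to see how a quantity of this shape can arise from the ensemble.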
The proof then follows by applying Fano's inequality (9) using the entropy-based bound (11).

C. Analysis for inverse covariance estimation

In this section, we provide the basic intuition underlying the proof of Theorem 2. We derive a set of necessary conditions for inverse covariance estimation using an ensemble of graphical models which share the same underlying graph, but vary by perturbing a single edge weight. These bounds capture the difficulty of distinguishing between models with inverse covariance matrices that are $\delta$-close, e.g. in the elementwise $\ell_\infty$-norm. Note that for any two models $\Theta^{(i)}$ and $\Theta^{(j)}$ in our ensemble, since $\|\Theta^{(i)} - \Theta^{(j)}\|_\infty = \delta$ by construction, there does not exist a matrix $\Theta$ satisfying both $\|\Theta - \Theta^{(i)}\|_\infty < \delta/2$ and $\|\Theta - \Theta^{(j)}\|_\infty < \delta/2$. Consequently, we can apply Fano's inequality (9) to bound the probability of error in the restricted ensemble, and the problem is reduced to bounding the mutual information between the model index and the observations.

Alternate KL bound: We begin by stating a variant of the KL-based bound on mutual information in (13), using KL distances between all pairs of models in the class, instead of KL distances between each model and the standard Gaussian distribution.

Pairwise KL-based bound: We define the symmetrized Kullback-Leibler divergence,

\[ S(P_i \,\|\, P_j) := D(P_i \,\|\, P_j) + D(P_j \,\|\, P_i). \tag{17} \]

By convexity of the KL divergence, we have the following bound on mutual information

\[ I(\widetilde{\theta}; \widetilde{X}) \leq \frac{1}{M^2} \sum_{i=1}^{M} \sum_{j=i+1}^{M} S(P_i \,\|\, P_j). \tag{18} \]

For Gaussian Markov random fields, a straightforward calculation shows that the symmetrized KL distance is equal to

\[ S(P_i \,\|\, P_j) = \frac{n}{2} \sum_{l=1}^{p} \sum_{m=1}^{p} \left( \Theta^{(i)}_{lm} - \Theta^{(j)}_{lm} \right) \left( \Sigma^{(j)}_{lm} - \Sigma^{(i)}_{lm} \right). \tag{19} \]

Restricted ensemble C: We now use these methods to derive necessary conditions for inverse covariance estimation (stated in Theorem 2), which capture how the sample size must grow with the minimum separation $\delta$ between models. Consider a graph on $p$ vertices consisting of $\frac{p}{d+1}$ cliques, where each clique is of size $(d + 1)$. Let $N = \frac{p}{d+1}$, and let $\{S_1, \ldots, S_N\}$ denote the $N$ cliques with $|S_i| = d + 1$. We define the inverse covariance matrix associated with this graph as

\[ \Theta := I + a \sum_{i=1}^{N} 1_{S_i} 1_{S_i}^T, \tag{20} \]

for some parameter $a \geq 0$. From this base model, we generate an ensemble of Gaussian Markov random fields in which each model perturbs the weight associated with one edge. Thus the model obtained by perturbing the weight on edge $(s, t)$ is defined by the inverse covariance matrix $\Theta^{(i)} := \Theta + \delta \left( 1_{st} 1_{st}^T - I_{st} \right)$ for some parameter $\delta \in (0, 1]$. Note that we are using $\left( 1_{st} 1_{st}^T - I_{st} \right)$ to denote the matrix with ones in locations $(s, t)$ and $(t, s)$, and zeros elsewhere.
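The symmetrized KL identity for zero-mean Gaussians can be verified numerically on a small version of restricted ensemble C. The sketch below works per sample (for $n$ i.i.d. samples the divergence is multiplied by $n$) and assumes a hypothetical instance with $d = 2$, two cliques of size 3, and arbitrary values of `a` and `delta`; the function names are illustrative.

```python
import numpy as np
from itertools import combinations

def kl_gaussian(sigma_p, sigma_q):
    """D(P || Q) for zero-mean Gaussians P = N(0, sigma_p) and Q = N(0, sigma_q)."""
    k = sigma_p.shape[0]
    theta_q = np.linalg.inv(sigma_q)
    return 0.5 * (np.trace(theta_q @ sigma_p) - k
                  + np.linalg.slogdet(sigma_q)[1] - np.linalg.slogdet(sigma_p)[1])

def symmetrized_kl(theta_i, theta_j):
    """Per-sample closed form: 0.5 * sum_lm (Theta_i - Theta_j)_lm * (Sigma_j - Sigma_i)_lm."""
    sigma_i, sigma_j = np.linalg.inv(theta_i), np.linalg.inv(theta_j)
    return 0.5 * float(np.sum((theta_i - theta_j) * (sigma_j - sigma_i)))

# Hypothetical instance: p = 6 vertices split into two cliques of size d + 1 = 3.
p, a, delta = 6, 0.3, 0.1
cliques = [(0, 1, 2), (3, 4, 5)]
theta_base = np.eye(p)
for clique in cliques:
    one = np.zeros(p)
    one[list(clique)] = 1.0
    theta_base += a * np.outer(one, one)

# One model per edge: perturb the weight of that edge by delta (symmetrically).
edges = [edge for clique in cliques for edge in combinations(clique, 2)]
models = []
for s, t in edges:
    theta = theta_base.copy()
    theta[s, t] += delta
    theta[t, s] += delta
    models.append(theta)
M = len(models)

# Compare the closed form against the sum of the two directed KL divergences.
sigma_0, sigma_1 = np.linalg.inv(models[0]), np.linalg.inv(models[1])
direct = kl_gaussian(sigma_0, sigma_1) + kl_gaussian(sigma_1, sigma_0)
closed_form = symmetrized_kl(models[0], models[1])

# Averaged pairwise quantity (per sample) and elementwise separation between models.
pairs = list(combinations(range(M), 2))
pairwise_bound = sum(symmetrized_kl(models[i], models[j]) for i, j in pairs) / M ** 2
min_separation = min(float(np.max(np.abs(models[i] - models[j]))) for i, j in pairs)
```

Every pair of models in this sketch differs by exactly `delta` in the elementwise norm, which is the separation property the Fano argument relies on.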
The resulting ensemble of graphical models has cardinality $M = \frac{p}{d+1} \binom{d+1}{2}$. The proof then computes the KL-based bound on mutual information in (19) and applies Fano's inequality (9).

V. DISCUSSION

In this paper, we have studied the information-theoretic limits of Gaussian graphical model selection and inverse covariance estimation in the high-dimensional setting. Our analysis yields a set of necessary conditions for consistent graph selection with any method, which matches the scaling of known sufficient conditions [7] for $\ell_1$-regularized maximum likelihood in regimes in which the minimum value scales as $\lambda = \Theta(1/d)$. The tightness of the bounds in other regimes of $\lambda$ is an interesting open question. Furthermore, we derived a set of necessary conditions for inverse covariance estimation, which similarly matches the performance of polynomial-time recovery methods [7]. Our results consider recovery in the elementwise $\ell_\infty$ and Frobenius norms; the tightness of the necessary conditions for recovery in other norms is an interesting open question. At a high level, our analysis is based on a general framework for deriving information-theoretic bounds in which we view the observation process as a communication channel, and may be applicable to other problems as well.

Acknowledgment

The work of WW and KR was supported by NSF grant CCF and AFOSR grant FA. The work of MJW was supported by NSF grants CAREER-CCF and AFOSR-09NL8.

REFERENCES

[1] S. L. Lauritzen, Graphical Models. Oxford: Oxford University Press, 1996.
[2] V. A. Marcenko and L. A. Pastur, "Distribution of eigenvalues for some sets of random matrices," Annals of Probability, vol. 4, no. 1, pp. 457-483, 1967.
[3] M. Yuan and Y. Lin, "Model selection and estimation in the Gaussian graphical model," Biometrika, vol. 94, no. 1, pp. 19-35, 2007.
[4] J. Friedman, T. Hastie, and R. Tibshirani, "Sparse inverse covariance estimation with the graphical lasso," Biostatistics, vol. 9, no. 3, pp. 432-441, 2008.
[5] A. d'Aspremont, O. Banerjee, and L. E. Ghaoui, "First order methods for sparse covariance selection," SIAM Journal on Matrix Analysis and its Applications, vol. 30, no. 1, pp. 56-66, 2008.
[6] A. J. Rothman, P. J. Bickel, E. Levina, and J. Zhu, "Sparse permutation invariant covariance estimation," Electronic Journal of Statistics, vol. 2, pp. 494-515, 2008.
[7] P. Ravikumar, M. J. Wainwright, G. Raskutti, and B. Yu, "High-dimensional covariance estimation by minimizing $\ell_1$-penalized log-determinant divergence," Department of Statistics, UC Berkeley, Tech. Rep. 767, November 2008.
[8] N. Santhanam and M. J. Wainwright, "Information-theoretic limits of selecting binary graphical models in high dimensions," in International Symposium on Information Theory (ISIT), Toronto, Canada, July 2008.
[9] G. Bresler, E. Mossel, and A. Sly, "Reconstruction of Markov random fields from samples: Some easy observations and algorithms," UC Berkeley, Tech. Rep. arXiv.
[10] W. Wang, M. J. Wainwright, and K. Ramchandran, "Information-theoretic bounds on model selection for Gaussian Markov random fields," Department of Statistics, UC Berkeley, Tech. Rep., May 2010.
[11] T. Cover and J. Thomas, Elements of Information Theory. New York: John Wiley and Sons, 1991.
[12] B. Yu, "Assouad, Fano and Le Cam," Research Papers in Probability and Statistics: Festschrift in Honor of Lucien Le Cam, pp. 423-435, 1997.
[13] Y. Yang and A. Barron, "Information-theoretic determination of minimax rates of convergence," Annals of Statistics, vol. 27, no. 5, pp. 1564-1599, 1999.

Summary and Discussion on Simultaneous Analysis of Lasso and Dantzig Selector

Summary and Discussion on Simultaneous Analysis of Lasso and Dantzig Selector Summary ad Discussio o Simultaeous Aalysis of Lasso ad Datzig Selector STAT732, Sprig 28 Duzhe Wag May 4, 28 Abstract This is a discussio o the work i Bickel, Ritov ad Tsybakov (29). We begi with a short

More information

5.1 A mutual information bound based on metric entropy

5.1 A mutual information bound based on metric entropy Chapter 5 Global Fao Method I this chapter, we exted the techiques of Chapter 2.4 o Fao s method the local Fao method) to a more global costructio. I particular, we show that, rather tha costructig a local

More information

Lower bounds on minimax rates for nonparametric regression with additive sparsity and smoothness

Lower bounds on minimax rates for nonparametric regression with additive sparsity and smoothness Lower bouds o miimax rates for oparametric regressio with additive sparsity ad smoothess Garvesh Raskutti 1, Marti J. Waiwright 1,2, Bi Yu 1,2 1 UC Berkeley Departmet of Statistics 2 UC Berkeley Departmet

More information

A Hadamard-type lower bound for symmetric diagonally dominant positive matrices

A Hadamard-type lower bound for symmetric diagonally dominant positive matrices A Hadamard-type lower boud for symmetric diagoally domiat positive matrices Christopher J. Hillar, Adre Wibisoo Uiversity of Califoria, Berkeley Jauary 7, 205 Abstract We prove a ew lower-boud form of

More information

Notes for Lecture 11

Notes for Lecture 11 U.C. Berkeley CS78: Computatioal Complexity Hadout N Professor Luca Trevisa 3/4/008 Notes for Lecture Eigevalues, Expasio, ad Radom Walks As usual by ow, let G = (V, E) be a udirected d-regular graph with

More information

On Random Line Segments in the Unit Square

On Random Line Segments in the Unit Square O Radom Lie Segmets i the Uit Square Thomas A. Courtade Departmet of Electrical Egieerig Uiversity of Califoria Los Ageles, Califoria 90095 Email: tacourta@ee.ucla.edu I. INTRODUCTION Let Q = [0, 1] [0,

More information

A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers

A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers A uified framework for high-dimesioal aalysis of M-estimators with decomposable regularizers Sahad Negahba, UC Berkeley Pradeep Ravikumar, UT Austi Marti Waiwright, UC Berkeley Bi Yu, UC Berkeley NIPS

More information

Lecture 12: February 28

Lecture 12: February 28 10-716: Advaced Machie Learig Sprig 2019 Lecture 12: February 28 Lecturer: Pradeep Ravikumar Scribes: Jacob Tyo, Rishub Jai, Ojash Neopae Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

More information

Accuracy Assessment for High-Dimensional Linear Regression

Accuracy Assessment for High-Dimensional Linear Regression Uiversity of Pesylvaia ScholarlyCommos Statistics Papers Wharto Faculty Research -016 Accuracy Assessmet for High-Dimesioal Liear Regressio Toy Cai Uiversity of Pesylvaia Zijia Guo Uiversity of Pesylvaia

More information

Lecture 2. The Lovász Local Lemma

Lecture 2. The Lovász Local Lemma Staford Uiversity Sprig 208 Math 233A: No-costructive methods i combiatorics Istructor: Ja Vodrák Lecture date: Jauary 0, 208 Origial scribe: Apoorva Khare Lecture 2. The Lovász Local Lemma 2. Itroductio

More information

Lecture 9: Expanders Part 2, Extractors

Lecture 9: Expanders Part 2, Extractors Lecture 9: Expaders Part, Extractors Topics i Complexity Theory ad Pseudoradomess Sprig 013 Rutgers Uiversity Swastik Kopparty Scribes: Jaso Perry, Joh Kim I this lecture, we will discuss further the pseudoradomess

More information

Problem Set 2 Solutions

Problem Set 2 Solutions CS271 Radomess & Computatio, Sprig 2018 Problem Set 2 Solutios Poit totals are i the margi; the maximum total umber of poits was 52. 1. Probabilistic method for domiatig sets 6pts Pick a radom subset S

More information

Lecture 13: Maximum Likelihood Estimation

Lecture 13: Maximum Likelihood Estimation ECE90 Sprig 007 Statistical Learig Theory Istructor: R. Nowak Lecture 3: Maximum Likelihood Estimatio Summary of Lecture I the last lecture we derived a risk (MSE) boud for regressio problems; i.e., select

More information

A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS

A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS J. Japa Statist. Soc. Vol. 41 No. 1 2011 67 73 A RANK STATISTIC FOR NON-PARAMETRIC K-SAMPLE AND CHANGE POINT PROBLEMS Yoichi Nishiyama* We cosider k-sample ad chage poit problems for idepedet data i a

More information

Advanced Analysis. Min Yan Department of Mathematics Hong Kong University of Science and Technology

Advanced Analysis. Min Yan Department of Mathematics Hong Kong University of Science and Technology Advaced Aalysis Mi Ya Departmet of Mathematics Hog Kog Uiversity of Sciece ad Techology September 3, 009 Cotets Limit ad Cotiuity 7 Limit of Sequece 8 Defiitio 8 Property 3 3 Ifiity ad Ifiitesimal 8 4

More information

ECE 901 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization

ECE 901 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization ECE 90 Lecture 4: Maximum Likelihood Estimatio ad Complexity Regularizatio R Nowak 5/7/009 Review : Maximum Likelihood Estimatio We have iid observatios draw from a ukow distributio Y i iid p θ, i,, where

More information

Lecture 12: September 27

Lecture 12: September 27 36-705: Itermediate Statistics Fall 207 Lecturer: Siva Balakrisha Lecture 2: September 27 Today we will discuss sufficiecy i more detail ad the begi to discuss some geeral strategies for costructig estimators.

More information

Lecture #20. n ( x p i )1/p = max

Lecture #20. n ( x p i )1/p = max COMPSCI 632: Approximatio Algorithms November 8, 2017 Lecturer: Debmalya Paigrahi Lecture #20 Scribe: Yua Deg 1 Overview Today, we cotiue to discuss about metric embeddigs techique. Specifically, we apply

More information
