A Dirty Model for Multi-task Learning


Ali Jalali, Sujay Sanghavi, Pradeep Ravikumar, and Chao Ruan (University of Texas at Austin)

Abstract

We consider multi-task learning in the setting of multiple linear regression, where some relevant features could be shared across the tasks. Recent research has studied the use of ℓ1/ℓq norm block-regularizations with q > 1 for such block-sparse structured problems, establishing strong guarantees on recovery even under high-dimensional scaling where the number of features scales with the number of observations. However, these papers also caution that the performance of such block-regularized methods is very dependent on the extent to which the features are shared across tasks. Indeed they show [8] that if the extent of overlap is less than a threshold, or even if parameter values in the shared features are highly uneven, the block ℓ1/ℓq regularization could actually perform worse than simple separate elementwise ℓ1 regularization. Since these caveats depend on the unknown true parameters, we might not know when and which method to apply. Even otherwise, we are far away from a realistic multi-task setting: not only do the sets of relevant features have to be exactly the same across tasks, but their values have to be as well. Here, we ask the question: can we leverage parameter overlap when it exists, but not pay a penalty when it does not? Indeed, this falls under a more general question of whether we can model such dirty data which may not fall into a single neat structural bracket (all block-sparse, or all low-rank, and so on). With the explosion of such dirty high-dimensional data in modern settings, it is vital to develop tools (dirty models) to perform biased statistical estimation tailored to such data. Here, we take a first step, focusing on developing a dirty model for the multiple regression problem. Our method uses a very simple idea: we estimate a superposition of two sets of parameters and regularize them differently. We show, both theoretically and empirically, that our method strictly and noticeably outperforms both ℓ1 and ℓ1/ℓq methods, under high-dimensional scaling and over the entire range of possible overlaps (except at boundary cases, where we match the best method).

1 Introduction: Motivation and Setup

High-dimensional scaling. In fields across science and engineering, we are increasingly faced with problems where the number of variables or features p is larger than the number of observations n. Under such high-dimensional scaling, for any hope of statistically consistent estimation, it becomes vital to leverage any potential structure in the problem, such as sparsity (e.g. in compressed sensing [3] and LASSO [14]), low-rank structure [13, 9], or sparse graphical model structure [12]. It is in such high-dimensional contexts in particular that multi-task learning [4] could be most useful.

Here, multiple tasks share some common structure such as sparsity, and estimating these tasks jointly by leveraging this common structure could be more statistically efficient.

Block-sparse Multiple Regression. A common multiple-task learning setting, and the focus of this paper, is that of multiple regression, where we have r > 1 response variables and a common set of p features or covariates. The r tasks could share certain aspects of their underlying distributions, such as a common variance, but the setting we focus on in this paper is where the response variables have simultaneously sparse structure: the index set of relevant features for each task is sparse, and there is a large overlap of these relevant features across the different regression problems. Such simultaneous sparsity arises in a variety of contexts [15]; indeed, most applications of sparse signal recovery in contexts ranging from graphical model learning, kernel learning, and function estimation have natural extensions to the simultaneous-sparse setting [12, 2, 11]. It is useful to represent the multiple regression parameters via a matrix, where each column corresponds to a task, and each row to a feature. Having simultaneous sparse structure then corresponds to the matrix being largely "block-sparse", where each row is either all zero or mostly non-zero, and the number of non-zero rows is small. A lot of recent research in this setting has focused on ℓ1/ℓq norm regularizations, for q > 1, that encourage the parameter matrix to have such block-sparse structure. Particular examples include results using the ℓ1/ℓ∞ norm [16, 15, 8], and the ℓ1/ℓ2 norm [7, 10].

Dirty Models. Block-regularization is heavy-handed in two ways. By strictly encouraging shared sparsity, it assumes that all relevant features are shared, and hence suffers under settings, arguably more realistic, where each task depends on features specific to itself in addition to the ones that are common. The second concern with such block-sparse regularizers is that the ℓ1/ℓq norms can be shown to encourage the entries in the non-sparse rows to take nearly identical values. Thus we are far away from the original goal of multi-task learning: not only do the sets of relevant features have to be exactly the same, but their values have to be as well. Indeed, recent research into such regularized methods [8, 10] cautions against the use of block-regularization in regimes where the supports and values of the parameters for each task can vary widely. Since the true parameter values are unknown, that is a worrisome caveat.

We thus ask the question: can we learn multiple regression models by leveraging whatever overlap of features there exists, and without requiring the parameter values to be near identical? Indeed, this is an instance of a more general question of whether we can estimate statistical models where the data may not fall cleanly into any one structural bracket (sparse, block-sparse, and so on). With the explosion of dirty high-dimensional data in modern settings, it is vital to investigate estimation of corresponding dirty models, which might require new approaches to biased high-dimensional estimation. In this paper we take a first step, focusing on such dirty models for a specific problem: simultaneously sparse multiple regression. Our approach uses a simple idea: while any one structure might not capture the data, a superposition of structural classes might. Our method thus searches for a parameter matrix that can be decomposed into a row-sparse matrix (corresponding to the overlapping or shared features) and an elementwise sparse matrix (corresponding to the non-shared features).
As we show both theoretically and empirically, with this simple fix we are able to leverage any extent of shared features, while allowing disparities in the supports and values of the parameters, so that we are always better than both the Lasso and block-sparse regularizers (at times remarkably so).

The rest of the paper is organized as follows. In Sec. 2, the basic definitions and setup of the problem are presented. The main results of the paper are discussed in Sec. 3. Experimental results and simulations are demonstrated in Sec. 4.

Notation: For any matrix M, we denote its j-th row as M_j, and its k-th column as M^(k). The set of all non-zero rows (i.e. all rows with at least one non-zero element) is denoted by RowSupp(M), and its support by Supp(M). Also, for any matrix M, let ‖M‖_{1,1} := Σ_{j,k} |M_j^(k)|, i.e. the sum of the absolute values of the elements, and ‖M‖_{1,∞} := Σ_j ‖M_j‖_∞, where ‖M_j‖_∞ := max_k |M_j^(k)|.
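As a small illustration of this notation (not part of the original paper), the following Python sketch computes RowSupp(M) and the two matrix norms defined above using numpy.

```python
import numpy as np

def row_supp(M):
    """RowSupp(M): indices of rows with at least one non-zero entry."""
    return np.flatnonzero(np.any(M != 0, axis=1))

def norm_1_1(M):
    """||M||_{1,1}: sum of the absolute values of all entries."""
    return np.abs(M).sum()

def norm_1_inf(M):
    """||M||_{1,inf}: sum over rows of the largest absolute entry in each row."""
    return np.abs(M).max(axis=1).sum()

M = np.array([[1.0, -2.0, 0.0],
              [0.0,  0.0, 0.0],
              [0.5,  0.0, 3.0]])
print(row_supp(M))                   # [0 2]
print(norm_1_1(M), norm_1_inf(M))    # 6.5 5.0
```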

2 Problem Set-up and Our Method

Multiple regression. We consider the following standard multiple linear regression model:

y^(k) = X^(k) θ*^(k) + w^(k),   k = 1, ..., r,

where y^(k) ∈ R^n is the response for the k-th task, regressed on the design matrix X^(k) ∈ R^{n×p} (possibly different across tasks), while w^(k) ∈ R^n is the noise vector. We assume each w^(k) is drawn independently from N(0, σ² I_{n×n}). The total number of tasks or target variables is r, the number of features is p, while the number of samples we have for each task is n. For notational convenience, we collate these quantities into matrices Y ∈ R^{n×r} for the responses, Θ* ∈ R^{p×r} for the regression parameters, and W ∈ R^{n×r} for the noise.

Dirty Model. In this paper we are interested in estimating the true parameter Θ* from data by leveraging any (unknown) extent of simultaneous-sparsity. In particular, certain rows of Θ* would have many non-zero entries, corresponding to features shared by several tasks ("shared" rows), while certain rows would be elementwise sparse, corresponding to those features which are relevant for some tasks but not all ("non-shared" rows), while certain rows would have all zero entries, corresponding to those features that are not relevant to any task. We are interested in estimators Θ̂ that automatically adapt to different levels of sharedness, and yet enjoy the following guarantees:

Support recovery: We say an estimator Θ̂ successfully recovers the true signed support if sign(Supp(Θ̂)) = sign(Supp(Θ*)). We are interested in deriving sufficient conditions under which the estimator succeeds. We note that this is stronger than merely recovering the row-support of Θ*, which is the union of its supports for the different tasks. In particular, denoting U_k for the support of the k-th column of Θ*, the row-support is U = ∪_k U_k.

Error bounds: We are also interested in providing bounds on the elementwise ℓ∞ norm error of the estimator Θ̂,

‖Θ̂ − Θ*‖_{∞,∞} = max_{j=1,...,p} max_{k=1,...,r} |Θ̂_j^(k) − Θ*_j^(k)|.

2.1 Our Method

Our method explicitly models the dirty block-sparse structure. We estimate a sum of two parameter matrices B and S with different regularizations for each: encouraging block-structured row-sparsity in B and elementwise sparsity in S. The corresponding "clean" models would either just use block-sparse regularizations [8, 10] or just elementwise sparsity regularizations [14, 18], so that either method would perform better in certain suited regimes. Interestingly, as we will see in the main results, by explicitly allowing the estimate to have both a block-sparse and an elementwise sparse component, we are able to outperform both classes of these "clean" models, for all regimes of Θ*.

Algorithm 1 (Dirty Block Sparse). Solve the following convex optimization problem:

(Ŝ, B̂) ∈ argmin_{S,B}  (1/(2n)) Σ_{k=1}^{r} ‖y^(k) − X^(k)(S^(k) + B^(k))‖₂² + λ_s ‖S‖_{1,1} + λ_b ‖B‖_{1,∞}.   (1)

Then output Θ̂ = B̂ + Ŝ.

3 Main Results and Their Consequences

We now provide precise statements of our main results. A number of recent results have shown that the Lasso [14, 18] and ℓ1/ℓ∞ block-regularization [8] methods succeed in recovering signed supports with controlled error bounds under high-dimensional scaling regimes. Our first two theorems extend these results to our dirty model setting. In Theorem 1, we consider the case of deterministic design matrices X^(k), and provide sufficient conditions guaranteeing signed support recovery and elementwise ℓ∞ norm error bounds. In Theorem 2, we specialize this theorem to the case where the rows of the design matrices are random, drawn from a general zero-mean Gaussian distribution: this allows us to provide the scaling of the number of observations n required in order to guarantee signed support recovery and bounded elementwise ℓ∞ norm error.
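Before turning to the analysis, here is a minimal sketch of the convex program (1) above, written with cvxpy (a generic solver choice assumed here; the paper does not prescribe an implementation). The elementwise absolute sum encodes ‖S‖_{1,1} and the sum of row-wise maxima encodes ‖B‖_{1,∞}.

```python
import cvxpy as cp

def dirty_block_sparse(Xs, ys, lam_s, lam_b):
    """Sketch of program (1): returns (S_hat, B_hat, Theta_hat).

    Xs, ys -- lists of per-task design matrices X^(k) (n x p) and responses y^(k) (n,).
    """
    n, p = Xs[0].shape
    r = len(Xs)
    S = cp.Variable((p, r))
    B = cp.Variable((p, r))
    # Squared loss summed over tasks, scaled as in (1).
    loss = sum(cp.sum_squares(ys[k] - Xs[k] @ (S[:, k] + B[:, k])) for k in range(r))
    # ||S||_{1,1} (elementwise) and ||B||_{1,inf} (sum of row maxima).
    penalty = lam_s * cp.sum(cp.abs(S)) + lam_b * cp.sum(cp.max(cp.abs(B), axis=1))
    cp.Problem(cp.Minimize(loss / (2 * n) + penalty)).solve()
    return S.value, B.value, S.value + B.value
```

For the small synthetic problems of Section 4, a generic solver of this form is sufficient; the regularization levels λ_s and λ_b would be chosen by cross-validation as described there.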

Our third result is the most interesting in that it explicitly quantifies the performance gains of our method vis-a-vis Lasso and the ℓ1/ℓ∞ block-regularization method. Since this entailed finding the precise constants underlying the earlier theorems, and a correspondingly more delicate analysis, we follow Negahban and Wainwright [8] and focus on the case where there are two tasks (i.e. r = 2), and where we have standard Gaussian design matrices as in Theorem 2. Further, while each of the two tasks depends on s features, only a fraction α of these are common. It is then interesting to see how the behaviors of the different regularization methods vary with the extent of overlap α.

Comparisons. Negahban and Wainwright [8] show that there is actually a phase transition in the scaling of the probability of successful signed support-recovery with the number of observations. Denote a particular rescaling of the sample size n as

θ_Lasso(n, p, α) = n / (2 s log(p − s)).

Then, as Wainwright [18] shows, when the rescaled number of samples scales as θ_Lasso > 1 + δ for any δ > 0, Lasso succeeds in recovering the signed support of all columns with probability converging to one. But when the sample size scales as θ_Lasso < 1 − δ for any δ > 0, Lasso fails with probability converging to one. For the ℓ1/ℓ∞-regularized multiple linear regression, define a similar rescaled sample size

θ_{1,∞}(n, p, α) = n / (s log(p − (2 − α)s)).

Then, as Negahban and Wainwright [8] show, there is again a transition in the probability of success from near zero to near one, at the rescaled sample size of θ_{1,∞} = (4 − 3α). Thus, for α < 2/3 ("less sharing") Lasso would perform better since its transition is at a smaller sample size, while for α > 2/3 ("more sharing") the ℓ1/ℓ∞ regularized method would perform better. As we show in our third theorem, the phase transition for our method occurs at the rescaled sample size of θ_{1,∞} = (2 − α), which is strictly before either the Lasso or the ℓ1/ℓ∞ regularized method, except at the boundary cases: α = 0, i.e. the case of no sharing, where we match Lasso, and α = 1, i.e. full sharing, where we match ℓ1/ℓ∞. Everywhere else, we strictly outperform both methods. Figure 1 shows the empirical performance of each of the three methods; as can be seen, they agree very well with the theoretical analysis. (Further details are in the experiments, Section 4.)

3.1 Sufficient Conditions for Deterministic Designs

We first consider the case where the design matrices X^(k), for k = 1, ..., r, are deterministic, and start by specifying the assumptions we impose on the model. We note that similar sufficient conditions for the deterministic X^(k) case were imposed in papers analyzing Lasso [18] and block-regularization methods [8, 10].

A0 (Column Normalization): ‖X_j^(k)‖₂ ≤ √(2n) for all j = 1, ..., p, k = 1, ..., r.

Let U_k denote the support of the k-th column of Θ*, and U = ∪_k U_k denote the union of supports across the tasks. Then we require that:

A1 (Incoherence Condition): γ_b := 1 − max_{j ∈ U^c} Σ_{k=1}^{r} ‖⟨X_j^(k), X_{U_k}^(k)⟩ ⟨X_{U_k}^(k), X_{U_k}^(k)⟩^{-1}‖₁ > 0.

We will also find it useful to define γ_s := 1 − max_{1≤k≤r} max_{j ∈ U_k^c} ‖⟨X_j^(k), X_{U_k}^(k)⟩ ⟨X_{U_k}^(k), X_{U_k}^(k)⟩^{-1}‖₁. Note that by the incoherence condition A1, we have γ_s > 0.

A2 (Eigenvalue Condition): C_min := min_{1≤k≤r} λ_min( (1/n) ⟨X_{U_k}^(k), X_{U_k}^(k)⟩ ) > 0.

A3 (Boundedness Condition): D_max := max_{1≤k≤r} ‖( (1/n) ⟨X_{U_k}^(k), X_{U_k}^(k)⟩ )^{-1}‖_∞ < ∞.

Further, we require the regularization penalties to be set as

λ_s > (2(2 − γ_s) σ / γ_s) √(log(pr)/n)   and   λ_b > (2(2 − γ_b) σ / γ_b) √(log(pr)/n).   (2)
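Returning to the comparison of phase transitions above, the following sketch (not from the paper) evaluates the approximate sample sizes at which the three transitions occur for the two-task case, using the rescalings just defined: n ≈ 2 s log(p − s) for Lasso, n ≈ (4 − 3α) s log(p − (2 − α)s) for ℓ1/ℓ∞, and n ≈ (2 − α) s log(p − (2 − α)s) for the dirty model.

```python
import numpy as np

def transition_sample_sizes(p, s, alpha):
    """Approximate phase-transition sample sizes for the two-task comparison."""
    n_lasso = 2.0 * s * np.log(p - s)                                 # Lasso [18]
    n_linf = (4.0 - 3.0 * alpha) * s * np.log(p - (2.0 - alpha) * s)  # l1/l_inf [8]
    n_dirty = (2.0 - alpha) * s * np.log(p - (2.0 - alpha) * s)       # dirty model (Theorem 3)
    return n_lasso, n_linf, n_dirty

for alpha in (0.3, 2.0 / 3.0, 0.8):
    print(alpha, np.round(transition_sample_sizes(p=512, s=51, alpha=alpha)))
```

For every α strictly between 0 and 1 the dirty-model threshold is the smallest of the three, matching the qualitative picture in Figure 1.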

Figure 1: Probability of success in recovering the true signed support using the dirty model, Lasso, and the ℓ1/ℓ∞ regularizer. For a 2-task problem, the probability of success is plotted against the control parameter θ for different values of the feature-overlap fraction α: (a) α = 0.3, (b) α = 2/3, (c) α = 0.8, with p ∈ {128, 256, 512}. In the regimes where Lasso is better than, as good as, and worse than the ℓ1/ℓ∞ regularizer ((a), (b) and (c) respectively), the dirty model outperforms both of the methods, i.e., it requires fewer observations for successful recovery of the true signed support compared to Lasso and the ℓ1/ℓ∞ regularizer. Here s = p/10 always.

Theorem 1. Suppose A0-A3 hold, and that we obtain the estimate Θ̂ from our algorithm with regularization parameters chosen according to (2). Then, with probability at least 1 − c₁ exp(−c₂ n), we are guaranteed that the convex program (1) has a unique optimum and

(a) The estimate Θ̂ has no false inclusions, and has bounded ℓ∞ norm error, so that

Supp(Θ̂) ⊆ Supp(Θ*),   and   ‖Θ̂ − Θ*‖_{∞,∞} ≤ √( (4σ² log(pr)) / (C_min n) ) + λ_s D_max =: b_min.

(b) sign(Supp(Θ̂)) = sign(Supp(Θ*)), provided that min_{(j,k) ∈ Supp(Θ*)} |θ*_j^(k)| > b_min.

Here the positive constants c₁, c₂ depend only on γ_s, γ_b, λ_s, λ_b and σ, but are otherwise independent of n, p, r, the problem dimensions of interest.

Remark: Condition (a) guarantees that the estimate will have no false inclusions; i.e. all included features will be relevant. If, in addition, we require that it have no false exclusions, and that it recover the support exactly, we need to impose the assumption in (b) that the non-zero elements are large enough to be detectable above the noise.

3.2 General Gaussian Designs

Often the design matrices consist of samples from a Gaussian ensemble. Suppose that for each task k = 1, ..., r the design matrix X^(k) ∈ R^{n×p} is such that each row X_i^(k) ∈ R^p is a zero-mean Gaussian random vector with covariance matrix Σ^(k) ∈ R^{p×p}, and is independent of every other row. Let Σ^(k)_{V,U} ∈ R^{|V|×|U|} be the submatrix of Σ^(k) with rows corresponding to V and columns to U. We require these covariance matrices to satisfy the following conditions:

C1 (Incoherence Condition): γ_b := 1 − max_{j ∈ U^c} Σ_{k=1}^{r} ‖Σ^(k)_{j, U_k} (Σ^(k)_{U_k, U_k})^{-1}‖₁ > 0.

C2 (Eigenvalue Condition): C_min := min_{1≤k≤r} λ_min(Σ^(k)_{U_k, U_k}) > 0, so that the minimum eigenvalue is bounded away from zero.

C3 (Boundedness Condition): D_max := max_{1≤k≤r} ‖(Σ^(k)_{U_k, U_k})^{-1}‖_∞ < ∞.

These conditions are analogues of the conditions for deterministic designs; they are now imposed on the covariance matrices of the (randomly generated) rows of the design matrices. Further, defining s := max_k |U_k|, we require the regularization penalties to be set as

λ_s > (4σ √(C_min log(pr))) / (γ_s (√(n C_min) − √(s log(pr))))   and
λ_b > (4σ √(C_min r (r log 2 + log p))) / (γ_b (√(n C_min) − √(s r (r log 2 + log p)))).   (3)

Theorem 2. Suppose assumptions C1-C3 hold, and that the number of samples scales as

n > max{ s log(pr) / (C_min γ_s²),  s r (r log 2 + log p) / (C_min γ_b²) },

up to constant factors. Suppose we obtain the estimate Θ̂ from our algorithm with regularization parameters chosen according to (3). Then, with probability at least 1 − c₁ exp(−c₂ (r log 2 + log p)) − c₃ exp(−c₄ log(rs)) for some positive numbers c₁-c₄, we are guaranteed that the algorithm estimate Θ̂ is unique and satisfies the following conditions:

(a) The estimate Θ̂ has no false inclusions, and has bounded ℓ∞ norm error, so that

Supp(Θ̂) ⊆ Supp(Θ*),   and   ‖Θ̂ − Θ*‖_{∞,∞} ≤ √( (50σ² log(rs)) / (C_min n) ) + λ_s ( √(4s)/C_min + D_max ) =: g_min.

(b) sign(Supp(Θ̂)) = sign(Supp(Θ*)), provided that min_{(j,k) ∈ Supp(Θ*)} |θ*_j^(k)| > g_min.

3.3 Sharp Transition for 2-Task Gaussian Designs

This is one of the most important results of this paper. Here, we perform a more delicate and finer analysis to establish the precise quantitative gains of our method. We focus on the special case where r = 2 and the design matrix has rows generated from the standard Gaussian distribution N(0, I_{p×p}), so that C1-C3 hold with C_min = D_max = 1. As we will see both analytically and experimentally, our method strictly outperforms both Lasso and ℓ1/ℓ∞ block-regularization in all cases, except at the extreme endpoints of no support sharing (where it matches that of Lasso) and full support sharing (where it matches that of ℓ1/ℓ∞). We now present our analytical results; the empirical comparisons are presented next in Section 4. The results will be in terms of a particular rescaling of the sample size n as

θ(n, p, s, α) := n / ((2 − α) s log(p − (2 − α)s)).

We will also require the assumptions that

F1: λ_s > (4σ √((1 − s/n)(log r + log(p − (2 − α)s)))) / (√n − √((2 − α) s (log r + log(p − (2 − α)s)))),

F2: λ_b > (4σ √((1 − s/n) r (r log 2 + log(p − (2 − α)s)))) / (√n − √((2 − α/2) s r (r log 2 + log(p − (2 − α)s)))).

Theorem 3. Consider a 2-task regression problem (n, p, s, α), where the design matrix has rows generated from the standard Gaussian distribution N(0, I_{p×p}).

Suppose that max_{j ∈ B} |Θ*_j^(1) − Θ*_j^(2)| = o(λ_s), where B is the set of rows of Θ* in which both entries are non-zero. Then the estimate Θ̂ of the problem (1) satisfies the following:

(Success) Suppose the regularization coefficients satisfy F1-F2. Further, assume that the number of samples scales as θ(n, p, s, α) > 1. Then, with probability at least 1 − c₁ exp(−c₂ n) for some positive numbers c₁ and c₂, we are guaranteed that Θ̂ satisfies the support-recovery and ℓ∞ error bound conditions (a)-(b) in Theorem 2.

(Failure) If θ(n, p, s, α) < 1, there is no solution (B̂, Ŝ), for any choices of λ_s and λ_b, such that sign(Supp(Θ̂)) = sign(Supp(Θ*)).

We note that we require the gap |Θ*_j^(1) − Θ*_j^(2)| to be small only on rows where both entries are non-zero. As we show in a more general theorem in the appendix, even in the case where the gap is large, the dependence of the sample scaling on the gap is quite weak.

4 Empirical Results

In this section, we investigate the performance of our dirty block sparse estimator on synthetic and real-world data. The synthetic experiments explore the accuracy of Theorem 3, and compare our estimator with LASSO and the ℓ1/ℓ∞ regularizer. We see that Theorem 3 is very accurate indeed. Next, we apply our method to a real-world dataset containing hand-written digits for classification, a multi-task regression dataset with r = 10 tasks; again we compare against LASSO and ℓ1/ℓ∞. On this real-world dataset, we show that the dirty model outperforms both LASSO and ℓ1/ℓ∞ in practice. For each method, the parameters are chosen via cross-validation; see the supplemental material for more details.

4.1 Synthetic Data Simulation

We consider an r = 2-task regression problem as discussed in Theorem 3, for a range of parameters (n, p, s, α). The design matrices X have each entry i.i.d. Gaussian with mean 0 and variance 1. For each fixed set of (n, s, p, α), we generate 100 instances of the problem. In each instance, given p, s, α, the locations of the non-zero entries of the true Θ* are chosen at random; each non-zero entry is then chosen to be i.i.d. Gaussian with mean 0 and variance 1. n samples are then generated from this model. We then attempt to estimate Θ* using three methods: our dirty model, the ℓ1/ℓ∞ regularizer, and LASSO. In each case, and for each instance, the penalty regularizer coefficients are found by cross-validation. After solving the three problems, we compare the signed support of the solution with the true signed support and decide whether or not the program was successful in signed support recovery. We describe this process in more detail in this section.

Performance Analysis: We ran the algorithm for three different values of the overlap ratio α ∈ {0.3, 2/3, 0.8}, with three different numbers of features p ∈ {128, 256, 512}. For any instance of the problem (n, p, s, α), if the recovered matrix Θ̂ has the same signed support as the true Θ*, then we count it as a success, otherwise a failure (even if one element has a different sign, we count it as a failure). As Theorem 3 predicts and Fig. 1 shows, the right scaling for the number of observations is (2 − α) s log(p − (2 − α)s), under which all curves stack on top of each other for each value of α. Also, the number of observations required by the dirty model for true signed support recovery is always less than for both LASSO and the ℓ1/ℓ∞ regularizer. Fig. 1(a) shows the probability of success for the case α = 0.3 (when LASSO is better than the ℓ1/ℓ∞ regularizer) and that the dirty model outperforms both methods. When α = 2/3 (see Fig. 1(b)), LASSO and the ℓ1/ℓ∞ regularizer perform the same; but the dirty model requires almost 33% fewer observations for the same performance. As α grows toward 1, e.g. α = 0.8 as shown in Fig. 1(c), ℓ1/ℓ∞ performs better than LASSO. Still, the dirty model performs better than both methods in this case as well.
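For reference, a minimal sketch of one trial of the simulation just described is given below (assumptions: numpy only, an illustrative noise level σ, and a simple tolerance-based sign check; the actual experiments select λ_s and λ_b by cross-validation and repeat 100 trials per configuration).

```python
import numpy as np

def make_instance(n, p, s, alpha, sigma=0.5, seed=0):
    """One 2-task instance with feature-overlap fraction alpha (cf. Sec. 4.1)."""
    rng = np.random.default_rng(seed)
    n_shared = int(round(alpha * s))
    shared = rng.choice(p, size=n_shared, replace=False)
    rest = np.setdiff1d(np.arange(p), shared)
    extra = rng.choice(rest, size=2 * (s - n_shared), replace=False)
    supports = [np.concatenate([shared, extra[:s - n_shared]]),
                np.concatenate([shared, extra[s - n_shared:]])]
    Theta = np.zeros((p, 2))
    for k in range(2):
        Theta[supports[k], k] = rng.standard_normal(s)       # N(0,1) non-zero entries
    Xs = [rng.standard_normal((n, p)) for _ in range(2)]      # standard Gaussian designs
    ys = [Xs[k] @ Theta[:, k] + sigma * rng.standard_normal(n) for k in range(2)]
    return Xs, ys, Theta

def signed_support_recovered(Theta_hat, Theta, tol=1e-3):
    """Success only if every entry's sign (after thresholding) matches the truth."""
    signs_hat = np.sign(np.where(np.abs(Theta_hat) > tol, Theta_hat, 0.0))
    return bool(np.array_equal(signs_hat, np.sign(Theta)))
```

A trial then consists of calling the dirty-model solver sketched in Section 2 on (Xs, ys) and checking signed_support_recovered on the returned Θ̂; the empirical success probability at a given n is the fraction of successful trials.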

Figure 2: Verification of the result of Theorem 3 on the behavior of the phase transition threshold as the parameter α is changed in a 2-task (n, p, s, α) problem, for the dirty model, LASSO and the ℓ1/ℓ∞ regularizer. The x-axis is the shared support parameter α, with curves for p ∈ {128, 256, 512}; the y-axis is n / (s log(p − (2 − α)s)), where n is the number of samples at which the threshold was observed. Here s = p/10. Our dirty model method shows a gain in sample complexity over the entire range of sharing α. The pre-constant in Theorem 3 is also validated.

                                        Our Model        ℓ1/ℓ∞        LASSO
  Training set size 10:
    Average Classification Error          8.6%            9.9%         10.8%
    Variance of Error                     0.53%           0.64%        0.5%
    Average Row Support Size              B: 65, B + S:
    Average Support Size                  S: 8,  B + S:
  Second training set size:
    Average Classification Error          3.0%            3.5%         4.%
    Variance of Error                     0.56%           0.6%         0.68%
    Average Row Support Size              B:     B + S:
    Average Support Size                  S: 34, B + S:
  Third training set size:
    Average Classification Error           .%             3.%           .8%
    Variance of Error                     0.57%           0.68%        0.85%
    Average Row Support Size              B: 70, B + S:
    Average Support Size                  S: 67, B + S:

Table 1: Handwriting Classification Results for our model, ℓ1/ℓ∞ and LASSO.

Scaling Verification: To verify that the phase transition threshold changes linearly with α as predicted by Theorem 3, we plot the phase transition threshold versus α. For five different values of α ∈ {0.05, 0.3, 2/3, 0.8, 0.95} and three different values of p ∈ {128, 256, 512}, we find the phase transition threshold for the dirty model, LASSO and the ℓ1/ℓ∞ regularizer. We consider the point where the probability of success in recovery of the signed support exceeds 50% as the phase transition threshold, and we find this point by interpolation on the closest two points. Fig. 2 shows that the phase transition threshold for the dirty model is always lower than the phase transition thresholds for LASSO and the ℓ1/ℓ∞ regularizer.

4.2 Handwritten Digits Dataset

We use the handwritten digit dataset [1], containing features of handwritten numerals (0-9) extracted from a collection of Dutch utility maps. This dataset has been used by a number of papers [17, 6] as a reliable dataset for handwritten recognition algorithms. There are thus r = 10 tasks, and each handwritten sample consists of p = 649 features. Table 1 shows the results of our analysis for different sizes of the training set. We measure the classification error for each digit to get the 10-vector of errors. Then, we find the average error and the variance of the error vector to show how the error is distributed over all tasks. We compare our method with the ℓ1/ℓ∞ regularizer method and LASSO. Again, in all methods, the parameters are chosen via cross-validation. For our method we separate out the B and S matrices that our method finds, so as to illustrate how many features it identifies as shared and how many as non-shared. For the other methods we just report the straight row-support and support numbers, since they do not make such a separation.

Acknowledgements

We acknowledge support from NSF grant IIS-084 and the NSF CAREER program.

References

[1] A. Asuncion and D.J. Newman. UCI Machine Learning Repository, http://www.ics.uci.edu/~mlearn/MLRepository.html. University of California, School of Information and Computer Science, Irvine, CA, 2007.
[2] F. Bach. Consistency of the group lasso and multiple kernel learning. Journal of Machine Learning Research, 9:1179-1225, 2008.
[3] R. Baraniuk. Compressive sensing. IEEE Signal Processing Magazine, 24(4):118-121, 2007.
[4] R. Caruana. Multitask learning. Machine Learning, 28:41-75, 1997.
[5] C. Zhang and J. Huang. Model selection consistency of the lasso selection in high-dimensional linear regression. Annals of Statistics, 36:1567-1594, 2008.
[6] X. He and P. Niyogi. Locality preserving projections. In NIPS, 2003.
[7] K. Lounici, A. B. Tsybakov, M. Pontil, and S. A. van de Geer. Taking advantage of sparsity in multi-task learning. In 22nd Conference On Learning Theory (COLT), 2009.
[8] S. Negahban and M. J. Wainwright. Joint support recovery under high-dimensional scaling: Benefits and perils of ℓ1,∞-regularization. In Advances in Neural Information Processing Systems (NIPS), 2008.
[9] S. Negahban and M. J. Wainwright. Estimation of (near) low-rank matrices with noise and high-dimensional scaling. In ICML, 2010.
[10] G. Obozinski, M. J. Wainwright, and M. I. Jordan. Support union recovery in high-dimensional multivariate regression. Annals of Statistics, 2010.
[11] P. Ravikumar, H. Liu, J. Lafferty, and L. Wasserman. Sparse additive models. Journal of the Royal Statistical Society, Series B.
[12] P. Ravikumar, M. J. Wainwright, and J. Lafferty. High-dimensional Ising model selection using ℓ1-regularized logistic regression. Annals of Statistics, 2009.
[13] B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. In Allerton Conference, Allerton House, Illinois, 2007.
[14] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(1):267-288, 1996.
[15] J. A. Tropp, A. C. Gilbert, and M. J. Strauss. Algorithms for simultaneous sparse approximation. Signal Processing, Special issue on Sparse approximations in signal and image processing, 86:572-602, 2006.
[16] B. Turlach, W.N. Venables, and S.J. Wright. Simultaneous variable selection. Technometrics, 47:349-363, 2005.
[17] M. van Breukelen, R.P.W. Duin, D.M.J. Tax, and J.E. den Hartog. Handwritten digit recognition by combined classifiers. Kybernetika, 34(4):381-386, 1998.
[18] M. J. Wainwright. Sharp thresholds for noisy and high-dimensional recovery of sparsity using ℓ1-constrained quadratic programming (Lasso). IEEE Transactions on Information Theory, 55:2183-2202, 2009.
