FuSSO: Functional Shrinkage and Selection Operator

Junier B. Oliva, Barnabás Póczos, Timothy Verstynen, Aarti Singh, Jeff Schneider (Carnegie Mellon University); Fang-Cheng Yeh, Wen-Yih Tseng (National Taiwan University)

Abstract

We present the FuSSO, a functional analogue to the LASSO, that efficiently finds a sparse set of functional input covariates to regress a real-valued response against. The FuSSO does so in a semi-parametric fashion, making no parametric assumptions about the nature of the input functional covariates and assuming a linear form for the mapping of functional covariates to the response. We provide a statistical backing for use of the FuSSO via a proof of asymptotic sparsistency under various conditions. Furthermore, we observe good results on both synthetic and real-world data.

1 Introduction

Modern data collection has allowed us to collect not just more data, but more complex data. In particular, complex objects like sets, distributions, and functions are becoming prevalent in many domains. It would be beneficial to perform machine learning tasks over these complex objects. However, many existing techniques cannot handle complex, possibly infinite-dimensional, objects; hence one often resorts to the ad-hoc technique of representing these complex objects by arbitrary summary statistics.

In this paper, we look to perform a regression task when dealing with functional data. Specifically, we look to regress a mapping that takes in many functional input covariates and outputs a real value. Moreover, since we are considering many functional covariates, possibly many more than the number of instances in one's data-set, we look to find an estimator that performs feature selection by only regressing on a subset of all possible input functional covariates. To this end we present the Functional Shrinkage and Selection Operator (FuSSO), for performing sparse functional regression in a principled, semi-parametric manner.

(Appearing in Proceedings of the 17th International Conference on Artificial Intelligence and Statistics (AISTATS) 2014, Reykjavik, Iceland. JMLR: W&CP volume 33. Copyright 2014 by the authors.)

Indeed, there are a multitude of applications and domains where the study of a mapping that takes in a functional input and outputs a real value is of interest. That is, if $\mathcal{I}$ is some class of input functions with domain $\Psi \subseteq \mathbb{R}$ and range $\mathbb{R}$, then one may be interested in a mapping $h : \mathcal{I} \to \mathbb{R}$, $Y \approx h(f)$ (Figure 1a). Examples include: a mapping that takes in the time series of a commodity's price in the past ($f$ is a function with a domain of time and a range of price) and outputs the expected price of the commodity in the near future; also, a mapping that takes a patient's cardiac-monitor time series and outputs a health index. Recently, work by [8] has explored this type of regression problem when the input function is a distribution. Furthermore, the general case of an arbitrary functional input is related to functional analysis [2]. However, often it is expected that the response one is interested in regressing depends on not just one, but many functions. That is, it may be fruitful to consider a mapping $h : \mathcal{I}_1 \times \ldots \times \mathcal{I}_p \to \mathbb{R}$, $Y \approx h(f_1, \ldots, f_p)$ (Figure 1b). For instance, this is likely the case in regressing the future price of a commodity, since the commodity's future price depends not only on the history of its own price, but also on the histories of other commodities' prices. A response's dependence on multiple functional covariates is especially common in neurological data, where thousands of voxels in the brain may each contain a corresponding function.
In fact, in such domains it is not uncommon for the number of input functional covariates to far exceed the number of training instances in one's data-set. Thus, it would be beneficial to have an estimator that is sparse in the number of functional covariates used to regress the response against. That is, we wish to find an estimate $\hat{h}_S$ that depends only on a small subset $\{i_1, \ldots, i_S\} \subseteq \{1, \ldots, p\}$, such that $\hat{h}(f_1, \ldots, f_p) = \hat{h}_S(f_{i_1}, \ldots, f_{i_S})$ (Figure 1c). Here we present a semi-parametric estimator, the FuSSO (Functional Shrinkage and Selection Operator), to perform sparse regression with multiple input functional covariates and a real-valued response.

Figure 1: (a) Model where the mapping takes in a single function $f$ and produces a real value. (b) Model where the response depends on multiple input functions $f_1, \ldots, f_p$. (c) Sparse model where the response depends on a sparse subset of the input functions $f_1, \ldots, f_p$.

No parametric assumptions are made on the nature of the input functions. We shall assume that the response is the result of a sparse set of linear combinations of the input functions and other non-parametric functions $\{g_j\}$: $Y = \sum_j \langle f_j, g_j \rangle$. The resulting method is a LASSO-like estimator that effectively zeros out entire functions from consideration in regressing the response.

Our contributions are as follows. We introduce the FuSSO, an estimator for performing regression with many functional covariates and a real-valued response. Furthermore, we provide a theoretical backing for the FuSSO estimator via a proof of asymptotic sparsistency under certain conditions. We also illustrate the estimator with applications on synthetic data, as well as in regressing the age of a subject given orientation distribution function (dODF) [14] data for the subject's white matter.

2 Related Work

As previously mentioned, [8] recently explored regression with a mapping that takes in a probability density function and outputs a real value. Furthermore, [7] studies the case when both the input and the output are distributions. In addition, functional analysis [2] relates to the study of functional data. In all of these works, the mappings studied take in only one functional covariate. Based on them, it is not immediately evident how to expand on these ideas to develop an estimator that simultaneously performs regression and feature selection with multiple functional covariates. To the best of our knowledge, there has been no prior work studying sparse mappings that take multiple functional inputs and produce a real-valued output.

LASSO-like regression estimators that work with functional data include the following. In [6], one has a functional output and several real-valued covariates; there, the estimator finds a sparse set of functions to scale by the real-valued covariates to produce the functional response. Also, [17] and [3] study the case where one has a single functional covariate $f$ and a real-valued response that depends linearly on $f$ through some function $g$: $Y = \langle f, g \rangle = \int f g$. In [17] the estimator searches for sparsity across wavelet basis projection coefficients. In [3], sparsity is achieved in the time domain of the $d$th derivative of $g$; i.e. $D^d g(t) = 0$ for many values of $t$, where $D^d$ is the differential operator. Hence, roughly speaking, [17] and [3] look for sparsity across the frequency and time domains, respectively, for the regressing function $g$. However, these methods do not consider the case where one has many input functional covariates $\{f_1, \ldots, f_p\}$ and needs to choose among them. That is, [17] and [3] do not provide a method to select among functional covariates in a fashion analogous to how the LASSO selects among real-valued covariates. Lastly, it is worth noting that our estimator will have an additive linear model, $Y = \sum_j \langle f_j, g_j \rangle$, where we search for $\{g_j\}$ in a broad, non-parametric family such that many $g_j$ are the zero function.
Such a task is similar in nature to the SpAM estimator [9], in which one also has an additive model $Y = \sum_j g_j(X_j)$ in the dimensions of a real vector $X$ and searches for $\{g_j\}$ in a broad, non-parametric family such that many $g_j$ are the zero function. Note, though, that in the SpAM model the $\{g_j\}$ functions are applied to real covariates via function evaluation, whereas in the FuSSO model the $\{g_j\}$ are applied to functional covariates via an inner product; that is, FuSSO works over functional, not real-valued, covariates, unlike SpAM.
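To make the contrast concrete, the two additive models can be written side by side; the display below simply restates, in the notation of the next section, the SpAM model of [9] and the FuSSO model described above.
$$\text{SpAM:}\quad Y_i = \sum_{j=1}^{p} g_j(X_{ij}) + \epsilon_i, \qquad\qquad \text{FuSSO:}\quad Y_i = \sum_{j=1}^{p} \langle f_{ij}, g_j \rangle + \epsilon_i = \sum_{j=1}^{p} \int f_{ij}(t)\, g_j(t)\, dt + \epsilon_i.$$
In both cases sparsity means that many of the $g_j$ are identically zero; the difference lies only in how $g_j$ acts on the $j$th covariate.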

3 Model

To better understand FuSSO's model, we draw several analogies to real-valued linear regression and the Group-LASSO [16]. Note that although for simplicity we focus on functions over a one-dimensional domain, it is straightforward to extend the estimator and results to the multidimensional case.

Consider a model for typical real-valued linear regression with a data-set of input-output pairs $\{(X_i, Y_i)\}_{i=1}^N$:
$$Y_i = \langle X_i, w \rangle + \epsilon_i,$$
where $Y_i \in \mathbb{R}$, $X_i \in \mathbb{R}^d$, $w \in \mathbb{R}^d$, $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$, and $\langle X_i, w \rangle = \sum_{j=1}^d X_{ij} w_j$. If instead one were working with functional data $\{(f_i, Y_i)\}_{i=1}^N$, where $f_i : [0,1] \to \mathbb{R}$ and $f_i \in L_2[0,1]$, one may similarly consider a linear model:
$$Y_i = \langle f_i, g \rangle + \epsilon_i, \tag{1}$$
where $g : [0,1] \to \mathbb{R}$ and $\langle f_i, g \rangle = \int_0^1 f_i(t) g(t)\,dt$. If $\Phi = \{\varphi_m\}_{m=1}^\infty$ is an orthonormal basis for $L_2[0,1]$, then $f_i(x) = \sum_{m=1}^\infty \alpha_m^{(i)} \varphi_m(x)$, where $\alpha_m^{(i)} = \int_0^1 f_i(t)\varphi_m(t)\,dt$. Similarly, $g(x) = \sum_{m=1}^\infty \beta_m \varphi_m(x)$, where $\beta_m = \int_0^1 g(t)\varphi_m(t)\,dt$. Thus,
$$Y_i = \langle f_i, g \rangle + \epsilon_i = \Big\langle \sum_{m=1}^\infty \alpha_m^{(i)} \varphi_m,\ \sum_{k=1}^\infty \beta_k \varphi_k \Big\rangle + \epsilon_i = \sum_{m=1}^\infty \sum_{k=1}^\infty \alpha_m^{(i)} \beta_k \langle \varphi_m, \varphi_k \rangle + \epsilon_i = \sum_{m=1}^\infty \alpha_m^{(i)} \beta_m + \epsilon_i,$$
where the last step follows from the orthonormality of $\Phi$.

Going back to the real-valued covariate case, if instead of having one feature vector per data instance, $X_i \in \mathbb{R}^d$, one had $p$ feature vectors associated with each data instance, $\{X_{ij}\}_{j=1}^p$ with $X_{ij} \in \mathbb{R}^d$, an additive linear model may be used for regression:
$$Y_i = \sum_{j=1}^p \langle X_{ij}, w_j \rangle + \epsilon_i,$$
where $w_1, \ldots, w_p \in \mathbb{R}^d$. Similarly, in the functional case one may have $p$ functions associated with data instance $i$, $\{f_{ij}\}_{j=1}^p$ with $f_{ij} \in L_2[0,1]$. Then, an additive linear model would be:
$$Y_i = \sum_{j=1}^p \langle f_{ij}, g_j \rangle + \epsilon_i = \sum_{j=1}^p \sum_{m=1}^\infty \alpha_m^{(ij)} \beta_m^{(j)} + \epsilon_i, \tag{2}$$
where $g_1, \ldots, g_p \in L_2[0,1]$, and $\alpha_m^{(ij)}$ and $\beta_m^{(j)}$ are the projection coefficients of $f_{ij}$ and $g_j$, respectively.

Suppose that one has few observations $N$ relative to the number of features $p$. In the real-valued case, in order to effectively find a solution for $w = (w_1^T, \ldots, w_p^T)^T$ one may search for a group-sparse solution in which many $w_j = 0$. To do so, one may consider the following Group-LASSO regression:
$$\hat{w} = \operatorname*{argmin}_w \frac{1}{2N}\|Y - Xw\|_2^2 + \lambda_N \sum_{j=1}^p \|w_j\|_2, \tag{3}$$
where $X$ is the $N \times dp$ matrix $X = [X_1 \ \ldots \ X_N]^T$, $Y = (Y_1, \ldots, Y_N)^T$, and $\|\cdot\|_2$ is the Euclidean norm. If in the functional case one also has $p \gg N$, one may set up an optimization similar to (3), whose direct analogue is:
$$\hat{g} = \operatorname*{argmin}_g \frac{1}{2N}\sum_{i=1}^N \Big( Y_i - \sum_{j=1}^p \langle f_{ij}, g_j \rangle \Big)^2 \tag{4}$$
$$\qquad + \lambda_N \sum_{j=1}^p \|g_j\|_2; \tag{5}$$
equivalently,
$$\hat{\beta} = \operatorname*{argmin}_\beta \frac{1}{2N}\sum_{i=1}^N \Big( Y_i - \sum_{j=1}^p \sum_{m=1}^\infty \alpha_m^{(ij)} \beta_m^{(j)} \Big)^2 \tag{6}$$
$$\qquad + \lambda_N \sum_{j=1}^p \sqrt{\sum_{m=1}^\infty \big(\beta_m^{(j)}\big)^2}, \tag{7}$$
where $\hat{g} = \{\hat{g}_j\}_{j=1}^p = \{\sum_m \hat{\beta}_m^{(j)} \varphi_m\}_{j=1}^p$.

However, it is infeasible to directly observe the functional inputs $\{f_{ij} : 1 \le i \le N,\ 1 \le j \le p\}$. Thus, we shall instead assume that one observes $\{\vec{y}_{ij} : 1 \le i \le N,\ 1 \le j \le p\}$, where
$$\vec{y}_{ij} = \vec{f}_{ij} + \vec{\xi}_{ij}, \tag{8}$$
$$\vec{f}_{ij} = \big(f_{ij}(1/n),\ f_{ij}(2/n),\ \ldots,\ f_{ij}(1)\big)^T, \tag{9}$$
$$\vec{\xi}_{ij} \sim \mathcal{N}(0, \sigma_\xi^2 I_n). \tag{10}$$
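As a quick numerical illustration of the reduction above, the short sketch below builds a smooth function and a regressor from an orthonormal cosine basis on $[0,1]$, checks that $\langle f, g \rangle \approx \sum_m \alpha_m \beta_m$, and then simulates the noisy grid observations of (8)-(10). The specific basis, decay rates, and noise level are illustrative choices, not values from the paper.

```python
import numpy as np

# A minimal numerical check of the projection-coefficient reduction
# <f, g> = sum_m alpha_m beta_m, plus simulated grid observations as in (8)-(10).
# The basis choice, decay rates, and noise level below are illustrative assumptions.

rng = np.random.default_rng(0)
n = 500                                   # grid size
t = np.arange(1, n + 1) / n               # grid points 1/n, 2/n, ..., 1

def phi(m, x):
    """Orthonormal cosine basis on [0,1]: phi_1 = 1, phi_m = sqrt(2) cos(pi (m-1) x)."""
    return np.ones_like(x) if m == 1 else np.sqrt(2) * np.cos(np.pi * (m - 1) * x)

M = 20
f = sum(np.exp(-m / 3.0) * phi(m, t) for m in range(1, M + 1))   # a smooth input function
g = sum(m ** -2.0 * phi(m, t) for m in range(1, M + 1))          # a smooth regressor g

# Projection coefficients via Riemann sums over the grid (approximating the integrals).
alpha = np.array([np.mean(f * phi(m, t)) for m in range(1, M + 1)])
beta = np.array([np.mean(g * phi(m, t)) for m in range(1, M + 1)])
print(np.mean(f * g), alpha @ beta)       # <f, g> vs. sum_m alpha_m beta_m: nearly equal

# Noisy grid observations of f as in (8)-(10); these are what the estimator actually sees.
sigma_xi = 0.1
y_vec = f + sigma_xi * rng.standard_normal(n)
```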

That is, we observe a grid of $n$ noisy values for each functional input. Then, one may estimate $\alpha_m^{(ij)}$ as:
$$\tilde{\alpha}_m^{(ij)} = \tfrac{1}{n}\vec{\varphi}_m^T \vec{y}_{ij} = \tfrac{1}{n}\vec{\varphi}_m^T\big(\vec{f}_{ij} + \vec{\xi}_{ij}\big) = \bar{\alpha}_m^{(ij)} + \eta_m^{(ij)}, \tag{11}$$
where $\vec{\varphi}_m = \big(\varphi_m(1/n), \varphi_m(2/n), \ldots, \varphi_m(1)\big)^T$. Furthermore, we may truncate the number of basis functions used to express $f_{ij}$ to $M_n$, estimating it as:
$$\tilde{f}_{ij}(x) = \sum_{m=1}^{M_n} \tilde{\alpha}_m^{(ij)} \varphi_m(x). \tag{12}$$
Using the truncated estimate, one has $\langle \tilde{f}_{ij}, g_j \rangle = \sum_{m=1}^{M_n} \tilde{\alpha}_m^{(ij)} \beta_m^{(j)}$. Hence, using these approximations, (7) becomes:
$$\hat{\beta} = \operatorname*{argmin}_\beta \frac{1}{2N}\sum_{i=1}^N \Big( Y_i - \sum_{j=1}^p \sum_{m=1}^{M_n} \tilde{\alpha}_m^{(ij)} \beta_m^{(j)} \Big)^2 \tag{13}$$
$$\qquad + \lambda_N \sum_{j=1}^p \sqrt{\sum_{m=1}^{M_n} \big(\beta_m^{(j)}\big)^2} \tag{14}$$
$$= \operatorname*{argmin}_\beta \frac{1}{2N}\Big\| Y - \sum_{j=1}^p \tilde{A}_j \beta^{(j)} \Big\|_2^2 + \lambda_N \sum_{j=1}^p \|\beta^{(j)}\|_2, \tag{15}$$
where $\tilde{A}_j$ is the $N \times M_n$ matrix with values $[\tilde{A}_j]_{i,m} = \tilde{\alpha}_m^{(ij)}$ and $\beta^{(j)} = (\beta_1^{(j)}, \ldots, \beta_{M_n}^{(j)})^T$. Note that one need not consider projection coefficients $\beta_m^{(j)}$ for $m > M_n$, since such projection coefficients will not decrease the MSE term in (13) (because $\tilde{\alpha}_m^{(ij)} = 0$ for $m > M_n$ in the truncated representation), while $\beta_m^{(j)} \ne 0$ for $m > M_n$ would increase the norm penalty term in (14). Hence we see that our sparse functional estimate is given by a Group-LASSO problem on the projection coefficients.

Extensions

It is useful to note that there are several straightforward extensions to the FuSSO as presented. First, it may be possible to estimate the inner product of a function $f_{ij}$ and $g_j$ as $\langle f_{ij}, g_j \rangle \approx \tfrac{1}{n}\vec{y}_{ij}^T\vec{g}_j$, where $\vec{g}_j = (g_j(1/n), \ldots, g_j(1))^T$. This effectively allows one to use the naive approach of simply running the Group-LASSO on the $\vec{y}_{ij}$ feature vectors directly; we will refer to this method as Y-GL. It is important to note, however, that Y-GL will be less robust to noise, and less adaptive and efficient with respect to smoothness, than the FuSSO. Furthermore, it is not necessary for the FuSSO that the observations of the input functions lie on a grid, since one may estimate projection coefficients in the case of an irregular design. Moreover, we may also estimate projection coefficients for density functions from samples drawn from the pdfs. Note that Y-GL would fail to estimate our model in the irregular-design case, and would not be possible in the case where the functions are pdfs. Also, a two-stage estimator as described in [5], where one first uses the regularization penalty with a large $\lambda_N$ to find the support, and then solves the optimization problem with a smaller $\lambda_N$ on just the estimated support, may be more efficient at estimating the response. Furthermore, a problem analogous to (15) may be framed to perform logistic regression and classification.
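To make the estimator concrete, the sketch below simulates data from the sparse additive functional model, forms the estimated projection-coefficient design of (11), and solves the Group-LASSO problem (15) by proximal gradient descent with block soft-thresholding. This is a minimal illustrative implementation, not the authors' code; all constants (dimensions, decay rates, noise levels, and the regularization weight `lam`) are assumptions, and in practice $M_n$ and $\lambda_N$ would be chosen by cross-validation.

```python
import numpy as np

# Sketch of the FuSSO pipeline on simulated data: (i) estimate projection
# coefficients from noisy grid observations as in (11), (ii) truncate to M_n
# basis functions, and (iii) solve the Group-LASSO problem (15) by proximal
# gradient with block soft-thresholding. Constants are illustrative assumptions.

rng = np.random.default_rng(1)
N, p, s, n, M_n = 100, 50, 5, 200, 10        # instances, covariates, support, grid, truncation
t = np.arange(1, n + 1) / n                  # grid 1/n, ..., 1

def phi(m, x):                               # orthonormal cosine basis on [0, 1]
    return np.ones_like(x) if m == 1 else np.sqrt(2) * np.cos(np.pi * (m - 1) * x)

Phi = np.stack([phi(m, t) for m in range(1, M_n + 1)])           # M_n x n

# Simulate smooth input functions via decaying random projection coefficients,
# noisy grid observations as in (8)-(10), and responses from the sparse model (2).
decay = np.arange(1, M_n + 1) ** 2.0
alpha = rng.uniform(-1, 1, size=(N, p, M_n)) / decay
F = alpha @ Phi                                                  # N x p x n true function values
Y_grid = F + 0.1 * rng.standard_normal(F.shape)                  # noisy observations y_ij
beta_true = np.zeros((p, M_n))
beta_true[:s] = rng.uniform(-1, 1, size=(s, M_n)) / decay
Y = np.einsum('ijm,jm->i', alpha, beta_true) + 0.05 * rng.standard_normal(N)

# (11): estimated projection coefficients, stacked into the N x (p*M_n) design of (15).
A_tilde = np.einsum('mk,ijk->ijm', Phi, Y_grid) / n              # N x p x M_n
A = A_tilde.reshape(N, p * M_n)

def fusso(A, Y, p, M_n, lam, iters=2000):
    """Group-LASSO (15) via proximal gradient; one block per functional covariate."""
    beta = np.zeros(p * M_n)
    step = 1.0 / (np.linalg.norm(A, 2) ** 2 / len(Y))            # 1 / Lipschitz constant
    for _ in range(iters):
        grad = A.T @ (A @ beta - Y) / len(Y)                     # gradient of the MSE term
        z = (beta - step * grad).reshape(p, M_n)
        norms = np.linalg.norm(z, axis=1, keepdims=True)
        z *= np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))  # block soft-threshold
        beta = z.ravel()
    return beta.reshape(p, M_n)

beta_hat = fusso(A, Y, p, M_n, lam=0.01)
support = np.where(np.linalg.norm(beta_hat, axis=1) > 1e-8)[0]
print("estimated support:", support)         # ideally the indices of the first s covariates
```

Sweeping `lam` from large to small traces out regularization paths of the kind shown later in Figure 2, with whole blocks (i.e., whole functional covariates) entering the model as the penalty decreases.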

4 Theory

Next, we show that the FuSSO is able to recover the correct sparsity pattern asymptotically; i.e., that the FuSSO estimate is sparsistent. In order to do so, we shall show that with high probability there is an optimal solution to our optimization problem (15) with the correct sparsity pattern. We follow an argument similar to [12, 9]. We shall use a witness technique to show that there is a coefficient/subgradient pair $(\hat{\beta}, \hat{u})$ such that $\mathrm{supp}(\hat{\beta}) = \mathrm{supp}(\beta^*)$, for the true response-generating $\beta^*$. Let $\Omega(\beta) = \sum_{j=1}^p \|\beta^{(j)}\|_2$ be our penalty term (14). Let $S$ denote the true set of non-zero functions, i.e. $S = \{j : \beta^{*(j)} \ne 0\}$, with $s = |S|$. First, we fix $\hat{\beta}_{S^c} = 0$, and set $\hat{u}_S \in \partial\Omega(\hat{\beta}_S)$. Note that for a vector $\beta$, $\partial\Omega(\beta) = \{u\}$ where $u^{(j)} = \beta^{(j)}/\|\beta^{(j)}\|_2$ if $\beta^{(j)} \ne 0$, and $\|u^{(j)}\|_2 \le 1$ if $\beta^{(j)} = 0$. We shall show that, with high probability, $\forall j \in S$, $\hat{\beta}^{(j)} \ne 0$ and $\forall j \in S^c$, $\|\hat{u}^{(j)}\|_2 < 1$, thus showing that there is an optimal solution to (15) that has the true sparsity pattern with high probability. First, we elaborate on our assumptions.

4.1 Assumptions

Let $\Phi$ be the trigonometric basis: $\varphi_1(x) \equiv 1$ and, for $k \in \{1, 2, \ldots\}$, $\varphi_{2k}(x) = \sqrt{2}\cos(2\pi k x)$, $\varphi_{2k+1}(x) = \sqrt{2}\sin(2\pi k x)$. Let $D = \{(\{\vec{y}_{ij}\}_{j=1}^p, Y_i)\}_{i=1}^N$, where $\vec{y}_{ij}$ is as in (8) and $Y_i = \sum_{j=1}^p\sum_m \alpha_m^{(ij)}\beta_m^{*(j)} + \epsilon_i$ as in (2). Assume that for all $i$ and all $j \in \{1,\ldots,p\}$: $\alpha^{(ij)} \in \Theta(\gamma, Q)$, where
$$\Theta(\gamma, Q) = \Big\{\theta : \sum_{k=1}^\infty c_k^2\theta_k^2 \le Q\Big\}, \qquad c_k = k^\gamma \text{ if } k \text{ is even or } k = 1, \quad c_k = (k-1)^\gamma \text{ otherwise},$$
$\alpha^{(ij)} = \{\alpha_m^{(ij)} \in \mathbb{R} : \alpha_m^{(ij)} = \langle f_{ij}, \varphi_m \rangle,\ m \in \mathbb{Z}^+\}$, for finite $Q > 0$ and smoothness parameter $\gamma$. Furthermore, assume that for the true $\beta^*$ generating the observed responses $Y_i$, $\forall j \in \{1,\ldots,p\}$: $\beta^{*(j)} \in \Theta(\gamma, Q)$.

Let $A_j$ be the $N \times M_n$ matrix with entries $[A_j]_{i,m} = \alpha_m^{(ij)}$. Let $A_S$ denote the matrix made by horizontally concatenating the $A_j$ matrices with $j \in S$; i.e. $A_S = [A_{j_1} \ldots A_{j_s}]$, where $\{j_1, \ldots, j_s\} = S$ and $j_i < j_k$ for $i < k$. Suppose the following:
$$\Lambda_{\max}\big(\tfrac{1}{N}A_S^T A_S\big) \le C_{\max} < \infty, \tag{16}$$
$$\Lambda_{\min}\big(\tfrac{1}{N}A_S^T A_S\big) \ge C_{\min} > 0. \tag{17}$$
Also, suppose $\exists\,\delta \in (0,1]$ s.t. $\forall j \in S^c$:
$$\Lambda_{\max}\big(\tfrac{1}{N}A_j^T A_j\big) \le C_{\max} < \infty, \tag{18}$$
$$\big\|\tfrac{1}{N}A_j^T A_S\big(\tfrac{1}{N}A_S^T A_S\big)^{-1}\big\| \le \frac{1-\delta}{\sqrt{s}}. \tag{19}$$
Let $\bar{A}_j$ be the $N \times M_n$ matrix with entries $[\bar{A}_j]_{i,m} = \bar{\alpha}_m^{(ij)} = \tfrac{1}{n}\vec{\varphi}_m^T\vec{f}_{ij}$. Let $H_j$ be the $N \times M_n$ matrix with entries $[H_j]_{i,m} = \eta_m^{(ij)} = \tfrac{1}{n}\vec{\varphi}_m^T\vec{\xi}_{ij}$. Thus, $\tilde{A}_j = \bar{A}_j + H_j$. Furthermore, let $E_j = \bar{A}_j - A_j$; then $\tilde{A}_j = A_j + E_j + H_j$.

In addition to the aforementioned assumptions, we shall further assume a set of rate conditions tying together $N$, $n$, $p$, $s$, $M_n$, $\lambda_N$, and the smoothness $\gamma$: there is an exponent $a < 1/2$ such that the tail bound of Lemma 4 below vanishes; the minimum signal strength $\rho \equiv \min_{j \in S}\|\beta^{*(j)}\|_2$ is bounded away from zero; the leading error terms appearing in the proof of Proposition 1, namely $s^{3/2}M_n\,n^{-(\gamma+1/2)}$, $sM_n\sqrt{\log N/n}$, $s^{3/2}M_n^{1/2-\gamma}$, $\sqrt{\log(sM_n)/N}$, and $\lambda_N\sqrt{sM_n}$, are all $o(\rho)$; and $\lambda_N$ does not shrink too quickly, in that $\sqrt{M_n\log p}/(\lambda_N\sqrt{N}) \to 0$ and $\sqrt{s}\,M_n^{1/2-\gamma}/\lambda_N \to 0$.

We may further simplify these assumptions if we take $n \asymp \sqrt{N}$ and choose $M_n$ optimally for function estimation, $M_n \asymp n^{1/(2\gamma+1)} = N^{1/(4\gamma+2)}$, and furthermore take $s = O(1)$, $\rho = \Theta(1)$, and $\gamma$ fixed. Under these conditions, the assumptions reduce to a constraint bounding the grid exponent $a$ below (and above by $1/2$), together with a handful of explicit rates in $p$, $N$, and $\lambda_N$ going to zero; in particular, $\lambda_N$ must shrink to zero, but slowly enough that terms of order $\sqrt{\log p}/(\lambda_N\sqrt{N})$ still vanish.

4.2 Sparsistency

Theorem 1. $\Pr(\hat{S} = S) \to 1$.

First, we state some lemmas, whose proofs may be found in the supplementary materials.

4.2.1 Lemmata

Lemma 1. Let $X$ be a non-negative r.v. and $C$ a measurable event; then $\mathbb{E}[X \mid C]\,\Pr(C) \le \mathbb{E}[X]$.

Lemma 2. $\frac{1}{n}\sum_{k=1}^n \varphi_m(k/n)\,\varphi_l(k/n) = \mathbb{1}\{l = m\}$, for $1 \le l, m \le n$.

Lemma 3. The rows $H_j^{(i)}$ of $H_j$ satisfy $H_j^{(i)} \sim \mathcal{N}(0, \tfrac{\sigma_\xi^2}{n}I)$; likewise the rows $H_S^{(i)}$ of $H_S$ satisfy $H_S^{(i)} \sim \mathcal{N}(0, \tfrac{\sigma_\xi^2}{n}I)$.

Lemma 4. $\Pr\big(\|H\|_{\max} \ge n^{a-1}\big) \le 2NpM_n\,\sigma_\xi\,n^{1/2-a}\,e^{-n^{2a-1}/(2\sigma_\xi^2)}$.

Lemma 5. $\|E\|_{\max} \le C_Q\,n^{-(\gamma+1/2)}$, where $C_Q$ is a constant depending on $Q$.

Lemma 6. $\|\beta^*_S\|_2 \le \sqrt{Qs}$.

Lemma 7. There exist $N_0$, $n_0$, constants $0 < C_{\min} \le C_{\max} < \infty$, and $0 < \delta \le 1$ such that, if $\|H\|_{\max} < n^{a-1}$, $N > N_0$, and $n > n_0$, then
$$\Lambda_{\max}\big(\tfrac{1}{N}\tilde{A}_S^T\tilde{A}_S\big) \le C_{\max} < \infty, \tag{29}$$
$$\Lambda_{\min}\big(\tfrac{1}{N}\tilde{A}_S^T\tilde{A}_S\big) \ge C_{\min} > 0, \tag{30}$$
$$\forall j \in S^c:\quad \big\|\tfrac{1}{N}\tilde{A}_j^T\tilde{A}_S\big(\tfrac{1}{N}\tilde{A}_S^T\tilde{A}_S\big)^{-1}\big\| \le \frac{1-\delta}{\sqrt{s}}. \tag{31}$$
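Before the proof, it is convenient to record the stationarity (KKT) conditions of the convex problem (15) in the notation above; this is the standard Group-LASSO optimality condition, restated here rather than quoted from the paper, and it is exactly what the witness argument below verifies block by block:
$$\frac{1}{N}\,\tilde{A}_j^T\Big(\sum_{k=1}^{p}\tilde{A}_k\hat{\beta}^{(k)} - Y\Big) + \lambda_N\,\hat{u}^{(j)} = 0 \ \text{ for all } j, \qquad \hat{u}^{(j)} = \frac{\hat{\beta}^{(j)}}{\|\hat{\beta}^{(j)}\|_2} \text{ if } \hat{\beta}^{(j)} \ne 0, \quad \|\hat{u}^{(j)}\|_2 \le 1 \text{ otherwise}.$$
A pair $(\hat{\beta}, \hat{u})$ with $\hat{\beta}_{S^c} = 0$ that satisfies these conditions, with $\|\hat{u}^{(j)}\|_2 < 1$ strictly for $j \in S^c$, certifies an optimal solution with the desired support.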

4.2.2 Proof of Theorem 1

Proposition 1. $\Pr\big(\forall j \in S : \hat{\beta}^{(j)} \ne 0\big) \to 1$.

Proof. Recall that, by assumption, $\rho \equiv \min_{j \in S}\|\beta^{*(j)}\|_2 > 0$. Thus, to prove that $\hat{\beta}^{(j)} \ne 0$ for all $j \in S$, it suffices to show that $\|\hat{\beta}_S - \beta^*_S\| < \rho$; to do so we show $\Pr(\|\hat{\beta}_S - \beta^*_S\| > \rho) \to 0$. Let $B$ be the event that $\|H\|_{\max} < n^{a-1}$. Note that
$$\Pr\big(\|\hat{\beta}_S - \beta^*_S\| > \rho\big) \le \Pr\big(\|\hat{\beta}_S - \beta^*_S\| > \rho \,\big|\, B\big)\Pr(B) + \Pr(B^c),$$
and furthermore $\Pr(\|\hat{\beta}_S - \beta^*_S\| > \rho \mid B) \le \rho^{-1}\,\mathbb{E}\big[\|\hat{\beta}_S - \beta^*_S\| \,\big|\, B\big]$. Looking at the stationarity condition on the support $S$:
$$\frac{1}{N}\tilde{A}_S^T\big(\tilde{A}_S\hat{\beta}_S - Y\big) + \lambda_N\hat{u}_S = 0. \tag{32}$$

Let $V$ be the vector with entries $V_i = \sum_{j \in S}\sum_{m=M_n+1}^{\infty}\alpha_m^{(ij)}\beta_m^{*(j)}$, i.e. the error from truncation, so that $Y = A_S\beta^*_S + V + \epsilon$. Substituting into (32), using $\tilde{A}_S = A_S + E_S + H_S$, and rearranging,
$$\frac{1}{N}\tilde{A}_S^T\tilde{A}_S\big(\hat{\beta}_S - \beta^*_S\big) = -\frac{1}{N}\tilde{A}_S^T\big(E_S + H_S\big)\beta^*_S + \frac{1}{N}\tilde{A}_S^T V + \frac{1}{N}\tilde{A}_S^T\epsilon - \lambda_N\hat{u}_S. \tag{33}$$
Letting $\tilde{\Sigma}_{SS} \equiv \frac{1}{N}\tilde{A}_S^T\tilde{A}_S$, we see that
$$\hat{\beta}_S - \beta^*_S = -\tilde{\Sigma}_{SS}^{-1}\tfrac{1}{N}\tilde{A}_S^T\big(E_S + H_S\big)\beta^*_S + \tilde{\Sigma}_{SS}^{-1}\tfrac{1}{N}\tilde{A}_S^T V + \tilde{\Sigma}_{SS}^{-1}\tfrac{1}{N}\tilde{A}_S^T\epsilon - \lambda_N\tilde{\Sigma}_{SS}^{-1}\hat{u}_S.$$
Thus, we proceed to bound each term on the right-hand side in expectation, conditioned on $B$.

For the first term, write $\tilde{A}_S = A_S + E_S + H_S$. On the event $B$, Lemma 7 gives $\Lambda_{\min}(\tilde{\Sigma}_{SS}) \ge C_{\min}$, so that $\|\tilde{\Sigma}_{SS}^{-1}\| \le \sqrt{sM_n}/C_{\min}$; Lemma 5 controls the discretization error through $\|E_S\|_{\max} \le C_Q\,n^{-(\gamma+1/2)}$, so that on $B$ we also have $\|E_S + H_S\|_{\max} \le C_Q\,n^{-(\gamma+1/2)} + n^{a-1}$; Lemmas 3 and 6 show that each entry of $H_S\beta^*_S$ is Gaussian with variance at most $\sigma_\xi^2 Qs/n$, so a Gaussian inequality (e.g. [13]) bounds $\mathbb{E}\|H_S\beta^*_S\|$; and Lemma 1 lets conditional expectations multiplied by $\Pr(B)$ be bounded by their unconditional counterparts.

The next terms are bounded as follows.

Lemma 8. $\mathbb{E}\big[\|\tilde{\Sigma}_{SS}^{-1}\tfrac{1}{N}\tilde{A}_S^T V\| \,\big|\, B\big]\Pr(B) = O\big(s^{3/2}/M_n^{\gamma-1/2}\big)$.

Lemma 9. $\mathbb{E}\big[\|\tilde{\Sigma}_{SS}^{-1}\tfrac{1}{N}\tilde{A}_S^T\epsilon\| \,\big|\, B\big]\Pr(B) = O\big(\sqrt{\log(sM_n)/N}\big)$.

(Proofs of Lemmas 8 and 9 may be found in the supplementary materials.) Lastly, since $\|\hat{u}_S\|_{\max} \le 1$, we have $\|\hat{u}_S\| \le \sqrt{sM_n}$ and hence, on $B$, $\lambda_N\|\tilde{\Sigma}_{SS}^{-1}\hat{u}_S\| \le \lambda_N\sqrt{sM_n}/C_{\min}$. Keeping only leading terms,
$$\mathbb{E}\big[\|\hat{\beta}_S - \beta^*_S\| \,\big|\, B\big]\Pr(B) = O\Big(s^{3/2}M_n\,n^{-(\gamma+1/2)} + sM_n\sqrt{\tfrac{\log N}{n}} + \frac{s^{3/2}}{M_n^{\gamma-1/2}} + \sqrt{\tfrac{\log(sM_n)}{N}} + \lambda_N\sqrt{sM_n}\Big).$$

Hence, by the rate assumptions of Section 4.1, $\Pr(\|\hat{\beta}_S - \beta^*_S\| > \rho) \to 0$, which proves the proposition. $\square$

One may similarly look at the stationarity conditions over $S^c$ to analyze $\hat{u}^{(j)}$ for $j \in S^c$:
$$\frac{1}{N}\tilde{A}_j^T\big(\tilde{A}_S\hat{\beta}_S - Y\big) + \lambda_N\hat{u}^{(j)} = 0,$$
so that, using $Y = A_S\beta^*_S + V + \epsilon$ and (33),
$$\lambda_N\hat{u}^{(j)} = \tilde{\Sigma}_{jS}\tilde{\Sigma}_{SS}^{-1}\Big(\tfrac{1}{N}\tilde{A}_S^T(E_S + H_S)\beta^*_S - \tfrac{1}{N}\tilde{A}_S^T V - \tfrac{1}{N}\tilde{A}_S^T\epsilon + \lambda_N\hat{u}_S\Big) - \tfrac{1}{N}\tilde{A}_j^T(E_S + H_S)\beta^*_S + \tfrac{1}{N}\tilde{A}_j^T(V + \epsilon),$$
where $\tilde{\Sigma}_{jS} \equiv \frac{1}{N}\tilde{A}_j^T\tilde{A}_S$. We wish to show that $\hat{u}$ over $S^c$ satisfies the KKT conditions; that is:

Proposition 2. $\Pr\big(\max_{j \in S^c}\|\hat{u}^{(j)}\| < 1\big) \to 1$.

Proof. Let $\mu_H^{(j)} \equiv \mathbb{E}[\hat{u}^{(j)} \mid H]$. We proceed as follows:
$$\Pr\Big(\max_{j \in S^c}\|\hat{u}^{(j)}\| \ge 1\Big) \le \Pr\Big(\max_{j \in S^c}\|\mu_H^{(j)}\| \ge 1 - \delta\Big) + \Pr\Big(\max_{j \in S^c}\|\hat{u}^{(j)} - \mu_H^{(j)}\| \ge \delta\Big).$$
We obtain the following results:

Lemma 10. $\Pr\big(\max_{j \in S^c}\|\mu_H^{(j)}\| \ge 1 - \delta\big) \to 0$.

Lemma 11. $\Pr\big(\max_{j \in S^c}\|\hat{u}^{(j)} - \mu_H^{(j)}\| \ge \delta\big) \to 0$.

Hence, we have that $\Pr\big(\max_{j \in S^c}\|\hat{u}^{(j)}\| < 1\big) \to 1$. $\square$

5 Experiments

5.1 Synthetic Data

We tested the FuSSO on synthetic data-sets $D = \{(\{\vec{y}_{ij}\}_{j=1}^p, Y_i)\}_{i=1}^N$, where $\vec{y}_{ij}$ is as in (8). The experiments were performed as follows. First, we fix $N$, $n$, $p$, and $s$. For $i = 1, \ldots, N$ and $j = 1, \ldots, p$ we create random functions using a maximum of $M$ projection coefficients as follows: (1) set $a_m \sim \mathrm{Unif}(-1, 1)$ for $m = 1, \ldots, M$; (2) set $a_m = a_m/c_m$, where $c_m = m^\gamma$ if $m = 1$ or $m$ is even, and $c_m = (m-1)^\gamma$ if $m$ is odd (matching the ellipsoid weights of Section 4.1); (3) set $a = a/\|a\|$; (4) set $\alpha^{(ij)} = a$. See Figures 2a and 2c for typical functions. Similarly, we generate $\beta^{(j)}$ for $j = 1, \ldots, s$; for $j = s+1, \ldots, p$, we set $\beta^{(j)} = 0$. Then, we generate $Y_i$ as $Y_i = \sum_{j=1}^p\langle\beta^{(j)}, \alpha^{(ij)}\rangle + \epsilon_i = \sum_{j=1}^s\langle\beta^{(j)}, \alpha^{(ij)}\rangle + \epsilon_i$, where $\epsilon_i$ is zero-mean Gaussian noise. Also, a grid of $n$ noisy function evaluations was generated to make $\vec{y}_{ij}$ as in (8), with noise level $\sigma_\xi$. These were then used to compute $\tilde{\alpha}_m^{(ij)}$ for $m = 1, \ldots, M_n$ as in (11); $M_n$ was chosen by cross-validation. See Figures 2a and 2c for typical noisy observations and function estimates at the smaller and larger grid sizes, respectively.

We fixed $s = 5$ and varied $p$, $N$, and the grid size $n$ over several configurations; for each $(p, N, n)$ configuration, multiple random trials were performed. We recorded $r$, the fraction of trials in which some value of $\lambda_N$ recovered the correct sparsity pattern (i.e., that only the first 5 functions are in the support). We also recorded the mean length of the range of $\lambda_N$ values that recovered the correct support, i.e. $\bar{t} = \frac{1}{T}\sum_t (t_f^{(t)} - t_l^{(t)})/t_{\max}^{(t)}$, where $t_f^{(t)}$ is the largest $\lambda_N$ found to recover the correct support in the $t$th trial, $t_l^{(t)}$ is the smallest such $\lambda_N$, and $t_{\max}^{(t)}$ is the smallest $\lambda_N$ producing $\hat{\beta} = 0$ ($\bar{t}$ is taken to be zero for a trial if no $\lambda_N$ recovered the correct support). The results showed reliable support recovery across the three configurations tested, with $\bar{t}$ values of approximately .68, .477, and .479. Hence we see that even when the number of observations per function is small and the total number of input functional covariates is large, the FuSSO can recover the correct support.

Also, to illustrate the point that running the Group-LASSO directly on the $\vec{y}_{ij}$ features (Y-GL) is less robust to noise and less adaptive to smoothness, we ran noisier trials using one of the configurations above, increasing the standard deviations of the noise on the grid function observations and on the response. Under these conditions the FuSSO was able to recover the support in 49% of the trials, whereas Y-GL recovered the support in 3% of the trials. Furthermore, the FuSSO had a $\bar{t}$ of .743, compared to a $\bar{t}$ of .54 for Y-GL.
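The data-generating recipe above is easy to reproduce; the sketch below follows steps (1)-(4) for the projection coefficients, with the basis truncation `M`, the decay exponent `gamma`, and the noise levels treated as illustrative assumptions since their exact values are not restated here.

```python
import numpy as np

# Synthetic data in the spirit of Section 5.1: random smooth functions via
# decaying, normalized projection coefficients, a sparse set of regressors
# beta_j, responses Y_i, and noisy grid observations. M, gamma, and the noise
# levels are illustrative assumptions, not the paper's exact settings.

rng = np.random.default_rng(7)
N, p, s, n, M, gamma = 200, 100, 5, 100, 20, 2.0

def c(m, gamma):
    """Decay weights: c_m = m^gamma if m == 1 or m is even, else (m - 1)^gamma."""
    return float(m) ** gamma if (m == 1 or m % 2 == 0) else float(m - 1) ** gamma

cm = np.array([c(m, gamma) for m in range(1, M + 1)])

def random_coeffs(size):
    """Steps (1)-(3): uniform draws, divided by c_m, then normalized."""
    a = rng.uniform(-1.0, 1.0, size=size + (M,)) / cm
    return a / np.linalg.norm(a, axis=-1, keepdims=True)

alpha = random_coeffs((N, p))                      # step (4): alpha^(ij) = a
beta = np.zeros((p, M))
beta[:s] = random_coeffs((s,))                     # only the first s regressors are non-zero

Y = np.einsum('ijm,jm->i', alpha, beta) + 0.1 * rng.standard_normal(N)

# Noisy grid observations as in (8), using the trigonometric basis on [0, 1].
t = np.arange(1, n + 1) / n
def phi(m, x):
    if m == 1:
        return np.ones_like(x)
    k = m // 2
    return np.sqrt(2) * (np.cos(2 * np.pi * k * x) if m % 2 == 0 else np.sin(2 * np.pi * k * x))

Phi = np.stack([phi(m, t) for m in range(1, M + 1)])            # M x n basis evaluations
y_obs = alpha @ Phi + 0.1 * rng.standard_normal((N, p, n))      # y_ij = f_ij(grid) + noise
```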

Figure 2: (a, c) Two typical functions with their noisy observations and estimates, at the smaller and larger grid size $n$, respectively. (b, d) Regularization paths showing the norms $\|\hat{\beta}^{(j)}\|$ (red for $j$ in the support, blue otherwise) over a range of $\lambda_N$; the rightmost vertical line indicates the largest $\lambda_N$ able to recover the support, the leftmost line the smallest such $\lambda_N$.

5.2 Neurological Data

We also tested the FuSSO estimator on a neurological data-set, using a total of 89 subjects [14]. Subjects spanned a broad range of ages (see Figure 3b for the age histogram). Our goal was to learn a regression that maps the dODFs at each white-matter voxel for a subject to the subject's age. The dODF is a function that represents the amount of water molecules, or spins, undergoing diffusion in different orientations over the $S^2$ sphere [15]; i.e., each dODF is a function with a 2-d domain of (azimuth, elevation) spherical coordinates and a range of reals representing the strength of water diffusion at the given orientation (see Figure 3a). Data was provided for each subject in a template space for white-matter voxels; a total of over 5 thousand voxel dODFs were regressed on.

We also compared regression using the FuSSO and functional covariates to using the LASSO and real-valued covariates. We used the non-functional collection of quantitative anisotropy (QA) values for the same white-matter voxels as with the dODF functions. QA values are the estimated amount of spins that undergo diffusion in the direction of the principal fiber orientation, i.e., the peak of the dODF; QAs have been used as a measure of white-matter integrity in the underlying voxel, hence making for a descriptive and effective summary statistic of a dODF function for age regression [15]. The projection coefficients for the dODFs at each voxel were estimated using the cosine basis. The FuSSO estimator gave a cross-validated MSE of 7.855, where the variance of the ages was larger; selected voxels in the support may be seen in Figure 3c, and a histogram of held-out absolute errors in Figure 3d. The LASSO estimate using QA values gave a larger cross-validated MSE. Thus, one may see that considering the entire functional data gave better results for age regression. We note that we were unable to use the naive approach of Y-GL in this case because of memory constraints and the fact that the function evaluation points did not lie on a 2-d square grid.

Figure 3: (a) An example ODF for a voxel. (b) Histogram of ages for the subjects. (c) Voxels in the support of the model, shown in blue. (d) Histogram of held-out error magnitudes.

6 Conclusion

In conclusion, this paper presents the FuSSO, a functional analogue to the LASSO. The FuSSO allows one to efficiently find a sparse set of functional input covariates to regress a real-valued response against. The FuSSO makes no parametric assumptions about the nature of the input functional covariates and assumes a linear form for the mapping of functional covariates to the response. We provide a statistical backing for use of the FuSSO via a proof of asymptotic sparsistency.

Acknowledgements

This work is supported in part by NSF grants IIS47658 and IIS535.

References

[1] Rajendra Bhatia. Matrix Analysis, volume 169. Springer, 1997.
[2] F. Ferraty and P. Vieu. Nonparametric Functional Data Analysis: Theory and Practice. Springer, 2006.
[3] Gareth M. James, Jing Wang, and Ji Zhu. Functional linear regression that's interpretable. The Annals of Statistics, pages 2083-2108, 2009.
[4] Michel Ledoux and Michel Talagrand. Probability in Banach Spaces: Isoperimetry and Processes, volume 23. Springer, 1991.
[5] Nicolai Meinshausen. Relaxed lasso. Computational Statistics & Data Analysis, 52(1):374-393, 2007.
[6] Nicola Mingotti, Rosa E. Lillo, and Juan Romo. Lasso variable selection in functional regression.
[7] Junier B. Oliva, Barnabás Póczos, and Jeff Schneider. Distribution to distribution regression. In International Conference on Machine Learning (ICML), 2013.
[8] B. Póczos, A. Rinaldo, A. Singh, and L. Wasserman. Distribution-free distribution regression. AISTATS, 2013.
[9] Pradeep Ravikumar, John Lafferty, Han Liu, and Larry Wasserman. Sparse additive models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(5):1009-1030, 2009.
[10] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological), pages 267-288, 1996.
[11] Alexandre B. Tsybakov. Introduction to Nonparametric Estimation. Springer, 2008.
[12] Martin J. Wainwright. Sharp thresholds for high-dimensional and noisy recovery of sparsity. arXiv preprint math/0605740, 2006.
[13] Larry Wasserman. Probability inequalities. Lecture notes, August.
[14] Fang-Cheng Yeh and Wen-Yih Isaac Tseng. NTU-90: a high angular resolution brain atlas constructed by q-space diffeomorphic reconstruction. NeuroImage, 58(1):91-99, 2011.
[15] Fang-Cheng Yeh, Van Jay Wedeen, Wen-Yih Isaac Tseng, et al. Generalized q-sampling imaging. IEEE Transactions on Medical Imaging, 29(9):1626-1635, 2010.
[16] Ming Yuan and Yi Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1):49-67, 2006.
[17] Yihong Zhao, R. Todd Ogden, and Philip T. Reiss. Wavelet-based LASSO in functional linear regression. Journal of Computational and Graphical Statistics, 21(3):600-617, 2012.
