FuSSO: Functional Shrinkage and Selection Operator
Junier B. Oliva, Barnabás Póczos, Timothy Verstynen, Aarti Singh, Jeff Schneider (Carnegie Mellon University); Fang-Cheng Yeh, Wen-Yih Tseng (National Taiwan University)

Abstract

We present the FuSSO, a functional analogue to the LASSO, that efficiently finds a sparse set of functional input covariates to regress a real-valued response against. The FuSSO does so in a semi-parametric fashion, making no parametric assumptions about the nature of the input functional covariates and assuming a linear form for the mapping of functional covariates to the response. We provide a statistical backing for use of the FuSSO via a proof of asymptotic sparsistency under various conditions. Furthermore, we observe good results on both synthetic and real-world data.

1 Introduction

Modern data collection has allowed us to collect not just more data, but more complex data. In particular, complex objects like sets, distributions, and functions are becoming prevalent in many domains. It would be beneficial to perform machine learning tasks over these complex objects. However, many existing techniques cannot handle complex, possibly infinite-dimensional, objects; hence one often resorts to the ad-hoc technique of representing these complex objects by arbitrary summary statistics. In this paper, we look to perform a regression task on functional data. Specifically, we look to regress a mapping that takes in many functional input covariates and outputs a real value. Moreover, since we are considering many functional covariates, possibly many more than the number of instances in one's data-set, we look to find an estimator that performs feature selection by only regressing on a subset of all possible input functional covariates.

Appearing in Proceedings of the 17th International Conference on Artificial Intelligence and Statistics (AISTATS) 2014, Reykjavik, Iceland. JMLR: W&CP volume 33. Copyright 2014 by the authors.

To this
end we present the Functional Shrinkage and Selection Operator (FuSSO), for performing sparse functional regression in a principled, semi-parametric manner.

Indeed, there are a multitude of applications and domains where the study of a mapping that takes in a functional input and outputs a real value is of interest. That is, if I is some class of input functions with domain Ψ ⊆ R and range R, then one may be interested in a mapping h : I → R, Y ≈ h(f) (Figure 1(a)). Examples include: a mapping that takes in the time-series of a commodity's price in the past (f is a function with a domain of time and a range of price) and outputs the expected price of the commodity in the near future; also, a mapping that takes a patient's cardiac monitor's time-series and outputs a health index. Recently, work by [8] has explored this type of regression problem when the input function is a distribution. Furthermore, the general case of an arbitrary functional input is related to functional analysis. However, often it is expected that the response one is interested in regressing depends on not just one, but many functions. That is, it may be fruitful to consider a mapping h : I_1 × ... × I_p → R, Y ≈ h(f_1, ..., f_p) (Figure 1(b)). For instance, this is likely the case in regressing the future price of a commodity, since the commodity's future price is dependent not only on the history of its own price, but also on the histories of other commodities' prices. A response's dependence on multiple functional covariates is especially common in neurological data, where thousands of voxels in the brain may each contain a corresponding function. In fact, in such domains it is not uncommon to have a number of input functional covariates that far exceeds the number of training instances in a data-set. Thus, it would be beneficial to have an estimator that is sparse in the number of functional covariates used to regress the response.
That is, we wish to find an estimate ĥ_S that depends only on a small subset {i_1, ..., i_S} ⊂ {1, ..., p}, such that ĥ(f_1, ..., f_p) = ĥ_S(f_{i_1}, ..., f_{i_S}) (Figure 1(c)). Here we present a semi-parametric estimator to perform sparse regression with multiple input functional
Figure 1: (a) Model where the mapping takes in a function f and produces a real value. (b) Model where the response is dependent on multiple input functions f_1, ..., f_p. (c) Sparse model where the response is dependent on a sparse subset of the input functions f_1, ..., f_p.

covariates and a real-valued response, the FuSSO: Functional Shrinkage and Selection Operator. No parametric assumptions are made on the nature of the input functions. We shall assume that the response is the result of a sparse set of linear combinations of the input functions and other non-parametric functions {g_i}: Y = Σ_j ⟨f_j, g_j⟩. The resulting method is a LASSO-like estimator that effectively zeroes out entire functions from consideration in regressing the response. Our contributions are as follows. We introduce the FuSSO, an estimator for performing regression with many functional covariates and a real-valued response. Furthermore, we provide a theoretical backing for the FuSSO estimator via a proof of asymptotic sparsistency under certain conditions. We also illustrate the estimator with applications on synthetic data, as well as in regressing the age of a subject given orientation distribution function (dODF) [14] data for the subject's white matter.

2 Related Work

As previously mentioned, [8] recently explored regression with a mapping that takes in a probability density function and outputs a real value. Furthermore, [7] studies the case when both the input and output are distributions. In addition, nonparametric functional data analysis relates to the study of functional data. In all of these works, the mappings studied take in only one functional covariate. Based on them, it is not immediately evident how to develop an estimator that simultaneously performs regression and feature selection with multiple functional covariates.
To the best of our knowledge, there has been no prior work studying sparse mappings that take multiple functional inputs and produce a real-valued output. LASSO-like regression estimators that work with functional data include the following. In [6], one has a functional output and several real-valued covariates; there, the estimator finds a sparse set of functions to scale by the real-valued covariates to produce the functional response. Also, [17, 3] study the case when one has one functional covariate f and one real-valued response that depends linearly on f through some function g: Y = ⟨f, g⟩ = ∫ f g. In [17], the estimator searches for sparsity across wavelet basis projection coefficients. In [3], sparsity is achieved in the time domain of the d-th derivative of g; i.e., D^d g(t) = 0 for many values of t, where D^d is the differential operator. Hence, roughly speaking, [17] and [3] look for sparsity across the frequency and time domains, respectively, of the regressing function g. However, these methods do not consider the case where one has many input functional covariates {f_1, ..., f_p} and needs to choose among them. That is, [17, 3] do not provide a method to select among functional covariates in a fashion analogous to how the LASSO selects among real-valued covariates. Lastly, it is worth noting that our estimator will have an additive linear model, Y = Σ_j ⟨f_j, g_j⟩, where we search for {g_i} in a broad, non-parametric family such that many g_j are the zero function. Such a task is similar in nature to the SpAM estimator [9], in which one also has an additive model Y = Σ_j g_j(X_j) in the dimensions of a real vector X, and searches for {g_i} in a broad, non-parametric family such that many g_j are the zero function. Note, though, that in the SpAM model the {g_i} functions are applied to real covariates via function evaluation, while in the FuSSO model the {g_i} are applied to functional covariates via an inner product; that is, FuSSO works over functional, not real-valued, covariates, unlike SpAM.
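To make the contrast with SpAM concrete, here is a toy illustration of ours (not from the paper): a SpAM-style term evaluates g at a scalar covariate, while a FuSSO-style term integrates g against an entire functional covariate. The functions g and f below are arbitrary choices for illustration.

```python
import numpy as np

# A smooth g and a functional covariate f, both on a grid over [0, 1]
x = np.linspace(0.0, 1.0, 1001)
g = np.sin(2 * np.pi * x)
f = x ** 2

# SpAM-style term: evaluate g at a real-valued covariate X_j
X_j = 0.25
spam_term = np.sin(2 * np.pi * X_j)        # g(X_j)

# FuSSO-style term: inner product <f, g> = \int_0^1 f(t) g(t) dt
# approximated here with the trapezoid rule on the grid
fg = f * g
h = x[1] - x[0]
fusso_term = np.sum(fg[:-1] + fg[1:]) * h / 2.0
```

The SpAM term uses a single number from the covariate, whereas the FuSSO term depends on the whole shape of f; this is the sense in which FuSSO operates over functional rather than real-valued covariates.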
3 Model

To better understand the FuSSO's model, we draw several analogies to real-valued linear regression and the Group-LASSO [16]. Note that although for simplicity we focus on functions over a one-dimensional domain, it is straightforward to extend the estimator and results to the multidimensional case.

Consider a model for typical real-valued linear regression with a data-set of input-output pairs {(X_i, Y_i)}_{i=1}^N:

Y_i = ⟨X_i, w⟩ + ε_i,   (1)

where Y_i ∈ R, X_i ∈ R^d, w ∈ R^d, ε_i ~ N(0, σ²), and ⟨X_i, w⟩ = Σ_{j=1}^d X_{ij} w_j. If instead one were working with functional data {(f_i, Y_i)}_{i=1}^N, where f_i : [0, 1] → R and f_i ∈ L²[0, 1], one may similarly consider a linear model:

Y_i = ⟨f_i, g⟩ + ε_i,   (2)

where g : [0, 1] → R and ⟨f_i, g⟩ = ∫_0^1 f_i(t) g(t) dt. If Φ = {φ_m}_{m=1}^∞ is an orthonormal basis for L²[0, 1], then f_i(x) = Σ_m α_m^(i) φ_m(x), where α_m^(i) = ∫_0^1 f_i(t) φ_m(t) dt. Similarly, g(x) = Σ_m β_m φ_m(x), where β_m = ∫_0^1 g(t) φ_m(t) dt. Thus,

Y_i = ⟨f_i, g⟩ + ε_i = ⟨Σ_m α_m^(i) φ_m, Σ_k β_k φ_k⟩ + ε_i = Σ_m Σ_k α_m^(i) β_k ⟨φ_m, φ_k⟩ + ε_i = Σ_m α_m^(i) β_m + ε_i,

where the last step follows from the orthonormality of Φ. Going back to the real-valued covariate case, if instead of having one feature vector per data instance, X_i ∈ R^d, one had p feature vectors associated with each data instance, {X_{ij}}_{j=1}^p with X_{ij} ∈ R^d, an additive linear model may be used for regression: Y_i = Σ_{j=1}^p ⟨X_{ij}, w_j⟩ + ε_i, where w_1, ..., w_p ∈ R^d. Similarly, in the functional case one may have p functions associated with data instance i, {f_{ij}}_{j=1}^p with f_{ij} ∈ L²[0, 1]. Then an additive linear model would be:

Y_i = Σ_{j=1}^p ⟨f_{ij}, g_j⟩ + ε_i = Σ_{j=1}^p Σ_m α_{jm}^(i) β_{jm} + ε_i,

where g_1, ..., g_p ∈ L²[0, 1], and α_{jm}^(i) and β_{jm} are the projection coefficients of f_{ij} and g_j, respectively. Suppose that one has few observations N relative to the number of features p. In the real-valued case, in order to effectively find a solution for w = (w_1^T, ..., w_p^T)^T, one may search for a group-sparse solution where many w_j = 0. To do so, one may consider the following Group-LASSO regression:

w* = argmin_w ‖Y − Xw‖₂² + λ_N Σ_{j=1}^p ‖w_j‖₂,   (3)

where X is the N × dp matrix X = [X_1 ... X_p], Y = (Y_1, ..., Y_N)^T, and ‖·‖₂ is the Euclidean norm.
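The basis expansion above reduces each function to a coefficient vector. As a minimal illustration of ours (the basis choice, grid, and test function are assumptions, not taken from the paper), the following sketch computes projection coefficients onto a cosine basis by a Riemann sum on a grid:

```python
import numpy as np

def cosine_basis(m, x):
    # phi_0(x) = 1; phi_m(x) = sqrt(2) cos(pi m x) for m >= 1, orthonormal on [0, 1]
    return np.ones_like(x) if m == 0 else np.sqrt(2.0) * np.cos(np.pi * m * x)

def projection_coeffs(f_vals, n_basis):
    # Approximate alpha_m = \int_0^1 f(t) phi_m(t) dt by the Riemann sum
    # (1/n) phi_m^T f over the grid x_k = k/n, k = 1..n
    n = len(f_vals)
    x = np.arange(1, n + 1) / n
    return np.array([cosine_basis(m, x) @ f_vals / n for m in range(n_basis)])

# f = phi_1 + 0.5 * phi_2 should recover coefficients close to [0, 1, 0.5, 0, 0]
x = np.arange(1, 2001) / 2000
f = cosine_basis(1, x) + 0.5 * cosine_basis(2, x)
alpha = projection_coeffs(f, 5)
```

By orthonormality, each coefficient isolates one basis direction, which is exactly why the regression in (2) reduces to a linear model on the coefficient sequence.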
If in the functional case one also has N ≪ p, one may set up an optimization similar to (3), whose direct analogue is:

g* = argmin_g Σ_{i=1}^N (Y_i − Σ_{j=1}^p ⟨f_{ij}, g_j⟩)²   (4)
        + λ_N Σ_{j=1}^p ‖g_j‖;   (5)

equivalently,

β* = argmin_β Σ_{i=1}^N (Y_i − Σ_{j=1}^p Σ_m α_{jm}^(i) β_{jm})²   (6)
        + λ_N Σ_{j=1}^p (Σ_m β_{jm}²)^{1/2},   (7)

where g = {g_i}_{i=1}^p = {Σ_m β_{im} φ_m}_{i=1}^p. However, it is infeasible to directly observe the functional inputs {f_{ij} : i ≤ N, j ≤ p}. Thus, we shall instead assume that one observes {({y_{ij}}_{j=1}^p, Y_i) : i ≤ N} where

y_{ij} = f_{ij} + ξ_{ij},   (8)
f_{ij} = (f_{ij}(1/n), f_{ij}(2/n), ..., f_{ij}(1))^T,   (9)
ξ_{ij} ~ N(0, σ_ξ² I_n).   (10)
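The observation model (8)-(10) implies that the noise in a coefficient estimate of the form (1/n) φ_m^T y has standard deviation σ_ξ/√n, shrinking as the grid gets finer. A quick simulation of ours (the grid size, frequency, and noise level are our choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma_xi = 400, 0.5

# One trigonometric basis function evaluated on the grid k/n, k = 1..n
x = np.arange(1, n + 1) / n
phi = np.sqrt(2.0) * np.cos(2 * np.pi * x)

# eta = (1/n) phi^T xi for many independent noise draws xi ~ N(0, sigma_xi^2 I_n)
eta = np.array([phi @ rng.normal(0.0, sigma_xi, n) / n for _ in range(2000)])

# The empirical std should match sigma_xi / sqrt(n)
print(eta.std(), sigma_xi / np.sqrt(n))
```

Averaging against the basis vector thus damps the grid noise, which is what makes the projection-coefficient route more robust than working with the raw noisy evaluations.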
That is, we observe a grid of n noisy values for each functional input. Then, one may estimate α_{jm}^(i) as:

α̃_{jm}^(i) = (1/n) φ_m^T y_{ij} = (1/n) φ_m^T (f_{ij} + ξ_{ij}) = ᾱ_{jm}^(i) + η_{jm}^(i),   (11)

where φ_m = (φ_m(1/n), φ_m(2/n), ..., φ_m(1))^T. Furthermore, we may truncate the number of basis functions used to express f_{ij} at M_n, estimating it as:

f̃_{ij}(x) = Σ_{m=1}^{M_n} α̃_{jm}^(i) φ_m(x).   (12)

Using the truncated estimate, one has ⟨f̃_{ij}, g_j⟩ = Σ_{m=1}^{M_n} α̃_{jm}^(i) β_{jm}. Hence, using these approximations, (6)-(7) becomes:

β̂ = argmin_β Σ_{i=1}^N (Y_i − Σ_{j=1}^p Σ_{m=1}^{M_n} α̃_{jm}^(i) β_{jm})²   (13)
        + λ_N Σ_{j=1}^p (Σ_{m=1}^{M_n} β_{jm}²)^{1/2}   (14)
   = argmin_β ‖Y − Σ_j Ã_j β_j‖₂² + λ_N Σ_j ‖β_j‖₂,   (15)

where Ã_j is the N × M_n matrix with values [Ã_j]_{i,m} = α̃_{jm}^(i) and β_j = (β_{j1}, ..., β_{jM_n})^T. Note that one need not consider projection coefficients β_{jm} for m > M_n, since such projection coefficients will not decrease the MSE term in (13) (because α̃_{jm}^(i) = 0 for m > M_n), while β_{jm} ≠ 0 for m > M_n increases the norm penalty term in (14). Hence we see that our sparse functional estimate is a Group-LASSO problem on the projection coefficients.

Extensions. It is useful to note that there are several straightforward extensions to the FuSSO as presented. First, it may be possible to estimate the inner product of a function f_{ij} and g_j as ⟨f_{ij}, g_j⟩ ≈ (1/n) y_{ij}^T g_j, where g_j = (g_j(1/n), ..., g_j(1))^T. This effectively allows one to use the naive approach of simply running the Group-LASSO on the y_{ij} feature vectors directly (we'll refer to this method as N-GL). It is important to note, however, that N-GL will be less robust to noise, and less adaptive and efficient with respect to smoothness, than the FuSSO. Furthermore, it is not necessary for the observations of the input functions to lie on a grid, since one may estimate projection coefficients in the case of an irregular design. Moreover, one may also estimate projection coefficients for density functions using samples drawn from the pdf. Note that N-GL would fail to estimate our model in the irregular-design case, and would not be possible in the case where the functions are pdfs.
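Since (15) is a standard Group-LASSO problem, a minimal proximal-gradient (block soft-thresholding) solver sketch can be given. This is our own sketch under standard assumptions, not the authors' implementation; the step size and iteration count are arbitrary choices.

```python
import numpy as np

def fusso_group_lasso(A_blocks, y, lam, n_iter=500):
    """Proximal gradient for  min_b ||y - sum_j A_j b_j||^2 + lam * sum_j ||b_j||_2,
    where A_blocks[j] is the N x M_n matrix of estimated projection coefficients
    of the j-th functional covariate (the matrix written A~_j in the text)."""
    p, M = len(A_blocks), A_blocks[0].shape[1]
    A = np.hstack(A_blocks)
    beta = np.zeros(p * M)
    L = 2.0 * np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the squared-error gradient
    for _ in range(n_iter):
        z = beta - 2.0 * A.T @ (A @ beta - y) / L   # gradient step on the MSE term
        for j in range(p):                           # block soft-thresholding prox
            b = z[j * M:(j + 1) * M]
            nrm = np.linalg.norm(b)
            b *= max(0.0, 1.0 - lam / (L * nrm)) if nrm > 0 else 0.0
        beta = z
    return beta.reshape(p, M)
```

The prox step zeroes entire blocks β̂_j at once, which is exactly the mechanism by which the FuSSO removes whole functions from the model.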
Also, a two-stage estimator as described in [5], where one first uses the regularization penalty with a large λ_N to find the support and then solves the optimization problem with a smaller λ_N on just the estimated support, may be more efficient at estimating the response. Furthermore, a problem analogous to the optimization (15) may be framed to perform logistic regression and classification.

4 Theory

Next, we show that the FuSSO is able to recover the correct sparsity pattern asymptotically; i.e., that the FuSSO estimate is sparsistent. In order to do so, we shall show that, with high probability, there is an optimal solution to our optimization problem (15) with the correct sparsity pattern. We follow arguments similar to those of [12] and [9]. We use a witness technique to show that there is a coefficient/subgradient pair (β̂, û) such that supp(β̂) = supp(β*), for the true response-generating β*. Let Ω(β) = Σ_{j=1}^p ‖β_j‖₂ be our penalty term (14). Let S denote the true set of non-zero functions, i.e., S = {j : β*_j ≠ 0}, with s = |S|. First, we fix β̂_{S^c} = 0 and set û_S ∈ ∂Ω(β̂_S). Note that for a vector β, ∂Ω(β)_j = {u_j} where u_j = β_j / ‖β_j‖₂ if β_j ≠ 0, and ‖u_j‖₂ ≤ 1 if β_j = 0. We shall show that, with high probability, β̂_j ≠ 0 for all j ∈ S and ‖û_j‖₂ < 1 for all j ∈ S^c, thus showing that there is an optimal solution to (15) that has the true sparsity pattern with high probability. First, we elaborate on our assumptions.

4.1 Assumptions

Let Φ be the trigonometric basis: φ_1(x) ≡ 1 and, for k = 1, 2, ...,

φ_{2k}(x) = √2 cos(2πkx),   φ_{2k+1}(x) = √2 sin(2πkx).

Let D = {({y_{ij}}_{j=1}^p, Y_i)}_{i=1}^N, where y_{ij} is as in (8) and Y_i = Σ_{j=1}^p Σ_m α_{jm}^(i) β*_{jm} + ε_i as above. Assume that for all i and all j ≤ p: α_j^(i) ∈ Θ(γ, Q), where

Θ(γ, Q) = {θ : Σ_k c_k² θ_k² ≤ Q},   c_k = k^γ if k is even or one, and c_k = (k − 1)^γ otherwise,
with α_j^(i) = {α_{jm}^(i) ∈ R : α_{jm}^(i) = ⟨f_{ij}, φ_m⟩, m ≥ 1}, for 0 < Q < ∞ and 0 < γ < ∞. Furthermore, assume that for the true β* generating the observed responses Y_i, β*_j ∈ Θ(γ, Q) for all j ≤ p.

Let A_j be the N × M_n matrix with entries [A_j]_{i,m} = α_{jm}^(i). Let A_S denote the matrix made by horizontally concatenating the A_j matrices with j ∈ S; i.e., A_S = [A_{j_1} ... A_{j_s}], where {j_1, ..., j_s} = S and j_i < j_k for i < k. Suppose the following:

Λ_max((1/N) A_S^T A_S) ≤ C_max < ∞,   (16)
Λ_min((1/N) A_S^T A_S) ≥ C_min > 0.   (17)

Also, suppose ∃ δ > 0 such that for all j ∈ S^c:

Λ_max((1/N) A_j^T A_j) ≤ C_max < ∞,   (18)
‖A_j^T A_S (A_S^T A_S)^{−1}‖ ≤ (1 − δ)/√s.   (19)

Let Ā_j be the N × M_n matrix with entries [Ā_j]_{i,m} = ᾱ_{jm}^(i) = (1/n) φ_m^T f_{ij}, and let H_j be the N × M_n matrix with entries [H_j]_{i,m} = η_{jm}^(i) = (1/n) φ_m^T ξ_{ij}. Thus, Ã_j = Ā_j + H_j. Furthermore, let E_j = Ā_j − A_j; then Ã_j = A_j + E_j + H_j.

In addition to the aforementioned assumptions, we shall further assume that there exist a < 1/2 and a minimum signal strength ρ ≡ min_{j∈S} ‖β*_j‖₂ > 0 satisfying a collection of rate conditions ((20)-(28) in the original numbering) coupling N, n, p, s, M_n, λ_N, γ, and ρ. Roughly, these require that the truncation-bias terms of order √s M_n n^{−(γ+1/2)} and s^{3/2} M_n^{−(γ−1/2)}, the grid-noise level n^{−a}, the sampling error (log(sM_n)/N)^{1/2}, and the penalty scale λ_N √(sM_n)/N all vanish relative to ρ, while λ_N grows quickly enough that √(N log p)/λ_N → 0; we also assume γ ≥ 1 for the sake of simplification. These assumptions simplify further if we choose M_n optimally for function estimation and take s = O(1) and ρ = Θ(1); under these conditions, they reduce to 0 < a < 1/2 together with a small set of explicit rates in N, n, and p going to zero.

4.2 Sparsistency

Theorem 1: P(Ŝ = S) → 1.

First, we state some lemmas, whose proofs may be found in the supplementary materials.

4.2.1 Lemmata

Lemma 1: Let X be a non-negative random variable and C a measurable event; then P(C) E[X | C] ≤ E[X].

Lemma 2: (1/n) Σ_{k=1}^n φ_m(k/n) φ_l(k/n) = I{l = m}, for l, m ≤ n.

Lemma 3: Let H_j^(i) denote the rows of H_j; then H_j^(i) ~ N(0, (σ_ξ²/n) I).

Lemma 4: P(‖H‖_max ≥ n^{−a}) ≤ pM_n N n^{a/2} σ_ξ e^{−n^{1−2a}/(2σ_ξ²)}.

Lemma 5: ‖E‖_max ≤ C_Q n^{−(γ+1/2)}, where C_Q is a constant depending on Q.

Lemma 6: ‖β*_S‖₂ ≤ √(Qs).

Lemma 7: ∃ N₀, n₀, C'_min, C'_max with 0 < C'_min ≤ C'_max < ∞ and δ' > 0 such that, if ‖H‖_max < n^{−a}, N > N₀, and n > n₀, then

Λ_max((1/N) Ã_S^T Ã_S) ≤ C'_max < ∞,   (29)
Λ_min((1/N) Ã_S^T Ã_S) ≥ C'_min > 0,   (30)

and, for all j ∈ S^c, ‖Ã_j^T Ã_S (Ã_S^T Ã_S)^{−1}‖ ≤ (1 − δ')/√s.   (31)

4.2.2 Proof of Theorem 1

Proposition 1: P(∀ j ∈ S, β̂_j ≠ 0) → 1.

Proof.
Recall that, by assumption, ρ = min_{j∈S} ‖β*_j‖₂ > 0. Thus, to prove that β̂_j ≠ 0 for all j ∈ S, it suffices to show that ‖β̂_S − β*_S‖₂ ≤ ρ/2. To do so, we show that P(‖β̂_S − β*_S‖₂ > ρ/2) → 0. Let B be the event that ‖H‖_max < n^{−a}. Note that:

P(‖β̂_S − β*_S‖₂ > ρ/2) ≤ P(‖β̂_S − β*_S‖₂ > ρ/2 | B) P(B) + P(B^c).

Furthermore, by Markov's inequality, P(‖β̂_S − β*_S‖₂ > ρ/2 | B) ≤ (2/ρ) E[‖β̂_S − β*_S‖₂ | B]. Then, looking at the stationarity condition on the support S:

Ã_S^T (Y − Ã_S β̂_S) = (λ_N/2) û_S.   (32)
Let V be the vector with entries V_i = Σ_{j∈S} Σ_{m>M_n} α_{jm}^(i) β*_{jm}; i.e., the error from truncation. Unless otherwise specified, let X^(i) denote the i-th row of a matrix X and X_j its j-th column. Then, using the stationarity condition and Y = A_S β*_S + V + ε:

Ã_S^T (A_S β*_S + V + ε − Ã_S β̂_S) = (λ_N/2) û_S
⇒ Ã_S^T Ã_S (β̂_S − β*_S) = Ã_S^T (A_S − Ã_S) β*_S + Ã_S^T V + Ã_S^T ε − (λ_N/2) û_S
   = −Ã_S^T (E_S + H_S) β*_S + Ã_S^T V + Ã_S^T ε − (λ_N/2) û_S.   (33)

Let Σ̃_SS = (1/N) Ã_S^T Ã_S; we see that

β̂_S − β*_S = −(1/N) Σ̃_SS^{−1} Ã_S^T (E_S + H_S) β*_S + (1/N) Σ̃_SS^{−1} Ã_S^T V + (1/N) Σ̃_SS^{−1} Ã_S^T ε − (λ_N/(2N)) Σ̃_SS^{−1} û_S.

Thus, we proceed to bound each term in expectation. First, expanding Ã_S = A_S + E_S + H_S,

Σ̃_SS^{−1} Ã_S^T (E_S + H_S) β*_S = Σ̃_SS^{−1} (A_S^T (E_S + H_S) β*_S + (E_S + H_S)^T (E_S + H_S) β*_S).

Moreover, given that B occurs, ‖Σ̃_SS^{−1}‖ ≤ √(sM_n)/C'_min by Lemma 7. By Lemma 1, conditional expectations over B may be bounded by unconditional ones. The vector H_S β*_S is normally distributed with Var(H_S^(i)T β*_S) ≤ (σ_ξ²/n) ‖β*_S‖₂² ≤ σ_ξ² Qs/n by Lemmas 3 and 6, so a Gaussian maximal inequality (e.g., [13]) bounds E‖H_S β*_S‖; also, ‖E_S β*_S‖ ≤ √(sM_n) C_Q n^{−(γ+1/2)} ‖β*_S‖₂ by Lemma 5. Treating the quadratic term (E_S + H_S)^T (E_S + H_S) β*_S similarly (on B, ‖E_S + H_S‖_max ≤ C_Q n^{−(γ+1/2)} + n^{−a}), one obtains a bound on the expectation of the first term. The next terms are bounded as follows.

Lemma 8: E[‖(1/N) Σ̃_SS^{−1} Ã_S^T V‖ 1_B] = O(√s / M_n^{γ−1/2}).

Lemma 9: E[‖(1/N) Σ̃_SS^{−1} Ã_S^T ε‖ 1_B] = O((log(sM_n)/N)^{1/2}).

(See the Supplemental Materials for proofs.) Lastly, since ‖û_j‖₂ ≤ 1 for each j ∈ S, ‖û_S‖₂ ≤ √s, and hence ‖(λ_N/(2N)) Σ̃_SS^{−1} û_S‖ ≤ (λ_N/(2N)) √s √(sM_n)/C'_min. Keeping only the leading terms,

E[‖β̂_S − β*_S‖₂ | B] = O(s^{3/2} M_n n^{−(γ+1/2)} + s √(M_n log N / n))
  + O(s^{3/2} M_n^{−(γ−1/2)} + (log(sM_n)/N)^{1/2} + λ_N s √M_n / N).

Hence, by the rate assumptions above, P(‖β̂_S − β*_S‖₂ > ρ/2) → 0. ∎

One may similarly use the stationarity condition over S^c to analyze û_j for j ∈ S^c:

Ã_j^T (Y − Ã_S β̂_S) = (λ_N/2) û_j
⇒ û_j = (2/λ_N) Ã_j^T (Ã_S (β*_S − β̂_S) + (A_S − Ã_S) β*_S + V + ε),

and, substituting the decomposition of β̂_S − β*_S from the proof of Proposition 1, û_j may be written in terms of Σ̃_jS ≡ (1/N) Ã_j^T Ã_S, Σ̃_SS, the error matrices E and H, V, ε, and û_S. We wish to show that each û_j, j ∈ S^c, satisfies the KKT conditions strictly; that is:

Proposition 2: P(max_{j∈S^c} ‖û_j‖₂ < 1) → 1.

Proof. Let μ_j^H = E[û_j | H]. We proceed by splitting each û_j into its conditional mean and its deviation:

P(max_{j∈S^c} ‖û_j‖₂ ≥ 1) ≤ P(max_{j∈S^c} ‖μ_j^H‖₂ ≥ 1 − δ') + P(max_{j∈S^c} ‖û_j − μ_j^H‖₂ ≥ δ').

We obtain the following results:

Lemma 10: P(max_{j∈S^c} ‖μ_j^H‖₂ ≥ 1 − δ') → 0.

Lemma 11: P(max_{j∈S^c} ‖û_j − μ_j^H‖₂ ≥ δ') → 0.

Hence, we have that P(max_{j∈S^c} ‖û_j‖₂ < 1) → 1. ∎

5 Experiments

5.1 Synthetic Data

We tested the FuSSO on synthetic data-sets D = {({y_{ij}}_{j=1}^p, Y_i)}_{i=1}^N, where y_{ij} is as in (8). The experiments performed were as follows. First, we fix N, n, p, and s. For i = 1, ..., N and j = 1, ..., p we create random functions using a maximum of M projection coefficients as follows: (1) set a_m ~ Unif(0, 1) for m = 1, ..., M; (2) set a_m = a_m / c_m, where c_m = m^γ if m is one or even, and c_m = (m − 1)^γ if m is odd; (3) set a_m = a_m / ‖a‖₂; (4) set α_j^(i) = a. See Figure 2 for typical functions. Similarly, we generate β*_j for j = 1, ..., s; for j = s + 1, ..., p, we set β*_j = 0. Then, we generate Y_i = Σ_{j=1}^s ⟨β*_j, α_j^(i)⟩ + ε_i with Gaussian noise ε_i. Also, a grid of n noisy function evaluations was generated to make y_{ij} as in (8). These were then used to compute the estimates α̃_{jm}^(i) for m = 1, ..., M_n, where M_n was chosen by cross-validation. See Figure 2 for typical noisy observations and function estimates at the two grid sizes considered.

We fixed s = 5 and chose several configurations of the remaining parameters (p, N, n). For each configuration, random trials were performed. We recorded r, the fraction of the trials in which some λ value was able to recover the correct sparsity pattern (i.e.
that only the first 5 functions are in the support). We also recorded the mean normalized length of the range of λ values able to recover the correct support; i.e., t = (t_f − t_l)/t_max, where t_f is the largest λ found to recover the correct support in the t-th trial, t_l is the smallest such λ, and t_max is the smallest λ to produce β̂ = 0 (t is taken to be zero if no λ recovered the correct support). Across the (p, N, n) configurations tested, r ranged from .477 to .68. Hence we see that even when the number of observations per function n is small and the total number of input functional covariates p is large, the FuSSO can recover the correct support. Also, to illustrate the point that running the Group-LASSO on the y_{ij} features directly (N-GL) is less robust to noise and less adaptive to smoothness, we ran noisier trials for one configuration, increasing the standard deviations of the noise on the grid function observations and on the response. Under these conditions the FuSSO was able to recover the support in 49% of the trials, whereas N-GL recovered the support in 3% of the trials. Furthermore, the FuSSO had
a mean recovery range of t = .743, compared to t = .54 for N-GL.

Figure 2: (a), (c) Two typical functions, noisy observations, and estimates. (b), (d) Regularization paths showing the norms of the β̂_j (red for j in the support, blue otherwise) over a range of λ; the rightmost vertical line indicates the largest λ able to recover the support, the leftmost line the smallest such λ.

5.2 Neurological Data

We also tested the FuSSO estimator with a neurological data-set, using a total of 89 subjects [14]. Subjects ranged in age from 8 to 6 years old (Figure 3(b)). Our goal was to learn a regression that maps the dODFs at each white-matter voxel of each subject to the subject's age. The dODF is a function representing the amount of water molecules, or spins, undergoing diffusion in different orientations over the sphere [15]; i.e., each dODF is a function with a 2-d domain of (azimuth, elevation) spherical coordinates and a range of reals representing the strength of water diffusion at the given orientation (see Figure 3(a)).

Data was provided for each subject in a template space of white-matter voxels; a total of over 5 thousand voxel dODFs were regressed on. We also compared regression using the FuSSO and functional covariates to using the LASSO and real-valued covariates. For the latter we used the non-functional collection of quantitative anisotropy (QA) values for the same white-matter voxels as the dODF functions. QA values are the estimated amount of spins that undergo diffusion in the direction of the principal fiber orientation, i.e., the peak of the dODF; QAs have been used as a measure of white-matter integrity in the underlying voxel, hence making for a descriptive and effective summary statistic of a dODF function for age regression [15]. The projection coefficients for the dODFs at each voxel were estimated using the cosine basis. The FuSSO estimator gave a cross-validated MSE of 7.855; selected voxels in the support may be seen in Figure 3(c). The LASSO estimate using QA values gave a higher cross-validated MSE. Thus, one may see that considering the entire functional data gave better results for age regression. We note that we were unable to use the naive approach of N-GL in this case because of memory constraints and the fact that the function evaluation points did not lie on a 2-d square grid.

Figure 3: (a) An example ODF for a voxel. (b) Histogram of ages for subjects. (c) Voxels in the support of the model, shown in blue. (d) Histogram of held-out error magnitudes.

6 Conclusion

In conclusion, this paper presents the FuSSO, a functional analogue to the LASSO. The FuSSO allows one to efficiently find a sparse set of functional input covariates to regress a real-valued response against. The FuSSO makes no parametric assumptions about the nature of the input functional covariates and assumes a linear form for the mapping of functional covariates to the response. We provide a statistical backing for use of the FuSSO via a proof of asymptotic sparsistency.

Acknowledgements

This work is supported in part by NSF grants IIS47658 and IIS535.
References

[1] Rajendra Bhatia. Matrix Analysis, volume 169. Springer, 1997.
[2] F. Ferraty and P. Vieu. Nonparametric Functional Data Analysis: Theory and Practice. Springer, 2006.
[3] Gareth M. James, Jing Wang, and Ji Zhu. Functional linear regression that's interpretable. The Annals of Statistics, pages 2083-2108, 2009.
[4] Michel Ledoux and Michel Talagrand. Probability in Banach Spaces: Isoperimetry and Processes, volume 23. Springer, 1991.
[5] Nicolai Meinshausen. Relaxed lasso. Computational Statistics & Data Analysis, 52:374-393, 2007.
[6] Nicola Mingotti, Rosa E. Lillo, and Juan Romo. Lasso variable selection in functional regression. 2013.
[7] Junier B. Oliva, Barnabás Póczos, and Jeff Schneider. Distribution to distribution regression. In International Conference on Machine Learning (ICML), 2013.
[8] B. Póczos, A. Rinaldo, A. Singh, and L. Wasserman. Distribution-free distribution regression. AISTATS, 2013.
[9] Pradeep Ravikumar, John Lafferty, Han Liu, and Larry Wasserman. Sparse additive models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(5):1009-1030, 2009.
[10] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological), pages 267-288, 1996.
[11] Alexandre B. Tsybakov. Introduction to Nonparametric Estimation. Springer, 2008.
[12] Martin J. Wainwright. Sharp thresholds for high-dimensional and noisy recovery of sparsity. arXiv preprint math/0605740, 2006.
[13] Larry Wasserman. Probability inequalities. Lecture notes, August 2012.
[14] Fang-Cheng Yeh and Wen-Yih Isaac Tseng. NTU-90: a high angular resolution brain atlas constructed by q-space diffeomorphic reconstruction. NeuroImage, 58:91-99, 2011.
[15] Fang-Cheng Yeh, Van Jay Wedeen, and W. Tseng. Generalized q-sampling imaging. IEEE Transactions on Medical Imaging, 29(9):1626-1635, 2010.
[16] Ming Yuan and Yi Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1):49-67, 2006.
[17] Yihong Zhao, R. Todd Ogden, and Philip T. Reiss. Wavelet-based lasso in functional linear regression. Journal of Computational and Graphical Statistics, 21(3):600-617, 2012.
The Sparsity and Bias of The LASSO Selection In High-Dimensional Linear Regression Cun-hui Zhang and Jian Huang Presenter: Quefeng Li Feb. 26, 2010 un-hui Zhang and Jian Huang Presenter: Quefeng The Sparsity
More informationSparse Additive Functional and kernel CCA
Sparse Additive Functional and kernel CCA Sivaraman Balakrishnan* Kriti Puniyani* John Lafferty *Carnegie Mellon University University of Chicago Presented by Miao Liu 5/3/2013 Canonical correlation analysis
More informationMarginal Regression For Multitask Learning
Mladen Kolar Machine Learning Department Carnegie Mellon University mladenk@cs.cmu.edu Han Liu Biostatistics Johns Hopkins University hanliu@jhsph.edu Abstract Variable selection is an important and practical
More informationFast Distribution To Real Regression
Junier B. Oliva Willie Neiswanger Barnabás Póczos Jeff Schneider Eric P. Xing Carnegie Mellon University Abstract We study the problem of distribution to real regression, where one aims to regress a mapping
More informationHigh-dimensional graphical model selection: Practical and information-theoretic limits
1 High-dimensional graphical model selection: Practical and information-theoretic limits Martin Wainwright Departments of Statistics, and EECS UC Berkeley, California, USA Based on joint work with: John
More informationHigh-dimensional Statistics
High-dimensional Statistics Pradeep Ravikumar UT Austin Outline 1. High Dimensional Data : Large p, small n 2. Sparsity 3. Group Sparsity 4. Low Rank 1 Curse of Dimensionality Statistical Learning: Given
More informationarxiv: v4 [stat.me] 27 Nov 2017
CLASSIFICATION OF LOCAL FIELD POTENTIALS USING GAUSSIAN SEQUENCE MODEL Taposh Banerjee John Choi Bijan Pesaran Demba Ba and Vahid Tarokh School of Engineering and Applied Sciences, Harvard University Center
More informationMIT 9.520/6.860, Fall 2018 Statistical Learning Theory and Applications. Class 08: Sparsity Based Regularization. Lorenzo Rosasco
MIT 9.520/6.860, Fall 2018 Statistical Learning Theory and Applications Class 08: Sparsity Based Regularization Lorenzo Rosasco Learning algorithms so far ERM + explicit l 2 penalty 1 min w R d n n l(y
More informationCourse Notes for EE227C (Spring 2018): Convex Optimization and Approximation
Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Instructor: Moritz Hardt Email: hardt+ee227c@berkeley.edu Graduate Instructor: Max Simchowitz Email: msimchow+ee227c@berkeley.edu
More informationBlock-sparse Solutions using Kernel Block RIP and its Application to Group Lasso
Block-sparse Solutions using Kernel Block RIP and its Application to Group Lasso Rahul Garg IBM T.J. Watson research center grahul@us.ibm.com Rohit Khandekar IBM T.J. Watson research center rohitk@us.ibm.com
More informationStatistical Machine Learning for Structured and High Dimensional Data
Statistical Machine Learning for Structured and High Dimensional Data (FA9550-09- 1-0373) PI: Larry Wasserman (CMU) Co- PI: John Lafferty (UChicago and CMU) AFOSR Program Review (Jan 28-31, 2013, Washington,
More informationNear Ideal Behavior of a Modified Elastic Net Algorithm in Compressed Sensing
Near Ideal Behavior of a Modified Elastic Net Algorithm in Compressed Sensing M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas M.Vidyasagar@utdallas.edu www.utdallas.edu/ m.vidyasagar
More informationHierarchical kernel learning
Hierarchical kernel learning Francis Bach Willow project, INRIA - Ecole Normale Supérieure May 2010 Outline Supervised learning and regularization Kernel methods vs. sparse methods MKL: Multiple kernel
More informationSTAT 200C: High-dimensional Statistics
STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57
More informationBAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage
BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage Lingrui Gan, Naveen N. Narisetty, Feng Liang Department of Statistics University of Illinois at Urbana-Champaign Problem Statement
More informationarxiv: v1 [math.st] 20 Nov 2009
REVISITING MARGINAL REGRESSION arxiv:0911.4080v1 [math.st] 20 Nov 2009 By Christopher R. Genovese Jiashun Jin and Larry Wasserman Carnegie Mellon University May 31, 2018 The lasso has become an important
More informationStrengthened Sobolev inequalities for a random subspace of functions
Strengthened Sobolev inequalities for a random subspace of functions Rachel Ward University of Texas at Austin April 2013 2 Discrete Sobolev inequalities Proposition (Sobolev inequality for discrete images)
More informationTales from fmri Learning from limited labeled data. Gae l Varoquaux
Tales from fmri Learning from limited labeled data Gae l Varoquaux fmri data p 100 000 voxels per map Heavily correlated + structured noise Low SNR: 5% 13 db Brain response maps (activation) n Hundreds,
More informationLearning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text Yi Zhang Machine Learning Department Carnegie Mellon University yizhang1@cs.cmu.edu Jeff Schneider The Robotics Institute
More informationSparsity and the Lasso
Sparsity and the Lasso Statistical Machine Learning, Spring 205 Ryan Tibshirani (with Larry Wasserman Regularization and the lasso. A bit of background If l 2 was the norm of the 20th century, then l is
More informationsparse and low-rank tensor recovery Cubic-Sketching
Sparse and Low-Ran Tensor Recovery via Cubic-Setching Guang Cheng Department of Statistics Purdue University www.science.purdue.edu/bigdata CCAM@Purdue Math Oct. 27, 2017 Joint wor with Botao Hao and Anru
More informationSupplement to A Generalized Least Squares Matrix Decomposition. 1 GPMF & Smoothness: Ω-norm Penalty & Functional Data
Supplement to A Generalized Least Squares Matrix Decomposition Genevera I. Allen 1, Logan Grosenic 2, & Jonathan Taylor 3 1 Department of Statistics and Electrical and Computer Engineering, Rice University
More informationPre-Selection in Cluster Lasso Methods for Correlated Variable Selection in High-Dimensional Linear Models
Pre-Selection in Cluster Lasso Methods for Correlated Variable Selection in High-Dimensional Linear Models Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider variable
More informationSparsity Regularization
Sparsity Regularization Bangti Jin Course Inverse Problems & Imaging 1 / 41 Outline 1 Motivation: sparsity? 2 Mathematical preliminaries 3 l 1 solvers 2 / 41 problem setup finite-dimensional formulation
More informationModel Selection and Geometry
Model Selection and Geometry Pascal Massart Université Paris-Sud, Orsay Leipzig, February Purpose of the talk! Concentration of measure plays a fundamental role in the theory of model selection! Model
More informationApproximating the Best Linear Unbiased Estimator of Non-Gaussian Signals with Gaussian Noise
IEICE Transactions on Information and Systems, vol.e91-d, no.5, pp.1577-1580, 2008. 1 Approximating the Best Linear Unbiased Estimator of Non-Gaussian Signals with Gaussian Noise Masashi Sugiyama (sugi@cs.titech.ac.jp)
More informationSparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda
Sparse regression Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 3/28/2016 Regression Least-squares regression Example: Global warming Logistic
More informationSTATISTICAL MACHINE LEARNING FOR STRUCTURED AND HIGH DIMENSIONAL DATA
AFRL-OSR-VA-TR-2014-0234 STATISTICAL MACHINE LEARNING FOR STRUCTURED AND HIGH DIMENSIONAL DATA Larry Wasserman CARNEGIE MELLON UNIVERSITY 0 Final Report DISTRIBUTION A: Distribution approved for public
More informationGuaranteed Sparse Recovery under Linear Transformation
Ji Liu JI-LIU@CS.WISC.EDU Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA Lei Yuan LEI.YUAN@ASU.EDU Jieping Ye JIEPING.YE@ASU.EDU Department of Computer Science
More informationCambridge University Press The Mathematics of Signal Processing Steven B. Damelin and Willard Miller Excerpt More information
Introduction Consider a linear system y = Φx where Φ can be taken as an m n matrix acting on Euclidean space or more generally, a linear operator on a Hilbert space. We call the vector x a signal or input,
More information21.2 Example 1 : Non-parametric regression in Mean Integrated Square Error Density Estimation (L 2 2 risk)
10-704: Information Processing and Learning Spring 2015 Lecture 21: Examples of Lower Bounds and Assouad s Method Lecturer: Akshay Krishnamurthy Scribes: Soumya Batra Note: LaTeX template courtesy of UC
More informationNew Coherence and RIP Analysis for Weak. Orthogonal Matching Pursuit
New Coherence and RIP Analysis for Wea 1 Orthogonal Matching Pursuit Mingrui Yang, Member, IEEE, and Fran de Hoog arxiv:1405.3354v1 [cs.it] 14 May 2014 Abstract In this paper we define a new coherence
More informationA Short Introduction to the Lasso Methodology
A Short Introduction to the Lasso Methodology Michael Gutmann sites.google.com/site/michaelgutmann University of Helsinki Aalto University Helsinki Institute for Information Technology March 9, 2016 Michael
More informationStructure Learning of Mixed Graphical Models
Jason D. Lee Institute of Computational and Mathematical Engineering Stanford University Trevor J. Hastie Department of Statistics Stanford University Abstract We consider the problem of learning the structure
More informationThe deterministic Lasso
The deterministic Lasso Sara van de Geer Seminar für Statistik, ETH Zürich Abstract We study high-dimensional generalized linear models and empirical risk minimization using the Lasso An oracle inequality
More informationLearning Markov Network Structure using Brownian Distance Covariance
arxiv:.v [stat.ml] Jun 0 Learning Markov Network Structure using Brownian Distance Covariance Ehsan Khoshgnauz May, 0 Abstract In this paper, we present a simple non-parametric method for learning the
More informationAppendix A. Proof to Theorem 1
Appendix A Proof to Theorem In this section, we prove the sample complexity bound given in Theorem The proof consists of three main parts In Appendix A, we prove perturbation lemmas that bound the estimation
More informationFast Function to Function Regression
Junier B. Oliva Willie Neiswanger Barnabás Póczos Eric Xing Hy Trac Shirley Ho Jeff Schneider Machine Learning Department Robotics Institute Department of Physics Carnegie Mellon University Abstract We
More informationPost-selection Inference for Forward Stepwise and Least Angle Regression
Post-selection Inference for Forward Stepwise and Least Angle Regression Ryan & Rob Tibshirani Carnegie Mellon University & Stanford University Joint work with Jonathon Taylor, Richard Lockhart September
More informationCoordinate descent. Geoff Gordon & Ryan Tibshirani Optimization /
Coordinate descent Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Adding to the toolbox, with stats and ML in mind We ve seen several general and useful minimization tools First-order methods
More informationEffective Dimension and Generalization of Kernel Learning
Effective Dimension and Generalization of Kernel Learning Tong Zhang IBM T.J. Watson Research Center Yorktown Heights, Y 10598 tzhang@watson.ibm.com Abstract We investigate the generalization performance
More informationCausal Inference: Discussion
Causal Inference: Discussion Mladen Kolar The University of Chicago Booth School of Business Sept 23, 2016 Types of machine learning problems Based on the information available: Supervised learning Reinforcement
More informationHigh dimensional Ising model selection
High dimensional Ising model selection Pradeep Ravikumar UT Austin (based on work with John Lafferty, Martin Wainwright) Sparse Ising model US Senate 109th Congress Banerjee et al, 2008 Estimate a sparse
More informationMachine Learning for OR & FE
Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com
More informationTHE LASSO, CORRELATED DESIGN, AND IMPROVED ORACLE INEQUALITIES. By Sara van de Geer and Johannes Lederer. ETH Zürich
Submitted to the Annals of Applied Statistics arxiv: math.pr/0000000 THE LASSO, CORRELATED DESIGN, AND IMPROVED ORACLE INEQUALITIES By Sara van de Geer and Johannes Lederer ETH Zürich We study high-dimensional
More informationLASSO-TYPE RECOVERY OF SPARSE REPRESENTATIONS FOR HIGH-DIMENSIONAL DATA
The Annals of Statistics 2009, Vol. 37, No. 1, 246 270 DOI: 10.1214/07-AOS582 Institute of Mathematical Statistics, 2009 LASSO-TYPE RECOVERY OF SPARSE REPRESENTATIONS FOR HIGH-DIMENSIONAL DATA BY NICOLAI
More informationCSC 576: Variants of Sparse Learning
CSC 576: Variants of Sparse Learning Ji Liu Department of Computer Science, University of Rochester October 27, 205 Introduction Our previous note basically suggests using l norm to enforce sparsity in
More informationSparse Graph Learning via Markov Random Fields
Sparse Graph Learning via Markov Random Fields Xin Sui, Shao Tang Sep 23, 2016 Xin Sui, Shao Tang Sparse Graph Learning via Markov Random Fields Sep 23, 2016 1 / 36 Outline 1 Introduction to graph learning
More informationConvergence Rate of Expectation-Maximization
Convergence Rate of Expectation-Maximiation Raunak Kumar University of British Columbia Mark Schmidt University of British Columbia Abstract raunakkumar17@outlookcom schmidtm@csubcca Expectation-maximiation
More informationA Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression
A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression Noah Simon Jerome Friedman Trevor Hastie November 5, 013 Abstract In this paper we purpose a blockwise descent
More informationLeast Sparsity of p-norm based Optimization Problems with p > 1
Least Sparsity of p-norm based Optimization Problems with p > Jinglai Shen and Seyedahmad Mousavi Original version: July, 07; Revision: February, 08 Abstract Motivated by l p -optimization arising from
More informationAdaptive Piecewise Polynomial Estimation via Trend Filtering
Adaptive Piecewise Polynomial Estimation via Trend Filtering Liubo Li, ShanShan Tu The Ohio State University li.2201@osu.edu, tu.162@osu.edu October 1, 2015 Liubo Li, ShanShan Tu (OSU) Trend Filtering
More informationConvex envelopes, cardinality constrained optimization and LASSO. An application in supervised learning: support vector machines (SVMs)
ORF 523 Lecture 8 Princeton University Instructor: A.A. Ahmadi Scribe: G. Hall Any typos should be emailed to a a a@princeton.edu. 1 Outline Convexity-preserving operations Convex envelopes, cardinality
More informationUniversity of Luxembourg. Master in Mathematics. Student project. Compressed sensing. Supervisor: Prof. I. Nourdin. Author: Lucien May
University of Luxembourg Master in Mathematics Student project Compressed sensing Author: Lucien May Supervisor: Prof. I. Nourdin Winter semester 2014 1 Introduction Let us consider an s-sparse vector
More informationOn Acceleration with Noise-Corrupted Gradients. + m k 1 (x). By the definition of Bregman divergence:
A Omitted Proofs from Section 3 Proof of Lemma 3 Let m x) = a i On Acceleration with Noise-Corrupted Gradients fxi ), u x i D ψ u, x 0 ) denote the function under the minimum in the lower bound By Proposition
More informationSparse Approximation and Variable Selection
Sparse Approximation and Variable Selection Lorenzo Rosasco 9.520 Class 07 February 26, 2007 About this class Goal To introduce the problem of variable selection, discuss its connection to sparse approximation
More informationHigh-dimensional regression
High-dimensional regression Advanced Methods for Data Analysis 36-402/36-608) Spring 2014 1 Back to linear regression 1.1 Shortcomings Suppose that we are given outcome measurements y 1,... y n R, and
More informationRobust Principal Component Analysis
ELE 538B: Mathematics of High-Dimensional Data Robust Principal Component Analysis Yuxin Chen Princeton University, Fall 2018 Disentangling sparse and low-rank matrices Suppose we are given a matrix M
More information(Part 1) High-dimensional statistics May / 41
Theory for the Lasso Recall the linear model Y i = p j=1 β j X (j) i + ɛ i, i = 1,..., n, or, in matrix notation, Y = Xβ + ɛ, To simplify, we assume that the design X is fixed, and that ɛ is N (0, σ 2
More informationMidterm exam CS 189/289, Fall 2015
Midterm exam CS 189/289, Fall 2015 You have 80 minutes for the exam. Total 100 points: 1. True/False: 36 points (18 questions, 2 points each). 2. Multiple-choice questions: 24 points (8 questions, 3 points
More informationMessage Passing Algorithms for Compressed Sensing: II. Analysis and Validation
Message Passing Algorithms for Compressed Sensing: II. Analysis and Validation David L. Donoho Department of Statistics Arian Maleki Department of Electrical Engineering Andrea Montanari Department of
More informationESL Chap3. Some extensions of lasso
ESL Chap3 Some extensions of lasso 1 Outline Consistency of lasso for model selection Adaptive lasso Elastic net Group lasso 2 Consistency of lasso for model selection A number of authors have studied
More informationRecent Advances in Structured Sparse Models
Recent Advances in Structured Sparse Models Julien Mairal Willow group - INRIA - ENS - Paris 21 September 2010 LEAR seminar At Grenoble, September 21 st, 2010 Julien Mairal Recent Advances in Structured
More informationSparse Gaussian conditional random fields
Sparse Gaussian conditional random fields Matt Wytock, J. ico Kolter School of Computer Science Carnegie Mellon University Pittsburgh, PA 53 {mwytock, zkolter}@cs.cmu.edu Abstract We propose sparse Gaussian
More informationLearning Multiple Tasks with a Sparse Matrix-Normal Penalty
Learning Multiple Tasks with a Sparse Matrix-Normal Penalty Yi Zhang and Jeff Schneider NIPS 2010 Presented by Esther Salazar Duke University March 25, 2011 E. Salazar (Reading group) March 25, 2011 1
More informationA Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models
A Bootstrap Lasso + Partial Ridge Method to Construct Confidence Intervals for Parameters in High-dimensional Sparse Linear Models Jingyi Jessica Li Department of Statistics University of California, Los
More informationSparse Nonparametric Density Estimation in High Dimensions Using the Rodeo
Outline in High Dimensions Using the Rodeo Han Liu 1,2 John Lafferty 2,3 Larry Wasserman 1,2 1 Statistics Department, 2 Machine Learning Department, 3 Computer Science Department, Carnegie Mellon University
More informationGROUP-SPARSE SUBSPACE CLUSTERING WITH MISSING DATA
GROUP-SPARSE SUBSPACE CLUSTERING WITH MISSING DATA D Pimentel-Alarcón 1, L Balzano 2, R Marcia 3, R Nowak 1, R Willett 1 1 University of Wisconsin - Madison, 2 University of Michigan - Ann Arbor, 3 University
More informationThe Nonparanormal skeptic
The Nonpara skeptic Han Liu Johns Hopkins University, 615 N. Wolfe Street, Baltimore, MD 21205 USA Fang Han Johns Hopkins University, 615 N. Wolfe Street, Baltimore, MD 21205 USA Ming Yuan Georgia Institute
More informationDISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING. By T. Tony Cai and Linjun Zhang University of Pennsylvania
Submitted to the Annals of Statistics DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING By T. Tony Cai and Linjun Zhang University of Pennsylvania We would like to congratulate the
More informationCompressive Sensing under Matrix Uncertainties: An Approximate Message Passing Approach
Compressive Sensing under Matrix Uncertainties: An Approximate Message Passing Approach Asilomar 2011 Jason T. Parker (AFRL/RYAP) Philip Schniter (OSU) Volkan Cevher (EPFL) Problem Statement Traditional
More informationStatistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation
Statistics 62: L p spaces, metrics on spaces of probabilites, and connections to estimation Moulinath Banerjee December 6, 2006 L p spaces and Hilbert spaces We first formally define L p spaces. Consider
More informationAnalysis of Greedy Algorithms
Analysis of Greedy Algorithms Jiahui Shen Florida State University Oct.26th Outline Introduction Regularity condition Analysis on orthogonal matching pursuit Analysis on forward-backward greedy algorithm
More informationScalable Hash-Based Estimation of Divergence Measures
Scalable Hash-Based Estimation of Divergence easures orteza oshad and Alfred O. Hero III Electrical Engineering and Computer Science University of ichigan Ann Arbor, I 4805, USA {noshad,hero} @umich.edu
More informationSimultaneous Support Recovery in High Dimensions: Benefits and Perils of Block `1=` -Regularization
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 6, JUNE 2011 3841 Simultaneous Support Recovery in High Dimensions: Benefits and Perils of Block `1=` -Regularization Sahand N. Negahban and Martin
More information