Generalized Concomitant Multi-Task Lasso for sparse multimodal regression
1 Generalized Concomitant Multi-Task Lasso for sparse multimodal regression. Mathurin Massias (INRIA Saclay). Joint work with: Olivier Fercoq (Télécom ParisTech), Alexandre Gramfort (INRIA Saclay), Joseph Salmon (Télécom ParisTech).
2 Table of Contents: Motivation - problem setup; Joint estimators of the noise level; General noise model; Block homoscedastic model.
3 "One" motivation: M/EEG inverse problem sensors: magneto- and electro-encephalogram measurements during a cognitive experiment sources: brain locations
4 The M/EEG inverse problem: modelisation. Additive model: M = GX + E. We assume Gaussian variables: E ~ N(0, Σ_E) and X ~ N(0, Σ_X). With G and M known, X is obtained by maximizing the likelihood, which leads to:
    X̂ = arg min_X ||M − GX||²_{Σ_E^{-1}} + ||X||²_{Σ_X^{-1}} = Σ_X G^T (G Σ_X G^T + Σ_E)^{-1} M.
5 The M/EEG inverse problem: modelisation. Multi-task regression: n observations, q tasks, p features. Y ∈ R^{n×q}: observation matrix; X ∈ R^{n×p}: forward matrix.
    Y = XB + E,
where B ∈ R^{p×q} is the true source activity matrix and E ∈ R^{n×q} is an additive white noise.
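To make the setup concrete, here is a minimal simulation sketch of this model (illustrative shapes, sparsity and noise level, not the talk's experimental setting):

    import numpy as np

    rng = np.random.default_rng(0)
    n, p, q = 100, 500, 50                            # observations, features, tasks

    X = rng.standard_normal((n, p))                   # forward (design) matrix
    B_true = np.zeros((p, q))                         # row-sparse source activity
    support = rng.choice(p, size=10, replace=False)   # 10 active rows (sources)
    B_true[support] = rng.standard_normal((10, q))
    E = 0.5 * rng.standard_normal((n, q))             # additive white noise
    Y = X @ B_true + E                                # observation matrix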
6 ℓ2,1 regularization. Penalty: Group-Lasso (ℓ2,1):
    ||B||_{2,1} = Σ_{j=1}^p ||B_j||_2    (B_j: j-th row of B).
We solve:
    B̂ ∈ arg min_{B ∈ R^{p×q}} (1/2)||Y − XB||² + λ||B||_{2,1}.
[Figure: estimated source activity B̂ ∈ R^{p×q}, sources × time.]
a.k.a. Multiple Measurement Vector (MMV) in signal processing, or multi-task Lasso in ML [Obozinski et al., 2010].
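The ℓ2,1 penalty is separable over rows of B, so its proximal operator is row-wise block soft-thresholding (BST), the building block of the solvers below. A minimal sketch of one block-coordinate-descent pass (illustrative code, not the authors' implementation):

    import numpy as np

    def bst(u, tau):
        """Block soft-thresholding: prox of tau * ||.||_2, applied to one row."""
        norm_u = np.linalg.norm(u)
        if norm_u <= tau:
            return np.zeros_like(u)
        return (1.0 - tau / norm_u) * u

    def mtl_bcd_pass(X, Y, B, lam):
        """One BCD pass on (1/2) ||Y - X B||_F^2 + lam ||B||_{2,1}."""
        R = Y - X @ B                          # full residuals
        for j in range(X.shape[1]):
            Lj = X[:, j] @ X[:, j]             # Lipschitz constant of block j
            if Lj == 0.0:
                continue
            R += np.outer(X[:, j], B[j])       # add back block j (partial residual)
            B[j] = bst(X[:, j] @ R / Lj, lam / Lj)
            R -= np.outer(X[:, j], B[j])       # subtract the updated block
        return B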
7 Table of Contents: Motivation - problem setup; Joint estimators of the noise level; General noise model; Block homoscedastic model.
8 Choice of λ and noise level. The noise E is assumed white Gaussian, with i.i.d. entries E_{i,t} ~ N(0, σ²). For optimal performance of the Lasso, λ should be proportional to the noise level: λ ∝ σ √(log p / n) [Bickel et al., 2009]. Yet σ is unknown in practice!
9 Joint estimation of β and σ (illustrated on the Lasso for simplicity: model is y = Xβ + ε). Intuitive idea: run the Lasso with some λ, get β̂; estimate σ with the residuals: σ̂ = ||y − Xβ̂|| / √n; relaunch the Lasso with λ ∝ σ̂; etc. Note: this is the original implementation proposed for the Scaled Lasso [Sun and Zhang, 2012].
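A minimal sketch of this alternation, assuming scikit-learn's Lasso is available (lam0 is a hypothetical base regularization level; scikit-learn solves (1/(2n))||y − Xw||² + alpha ||w||_1, so alpha = lam0 * sigma realizes λ ∝ σ̂):

    import numpy as np
    from sklearn.linear_model import Lasso

    def scaled_lasso(X, y, lam0, n_iter=10):
        """Alternate Lasso fits and residual-based noise estimates."""
        n = X.shape[0]
        sigma = np.linalg.norm(y) / np.sqrt(n)          # crude initial estimate
        coef = np.zeros(X.shape[1])
        for _ in range(n_iter):
            clf = Lasso(alpha=lam0 * sigma, fit_intercept=False)
            clf.fit(X, y)
            coef = clf.coef_
            sigma = np.linalg.norm(y - X @ coef) / np.sqrt(n)  # sigma = ||r|| / sqrt(n)
        return coef, sigma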
10 Concomitant Lasso. Concomitant Lasso [Owen, 2007] (inspired by Huber [1981]):
    (β̂, σ̂) ∈ arg min_{β ∈ R^p, σ > 0} ||y − Xβ||² / (2nσ) + σ/2 + λ||β||_1,
also equivalent to the Square-root/Scaled Lasso [Belloni et al., 2011, Sun and Zhang, 2012]. Note: the term σ/2 acts as a penalty over the noise level.
11-14 Solving the Concomitant Lasso. A jointly convex formulation (w.r.t. both β and σ):
    arg min_{β ∈ R^p, σ > 0} ||y − Xβ||² / (2nσ) + σ/2 + λ||β||_1.
β update: smooth + separable function, make CD steps [Friedman et al., 2007].
σ update: 1D optimization problem, closed form σ = ||y − Xβ|| / √n (setting the derivative in σ to zero). But what if we hit y = Xβ?
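A quick numeric sanity check of this closed form, for an arbitrary fixed residual vector (an illustrative sketch):

    import numpy as np

    rng = np.random.default_rng(1)
    r = rng.standard_normal(30)            # arbitrary fixed residual vector
    n = r.size
    f = lambda s: r @ r / (2 * n * s) + s / 2
    grid = np.linspace(1e-3, 5.0, 100_000)
    print(grid[np.argmin(f(grid))])        # numerical minimizer over the grid
    print(np.linalg.norm(r) / np.sqrt(n))  # closed form: the two values agree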
15-16 Smoothed Concomitant Lasso. Smoothed Concomitant Lasso [Ndiaye et al., 2017]:
    arg min_{β ∈ R^p, σ ≥ σ_min} ||y − Xβ||² / (2nσ) + σ/2 + λ||β||_1.
Called "smoothed" following the terminology of Nesterov [2005], because it amounts to smoothing in the dual: for
    Δ_X = { θ ∈ R^n : ||X^T θ||_∞ ≤ 1, ||θ|| ≤ 1/(λ√n) },
    θ̂ = arg max_{θ ∈ Δ_X} ⟨y, λθ⟩ + σ_min (1/2 − (nλ²/2) ||θ||²).
State-of-the-art solvers are as fast as for the Lasso.
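A minimal coordinate-descent sketch of the smoothed concomitant Lasso under the formulation above (an illustrative implementation, not the authors' reference solver):

    import numpy as np

    def smoothed_concomitant_cd(X, y, lam, sigma_min, n_iter=100):
        """CD on beta, closed-form sigma clipped at sigma_min."""
        n, p = X.shape
        beta = np.zeros(p)
        r = y.copy()                                    # residuals y - X beta
        sigma = max(sigma_min, np.linalg.norm(y) / np.sqrt(n))
        norms2 = (X ** 2).sum(axis=0)                   # ||X_j||^2
        for _ in range(n_iter):
            for j in range(p):
                if norms2[j] == 0.0:
                    continue
                r += X[:, j] * beta[j]                  # add back coordinate j
                zj = X[:, j] @ r / norms2[j]
                thresh = lam * n * sigma / norms2[j]    # soft-threshold level
                beta[j] = np.sign(zj) * max(abs(zj) - thresh, 0.0)
                r -= X[:, j] * beta[j]
            sigma = max(sigma_min, np.linalg.norm(r) / np.sqrt(n))  # closed form
        return beta, sigma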
17 Table of Contents: Motivation - problem setup; Joint estimators of the noise level; General noise model; Block homoscedastic model.
18-20 What about more complex noise models? What can we do if the noise is not white? Smoothed Generalized Concomitant Lasso (SGCL):
    (B̂, Σ̂) ∈ arg min_{B ∈ R^{p×q}, Σ ∈ S^n_{++}, Σ ⪰ Σ_min} ||Y − XB||²_{Σ^{-1}} / (2nq) + Tr(Σ) / (2n) + λ||B||_{2,1},
with ||Z||²_A = Tr(Z^T A Z) and Σ_min = σ_min Id. In the case Σ = σ Id, we recover the Smoothed Concomitant Lasso. Note: the noise penalty is now on Tr(Σ), the sum of the eigenvalues of Σ.
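For reference, the data-fit term can be evaluated directly; a sketch (not optimized, assuming Σ is well-conditioned):

    import numpy as np

    def sgcl_datafit(Y, X, B, Sigma):
        """||Y - X B||^2_{Sigma^{-1}} / (2 n q) = Tr(R^T Sigma^{-1} R) / (2 n q)."""
        n, q = Y.shape
        R = Y - X @ B
        return np.trace(R.T @ np.linalg.solve(Sigma, R)) / (2 * n * q)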
21-22 Solving the SGCL. Jointly convex formulation: we can still use alternate minimization, with the duality gap as stopping criterion.
Σ fixed: smooth + ℓ1-type problem, BCD works.
B fixed: with the current residuals R = Y − XB, the problem is
    arg min_{Σ ∈ S^n_{++}, Σ ⪰ Σ_min} (1/(2nq)) Tr(R^T Σ^{-1} R) + (1/(2n)) Tr(Σ).
Closed-form solution: if U diag(s_1, …, s_n) U^T is the spectral decomposition of RR^T, then
    Σ̂ = U diag(max(σ_min, √(s_1/q)), …, max(σ_min, √(s_n/q))) U^T.
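In code, this Σ update is one symmetric eigendecomposition plus a clipping; a sketch under the reconstruction above:

    import numpy as np

    def update_sigma(R, sigma_min):
        """Closed-form Sigma update from residuals R (n x q), clipped at sigma_min."""
        n, q = R.shape
        s, U = np.linalg.eigh(R @ R.T)                      # R R^T = U diag(s) U^T
        d = np.maximum(sigma_min, np.sqrt(np.maximum(s, 0.0) / q))
        return (U * d) @ U.T                                # U diag(d) U^T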
23 Alternate minimization.
Algorithm: Alternate min. for multi-task SGCL
    input : X, Y, σ_min, λ, f, T
    init  : B = 0_{p,q}, Σ^{-1} = Σ_min^{-1}, R = Y
    for iter = 1, …, T do
        if iter ≡ 1 (mod f) then
            Σ ← Ψ(R, σ_min)                      // closed-form sol. of minimization in Σ
            for j = 1, …, p do
                L_j ← X_j^T Σ^{-1} X_j           // Lipschitz constants
        for j = 1, …, p do
            R ← R + X_j B_j                      // partial residual update
            B_j ← BST(X_j^T Σ^{-1} R / L_j, λnq / L_j)   // coef. update
            R ← R − X_j B_j                      // residual update
    return B, Σ
Complexity? OK if we store Σ^{-1} X and Σ^{-1} R instead of R.
24 Main drawbacks. The Σ update is OK for M/EEG because n ≈ 300, but the O(n³) spectral decomposition is problematic otherwise. O(n²) parameters to infer for Σ, with only nq observations: works only when q is large enough (on the order of 10n).
25 Table of Contents: Motivation - problem setup; Joint estimators of the noise level; General noise model; Block homoscedastic model.
26 Block homoscedastic model. In our case we record 3 different types of signals: electrodes measure the electric potentials; magnetometers measure the magnetic field; gradiometers measure the gradient of the magnetic field. Different physical natures, different noise levels. Observations are divided into 3 blocks, and the partition is known.
27 Block homoscedastic model. K groups of observations:
    X = [X^1; …; X^K],  Y = [Y^1; …; Y^K],  E = [E^1; …; E^K],  Σ = diag(σ_1 Id_{n_1}, …, σ_K Id_{n_K}).
For each block, a homoscedastic model with white noise:
    Y^k = X^k B + σ_k E^k,  with entries of E^k i.i.d. N(0, 1).
28 Smoothed Block Homoscedastic Concomitant (SBHCL). Plugging a diagonal Σ (constant over consecutive blocks) into the general model yields the Block Homoscedastic Concomitant:
    arg min_{B ∈ R^{p×q}, σ_1,…,σ_K ∈ R^K_{++}, σ_k ≥ σ_min,k ∀k ∈ [K]}  Σ_{k=1}^K ( ||Y^k − X^k B||² / (2nqσ_k) + n_k σ_k / (2n) ) + λ||B||_{2,1}.
Σ goes down from n(n+1)/2 parameters (hopeless without more structure) to K.
29-31 Solving the SBHCL. The (block) coordinate descent steps remain the same; computing Σ^{-1} R for the BCD is easier; the σ_k updates are simple and can even be performed at each B_j update (as for the concomitant). See the algorithm below.
32 Alternate minimization.
Algorithm: Alternate min. for multi-task SBHCL
    input : X^1, …, X^K, Y^1, …, Y^K, σ_min,1, …, σ_min,K, λ, T
    init  : B = 0_{p,q}; ∀k ∈ [K]: σ_k = ||Y^k|| / √(n_k q), R^k = Y^k; ∀k ∈ [K], j ∈ [p]: L_{k,j} = ||X_j^k||²
    for iter = 1, …, T do
        for j = 1, …, p do
            for k = 1, …, K do
                R^k ← R^k + X_j^k B_j                // partial residual update
            B_j ← BST( (Σ_{k=1}^K X_j^{kT} R^k / σ_k) / (Σ_{k=1}^K L_{k,j} / σ_k), λnq / (Σ_{k=1}^K L_{k,j} / σ_k) )   // coef. update
            for k = 1, …, K do
                R^k ← R^k − X_j^k B_j                // residual update
                σ_k ← max(σ_min,k, ||R^k|| / √(n_k q))   // smart std dev update
    return B, σ_1, …, σ_K
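The per-block noise update on the last line of the inner loop is cheap; a sketch (hypothetical helper, list-of-blocks representation assumed):

    import numpy as np

    def update_block_sigmas(R_blocks, sigma_mins, q):
        """sigma_k = max(sigma_min_k, ||R^k||_F / sqrt(n_k q)) for each block."""
        return [max(s_min, np.linalg.norm(Rk) / np.sqrt(Rk.shape[0] * q))
                for Rk, s_min in zip(R_blocks, sigma_mins)]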
33 In practice. Design: (n, p, q) = (300, 1000, 100). X Toeplitz-correlated: Cov(X_i, X_j) = ρ^|i−j|, ρ ∈ (0, 1). 3 blocks with noise levels in ratio 1, 2, 5.
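A sketch of how such a design can be generated (assumed construction via a Cholesky factor of the Toeplitz covariance; ρ = 0.5 is illustrative, the experiments use 0.1 and 0.9):

    import numpy as np
    from scipy.linalg import toeplitz

    rng = np.random.default_rng(0)
    n, p, rho = 300, 1000, 0.5
    T = toeplitz(rho ** np.arange(p))       # T[i, j] = rho^|i - j|
    L = np.linalg.cholesky(T)               # T = L L^T
    X = rng.standard_normal((n, p)) @ L.T   # rows of X have covariance T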
34 Support recovery. [Figure: ROC curves (true positive rate vs. false positive rate) of true support recovery, for SBHCL, MTL and SCL on all blocks, and for MTL and SCL on the least noisy block. Top: SNR = 1, ρ = 0.1; bottom: SNR = 1, ρ = 0.9.]
35 Prediction performance. [Figure: log10(RMSE/RMSE_oracle) as a function of λ/λ_max, for SBHCL and SCL on each of the three blocks.]
36 Take home message.
- More general noise models: it is possible to estimate a full covariance if there are enough tasks.
- If more noise structure is known (e.g., block homoscedastic model): not more costly than the multi-task Lasso (MTL).
- Taking multiple noise levels into account helps, both for prediction and for support identification.
- Using additional (though noisier) data helps!
Python code is available at [URL not captured in the transcription]. This work was funded by ERC Starting Grant SLAB ERC-YStG. Slides powered by MooseTeX.
37 References I
G. Obozinski, B. Taskar, and M. I. Jordan. Joint covariate selection and joint subspace selection for multiple classification problems. Statistics and Computing, 20(2), 2010.
P. J. Bickel, Y. Ritov, and A. B. Tsybakov. Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist., 37(4), 2009.
T. Sun and C.-H. Zhang. Scaled sparse linear regression. Biometrika, 99(4), 2012.
A. B. Owen. A robust hybrid of lasso and ridge regression. Contemporary Mathematics, 443:59-72, 2007.
P. J. Huber. Robust Statistics. John Wiley & Sons Inc., 1981.
A. Belloni, V. Chernozhukov, and L. Wang. Square-root Lasso: pivotal recovery of sparse signals via conic programming. Biometrika, 98(4), 2011.
J. Friedman, T. J. Hastie, H. Höfling, and R. Tibshirani. Pathwise coordinate optimization. Ann. Appl. Stat., 1(2), 2007.
E. Ndiaye, O. Fercoq, A. Gramfort, V. Leclère, and J. Salmon. Efficient smoothed concomitant Lasso estimation for high dimensional regression. In NCMIP, 2017.
Y. Nesterov. Smooth minimization of non-smooth functions. Math. Program., 103(1), 2005.