Online Dictionary Learning with Group Structure Inducing Norms
1 Online Dictionary Learning with Group Structure Inducing Norms
Zoltán Szabó (1), Barnabás Póczos (2), András Lőrincz (1)
(1) Eötvös Loránd University, Budapest, Hungary
(2) Carnegie Mellon University, Pittsburgh, USA
ICML, Structured Sparsity, July 2, 2011
2 Contents
- Sparse coding, structured sparsity.
- Structured dictionary learning: our requirements, cost function, special cases, optimization.
- Numerical examples.
3 Sparse coding
- Observation ($x$) = linear combination, with coefficients $\alpha$, of a few vectors from a fixed dictionary ($D$).
- $\ell_0$-norm solution: NP-hard.
- Popular relaxations: $\ell_p$ ($0 < p \le 1$) norm.
- Special case: $\ell_1$, the Lasso problem, for which efficient algorithms exist:
$$\min_{\alpha} \left[ \frac{1}{2} \|x - D\alpha\|_2^2 + \kappa \|\alpha\|_1 \right]. \quad (1)$$
- Disadvantage: prior knowledge on the structure of the hidden code is not taken into account.
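As an illustration of the Lasso problem (1), here is a minimal sketch of iterative soft-thresholding (ISTA), one standard solver for this objective. It is not taken from the slides: the function names, the list-of-rows matrix layout, and the toy data are assumptions made for the example.

```python
def soft_threshold(v, threshold):
    """Elementwise soft-thresholding: sign(v_i) * max(|v_i| - threshold, 0)."""
    out = []
    for vi in v:
        magnitude = abs(vi) - threshold
        if magnitude <= 0.0:
            out.append(0.0)
        else:
            out.append(magnitude if vi > 0 else -magnitude)
    return out


def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def ista_lasso(x, D, kappa, step, n_iter=200):
    """Approximately minimize 0.5 * ||x - D a||_2^2 + kappa * ||a||_1 over a.

    `step` should not exceed 1 / (largest eigenvalue of D^T D).
    """
    D_T = [list(col) for col in zip(*D)]
    alpha = [0.0] * len(D[0])
    for _ in range(n_iter):
        residual = [p - xi for p, xi in zip(matvec(D, alpha), x)]
        grad = matvec(D_T, residual)  # gradient of the quadratic term
        shifted = [a - step * g for a, g in zip(alpha, grad)]
        alpha = soft_threshold(shifted, kappa * step)
    return alpha


# Toy problem (hypothetical data) with an identity dictionary.
alpha_hat = ista_lasso(x=[3.0, 0.5], D=[[1.0, 0.0], [0.0, 1.0]], kappa=1.0, step=1.0)
```

With an identity dictionary the minimizer of (1) is the soft-thresholded observation, so the small second coordinate is set exactly to zero while the first is shrunk by $\kappa$.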
4 Structured sparsity
Imposing different kinds of structure (e.g., disjoint groups, trees) on the sparse codes has increased performance in several applications:
- robust compressed sensing (CS) with substantially fewer observations,
- multi-task learning problems,
- structure learning in graphical models,
- natural language processing,
- fMRI analysis,
- facial expression discrimination/recognition.
5 Structured dictionary learning
Both
- dictionary learning ((sparse) principal component analysis, (sparse) non-negative matrix factorization (NMF), independent component analysis, independent subspace analysis), and
- structured sparse coding
are very popular. However, very few works have focused on the combination of these two tasks.
6 Structured dictionary learning: wanted properties
We are interested in algorithms with the following four properties:
- handle general, overlapping group structures,
- online: fast, memory efficient, adaptive,
- non-convex sparsity-inducing regularization: fewer measurements, weaker conditions on the dictionary, robust (w.r.t. noise, compressibility),
- can deal with missing information.
Current approaches handle at most 2 of these properties.
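7 Cost function
Notation: $\alpha$ hidden representation, $x$ observation, $D$ dictionary, $\mathcal{G}$ group structure (set system), $\mathcal{G} \subseteq 2^{\{1,\ldots,d_\alpha\}}$.
Group structure inducing regularization on the hidden representation:
$$\Omega(\alpha) = \left\| \left( \|\alpha_G\|_2 \right)_{G \in \mathcal{G}} \right\|_\eta, \quad (2)$$
$$\Omega(\alpha) = \left\| \left( \|d^G \circ \alpha\|_2 \right)_{G \in \mathcal{G}} \right\|_\eta, \quad (3)$$
$$\Omega(\alpha) = \left\| \left( \|A^G \alpha\|_2 \right)_{G \in \mathcal{G}} \right\|_\eta, \quad \eta \in (0, 2). \quad (4)$$
Approximation on the observed coordinates ($x_O$):
$$\frac{1}{2} \|x_O - D_O \alpha\|_2^2. \quad (5)$$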
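The two ingredients above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: groups are given as lists of 0-based indices, the dictionary is a list of rows, and the helper names `group_norm` and `masked_fit` are hypothetical. `group_norm` implements form (2) of $\Omega$ and `masked_fit` implements the fit term (5) restricted to the observed coordinates.

```python
import math


def group_norm(alpha, groups, eta):
    """Omega(alpha) = || (||alpha_G||_2)_{G in groups} ||_eta, as in Eq. (2)."""
    group_l2 = [math.sqrt(sum(alpha[i] ** 2 for i in G)) for G in groups]
    return sum(v ** eta for v in group_l2) ** (1.0 / eta)


def masked_fit(x, D, alpha, observed):
    """0.5 * ||x_O - D_O alpha||_2^2, using only the rows listed in `observed`."""
    total = 0.0
    for k in observed:
        prediction = sum(D[k][q] * alpha[q] for q in range(len(alpha)))
        total += (x[k] - prediction) ** 2
    return 0.5 * total


# Hypothetical toy code vector with overlapping and singleton group systems.
alpha = [3.0, 4.0, 0.0, 0.0]
overlapping = [[0, 1], [1, 2], [2, 3]]
singletons = [[0], [1], [2], [3]]
omega_overlap = group_norm(alpha, overlapping, eta=1.0)  # group l2 norms 5, 4, 0
omega_l1 = group_norm(alpha, singletons, eta=1.0)        # reduces to the l1 norm
```

With singleton groups and $\eta = 1$ the penalty reduces to $\|\alpha\|_1$, which is the traditional sparse coding case.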
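8 Cost function (continued)
Loss for a fixed observation ($\kappa > 0$):
$$l(x_O, D_O) = \min_{\alpha} \left[ \frac{1}{2} \|x_O - D_O \alpha\|_2^2 + \kappa \Omega(\alpha) \right]. \quad (6)$$
Goal (OSDL): minimize the average loss of the dictionary
$$\min_{D} f_t(D) := \frac{1}{t} \sum_{i=1}^{t} l(x_{O_i}, D_{O_i}). \quad (7)$$
Possible dictionary/representation constraints:
- $D \in \mathcal{D} = \times_{i=1}^{d_\alpha} \mathcal{D}_i \subseteq \mathbb{R}^{d_x \times d_\alpha}$: closed, convex, and bounded.
- $\alpha \in \mathcal{A} \subseteq \mathbb{R}^{d_\alpha}$: convex, closed.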
9 Special cases
$O_i = \{1,\ldots,d_x\}$ ($\forall i$): fully observed OSDL task.
Special cases for $\mathcal{G}$:
- Traditional sparse dictionary: $\mathcal{G} = \{\{1\}, \{2\}, \ldots, \{d_\alpha\}\}$.
- Hierarchical dictionary: $\mathcal{G}$ = descendants of the nodes.
- Grid-adapted dictionary: $\mathcal{G}$ = nearest neighbors of the nodes.
- Group Lasso: $\mathcal{G}$ = partition.
- Elastic net: $\mathcal{G}$ = singletons and $\{1,\ldots,d_\alpha\}$.
- Contiguous code: $\mathcal{G}$ = intervals.
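10 Special cases (continued)
Special cases for $\{A^G\}_{G \in \mathcal{G}}$:
- Fused Lasso: $\Omega(\alpha) = \sum_{j=1}^{d_\alpha - 1} |\alpha_{j+1} - \alpha_j|$.
- Graph-guided fusion penalty: $\Omega(\alpha) = \sum_{e=(i,j) \in E:\, i<j} w_{ij} |\alpha_i - v_{ij} \alpha_j|$.
- Linear trend/polynomial filtering: $\Omega(\alpha) = \sum_{j=2}^{d_\alpha - 1} |-\alpha_{j-1} + 2\alpha_j - \alpha_{j+1}|$.
- Generalized Lasso penalty: $\Omega(\alpha) = \|A\alpha\|_1$.
- Total variation: $\Omega(\alpha) = \sum_{i=1}^{d_1} \sum_{j=1}^{d_2} \|(\nabla \alpha)_{ij}\|_2$.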
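The fused Lasso and linear trend filtering penalties above can both be written in the generalized Lasso form $\|A\alpha\|_1$ by choosing $A$ as a first- or second-difference matrix. The following sketch checks this on a toy vector; the helper names and the list-of-rows matrix layout are assumptions made for illustration.

```python
def first_difference_matrix(d):
    """(d-1) x d matrix A with (A alpha)_j = alpha_{j+1} - alpha_j (fused Lasso)."""
    A = [[0.0] * d for _ in range(d - 1)]
    for j in range(d - 1):
        A[j][j] = -1.0
        A[j][j + 1] = 1.0
    return A


def second_difference_matrix(d):
    """(d-2) x d matrix with rows (-1, 2, -1) (linear trend filtering)."""
    A = [[0.0] * d for _ in range(d - 2)]
    for j in range(d - 2):
        A[j][j] = -1.0
        A[j][j + 1] = 2.0
        A[j][j + 2] = -1.0
    return A


def generalized_lasso_penalty(alpha, A):
    """Omega(alpha) = ||A alpha||_1."""
    return sum(abs(sum(a * x for a, x in zip(row, alpha))) for row in A)


alpha = [1.0, 3.0, 2.0, 2.0]  # hypothetical code vector
fused = generalized_lasso_penalty(alpha, first_difference_matrix(4))
trend = generalized_lasso_penalty(alpha, second_difference_matrix(4))
```

In the notation of Eq. (4), each row of $A$ plays the role of one $A^G$ and $\eta = 1$.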
11 Special cases (continued)
Special cases for $\mathcal{D}$, $\mathcal{A}$:
- Traditional setting: $\ell_2$ constrained $D$.
- Structured NMF: non-negative $D$ and $\alpha$.
- Structured mixture-of-topics: $\ell_1$ constrained $D$, non-negative $D$, $\alpha$.
- Hard representation constraints: group norm/elastic net/fused Lasso constrained $\alpha$.
- Double structured dictionaries: group norm constraints on $\alpha$ and $D$.
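12 OSDL optimization
Online optimization of $D$ through alternations:
- For fixed $D_{t-1}$ and $x_{O_t}$, $\alpha_t$ is the solution of
$$\alpha_t = \operatorname{argmin}_{\alpha \in \mathcal{A}} \left[ \frac{1}{2} \|x_{O_t} - (D_{t-1})_{O_t} \alpha\|_2^2 + \kappa \Omega(\alpha) \right]. \quad (8)$$
- Using $\{\alpha_i\}_{i=1}^{t}$, $D_t$ is updated by means of the quadratic optimization
$$\hat{f}_t(D_t) = \min_{D \in \mathcal{D}} \hat{f}_t(D, \{\alpha_i\}_{i=1}^{t}). \quad (9)$$
Solution idea: variational property of $\|\cdot\|_\eta$; block-coordinate descent (BCD) + 3 different statistics of $\hat{f}_t$ + matrix recursions.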
13 Numerical examples: inpainting of natural images
- Structured (toroid) vs. unstructured dictionary: 13-19% improvement.
- Efficiency in case of missing observations: MSE grows slowly, $p_{tr} = 0.9$ (training incompleteness: 90%) is still OK.
Figure: left: unstructured; center: structured; right: structured, incomplete observations.
14 Numerical examples: inpainting, full unseen image
Learning: $p_{tr} = 0.5$. Inpainting: $p^{val}_{test} = 0.7$.
15 Numerical examples: inpainting, full unseen image
Learning: $p_{tr} = 0.5$. Inpainting: $p^{val}_{test} = 0.7$ (PSNR = 29 dB).
16 Numerical examples: online structured NMF on faces
- Online, G-NMF: special case of OSDL.
- Illustration: color FERET, sized facial dataset.
- $\mathcal{G}$: complete, 8-level binary tree ($d_\alpha = 255$).
17 Numerical examples: collaborative filtering
- Joke recommendation (Jester): 100 jokes × 73,421 users.
- Observation: $x_{O_t}$ = ratings of the $t$-th user.
- Baseline: best known RMSEs (item neighbor), (unstructured dictionary, $d_\alpha = 100$).
- Result: toroid $\mathcal{G}$ ($d_\alpha = 100$): RMSE = , hierarchical $\mathcal{G}$ ($d_\alpha = 15$): RMSE =
18 Conclusions
- We developed a dictionary learning method which
  - enables general overlapping group structures,
  - is online,
  - applies non-convex sparsity-inducing regularization,
  - can deal with missing information.
- It provides dictionary learning for several actively studied structured sparse coding problems.
- Numerical examples: inpainting of natural images, structured NMF, collaborative filtering.
19 Acknowledgments The research was partly supported by the Department of Energy (grant number DESC ).
20 Thank you for the attention!
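21 Representation optimization ($\alpha$)
Structured sparse coding task:
$$\frac{1}{2} \|x_{O_t} - (D_{t-1})_{O_t} \alpha\|_2^2 + \kappa \Omega(\alpha) \to \min_{\alpha \in \mathcal{A}}. \quad (10)$$
Solution: let us use the variational property of $\|\cdot\|_\eta$,
$$\|y\|_\eta = \min_{z \in \mathbb{R}^d_+} \frac{1}{2} \left[ \sum_{j=1}^{d} \frac{y_j^2}{z_j} + \|z\|_\beta \right], \quad (11)$$
where $y \in \mathbb{R}^d$, $\beta = \frac{\eta}{2 - \eta}$, and the minimum value is attained at $z_i^* = |y_i|^{2-\eta} \|y\|_\eta^{\eta - 1}$.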
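The variational property (11) can be checked numerically. The sketch below, with hypothetical function names and a toy vector $y$ whose entries are all non-zero (an assumption needed to avoid division by zero), evaluates the right-hand side of (11) at the stated minimizer and at a perturbed point.

```python
def eta_norm(y, eta):
    """||y||_eta = (sum_j |y_j|^eta)^(1/eta); a quasi-norm when eta < 1."""
    return sum(abs(v) ** eta for v in y) ** (1.0 / eta)


def variational_objective(y, z, eta):
    """0.5 * (sum_j y_j^2 / z_j + ||z||_beta) with beta = eta / (2 - eta)."""
    beta = eta / (2.0 - eta)
    z_beta = sum(v ** beta for v in z) ** (1.0 / beta)
    quad = sum(yi ** 2 / zi for yi, zi in zip(y, z))
    return 0.5 * (quad + z_beta)


def optimal_z(y, eta):
    """Closed-form minimizer z_i = |y_i|^(2 - eta) * ||y||_eta^(eta - 1)."""
    norm = eta_norm(y, eta)
    return [abs(yi) ** (2.0 - eta) * norm ** (eta - 1.0) for yi in y]


y = [1.0, -2.0, 0.5]  # toy vector, all entries non-zero
eta = 0.5
z_star = optimal_z(y, eta)
value_at_optimum = variational_objective(y, z_star, eta)
value_perturbed = variational_objective(y, [1.1 * v for v in z_star], eta)
```

At `z_star` both terms inside the bracket equal $\|y\|_\eta$, so the objective matches the norm; scaling `z_star` away from the minimizer gives a strictly larger value.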
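22 Representation optimization ($\alpha$) (continued)
Our problem is equivalent to the solution of
$$J(\alpha, z) = \frac{1}{2} \|x_{O_t} - (D_{t-1})_{O_t} \alpha\|_2^2 + \kappa \frac{1}{2} \left( \alpha^T H \alpha + \|z\|_\beta \right) \to \min_{\alpha \in \mathcal{A},\, z \in \mathbb{R}^{|\mathcal{G}|}_+},$$
$$\text{where } H = H(z) = \sum_{G \in \mathcal{G}} (A^G)^T A^G / z^G. \quad (12)$$
One can optimize $J(\alpha, z)$ by iterative alternating steps:
- For given $\alpha$: explicit formula for the optimal $z = (z^G)_{G \in \mathcal{G}}$,
$$z^G = \|A^G \alpha\|_2^{2 - \eta} \left\| \left( \|A^G \alpha\|_2 \right)_{G \in \mathcal{G}} \right\|_\eta^{\eta - 1}. \quad (13)$$
- For given $z$: quadratic cost on the convex set $\mathcal{A}$.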
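A minimal sketch of the alternating scheme in a deliberately simplified setting follows. The simplifications are assumptions for illustration, not the general method: identity dictionary, full observation, singleton groups with $A^G \alpha = \alpha_G$, and $\mathcal{A} = \mathbb{R}^{d_\alpha}$. Under these assumptions the $\alpha$-step has the coordinate-wise closed form $\alpha_j = x_j z_j / (z_j + \kappa)$, obtained by setting the gradient of $J$ in $\alpha_j$ to zero.

```python
def update_z(alpha, eta):
    """Eq. (13) for singleton groups, where ||A^G alpha||_2 = |alpha_G|."""
    omega = sum(abs(a) ** eta for a in alpha) ** (1.0 / eta)
    return [abs(a) ** (2.0 - eta) * omega ** (eta - 1.0) for a in alpha]


def update_alpha(x, z, kappa):
    """Minimizer of 0.5*||x - a||^2 + 0.5*kappa*sum_j a_j^2 / z_j over a.

    Written as x_j * z_j / (z_j + kappa) so that z_j -> 0 causes no division by zero.
    """
    return [xj * zj / (zj + kappa) for xj, zj in zip(x, z)]


def alternate(x, kappa, eta, n_iter=100):
    """Alternate the z-step and the alpha-step, starting from alpha = x."""
    alpha = list(x)
    for _ in range(n_iter):
        z = update_z(alpha, eta)
        alpha = update_alpha(x, z, kappa)
    return alpha


# Toy run (hypothetical data); eta = 1 corresponds to the l1 penalty.
alpha_hat = alternate(x=[3.0, 0.5], kappa=1.0, eta=1.0)
```

For $\eta = 1$ the iterates approach the soft-thresholded vector $(2, 0)$, which is the Lasso solution for an identity dictionary. For general dictionaries and constraint sets the $\alpha$-step is a constrained quadratic program rather than a closed form.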
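23 Dictionary optimization ($D$)
Cost function ($\rho$: non-negative forgetting factor):
$$\hat{f}_t(D) = \frac{1}{\sum_{j=1}^{t} (j/t)^\rho} \sum_{i=1}^{t} \left( \frac{i}{t} \right)^{\rho} \left[ \frac{1}{2} \|x_{O_i} - D_{O_i} \alpha_i\|_2^2 + \kappa \Omega(\alpha_i) \right] \to \min_{D \in \mathcal{D}}.$$
Optimization (BCD): optimize in $d_j$, while the other columns ($d_i$, $i \ne j$) are fixed. $\hat{f}_t$ is quadratic in $d_j$:
1. Solve the equation
$$\frac{\partial \hat{f}_t}{\partial d_j}(u_j) = 0. \quad (14)$$
2. Project the solution to the constraint set $\mathcal{D}_j$:
$$d_j = \Pi_{\mathcal{D}_j}(u_j). \quad (15)$$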
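The projection step (15) depends on the chosen constraint set $\mathcal{D}_j$. As a sketch, assuming $\mathcal{D}_j$ is the $\ell_2$ ball (the "traditional setting") or its non-negative part (as in structured NMF), the projections can be written as below; the function names and the default radius are illustrative assumptions.

```python
import math


def project_l2_ball(u, radius=1.0):
    """Euclidean projection onto {d : ||d||_2 <= radius}."""
    norm = math.sqrt(sum(v * v for v in u))
    if norm <= radius:
        return list(u)
    return [v * radius / norm for v in u]


def project_nonneg_l2_ball(u, radius=1.0):
    """Projection onto {d : d >= 0, ||d||_2 <= radius}.

    For a closed convex cone intersected with a ball centred at the origin,
    projecting onto the cone first and then onto the ball gives the projection.
    """
    return project_l2_ball([max(v, 0.0) for v in u], radius)


d_ball = project_l2_ball([3.0, 4.0])            # rescaled to unit length
d_inside = project_l2_ball([0.3, 0.4])          # already feasible, unchanged
d_nonneg = project_nonneg_l2_ball([-1.0, 2.0])  # clip, then rescale
```

Other constraint sets (for example $\ell_1$ balls for the mixture-of-topics case) need their own projection routines.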
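24 Computation of $u_j$
Task:
$$\frac{\partial \hat{f}_t}{\partial d_j}(u_j) = 0. \quad (16)$$
Solution: $u_j$ satisfies the linear equation
$$C_{j,t} u_j = b_{j,t} - e_{j,t} + C_{j,t} d_j, \quad (17)$$
where the statistics $\{\{C_{j,t}\}_{j=1}^{d_\alpha}, B_t, \{e_{j,t}\}_{j=1}^{d_\alpha}\}$ are defined as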
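25 Computation of $u_j$ (continued)
$$C_{j,t} = \sum_{i=1}^{t} \left( \frac{i}{t} \right)^{\rho} \Delta_i \alpha_{i,j}^2 \in \mathbb{R}^{d_x \times d_x} \quad (j = 1, \ldots, d_\alpha), \quad (18)$$
$$B_t = \sum_{i=1}^{t} \left( \frac{i}{t} \right)^{\rho} \Delta_i x_i \alpha_i^T = [b_{1,t}, \ldots, b_{d_\alpha,t}] \in \mathbb{R}^{d_x \times d_\alpha}, \quad (19)$$
$$e_{j,t} = \sum_{i=1}^{t} \left( \frac{i}{t} \right)^{\rho} \Delta_i D \alpha_i \alpha_{i,j} \in \mathbb{R}^{d_x} \quad (j = 1, \ldots, d_\alpha), \quad (20)$$
where $C_{j,t}$ and the $\Delta_i$'s are diagonal; $\Delta_i$ is the diagonal matrix representation of $O_i$ (element $j$ in the diagonal is 1 if $j \in O_i$, and 0 otherwise).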
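Because $C_{j,t}$ is diagonal, the linear equation $C_{j,t} u_j = b_{j,t} - e_{j,t} + C_{j,t} d_j$ can be solved coordinate by coordinate. The sketch below stores only the diagonal of $C_{j,t}$. Keeping the current value of $d_j$ on coordinates whose diagonal entry is zero is an assumption made here: on such coordinates the equation reads $0 = 0$ and any value satisfies it.

```python
def solve_column(c_diag, b, e, d):
    """Solve C u = b - e + C d for u, where C = diag(c_diag).

    c_diag: diagonal of C_{j,t}; b, e: the vectors b_{j,t}, e_{j,t};
    d: current dictionary column d_j. All are lists of length d_x.
    """
    u = []
    for ck, bk, ek, dk in zip(c_diag, b, e, d):
        if ck > 0.0:
            u.append(dk + (bk - ek) / ck)
        else:
            # Assumption: coordinate never observed with a non-zero code entry,
            # so the equation is vacuous and the old value is kept.
            u.append(dk)
    return u


# Hypothetical statistics for a column with d_x = 2.
u_j = solve_column(c_diag=[2.0, 0.0], b=[1.0, 0.0], e=[0.5, 0.0], d=[0.1, 0.7])
```

The result would then be passed through the projection $\Pi_{\mathcal{D}_j}$ to obtain the new column.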
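26 Matrix recursion lemma
Let $N_t \in \mathbb{R}^{L_1 \times L_2}$ ($t = 1, 2, \ldots$) be a given matrix series, $\gamma_t = \left( 1 - \frac{1}{t} \right)^{\rho}$, $\rho \ge 0$, and let the $M_t$ and $M'_t$ matrix series be defined as
$$M_t = \gamma_t M_{t-1} + N_t \in \mathbb{R}^{L_1 \times L_2} \quad (t = 1, 2, \ldots), \quad (21)$$
$$M'_t = \sum_{i=1}^{t} \left( \frac{i}{t} \right)^{\rho} N_i \in \mathbb{R}^{L_1 \times L_2} \quad (t = 1, 2, \ldots). \quad (22)$$
If $\rho = 0$, then $M_t = M_0 + M'_t$ ($\forall t \ge 1$). When $\rho > 0$, then $M_t = M'_t$ ($\forall t \ge 1$).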
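The lemma can be checked numerically. The sketch below uses scalars in place of matrices (the recursion acts entrywise, so this is enough for a sanity check); the function names and the toy sequence are assumptions made for the example. Note that $\gamma_1 = 0$ for $\rho > 0$, which is why the initial value $M_0$ is forgotten, while for $\rho = 0$ Python evaluates `0.0 ** 0.0` as `1.0`, giving $\gamma_t = 1$.

```python
def recursive_stat(N, rho, M0=0.0):
    """M_t = gamma_t * M_{t-1} + N_t with gamma_t = (1 - 1/t)^rho, for t = 1..len(N)."""
    M = M0
    for t, N_t in enumerate(N, start=1):
        gamma_t = (1.0 - 1.0 / t) ** rho
        M = gamma_t * M + N_t
    return M


def direct_stat(N, rho):
    """M'_t = sum_{i=1}^t (i/t)^rho * N_i for t = len(N)."""
    t = len(N)
    return sum((i / t) ** rho * N_i for i, N_i in enumerate(N, start=1))


N = [1.0, 2.0, 3.0, 4.0]  # toy scalar series
with_forgetting = recursive_stat(N, rho=0.5, M0=7.0)     # M0 is arbitrary here
without_forgetting = recursive_stat(N, rho=0.0, M0=7.0)  # M0 is carried along
```

This is what makes the online update cheap: the weighted sum over all past samples never has to be recomputed, only rescaled by $\gamma_t$ and incremented.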
27 Computation of $u_j$ (continued)
By the matrix recursion lemma, one can update $C_{j,t}$ and $B_t$ as
$$C_{j,t} = \gamma_t C_{j,t-1} + \Delta_t \alpha_{tj}^2, \quad (23)$$
$$B_t = \gamma_t B_{t-1} + \Delta_t x_t \alpha_t^T, \quad (24)$$
with $C_{j,0} = 0$, $B_0 = 0$ ($\rho = 0$), or arbitrary initialization ($\rho > 0$).
According to numerical experience, an efficient online approximation for $e_{j,t}$ is
$$e_{j,t} = \gamma_t e_{j,t-1} + \Delta_t D \alpha_t \alpha_{t,j}, \quad (25)$$
with the actual estimation of $D$ and initialization $e_{j,0} = 0$.
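28 Special, fully observable case
In this case ($\Delta_i = I$, $\forall i$):
$$C_{j,t} = I \sum_{i=1}^{t} \left( \frac{i}{t} \right)^{\rho} \alpha_{i,j}^2, \quad B_t = \sum_{i=1}^{t} \left( \frac{i}{t} \right)^{\rho} x_i \alpha_i^T, \quad (26)$$
$$e_{j,t} = \sum_{i=1}^{t} \left( \frac{i}{t} \right)^{\rho} D \alpha_i \alpha_{i,j} = D \sum_{i=1}^{t} \left( \frac{i}{t} \right)^{\rho} \alpha_i \alpha_{i,j}, \quad (27)$$
that is, $D$ can be pulled out from the $e_{j,t}$'s, and it is sufficient to maintain 2 statistics, $B_t$ and
$$A_t = \sum_{i=1}^{t} \left( \frac{i}{t} \right)^{\rho} \alpha_i \alpha_i^T \in \mathbb{R}^{d_\alpha \times d_\alpha}. \quad (28)$$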
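In the fully observed case the column update can be sketched with just the two statistics $A_t$ and $B_t$. Substituting $C_{j,t} = (A_t)_{jj} I$, $b_{j,t}$ = column $j$ of $B_t$, and $e_{j,t} = D a_j$ (with $a_j$ column $j$ of $A_t$) into $C_{j,t} u_j = b_{j,t} - e_{j,t} + C_{j,t} d_j$ gives $u_j = d_j + (b_j - D a_j) / (A_t)_{jj}$. The function names, the list-of-rows layout, and the toy data below are assumptions, and the code presumes $(A_t)_{jj} > 0$.

```python
def update_stats(A, B, x, alpha, t, rho):
    """One online step: A_t = gamma_t*A_{t-1} + alpha alpha^T, B_t = gamma_t*B_{t-1} + x alpha^T."""
    gamma = (1.0 - 1.0 / t) ** rho
    d_x, d_a = len(x), len(alpha)
    A_new = [[gamma * A[p][q] + alpha[p] * alpha[q] for q in range(d_a)] for p in range(d_a)]
    B_new = [[gamma * B[k][q] + x[k] * alpha[q] for q in range(d_a)] for k in range(d_x)]
    return A_new, B_new


def column_update(D, A, B, j):
    """u_j = d_j + (b_j - D a_j) / A_jj; assumes A[j][j] > 0."""
    a_jj = A[j][j]
    u = []
    for k in range(len(D)):
        Da_k = sum(D[k][q] * A[q][j] for q in range(len(A)))
        u.append(D[k][j] + (B[k][j] - Da_k) / a_jj)
    return u


# Toy example (hypothetical data): one observation coded by the first atom only.
A0 = [[0.0, 0.0], [0.0, 0.0]]
B0 = [[0.0, 0.0], [0.0, 0.0]]
A1, B1 = update_stats(A0, B0, x=[2.0, 4.0], alpha=[1.0, 0.0], t=1, rho=0.0)
D = [[0.5, 0.1], [0.5, 0.2]]
u0 = column_update(D, A1, B1, j=0)
```

Here the single observation is explained by the first atom with coefficient 1, so the unconstrained update moves that atom onto the observation itself; a projection onto $\mathcal{D}_j$ would follow.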
More informationNew Coherence and RIP Analysis for Weak. Orthogonal Matching Pursuit
New Coherence and RIP Analysis for Wea 1 Orthogonal Matching Pursuit Mingrui Yang, Member, IEEE, and Fran de Hoog arxiv:1405.3354v1 [cs.it] 14 May 2014 Abstract In this paper we define a new coherence
More informationEfficient Data-Driven Learning of Sparse Signal Models and Its Applications
Efficient Data-Driven Learning of Sparse Signal Models and Its Applications Saiprasad Ravishankar Department of Electrical Engineering and Computer Science University of Michigan, Ann Arbor Dec 10, 2015
More informationMathematical Methods for Data Analysis
Mathematical Methods for Data Analysis Massimiliano Pontil Istituto Italiano di Tecnologia and Department of Computer Science University College London Massimiliano Pontil Mathematical Methods for Data
More informationLeast Sparsity of p-norm based Optimization Problems with p > 1
Least Sparsity of p-norm based Optimization Problems with p > Jinglai Shen and Seyedahmad Mousavi Original version: July, 07; Revision: February, 08 Abstract Motivated by l p -optimization arising from
More informationRobust Principal Component Analysis
ELE 538B: Mathematics of High-Dimensional Data Robust Principal Component Analysis Yuxin Chen Princeton University, Fall 2018 Disentangling sparse and low-rank matrices Suppose we are given a matrix M
More informationDeriving Principal Component Analysis (PCA)
-0 Mathematical Foundations for Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Deriving Principal Component Analysis (PCA) Matt Gormley Lecture 11 Oct.
More informationDeep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, Spis treści
Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, 2017 Spis treści Website Acknowledgments Notation xiii xv xix 1 Introduction 1 1.1 Who Should Read This Book?
More informationBig & Quic: Sparse Inverse Covariance Estimation for a Million Variables
for a Million Variables Cho-Jui Hsieh The University of Texas at Austin NIPS Lake Tahoe, Nevada Dec 8, 2013 Joint work with M. Sustik, I. Dhillon, P. Ravikumar and R. Poldrack FMRI Brain Analysis Goal:
More information