Some tensor decomposition methods for machine learning
1 Some tensor decomposition methods for machine learning
Massimiliano Pontil, Istituto Italiano di Tecnologia and University College London
16 August
2 Outline
- Problem and motivation
- Tucker decomposition
- Two convex relaxations
- Applications
- Extensions
3 Problem
Learning a tensor from a set of linear measurements.
Main example: tensor completion
- Video denoising/completion
- Context-aware recommendation
- Multisensor data analysis
- Entities-relationships learning...
4 Multilinear multitask learning
Tasks are associated with multiple indices, e.g. predict the rating given to different aspects of a restaurant by different critics.
The tasks' regression vectors are the vertical fibers of the tensor, e.g. the fiber indexed by a given critic and the aspect "food".
5 0-shot transfer learning
Learning tasks for which no training instances are provided.
6 Setting
We consider a supervised learning setting in which we wish to learn a tensor from a set of measurements
    y = I(W) + ε
where I : ℝ^{d_1×⋯×d_N} → ℝ^m is a linear (sampling) operator, e.g. a subset of the tensor entries.
The number of parameters explodes with the order of the tensor, so regularization is key:
    minimize_W  E(W) + λ R(W)
for example E(W) = ‖y − I(W)‖².
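A minimal sketch of this setting for tensor completion: the sampling operator I observes a random subset of entries, and E is the squared error on those entries. The dimensions and noise level below are illustrative, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
d = (10, 8, 6)                       # hypothetical tensor dimensions
W_true = rng.standard_normal(d)

# Sample m entries: I(W) returns the observed entries as a vector in R^m.
m = 100
idx = tuple(rng.integers(0, s, size=m) for s in d)

def I(W):
    """Linear sampling operator selecting the m sampled entries of W."""
    return W[idx]

y = I(W_true) + 0.01 * rng.standard_normal(m)   # noisy measurements

def E(W):
    """Empirical error E(W) = ||y - I(W)||^2."""
    return np.sum((y - I(W)) ** 2)
```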
7 Matrix case
The matrix case has been thoroughly studied, particularly focusing on spectral regularizers which encourage low-rank matrices:
- low-rank matrix completion
- multitask feature learning
There are different notions of the rank of a tensor (some computationally intractable). Many are reviewed in [Kolda and Bader, 2009]. Which ones are simple and effective?
8 Objectives
We want some kind of guarantees:
- statistical: bounds on the generalization / out-of-sample error
- optimization: convergence, rates
- function approximation
We discuss two approaches based on:
- B. Romera-Paredes, H. Aung, N. Bianchi-Berthouze, M. Pontil. Multilinear multitask learning. 30th International Conference on Machine Learning, 2013.
- B. Romera-Paredes and M. Pontil. A new convex relaxation for tensor completion. Advances in Neural Information Processing Systems 26, 2013.
9 Function approximation
Learn a function parametrized by a tensor from a sample.
Example: x = (x_1, x_2, x_3) ∈ V_1 × V_2 × V_3, three vector spaces,
    f(x) = ⟨W, φ_1(x_1) ⊗ φ_2(x_2) ⊗ φ_3(x_3)⟩,   W ∈ ℝ^{d_1×d_2×d_3}
with feature maps φ_i : V_i → ℝ^{d_i}:
    min_W  Σ_{i=1}^m Loss(y_i, f(x_i)) + γ Ω(W)
Dual problem using kernel methods: K(x, t) = K_1(x_1, t_1) K_2(x_2, t_2) K_3(x_3, t_3).
Matrix case: [Abernethy, Bach, Evgeniou, and Vert, JMLR 2009]. Tensor case: [Signoretto, De Lathauwer, Suykens, Preprint 2013].
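A small sketch of the product kernel above. The Gaussian base kernels are an assumption made here for illustration; the point is that the Gram matrix of the product kernel is the entrywise product of the per-view Gram matrices, and hence still positive semidefinite.

```python
import numpy as np

def gaussian_gram(X, T, sigma=1.0):
    """Gram matrix of a Gaussian kernel between rows of X and T."""
    sq = ((X[:, None, :] - T[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X1, X2, X3 = (rng.standard_normal((5, 3)) for _ in range(3))  # 5 points, 3 views

# K(x,t) = K1(x1,t1) K2(x2,t2) K3(x3,t3): Hadamard product of Gram matrices.
K = gaussian_gram(X1, X1) * gaussian_gram(X2, X2) * gaussian_gram(X3, X3)
```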
10 Basic operations
Tensor W ∈ ℝ^{d_1×⋯×d_N}, with entries W_{i_1,...,i_N}.
The mode-n fiber is the vector in ℝ^{d_n} formed by the elements of the tensor obtained by fixing all indices but the n-th one. E.g., in the example above, W_{i_1,i_2,:} ∈ ℝ^{d_3} is the (i_1, i_2)-th regression vector.
11 Basic operations (cont.)
Matricization is the process of rearranging the tensor into a matrix.
The mode-n matricization W_(n) ∈ ℝ^{d_n × J_n}, where J_n = Π_{k≠n} d_k, is the matrix whose columns are the mode-n fibers of W.
[Figure: the mode-1 and mode-3 matricizations W_(1) and W_(3).]
12 Tucker rank
    TR(W) = (rank(W_(1)), ..., rank(W_(N)))
A natural regularizer associated to this is the sum of the ranks of the matricizations:
    R(W) := Σ_n rank(W_(n))
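A minimal sketch (not from the talk) of mode-n matricization and of the Tucker rank as the tuple of ranks of the unfoldings:

```python
import numpy as np

def unfold(W, n):
    """Mode-n matricization: rows indexed by mode n, columns by the rest.
    (Column ordering differs across conventions, but the rank does not.)"""
    return np.moveaxis(W, n, 0).reshape(W.shape[n], -1)

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 5, 6))

TR = tuple(np.linalg.matrix_rank(unfold(W, n)) for n in range(W.ndim))
print(TR)   # a generic random tensor has full Tucker rank, here (4, 5, 6)
```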
13 Tucker decomposition
    W = G ×_1 A^(1) ×_2 ⋯ ×_N A^(N)
meaning that
    W_{i_1,...,i_N} = Σ_{j_1=1}^{K_1} ⋯ Σ_{j_N=1}^{K_N} G_{j_1,...,j_N} A^(1)_{i_1,j_1} ⋯ A^(N)_{i_N,j_N}
with K_n ≤ d_n (typically much smaller).
A^(n) ∈ ℝ^{d_n × K_n}, n ∈ {1,...,N}, are the factor matrices.
G ∈ ℝ^{K_1×⋯×K_N} is called the core tensor and models the interaction between factors.
By construction rank(W_(n)) ≤ K_n.
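A minimal sketch of reconstructing an order-3 tensor from its Tucker factors with einsum, and of the rank bound rank(W_(n)) ≤ K_n; the dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = (8, 9, 10), (2, 3, 4)                 # dims and (much smaller) Tucker ranks
G = rng.standard_normal(K)                   # core tensor
A = [rng.standard_normal((d[n], K[n])) for n in range(3)]  # factor matrices

# W_{i1 i2 i3} = sum_{j1 j2 j3} G_{j1 j2 j3} A1_{i1 j1} A2_{i2 j2} A3_{i3 j3}
W = np.einsum('abc,ia,jb,kc->ijk', G, A[0], A[1], A[2])

# By construction rank(W_(n)) <= K_n:
unfold = lambda T, n: np.moveaxis(T, n, 0).reshape(T.shape[n], -1)
print([np.linalg.matrix_rank(unfold(W, n)) for n in range(3)])  # [2, 3, 4]
```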
14 Nonconvex approach
The Tucker decomposition is invariant under multiplication and division of different factors by the same scalar. To avoid this issue and to reduce overfitting, we add Frobenius-norm regularization terms for the components:
    minimize_{G, A^(1),...,A^(N)}  E(G ×_1 A^(1) ×_2 ⋯ ×_N A^(N)) + α [ ‖G‖_F² + Σ_{n=1}^N ‖A^(n)‖_F² ]
where α is a regularization parameter.
We solve this problem by alternating minimization; each step is detailed in the paper (code also available). A sketch of one such step follows.
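A hedged sketch of one alternating step, under the simplifying assumptions of a fully observed target tensor Y and squared loss (the paper's exact updates may differ): fixing G, A2, A3, the update of A1 is a ridge least-squares problem.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, alpha = (8, 9, 10), (2, 3, 4), 0.1
Y = rng.standard_normal(d)                        # fully observed tensor (assumption)
G = rng.standard_normal(K)
A = [rng.standard_normal((d[n], K[n])) for n in range(3)]

unfold = lambda T, n: np.moveaxis(T, n, 0).reshape(T.shape[n], -1)

# Mode-1 unfolding of the model (C-order convention): Y_(1) ~ A1 @ B,
# with B = G_(1) @ kron(A2, A3).T.
B = unfold(G, 0) @ np.kron(A[1], A[2]).T          # shape (K1, d2*d3)

# Ridge update: A1 = Y_(1) B^T (B B^T + alpha I)^{-1}.
M = B @ B.T + alpha * np.eye(K[0])                # (K1, K1), symmetric PD
A[0] = np.linalg.solve(M, B @ unfold(Y, 0).T).T

# Alternating minimization cycles analogously over A2, A3 and the core G.
```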
15 Convex approach
An alternative approach is to relax the combinatorial problem
    argmin_W  E(W) + γ Σ_{n=1}^N rank(W_(n))
The trace norm is a widely used convex surrogate for the rank. Therefore, we can consider the following convex relaxation:
    argmin_W  E(W) + γ Σ_{n=1}^N ‖W_(n)‖_tr
Tensor completion: [Liu et al., 2009; Gandy et al., 2011; Signoretto et al., 2012]
16 Discussion
In the Tucker decomposition the factors are explicit, hence we can add a new factor without relearning the full tensor.
Space complexity: O(Σ_{n=1}^N d_n K_n + Π_{n=1}^N K_n), which can be much smaller than that of the convex approach, particularly if K_n ≪ d_n.
Main drawback: no guarantee to find a local minimum.
[Figure: correlation vs. training-set size m for the compared methods GRR, GTNR, GMF, TTNR, TF.]
17 Alternating Direction Method of Multipliers (ADMM)
The regularizer Σ_{n=1}^N ‖W_(n)‖_tr is composite; ADMM decouples the problem.
Introduce auxiliary tensors B_n, n ∈ {1,...,N}, accounting for each term in the sum, adding the constraints B_n = W.
Optimizing the resulting Lagrangian w.r.t. each B_n only involves computing the proximal operator of the trace norm:
    prox_tr(V) = argmin_{X ∈ ℝ^{d×d}}  (1/2) ‖X − V‖_F² + ‖X‖_tr
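The trace-norm proximal operator has a well-known closed form: soft-thresholding of the singular values. A minimal sketch (with a threshold parameter tau added for later use):

```python
import numpy as np

def prox_trace_norm(V, tau):
    """argmin_X 0.5*||X - V||_F^2 + tau * ||X||_tr, via singular value
    soft-thresholding."""
    U, s, Vt = np.linalg.svd(V, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
```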
18 ADMM
We want to minimize
    (1/γ) E(W) + Σ_{n=1}^N Ψ(W_(n))
Decouple the regularization term:
    min_{W, B_1,...,B_N}  { (1/γ) E(W) + Σ_{n=1}^N Ψ(B_{n(n)}) : B_n = W, n = 1,...,N }
Augmented Lagrangian:
    L(W, B, C) = (1/γ) E(W) + Σ_{n=1}^N [ Ψ(B_{n(n)}) − ⟨C_n, W − B_n⟩ + (β/2) ‖W − B_n‖² ]
19 ADMM (cont.)
Updating equations:
    W^[i+1] ← argmin_W  L(W, B^[i], C^[i])
    B_n^[i+1] ← argmin_{B_n}  L(W^[i+1], B, C^[i]) = argmin_{B_n} [ Ψ(B_{n(n)}) − ⟨C_n, W^[i+1] − B_n⟩ + (β/2) ‖W^[i+1] − B_n‖² ]
    C_n^[i+1] ← C_n^[i] − β ( W^[i+1] − B_n^[i+1] )
The 2nd step involves the computation of the proximity operator of Ψ.
Convergence properties of ADMM are detailed e.g. in [Eckstein and Bertsekas, Mathematical Programming 1992].
A self-contained sketch of this scheme for tensor completion appears below.
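A hedged sketch of the ADMM iteration for tensor completion with the regularizer Σ_n ‖W_(n)‖_tr. It assumes E(W) = ‖y − I(W)‖² with I a binary mask over observed entries, so the W-step has a closed entrywise form; the hyperparameters and iteration count are illustrative.

```python
import numpy as np

def unfold(T, n):
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1)

def fold(M, n, shape):
    moved = [shape[n]] + [s for i, s in enumerate(shape) if i != n]
    return np.moveaxis(M.reshape(moved), 0, n)

def prox_tr(V, tau):
    U, s, Vt = np.linalg.svd(V, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def admm_completion(Y, mask, gamma=1.0, beta=1.0, iters=100):
    """Y: tensor with observed entries; mask: 0/1 array, 1 = observed."""
    N, shape = Y.ndim, Y.shape
    W = np.zeros(shape)
    B = [np.zeros(shape) for _ in range(N)]
    C = [np.zeros(shape) for _ in range(N)]
    for _ in range(iters):
        # W-step: the objective is quadratic in W, solved entrywise.
        num = (2.0 / gamma) * mask * Y + sum(beta * B[n] + C[n] for n in range(N))
        den = (2.0 / gamma) * mask + N * beta
        W = num / den
        for n in range(N):
            # B-step: prox of the trace norm on the mode-n unfolding
            # of A = W - C_n / beta (see the next slide).
            B[n] = fold(prox_tr(unfold(W - C[n] / beta, n), 1.0 / beta), n, shape)
            # C-step: multiplier update.
            C[n] = C[n] - beta * (W - B[n])
    return W
```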
20 Proximity operator
Let B = B_{n(n)} and A = (W − (1/β) C_n)_(n). Rewrite the 2nd step as:
    B̂ = prox_{(1/β)Ψ}(A) := argmin_B { (1/2) ‖B − A‖² + (1/β) Ψ(B) }
Case of interest: Ψ(B) = ψ(σ(B)).
By von Neumann's matrix inequality:
    prox_{(1/β)Ψ}(A) = U_A diag( prox_{(1/β)ψ}(σ_A) ) V_A^⊤
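A hedged sketch of this von Neumann reduction: the prox of any spectral function Ψ(B) = ψ(σ(B)) is obtained by applying the prox of ψ to the singular values of A. With ψ the ℓ1 norm this recovers the trace-norm prox shown earlier.

```python
import numpy as np

def prox_spectral(A, prox_psi):
    """Prox of a spectral function, given the prox of the underlying
    vector function psi applied to the singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(prox_psi(s)) @ Vt

# Example: psi = l1-norm scaled by 1/beta -> soft-thresholding of sigma(A).
beta = 2.0
soft = lambda s: np.maximum(s - 1.0 / beta, 0.0)
A = np.random.default_rng(0).standard_normal((5, 4))
B_hat = prox_spectral(A, soft)
```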
21 Rethinking the convex approach
The convex envelope of a function f on a set S is the largest convex function majorized by f at every point in S.
E.g., the cardinality of a vector: f(v) = card(v), S = {v : ‖v‖_∞ ≤ α}; the envelope is f**(v) = ‖v‖_1 / α.
The smaller S, the tighter the convex envelope.
[Figure: card(v) and its envelopes ‖v‖_1/α for several values of α, e.g. α = 0.5 and α = 1.]
In practice α is unknown and tuned by cross validation.
22 Rethinking the convex approach
The same reasoning applies to matrices: ‖W‖_tr / α is the convex envelope of rank(W) on the spectral-norm ball of radius α [Fazel, Hindi, and Boyd, 2001].
By using the regularizer Σ_{n=1}^N ‖W_(n)‖_tr we implicitly assume the same α for the different matricizations.
Difficulty with tensors: ‖W_(n)‖ varies with n!
23 Rethinking the convex approach
‖W_(n)‖ varies with n. Let us consider a small example tensor W.
[Figure: the vectors of singular values of each matricization, the smallest ℓ_∞ ball containing each of the vectors, and the smallest ℓ_∞ ball containing all vectors.]
24 Rethinking the convex approach
We are interested in convex functions on matrices which are invariant to the matricization operation. The Frobenius norm is very appealing: it is also a spectral function. Therefore, we consider the set
    S = {W : ‖W‖_F ≤ α}
Calculating the convex envelope of the rank on S can be reduced to calculating the convex envelope of card(v) on the set {v : ‖v‖_2 ≤ α}, where v is the vector of singular values of W (this follows by von Neumann's matrix inequality).
25 Convex envelope of the cardinality of a vector in the ℓ_2 ball
[Figure: the vectors of singular values of each matricization, the smallest ℓ_∞ ball containing each of the vectors, the smallest ℓ_∞ ball containing all vectors, and the smallest ℓ_2 ball containing all vectors.]
26 Convex envelope of the cardinality of a vector in the ℓ_2 ball
Aim: derive the convex envelope of f_α(x) = card(x) on the set {x ∈ ℝ^d : ‖x‖_2 ≤ α}.
The conjugate of f_α, for s ∈ ℝ^d, is
    f_α*(s) = sup_{‖x‖_2 ≤ α} { ⟨s, x⟩ − card(x) } = max { α ‖s_{1:r}‖_2 − r : r ∈ {0,...,d} }
where s_{1:r} keeps the r largest-magnitude components of s.
Biconjugate of f_α:
    f_α**(v) = sup { ⟨s, v⟩ − f_α*(s) : s ∈ ℝ^d },   ‖v‖_2 ≤ α
We do not know how to compute f_α** in closed form, but we can compute it and its proximal operator by projected subgradient descent.
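A minimal sketch of evaluating the conjugate f_α*(s) = max_r { α ‖s_{1:r}‖_2 − r }, assuming the reading above that s_{1:r} keeps the r largest-magnitude entries: sort the squared magnitudes, take square roots of cumulative sums, and maximize over r.

```python
import numpy as np

def conjugate(s, alpha):
    """f_alpha^*(s) = max over r in {0,...,d} of alpha*||s_{1:r}||_2 - r."""
    sq = np.sort(np.abs(s))[::-1] ** 2                        # descending
    norms = np.sqrt(np.concatenate(([0.0], np.cumsum(sq))))   # r = 0,...,d
    r = np.arange(len(s) + 1)
    return np.max(alpha * norms - r)

print(conjugate(np.array([3.0, -1.0, 0.5]), alpha=1.0))  # best at r = 1: 2.0
```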
27 Quality of relaxation (cont.)
Lemma. If ‖x‖_2 = α then ω_α(x) = card(x).
Let
    Ω_α(W) = Σ_{n=1}^N ω_α(σ(W_(n))),   ‖W‖_tr = Σ_{n=1}^N ‖σ(W_(n))‖_1
Implication:
Theorem. If W satisfies (a), (b), (c) below then Ω_{p_min}(W) > ‖W‖_tr:
a) ‖W_(n)‖ ≤ 1 for all n
b) ‖W‖_2 = p_min
c) min_n rank(W_(n)) < max_n rank(W_(n))
On the other hand, ω_1 is the convex envelope of card on the ℓ_2 unit ball, so:
    Ω_1(W) ≥ ‖W‖_tr for all W with ‖W‖_2 ≤ 1
28 Experiments on tensor completion
Video compression and exam score prediction (tensor-valued datasets).
[Figures: RMSE vs. training-set size m for the tensor trace norm and the proposed regularizer on both datasets, and a running-time comparison (seconds) between the two methods.]
29 Extensions and further work
- Latent tensor norm
- Tensor nuclear norm and an open problem
- Controlling the rank of all matricizations (quantum physics)
- Kernel methods
- Statistical analysis
30 Latent tensor nuclear norm
Defined by the variational problem [Tomioka and Suzuki, 2013; Wimalawarne et al., 2014]
    ‖W‖_LNN = inf { Σ_{j=1}^m ‖σ(V_j)‖_1 : Σ_{j=1}^m M_j* V_j = W }
where M_j* is the adjoint of the matricization operator.
The associated unit ball is
    conv{ W : rank(W_(n)) ≤ 1, ‖W_(n)‖ ≤ 1, for some n }
31 Latent tensor nuclear norm
If we change the above to
    conv{ W : rank(W_(n)) ≤ k, ‖W_(n)‖_p ≤ 1 }
the induced norm becomes [Combettes et al., 2016]
    ‖W‖_LNN = inf { Σ_{j=1}^m ‖σ(V_j)‖_{p,k} : Σ_{j=1}^m M_j* V_j = W }
where ‖·‖_{p,k} is the (p, k)-support norm [McDonald, Pontil, Stamos, 2016]. Its unit ball is
    conv{ x ∈ ℝ^d : card(x) ≤ k, ‖x‖_p ≤ 1 }
Ongoing experiments...
32 Tensor nuclear norm
Defined by
    ‖W‖_{TNN,p} = inf { ‖λ‖_p : Σ_{k=1}^K λ_k u_k^(1) ⊗ ⋯ ⊗ u_k^(N) = W }
where the infimum is over K ∈ ℕ, λ = (λ_1,...,λ_K) ∈ ℝ^K and vectors u_k^(n) ∈ ℝ^{d_n} such that ‖u_k^(n)‖ = 1 for all n, k.
Originally proposed and claimed to be a norm in [Lim and Comon, 2010]; however, [Friedland and Lim, 2016] shows this is well defined and a norm only when p = 1, and is always zero if p > 1, disproving the claim in the former paper.
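A small sketch of the object in this definition: building W as a weighted sum of rank-one tensors of unit vectors. Any such decomposition gives ‖λ‖_1 as an upper bound on ‖W‖_{TNN,1}; the infimum itself is NP-hard to compute (next slide).

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = (4, 5, 6), 3
lam = rng.standard_normal(K)
U = [rng.standard_normal((K, dn)) for dn in d]
U = [u / np.linalg.norm(u, axis=1, keepdims=True) for u in U]   # unit vectors

# W = sum_k lam_k * u_k^(1) (x) u_k^(2) (x) u_k^(3)
W = np.einsum('k,ka,kb,kc->abc', lam, U[0], U[1], U[2])
print(np.sum(np.abs(lam)))    # an upper bound on the nuclear norm of W
```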
33 Tensor nuclear norm
[Friedland and Lim, 2016] also shows that computing ‖W‖_TNN is NP-hard. Assume for simplicity d_1 = ⋯ = d_N = d.
Claim 1. We may restrict w.l.o.g. K ≤ d^{N−1}.
Claim 2. We can rewrite the norm as
    ‖W‖_TNN = inf { Σ_{k=1}^K Π_{n=1}^N ‖v_k^(n)‖ : Σ_{k=1}^K v_k^(1) ⊗ ⋯ ⊗ v_k^(N) = W }
where now the v_k^(n) ∈ ℝ^d are not constrained to have unit norm.
34 Tensor nuclear norm
Claim 3. Let
    ϕ(W) = E(W) + γ ‖W‖_TNN
and
    h(V^(1),...,V^(N)) = ϕ(V^(1) ∘ ⋯ ∘ V^(N))
where V^(1) ∘ ⋯ ∘ V^(N) denotes the tensor Σ_k v_k^(1) ⊗ ⋯ ⊗ v_k^(N) built from the columns of the factor matrices. If (V^(1),...,V^(N)) is a local minimum of h, then
    W = V^(1) ∘ ⋯ ∘ V^(N)
is a local minimum of ϕ. This is true for N = 2 (easy to show).
35 Controlling the rank of all matricizations
Let c ⊆ {1,...,N} and let W_(c,c̄) be the matricization obtained by taking the modes in c as rows and those in its complement c̄ as columns.
Fact: if
    max_{c ⊆ {1,...,N}} rank(W_(c,c̄)) ≤ k
then W has a compact decomposition, called a matrix product state in quantum physics [Vidal, 2003]:
    W_{i_1,...,i_N} = trace( A^(1)_{i_1} ⋯ A^(N)_{i_N} )
where the A^(n)_i are k×k matrices, so we can describe the tensor by O(N d k²) parameters (see the sketch below). However, with this method we do not have control of max_n rank(W_(n)).
If N ≤ 6 we may still obtain a latent-norm convex relaxation.
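A minimal sketch of evaluating a single entry of a matrix product state: each mode carries d matrices of size k×k, and an entry is the trace of the corresponding product, so storage is N·d·k² parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, k = 4, 3, 2
cores = [rng.standard_normal((d, k, k)) for _ in range(N)]   # A^(n)_i are k x k

def entry(idx):
    """W_{i1,...,iN} = trace(A^(1)_{i1} @ ... @ A^(N)_{iN})."""
    M = np.eye(k)
    for n, i in enumerate(idx):
        M = M @ cores[n][i]
    return np.trace(M)

print(entry((0, 1, 2, 0)))
```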
36 THANK YOU!