Variational Functions of Gram Matrices: Convex Analysis and Applications
Variational Functions of Gram Matrices: Convex Analysis and Applications
Maryam Fazel, University of Washington
Joint work with: Amin Jalali (UW), Lin Xiao (Microsoft Research)
IMA Workshop on Resource Tradeoffs: Computation, Communication, Information, May 2016
For vectors $x_1, \dots, x_m \in \mathbb{R}^n$ and a compact set $\mathcal{M} \subset S^m$, define the variational Gram function (VGF)

$$\Omega_{\mathcal{M}}(x_1, \dots, x_m) = \max_{M \in \mathcal{M}} \sum_{i,j=1}^m M_{ij}\, x_i^T x_j.$$

Let $X = [x_1, \dots, x_m]$; the pairwise inner products $x_i^T x_j$ are the entries of the Gram matrix $X^T X$, so

$$\Omega_{\mathcal{M}}(X) = \max_{M \in \mathcal{M}} \langle X^T X, M \rangle = \max_{M \in \mathcal{M}} \operatorname{tr}(X M X^T),$$

a.k.a. the support function of the set $\mathcal{M}$, evaluated at $X^T X$ (recall the support function of a set $\mathcal{M}$: $S_{\mathcal{M}}(Y) = \max_{M \in \mathcal{M}} \langle Y, M \rangle$).
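As a minimal numerical sketch of this definition (not the authors' code), the VGF for a finite, hence compact, weight set can be evaluated directly from the Gram matrix; the function name `vgf` and the toy set are assumptions for illustration:

```python
import numpy as np

def vgf(X, Ms):
    """Variational Gram function for a finite weight set Ms:
    Omega(X) = max over M in Ms of <X^T X, M> = tr(X M X^T)."""
    G = X.T @ X  # Gram matrix: G[i, j] = x_i^T x_j
    return max(np.sum(M * G) for M in Ms)

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))          # columns x_1, x_2, x_3 in R^4
Ms = [np.eye(3), np.ones((3, 3)) / 3]    # a toy finite set of symmetric weights
# the two formulas agree: <X^T X, M> equals tr(X M X^T)
assert np.isclose(vgf(X, Ms),
                  max(np.trace(X @ M @ X.T) for M in Ms))
```

The assertion checks the identity $\langle X^T X, M\rangle = \operatorname{tr}(XMX^T)$ used throughout the talk.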
Examples. Box: $\mathcal{M} = \{M : -\bar M_{ij} \le M_{ij} \le \bar M_{ij}\}$ gives

$$\Omega(X) = \max_{|M_{ij}| \le \bar M_{ij}} \sum_{i,j=1}^m M_{ij}\, x_i^T x_j = \sum_{i,j=1}^m \bar M_{ij}\, |x_i^T x_j|$$

(...more on this later). Box with $n = 1$: $\Omega(x) = |x|^T \bar M\, |x|$.
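A small sketch of the box example, under the assumption of symmetric nonnegative weights $\bar M$ (the helper name `box_vgf` is hypothetical): the maximum over the box is attained by matching each $M_{ij}$ to the sign of $x_i^T x_j$.

```python
import numpy as np

def box_vgf(X, Mbar):
    """Box VGF: Omega(X) = sum_ij Mbar_ij * |x_i^T x_j|,
    the max of sum_ij M_ij x_i^T x_j over |M_ij| <= Mbar_ij."""
    G = X.T @ X
    return np.sum(Mbar * np.abs(G))

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 3))
Mbar = np.abs(rng.standard_normal((3, 3)))
Mbar = (Mbar + Mbar.T) / 2               # symmetric nonnegative weights
# the maximizing M picks M_ij = Mbar_ij * sign(x_i^T x_j)
M_star = Mbar * np.sign(X.T @ X)
assert np.isclose(box_vgf(X, Mbar), np.trace(X @ M_star @ X.T))
```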
Outline
- motivating applications
- convex analysis of VGFs: convexity, conjugate, subdifferential
- optimization algorithms for regularized loss minimization $\min_X L(X) + \lambda\, \Omega(X)$
- application to a hierarchical classification problem
Applications. A machine learning application: hierarchical classification vs. flat classification (three classes; recursive labeling). Want $x_{\text{child}} \perp x_{\text{parent}}$: classifiers at different levels use different features (or different combinations of features), so each classifier is desired to be orthogonal to its parent classifier (hierarchy via orthogonal transfer [Zhou, Xiao, Wu '11]). E.g., encourage $x_l \perp x_f$ and $x_b \perp x_f$ via

$$\Omega(x_m, x_f, x_l, x_b) = w_1 |x_l^T x_f| + w_2 |x_b^T x_f|.$$

Other transfer learning methods: e.g., [Cai, Hofmann '04; Dekel et al. '04].
Application: multitask learning. Learn multiple tasks simultaneously using shared information among tasks, e.g., structural assumptions on a matrix of classifiers $X$. (Figure: a matrix with rows indexed by features and columns $x_1, \dots, x_7$ by tasks, showing a sparsity pattern shared across tasks.)
Promoting pairwise structure. For $x_1, \dots, x_m \in \mathbb{R}^n$, the inner products $x_i^T x_j$ reveal information about relative orientations, and can serve as a measure for properties such as orthogonality. E.g., minimizing

$$\Omega(x_1, \dots, x_m) = \sum_{i,j=1}^m \bar M_{ij}\, |x_i^T x_j|$$

promotes pairwise orthogonality for certain pairs, specified by $\bar M$. [Zhou, Xiao, Wu '11] introduced this penalty for hierarchical classification.
Promoting pairwise structure. $\Omega(x_1, \dots, x_m) = \sum_{i,j} \bar M_{ij}\, |x_i^T x_j|$: when is it convex?

Theorem (Zhou, Xiao, Wu '11). $\Omega$ is convex if $\bar M \ge 0$ and $\tilde M$, the comparison matrix of $\bar M$, is PSD, where

$$\tilde M_{ij} = -\bar M_{ij} \text{ for } i \ne j, \qquad \tilde M_{ii} = \bar M_{ii}.$$

The condition is also necessary if $n \ge m - 1$.

Proof: brute force (verify the definition of convexity). Question: when is a general VGF convex?
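A minimal check of this theorem's condition (the helper names are illustrative assumptions): build the comparison matrix and test positive semidefiniteness via eigenvalues.

```python
import numpy as np

def comparison_matrix(Mbar):
    """Comparison matrix: keep the diagonal, negate off-diagonal magnitudes."""
    Mt = -np.abs(Mbar).astype(float)
    np.fill_diagonal(Mt, np.diag(Mbar))
    return Mt

def is_psd(A, tol=1e-10):
    return np.min(np.linalg.eigvalsh((A + A.T) / 2)) >= -tol

# diagonally dominant weights -> comparison matrix is PSD -> Omega convex
Mbar = np.array([[2.0, 1.0], [1.0, 2.0]])
assert is_psd(comparison_matrix(Mbar))
# too-large off-diagonal weight -> comparison matrix indefinite
Mbar_bad = np.array([[1.0, 3.0], [3.0, 1.0]])
assert not is_psd(comparison_matrix(Mbar_bad))
```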
Convexity. Given a compact (not necessarily convex) set $\mathcal{M}$:

Theorem (Jalali, F., Xiao). $\Omega(X) = \max_{M \in \mathcal{M}} \operatorname{tr}(XMX^T)$ is convex if and only if for every $X$ there exists a PSD $M \in \mathcal{M}$ satisfying $\Omega(X) = \operatorname{tr}(XMX^T)$.

Corollary: when $\Omega$ is convex, $\sqrt{\Omega}$ is a pointwise max of weighted Frobenius norms:

$$\sqrt{\Omega(X)} = \max_{M \in \mathcal{M} \cap S_+^m} \|X M^{1/2}\|_F.$$

But when is the condition satisfied?
Convexity, polytope case: $\mathcal{M} = \operatorname{conv}\{M_1, \dots, M_p\}$. Let $\mathcal{M}_{\mathrm{eff}}$ be the smallest subset satisfying

$$\max_{M \in \mathcal{M}} \operatorname{tr}(XMX^T) = \max_{M \in \mathcal{M}_{\mathrm{eff}}} \operatorname{tr}(XMX^T).$$

Theorem. If $\mathcal{M}$ is a polytope, $\Omega$ is convex if and only if $\mathcal{M}_{\mathrm{eff}} \subset S_+^m$.

(Figure: gray: set $\mathcal{M}$; red: maximal points w.r.t. the PSD cone; green: $\mathcal{M}_{\mathrm{eff}}$.) Convexity test: check whether the green vertices are PSD.
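Since $\mathcal{M}_{\mathrm{eff}}$ lies among the vertices of the polytope, checking that every vertex is PSD is a simple sufficient (conservative) convexity test; this sketch, with an illustrative helper name, does exactly that:

```python
import numpy as np

def all_vertices_psd(vertices, tol=1e-10):
    """Sufficient convexity check for a polytope M = conv(vertices):
    M_eff is contained in the vertex set, so if every vertex is PSD
    then M_eff lies in the PSD cone and Omega is convex."""
    return all(np.min(np.linalg.eigvalsh(V)) >= -tol for V in vertices)

# vertices of a toy polytope in S^2
verts = [np.eye(2), np.array([[2.0, 1.0], [1.0, 2.0]])]
assert all_vertices_psd(verts)
```

The test is conservative: the exact condition only requires the vertices in $\mathcal{M}_{\mathrm{eff}}$ (the maximal ones w.r.t. the PSD cone) to be PSD.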
Convexity examples. For $\mathcal{M} = \{M : |M_{ij}| \le \bar M_{ij}\}$, $\Omega(X) = \sum_{i,j} \bar M_{ij}\, |x_i^T x_j|$, and

$$\mathcal{M}_{\mathrm{eff}} \subset \{M : M_{ii} = \bar M_{ii},\ |M_{ij}| = \bar M_{ij} \text{ for } i \ne j\}.$$

If $n \ge m - 1$, $\mathcal{M}_{\mathrm{eff}} \subset S_+^m$ is equivalent to: the comparison matrix of $\bar M$ is PSD.
Convexity. Squared norm $\|x\|^2$ for $x \in \mathbb{R}^m$: a convex VGF corresponding to $\mathcal{M} = \{uu^T : \|u\|_* \le 1\}$ (the dual-norm ball).

For $\mathcal{M} = \{M : \sum_{i,j=1}^m (M_{ij}/\bar M_{ij})^2 \le 1\}$,

$$\Omega(X) = \Big(\sum_{i,j=1}^m \bar M_{ij}^2\, (x_i^T x_j)^2\Big)^{1/2};$$

$\bar M_{ij} \ge 0$ ensures convexity (proof by examining $\mathcal{M}_{\mathrm{eff}}$).
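A quick numerical illustration of the squared-norm case for the Euclidean norm (which is self-dual, so the distinction between $\|\cdot\|$ and $\|\cdot\|_*$ disappears): $\max_{\|u\| \le 1} (u^T x)^2 = \|x\|^2$, attained at $u^* = x/\|x\|$.

```python
import numpy as np

# For the Euclidean norm (self-dual), Omega(x) = max_{||u||<=1} (u^T x)^2
# equals ||x||^2, attained at u* = x / ||x||.
rng = np.random.default_rng(2)
x = rng.standard_normal(4)
u_star = x / np.linalg.norm(x)
assert np.isclose((u_star @ x) ** 2, np.linalg.norm(x) ** 2)

# random feasible u never beat u* (Cauchy-Schwarz)
for _ in range(100):
    u = rng.standard_normal(4)
    u /= max(1.0, np.linalg.norm(u))     # pull into the unit ball
    assert (u @ x) ** 2 <= np.linalg.norm(x) ** 2 + 1e-12
```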
Conjugate Function. The conjugate of $\Omega(X) = \max_{M \in \mathcal{M}} \operatorname{tr}(XMX^T)$ is

$$\Omega^*(Y) = \tfrac{1}{2} \inf_{M \in \mathcal{M}} \big\{ \operatorname{tr}(Y M^{\dagger} Y^T) : \operatorname{range}(Y^T) \subseteq \operatorname{range}(M) \big\}$$

$$= \tfrac{1}{2} \inf \left\{ \operatorname{tr}(C) : \begin{bmatrix} M & Y^T \\ Y & C \end{bmatrix} \succeq 0,\ M \in \mathcal{M} \right\},$$

and is semidefinite representable.

The dual norm (if the matrices $M \in \mathcal{M}$ are invertible):

$$\sqrt{2\,\Omega^*(X)} = \inf_{M \in \mathcal{M}} \|X M^{-1/2}\|_F.$$

Special case: $\mathcal{M} = \{M : \alpha I \preceq M \preceq \beta I,\ \operatorname{tr}(M) = \gamma\}$ gives the cluster norm defined by [Jacob, Bach, Vert '08].
Proximal Operator. $\operatorname{prox}_{\tau h}(x) = \arg\min_u\, h(u) + \frac{1}{2\tau}\|u - x\|_2^2$. $\operatorname{prox}_{\tau\Omega}(X)$ can be computed by an SDP:

$$\arg\min_Y \min_{M,C}\ \|Y - X\|_2^2 + \tau \operatorname{tr}(C) \quad \text{subject to} \quad \begin{bmatrix} M & Y^T \\ Y & C \end{bmatrix} \succeq 0,\ M \in \mathcal{M}.$$

Use a QR decomposition $X = Q R_X$, replacing $\|Y - X\|_2^2$ by $\|R - R_X\|_2^2$, to get an SDP of size $2m$. By the Moreau decomposition, $\operatorname{prox}_{\tau\Omega(\cdot)}(X) = X - \operatorname{prox}_{(\tau\Omega(\cdot))^*}(X)$.
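The Moreau decomposition used above holds for any closed convex function; a minimal sketch verifying it numerically with the $\ell_1$ norm in place of a VGF (the prox of $\|\cdot\|_1$ is soft-thresholding, and the prox of its conjugate is projection onto the $\ell_\infty$ ball):

```python
import numpy as np

def prox_l1(x, t=1.0):
    """Soft-thresholding: prox of t * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_l1_conj(x, t=1.0):
    """Prox of the conjugate of t * ||.||_1: projection onto the
    l_inf ball of radius t."""
    return np.clip(x, -t, t)

# Moreau decomposition: x = prox_f(x) + prox_{f*}(x)
x = np.array([-2.5, -0.3, 0.0, 0.7, 4.0])
assert np.allclose(prox_l1(x) + prox_l1_conj(x), x)
```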
Subdifferential. $\Omega(X) = \max_{M \in \mathcal{M}} \operatorname{tr}(XMX^T) = \max_{M \in \mathcal{M}} \sum_{i,j} M_{ij}\, x_i^T x_j$. Subdifferential:

$$\partial\Omega(X) = 2 \operatorname{conv}\big\{ XM : M \in \mathcal{M},\ \operatorname{tr}(XMX^T) = \Omega(X) \big\}.$$

Example: for $\Omega(X) = \sum_{i,j} \bar M_{ij}\, |x_i^T x_j|$,

$$\partial\Omega(X) = 2 \operatorname{conv}\big\{ XM : M_{ij} = \bar M_{ij} \operatorname{sign}(x_i^T x_j) \text{ if } x_i^T x_j \ne 0,\ |M_{ij}| \le \bar M_{ij} \text{ otherwise} \big\}$$

([Zhou et al. '11] give just one subgradient).
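A sketch computing one such subgradient for the box VGF and checking the subgradient inequality at a nearby point; the weight matrix below is chosen (an assumption for illustration) so that its comparison matrix is PSD, making $\Omega$ convex:

```python
import numpy as np

def box_vgf_subgradient(X, Mbar):
    """One subgradient of Omega(X) = sum_ij Mbar_ij |x_i^T x_j|:
    2 X M with M_ij = Mbar_ij * sign(x_i^T x_j) (any choice at ties)."""
    M = Mbar * np.sign(X.T @ X)
    return 2.0 * X @ M

rng = np.random.default_rng(3)
X = rng.standard_normal((4, 3))
# diagonally dominant weights: comparison matrix PSD, so Omega is convex
Mbar = np.array([[2.0, 0.5, 0.5], [0.5, 2.0, 0.5], [0.5, 0.5, 2.0]])
G = box_vgf_subgradient(X, Mbar)
# subgradient inequality: Omega(Y) >= Omega(X) + <G, Y - X>
omega = lambda Z: np.sum(Mbar * np.abs(Z.T @ Z))
Y = X + 0.1 * rng.standard_normal(X.shape)
assert omega(Y) >= omega(X) + np.sum(G * (Y - X)) - 1e-9
```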
Outline
- motivating applications
- convex analysis of VGFs: convexity, conjugate, subdifferential
- optimization algorithms for regularized loss minimization $\min_X L(X) + \lambda\, \Omega(X)$
- application to a hierarchical classification problem
Regularized Loss Minimization. $J_{\mathrm{opt}} = \min_X L(X; \text{data}) + \lambda\, \Omega(X)$. Common losses: norm loss, Huber loss, hinge, logistic, etc.

With a smooth loss (and cheap prox): iteratively update $X^{(t)}$,

$$X^{(t+1)} = \operatorname{prox}_{\gamma_t \Omega}\big( X^{(t)} - \gamma_t \nabla L(X^{(t)}) \big), \quad t = 0, 1, 2, \dots,$$

where $\gamma_t$ is the step size.

When $L(X)$ is not smooth: subgradient-based methods, e.g., Regularized Dual Averaging [Xiao '11]; convergence can be very slow.
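The proximal gradient update above can be sketched end to end on a toy instance (an assumption for illustration, not the paper's experiment): least-squares loss with the simplest VGF, $\Omega(X) = \|X\|_F^2$ (weight set $\{I\}$), whose prox is a closed-form shrinkage.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((20, 5))
B = rng.standard_normal((20, 3))
lam = 0.5

# Loss L(X) = 0.5 ||A X - B||_F^2; VGF Omega(X) = ||X||_F^2 (weight set {I}),
# whose prox is the shrinkage prox_{t Omega}(Z) = Z / (1 + 2 t).
grad = lambda X: A.T @ (A @ X - B)
prox = lambda Z, t: Z / (1.0 + 2.0 * t)

gamma = 1.0 / np.linalg.norm(A, 2) ** 2  # step size 1 / L
X = np.zeros((5, 3))
for _ in range(2000):
    X = prox(X - gamma * grad(X), gamma * lam)

# compare with the closed-form ridge solution (A^T A + 2 lam I) X = A^T B
X_star = np.linalg.solve(A.T @ A + 2 * lam * np.eye(5), A.T @ B)
assert np.allclose(X, X_star, atol=1e-6)
```

For a general VGF the prox has no closed form and is computed by the SDP described earlier; the iteration itself is unchanged.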
VGF with Structured Loss Functions. Exploit the smooth variational representation of a VGF,

$$J_{\mathrm{opt}} = \min_X \max_{M \in \mathcal{M}}\ L(X; \text{data}) + \lambda \operatorname{tr}(XMX^T),$$

and consider losses with a nice representation (called Fenchel-type):

$$L(X) = \max_{G \in \mathcal{G}}\ \langle X, D(G) \rangle - \hat L(G),$$

where $\hat L(\cdot)$ is convex, $\mathcal{G}$ is compact, and $D(\cdot)$ is a linear operator. Luckily, this covers many important cases: norm loss, Huber loss, binary and multi-class hinge loss, ... Then

$$J_{\mathrm{opt}} = \min_X \max_{M \in \mathcal{M},\, G \in \mathcal{G}}\ \langle X, D(G) \rangle - \hat L(G) + \lambda \operatorname{tr}(XMX^T):$$

a smooth convex-concave saddle-point problem!
Mirror-Prox Algorithm. $J_{\mathrm{opt}} = \min_X \max_{M \in \mathcal{M},\, G \in \mathcal{G}}\ \langle X, D(G) \rangle - \hat L(G) + \lambda \operatorname{tr}(XMX^T)$.

Setup: find the saddle points of smooth convex-concave functions, $\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} f(x, y)$.

Mirror-prox [Nemirovski '04]: $O(1/t)$ convergence; $O(1/t^2)$ convergence if strongly convex. Useful if the projection (or prox) is cheap; often one can preprocess to reduce to a smaller dimension.
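With the Euclidean distance as the mirror map, mirror-prox reduces to the extragradient method; a minimal sketch on a toy strongly convex-concave quadratic saddle point (the problem instance is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
A = 0.5 * rng.standard_normal((3, 3))

# f(x, y) = 0.5||x||^2 + x^T A y - 0.5||y||^2, saddle point at (0, 0).
# Monotone operator F(x, y) = (grad_x f, -grad_y f).
def F(x, y):
    return x + A @ y, y - A.T @ x

gamma = 0.1
x, y = np.ones(3), np.ones(3)
for _ in range(500):
    # mirror-prox with Euclidean prox = extragradient:
    # an extrapolation step followed by a correction step
    gx, gy = F(x, y)
    xh, yh = x - gamma * gx, y - gamma * gy
    gx, gy = F(xh, yh)
    x, y = x - gamma * gx, y - gamma * gy

assert np.linalg.norm(x) + np.linalg.norm(y) < 1e-6
```

The two-step structure (extrapolate, then correct using the gradient at the extrapolated point) is what distinguishes mirror-prox from plain gradient descent-ascent, which can cycle on saddle problems.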
Experiment: Text Categorization. Reuters Corpus Volume 1: an archive of manually categorized news stories. Part of the category hierarchy: corporate, funding, ownership changes, markets, bonds, acquisition, loans, asset transfer, domestic markets, privatization, market share.

$$\begin{aligned}
\underset{X,\, \xi}{\text{minimize}} \quad & \frac{1}{N} \sum_{s=1}^N \xi_s + \lambda\, \Omega(X) \\
\text{subject to} \quad & x_i^T y_s - x_j^T y_s \ge 1 - \xi_s \quad \text{for class pairs } (i, j) \text{ determined by the label } z_s \text{ and the hierarchy},\ s = 1, \dots, N,
\end{aligned}$$

where $y_s \in \mathbb{R}^n$ are the samples, and $z_s \in \{1, \dots, m\}$ are the labels, $s = 1, \dots, N$.
Sanity check: angles between pairs of classifiers. Left: flat classification; right: hierarchical classification. Red: pairs desired to be orthogonal.
Experiment: Text Categorization. Convergence rates by algorithm and objective structure:
- Subgradient Method (non-smooth, convex): $O(1/\sqrt{t})$
- Regularized Dual Averaging (non-smooth, strongly convex with parameter $\sigma$): $O(\ln(t)/\sigma t)$
- Mirror-prox (smooth variational form, convex): $O(1/t)$
- Mirror-prox (smooth variational form, strongly convex): $O(1/t^2)$

Prediction error on test data (from [Zhou, Xiao, Wu '11]): FlatMult 21.39 (0.29); HierMult 21.41 (0.29); Transfer 21.0 (0.31); TreeLoss 26.32 (0.39); Orthogonal Transfer 17.46 (0.74).
Summary, future work. VGFs: functions of the Gram matrix, defined via a weight set $\mathcal{M}$; they unify special cases and lead to new functions; convex analysis (conjugate, subdifferential, prox); efficient algorithms. Future work: design $\mathcal{M}$ for different applications; other applications: multitask learning (with clustered or diverse sets of tasks), disjoint visual features (vision), ...

Reference: A. Jalali, L. Xiao, M. Fazel, Variational Gram Functions: Convex Analysis and Optimization; preliminary draft available at faculty.washington.edu/mfazel.
Figure: value of the loss function vs. iteration $t$ for the MP and RDA algorithms (log scale).
Figure: $\|X_t - X_{\mathrm{final}}\|_F$ vs. iteration $t$ for the MP and RDA algorithms (log scale).
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes
More informationCOMS 4721: Machine Learning for Data Science Lecture 6, 2/2/2017
COMS 4721: Machine Learning for Data Science Lecture 6, 2/2/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University UNDERDETERMINED LINEAR EQUATIONS We
More informationApplication: Can we tell what people are looking at from their brain activity (in real time)? Gaussian Spatial Smooth
Application: Can we tell what people are looking at from their brain activity (in real time? Gaussian Spatial Smooth 0 The Data Block Paradigm (six runs per subject Three Categories of Objects (counterbalanced
More informationConvex Optimization Theory. Athena Scientific, Supplementary Chapter 6 on Convex Optimization Algorithms
Convex Optimization Theory Athena Scientific, 2009 by Dimitri P. Bertsekas Massachusetts Institute of Technology Supplementary Chapter 6 on Convex Optimization Algorithms This chapter aims to supplement
More informationLearning with Noisy Labels. Kate Niehaus Reading group 11-Feb-2014
Learning with Noisy Labels Kate Niehaus Reading group 11-Feb-2014 Outline Motivations Generative model approach: Lawrence, N. & Scho lkopf, B. Estimating a Kernel Fisher Discriminant in the Presence of
More informationDiscriminative Models
No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models
More informationRecap from previous lecture
Recap from previous lecture Learning is using past experience to improve future performance. Different types of learning: supervised unsupervised reinforcement active online... For a machine, experience
More informationSupport Vector and Kernel Methods
SIGIR 2003 Tutorial Support Vector and Kernel Methods Thorsten Joachims Cornell University Computer Science Department tj@cs.cornell.edu http://www.joachims.org 0 Linear Classifiers Rules of the Form:
More informationStochastic Composition Optimization
Stochastic Composition Optimization Algorithms and Sample Complexities Mengdi Wang Joint works with Ethan X. Fang, Han Liu, and Ji Liu ORFE@Princeton ICCOPT, Tokyo, August 8-11, 2016 1 / 24 Collaborators
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2015 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationEE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015
EE613 Machine Learning for Engineers Kernel methods Support Vector Machines jean-marc odobez 2015 overview Kernel methods introductions and main elements defining kernels Kernelization of k-nn, K-Means,
More informationOptimization and Optimal Control in Banach Spaces
Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,
More informationAdversarial Surrogate Losses for Ordinal Regression
Adversarial Surrogate Losses for Ordinal Regression Rizal Fathony, Mohammad Bashiri, Brian D. Ziebart (Illinois at Chicago) August 22, 2018 Presented by: Gregory P. Spell Outline 1 Introduction Ordinal
More informationAlgorithms for Nonsmooth Optimization
Algorithms for Nonsmooth Optimization Frank E. Curtis, Lehigh University presented at Center for Optimization and Statistical Learning, Northwestern University 2 March 2018 Algorithms for Nonsmooth Optimization
More informationOnline Convex Optimization
Advanced Course in Machine Learning Spring 2010 Online Convex Optimization Handouts are jointly prepared by Shie Mannor and Shai Shalev-Shwartz A convex repeated game is a two players game that is performed
More informationSTATISTICAL BEHAVIOR AND CONSISTENCY OF CLASSIFICATION METHODS BASED ON CONVEX RISK MINIMIZATION
STATISTICAL BEHAVIOR AND CONSISTENCY OF CLASSIFICATION METHODS BASED ON CONVEX RISK MINIMIZATION Tong Zhang The Annals of Statistics, 2004 Outline Motivation Approximation error under convex risk minimization
More informationHomework 5. Convex Optimization /36-725
Homework 5 Convex Optimization 10-725/36-725 Due Tuesday November 22 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)
More informationRecovery of Simultaneously Structured Models using Convex Optimization
Recovery of Simultaneously Structured Models using Convex Optimization Maryam Fazel University of Washington Joint work with: Amin Jalali (UW), Samet Oymak and Babak Hassibi (Caltech) Yonina Eldar (Technion)
More informationLearning Task Grouping and Overlap in Multi-Task Learning
Learning Task Grouping and Overlap in Multi-Task Learning Abhishek Kumar Hal Daumé III Department of Computer Science University of Mayland, College Park 20 May 2013 Proceedings of the 29 th International
More informationManifold Regularization
Manifold Regularization Vikas Sindhwani Department of Computer Science University of Chicago Joint Work with Mikhail Belkin and Partha Niyogi TTI-C Talk September 14, 24 p.1 The Problem of Learning is
More informationCourse Notes for EE227C (Spring 2018): Convex Optimization and Approximation
Course otes for EE7C (Spring 018): Conve Optimization and Approimation Instructor: Moritz Hardt Email: hardt+ee7c@berkeley.edu Graduate Instructor: Ma Simchowitz Email: msimchow+ee7c@berkeley.edu October
More informationI P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION
I P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION Peter Ochs University of Freiburg Germany 17.01.2017 joint work with: Thomas Brox and Thomas Pock c 2017 Peter Ochs ipiano c 1
More informationDual Decomposition.
1/34 Dual Decomposition http://bicmr.pku.edu.cn/~wenzw/opt-2017-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/34 1 Conjugate function 2 introduction:
More information