Variational Functions of Gram Matrices: Convex Analysis and Applications

1 Variational Functions of Gram Matrices: Convex Analysis and Applications. Maryam Fazel, University of Washington. Joint work with: Amin Jalali (UW), Lin Xiao (Microsoft Research). IMA Workshop on Resource Tradeoffs: Computation, Communication, Information, May 2016

4 For vectors x_1, ..., x_m ∈ R^n and a compact set 𝓜 ⊂ S^m, define the variational Gram function (VGF)

Ω_𝓜(x_1, ..., x_m) = max_{M ∈ 𝓜} Σ_{i,j=1}^m M_ij x_i^T x_j

Let X = [x_1 ... x_m]; the pairwise inner products x_i^T x_j are the entries of the Gram matrix X^T X, so

Ω_𝓜(X) = max_{M ∈ 𝓜} ⟨X^T X, M⟩ = max_{M ∈ 𝓜} tr(X M X^T)

a.k.a. the support function of the set 𝓜, evaluated at X^T X (recall the support function of a set 𝓜: S_𝓜(Y) = max_{M ∈ 𝓜} ⟨Y, M⟩)

6 Examples: box. 𝓜 = {M : −M̄_ij ≤ M_ij ≤ M̄_ij}

Ω(X) = max_{|M_ij| ≤ M̄_ij} Σ_{i,j=1}^m M_ij x_i^T x_j = Σ_{i,j=1}^m M̄_ij |x_i^T x_j|   (...more on this later)

box with n = 1: Ω(x) = |x|^T M̄ |x|
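The box example above is easy to check numerically. A minimal NumPy sketch (random data of my own choosing, not from the talk) verifying that the closed form Σ_ij M̄_ij |x_i^T x_j| upper-bounds every feasible M and is attained at M_ij = M̄_ij sign(x_i^T x_j):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3
X = rng.standard_normal((n, m))           # columns x_1, ..., x_m
M_bar = np.abs(rng.standard_normal((m, m)))
M_bar = (M_bar + M_bar.T) / 2             # symmetric nonnegative weights

G = X.T @ X                               # Gram matrix, G_ij = x_i^T x_j

# closed form: the max over |M_ij| <= M_bar_ij is attained at
# M_ij = M_bar_ij * sign(G_ij), giving sum_ij M_bar_ij * |G_ij|
omega = np.sum(M_bar * np.abs(G))

# sanity check against random feasible M's: none beats the closed form
for _ in range(1000):
    M = M_bar * rng.uniform(-1, 1, size=(m, m))
    assert np.sum(M * G) <= omega + 1e-12
```

The check is deterministic: for any feasible M, Σ M_ij G_ij ≤ Σ M̄_ij |G_ij| term by term.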

7 Outline: motivating applications; convex analysis of VGFs: convexity, conjugate, subdifferential; optimization algorithms for regularized loss minimization min_X L(X) + λ Ω(X); application to a hierarchical classification problem

11 Applications: a machine learning application: hierarchical classification vs. flat classification; three classes; recursive labeling; x_child ⊥ x_parent. Classifiers at different levels use different features (or different combinations of features); classifiers are desired to be orthogonal to their parent classifiers (hierarchy via orthogonal transfer [Zhou, Xiao, Wu '11]). Encourage x_l ⊥ x_f and x_b ⊥ x_f: Ω(x_m, x_f, x_l, x_b) = w_1 |x_l^T x_f| + w_2 |x_b^T x_f|. Other transfer learning methods: e.g., [Cai, Hoffman '04; Dekel et al. '04]

12 Application: multitask learning. Learn multiple tasks simultaneously using shared information among tasks, e.g. structural assumptions on a matrix of classifiers X = [x_1 x_2 ... x_7]. [figure: sparsity pattern of X, with tasks as columns and features as rows]

13 Promoting pairwise structure. For x_1, ..., x_m ∈ R^n, the inner products x_i^T x_j reveal information about relative orientations; they can serve as a measure for properties such as orthogonality. E.g., minimizing

Ω(x_1, ..., x_m) = Σ_{i,j=1}^m M̄_ij |x_i^T x_j|

promotes pairwise orthogonality for certain pairs, specified by M̄. [Zhou, Xiao, Wu '11] introduced this penalty for hierarchical classification.

15 Promoting pairwise structure. Ω(x_1, ..., x_m) = Σ_{i,j} M̄_ij |x_i^T x_j| — when is it convex?

Theorem (Zhou, Xiao, Wu '11). Ω is convex if M̄ ≥ 0 and M̃, the comparison matrix of M̄, is PSD, where M̃_ij = −M̄_ij for i ≠ j and M̃_ii = M̄_ii. The condition is also necessary if n ≥ m − 1.

Proof: brute force (verify the definition of convexity). Question: when is a general VGF convex?
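The comparison-matrix test above is straightforward to implement. A small sketch (the weight matrices are illustrative choices of mine, not from the talk):

```python
import numpy as np

def comparison_matrix(M_bar):
    """M_tilde: keep the diagonal of M_bar, negate the off-diagonal entries."""
    M_t = -np.abs(M_bar).astype(float)
    np.fill_diagonal(M_t, np.diag(M_bar))
    return M_t

def is_convex_box_vgf(M_bar, tol=1e-10):
    """Zhou-Xiao-Wu test: the comparison matrix being PSD is sufficient
    for convexity of the box VGF, and also necessary when n >= m - 1."""
    M_t = comparison_matrix(M_bar)
    return np.min(np.linalg.eigvalsh(M_t)) >= -tol

# diagonally dominant weights -> PSD comparison matrix -> convex
M1 = np.array([[2.0, 1.0], [1.0, 2.0]])
# off-diagonal weight too large -> comparison matrix indefinite
M2 = np.array([[1.0, 3.0], [3.0, 1.0]])
print(is_convex_box_vgf(M1), is_convex_box_vgf(M2))  # -> True False
```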

18 Convexity. Given a compact (not necessarily convex) set 𝓜:

Theorem (Jalali, F., Xiao). Ω(X) = max_{M ∈ 𝓜} tr(X M X^T) is convex if and only if for every X there exists a PSD M ∈ 𝓜 satisfying Ω(X) = tr(X M X^T).

Corollary: when Ω is convex, √Ω is a pointwise max of weighted Frobenius norms:

√Ω(X) = max_{M ∈ 𝓜 ∩ S_+} ‖X M^{1/2}‖_F

But when is the condition satisfied?

19 Convexity. Polytope: 𝓜 = conv{M_1, ..., M_p}. Let 𝓜_eff be the smallest subset satisfying

max_{M ∈ 𝓜} tr(X M X^T) = max_{M ∈ 𝓜_eff} tr(X M X^T)

Theorem. If 𝓜 is a polytope, Ω is convex if and only if 𝓜_eff ⊂ S^m_+.

[figure: gray = set 𝓜; red = maximal points w.r.t. the PSD cone; green = 𝓜_eff] Convexity test: check whether the green vertices are PSD.

20 Convexity examples. For 𝓜 = {M : |M_ij| ≤ M̄_ij}, Ω(X) = Σ_{i,j} M̄_ij |x_i^T x_j|, and 𝓜_eff ⊂ {M : M_ii = M̄_ii, |M_ij| = M̄_ij for i ≠ j}. If n ≥ m − 1, 𝓜_eff ⊂ S^m_+ is equivalent to: the comparison matrix of M̄ is PSD.

23 Convexity. Squared norm ‖x‖² for x ∈ R^m: a convex VGF corresponding to 𝓜 = {u u^T : ‖u‖ ≤ 1}. For 𝓜 = {M : Σ_{i,j=1}^m (M_ij / M̄_ij)² ≤ 1},

Ω(X) = ( Σ_{i,j=1}^m M̄_ij² (x_i^T x_j)² )^{1/2}

M̄_ij ≥ 0 ensures convexity (proof by examining 𝓜_eff)
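The ellipsoid example admits its closed form via Cauchy–Schwarz: the maximizing M is proportional to M̄² ⊙ G. A quick numerical sketch (random data, my own setup) checking feasibility and attainment:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 3
X = rng.standard_normal((n, m))
M_bar = np.abs(rng.standard_normal((m, m))) + 0.1
M_bar = (M_bar + M_bar.T) / 2             # M_bar_ij >= 0

G = X.T @ X
# closed form from the slide: Omega(X) = sqrt(sum_ij M_bar_ij^2 * (x_i^T x_j)^2)
omega = np.sqrt(np.sum((M_bar * G) ** 2))

# the maximizer over {M : sum (M_ij / M_bar_ij)^2 <= 1} is
# M_ij = M_bar_ij^2 * G_ij / omega  (Cauchy-Schwarz is tight there)
M_star = M_bar ** 2 * G / omega
assert np.sum((M_star / M_bar) ** 2) <= 1 + 1e-9   # feasible
assert abs(np.sum(M_star * G) - omega) < 1e-9      # attains the value
```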

26 Conjugate Function. The conjugate function of Ω(X) = max_{M ∈ 𝓜} tr(X M X^T) is

Ω*(Y) = (1/2) inf_{M ∈ 𝓜} { tr(Y M† Y^T) : range(Y^T) ⊆ range(M) }
      = (1/2) inf { tr(C) : [M, Y^T; Y, C] ⪰ 0, M ∈ 𝓜 }

and is semidefinite representable. The dual norm (if the M's are invertible):

√(2 Ω*(X)) = inf_{M ∈ 𝓜} ‖X M^{−1/2}‖_F

special case: 𝓜 = {M : αI ⪯ M ⪯ βI, tr(M) = γ} gives the cluster norm defined by [Jacob, Bach, Vert '08]

27 Proximal Operator. The proximal operator is prox_{τh}(x) = argmin_u h(u) + (1/(2τ)) ‖u − x‖²₂. prox_{τΩ}(X) can be computed by an SDP:

argmin_Y min_{M,C} ‖Y − X‖²₂ + τ tr(C)  subject to  [M, Y^T; Y, C] ⪰ 0, M ∈ 𝓜

28 Proximal Operator. Use the QR decomposition X = Q R_X to get an SDP of size 2m:

argmin_R min_{M,C} ‖R − R_X‖²₂ + τ tr(C)  subject to  [M, R^T; R, C] ⪰ 0, M ∈ 𝓜

and by the Moreau decomposition, prox_{τ Ω(·)}(X) = X − prox_{(τ Ω)*(·)}(X)
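As a sanity check on the prox machinery, the simplest convex VGF — 𝓜 = {I}, so Ω(X) = tr(X I X^T) = ‖X‖²_F — has the closed-form prox X / (1 + 2τ), which a generic numerical solver reproduces. A sketch with made-up data (for a general 𝓜 one would solve the SDP above with an SDP solver instead):

```python
import numpy as np
from scipy.optimize import minimize

tau = 0.7
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))

def objective(u_flat):
    # Omega(U) + ||U - X||^2 / (2*tau) with Omega(U) = ||U||_F^2
    U = u_flat.reshape(X.shape)
    return np.sum(U ** 2) + np.sum((U - X) ** 2) / (2 * tau)

res = minimize(objective, X.ravel())
prox_numeric = res.x.reshape(X.shape)
prox_closed = X / (1 + 2 * tau)          # closed form for Omega = ||.||_F^2
print(np.allclose(prox_numeric, prox_closed, atol=1e-4))  # -> True
```

The closed form follows from the first-order condition 2u + (u − x)/τ = 0, i.e. u = x / (1 + 2τ).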

29 Subdifferential. Ω(X) = max_{M ∈ 𝓜} tr(X M X^T) = max_{M ∈ 𝓜} Σ_{i,j} M_ij x_i^T x_j. Subdifferential:

∂Ω(X) = 2 conv{ XM : M ∈ 𝓜, tr(X M X^T) = Ω(X) }

Example: for Ω(X) = Σ_{i,j} M̄_ij |x_i^T x_j|,

∂Ω(X) = 2 conv{ XM : M_ij = M̄_ij sign(x_i^T x_j) if x_i^T x_j ≠ 0, |M_ij| ≤ M̄_ij otherwise }

([Zhou et al. '11] give just one subgradient)
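Where all x_i^T x_j are nonzero the box VGF is differentiable and the formula above collapses to the single gradient 2XM. A small finite-difference sketch (random data, my own choosing) confirming this:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3
X = rng.standard_normal((n, m))
M_bar = np.abs(rng.standard_normal((m, m)))
M_bar = (M_bar + M_bar.T) / 2

def omega(X):
    return np.sum(M_bar * np.abs(X.T @ X))

# at a generic X all x_i^T x_j are nonzero, so the subdifferential is
# the single gradient 2 X M with M_ij = M_bar_ij * sign(x_i^T x_j)
M = M_bar * np.sign(X.T @ X)
grad_formula = 2 * X @ M

# central finite differences
eps = 1e-6
grad_fd = np.zeros_like(X)
for idx in np.ndindex(X.shape):
    E = np.zeros_like(X)
    E[idx] = eps
    grad_fd[idx] = (omega(X + E) - omega(X - E)) / (2 * eps)

assert np.max(np.abs(grad_formula - grad_fd)) < 1e-4
```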

30 Outline: motivating applications; convex analysis of VGFs: convexity, conjugate, subdifferential; optimization algorithms for regularized loss minimization min_X L(X) + λ Ω(X); application to a hierarchical classification problem

33 Regularized Loss Minimization. J_opt = min_X L(X; data) + λ Ω(X). Common losses: norm loss, Huber loss, hinge, logistic, etc. With a smooth loss (and a cheap prox), iteratively update X^(t):

X^(t+1) = prox_{γ_t λΩ}( X^(t) − γ_t ∇L(X^(t)) ),  t = 0, 1, 2, ...,  γ_t is the step size

When L(X) is not smooth: subgradient-based methods, e.g. Regularized Dual Averaging [Xiao '11]; convergence can be very slow.
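To make the update concrete, here is a minimal proximal-gradient loop on a toy instance of my own (not the talk's experiment): a least-squares loss with the simplest VGF Ω(x) = ‖x‖², whose prox is just a rescaling, so the iterates should reach the closed-form ridge solution:

```python
import numpy as np

# min_x 0.5*||A x - b||^2 + lam*||x||^2, with prox_{tau*||.||^2}(u) = u/(1+2*tau);
# the minimizer is the ridge solution (A^T A + 2*lam*I)^{-1} A^T b
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
lam = 0.3

L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
gamma = 1.0 / L                          # constant step size
x = np.zeros(5)
for _ in range(2000):
    grad = A.T @ (A @ x - b)             # gradient step ...
    x = (x - gamma * grad) / (1 + 2 * gamma * lam)   # ... then prox step

x_ridge = np.linalg.solve(A.T @ A + 2 * lam * np.eye(5), A.T @ b)
print(np.allclose(x, x_ridge, atol=1e-8))  # -> True
```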

37 VGF with Structured Loss Functions. Exploit the smooth variational representation of a VGF:

J_opt = min_X max_{M ∈ 𝓜} L(X; data) + λ tr(X M X^T)

and consider losses with a nice representation (called Fenchel-type):

L(X) = max_{G ∈ 𝒢} ⟨X, D(G)⟩ − L̂(G)

where L̂(·) is convex, 𝒢 is compact, and D(·) is a linear operator. Luckily, this covers many important cases: norm loss, Huber loss, binary and multi-class hinge loss... Then

J_opt = min_X max_{M ∈ 𝓜, G ∈ 𝒢} ⟨X, D(G)⟩ − L̂(G) + λ tr(X M X^T)

— a smooth convex-concave saddle-point problem!

38 Mirror-Prox Algorithm. J_opt = min_X max_{M ∈ 𝓜, G ∈ 𝒢} ⟨X, D(G)⟩ − L̂(G) + λ tr(X M X^T). Setup: find the saddle points of smooth convex-concave functions min_{x ∈ X} max_{y ∈ Y} f(x, y). Mirror-prox [Nemirovski '04]: O(1/t) convergence; O(1/t²) convergence if strongly convex; useful if the projection (or prox) is cheap; often one can preprocess to reduce to a smaller dimension.
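In the Euclidean setup, mirror-prox reduces to the extragradient method: take a gradient step, then update using gradients evaluated at the extrapolated point. A toy sketch of my own on a bilinear saddle problem, where plain simultaneous gradient descent–ascent would not converge:

```python
import numpy as np

# min_x max_y f(x, y) = x^T A y; for invertible A the unique saddle point is (0, 0)
rng = np.random.default_rng(0)
A = np.eye(3) + 0.1 * rng.standard_normal((3, 3))  # well-conditioned coupling
x, y = rng.standard_normal(3), rng.standard_normal(3)
eta = 0.5 / np.linalg.norm(A, 2)                   # step size below 1/Lipschitz

for _ in range(500):
    # 1) extrapolation step
    xh = x - eta * (A @ y)
    yh = y + eta * (A.T @ x)
    # 2) update with gradients taken at the extrapolated point
    x = x - eta * (A @ yh)
    y = y + eta * (A.T @ xh)

assert np.linalg.norm(x) < 1e-6 and np.linalg.norm(y) < 1e-6
```

The full mirror-prox method replaces these Euclidean steps with prox-mappings adapted to the geometry of X and Y.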

40 Experiment: Text Categorization. Text categorization on Reuters Corpus Volume 1, an archive of manually categorized news stories. Part of the category hierarchy: corporate; funding; ownership changes; markets; bonds; acquisition; loans; asset transfer; domestic markets; privatization; market share.

minimize_{X, ξ}  (1/N) Σ_{s=1}^N ξ_s + λ Ω(X)
subject to  x_{z_s}^T y_s − x_j^T y_s ≥ 1 − ξ_s,  for all j ∈ A(z_s),  s ∈ {1, ..., N}

where y_s ∈ R^n are the samples and z_s ∈ {1, ..., m} are the labels, s = 1, ..., N.

41 Sanity check: angles between pairs of classifiers. Left: flat classification; right: hierarchical classification. Red: pairs desired to be orthogonal.

42 Experiment: Text Categorization.

method                      | objective function                        | convergence rate
Subgradient method          | non-smooth, convex                        | O(1/√t)
Regularized Dual Averaging  | non-smooth, strongly convex (σ)           | O(ln(t)/(σt))
Mirror-prox                 | smooth variational form, convex           | O(1/t)
Mirror-prox                 | smooth variational form, strongly convex  | O(1/t²)

Prediction error on test data (from [Zhou, Xiao, Wu '11]): FlatMult 21.39 (.29); HierMult 21.41 (.29); Transfer 21.0 (.31); TreeLoss 26.32 (.39); Orthogonal Transfer 17.46 (.74).

43 Summary, future work. VGFs: functions of the Gram matrix, defined via a weight set 𝓜; unify special cases and lead to new functions; convex analysis (conjugate, subdifferential, prox); efficient algorithms. Future work: design 𝓜 for different applications; other applications: multitask learning (with clustered/diverse tasks), disjoint visual features, ... Reference: A. Jalali, L. Xiao, M. Fazel, Variational Gram Functions: Convex Analysis and Optimization, preliminary draft available at faculty.washington.edu/mfazel

44 Figure: value of the loss function vs. iteration t for the MP and RDA algorithms (log scale)

45 Figure: ‖X_t − X_final‖_F vs. iteration t for the MP and RDA algorithms (log scale)


More information

Clustering. CSL465/603 - Fall 2016 Narayanan C Krishnan

Clustering. CSL465/603 - Fall 2016 Narayanan C Krishnan Clustering CSL465/603 - Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Supervised vs Unsupervised Learning Supervised learning Given x ", y " "%& ', learn a function f: X Y Categorical output classification

More information

INTERPLAY BETWEEN C AND VON NEUMANN

INTERPLAY BETWEEN C AND VON NEUMANN INTERPLAY BETWEEN C AND VON NEUMANN ALGEBRAS Stuart White University of Glasgow 26 March 2013, British Mathematical Colloquium, University of Sheffield. C AND VON NEUMANN ALGEBRAS C -algebras Banach algebra

More information

Sequence Modelling with Features: Linear-Chain Conditional Random Fields. COMP-599 Oct 6, 2015

Sequence Modelling with Features: Linear-Chain Conditional Random Fields. COMP-599 Oct 6, 2015 Sequence Modelling with Features: Linear-Chain Conditional Random Fields COMP-599 Oct 6, 2015 Announcement A2 is out. Due Oct 20 at 1pm. 2 Outline Hidden Markov models: shortcomings Generative vs. discriminative

More information

Convex Optimization in Classification Problems

Convex Optimization in Classification Problems New Trends in Optimization and Computational Algorithms December 9 13, 2001 Convex Optimization in Classification Problems Laurent El Ghaoui Department of EECS, UC Berkeley elghaoui@eecs.berkeley.edu 1

More information

Sparse Gaussian conditional random fields

Sparse Gaussian conditional random fields Sparse Gaussian conditional random fields Matt Wytock, J. ico Kolter School of Computer Science Carnegie Mellon University Pittsburgh, PA 53 {mwytock, zkolter}@cs.cmu.edu Abstract We propose sparse Gaussian

More information

LECTURE 13 LECTURE OUTLINE

LECTURE 13 LECTURE OUTLINE LECTURE 13 LECTURE OUTLINE Problem Structures Separable problems Integer/discrete problems Branch-and-bound Large sum problems Problems with many constraints Conic Programming Second Order Cone Programming

More information

Adaptive discretization and first-order methods for nonsmooth inverse problems for PDEs

Adaptive discretization and first-order methods for nonsmooth inverse problems for PDEs Adaptive discretization and first-order methods for nonsmooth inverse problems for PDEs Christian Clason Faculty of Mathematics, Universität Duisburg-Essen joint work with Barbara Kaltenbacher, Tuomo Valkonen,

More information

Lecture 3: Semidefinite Programming

Lecture 3: Semidefinite Programming Lecture 3: Semidefinite Programming Lecture Outline Part I: Semidefinite programming, examples, canonical form, and duality Part II: Strong Duality Failure Examples Part III: Conditions for strong duality

More information

Lecture 8 : Eigenvalues and Eigenvectors

Lecture 8 : Eigenvalues and Eigenvectors CPS290: Algorithmic Foundations of Data Science February 24, 2017 Lecture 8 : Eigenvalues and Eigenvectors Lecturer: Kamesh Munagala Scribe: Kamesh Munagala Hermitian Matrices It is simpler to begin with

More information

Support Vector Machine (SVM) and Kernel Methods

Support Vector Machine (SVM) and Kernel Methods Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2014 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

More information

A Bahadur Representation of the Linear Support Vector Machine

A Bahadur Representation of the Linear Support Vector Machine A Bahadur Representation of the Linear Support Vector Machine Yoonkyung Lee Department of Statistics The Ohio State University October 7, 2008 Data Mining and Statistical Learning Study Group Outline Support

More information

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis Lecture 7: Matrix completion Yuejie Chi The Ohio State University Page 1 Reference Guaranteed Minimum-Rank Solutions of Linear

More information

Classification and Support Vector Machine

Classification and Support Vector Machine Classification and Support Vector Machine Yiyong Feng and Daniel P. Palomar The Hong Kong University of Science and Technology (HKUST) ELEC 5470 - Convex Optimization Fall 2017-18, HKUST, Hong Kong Outline

More information

NOTES ON SOME EXERCISES OF LECTURE 5, MODULE 2

NOTES ON SOME EXERCISES OF LECTURE 5, MODULE 2 NOTES ON SOME EXERCISES OF LECTURE 5, MODULE 2 MARCO VITTURI Contents 1. Solution to exercise 5-2 1 2. Solution to exercise 5-3 2 3. Solution to exercise 5-7 4 4. Solution to exercise 5-8 6 5. Solution

More information

CS60021: Scalable Data Mining. Large Scale Machine Learning

CS60021: Scalable Data Mining. Large Scale Machine Learning J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 1 CS60021: Scalable Data Mining Large Scale Machine Learning Sourangshu Bhattacharya Example: Spam filtering Instance

More information

Statistical Data Mining and Machine Learning Hilary Term 2016

Statistical Data Mining and Machine Learning Hilary Term 2016 Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes

More information

COMS 4721: Machine Learning for Data Science Lecture 6, 2/2/2017

COMS 4721: Machine Learning for Data Science Lecture 6, 2/2/2017 COMS 4721: Machine Learning for Data Science Lecture 6, 2/2/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University UNDERDETERMINED LINEAR EQUATIONS We

More information

Application: Can we tell what people are looking at from their brain activity (in real time)? Gaussian Spatial Smooth

Application: Can we tell what people are looking at from their brain activity (in real time)? Gaussian Spatial Smooth Application: Can we tell what people are looking at from their brain activity (in real time? Gaussian Spatial Smooth 0 The Data Block Paradigm (six runs per subject Three Categories of Objects (counterbalanced

More information

Convex Optimization Theory. Athena Scientific, Supplementary Chapter 6 on Convex Optimization Algorithms

Convex Optimization Theory. Athena Scientific, Supplementary Chapter 6 on Convex Optimization Algorithms Convex Optimization Theory Athena Scientific, 2009 by Dimitri P. Bertsekas Massachusetts Institute of Technology Supplementary Chapter 6 on Convex Optimization Algorithms This chapter aims to supplement

More information

Learning with Noisy Labels. Kate Niehaus Reading group 11-Feb-2014

Learning with Noisy Labels. Kate Niehaus Reading group 11-Feb-2014 Learning with Noisy Labels Kate Niehaus Reading group 11-Feb-2014 Outline Motivations Generative model approach: Lawrence, N. & Scho lkopf, B. Estimating a Kernel Fisher Discriminant in the Presence of

More information

Discriminative Models

Discriminative Models No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models

More information

Recap from previous lecture

Recap from previous lecture Recap from previous lecture Learning is using past experience to improve future performance. Different types of learning: supervised unsupervised reinforcement active online... For a machine, experience

More information

Support Vector and Kernel Methods

Support Vector and Kernel Methods SIGIR 2003 Tutorial Support Vector and Kernel Methods Thorsten Joachims Cornell University Computer Science Department tj@cs.cornell.edu http://www.joachims.org 0 Linear Classifiers Rules of the Form:

More information

Stochastic Composition Optimization

Stochastic Composition Optimization Stochastic Composition Optimization Algorithms and Sample Complexities Mengdi Wang Joint works with Ethan X. Fang, Han Liu, and Ji Liu ORFE@Princeton ICCOPT, Tokyo, August 8-11, 2016 1 / 24 Collaborators

More information

Support Vector Machine (SVM) and Kernel Methods

Support Vector Machine (SVM) and Kernel Methods Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2015 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

More information

EE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015

EE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015 EE613 Machine Learning for Engineers Kernel methods Support Vector Machines jean-marc odobez 2015 overview Kernel methods introductions and main elements defining kernels Kernelization of k-nn, K-Means,

More information

Optimization and Optimal Control in Banach Spaces

Optimization and Optimal Control in Banach Spaces Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,

More information

Adversarial Surrogate Losses for Ordinal Regression

Adversarial Surrogate Losses for Ordinal Regression Adversarial Surrogate Losses for Ordinal Regression Rizal Fathony, Mohammad Bashiri, Brian D. Ziebart (Illinois at Chicago) August 22, 2018 Presented by: Gregory P. Spell Outline 1 Introduction Ordinal

More information

Algorithms for Nonsmooth Optimization

Algorithms for Nonsmooth Optimization Algorithms for Nonsmooth Optimization Frank E. Curtis, Lehigh University presented at Center for Optimization and Statistical Learning, Northwestern University 2 March 2018 Algorithms for Nonsmooth Optimization

More information

Online Convex Optimization

Online Convex Optimization Advanced Course in Machine Learning Spring 2010 Online Convex Optimization Handouts are jointly prepared by Shie Mannor and Shai Shalev-Shwartz A convex repeated game is a two players game that is performed

More information

STATISTICAL BEHAVIOR AND CONSISTENCY OF CLASSIFICATION METHODS BASED ON CONVEX RISK MINIMIZATION

STATISTICAL BEHAVIOR AND CONSISTENCY OF CLASSIFICATION METHODS BASED ON CONVEX RISK MINIMIZATION STATISTICAL BEHAVIOR AND CONSISTENCY OF CLASSIFICATION METHODS BASED ON CONVEX RISK MINIMIZATION Tong Zhang The Annals of Statistics, 2004 Outline Motivation Approximation error under convex risk minimization

More information

Homework 5. Convex Optimization /36-725

Homework 5. Convex Optimization /36-725 Homework 5 Convex Optimization 10-725/36-725 Due Tuesday November 22 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)

More information

Recovery of Simultaneously Structured Models using Convex Optimization

Recovery of Simultaneously Structured Models using Convex Optimization Recovery of Simultaneously Structured Models using Convex Optimization Maryam Fazel University of Washington Joint work with: Amin Jalali (UW), Samet Oymak and Babak Hassibi (Caltech) Yonina Eldar (Technion)

More information

Learning Task Grouping and Overlap in Multi-Task Learning

Learning Task Grouping and Overlap in Multi-Task Learning Learning Task Grouping and Overlap in Multi-Task Learning Abhishek Kumar Hal Daumé III Department of Computer Science University of Mayland, College Park 20 May 2013 Proceedings of the 29 th International

More information

Manifold Regularization

Manifold Regularization Manifold Regularization Vikas Sindhwani Department of Computer Science University of Chicago Joint Work with Mikhail Belkin and Partha Niyogi TTI-C Talk September 14, 24 p.1 The Problem of Learning is

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course otes for EE7C (Spring 018): Conve Optimization and Approimation Instructor: Moritz Hardt Email: hardt+ee7c@berkeley.edu Graduate Instructor: Ma Simchowitz Email: msimchow+ee7c@berkeley.edu October

More information

I P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION

I P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION I P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION Peter Ochs University of Freiburg Germany 17.01.2017 joint work with: Thomas Brox and Thomas Pock c 2017 Peter Ochs ipiano c 1

More information

Dual Decomposition.

Dual Decomposition. 1/34 Dual Decomposition http://bicmr.pku.edu.cn/~wenzw/opt-2017-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/34 1 Conjugate function 2 introduction:

More information