StreamSVM: Linear SVMs and Logistic Regression When Data Does Not Fit In Memory

1 StreamSVM: Linear SVMs and Logistic Regression When Data Does Not Fit In Memory. S.V.N. (Vishy) Vishwanathan, Purdue University and Microsoft. October 9, 2012.

2 Linear Support Vector Machines: Outline. 1 Linear Support Vector Machines, 2 StreamSVM, 3 Experiments - SVM, 4 Logistic Regression, 5 Experiments - Logistic Regression, 6 Conclusion

3 Linear Support Vector Machines: Binary Classification (figure: two classes of points, $y_i = +1$ and $y_i = -1$)

4 Linear Support Vector Machines: Binary Classification (figure: the two classes separated by the hyperplane $\{x \mid \langle w, x \rangle = 0\}$)

5 Linear Support Vector Machines: Linear Support Vector Machines (figure: maximum-margin separation; the margin hyperplanes $\{x \mid \langle w, x \rangle = 1\}$ and $\{x \mid \langle w, x \rangle = -1\}$ lie on either side of $\{x \mid \langle w, x \rangle = 0\}$, and for points $x_1$, $x_2$ on opposite margin hyperplanes $\langle w, x_1 - x_2 \rangle = 2$, so the margin width is $2/\|w\|$)

6 Linear Support Vector Machines: Optimization Problem
$$\min_w \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^m \max\left(0,\, 1 - y_i \langle w, x_i \rangle\right)$$
(figure: the hinge loss plotted against the margin)
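As a quick illustration (a minimal sketch assuming a dense data matrix, not code from the talk), the primal objective can be evaluated with NumPy; here X is an (m, d) array, y holds labels in {-1, +1}, w is the weight vector, and all names are illustrative.

import numpy as np

def svm_primal_objective(w, X, y, C=1.0):
    # 0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i <w, x_i>)
    margins = y * (X @ w)                        # y_i <w, x_i> for every example
    return 0.5 * np.dot(w, w) + C * np.sum(np.maximum(0.0, 1.0 - margins))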

7 Linear Support Vector Machines: The Dual Objective
$$\min_\alpha \; D(\alpha) := \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle - \sum_i \alpha_i \quad \text{s.t.} \quad 0 \le \alpha_i \le C$$
Convex and quadratic objective; linear (box) constraints; $w = \sum_i \alpha_i y_i x_i$; each $\alpha_i$ corresponds to an $x_i$.

8-14 Linear Support Vector Machines: Coordinate Descent (figures only: coordinate descent illustrated on a two-dimensional function, alternately minimizing along the x axis and the y axis)

15 Linear Support Vector Machines: Coordinate Descent in the Dual
One-dimensional function:
$$\hat{D}(\alpha_t) = \frac{\alpha_t^2}{2}\langle x_t, x_t \rangle + \alpha_t \sum_{i \ne t} \alpha_i y_i y_t \langle x_i, x_t \rangle - \alpha_t + \text{const.} \quad \text{s.t.} \quad 0 \le \alpha_t \le C$$
Solve:
$$\hat{D}'(\alpha_t) = \alpha_t \langle x_t, x_t \rangle + \sum_{i \ne t} \alpha_i y_i y_t \langle x_i, x_t \rangle - 1 = 0$$
$$\alpha_t = \mathrm{median}\left(0,\, C,\, \frac{1 - \sum_{i \ne t} \alpha_i y_i y_t \langle x_i, x_t \rangle}{\langle x_t, x_t \rangle}\right)$$
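The update above fits in a few lines of NumPy. The sketch below is illustrative (not the talk's code) and maintains $w = \sum_i \alpha_i y_i x_i$ so that each coordinate update costs O(d); the dense matrix X, labels y, and all names are assumptions.

import numpy as np

def svm_dual_cd(X, y, C=1.0, epochs=10):
    """Illustrative dual coordinate descent for a linear SVM on in-memory dense data."""
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.zeros(d)
    Qii = np.einsum("ij,ij->i", X, X)            # <x_i, x_i> for every i
    for _ in range(epochs):
        for t in np.random.permutation(n):
            if Qii[t] == 0.0:
                continue
            g = y[t] * np.dot(w, X[t]) - 1.0     # derivative of the 1-D subproblem
            new_alpha = min(max(alpha[t] - g / Qii[t], 0.0), C)   # median(0, C, .)
            w += (new_alpha - alpha[t]) * y[t] * X[t]
            alpha[t] = new_alpha
    return w, alpha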

16 StreamSVM: Outline. 1 Linear Support Vector Machines, 2 StreamSVM, 3 Experiments - SVM, 4 Logistic Regression, 5 Experiments - Logistic Regression, 6 Conclusion

17-20 StreamSVM: We are Collecting Lots of Data... (figures only: a binary classification data set with $y_i = +1$ and $y_i = -1$ points growing over successive frames)

21 StreamSVM: What if Data Does Not Fit in Memory?
Idea 1: Block Minimization [Yu et al., KDD 2010]
  Split data into blocks $B_1, B_2, \ldots$ such that $B_j$ fits in memory
  Compress and store each block separately
  Load one block of data at a time and optimize only those $\alpha_i$'s
Idea 2: Selective Block Minimization [Chang and Roth, KDD 2011]
  Split data into blocks $B_1, B_2, \ldots$ such that $B_j$ fits in memory
  Compress and store each block separately
  Load one block of data at a time and optimize only those $\alpha_i$'s
  Retain informative samples from each block in main memory
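As a rough illustration of Idea 1, the sketch below (hypothetical; block_files, load_block, and every other name are assumptions, not the authors' code) loads one on-disk block at a time and runs dual coordinate descent only on that block's alphas. Selective block minimization would additionally keep the most informative examples from each block cached in memory.

import numpy as np

def block_minimization(block_files, load_block, C=1.0, outer_iters=5):
    """Illustrative block-minimization loop for a linear SVM dual."""
    w = None
    alpha = {}                                   # alpha_i keyed by global example id
    for _ in range(outer_iters):
        for f in block_files:
            X, y, ids = load_block(f)            # read and decompress one block
            if w is None:
                w = np.zeros(X.shape[1])
            for j, i in enumerate(ids):          # dual CD restricted to this block
                a = alpha.get(i, 0.0)
                qii = float(np.dot(X[j], X[j]))
                if qii == 0.0:
                    continue
                g = y[j] * np.dot(w, X[j]) - 1.0
                new_a = min(max(a - g / qii, 0.0), C)
                w += (new_a - a) * y[j] * X[j]
                alpha[i] = new_a
    return w, alpha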

22 StreamSVM: What are Informative Samples? (figure only)

23 StreamSVM: Some Observations
SBM and BM are wasteful: both split the data into blocks and compress the blocks, which requires reading the entire data at least once (expensive); both pause optimization while a block is loaded into memory.
Hardware 101: disk I/O is slower than CPU (sometimes by a factor of 100); random access on HDD is terrible, while sequential access is reasonably fast (factor of 10); multi-core processors are becoming commonplace.
How can we exploit this?

24 StreamSVM: Our Architecture (diagram: a Reader thread streams the Data from HDD into a Working Set held in RAM, while a Trainer thread uses the Working Set to update the Weight Vector, also in RAM)

25 StreamSVM: Our Philosophy. Iterate over the data in main memory while streaming data from disk. Primarily evict uninformative examples from main memory.

26 StreamSVM: Reader
for k = 1, ..., max_iter do
  for i = 1, ..., n do
    if |A| = Ω then
      randomly select i' ∈ A
      A = A \ {i'}
      delete y_{i'}, Q_{i'i'}, x_{i'} from RAM
    end if
    read y_i, x_i from disk
    calculate Q_ii = ⟨x_i, x_i⟩
    store y_i, Q_ii, x_i in RAM
    A = A ∪ {i}
  end for
  if stopping criterion is met then exit
end for

27 StreamSVM: Trainer
α¹ = 0, w¹ = 0, ε = ∞, ε_new = 0, β = 0.9
while stopping criterion is not met do
  for t = 1, ..., n do
    if |A| > 0.9 Ω then ε = βε
    randomly select i ∈ A and read y_i, Q_ii, x_i from RAM
    compute ∇_i D := y_i ⟨w^t, x_i⟩ − 1
    if (α_i^t = 0 and ∇_i D > ε) or (α_i^t = C and ∇_i D < −ε) then
      A = A \ {i} and delete y_i, Q_ii, x_i from RAM
      continue
    end if
    α_i^{t+1} = median(0, C, α_i^t − ∇_i D / Q_ii), w^{t+1} = w^t + (α_i^{t+1} − α_i^t) y_i x_i
    ε_new = max(ε_new, |∇_i D|)
  end for
  update stopping criterion, ε = ε_new
end while
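Below is a simplified, hypothetical Python sketch of these dual cached loops: a reader thread streams examples sequentially from disk into a bounded working set, and a trainer thread runs dual coordinate descent over the cached examples, actively evicting points whose KKT conditions already hold. It omits the ε schedule and stopping criterion from the pseudocode above; WorkingSet, stream, and all other names are illustrative assumptions, not the authors' implementation. The two loops can be launched as threading.Thread targets sharing one WorkingSet, a weight vector w (NumPy array), and a dict alpha.

import threading, random
import numpy as np

class WorkingSet:
    """Shared in-RAM cache A of (y_i, Q_ii, x_i), capacity Omega."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lock = threading.Lock()
        self.items = {}

    def put(self, i, y, q, x):
        with self.lock:
            if len(self.items) >= self.capacity:             # cache full: drop a random point
                self.items.pop(random.choice(list(self.items)))
            self.items[i] = (y, q, x)

    def sample(self):
        with self.lock:
            if not self.items:
                return None
            i = random.choice(list(self.items))
            return i, self.items[i]

    def evict(self, i):
        with self.lock:
            self.items.pop(i, None)

def reader(stream, ws, stop):
    """stream() yields (i, y_i, x_i) sequentially from disk; fills the working set."""
    while not stop.is_set():
        for i, y, x in stream():
            ws.put(i, y, float(np.dot(x, x)), x)
            if stop.is_set():
                return

def trainer(ws, w, alpha, C, eps, stop):
    """Dual coordinate descent over cached points with active eviction."""
    while not stop.is_set():
        picked = ws.sample()
        if picked is None:
            continue
        i, (y, q, x) = picked
        a = alpha.get(i, 0.0)
        g = y * np.dot(w, x) - 1.0                            # grad_i D(alpha)
        if (a == 0.0 and g > eps) or (a == C and g < -eps):   # uninformative point
            ws.evict(i)
            continue
        if q > 0.0:
            new_a = min(max(a - g / q, 0.0), C)
            w += (new_a - a) * y * x                          # keep w = sum_i alpha_i y_i x_i
            alpha[i] = new_a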

28 StreamSVM: Proof of Convergence (Sketch)
Definition (Luo and Tseng problem): minimize over $\alpha$ the function $g(E\alpha) + b^\top \alpha$ subject to $L_i \le \alpha_i \le U_i$, where $\alpha, b \in \mathbb{R}^n$, $E \in \mathbb{R}^{d \times n}$, $L_i \in [-\infty, \infty)$, $U_i \in (-\infty, \infty]$. The above optimization problem is a Luo and Tseng problem if the following conditions hold: (1) $E$ has no zero columns; (2) the set of optimal solutions $A^*$ is non-empty; (3) the function $g$ is strictly convex and twice continuously differentiable; (4) for all optimal solutions $\alpha^* \in A^*$, $\nabla^2 g(E\alpha^*)$ is positive definite.

29 StreamSVM: Proof of Convergence (Sketch)
Lemma (see Theorem 1 of Hsieh et al., ICML 2008): If no data point $x_i = 0$, then the SVM dual is a Luo and Tseng problem.
Definition (Almost cyclic rule): There exists an integer $B \ge n$ such that every coordinate is iterated upon at least once in every $B$ successive iterations.
Theorem (Theorem 2.1 of Luo and Tseng, JOTA 1992): Let $\{\alpha^t\}$ be a sequence of iterates generated by a coordinate descent method using the almost cyclic rule. Then $\alpha^t$ converges linearly to an element of $A^*$.

30 StreamSVM: Proof of Convergence (Sketch)
Lemma: Suppose the trainer accesses the cached training data sequentially and the following conditions hold: the trainer is at most $\kappa \ge 1$ times faster than the reading thread, that is, the trainer performs at most $\kappa$ coordinate updates in the time it takes the reader to read one training point from disk; and a point is never evicted from RAM unless the $\alpha_i$ corresponding to that point has been updated. Then this conforms to the almost cyclic rule.

31 Experiments - SVM: Outline. 1 Linear Support Vector Machines, 2 StreamSVM, 3 Experiments - SVM, 4 Logistic Regression, 5 Experiments - Logistic Regression, 6 Conclusion

32 Experiments - SVM: Experiments

dataset      n        d      s (%)    n+ : n-    Data size (GB)
ocr          3.5 M
dna          50 M
webspam-t    0.35 M
kddb

33 Experiments - SVM: Does Active Eviction Work? (plot: relative function value difference vs. wall clock time (sec) for the linear SVM on webspam-t with C = 1.0, comparing random and active eviction)

34-37 Experiments - SVM: Comparison with Block Minimization (plots: relative function value difference vs. wall clock time (sec) for StreamSVM, SBM, and BM with C = 1.0 on ocr, webspam-t, kddb, and dna)

38-40 Experiments - SVM: Effect of Varying C (plots: relative function value difference vs. wall clock time (sec) for StreamSVM, SBM, and BM on dna for increasing values of C, starting at C = 10.0)

41 Experiments - SVM: Varying Cache Size (plot: relative function value difference vs. wall clock time (sec) on kddb, comparing cache sizes of 1 GB, 4 GB, 16 GB, and a smaller MB-scale cache)

42 Experiments - SVM: Expanding Features (plot: relative function value difference vs. wall clock time (sec) on dna with expanded features, comparing two cache sizes including 32 GB)

43 Logistic Regression: Outline. 1 Linear Support Vector Machines, 2 StreamSVM, 3 Experiments - SVM, 4 Logistic Regression, 5 Experiments - Logistic Regression, 6 Conclusion

44 Logistic Regression: Optimization Problem
$$\min_w \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^m \log\left(1 + \exp(-y_i \langle w, x_i \rangle)\right)$$
(figure: the logistic loss plotted against the margin, alongside the hinge loss)
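As with the hinge-loss objective earlier, this can be evaluated directly; the sketch below (illustrative, same dense-matrix assumptions as before) uses np.logaddexp for a numerically stable log(1 + exp(.)).

import numpy as np

def lr_primal_objective(w, X, y, C=1.0):
    # 0.5 * ||w||^2 + C * sum_i log(1 + exp(-y_i <w, x_i>))
    # np.logaddexp(0, z) computes log(1 + exp(z)) without overflow
    margins = y * (X @ w)
    return 0.5 * np.dot(w, w) + C * np.sum(np.logaddexp(0.0, -margins))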

45 Logistic Regression: The Dual Objective
$$\min_\alpha \; D(\alpha) := \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle + \sum_i \left[\alpha_i \log \alpha_i + (C - \alpha_i)\log(C - \alpha_i)\right] \quad \text{s.t.} \quad 0 \le \alpha_i \le C$$
$w = \sum_i \alpha_i y_i x_i$; convex objective (but not quadratic); box constraints.

46 Logistic Regression: Coordinate Descent
One-dimensional function:
$$\hat{D}(\alpha_t) := \frac{\alpha_t^2}{2}\langle x_t, x_t \rangle + \alpha_t \sum_{i \ne t} \alpha_i y_i y_t \langle x_i, x_t \rangle + \alpha_t \log \alpha_t + (C - \alpha_t)\log(C - \alpha_t) \quad \text{s.t.} \quad 0 \le \alpha_t \le C$$
Solve: there is no closed-form solution, but a modified Newton method does the job. See Yu, Huang, and Lin 2011 for details.
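Since the subproblem has no closed form, the minimizer can be found numerically. Below is a hedged one-dimensional damped-Newton sketch (simpler than, and not identical to, the modified Newton method of Yu, Huang, and Lin 2011); qtt stands for $\langle x_t, x_t \rangle$, b for the cross term $\sum_{i \ne t} \alpha_i y_i y_t \langle x_i, x_t \rangle$, and all names are illustrative.

import numpy as np

def lr_dual_coordinate_update(alpha_t, qtt, b, C, iters=20, tol=1e-12):
    """Damped 1-D Newton sketch for the logistic dual subproblem above.

    Minimizes 0.5*qtt*a^2 + b*a + a*log(a) + (C - a)*log(C - a) over 0 < a < C.
    """
    a = min(max(alpha_t, 1e-8), C - 1e-8)        # keep the iterate strictly inside (0, C)
    for _ in range(iters):
        grad = qtt * a + b + np.log(a) - np.log(C - a)
        hess = qtt + 1.0 / a + 1.0 / (C - a)
        step = grad / hess
        new_a = a - step
        while new_a <= 0.0 or new_a >= C:        # halve the step if we leave (0, C)
            step *= 0.5
            new_a = a - step
        a = new_a
        if abs(step) < tol:
            break
    return a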

47 Logistic Regression: Determining Importance (SVM case)
Shrinking: evict example $i$ if $\alpha_i^t = 0$ and $\nabla_i D(\alpha) > \epsilon$, or $\alpha_i^t = C$ and $\nabla_i D(\alpha) < -\epsilon$.

48 Logistic Regression: Determining Importance (SVM case)
(figure: the hinge loss and its gradient plotted against the margin $\langle w, x_i \rangle$)
The gradient of the loss, $\partial_w\,\mathrm{loss}(w, x_i, y_i)$, determines $\alpha_i$; the projected gradient satisfies $\pi_i D(\alpha) = 0$ whenever $\langle w, x_i \rangle \ge 1 - \epsilon$ (KKT condition).
Computing $\epsilon$: $\epsilon = \max_{i \in A} |\pi_i D(\alpha)|$

49 Logistic Regression: Determining Importance (Logistic Regression)
(figure: the logistic loss and its gradient plotted against the margin $\langle w, x_i \rangle$)
The gradient of the loss, $\partial_w\,\mathrm{loss}(w, x_i, y_i)$, determines $\alpha_i$; here $\nabla_i D(\alpha) \approx 0$ only when the margin $\langle w, x_i \rangle$ is large (approximate KKT condition).
Computing $\epsilon$: $\epsilon = \max_{i \in A} |\nabla_i D(\alpha)|$

50 Experiments - Logistic Regression: Outline. 1 Linear Support Vector Machines, 2 StreamSVM, 3 Experiments - SVM, 4 Logistic Regression, 5 Experiments - Logistic Regression, 6 Conclusion

51 Experiments - Logistic Regression: Does Active Eviction Work? (plot: relative function value difference vs. wall clock time (sec) for logistic regression on webspam-t with C = 1.0, comparing random and active eviction)

52 Experiments - Logistic Regression: Comparison with Block Minimization (plot: relative function value difference vs. wall clock time (sec) on webspam-t with C = 1.0 for StreamSVM and BM)

53 Conclusion: Outline. 1 Linear Support Vector Machines, 2 StreamSVM, 3 Experiments - SVM, 4 Logistic Regression, 5 Experiments - Logistic Regression, 6 Conclusion

54 Conclusion: Things I Did Not Talk About / Work in Progress
StreamSVM: amortizing the cost of training different models; other storage hierarchies (e.g., solid state drives)
Extensions to other problems: multiclass problems; structured output prediction; a distributed version of the algorithm

55 Conclusion: References
Shin Matsushima, S. V. N. Vishwanathan, and Alexander J. Smola. Linear Support Vector Machines via Dual Cached Loops. KDD.
Code for StreamSVM is available from streamsvm.html

56 Conclusion: Joint Work With (diagram: a reading thread reads the dataset on disk with sequential access and loads it, with random access, into the cached data (working set) in RAM; a training thread reads the working set with random access and updates the weight vector in RAM)
