Large Scale Semi-supervised Linear SVM with Stochastic Gradient Descent


Journal of Computational Information Systems 9: 15 (2013)

Xin ZHOU, Conghui ZHU, Sheng LI, Mo YU
School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China

Project supported by the National High Technology Research and Development Program of China (863 Program) (No. 2011AA01A207) and the National Nature Science Foundation of China (No. ) and (No. ). Corresponding author: zhouxin@mtlab.hit.edu.cn (Xin ZHOU). Copyright 2013 Binary Information Press. DOI: /jcisP0590. August 1, 2013.

Abstract

Semi-supervised learning tries to employ a large collection of unlabeled data together with a few labeled examples to improve generalization performance, and has proved valuable in real-world applications. The bottleneck of existing semi-supervised approaches lies in overly long training time caused by the large scale of the unlabeled data. In this article we introduce a novel method for the semi-supervised linear support vector machine based on averaged stochastic gradient descent, which significantly improves the training speed of S3VM over existing toolkits such as SVMlight-TSVM, CCCP-TSVM and SVMlin. We evaluate our method on text categorization and sentiment classification, which demonstrates its efficiency on large scale semi-supervised tasks.

Keywords: Semi-supervised Learning; Stochastic Gradient Descent; Support Vector Machines

1 Introduction

Supervised learning is employed in many real-world tasks, such as text categorization, web page classification, spam detection and image processing. However, its performance depends on the scale of the labeled data. In some areas, such as image processing, biology and natural language processing, annotation is extremely laborious, whereas unlabeled data are relatively easy to obtain. Semi-supervised learning was introduced to make use of this additional unlabeled data; its goal is to improve generalization performance by using labeled data together with a large amount of unlabeled data.

The methods most commonly used include EM with generative mixture models, self-training, co-training, transductive support vector machines (TSVMs) and graph-based methods. Generally, the bottleneck of all these approaches lies in overly long training time, which cannot meet the requirements of training on large scale data. So far little work has used ultra-large-scale unlabeled data, because of unacceptable training time.

This article aims at reducing the training time of semi-supervised learning, taking TSVMs as an example. Semi-supervised support vector machines (S3VMs), also called TSVMs [1], are a method for improving the generalization accuracy of SVMs [2] by using unlabeled data. Since stochastic gradient descent (SGD) shows remarkable performance on large-scale problems in supervised learning [3], we introduce a large scale training method for semi-supervised support vector machines based on stochastic gradient descent. Our method reduces training time significantly and achieves better performance.

2 Related Work

S3VM [1] is an extension of the standard support vector machine to unlabeled data. The goal is to find labels for the unlabeled data such that a linear boundary with maximum margin still separates both the original labeled data and the unlabeled data (now labeled). However, finding an exact transductive SVM solution is an NP-hard problem, so major effort has focused on efficient approximation algorithms. Early methods [4] could not deal with more than a few hundred unlabeled points. Joachims [5] proposed a heuristic optimization algorithm, implemented in SVMlight-TSVM, which improves the objective function by iteratively switching the labels of two unlabeled points; however, the large number of iterations needed to reach the minimum makes it intractable for large scale problems in practice. De Bie and Cristianini [6] cast the transductive learning problem as semi-definite programming (SDP). Xu and Schuurmans [7] present a similar multi-class SDP formulation, which yields a multi-class SVM for semi-supervised learning, but the computational cost is still expensive. The TSVM proposed by Chapelle and Zien [8] performs gradient search in the primal space; its overall worst-case complexity is O((L + U)^3), which is still not suitable for large scale data sets. Collobert et al. [9] optimize the hard TSVM objective directly, using an approximate optimization procedure known as the concave-convex procedure (CCCP), which significantly speeds up training. Sindhwani and Keerthi [10] proposed a fast algorithm for linear S3VMs that uses a multiple-switching trick, implemented in SVMlin and suitable for large scale text applications.

Stochastic gradient descent (SGD), also referred to as a stochastic approximation algorithm [11], has been shown to hold great promise for large scale learning. In the stochastic approximation literature, the averaging technique comes with strong theoretical guarantees [12]. Averaged stochastic gradient descent (ASGD) has been shown to obtain a sufficiently good result in a single pass through all the data [13].

3 Semi-supervised SVM with Stochastic Gradient Descent

3.1 Semi-supervised support vector machines

In semi-supervised learning, a training data set can be viewed as a labeled set $\{(x_i, y_i) \mid 1 \le i \le L\}$, with $x \in \mathbb{R}^n$ and $y \in \{1, -1\}$, together with an unlabeled set $\{x_i \mid L+1 \le i \le L+U\}$.

SVMs have a decision function

    f_\theta(x) = \omega \cdot \Phi(x) + b        (1)

where $\theta = (\omega, b)$ are the parameters of the model and $\Phi(\cdot)$ is the chosen feature map.

The semi-supervised SVM (S3VM), also called the TSVM, was first introduced by Vapnik [1] and has been implemented by different algorithms. S3VM is based on the cluster assumption: examples in the same cluster should have the same class label, which is also the key to other successful semi-supervised learning methods. The assumption implies that classifiers should avoid putting their decision boundaries through high density regions [8]. S3VM achieves this goal by making the margin (distance) between the examples and the decision boundary of the classifier as large as possible. The idea is to find an SVM separating the training set under constraints which force the unlabeled examples to lie as far as possible from the margin. This is encoded by the following optimization problem:

    \min_{\omega, b} \; \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{L} \xi_i + C^* \sum_{i=L+1}^{L+U} \xi_i

subject to

    y_i f_\theta(x_i) \ge 1 - \xi_i,    i = 1, \dots, L
    |f_\theta(x_i)| \ge 1 - \xi_i,      i = L+1, \dots, L+U
    \xi_i \ge 0,                        i = 1, \dots, L+U

Let

    l(y_i f_\theta(x_i)) = [1 - y_i f_\theta(x_i)]_+ = \xi_i,    \xi_i \ge 0,    i = 1, \dots, L

and

    l^*(|f_\theta(x_i)|) = [1 - |f_\theta(x_i)|]_+ = \xi_i,    \xi_i \ge 0,    i = L+1, \dots, L+U.

Then the problem above can be shown to be equivalent to

    \min_{\omega, b} \; \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{L} l(y_i f_\theta(x_i)) + C^* \sum_{i=L+1}^{L+U} l^*(|f_\theta(x_i)|)        (2)

where $l(\cdot)$ and $l^*(\cdot)$ are the loss functions of the S3VM. Following Joachims [5], they are typically chosen to be the hinge loss (3) and the symmetric hinge loss (4), respectively:

    H_1(t) = \max(0, 1 - t)        (3)
    H_2(t) = \max(0, 1 - |t|)      (4)

Chapelle and Zien [8] proposed a smooth version of the hinge loss (5) for unlabeled data, and the Ramp Loss (6) was later used for unlabeled examples by Collobert et al. [9], showing good performance:

    S(t) = \exp(-3t^2)                          (5)
    R_s(t) = \min(1 - s, \max(0, 1 - t))        (6)

Here we also use the Ramp Loss for the unlabeled data and choose s = 0.3, the same value as Collobert et al. [9]. For the labeled examples, however, we use the Logistic Loss (7), which is well suited to large sparse data:

    Logloss(t) = \log(1 + \exp(-t))        (7)
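To make the choice of losses concrete, the following is a minimal sketch (our own illustration, not the authors' code) of how the logistic loss (7), the ramp loss (6) and the objective obtained by substituting them into (2) could be evaluated. Dense NumPy arrays and all helper names are assumptions made for readability.

```python
import numpy as np

def logistic_loss(t):
    """Logistic loss (7): log(1 + exp(-t)), computed stably via logaddexp."""
    return np.logaddexp(0.0, -t)

def ramp_loss(t, s=0.3):
    """Ramp loss (6): R_s(t) = min(1 - s, max(0, 1 - t)); s = 0.3 follows Sec. 3.1."""
    return np.minimum(1.0 - s, np.maximum(0.0, 1.0 - t))

def s3vm_objective(w, b, X_lab, y_lab, X_unlab, C, C_star, s=0.3):
    """Objective (2) with logistic loss on labeled points and ramp loss on |f| for unlabeled points."""
    f_lab = X_lab @ w + b
    f_unlab = X_unlab @ w + b
    return (0.5 * np.dot(w, w)
            + C * np.sum(logistic_loss(y_lab * f_lab))
            + C_star * np.sum(ramp_loss(np.abs(f_unlab), s)))
```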

Now we can rewrite (2) as

    \min_{\omega, b} \; \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{L} Logloss(y_i f_\theta(x_i)) + C^* \sum_{i=L+1}^{L+U} R_s(|f_\theta(x_i)|)        (8)

This is the minimization problem we consider in the rest of the paper.

3.2 Stochastic gradient descent algorithms

Stochastic gradient descent (SGD), also referred to as a stochastic approximation algorithm [11], has been applied extensively to many machine learning models, such as support vector machines and neural networks. SGD updates the weight vector $\omega$ in the online setting. The standard SGD algorithm is as follows.

(1) Initialize $\omega_0$.
(2) For t = 1, 2, ...
    a. Draw $z_t = (x_t, y_t)$ randomly from D.
    b. $\omega_{t+1} = \omega_t - \eta_t \nabla_\omega Q(z_t, \omega_t)$.

Stochastic gradient descent algorithms are particularly suitable for large scale applications, where the number of data points and the problem dimensionality are both large; large scale experiments with stochastic gradient descent have achieved good performance [3]. Additionally, near-optimal prediction performance can be reached after only a small number of passes over the training data.

In order to accelerate the convergence of SGD, averaged stochastic gradient descent (ASGD) was proposed by Polyak and Juditsky [12]. ASGD performs the normal stochastic gradient update of $\omega_t$, just like standard SGD, and recursively computes the average $\bar{\omega}_t$ as in (9):

    \omega_{t+1} = \omega_t - \eta_t \nabla_\omega Q(z_t, \omega_t)
    \bar{\omega}_{t+1} = \frac{t}{t+1} \bar{\omega}_t + \frac{1}{t+1} \omega_{t+1}        (9)

A careful choice of the gains $\eta_t$, as in (10), helps achieve the promised performance [13]:

    \eta_t = \eta_0 (1 + \lambda \eta_0 t)^{-0.75}        (10)

The gain $\eta_0$ is set by observing the performance on a subset of the labeled samples. Polyak and Juditsky [12] proved that, given enough training samples, ASGD can obtain parameters as good as the empirical optimum in just one epoch over the data. Moreover, ASGD is extremely easy to implement compared to second-order SGD. Since the S3VM is designed to use labeled data together with large scale unlabeled data, SGD algorithms are well suited to it. Because of the advantages above, we choose ASGD as the optimization approach and follow the concrete implementation of Xu [13].
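As an illustration, here is a minimal NumPy sketch (our own, not the authors' implementation) of the ASGD update (9) with the gain schedule (10), for a generic per-example gradient function grad(w, x, y) that the caller supplies.

```python
import numpy as np

def asgd(grad, X, y, n_epochs=1, eta0=0.1, lam=1e-4):
    """Averaged SGD: run standard SGD updates and keep a running average of the iterates.

    grad(w, x, y) is assumed to return the per-example (sub)gradient;
    the gain follows eta_t = eta0 * (1 + lam * eta0 * t)**(-0.75), Eq. (10).
    """
    n, d = X.shape
    w = np.zeros(d)       # current iterate, omega_t
    w_avg = np.zeros(d)   # averaged iterate, omega_bar_t (the model that is returned)
    t = 0
    for _ in range(n_epochs):
        for i in np.random.permutation(n):                   # draw examples at random
            eta = eta0 * (1.0 + lam * eta0 * t) ** -0.75     # gain schedule (10)
            w = w - eta * grad(w, X[i], y[i])                # SGD update
            w_avg = (t * w_avg + w) / (t + 1)                # recursive average (9)
            t += 1
    return w_avg
```

In line with the text, the constant eta0 would be chosen by watching performance on a small subset of the labeled samples before the full run.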

3.3 ASGD for S3VM

To use unlabeled examples, the well-known SVMlight [5] assigns positive and negative labels to the unlabeled data in the same proportion as found in the labeled data and then switches labels heuristically, while CCCP-TSVM [9] counts each unlabeled point as both positive and negative. In this paper we propose a self-training-like method, as follows.

(1) Use ASGD to train a model on the labeled data with the logistic loss.
(2) Use this model to label the unlabeled data.
(3) Mix the labeled and (now labeled) unlabeled data, then shuffle.
(4) For each epoch (one pass through all the data):
    a. Use ASGD to train a model on the labeled and unlabeled data, with the logistic loss and the ramp loss respectively.
    b. Use this model to relabel the unlabeled data.

In our experiments we set C = 1/L and C^* = 1/U; \lambda varies across tasks and is easy to determine empirically.

The main point we want to emphasize is the advantage of our method over existing approaches in terms of training time. Our method trains S3VMs in linear time: the complexity of ASGD-S3VM is O(T(L + U)), where T is the number of iterations and L and U are the numbers of labeled and unlabeled examples, respectively. In our experience it reaches the minimum within five iterations, so our method can deal with very large scale data.
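A compact, self-contained sketch of this procedure is given below. It is our own illustration under simplifying assumptions, not the authors' released code: the bias term and the regularizer 0.5||\omega||^2 are omitted, dense NumPy arrays and ±1 labels are assumed, and all helper names are ours.

```python
import numpy as np

def logistic_grad(w, x, y):
    """Gradient of Logloss(y * w.x) = log(1 + exp(-y * w.x)) with respect to w."""
    return -y * x / (1.0 + np.exp(y * np.dot(w, x)))

def ramp_grad(w, x, s=0.3):
    """Subgradient of R_s(|w.x|); non-zero only on the sloped part of the ramp."""
    f = np.dot(w, x)
    if s < abs(f) < 1.0:
        return -np.sign(f) * x
    return np.zeros_like(w)

def asgd_s3vm(X_lab, y_lab, X_unlab, n_epochs=5, eta0=0.1, lam=1e-4):
    L, U, d = len(X_lab), len(X_unlab), X_lab.shape[1]
    C, C_star = 1.0 / L, 1.0 / U          # loss weights as in Section 3.3
    w, w_avg, t = np.zeros(d), np.zeros(d), 0

    def step(g):                          # one ASGD update, Eqs. (9) and (10)
        nonlocal w, w_avg, t
        eta = eta0 * (1.0 + lam * eta0 * t) ** -0.75
        w = w - eta * g
        w_avg = (t * w_avg + w) / (t + 1)
        t += 1

    # Steps (1)-(2): train on the labeled data only, then label the unlabeled data.
    for i in np.random.permutation(L):
        step(C * logistic_grad(w, X_lab[i], y_lab[i]))
    y_unlab = np.sign(X_unlab @ w_avg)

    # Steps (3)-(4): mix and shuffle, retrain with both losses, relabel each epoch.
    X = np.vstack([X_lab, X_unlab])
    is_labeled = np.arange(L + U) < L
    for _ in range(n_epochs):
        y = np.concatenate([y_lab, y_unlab])
        for i in np.random.permutation(L + U):
            if is_labeled[i]:
                step(C * logistic_grad(w, X[i], y[i]))
            else:
                step(C_star * ramp_grad(w, X[i]))
        y_unlab = np.sign(X[L:] @ w_avg)
    return w_avg
```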

4 Experiment and Analysis

This section reports experimental results on two typical tasks, both of which are large scale. We take SVMlight-TSVM, CCCP-TSVM and the state-of-the-art SVMlin as baselines, and we also compare our method with a standard SVM trained on the labeled examples only. For each task we use the corresponding evaluation measures. All experiments were run on a machine with six 64-bit Xeon(R) processors (1.87 GHz) and 256 GB of memory.

4.1 RCV1 experiment

The first experiment is a text categorization task whose data set comes from Reuters, prepared by Lewis et al. [14], classifying the CCAT (Corporate/Industrial) category against the non-CCAT categories. The features are constructed using the bag-of-words technique, weighted with a TF-IDF scheme and normalized to 1. The partition of the data is the same as in [2]. As the other S3VMs are intractable for such a large data set, we only perform experiments using 1000 labeled examples. The evaluation measure is simply accuracy.

We varied the number of unlabeled examples U and report the test accuracy for each choice of U. The accuracy comparison against the different S3VMs is shown in Table 1. In general, the accuracies of all methods improve as the number of unlabeled examples increases, and all the S3VMs obtain higher accuracy than the standard SVM, which shows that unlabeled data improve the results on this problem. From Table 1 we can also see that SVMlight-TSVM performs best on this task, while ASGD-S3VM obtains results comparable to the best results of the other S3VM algorithms.

Table 1: Accuracy (%) comparison of SVM, SVMlight-TSVM, SVMlin, CCCP-TSVM and ASGD-S3VM, with 1k labeled examples and 1k, 2k, 5k or 10k unlabeled examples.

Table 2 shows the training time of all the S3VM algorithms with respect to the number of unlabeled examples. Using 1000 labeled points and … unlabeled points, ASGD-S3VM is approximately … times faster than SVMlight-TSVM, 5230 times faster than CCCP-TSVM and 100 times faster than SVMlin. Moreover, the training time of ASGD-S3VM grows only slightly as the number of unlabeled examples increases, whereas that of the other three algorithms grows substantially.

Table 2: Training time (s) comparison of SVMlight-TSVM, SVMlin, CCCP-TSVM and ASGD-S3VM, with 1k labeled examples and 1k, 2k, 5k or 10k unlabeled examples.
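For reference, bag-of-words features of the kind described at the start of this subsection (TF-IDF weighted and normalized to unit length) can be produced, for example, with scikit-learn; this is only an illustrative sketch of such a feature pipeline, not necessarily the preprocessing used by the authors.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["first news story ...", "second news story ..."]   # placeholder documents

# Bag-of-words counts with TF-IDF weighting; norm="l2" scales each row to unit length.
vectorizer = TfidfVectorizer(norm="l2")
X = vectorizer.fit_transform(docs)   # sparse matrix, one row per document
print(X.shape)
```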

4.2 NLP&CC 2012 evaluation experiment

The second experiment is a sentiment analysis task whose data set comes from the NLP&CC 2012 Evaluation; the task is subjectivity identification on Chinese microblogs. To compare with the best result of the NLP&CC 2012 Evaluation we choose the same evaluation criterion, the F-measure, defined in (13), with precision and recall given by (11) and (12):

    Precision = (system correct) / (system proposed)                    (11)
    Recall = (system correct) / (gold)                                   (12)
    F-measure = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}        (13)

The size of our labeled set is the same as that used by the first-place system of the NLP&CC 2012 Evaluation [15]. Each Chinese microblog is represented by basic features including part of speech, TF-IDF, emoticons and HowNet. All features are normalized to 1.

We ran all the S3VM algorithms and reached conclusions similar to those of the previous task; for brevity we only report the results of ASGD-S3VM and SVMlight-TSVM, since SVMlight-TSVM outperforms CCCP-TSVM and SVMlin. From Table 3 we again see that unlabeled data greatly improve performance: accuracy and F-measure rise as the number of unlabeled examples increases, although with ever more unlabeled data both criteria grow only slightly. Table 4 shows that ASGD-S3VM outperforms SVMlight-TSVM in training time by orders of magnitude. Table 5 shows that our method outperforms the best result of the NLP&CC Evaluation in F-measure and recall. It is worth noting that their method uses additional, more complex features such as subjective words and opinion words, which clearly help achieve a good result, whereas we use only the simple basic features. This suggests that our semi-supervised method effectively learns latent knowledge from the unlabeled data.

Table 3: Accuracy (%) and F1 comparison of the standard SVM, SVMlight-TSVM and ASGD-S3VM, with 2k labeled examples and 5k, 10k or 20k unlabeled examples.

Table 4: Training time (s) comparison of SVMlight-TSVM and ASGD-S3VM, with 2k labeled examples and 5k, 10k or 20k unlabeled examples.

Table 5: Precision, recall and F1 of our best result (ASGD-S3VM) versus the first-place system of the evaluation.
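For completeness, a tiny helper that computes the measures (11)-(13) from counts; the argument names and the example numbers are ours, chosen only for illustration.

```python
def precision_recall_f1(n_proposed, n_correct, n_gold):
    """Precision, recall and F-measure as in Eqs. (11)-(13)."""
    precision = n_correct / n_proposed
    recall = n_correct / n_gold
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Example: 80 of 100 proposed subjective microblogs are correct, out of 120 gold ones.
print(precision_recall_f1(100, 80, 120))  # (0.8, 0.666..., 0.727...)
```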

5 Conclusion

In this article we have proposed an efficient method for large scale linear semi-supervised support vector machines based on averaged stochastic gradient descent. Our method significantly improves the training speed of S3VM over existing approaches such as SVMlight-TSVM, CCCP-TSVM and SVMlin. It is also more practical in real applications, where labeled data are scarce and plenty of unlabeled data are easily available. As future work, we will apply this approach to the non-linear setting.

References

[1] V. Vapnik, The Nature of Statistical Learning Theory, second edition, Springer.
[2] B. E. Boser, I. M. Guyon, V. N. Vapnik, A training algorithm for optimal margin classifiers, in: Proc. of the Fifth Annual Workshop on Computational Learning Theory, 1992.
[3] L. Bottou, Large-scale machine learning with stochastic gradient descent, in: Proc. COMPSTAT, 2010.
[4] K. Bennett, A. Demiriz, Semi-supervised support vector machines, Advances in Neural Information Processing Systems, 1999.
[5] T. Joachims, Transductive inference for text classification using support vector machines, in: Proc. of the International Conference on Machine Learning, 1999.
[6] T. De Bie, N. Cristianini, Convex methods for transduction, Advances in Neural Information Processing Systems 16 (2004) 73.
[7] L. Xu, D. Schuurmans, Unsupervised and semi-supervised multi-class support vector machines, in: Proc. of the National Conference on Artificial Intelligence, 2005.
[8] O. Chapelle, A. Zien, Semi-supervised classification by low density separation, in: Proc. of the Tenth International Workshop on Artificial Intelligence and Statistics, 2005.
[9] R. Collobert, F. Sinz, J. Weston, et al., Large scale transductive SVMs, The Journal of Machine Learning Research 7 (2006).
[10] V. Sindhwani, S. S. Keerthi, Large scale semi-supervised linear SVMs, in: Proc. of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2006.
[11] H. Kushner, G. Yin, Stochastic Approximation Algorithms and Applications, Springer-Verlag, New York.
[12] B. T. Polyak, A. B. Juditsky, Acceleration of stochastic approximation by averaging, SIAM Journal on Control and Optimization (1992).
[13] W. Xu, Towards optimal one pass large scale learning with averaged stochastic gradient descent, arXiv preprint.
[14] D. D. Lewis, et al., RCV1: A new benchmark collection for text categorization research, The Journal of Machine Learning Research 5 (2004).
[15] Xiao Zhou, Zhenyu Zhou, Fang Li, LTLAB at the Chinese Microblog Sentiment Analysis Track, in: Proc. of the 1st NLP&CC Evaluation, 2012.
