Online Manifold Regularization: A New Learning Setting and Empirical Study
Online Manifold Regularization: A New Learning Setting and Empirical Study

Andrew B. Goldberg (1), Ming Li (2), Xiaojin Zhu (1)

(1) Computer Sciences, University of Wisconsin-Madison, USA. {goldberg,jerryzhu}@cs.wisc.edu
(2) National Key Laboratory for Novel Software Technology, Nanjing University, China. lim@lamda.nju.edu.cn
Life-long learning

Consider a stream of examples x_1, x_2, ..., of which only a few are labeled (e.g., y_1 = 0 and y_1000 = 1 are revealed, while most labels are n/a).

Unlike standard supervised learning:
- examples arrive sequentially, and we cannot even store them all
- most examples are unlabeled
- there is no iid assumption; p(x, y) can change over time
This is how children learn, too

(The same stream as above: a lifelong sequence of mostly unlabeled examples with occasional labels.)
New paradigm: online semi-supervised learning

Main contribution: merging two learning settings
1. Online: learn sequentially from non-iid data, but fully labeled
2. Semi-supervised: learn from an iid batch that is (mostly) unlabeled

The protocol:
1. At time t, an adversary picks x_t \in X, y_t \in Y (not necessarily iid) and shows x_t
2. The learner has a classifier f_t : X \to R and predicts f_t(x_t)
3. With small probability, the adversary reveals y_t; otherwise it abstains (x_t is unlabeled)
4. The learner updates to f_{t+1} based on x_t and y_t (if given). Repeat.

Many batch SSL algorithms exist; we focus on manifold regularization.
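To make the protocol concrete, here is a minimal sketch of the interaction loop, assuming a `stream` of (x_t, y_t) pairs and a `learner` object with `predict`/`update` methods (these names and the `reveal_prob` parameter are ours, not from the paper):

```python
import random

def run_protocol(stream, learner, reveal_prob=0.02, seed=0):
    """Online semi-supervised protocol: the learner sees every x_t but only
    occasionally the label y_t. Yields the prediction f_t(x_t) at each step."""
    rng = random.Random(seed)
    for x_t, y_t in stream:
        yield learner.predict(x_t)                    # step 2: predict with f_t
        revealed = y_t if rng.random() < reveal_prob else None  # step 3: adversary usually abstains
        learner.update(x_t, revealed)                 # step 4: f_t -> f_{t+1}
```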
Review: batch manifold regularization

A form of graph-based semi-supervised learning [Belkin et al., JMLR 2006]:
- Build a graph on x_1, ..., x_T
- Edge weights w_{st} encode the similarity between x_s and x_t, e.g., kNN
- Assumption: similar examples have similar labels

Manifold regularization minimizes the risk

J(f) = \frac{1}{l} \sum_{t=1}^{T} \delta(y_t) c(f(x_t), y_t) + \frac{\lambda_1}{2} \|f\|_K^2 + \frac{\lambda_2}{2T} \sum_{s,t=1}^{T} (f(x_s) - f(x_t))^2 w_{st}

where c(f(x), y) is a convex loss function (e.g., the hinge loss), l is the number of labeled examples, and \delta(y_t) = 1 if y_t is revealed and 0 otherwise. The solution is f^* = \arg\min_f J(f). This generalizes graph mincut and label propagation.
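As a concrete reading of the objective, here is a sketch that evaluates J(f) for a kernel expansion f = \sum_i \alpha_i K(x_i, \cdot), assuming the hinge loss with labels in {-1, +1} and NaN marking unlabeled points (these encodings are our assumptions):

```python
import numpy as np

def batch_mr_risk(K, W, alpha, y, lam1, lam2):
    """Evaluate the batch MR risk J(f) for f = sum_i alpha_i K(x_i, .).
    K: (T, T) kernel matrix; W: (T, T) graph edge weights;
    y: length-T float array, +1/-1 for labeled points, np.nan for unlabeled.
    Uses the hinge loss; assumes at least one labeled example."""
    T = len(y)
    f = K @ alpha                              # f(x_t) for t = 1..T
    labeled = ~np.isnan(y)
    l = labeled.sum()                          # number of labeled examples
    loss = np.maximum(0.0, 1.0 - y[labeled] * f[labeled]).sum() / l
    rkhs = 0.5 * lam1 * alpha @ K @ alpha      # (lambda_1 / 2) ||f||_K^2
    diff2 = (f[:, None] - f[None, :]) ** 2
    manifold = lam2 / (2 * T) * np.sum(W * diff2)
    return loss + rkhs + manifold
```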
From batch to online

The batch risk is the average of the instantaneous risks: J(f) = \frac{1}{T} \sum_{t=1}^{T} J_t(f).

Batch risk:

J(f) = \frac{1}{l} \sum_{t=1}^{T} \delta(y_t) c(f(x_t), y_t) + \frac{\lambda_1}{2} \|f\|_K^2 + \frac{\lambda_2}{2T} \sum_{s,t=1}^{T} (f(x_s) - f(x_t))^2 w_{st}

Instantaneous risk:

J_t(f) = \frac{T}{l} \delta(y_t) c(f(x_t), y_t) + \frac{\lambda_1}{2} \|f\|_K^2 + \lambda_2 \sum_{i=1}^{t} (f(x_i) - f(x_t))^2 w_{it}

(includes the graph edges between x_t and all previous examples)
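The instantaneous risk can be evaluated the same way; note the T/l rescaling of the (occasional) labeled loss and the sum over edges from x_t to all previous examples (same hinge-loss and NaN-label assumptions as the batch sketch above):

```python
import numpy as np

def instantaneous_risk(K, W, alpha, y, t, lam1, lam2, T, l):
    """Evaluate J_t(f) at 0-indexed step t: the (rescaled) loss on x_t, the
    RKHS norm, and the graph edges between x_t and all previous examples."""
    f = K @ alpha
    loss = 0.0
    if not np.isnan(y[t]):                                # delta(y_t): label revealed?
        loss = (T / l) * max(0.0, 1.0 - y[t] * f[t])      # hinge loss (assumption)
    rkhs = 0.5 * lam1 * alpha @ K @ alpha
    edges = lam2 * np.sum(W[: t + 1, t] * (f[: t + 1] - f[t]) ** 2)
    return loss + rkhs + edges
```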
Online convex programming

Instead of minimizing the convex J(f), reduce the convex J_t(f) at each step t:

f_{t+1} = f_t - \eta_t \left. \frac{\partial J_t(f)}{\partial f} \right|_{f_t}

This enjoys a remarkable no-regret guarantee against the adversary [Zinkevich, ICML 2003], where

\text{regret} \equiv \frac{1}{T} \sum_{t=1}^{T} J_t(f_t) - J(f^*)

No regret: \limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} J_t(f_t) - J(f^*) \le 0.

Note: accuracy can be arbitrarily bad if the adversary flips the target often; if so, no batch learner in hindsight can do well either.

If there is no adversary (iid data), the average classifier \bar{f} = \frac{1}{T} \sum_{t=1}^{T} f_t is good: J(\bar{f}) \to J(f^*).
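A generic sketch of the online convex programming update, with the Zinkevich-style step size \eta_t = \eta_0 / \sqrt{t} (the specific schedule and function names are our assumptions):

```python
import numpy as np

def online_gradient_descent(grads, f0, eta0=0.1):
    """Online convex programming: f_{t+1} = f_t - eta_t * grad J_t(f_t),
    with step size eta_t = eta0 / sqrt(t) as in Zinkevich (2003).
    grads[t-1](f) must return the gradient of J_t at f.
    Returns the iterates f_1..f_T and their average (the averaged classifier)."""
    f = np.asarray(f0, dtype=float)
    iterates = []
    for t, grad in enumerate(grads, start=1):
        iterates.append(f.copy())
        f = f - (eta0 / np.sqrt(t)) * grad(f)
    return iterates, np.mean(iterates, axis=0)
```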
Sparse approximation 1: buffering

The exact algorithm is impractical:
- Space O(T): it stores all previous examples
- Time O(T^2): each new example is compared to all previous ones
- In reality, T grows without bound

Approximation: keep a buffer of size \tau
- Approximate representers: f_t = \sum_{i=t-\tau}^{t-1} \alpha_i^{(t)} K(x_i, \cdot)
- Approximate instantaneous risk:

J_t(f) = \frac{T}{l} \delta(y_t) c(f(x_t), y_t) + \frac{\lambda_1}{2} \|f\|_K^2 + \lambda_2 \frac{t}{\tau} \sum_{i=t-\tau}^{t-1} (f(x_i) - f(x_t))^2 w_{it}

- Dynamic graph on the examples in the buffer
Sparse approximation 2: random projection tree [Dasgupta and Freund, STOC 2008]

- Discretize the data manifold by online clustering
- When a cluster accumulates enough examples, split it along a random hyperplane (a simplified sketch of the split rule follows)
- Extends the k-d tree
- The approximation uses a graph over clusters (each represented by a Gaussian)
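A minimal sketch of the random-hyperplane split, simplified from the RP-tree rule (Dasgupta and Freund also randomly perturb the split threshold around the median, which we omit here):

```python
import numpy as np

def rp_split(points, rng):
    """Split a cluster's points along a random hyperplane: project onto a
    random unit direction and split at the median projection."""
    direction = rng.standard_normal(points.shape[1])
    direction /= np.linalg.norm(direction)       # random unit direction
    proj = points @ direction
    threshold = np.median(proj)                  # split at the median
    return points[proj <= threshold], points[proj > threshold]
```

This would be called as `rp_split(points, np.random.default_rng(0))` whenever a cluster exceeds its capacity.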
Experimental Results

We compare batch manifold regularization (MR) with several online variants (full graph, buffering, random projection tree) on:
- Runtime
- Risk (to assess the no-regret guarantee)
- Generalization error of the average classifier (assumes iid data)
Experiment: runtime

Buffering and RPtree scale linearly, enabling life-long learning.

[Figure: runtime in seconds vs. T on Spirals and MNIST 0 vs. 1, for Batch MR, Online MR, Online MR (buffer), and Online RPtree.]
Experiment: risk

The online MR average instantaneous risk J_{air}(T) \equiv \frac{1}{T} \sum_{t=1}^{T} J_t(f_t) approaches the batch risk J(f^*) as T increases.

[Figure: risk vs. T, comparing the batch risk J(f^*) with J_{air}(T) for Online MR, Online MR (buffer), and Online RPtree.]
Experiment: generalization error of \bar{f} (iid case)

A variation of buffering that preferentially keeps labeled examples in the buffer is as good as batch MR.

[Figure: generalization error rate vs. T on (a) Spirals, (b) Face, (c) MNIST 0 vs. 1, and (d) MNIST 1 vs. 2, for Batch MR, Online MR, Online MR (buffer), Online MR (buffer U), Online RPtree, and (on Spirals) Online RPtree (PPK).]
Conclusions

- Online semi-supervised learning framework
- Sparse approximations: buffering and RPtree
- Future work: new bounds, new algorithms (e.g., S3VM)

Thank you! Any questions?
Experiment: adversarial concept drift

Slowly rotating spirals, with both p(x) and p(y|x) changing over time. We compare the batch classifier f^* against the online MR buffering classifier f_T, where the test set is drawn from the current p(x, y) at time T.

[Figure: generalization error rate vs. T for Batch MR and Online MR (buffer).]
Kernelized algorithm

Init: t = 1, f_1 = 0
Repeat:

f_t(\cdot) = \sum_{i=1}^{t-1} \alpha_i^{(t)} K(x_i, \cdot)

1. Receive x_t, predict f_t(x_t) = \sum_{i=1}^{t-1} \alpha_i^{(t)} K(x_i, x_t)
2. Occasionally receive y_t
3. Update f_t to f_{t+1} by adjusting the coefficients:

\alpha_i^{(t+1)} = (1 - \eta_t \lambda_1) \alpha_i^{(t)} - 2 \eta_t \lambda_2 (f_t(x_i) - f_t(x_t)) w_{it}, \quad i < t

\alpha_t^{(t+1)} = 2 \eta_t \lambda_2 \sum_{i=1}^{t} (f_t(x_i) - f_t(x_t)) w_{it} - \eta_t \frac{T}{l} \delta(y_t) c'(f_t(x_t), y_t)

4. Store x_t, let t = t + 1
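A sketch of step 3, the coefficient update, assuming the hinge loss so that c'(f, y) = -y when yf < 1 and 0 otherwise (our choice; the slide leaves c generic). The caller is assumed to supply f_t(x_i) for all stored points and the weights w_{it}:

```python
import numpy as np

def mr_coefficient_update(alpha, f_vals, w_col, y_t, eta, lam1, lam2, T, l):
    """Step 3 of the kernelized algorithm.
    alpha:  (t-1,) coefficients of f_t = sum_i alpha_i K(x_i, .)
    f_vals: (t,)   values f_t(x_1), ..., f_t(x_t); f_vals[-1] = f_t(x_t)
    w_col:  (t,)   edge weights w_it between each x_i and x_t
    y_t:    label in {-1, +1}, or None if the adversary abstained
    Returns the (t,) coefficients of f_{t+1}."""
    diff = f_vals - f_vals[-1]                         # f_t(x_i) - f_t(x_t)
    new_alpha = np.empty(len(f_vals))
    new_alpha[:-1] = (1 - eta * lam1) * alpha - 2 * eta * lam2 * diff[:-1] * w_col[:-1]
    # hinge-loss subgradient c'(f, y): -y if y * f < 1, else 0 (assumption)
    c_prime = (-y_t if y_t * f_vals[-1] < 1 else 0.0) if y_t is not None else 0.0
    new_alpha[-1] = 2 * eta * lam2 * np.sum(diff * w_col) - eta * (T / l) * c_prime
    return new_alpha
```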
Sparse approximation 1: buffer update

At each step, start with the current \tau representers:

f_t = \sum_{i=t-\tau}^{t-1} \alpha_i^{(t)} K(x_i, \cdot) + 0 \cdot K(x_t, \cdot)

Gradient descent on the \tau + 1 terms gives

f' = \sum_{i=t-\tau}^{t} \alpha'_i K(x_i, \cdot)

Reduce back to \tau representers

f_{t+1} = \sum_{i=t-\tau+1}^{t} \alpha_i^{(t+1)} K(x_i, \cdot)

by minimizing \|f' - f_{t+1}\|^2 with kernel matching pursuit.
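The reduction step minimizes the RKHS distance \|f' - f_{t+1}\|^2 over coefficients supported on the kept representers. The slide uses kernel matching pursuit; the sketch below instead solves that projection exactly via the normal equations K_{kk} \beta = K_{k\cdot} \alpha', which is a simplified stand-in, not the paper's method:

```python
import numpy as np

def reduce_to_buffer(K, alpha_full, keep):
    """Replace f' = sum_j alpha_full[j] K(x_j, .) (tau + 1 representers) by
    its RKHS projection onto the span of the kept representers, i.e. the
    exact minimizer of ||f' - f_{t+1}||_K^2.
    K: (tau+1, tau+1) kernel matrix over the buffer plus x_t;
    keep: indices of the tau points retained (the evicted one is dropped)."""
    K_kk = K[np.ix_(keep, keep)]                  # Gram matrix of kept points
    rhs = K[keep, :] @ alpha_full                 # <K(x_i, .), f'> for kept i
    beta = np.linalg.solve(K_kk + 1e-10 * np.eye(len(keep)), rhs)  # jitter for stability
    return beta
```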
Sparse approximation 2: random projection tree

We use the clusters N(\mu_i, \Sigma_i) as representers:

f_t = \sum_{i=1}^{s} \beta_i^{(t)} K(\mu_i, \cdot)

The cluster-graph edge weight between a cluster \mu_i and an example x_t is

w_{\mu_i t} = E_{x \sim N(\mu_i, \Sigma_i)}\left[\exp\left(-\frac{\|x - x_t\|^2}{2\sigma^2}\right)\right] = |\Sigma_i|^{-1/2} |\bar{\Sigma}|^{1/2} \exp\left(-\frac{1}{2}\left(\mu_i^\top \Sigma_i^{-1} \mu_i + x_t^\top \Sigma_0^{-1} x_t - \bar{\mu}^\top \bar{\Sigma}^{-1} \bar{\mu}\right)\right)

where \Sigma_0 = \sigma^2 I, \bar{\Sigma} = (\Sigma_i^{-1} + \Sigma_0^{-1})^{-1}, and \bar{\mu} = \bar{\Sigma}(\Sigma_i^{-1} \mu_i + \Sigma_0^{-1} x_t).

A further approximation is

w_{\mu_i t} = e^{-\|\mu_i - x_t\|^2 / 2\sigma^2}

Update f (i.e., \beta) and the RPtree, then discard x_t.
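For reference, here is a small sketch that evaluates this edge weight. It uses the algebraically equivalent closed form \sigma^d |\Sigma_i + \sigma^2 I|^{-1/2} \exp(-\frac{1}{2}(\mu_i - x_t)^\top (\Sigma_i + \sigma^2 I)^{-1} (\mu_i - x_t)) of the Gaussian expectation above, together with the cheaper mean-only approximation (function names are ours):

```python
import numpy as np

def cluster_edge_weight(mu, Sigma, x_t, sigma2):
    """Exact E_{x ~ N(mu, Sigma)}[exp(-||x - x_t||^2 / (2 sigma^2))], via the
    closed form sigma^d |Sigma + sigma^2 I|^{-1/2}
    * exp(-0.5 (mu - x_t)' (Sigma + sigma^2 I)^{-1} (mu - x_t))."""
    d = len(mu)
    S = Sigma + sigma2 * np.eye(d)
    diff = mu - x_t
    quad = diff @ np.linalg.solve(S, diff)
    return sigma2 ** (d / 2) * np.exp(-0.5 * quad) / np.sqrt(np.linalg.det(S))

def cluster_edge_weight_approx(mu, x_t, sigma2):
    """The further approximation: treat the cluster as a point at its mean."""
    return np.exp(-np.sum((mu - x_t) ** 2) / (2 * sigma2))
```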