Efficient Complex Output Prediction
1 Efficient Complex Output Prediction
Florence d'Alché-Buc. Joint work with Romain Brault, Alex Lambert, Maxime Sangnier.
October 12, 2017
LTCI, Télécom ParisTech, Institut Mines-Télécom, Université Paris-Saclay
2 Outline
Motivation and Goals
Operator-valued Kernel Regression
Scaling up Operator-valued Kernel Regression
Conclusion
3 Classic Regression
Using training data $\{(x_i, y_i)\}_{i=1}^{N}$, build a scalar-valued function $f$ that predicts an output $y \in \mathbb{R}$, given some input $x \in \mathcal{X}$.
Complex output regression: when $\mathcal{Y} = \mathbb{R}^p$, a set of structured objects, or a functional space.
4 Multiple Output Regression
When $\mathcal{Y} = \mathbb{R}^p$:
Image understanding: predict the name of an object in an image. $\mathcal{X}$: image representation space, $\mathcal{Y} = \mathbb{R}^p$: semantic space.
Joint quantile regression in $\mathcal{Y} = \mathbb{R}^p$ as multitask learning for the desired quantile levels $\tau_1, \ldots, \tau_p$ (Sangnier et al. 2016).
5 Multiple Output Regression
When $\mathcal{Y}$ is a set of structured objects:
Identification of metabolites from mass spectra. $\mathcal{X}$: mass spectra space, $\mathcal{Y}$: set of metabolites.
When $\mathcal{Y} = \mathcal{F}$ is a space of functions:
Functional quantile regression, with $\mathcal{X} = \mathbb{R}^d$ and $\mathcal{Y} = \mathcal{H}$ a Reproducing Kernel Hilbert Space (Brault 2017).
6 Learning functions with values in a Hilbert space $\mathcal{Y}$
Operator-valued kernels and vector-valued Reproducing Kernel Hilbert Spaces.
Nonparametric learning, various loss functions for data fitting, various kinds of regularization.
With theoretical guarantees, both statistical and optimization-related, that also lead to efficient learning algorithms.
7 Outline
Motivation and Goals
Operator-valued Kernel Regression
Scaling up Operator-valued Kernel Regression
Conclusion
8 Operator-valued Kernels
Natural extension of scalar kernels to vector-valued functions; allows coupling between outputs.
Let $\mathcal{X}$ be some input space and $\mathcal{Y}$ a Hilbert space.
Domain: scalar $k(x, z) \in \mathbb{R}$; operator-valued $K(x, z) \in \mathcal{L}(\mathcal{Y})$.
Symmetry: scalar $k(x, z) = k(z, x)$; operator-valued $K(x, z) = K(z, x)^*$.
Positive definiteness: scalar $\forall c_i \in \mathbb{R}$, $\sum_{i,j=1}^{N} c_i c_j\, k(x_i, x_j) \ge 0$; operator-valued $\forall c_i \in \mathcal{Y}$, $\sum_{i,j=1}^{N} \langle c_i, K(x_i, x_j)\, c_j \rangle_{\mathcal{Y}} \ge 0$.
The simplest operator-valued kernel is the decomposable one: $K(x, x') = k(x, x')\, B$, where $B$ is a positive semi-definite $p \times p$ matrix and $k$ is a scalar-valued kernel on $\mathcal{X}$.
$B = I$ recovers the case of $p$ independent scalar-valued kernel machines.
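To make the decomposable case concrete, here is a minimal NumPy sketch (not from the talk; the Gaussian scalar kernel, the toy matrix B, and the function names are illustrative assumptions) that evaluates the block Gram matrix of $K(x, x') = k(x, x')\, B$:

```python
# Illustrative sketch: decomposable operator-valued kernel K(x, z) = k(x, z) * B,
# with k a scalar Gaussian kernel and B a p x p positive semi-definite matrix
# coupling the outputs (B = I would decouple them).
import numpy as np

def gaussian_kernel(X, Z, sigma=1.0):
    """Scalar Gaussian Gram matrix: entry (i, j) is k(x_i, z_j)."""
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2 * sigma ** 2))

def decomposable_gram(X, Z, B, sigma=1.0):
    """Block Gram matrix of the OVK: block (i, j) is k(x_i, z_j) * B.
    Shape (N * p, M * p), obtained as a Kronecker product."""
    return np.kron(gaussian_kernel(X, Z, sigma), B)

# Toy usage: 5 inputs in R^3, p = 2 coupled outputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
B = np.array([[1.0, 0.5],
              [0.5, 1.0]])
print(decomposable_gram(X, X, B).shape)   # (10, 10)
```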
9 Vector-valued RKHS
Given an OVK $K$: a unique associated vector-valued RKHS $\mathcal{H}_K$; feature maps with $K(x, z) = \Phi(x)^* \Phi(z)$; representer theorems.
Representer theorem (Micchelli and Pontil, 2005). Given a training set $\{(x_1, y_1), \ldots, (x_N, y_N)\} \subset \mathcal{X} \times \mathcal{Y}$, the minimizer
$\hat{f} = \arg\min_{f} \|f\|_{\mathcal{H}_K}^2 + \lambda \sum_{i=1}^{N} \ell(y_i, f(x_i))$
admits an expansion of the form
$\hat{f}(\cdot) = \sum_{i=1}^{N} K(\cdot, x_i)\, c_i$, where $c_i \in \mathcal{Y}$.
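As an illustration of the representer theorem with the squared loss, here is a hedged sketch (not the authors' implementation; it reuses gaussian_kernel and decomposable_gram from the previous sketch, and the regularization constant lam is an assumption): the stacked coefficients of vector-valued kernel ridge regression solve one $(Np) \times (Np)$ linear system.

```python
# Sketch of vector-valued kernel ridge regression with a decomposable kernel.
# Squared loss: the stacked coefficients solve (G + lam * I) c = y_stacked,
# and f(x) = sum_i k(x, x_i) * B @ c_i by the representer theorem.
import numpy as np

def ovk_ridge_fit(X, Y, B, sigma=1.0, lam=1e-2):
    """X: (N, d) inputs, Y: (N, p) outputs. Returns C of shape (N, p), row i = c_i."""
    G = decomposable_gram(X, X, B, sigma)                # (N p, N p) block Gram matrix
    c = np.linalg.solve(G + lam * np.eye(G.shape[0]), Y.ravel())
    return c.reshape(Y.shape)

def ovk_ridge_predict(X_new, X, C, B, sigma=1.0):
    """Evaluate f(x) = sum_i k(x, x_i) * B @ c_i for each row of X_new."""
    return gaussian_kernel(X_new, X, sigma) @ (C @ B.T)  # (M, p)

# Toy usage with the data from the previous sketch.
Y = rng.normal(size=(5, 2))
C = ovk_ridge_fit(X, Y, B)
print(ovk_ridge_predict(X, X, C, B).shape)               # (5, 2)
```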
10 Regression in Vector-valued RKHS: a few examples
Image understanding: predict the object name in an image.
Surrogate loss $\ell_{\text{Fisher}}(y, f(x)) = \|\nabla_\theta \ln p_\theta(y) - f(x)\|^2$, followed by a pre-image step (Djerrab et al. 2017). Decomposable kernels.
Very good results on few-shot learning (Caltech101).

11 Regression in Vector-valued RKHS: a few examples (continued)
Sparse modeling of time series.
Loss: $\epsilon$-insensitive loss, with transformable kernels (Lim et al. 2013, 2015; Sangnier et al. 2016).
Application: modeling climate data (Lim et al. 2015).
12 Joint Quantile Regression as multitask learning
Loss: $\ell_{\text{pinball}}(y, f(x)) = \ell_\tau(y \cdot \mathbf{1} - f(x) - b)$, with $y \in \mathbb{R}$ but $f(x) \in \mathbb{R}^p$.
Decomposable matrix parameterized with the desired quantile levels $\tau_1, \ldots, \tau_p$ (Sangnier et al. 2016).
Pinball loss:
$\ell_\tau(r) = \sum_{j=1}^{p} \begin{cases} \tau_j\, r_j & \text{if } r_j \ge 0, \\ (\tau_j - 1)\, r_j & \text{if } r_j < 0. \end{cases}$
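A small numerical sketch of this multi-quantile pinball loss (illustrative only; the vectorized form and the toy values are assumptions, not the authors' code, and the intercept b from the slide is absorbed into f here):

```python
# Multi-quantile pinball loss: for residuals r = y * 1 - f(x),
# l_tau(r) = sum_j [ tau_j * r_j if r_j >= 0 else (tau_j - 1) * r_j ].
import numpy as np

def pinball_loss(y, f, tau):
    """y: (N,) scalar targets; f: (N, p) predicted quantiles; tau: (p,) levels in (0, 1)."""
    r = y[:, None] - f                                   # residual at each quantile level
    per_level = np.where(r >= 0, tau * r, (tau - 1) * r)
    return per_level.sum(axis=1).mean()                  # sum over levels, mean over samples

# Toy usage with three quantile levels.
tau = np.array([0.1, 0.5, 0.9])
y = np.array([1.0, 2.0])
f = np.array([[0.5, 1.0, 1.5],
              [1.5, 2.0, 2.5]])
print(pinball_loss(y, f, tau))
```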
13 Outline
Motivation and Goals
Operator-valued Kernel Regression
Scaling up Operator-valued Kernel Regression
Conclusion
14 Scalability of Regression in Vector-valued RKHS
Focus on kernel ridge regression for $\mathcal{Y} = \mathbb{R}^p$: prediction in linear time w.r.t. the data, $O(Np^2)$; naive learning (closed form) in $O(N^3 p^3)$.
How to make the method scalable? Find a matrix-valued feature map $\tilde{\Phi}$ such that
$K(x, z) \approx \tilde{K}(x, z) = \tilde{\Phi}(x)^* \tilde{\Phi}(z), \qquad (1)$
in order to work with the following linear model:
$f(x) = \tilde{\Phi}(x)^* \theta, \qquad (2)$
where $\theta \in \mathbb{R}^D$.
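To see why the linear model (2) helps, here is a hedged sketch (with a generic feature matrix as a placeholder, not yet the ORFF construction of the next slides): with an explicit $D$-dimensional parameterization, ridge regression only requires solving a $D \times D$ system, roughly $O(ND^2 + D^3)$, instead of the $O(N^3 p^3)$ kernel system.

```python
# Ridge regression in an explicit feature space:
# min_theta ||Phi @ theta - Y||^2 + lam * ||theta||^2.
# The normal equations involve a D x D matrix, so the cost no longer scales as N^3.
import numpy as np

def ridge_in_feature_space(Phi, Y, lam=1e-2):
    """Phi: (N, D) stacked feature rows; Y: (N,) or (N, p) targets. Returns theta."""
    D = Phi.shape[1]
    A = Phi.T @ Phi + lam * np.eye(D)     # D x D Gram in feature space, cost O(N D^2)
    return np.linalg.solve(A, Phi.T @ Y)  # O(D^3) solve; theta has shape (D,) or (D, p)
```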
15 Toward spectral approximation of OVK
Theorem (Bochner for OVK; Carmeli et al. 2010). Let $K : \mathbb{R}^d \times \mathbb{R}^d \to \mathcal{L}(\mathcal{Y})$ be a translation-invariant, positive-definite, continuous OVK. Then there exists a unique non-negative operator-valued Borel measure $Q$ such that, for all $(x, z) \in \mathbb{R}^d \times \mathbb{R}^d$,
$K(x, z) = \int_{\mathbb{R}^d} \cos(\langle x - z, \omega \rangle)\, dQ(\omega).$
Goal: find $B : \mathbb{R}^d \to \mathcal{L}(\mathcal{U}; \mathcal{Y})$ and a scalar positive measure $\mu$ such that
$dQ(\omega) = B(\omega) B(\omega)^*\, d\mu(\omega).$
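A standard special case (not written on the slide, but consistent with the decomposable kernels used earlier): for a decomposable Gaussian kernel the decomposition can be taken as
$$K(x, z) = e^{-\frac{\|x - z\|^2}{2\sigma^2}}\, B, \qquad dQ(\omega) = B\, d\mu(\omega), \qquad \mu = \mathcal{N}(0, \sigma^{-2} I_d),$$
so that $B(\omega) = B^{1/2}$ is constant in $\omega$, with $\mathcal{U} = \mathcal{Y} = \mathbb{R}^p$.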
17 Operator Random Fourier Features (ORFF)
Assume $\mu$ is a probability distribution. Then, given $(\omega_j)_{j=1}^{D} \sim \mu$ i.i.d., construct (Brault et al. 2016)
$\tilde{\Phi} : \mathcal{X} \to \mathcal{L}(\mathcal{Y}, \mathcal{U}^{2D}), \qquad \tilde{\Phi}(x) = \frac{1}{\sqrt{D}} \begin{pmatrix} \cos(\langle x, \omega_j \rangle)\, B(\omega_j)^* \\ \sin(\langle x, \omega_j \rangle)\, B(\omega_j)^* \end{pmatrix}_{j=1}^{D}.$
$\tilde{\Phi}$ is an approximate feature map for the kernel $K$: for all $(x, z) \in \mathbb{R}^d \times \mathbb{R}^d$,
$\tilde{\Phi}(x)^* \tilde{\Phi}(z) = \frac{1}{D} \sum_{j=1}^{D} \cos(\langle x - z, \omega_j \rangle)\, B(\omega_j) B(\omega_j)^* \;\longrightarrow\; K(x, z),$
where the convergence holds $\mu$-almost everywhere in the weak sense.
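An exploratory NumPy sketch of this construction for the decomposable Gaussian kernel (an assumption: the slide keeps $B(\omega)$ general, whereas here $B(\omega) = B^{1/2}$ is constant, so the operator-valued approximation reduces to scalar random Fourier features multiplied by $B$; this is not the authors' ORFF library):

```python
# ORFF for a decomposable Gaussian kernel K(x, z) = exp(-||x - z||^2 / (2 sigma^2)) * B:
# sample omega_j ~ N(0, sigma^{-2} I_d), build cos/sin features, and check that the
# approximate kernel approaches the exact one as D grows.
import numpy as np

rng = np.random.default_rng(0)
d, D, sigma = 3, 2000, 1.0
B = np.array([[1.0, 0.5],
              [0.5, 1.0]])
omegas = rng.normal(scale=1.0 / sigma, size=(D, d))    # spectral samples omega_j ~ mu

def rff(X):
    """Scalar random Fourier features, shape (N, 2D): <phi(x), phi(z)> ~= k(x, z)."""
    proj = X @ omegas.T
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=1) / np.sqrt(D)

def approx_K(x, z):
    """Approximate operator-valued kernel: (phi(x) . phi(z)) * B."""
    return (rff(x[None]) @ rff(z[None]).T).item() * B

x, z = rng.normal(size=d), rng.normal(size=d)
exact_K = np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2)) * B
print(np.abs(approx_K(x, z) - exact_K).max())          # small for large D
```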
18 Application to Functional Quantile Regression
Toy dataset: $N = 1000$ points, $D = 100$ (ORFF on the input kernel) and $D = 100$ (RFF on the output kernel).
Matches the performance obtained by multi-task learning (Sangnier et al. 2016), but faster, and with access to all quantile levels.
19 Outline
Motivation and Goals
Operator-valued Kernel Regression
Scaling up Operator-valued Kernel Regression
Conclusion
20 Conclusion
Operator-valued kernel regression extends kernel methods to more involved prediction problems.
Versatile framework: losses and kernels.
Scalability obtained with random Fourier feature techniques.
Theoretical guarantees on the approximation.
21 Perspectives
Theoretical properties of learning with ORFF.
Stacking ORFF / links with deep learning; towards hybrid architectures (Mairal 2016).
Image/text understanding (combining deep neural architectures and ORFF).
Anomaly detection (extending one-class SVM).
Spatio-temporal data: climate and epidemic data.
22 Thank you for your attention
Our contributions:
C. Brouard, F. d'Alché-Buc, M. Szafranski. Semi-supervised Penalized Output Kernel Regression for Link Prediction. ICML 2011.
N. Lim, Y. Senbabaoglu, G. Michailidis, F. d'Alché-Buc. OKVAR-Boost: a novel boosting algorithm to infer nonlinear dynamics and interactions in gene regulatory networks. Bioinformatics 29(11), 2013.
N. Lim, F. d'Alché-Buc, C. Auliac, G. Michailidis. Operator-valued Kernel-based Vector Autoregressive Models for Network Inference. Machine Learning Journal, 2015.
C. Brouard, M. Szafranski, F. d'Alché-Buc. Input Output Kernel Regression for supervised and semi-supervised structured output learning. JMLR, 2016.
C. Brouard, H. Shen, K. Dührkop, F. d'Alché-Buc, S. Böcker, J. Rousu. Fast metabolite identification with Input Output Kernel Regression. Bioinformatics 32(12), 2016.
M. Sangnier, O. Fercoq, F. d'Alché-Buc. Joint quantile regression in vector-valued RKHSs. NIPS 2016.
R. Brault, M. Heinonen, F. d'Alché-Buc. Random Fourier Features for Operator-valued Kernels. ACML 2016.
M. Djerrab, A. Garcia, M. Sangnier, F. d'Alché-Buc. Output Fisher Embedding Regression. Machine Learning Journal (in revision), 2017.
M. Sangnier, O. Fercoq, F. d'Alché-Buc. Data sparse nonparametric regression with ε-insensitive losses. ACML 2017.
23 Collaborations are welcome
One-year postdoc position (from January).
Master internship positions (April-September).
Co-supervising PhD theses.
Contact: