On non-parametric robust quantile regression by support vector machines
1 On non-parametric robust quantile regression by support vector machines
Andreas Christmann
joint work with: Ingo Steinwart (Los Alamos National Lab), Arnout Van Messem (Vrije Universiteit Brussel)
ERCIM 2008, Neuchâtel, Switzerland, June 19-21, 2008
2 Example linear QR
[Figure: LIDAR data set; axes: logratio, range. Linear quantile regression fits for quantiles α = 0.95, 0.50, 0.05]
3 Example SVM for quantile regression
[Figure: LIDAR data set; axes: logratio, range. SVM quantile regression fits for quantiles α = 0.95, 0.50, 0.05]
4 Nonparametric Quantile Regression
Assumptions:
$X$ = complete measurable space, e.g. $X = \mathbb{R}^d$
$Y \subset \mathbb{R}$ closed, $Y \neq \emptyset$
$D = D_n = (z_1, \ldots, z_n)$, $z_i := (x_i, y_i) \in Z := X \times Y$, $\mathrm{D} := \frac{1}{n} \sum_{i=1}^{n} \delta_{(x_i, y_i)}$
$(X_i, Y_i)$ i.i.d. $\sim P \in \mathcal{M}_1$, $P$ (totally) unknown
Goal: estimate the quantile function
$$f^*_{\alpha,P}(x) = \inf\{ q \in Y ;\ P(Y \le q \mid X = x) \ge \alpha \}, \quad x \in X$$
Assumption: $f^*_{\alpha,P}$ unique
If $f^*_{\alpha,P}$ is linear: Koenker & Bassett 78, Koenker 05
5 Loss Function and Risk
Pinball loss function:
$$L_\alpha(y, t) := \begin{cases} (\alpha - 1)(y - t), & y - t < 0 \\ \alpha\,(y - t), & y - t \ge 0 \end{cases}$$
[Figure: two panels, alpha=0.1 and alpha=0.75; axes: y − t, loss of y − t]
Risk: $\mathcal{R}_{L_\alpha,P}(f) := E_P L_\alpha(Y, f(X))$, $P \in \mathcal{M}_1$
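The pinball loss is easy to evaluate numerically. The sketch below is only an illustration on a hypothetical sample (the numbers 1, ..., 10 are made up, and the helper names `pinball_loss` and `empirical_risk` are not from the slides). It computes the empirical risk of constant predictions and shows that the minimiser over the sample points is an empirical α-quantile.

```python
import numpy as np

def pinball_loss(y, t, alpha):
    """Pinball loss: (alpha - 1)(y - t) if y - t < 0, else alpha (y - t)."""
    r = np.asarray(y, dtype=float) - np.asarray(t, dtype=float)
    return np.where(r < 0, (alpha - 1.0) * r, alpha * r)

def empirical_risk(y, t, alpha):
    """Empirical L_alpha-risk of the constant prediction t on the sample y."""
    return float(np.mean(pinball_loss(y, t, alpha)))

# Hypothetical sample 1, 2, ..., 10 (illustration only).
y = np.arange(1.0, 11.0)
alpha = 0.75

# Evaluate the risk of each sample point used as a constant prediction.
risks = [empirical_risk(y, t, alpha) for t in y]
best_t = float(y[int(np.argmin(risks))])
print(best_t, min(risks))
```

For this sample the risk is smallest at t = 8: seven observations lie below it and two above, which balances the weights 1 − α = 0.25 and α = 0.75.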
6 Support Vector Machine approach
Schölkopf et al. 00 and Takeuchi et al. 06 proposed:
$$f_{D,\lambda} = \arg\min_{f \in H} \frac{1}{n} \sum_{i=1}^{n} L_\alpha\big(y_i, f(x_i)\big) + \lambda \|f\|_H^2,$$
where $H$ is a reproducing kernel Hilbert space (RKHS) and $\lambda > 0$.
$$S(P) = f_{P,\lambda} = \arg\min_{f \in H} E_P L_\alpha\big(Y, f(X)\big) + \lambda \|f\|_H^2, \quad P \in \mathcal{M}_1$$
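A minimal numerical sketch of the empirical problem follows. It relies on assumptions that are not on the slide: by the representer theorem the minimiser is written as f = Σ_j a_j k(·, x_j), the kernel is a Gaussian RBF, and the solver is plain functional subgradient descent with step size 1/(2λt). The toy data are hypothetical, and the cited papers use quadratic programming rather than this solver.

```python
import numpy as np

def pinball(r, alpha):
    """Pinball loss as a function of the residual r = y - t."""
    return np.where(r < 0, (alpha - 1.0) * r, alpha * r)

def grbf_gram(x, gamma):
    """Gram matrix of the Gaussian RBF kernel for a 1-d input vector x."""
    d = x[:, None] - x[None, :]
    return np.exp(-gamma * d ** 2)

def objective(a, K, y, alpha, lam):
    """(1/n) sum_i L_alpha(y_i, f(x_i)) + lam ||f||_H^2 for f = sum_j a_j k(., x_j)."""
    return float(np.mean(pinball(y - K @ a, alpha)) + lam * (a @ K @ a))

def fit_kernel_quantile(K, y, alpha, lam, n_iter=2000):
    """Functional subgradient descent; returns the best coefficients found."""
    n = len(y)
    a = np.zeros(n)
    best_a, best_obj = a.copy(), objective(a, K, y, alpha, lam)
    for t in range(1, n_iter + 1):
        r = y - K @ a
        g = np.where(r < 0, 1.0 - alpha, -alpha)  # a subgradient of L_alpha in t
        step = 1.0 / (2.0 * lam * t)              # assumed step-size rule
        a = a - step * (g / n + 2.0 * lam * a)    # coefficients of the H-subgradient
        obj = objective(a, K, y, alpha, lam)
        if obj < best_obj:
            best_a, best_obj = a.copy(), obj
    return best_a, best_obj

# Hypothetical toy data (not the LIDAR data from the slides).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 3.0, 40)
y = 2.0 + np.sin(2.0 * x) + 0.2 * rng.standard_normal(x.size)
K = grbf_gram(x, gamma=2.0)

a_hat, obj_hat = fit_kernel_quantile(K, y, alpha=0.5, lam=0.05)
obj_zero = objective(np.zeros(x.size), K, y, 0.5, 0.05)
print(obj_zero, obj_hat)
```

Fitted values at the training inputs are `K @ a_hat`. Other quantile levels only require changing `alpha`.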
7 Kernel
$k$ is a kernel: $k : X \times X \to \mathbb{K}$, if there exist a $\mathbb{K}$-Hilbert space $H$ and a map $\Phi : X \to H$ such that $k(x, x') = \langle \Phi(x'), \Phi(x) \rangle_H$, $x, x' \in X$
$H$ is a reproducing kernel Hilbert space (RKHS): for all $x \in X$, $\delta_x(f) := f(x)$, $f \in H$, is continuous
reproducing kernel: $k(\cdot, x) \in H$, $x \in X$, and $f(x) = \langle f, k(\cdot, x) \rangle_H$, $f \in H$, $x \in X$
$\Phi$ is the canonical feature map: $\Phi(x) = k(\cdot, x)$, $x \in X$
bounded: $\|k\|_\infty := \sup_{x \in X} \sqrt{k(x, x)} < \infty$
GRBF: $k(x, x') = e^{-\gamma \|x - x'\|_2^2}$, $\gamma > 0$
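The Gaussian RBF kernel can be checked numerically. The sketch below uses hypothetical random inputs and an illustrative helper name, `grbf_kernel`. It builds a Gram matrix, confirms that k(x, x) = 1 on the diagonal, so the kernel is bounded by 1, and checks that the matrix is symmetric and positive semi-definite up to round-off.

```python
import numpy as np

def grbf_kernel(X1, X2, gamma):
    """Gaussian RBF kernel exp(-gamma * ||x - x'||_2^2) for the rows of X1 and X2."""
    sq = (np.sum(X1 ** 2, axis=1)[:, None]
          + np.sum(X2 ** 2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off

# Hypothetical inputs: 30 points in R^3.
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 3))
K = grbf_kernel(X, X, gamma=0.5)

sup_norm = float(np.sqrt(np.max(np.diag(K))))   # sup over the sample of sqrt(k(x, x))
min_eig = float(np.linalg.eigvalsh(K).min())    # >= 0 up to round-off for a kernel
print(sup_norm, min_eig)
```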
8 Consistency
$f_{D,\lambda_n}$ is risk consistent, if
$$\mathcal{R}_{L_\alpha,P}(f_{D,\lambda_n}) \xrightarrow{P} \mathcal{R}^*_{L_\alpha,P} := \inf_{f : X \to \mathbb{R} \text{ measurable}} \mathcal{R}_{L_\alpha,P}(f) \qquad (2.1)$$
$$\mathcal{R}^*_{L_\alpha,P,H} := \inf_{f \in H} \mathcal{R}_{L_\alpha,P}(f) = \mathcal{R}^*_{L_\alpha,P} \qquad (2.2)$$
Large RKHS (CHR & Steinwart 08)
Let $H$ be the RKHS of a bounded kernel $k : X \times X \to \mathbb{R}$ and $\mu$ be a distribution on $X$. Then the following statements are equivalent:
1. $H$ is dense in $L_1(\mu)$.
2. (2.2) holds for all $P \in \mathcal{M}_1$ with $P_X = \mu$ and $E_P |Y| < \infty$.
9 Bounded L-risk (CHR & Steinwart 08)
Assume: $P \in \mathcal{M}_1$ with $E_P |Y| < \infty$, $f : X \to \mathbb{R}$ with $f \in L_1(P)$.
Then: $\mathcal{R}_{L_\alpha,P}(f) < \infty$.
10 Existence and uniqueness (CHR & Steinwart 08)
Assume: $P \in \mathcal{M}_1$ with $E_P |Y| < \infty$, $H$ RKHS of a bounded kernel $k$, $\lambda > 0$.
Then:
1. There exists a unique minimizer $S(P) = f_{P,\lambda} \in H$.
2. $\|f_{P,\lambda}\|_H \le \sqrt{\mathcal{R}_{L_\alpha,P}(0) / \lambda}$.
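The norm bound in item 2 can be obtained by comparing the minimizer with the zero function. The following derivation is a standard sketch and is not taken from the slides. It uses only that the pinball loss is non-negative and that $f_{P,\lambda}$ minimizes the regularized risk.

```latex
\begin{align*}
\lambda \|f_{P,\lambda}\|_H^2
  &\le \mathcal{R}_{L_\alpha,P}(f_{P,\lambda}) + \lambda \|f_{P,\lambda}\|_H^2
     && \text{since } L_\alpha \ge 0 \\
  &\le \mathcal{R}_{L_\alpha,P}(0) + \lambda \|0\|_H^2
     && \text{since } f_{P,\lambda} \text{ is the minimizer} \\
  &= \mathcal{R}_{L_\alpha,P}(0)
   = E_P L_\alpha(Y, 0)
   \le \max\{\alpha, 1-\alpha\}\, E_P |Y| < \infty .
\end{align*}
```

Dividing by $\lambda$ and taking square roots gives $\|f_{P,\lambda}\|_H \le \sqrt{\mathcal{R}_{L_\alpha,P}(0)/\lambda}$.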
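11 Consistency (Steinwart & CHR 08a)
Assume:
$H$ separable RKHS of a bounded measurable kernel $k$ such that $H$ is dense in $L_1(\mu)$ for all distributions $\mu$ on $X$
$(\lambda_n)_{n \in \mathbb{N}}$ with $\lambda_n \to 0$.
If $\lambda_n^2 n \to \infty$, then for all $P \in \mathcal{M}_1$ with $E_P |Y| < \infty$:
1. $\mathcal{R}_{L_\alpha,P}(f_{D,\lambda_n}) \xrightarrow{P} \mathcal{R}^*_{L_\alpha,P}$
2. $\|f_{D,\lambda_n} - f^*_{\alpha,P}\|_{L_0(P_X)} \xrightarrow{P} 0$
If $\delta > 0$ and $\lambda_n^{2+\delta} n \to \infty$, then for all $P \in \mathcal{M}_1$ with $E_P |Y| < \infty$:
3. $\mathcal{R}_{L_\alpha,P}(f_{D,\lambda_n}) \xrightarrow{a.s.} \mathcal{R}^*_{L_\alpha,P}$
4. $\|f_{D,\lambda_n} - f^*_{\alpha,P}\|_{L_0(P_X)} \xrightarrow{a.s.} 0$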
12 Rate of convergence
1. No-free-lunch theorem: no uniform rate of convergence! [Devroye 82]
2. Under many assumptions we have [Steinwart & CHR 08b]
$$\|f_{D,\lambda_n} - f^*_{\alpha,P}\|_{L(P_X)} \le c\, n^{-1/3}$$
13 Robustness
What is the impact on $S(P)$ or $S(P_n)$ of violations of the assumption
$(X_i, Y_i)$ i.i.d. $\sim P$, $P \in \mathcal{M}_1$ unknown?
14 Bias, maxbias, sensitivity curve (CHR & Steinwart 07)
Assume:
$E_P |Y| < \infty$ and $E_{\tilde P} |Y| < \infty$ (can be weakened)
$H$ RKHS of a continuous and bounded kernel $k$, $\lambda > 0$, $\varepsilon > 0$.
Then we have with $c := \lambda^{-1} \|k\|_\infty \max\{\alpha, 1 - \alpha\}$:
1. Bias: $\|f_{(1-\varepsilon)P + \varepsilon \tilde P, \lambda} - f_{P,\lambda}\|_H \le c\, \|P - \tilde P\|_{tv}\, \varepsilon$
2. Maxbias: $\sup_{Q \in N_\varepsilon(P)} \|f_{Q,\lambda} - f_{P,\lambda}\|_H \le 2 c\, \varepsilon$
3. Sensitivity curve: $\|SC_n(z; S_n)\|_H \le 2c$, $z \in X \times Y$
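To get a feeling for the size of these bounds, the constant c can be evaluated for concrete settings. The values below are hypothetical: a Gaussian RBF kernel (assumed sup-norm 1), α = 0.95, λ = 0.5 and contamination ε = 0.05. The function name `robustness_constant` is illustrative only.

```python
def robustness_constant(lam, k_sup, alpha):
    """c = ||k||_inf * max(alpha, 1 - alpha) / lam, as defined on the slide."""
    return k_sup * max(alpha, 1.0 - alpha) / lam

# Hypothetical settings (not from the slides).
c = robustness_constant(lam=0.5, k_sup=1.0, alpha=0.95)
eps = 0.05

maxbias_bound = 2.0 * c * eps   # bound on sup_Q ||f_{Q,lam} - f_{P,lam}||_H
sc_bound = 2.0 * c              # bound on ||SC_n(z; S_n)||_H
print(c, maxbias_bound, sc_bound)
```

The bounds scale like 1/λ, so a smaller regularization parameter weakens the robustness guarantee.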
15 Bouligand Influence Function
Goal: Bounded BIF
$g : X \to Z$ is Bouligand differentiable at $x_0 \in X$, if a positive homogeneous function $\nabla^B g(x_0) : X \to Z$ exists with
$$\lim_{h \to 0} \frac{\| g(x_0 + h) - g(x_0) - \nabla^B g(x_0)(h) \|_Z}{\|h\|_X} = 0.$$
Def. (CHR & Van Messem 08)
The Bouligand influence function (BIF) of a function $S : \mathcal{M}_1 \to H$ for a distribution $P$ in the direction of a distribution $Q \neq P$ is the special B-derivative (if it exists)
$$\lim_{\varepsilon (Q - P) \to 0} \frac{\| S\big(P + \varepsilon (Q - P)\big) - S(P) - \mathrm{BIF}(Q; S, P) \|_H}{\varepsilon \|Q - P\|} = 0.$$
16 Bouligand Influence Function
Goal: Bounded BIF
Since $P + \varepsilon (Q - P) = (1 - \varepsilon) P + \varepsilon Q$, the definition of the BIF is restated as
$$\lim_{\varepsilon \downarrow 0} \frac{\| S\big((1 - \varepsilon) P + \varepsilon Q\big) - S(P) - \mathrm{BIF}(Q; S, P) \|_H}{\varepsilon} = 0.$$
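17 Bounded BIF (CHR & Van Messem 08)
Assume:
$k$ bounded and measurable
$E_P |Y| < \infty$, $E_Q |Y| < \infty$ (can be weakened)
there exist $\delta > 0$ and positive constants $\xi_P$, $\xi_Q$, $c_P$, and $c_Q$ such that for all $t \in \mathbb{R}$ with $|t - f_{P,\lambda}(x)| \le \delta \|k\|_\infty$ the following inequalities hold for all $a \in [0, 2 \delta \|k\|_\infty]$ and $x \in X$:
$$P\big(Y \in [t, t + a] \mid x\big) \le c_P\, a^{1 + \xi_P}, \qquad Q\big(Y \in [t, t + a] \mid x\big) \le c_Q\, a^{1 + \xi_Q}.$$
Then: $\mathrm{BIF}(Q; S, P)$ with $S(P) := f_{P,\lambda}$
1. exists
2. equals
$$\frac{1}{2\lambda} \int_X \Big( P\big(Y \le f_{P,\lambda}(x) \mid x\big) - \alpha \Big) \Phi(x) \, dP_X(x) - \frac{1}{2\lambda} \int_X \Big( Q\big(Y \le f_{P,\lambda}(x) \mid x\big) - \alpha \Big) \Phi(x) \, dQ_X(x)$$
3. is bounded.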
18 Conclusions
Non-parametric quantile regression by SVMs
1. has a unique solution
2. is consistent:
   $L_\alpha$-risk of $f_{D,\lambda_n}$ converges to the Bayes risk (in P)
   $f_{D,\lambda_n}$ converges to the true quantile function (in P or a.s.)
   rate of convergence
3. is robust if the kernel is bounded:
   Bouligand-IF, sensitivity curve, maxbias are bounded
4. is computable for large high-dimensional data sets
19 References
CHR & Steinwart (2007). Bernoulli.
CHR & Steinwart (2008). Appl. Stochastic Models Bus. Ind.
CHR & Van Messem (2008). J. Mach. Learn. Res.
Koenker (2005). Quantile Regression. Cambridge U.P.
Koenker & Bassett (1978). Econometrica.
Schölkopf & Smola (2002). Learning with Kernels. MIT Press.
Steinwart & CHR (2008). Support Vector Machines. Springer.
Steinwart & CHR (2008b). Advances in Neural Information Processing Systems, 20.
Takeuchi, Le, Sears, Smola (2006). J. Mach. Learn. Res.
20 Appendix: Example nonparametric QRSS
[Figure: LIDAR data set; axes: logratio, range. Quantiles α = 0.95, 0.50, 0.05]
[R-package quantreg, rqss(logratio ~ qss(range, constraint = "N", lambda = 25), tau = 0.5)]