Learnability of Gaussians with flexible variances


Learnability of Gaussians with flexible variances

Ding-Xuan Zhou, City University of Hong Kong

Supported in part by the Research Grants Council of Hong Kong

October 20, 2007

Least-square Regularized Regression

Learn $f: X \to Y$ from random samples $z = \{(x_i, y_i)\}_{i=1}^m$.

Take $X$ to be a compact subset of $\mathbb{R}^n$ and $Y = \mathbb{R}$, with $y \approx f(x)$. Due to noise or other uncertainty, we assume an (unknown) probability measure $\rho$ on $Z = X \times Y$ governs the sampling.

Marginal distribution $\rho_X$ on $X$: $\{x_i\}_{i=1}^m$ drawn according to $\rho_X$; conditional distribution $\rho(\cdot \mid x)$ at $x \in X$.

Learning the regression function: $f_\rho(x) = \int_Y y \, d\rho(y \mid x)$, so that $y_i \approx f_\rho(x_i)$.
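To make the sampling model concrete, here is a minimal numerical sketch (not from the slides): a hypothetical regression function on $X = [0,1]$ with uniform $\rho_X$ and Gaussian noise in $\rho(\cdot \mid x)$, so that $f_\rho(x) = E[y \mid x]$ is exactly the function chosen below.

```python
import numpy as np

rng = np.random.default_rng(0)

def f_rho(x):
    # hypothetical regression function f_rho(x) = E[y | x]
    return np.sin(2 * np.pi * x)

m = 200
x = rng.uniform(0.0, 1.0, size=m)              # {x_i}_{i=1}^m drawn from rho_X (uniform here)
y = f_rho(x) + 0.1 * rng.standard_normal(m)    # y_i ~ rho(.|x_i): noisy observation of f_rho(x_i)
z = list(zip(x, y))                            # the sample z = {(x_i, y_i)}_{i=1}^m
```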

Learning with a Fixed Gaussian

$$f_{z,\lambda,\sigma} := \arg\min_{f \in \mathcal{H}_{K_\sigma}} \frac{1}{m}\sum_{i=1}^m (f(x_i) - y_i)^2 + \lambda \|f\|_{K_\sigma}^2, \qquad (1)$$

where $\lambda = \lambda(m) > 0$ and $K_\sigma(x, y) = e^{-\frac{|x-y|^2}{2\sigma^2}}$ is a Gaussian kernel on $X$.

The Reproducing Kernel Hilbert Space (RKHS) $\mathcal{H}_{K_\sigma}$ is the completion of $\mathrm{span}\{(K_\sigma)_t := K_\sigma(t, \cdot) : t \in X\}$ with the inner product $\langle \cdot, \cdot \rangle_{K_\sigma}$ satisfying $\langle (K_\sigma)_x, (K_\sigma)_y \rangle_{K_\sigma} = K_\sigma(x, y)$.
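By the representer theorem, the minimizer of scheme (1) has the form $f_{z,\lambda,\sigma}(x) = \sum_j c_j K_\sigma(x, x_j)$ with $(K + \lambda m I)c = y$, where $K$ is the kernel matrix on the sample. The following Python sketch (an illustration, not the authors' code; function and variable names are ours) implements this for a fixed $\sigma$.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """K_sigma(a, b) = exp(-|a - b|^2 / (2 sigma^2)) for all rows a of A and b of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_fixed_gaussian(X, y, sigma, lam):
    """Minimize (1/m) sum_i (f(x_i)-y_i)^2 + lam * ||f||_K^2 over H_{K_sigma}.
    The representer theorem gives f = sum_j c_j K_sigma(., x_j) with
    (K + lam * m * I) c = y."""
    m = len(y)
    K = gaussian_kernel(X, X, sigma)
    c = np.linalg.solve(K + lam * m * np.eye(m), y)
    return c

def predict(X_train, c, sigma, X_new):
    """Evaluate f_{z,lambda,sigma} at new points."""
    return gaussian_kernel(X_new, X_train, sigma) @ c
```

For the toy sample above, one would call `c = fit_fixed_gaussian(x.reshape(-1, 1), y, sigma=0.1, lam=1e-3)` and then `predict(x.reshape(-1, 1), c, 0.1, X_new)`.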

Theorem 1 (Smale-Zhou, Constr. Approx. 2007). Assume $|y| \le M$ and that $f_\rho = \int_X K_\sigma(x, y)\, g(y)\, d\rho_X(y)$ for some $g \in L^2_{\rho_X}$. For any $0 < \delta < 1$, with confidence $1 - \delta$,

$$\|f_{z,\lambda,\sigma} - f_\rho\|_{L^2_{\rho_X}} \le 2 \log(4/\delta)\, (12M)^{2/3}\, \|g\|_{L^2_{\rho_X}}^{1/3} \left(\frac{1}{m}\right)^{1/3},$$

where $\lambda = \lambda(m) = \log(4/\delta)\, \big(12M / \|g\|_{L^2_{\rho_X}}\big)^{2/3} (1/m)^{1/3}$.

In Theorem 1, the assumption forces $f_\rho \in C^\infty$.

RKHS $\mathcal{H}_{K_\sigma}$ generated by a Gaussian kernel on $X$:

$$\mathcal{H}_{K_\sigma} = \mathcal{H}_{K_\sigma}(\mathbb{R}^n)\big|_X,$$

where $\mathcal{H}_{K_\sigma}(\mathbb{R}^n)$ is the RKHS generated by $K_\sigma$ as a Mercer kernel on $\mathbb{R}^n$:

$$\mathcal{H}_{K_\sigma}(\mathbb{R}^n) = \Big\{ f \in L^2(\mathbb{R}^n) : \|f\|_{\mathcal{H}_{K_\sigma}(\mathbb{R}^n)} < \infty \Big\},$$

where

$$\|f\|_{\mathcal{H}_{K_\sigma}(\mathbb{R}^n)} = \left( \frac{1}{(\sqrt{2\pi}\,\sigma)^n} \int_{\mathbb{R}^n} |\hat f(\xi)|^2 \, e^{\frac{\sigma^2 |\xi|^2}{2}} \, d\xi \right)^{1/2}.$$

Thus $\mathcal{H}_{K_\sigma}(\mathbb{R}^n) \subset C^\infty(\mathbb{R}^n)$ (Steinwart).

If $X$ is a domain with piecewise smooth boundary and $d\rho_X(x) \ge c_0\, dx$ for some $c_0 > 0$, then for any $\beta > 0$,

$$D_\sigma(\lambda) := \inf_{f \in \mathcal{H}_{K_\sigma}} \Big\{ \|f - f_\rho\|^2_{L^2_{\rho_X}} + \lambda \|f\|^2_{K_\sigma} \Big\} = O(\lambda^\beta)$$

implies $f_\rho \in C^\infty(X)$.

Note $\|f - f_\rho\|^2_{L^2_{\rho_X}} = \mathcal{E}(f) - \mathcal{E}(f_\rho)$, where $\mathcal{E}(f) := \int_Z (f(x) - y)^2\, d\rho$. Denote $\mathcal{E}_z(f) = \frac{1}{m}\sum_{i=1}^m (f(x_i) - y_i)^2 \approx \mathcal{E}(f)$. Then $f_{z,\lambda,\sigma} = \arg\min_{f \in \mathcal{H}_{K_\sigma}} \big\{ \mathcal{E}_z(f) + \lambda \|f\|^2_{K_\sigma} \big\}$.

If we define $f_{\lambda,\sigma} = \arg\min_{f \in \mathcal{H}_{K_\sigma}} \big\{ \mathcal{E}(f) + \lambda \|f\|^2_{K_\sigma} \big\}$, then $f_{z,\lambda,\sigma} \approx f_{\lambda,\sigma}$ and the error can be estimated in terms of $\lambda$ and $m$ by the theory of uniform convergence over the compact function set $B_{M/\sqrt{\lambda}} := \{ f \in \mathcal{H}_{K_\sigma} : \|f\|_{K_\sigma} \le M/\sqrt{\lambda} \}$, since $f_{z,\lambda,\sigma} \in B_{M/\sqrt{\lambda}}$.

But $\|f_{\lambda,\sigma} - f_\rho\|^2_{L^2_{\rho_X}} = O(\lambda^\beta)$ for any $\beta > 0$ implies $f_\rho \in C^\infty(X)$. So the learning ability of a single Gaussian is weak.

One may choose a less smooth kernel, but we would like radial basis kernels for manifold learning.

One way to increase the learning ability of Gaussian kernels: let $\sigma$ depend on $m$, with $\sigma = \sigma(m) \to 0$ as $m \to \infty$. Steinwart-Scovel, Xiang-Zhou, ...

Another way: allow all possible variances $\sigma \in (0, \infty)$.

Regularization schemes with flexible Gaussians: Zhou, Wu-Ying-Zhou, Ying-Zhou, Micchelli-Pontil-Wu-Zhou, ...

$$f_{z,\lambda} := \arg\min_{0 < \sigma < \infty}\ \min_{f \in \mathcal{H}_{K_\sigma}} \frac{1}{m}\sum_{i=1}^m (f(x_i) - y_i)^2 + \lambda \|f\|^2_{K_\sigma}$$

Theorem 2 (Ying-Zhou, J. Mach. Learning Res. 2007). Let $\rho_X$ be the Lebesgue measure on a domain $X$ in $\mathbb{R}^n$ with minimally smooth boundary. If $f_\rho \in H^s(X)$ for some $s \ge 2$ and $\lambda = m^{-\frac{2s+n}{4(4s+n)}}$, then we have

$$E_{z \in Z^m}\Big( \|f_{z,\lambda} - f_\rho\|^2_{L^2} \Big) = O\Big( m^{-\frac{s}{2(4s+n)}} \log m \Big).$$
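In practice the outer minimization over $\sigma \in (0, \infty)$ can be approximated by a finite grid of candidate variances: fit the fixed-$\sigma$ scheme for each candidate and keep the pair $(\sigma, f)$ with the smallest regularized empirical error. The sketch below is our illustration of this grid-search reading of the scheme (it reuses `gaussian_kernel` from the earlier sketch; all names are ours).

```python
import numpy as np

def fit_flexible_gaussian(X, y, lam, sigmas):
    """Approximate the flexible-variance scheme by minimizing the regularized
    empirical error over a finite grid of sigma values and over f in H_{K_sigma}."""
    m = len(y)
    best = None
    for sigma in sigmas:
        K = gaussian_kernel(X, X, sigma)                  # from the earlier sketch
        c = np.linalg.solve(K + lam * m * np.eye(m), y)
        fx = K @ c                                        # f(x_i) on the sample
        obj = np.mean((fx - y) ** 2) + lam * (c @ K @ c)  # ||f||_K^2 = c^T K c
        if best is None or obj < best[0]:
            best = (obj, sigma, c)
    return best[1], best[2]

# usage:
# sigma_hat, c_hat = fit_flexible_gaussian(x.reshape(-1, 1), y, lam=1e-3,
#                                          sigmas=np.logspace(-2, 1, 20))
```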

Major difficulty: is the function set $\mathcal{H} = \bigcup_{0 < \sigma < \infty} \{ f \in \mathcal{H}_{K_\sigma} : \|f\|_{K_\sigma} \le R \}$ with $R > 0$ learnable? That is, is this function set a uniform Glivenko-Cantelli class? Its closure is not a compact subset of $C(X)$.

Theory of uniform convergence for $\sup_{f \in \mathcal{H}} |\mathcal{E}_z(f) - \mathcal{E}(f)|$.

Given a bounded set $\mathcal{H}$ of functions on $X$, when do we have

$$\lim_{l \to \infty}\ \sup_\rho\ \mathrm{Prob}\left\{ \sup_{m \ge l}\ \sup_{f \in \mathcal{H}} \left| \frac{1}{m}\sum_{i=1}^m f(x_i) - \int f(x)\, d\rho_X \right| > \epsilon \right\} = 0, \qquad \forall \epsilon > 0?$$

Such a set is called a uniform Glivenko-Cantelli (UGC) class.

Characterizations: Vapnik-Chervonenkis, and Alon, Ben-David, Cesa-Bianchi, Haussler (1997).

Our quantitative estimates: If $V: Y \times \mathbb{R} \to \mathbb{R}_+$ is convex with respect to the second variable, $M = \|V(y, 0)\|_{L^\infty_\rho(Z)} < \infty$, and $C_R = \sup\{ \max\{ |V'_-(y, t)|, |V'_+(y, t)| \} : y \in Y,\ |t| \le R \} < \infty$, then we have

$$E_{z \in Z^m}\left\{ \sup_{f \in \mathcal{H}} \left| \frac{1}{m}\sum_{i=1}^m V(y_i, f(x_i)) - \int_Z V(y, f(x))\, d\rho \right| \right\} \le \tilde{C}\, C_R R\, \frac{(\log m)^{1/4}}{m^{1/4}} + \frac{2M}{\sqrt{m}},$$

where $\tilde{C}$ is a constant depending on $n$.

Ideas: reduce the estimates for $\mathcal{H}$ to a much smaller subset $\mathcal{F} = \{ (K_\sigma)_x : x \in X,\ 0 < \sigma < \infty \}$, then bound empirical covering numbers. The UGC property follows from the characterization of Dudley-Giné-Zinn.

Improve the learning rates when $X$ is a manifold of dimension $d$ with $d$ much smaller than the dimension $n$ of the underlying Euclidean space.

Approximation by Gaussians on Riemannian manifolds

Let $X$ be a $d$-dimensional connected compact $C^\infty$ submanifold of $\mathbb{R}^n$ without boundary. The approximation scheme is given by a family of linear operators $\{I_\sigma: C(X) \to C(X)\}_{\sigma > 0}$ as

$$I_\sigma(f)(x) = \frac{1}{(\sqrt{2\pi}\,\sigma)^d} \int_X K_\sigma(x, y)\, f(y)\, dV(y) = \frac{1}{(\sqrt{2\pi}\,\sigma)^d} \int_X \exp\left\{ -\frac{|x - y|^2}{2\sigma^2} \right\} f(y)\, dV(y), \qquad x \in X,$$

where $V$ is the Riemannian volume measure of $X$.
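As a numerical illustration (not from the slides), $I_\sigma$ can be approximated on a concrete manifold by replacing the integral against $dV$ with a quadrature sum. The sketch below does this on the unit circle, a $d = 1$ submanifold of $\mathbb{R}^2$; the function names and the choice of manifold are ours.

```python
import numpy as np

def I_sigma_circle(f, x, sigma, N=2000):
    """Approximate I_sigma(f)(x) on the unit circle in R^2 by a uniform grid of
    N points, each carrying arc-length weight 2*pi/N for dV(y)."""
    d = 1                                                 # intrinsic dimension of the circle
    t = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
    pts = np.stack([np.cos(t), np.sin(t)], axis=1)        # quadrature points y on X
    w = 2.0 * np.pi / N                                   # quadrature weight for dV(y)
    d2 = ((pts - x) ** 2).sum(axis=1)                     # Euclidean |x - y|^2 in R^2
    kernel = np.exp(-d2 / (2.0 * sigma ** 2))
    return (kernel * f(pts)).sum() * w / (np.sqrt(2.0 * np.pi) * sigma) ** d

# e.g. with f(y) = y_1 restricted to the circle, I_sigma(f)(x) -> f(x) as sigma -> 0:
# I_sigma_circle(lambda p: p[:, 0], np.array([1.0, 0.0]), sigma=0.05)   # close to 1
```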

Theorem 3 (Ye-Zhou, Adv. Comput. Math. 2007). If $f_\rho \in \mathrm{Lip}(s)$ with $0 < s \le 1$, then

$$\|I_\sigma(f_\rho) - f_\rho\|_{C(X)} \le C_X\, \|f_\rho\|_{\mathrm{Lip}(s)}\, \sigma^s \qquad \forall \sigma > 0, \qquad (2)$$

where $C_X$ is a positive constant independent of $f_\rho$ or $\sigma$. By taking $\lambda = \big( \tfrac{(\log m)^2}{m} \big)^{\frac{s+d}{8s+4d}}$, we have

$$E_{z \in Z^m}\Big\{ \|f_{z,\lambda} - f_\rho\|^2_{L^2_{\rho_X}} \Big\} = O\left( \Big( \frac{(\log m)^2}{m} \Big)^{\frac{s}{8s+4d}} \right).$$

The index $8s + 4d$ in Theorem 3 is smaller than $2(4s + n)$ in Theorem 2 when the manifold dimension $d$ is much smaller than $n$, so the exponent $\frac{s}{8s+4d}$ is larger and the learning rate improves.

Classification by Gaussians on Riemannian manifolds

Let $\phi(t) = \max\{1 - t, 0\}$ be the hinge loss for support vector machine classification. Define

$$f_{z,\lambda} = \arg\min_{\sigma \in (0, +\infty)}\ \min_{f \in \mathcal{H}_{K_\sigma}} \frac{1}{m}\sum_{i=1}^m \phi\big(y_i f(x_i)\big) + \lambda \|f\|^2_{K_\sigma}.$$

By using $I_\sigma: L^p(X) \to L^p(X)$, we obtain learning rates for binary classification to learn the Bayes rule:

$$f_c(x) = \begin{cases} 1, & \text{if } \rho(y = 1 \mid x) \ge \rho(y = -1 \mid x), \\ -1, & \text{if } \rho(y = 1 \mid x) < \rho(y = -1 \mid x). \end{cases}$$

Here $Y = \{1, -1\}$ represents two classes. The misclassification error is defined as $\mathcal{R}(f) := \mathrm{Prob}\{ y \ne f(x) \}$, and $\mathcal{R}(f) \ge \mathcal{R}(f_c)$ for any $f: X \to Y$.
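For experimentation, a practical proxy for this scheme (not the authors' implementation) is a soft-margin SVM with an RBF kernel: SVC minimizes $\frac{1}{2}\|f\|^2_K + C \sum_i \phi(y_i f(x_i))$, so $C$ plays roughly the role of $1/(2\lambda m)$, and searching over $\gamma = 1/(2\sigma^2)$ mimics letting the variance be flexible. In the sketch below the variance is selected by cross-validation rather than by the regularized objective itself; all names are ours.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def fit_flexible_gaussian_svm(X, y, lam, sigmas):
    """Hinge-loss regularization with a grid of Gaussian variances, solved as a
    soft-margin SVM; y takes values in {1, -1}."""
    m = len(y)
    C = 1.0 / (2.0 * lam * m)                       # rough correspondence C ~ 1/(2*lam*m)
    gammas = 1.0 / (2.0 * np.asarray(sigmas) ** 2)  # gamma = 1/(2*sigma^2)
    search = GridSearchCV(SVC(C=C, kernel="rbf"),
                          param_grid={"gamma": gammas}, cv=5)
    search.fit(X, y)
    return search.best_estimator_                   # sign of its decision function approximates f_c
```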

The Sobolev space $H^k_p(X)$ is the completion of $C^\infty(X)$ with respect to the norm

$$\|f\|_{H^k_p(X)} = \sum_{j=0}^{k} \left( \int_X |\nabla^j f|^p\, dV \right)^{1/p},$$

where $\nabla^j f$ denotes the $j$th covariant derivative of $f$.

Theorem 4. If $f_c$ lies in the interpolation space $(L^1(X), H^1_2(X))_\theta$ for some $0 < \theta \le 1$, then by taking $\lambda = \big( \tfrac{(\log m)^2}{m} \big)^{\frac{2\theta + d}{12\theta + 2d}}$, we have

$$E_{z \in Z^m}\Big\{ \mathcal{R}\big(\mathrm{sgn}(f_{z,\lambda})\big) - \mathcal{R}(f_c) \Big\} \le C \left( \frac{(\log m)^2}{m} \right)^{\frac{\theta}{6\theta + d}},$$

where $C$ is a constant independent of $m$.

Ongoing topics: variable selection, dimensionality reduction, graph Laplacian, diffusion maps.
