Regularization in Reproducing Kernel Banach Spaces

Regularization in Reproducing Kernel Banach Spaces
Guohui Song
School of Mathematical and Statistical Sciences, Arizona State University
Comp Math Seminar, September 16, 2010
Joint work with Dr. Fred Hickernell (Illinois Institute of Technology) and Dr. Haizhang Zhang (Sun Yat-Sen University)

Outline
1. Scattered Data Approximation
2. Reproducing Kernel Hilbert Spaces
3. Reproducing Kernel Banach Spaces

Scattered Data Approximation
Setting: given data {(x_j, y_j) : j = 1, 2, ..., n} ⊂ R^d × R, find a function Pf that is a good fit to the given data.
Question 1: What is a good fit?

A 1-D Example
[figure: scattered data points in one dimension]

A Regularization Approach
We want to control both the closeness to the given data and the complexity of the function. A regularization approach:
Target functional: L(f, y, H) := Σ_{j=1}^n (f(x_j) - y_j)^2 + λ ||f||_H^2,
Pf := arg min_{f ∈ H} L(f, y, H).
Some fancy names: penalized least squares, ridge regression, smoothing splines.
Question 2: What is the hypothesis space H?

Reproducing Kernel Hilbert Spaces (RKHS)
We need H to be a Hilbert space in which norm convergence implies pointwise convergence:
||f_n - f||_H → 0 implies f_n(x) → f(x) for all x ∈ X.
RKHS: a Hilbert space H on which every point evaluation functional is continuous, i.e.,
|f(x)| ≤ M_x ||f||_H for all x ∈ X, f ∈ H.
Question 3: Where is the kernel?

Kernel
Suppose X is a subset of R^d and K is a real-valued function on X × X, i.e., K : X × X → R.
K is a kernel if for any positive integer m and any points X_m := {x_1, ..., x_m} ⊂ X, the Gram matrix K_m := [K(x_j, x_l) : 1 ≤ j, l ≤ m] is symmetric and positive semi-definite.
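The definition above is easy to probe numerically. The following sketch (my own illustration, not from the slides; all names are mine) assembles a Gram matrix for a candidate kernel and checks symmetry and positive semi-definiteness:

```python
import numpy as np

def gram_matrix(kernel, points):
    """Assemble the Gram matrix K_m = [K(x_j, x_l)] for a list of points."""
    m = len(points)
    return np.array([[kernel(points[j], points[l]) for l in range(m)]
                     for j in range(m)])

def is_psd(K, tol=1e-10):
    """A valid Gram matrix is symmetric with nonnegative eigenvalues
    (up to a small numerical tolerance)."""
    return bool(np.allclose(K, K.T) and np.linalg.eigvalsh(K).min() >= -tol)

# The Gaussian kernel (with w = 1) is positive definite, so any Gram
# matrix built from distinct points passes the check.
gaussian = lambda s, t: np.exp(-(s - t) ** 2)
pts = [0.0, 0.3, 1.1, 2.5]
K = gram_matrix(gaussian, pts)
print(is_psd(K))  # True
```

Checking finitely many point sets this way can only refute the kernel property, never prove it, since the definition quantifies over all finite subsets of X.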

Connections Between RKHS and Kernels
[Aronszajn, 1950] There is a bijection between RKHSs and kernels such that
K(·, x) ∈ H for any x ∈ X,
f(x) = (f, K(·, x))_H for any f ∈ H and x ∈ X.
Some properties of the RKHS H_K and the kernel K:
H_0 := span{K(·, x) : x ∈ X} is dense in H_K.
For any f = Σ_{j=1}^m c_j K(·, x_j) ∈ H_0, ||f||_{H_K} = ||K_m^{1/2} c||_2.

Some Examples of RKHS and Kernels
Sobolev space H^2(R): K(s, t) = (√3/3) e^{-(√3/2)|s-t|} sin(|s-t|/2 + π/6).
C^0 Matérn kernel: K(s, t) = e^{-|s-t|}.
Gaussian kernel: K(s, t) = e^{-w(s-t)^2}, w > 0.
Sinc kernel: K(s, t) = sinc(s - t).
Polynomial kernels: K(s, t) = (st)^d, d = 1, 2, ....
L^2(R) is NOT an RKHS.
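Each of these kernels can be written down and sanity-checked in a few lines. A sketch of my own follows; the constants in the Sobolev kernel are a reconstruction of one common closed form and may differ from the talk's exact slide, and `np.sinc` uses the normalized convention sin(πx)/(πx):

```python
import numpy as np

def psd_check(kernel, pts, tol=1e-8):
    """Symmetry and approximate positive semi-definiteness of a Gram matrix."""
    K = kernel(pts[:, None], pts[None, :])
    return bool(np.allclose(K, K.T) and np.linalg.eigvalsh(K).min() >= -tol)

# H^2(R) kernel: constants are my reconstruction, not guaranteed to match the slide.
sobolev  = lambda s, t: (np.sqrt(3) / 3) * np.exp(-(np.sqrt(3) / 2) * np.abs(s - t)) \
                        * np.sin(np.abs(s - t) / 2 + np.pi / 6)
matern0  = lambda s, t: np.exp(-np.abs(s - t))      # C^0 Matern
gaussian = lambda s, t: np.exp(-(s - t) ** 2)       # Gaussian, w = 1
sinc_k   = lambda s, t: np.sinc(s - t)              # sin(pi x)/(pi x) convention
poly2    = lambda s, t: (s * t) ** 2                # polynomial, d = 2

pts = np.array([-1.3, -0.2, 0.4, 1.0, 2.2])
for name, k in [("Sobolev", sobolev), ("Matern", matern0),
                ("Gaussian", gaussian), ("sinc", sinc_k), ("poly d=2", poly2)]:
    print(name, psd_check(k, pts))  # each should report a PSD Gram matrix
```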

Regularization in the RKHS H_K
[Kimeldorf and Wahba, 1971] Representer Theorem:
Target functional: L(f, y, H_K) = Σ_{j=1}^n (f(x_j) - y_j)^2 + λ ||f||_{H_K}^2.
Let S_n := span{K(·, x_j) : j = 1, 2, ..., n}. The optimization problem reduces to a finite-dimensional one:
min_{f ∈ H_K} L(f, y, H_K) = min_{f ∈ S_n} L(f, y, H_K).
The minimizer is given explicitly by Pf = Σ_{j=1}^n α_j K(·, x_j), where α = (K_n + λ I_n)^{-1} y.
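The explicit formula α = (K_n + λ I_n)^{-1} y translates directly into code. A minimal sketch of my own, using the C^0 Matérn kernel from the examples and synthetic data (the function and parameter choices are mine):

```python
import numpy as np

def fit(kernel, x, y, lam):
    """Solve (K_n + lam * I_n) alpha = y, per the representer theorem."""
    K = kernel(x[:, None], x[None, :])
    return np.linalg.solve(K + lam * np.eye(len(x)), y)

def predict(kernel, x, alpha, t):
    """Evaluate Pf(t) = sum_j alpha_j K(t, x_j)."""
    return kernel(np.asarray(t)[:, None], x[None, :]) @ alpha

matern0 = lambda s, t: np.exp(-np.abs(s - t))  # C^0 Matern kernel
x = np.linspace(0.0, 2.0, 20)
y = np.sin(2.0 * x)                            # synthetic noiseless data
alpha = fit(matern0, x, y, lam=1e-6)

# With a tiny lambda the regularized solution nearly interpolates the data.
residual = np.max(np.abs(predict(matern0, x, alpha, x) - y))
print(residual)
```

As λ → 0 the solution approaches the kernel interpolant, and larger λ trades data fit for a smaller RKHS norm.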

Reproducing Kernel Banach Spaces
We try to construct a Banach space B on which the point evaluation functionals δ_x are continuous.
A specific construction: let B_0 := span{K(·, x) : x ∈ X}, and for any f = Σ_{j=1}^m c_j K(·, x_j) ∈ B_0 define ||f||_B := ||c||_1.
δ_x is continuous on B_0 if K(·, ·) is uniformly bounded.
Let B be the Banach completion of B_0 with respect to the norm ||·||_B.

Some Properties of the RKBS
[Song, 2010+] The point evaluation functionals are continuous on B if and only if
Σ_{j=1}^∞ α_j K(·, x_j) = 0 implies α = 0.
[Song, 2010+] The reproducing property still holds. Define a bilinear form <·, ·> on B_0 × B_0 such that
< Σ_{j=1}^m α_j K(·, x_j), Σ_{j=1}^m β_j K(·, x_j) > = α^T K_m β.
The bilinear form <·, ·> can be extended to B × B such that
< f, K(·, x) > = f(x) for all x ∈ X and f ∈ B.
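At the nodes themselves, the claimed identity <f, K(·, x_i)> = f(x_i) for f ∈ B_0 reduces to the symmetry of K_m, which a quick numerical sanity check (my own, with the exponential kernel) confirms:

```python
import numpy as np

exp_kernel = lambda s, t: np.exp(-np.abs(s - t))
x = np.array([0.0, 0.5, 1.2, 2.0])         # nodes x_1, ..., x_m
K = exp_kernel(x[:, None], x[None, :])     # Gram matrix K_m
alpha = np.array([1.0, -2.0, 0.5, 3.0])    # f = sum_j alpha_j K(., x_j)

# <f, K(., x_i)> = alpha^T K_m e_i should reproduce f(x_i) = sum_j alpha_j K(x_i, x_j).
for i in range(len(x)):
    pairing = alpha @ K[:, i]
    f_at_xi = np.sum(alpha * exp_kernel(x[i], x))
    assert np.isclose(pairing, f_at_xi)
print("reproducing property verified at the nodes")
```

The nontrivial content of the result is extending the bilinear form from B_0 × B_0 to all of B × B; the check above only exercises the finite-dimensional definition.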

Regularization in the RKBS
Target functional: L(f, y, B) = Σ_{j=1}^n (f(x_j) - y_j)^2 + λ ||f||_B.
Recall S_n = span{K(·, x_j) : j = 1, 2, ..., n}. Does the optimization problem reduce to a finite-dimensional one, i.e., does
min_{f ∈ B} L(f, y, B) = min_{f ∈ S_n} L(f, y, B)?
If it does, how do we find the minimizer Pf = Σ_{j=1}^n α_j K(·, x_j)?

Regularization and Interpolation
Define the interpolation set I_n(y) := {f ∈ B : f(x_j) = y_j, j = 1, 2, ..., n}.
[Song, 2010+] The following two statements are equivalent:
min_{f ∈ B} L(f, y, B) = min_{f ∈ S_n} L(f, y, B) for all y ∈ R^n;
min_{f ∈ I_n(y)} ||f||_B = min_{f ∈ I_n(y) ∩ S_n} ||f||_B for all y ∈ R^n.
Note that I_n(y) ∩ S_n has exactly one element when K_n is invertible, so we only need to show that the minimal norm interpolation problem admits a minimizer in the finite-dimensional space S_n.

Representer Theorem in the RKBS
Let k(x) := (K(x, x_1), ..., K(x, x_n))^T.
[Song, 2010+] Minimal norm interpolation:
min_{f ∈ I_n(y)} ||f||_B = min_{f ∈ I_n(y) ∩ S_n} ||f||_B for all y ∈ R^n
if and only if ||K_n^{-1} k(x)||_1 ≤ 1 for all x ∈ X.
[Song, 2010+] Regularization:
min_{f ∈ B} L(f, y, B) = min_{f ∈ S_n} L(f, y, B) for all y ∈ R^n
if and only if ||K_n^{-1} k(x)||_1 ≤ 1 for all x ∈ X.

Some Examples
The condition ||K_n^{-1} k(x)||_1 ≤ 1 is not easy to check. We have only been able to find two kernels satisfying it so far:
K(s, t) = min{s, t} - st, s, t ∈ [0, 1];
K(s, t) = e^{-|s-t|}, s, t ∈ R.
Counterexamples that do not satisfy this condition:
Gaussian kernel: K(s, t) = e^{-(s-t)^2}, s, t ∈ R;
Sinc kernel: K(s, t) = sinc(s - t), s, t ∈ R.
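Although the condition is hard to verify analytically, it can be probed numerically on a grid of evaluation points. The sketch below (my own; the node and grid choices are arbitrary) estimates sup_x ||K_n^{-1} k(x)||_1 for the exponential kernel, which satisfies the condition, and the Gaussian kernel, which does not:

```python
import numpy as np

def max_l1(kernel, nodes, grid):
    """Approximate sup_x ||K_n^{-1} k(x)||_1 over a grid of test points x."""
    K = kernel(nodes[:, None], nodes[None, :])
    kx = kernel(nodes[:, None], grid[None, :])   # column j holds k(grid[j])
    return np.abs(np.linalg.solve(K, kx)).sum(axis=0).max()

nodes = np.linspace(0.0, 1.0, 5)
grid = np.linspace(-1.0, 2.0, 601)               # includes extrapolation regions

exp_kernel = lambda s, t: np.exp(-np.abs(s - t))
gauss      = lambda s, t: np.exp(-(s - t) ** 2)

print(max_l1(exp_kernel, nodes, grid))  # stays at 1 (up to rounding)
print(max_l1(gauss, nodes, grid))       # exceeds 1 at some x
```

At any node x_i the vector K_n^{-1} k(x_i) is the i-th unit vector, so the supremum is always at least 1; the question is whether it ever exceeds 1 between or beyond the nodes.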

How to Find the Minimizer?
min_{f ∈ S_n} L(f, y, B) = min { Σ_{j=1}^n (f(x_j) - y_j)^2 + λ ||c||_1 : f = Σ_{j=1}^n c_j K(·, x_j) }.
We do not have a closed form for the minimizer. Standard optimization methods may do, but we still need efficient methods, especially for large data sizes.
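Since f(x_j) = (K_n c)_j when f = Σ_l c_l K(·, x_l), the finite-dimensional problem is an ℓ1-regularized least squares (lasso-type) problem in c. One standard off-the-shelf method is ISTA (proximal gradient descent with soft thresholding); the sketch below is my own generic illustration, not the efficient method the slide calls for:

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def rkbs_regularization(K, y, lam, iters=5000):
    """Minimize ||K c - y||_2^2 + lam * ||c||_1 by ISTA,
    where f(x_j) = (K c)_j for f = sum_l c_l K(., x_l)."""
    step = 1.0 / (2.0 * np.linalg.norm(K, 2) ** 2)   # 1/L for the smooth part
    c = np.zeros(len(y))
    for _ in range(iters):
        grad = 2.0 * K.T @ (K @ c - y)               # gradient of the squared loss
        c = soft_threshold(c - step * grad, step * lam)
    return c

exp_kernel = lambda s, t: np.exp(-np.abs(s - t))
x = np.linspace(0.0, 2.0, 15)
y = np.sin(2.0 * x)
K = exp_kernel(x[:, None], x[None, :])
c = rkbs_regularization(K, y, lam=0.1)
obj = np.sum((K @ c - y) ** 2) + 0.1 * np.abs(c).sum()
print(obj)  # below the objective value at c = 0, i.e. ||y||^2
```

The ℓ1 penalty tends to zero out many coefficients, so the RKBS formulation can yield sparser expansions than the RKHS one, at the cost of losing the closed-form solution.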

Thank you!