Kernel Sliced Inverse Regression With Applications to Classification

May 21-24, 2008, Durham, NC
Kernel Sliced Inverse Regression With Applications to Classification
Han-Ming Wu (Hank)
Department of Mathematics, Tamkang University, Taipei, Taiwan
2008/05/22
http://www.hmwu.idv.tw

Outline (2/27)
- Kernel Methods, Kernel Trick
- Kernel Data and Its Properties
- SIR in the Euclidean Space
- Kernel SIR in a Non-linear Feature Space
- KSIR for Nonlinear Dimension Reduction and Data Visualization
- Experiments on Classification
- Conclusion and Future Direction

Kernel Methods (1) (3/27)
Aronszajn (1950) and Parzen (1962) were among the first to employ kernel methods in statistics. Aizerman et al. (1964) used positive definite kernels in a way closer to the kernel trick, arguing that a positive definite kernel is identical to a dot product in a feature space.

Kernel Methods (2) (4/27)
Boser et al. (1992) constructed SVMs, a generalization of the so-called optimal hyperplane algorithm. Scholkopf et al. (1998) pointed out that kernels can be used to construct a generalization of any algorithm that can be carried out in terms of dot products. The last 10 years have seen a large number of kernelizations of various algorithms (e.g., PCA, LDA, CCA, PLS, ...).

Prepare Kernel Data (5/27)
Theoretically: x_i · x_j → Φ(x_i) · Φ(x_j). In fact, the kernel supplies this value directly: k(x_i, x_j) = Φ(x_i) · Φ(x_j).

Data Representation (6/27)
Data are no longer represented individually, but only through a set of pairwise comparisons. The matrix used to represent a dataset of n objects is therefore always n by n.
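As a concrete illustration (an added sketch, not from the slides): building such an n-by-n Gram matrix with a Gaussian kernel, assuming the common parameterization k(x, y) = exp(-||x - y||^2 / (2 s^2)):

```python
import numpy as np

def gaussian_kernel_matrix(X, s=1.0):
    """n-by-n Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 s^2))."""
    sq_norms = np.sum(X**2, axis=1)
    # ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 x_i . x_j
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2 * s**2))

X = np.random.randn(150, 2)   # e.g., the size of the Square data below
K = gaussian_kernel_matrix(X, s=0.1)
print(K.shape)                # (150, 150): only pairwise comparisons remain
```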

Kernel as Inner Product (7/27)
A symmetric positive definite kernel is the inner product of a feature map into some Hilbert space H: k(x, y) = ⟨Φ(x), Φ(y)⟩_H (Aronszajn, 1950).

Kernel Trick (8/27)
The kernel trick transforms any algorithm that depends solely on dot products between pairs of vectors: wherever a dot product is used, it is replaced by the kernel function. The resulting non-linear algorithm is simply the linear algorithm operating in the feature space. Kernelization: the operation that turns a linear algorithm into a more general kernel method.
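A small example of the trick in action (my sketch, not the talk's code): squared distances in the feature space need no explicit Φ, because ||Φ(x) - Φ(y)||^2 expands into three kernel evaluations:

```python
import numpy as np

def gaussian_kernel(x, y, s=1.0):
    return np.exp(-np.sum((x - y)**2) / (2 * s**2))

def feature_space_sq_dist(x, y, k):
    # ||Phi(x) - Phi(y)||^2 = k(x, x) - 2 k(x, y) + k(y, y): no explicit Phi
    return k(x, x) - 2 * k(x, y) + k(y, y)

x, y = np.array([0.0, 1.0]), np.array([1.0, 0.0])
print(feature_space_sq_dist(x, y, gaussian_kernel))
```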

SIR in the Euclidean Space (9/27)
Sufficient dimension reduction. NOTE: For more details, see the work of R. Dennis Cook, School of Statistics, University of Minnesota (more than 50 related articles published!).

Classical SIR: Algorithm (10/27)
Weighted PCA: standardize X, slice the range of the response y into H slices, compute the slice means of the standardized predictors, and take the leading eigenvectors of the weighted covariance of these slice means (weights = slice proportions).
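A minimal NumPy sketch of this weighted-PCA view of SIR (my own illustration; the slicing scheme, equal-frequency slices here, may differ from the talk's):

```python
import numpy as np

def sir_directions(X, y, H=8, d=2):
    """Classical SIR: weighted PCA of the slice means of standardized X."""
    n, p = X.shape
    cov = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    cov_inv_sqrt = evecs @ np.diag(evals**-0.5) @ evecs.T
    Z = (X - X.mean(axis=0)) @ cov_inv_sqrt      # standardized predictors
    order = np.argsort(y)                        # H equal-frequency slices
    M = np.zeros((p, p))                         # weighted cov of slice means
    for idx in np.array_split(order, H):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    w, V = np.linalg.eigh(M)                     # the weighted PCA step
    B = cov_inv_sqrt @ V[:, np.argsort(w)[::-1][:d]]
    return B                                     # p-by-d e.d.r. directions

# Usage: X_reduced = (X - X.mean(0)) @ sir_directions(X, y, H=8, d=2)
```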

Kernel SIR in a Non-linear Feature Space (11/27)
Kernel SIR: kernelize the SIR algorithm.

KSIR: Algorithm (1) (12/27)

KSIR: Algorithm (2) (13/27)

KSIR: Algorithm (3) (14/27)
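The equations on these three slides did not survive the transcription, so the following is only a hedged sketch of the usual KSIR recipe, with the centered Gram matrix playing the role of the data matrix and a small ridge term added for numerical stability; details may differ from the talk:

```python
import numpy as np
from scipy.linalg import eigh

def ksir_directions(K, y, H=8, d=2, reg=1e-6):
    """KSIR sketch: run the SIR eigen-step on a double-centered Gram matrix.

    K is n-by-n (e.g., from gaussian_kernel_matrix above); reg is a small
    ridge term, since kernel matrices are typically ill-conditioned.
    """
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                              # center in feature space
    order = np.argsort(y)                       # equal-frequency slices along
    M = np.zeros((n, n))                        # sorted y (~class groups when
    for idx in np.array_split(order, H):        # y holds class labels)
        m = Kc[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)    # between-slice covariance
    C = np.cov(Kc, rowvar=False) + reg * np.eye(n)  # total covariance + ridge
    w, A = eigh(M, C)                           # generalized eigenproblem
    return Kc, A[:, np.argsort(w)[::-1][:d]]    # centered K, coefficients
```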

Normalization and Projection (15/27)
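This slide's body is likewise missing; under the same assumptions, projecting a new point amounts to evaluating the kernel against the training sample, re-centering consistently with training, and applying the coefficient vectors. A sketch:

```python
import numpy as np

def center_test_kernel(K_train, K_test):
    """Center a test-vs-train kernel block consistently with training."""
    col_mean = K_train.mean(axis=0)             # means over training points
    row_mean = K_test.mean(axis=1, keepdims=True)
    return K_test - col_mean[None, :] - row_mean + K_train.mean()

def ksir_project(K_train, K_test, A):
    """Project test points onto the KSIR directions (coefficient matrix A)."""
    return center_test_kernel(K_train, K_test) @ A
```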

Reduced Features (16/27)
For theoretical details: Lee, Y.-J. and Huang, S.-Y. (2006), Reduced support vector machines: a statistical theory, IEEE Transactions on Neural Networks, accepted. http://dmlab1.csie.ntust.edu.tw/downloads
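The experiments below mention "random sampling 200"; one plausible reading (my assumption, in the spirit of Lee and Huang's reduced kernel) is to replace the full n-by-n kernel matrix by a thin n-by-m block against m randomly sampled training points:

```python
import numpy as np

def reduced_kernel(X, m=200, s=0.1, seed=0):
    """Thin n-by-m kernel block against m randomly sampled training points."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(m, len(X)), replace=False)
    Xr = X[idx]
    sq_dists = (np.sum(X**2, 1)[:, None] + np.sum(Xr**2, 1)[None, :]
                - 2 * X @ Xr.T)
    return np.exp(-np.maximum(sq_dists, 0.0) / (2 * s**2)), idx
```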

KSIR for Nonlinear Dimension Reduction and Data Visualization (17/27)
Simulation data:
- Square Data (150x2, n/a)
- Three Clusters Data (220x2, no. of classes = 3)
- Li Data Model (6.3) (400x10, y continuous)
Real data:
- Wine Data (178x13, no. of classes = 3)
- Pendigit Data (7494x16, no. of classes = 10)

Visualization (1): Square Data, H = 8 (18/27)
[Figure: KPCA and KSIR projections (components V1, V2, V3) shown for d = 1, 2, 3, 4 and for Gaussian width s = 0.01, 0.1, 1, 10.]

Visualization (2): Three Clusters Data (19/27)
[Figure: KPCA and KSIR projections (components V1, V2, V3) shown for d = 1, 2, 3, 4 and for Gaussian width s = 0.01, 0.1, 1, 10.]

Visualization (3): Li Data Model (6.3) (20/27)
[Figure: original data and PCA, SIR (H = 13), KPCA, and KSIR (Gaussian, s = 0.05) projections.]

Visualization (4): Wine Data (21/27)
The Wine data (n = 178) are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wine.
[Figure: PCA, SIR, KPCA, and KSIR (Gaussian, s = 0.05) projections.]

Visualization (5): Pendigit Data (22/27)
Pen-based recognition of handwritten digits: 7494 instances, 16 attributes, 10 classes.
[Figure: PCA, SIR, KPCA, and KSIR (Gaussian, s = 0.05; random sampling of 200 points) projections.]

Classification (1): UCI Data Sets (23/27)
Setup: Gaussian kernel (s = 0.05), random sampling of 200 points, linear support vector machine; 10-fold classification error rates.
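To make this protocol concrete, here is a hedged scikit-learn sketch of the implied pipeline, reusing the helper sketches above (gaussian_kernel_matrix, ksir_directions, ksir_project); every parameter choice is illustrative rather than the talk's exact setting:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import StratifiedKFold

def cv_error(X, y, d=2, s=0.05, folds=10):
    """10-fold error of: Gram matrix -> KSIR features -> linear SVM."""
    H = len(np.unique(y))     # one slice per class (via equal-frequency slicing)
    errs = []
    for tr, te in StratifiedKFold(folds, shuffle=True, random_state=0).split(X, y):
        K_tr = gaussian_kernel_matrix(X[tr], s)
        Kc, A = ksir_directions(K_tr, y[tr], H=H, d=d)
        clf = LinearSVC().fit(Kc @ A, y[tr])
        # test-vs-train kernel block, centered and projected as in training
        sq = (np.sum(X[te]**2, 1)[:, None] + np.sum(X[tr]**2, 1)[None, :]
              - 2 * X[te] @ X[tr].T)
        Z_te = ksir_project(K_tr, np.exp(-sq / (2 * s**2)), A)
        errs.append(np.mean(clf.predict(Z_te) != y[te]))
    return np.mean(errs)
```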

10-fold Classification Error Rates: UCI Data Sets (24/27)

Classification: Microarray Data Sets (25/27)
Setup: linear support vector machine; leave-one-out classification error rates.

Conclusion and Future Direction (26/27)
- Use the kernel trick to study the linear SIR algorithm in the feature space: a nonlinear dimension-reduction subspace from the X viewpoint is a linear dimension-reduction subspace in H_k.
- Nonlinear dimension reduction and visualization for classification; apply to the clustering problem.
- SIR/KSIR: a tool for feature extraction and exploratory data analysis.
- Theoretical properties of kernel SIR.
- Selection of kernel parameters (model selection).

jdrcluster: Dimension Reduction and Cluster Analysis (27/27)
Thank You!
http://www.hmwu.idv.tw