1 Feature Transforms for Classification Problems. Sergios Petridis. National and Kapodistrian University of Athens, School of Sciences, Department of Informatics and Telecommunications; NCSR Demokritos, Institute of Informatics and Telecommunications, Computational Intelligence Laboratory. March 2006

2 Outline LDA HDA MMI relation

3 Feature vector: Observation → Feature Generation → x ∈ R^n → Feature Transform → y ∈ R^m

4 Feature transforms: y = f(x), f : R^n → R^m. Roles: adjusting dimensionality, changing axes, reforming the space. Transform families: scaling, selection, linear, non-linear scaling, non-linear. Transform properties: linear, feature-oriented, invertible.

5 Supervised transform learning. Criterion: suitability with respect to a classification problem. Features should be more useful, less redundant, well balanced. Final goal: assist the classifier, facilitate visualization. [Figure: samples in the (x_1, x_2) plane shown without class labels, and their transform y]

6 Supervised transform learning. Criterion: suitability with respect to a classification problem. Features should be more useful, less redundant, well balanced. Final goal: assist the classifier, facilitate visualization. [Figure: the same samples shown with class labels, and their transform y]

7 Thesis aims: 1) formulate a unifying framework for the supervised learning of feature transforms, associating known criteria and methods; 2) engineer new learning algorithms based on general criteria, adapted to the transform type and properties.

8 Outline LDA HDA MMI relation

9 Outline LDA HDA MMI relation

10 Supervised Linear Feature Extraction (SLFE): y = A x, R^n → R^m, m < n. Equivalent expressions: extraction of m linear features; search for m projections; search for a matrix A [n × m], A = [a_1 a_2 ... a_m]. [Figure: network view mapping inputs x_1 ... x_n to outputs y_1 ... y_m through weights a_ij]
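A minimal numpy sketch of this mapping (the toy data and the matrix A are illustrative, not from the thesis): each extracted feature y_j is the projection of x onto a direction a_j, so with the a_j stacked as columns of A the whole dataset is transformed at once.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))              # 100 samples, n = 5 original features

# m = 2 projection directions a_1, a_2, stacked as the columns of an n x m matrix A
A, _ = np.linalg.qr(rng.normal(size=(5, 2)))

# Each extracted feature is y_j = a_j . x; for all samples at once, Y = X A
Y = X @ A
print(Y.shape)                             # (100, 2)
```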

11 Minimum Bayes risk: B_R(C|X) = E[ r(ĥ(x)|x) ]. The minimum risk depends on: the feature vector, i.e. the joint distribution p(X = x, C = c); the risk matrix R = [0 α; β 0]. [Figure: class-conditional densities p(x|c_1), p(x|c_2) and the decision rule h(x) for the symmetric case α = β]

12 Minimum Bayes risk: B_R(C|X) = E[ r(ĥ(x)|x) ]. The minimum risk depends on: the feature vector, i.e. the joint distribution p(X = x, C = c); the risk matrix R = [0 α; β 0]. [Figure: class-conditional densities p(x|c_1), p(x|c_2) and the decision rule h(x) for the asymmetric case α > β, which shifts the decision boundary]
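A numerical sketch of the minimum Bayes risk for a two-class problem (the Gaussian class conditionals, the priors and the cost values are illustrative assumptions, not the thesis setup): the conditional risk of a decision is r(ĉ|x) = Σ_c R[c, ĉ] p(c|x), the Bayes rule ĥ(x) picks the cheaper decision at each x, and B_R(C|X) is the expectation of that minimum.

```python
import numpy as np
from scipy.stats import norm

# Illustrative two-class 1-D problem: p(x|c1) = N(-1, 1), p(x|c2) = N(+1, 1), equal priors.
priors = np.array([0.5, 0.5])
densities = [norm(-1.0, 1.0), norm(+1.0, 1.0)]

# Risk matrix R[true class, decided class] with zero cost on the diagonal.
alpha, beta = 1.0, 3.0
R = np.array([[0.0, alpha],
              [beta, 0.0]])

# Numerical integration grid over x.
x = np.linspace(-8.0, 8.0, 4001)
p_x_c = np.vstack([d.pdf(x) for d in densities])        # p(x|c), shape (2, len(x))
p_joint = priors[:, None] * p_x_c                       # p(x, c)
p_x = p_joint.sum(axis=0)                               # p(x)
posteriors = p_joint / p_x                              # p(c|x)

# Conditional risk of each decision, r(c_hat|x) = sum_c R[c, c_hat] p(c|x);
# the Bayes rule takes the decision with the smaller risk at every x.
cond_risk = R.T @ posteriors                            # shape (2, len(x))
bayes_risk = np.trapz(cond_risk.min(axis=0) * p_x, x)   # B_R(C|X) = E[ r(h(x)|x) ]
print(f"Minimum Bayes risk: {bayes_risk:.4f}")
```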

13 The Bayes optimal transform. The Bayes criterion: the optimal transform is the one extracting features that imply the minimum Bayes risk, Â = argmin_A B_R(C | A X). Subspace optimality: optimality is not associated with the individual features but with the subspace they span.

14 The number of features. Arguments in favor of dimensionality reduction: visualisation, computational complexity, classifier performance. However, dimensionality reduction will not lead to a reduction of the Bayes risk. [Figure: Bayes risk versus dimension, decreasing from its maximum at dimension 0 to its minimum at dimension n]

15 Goals. Three distinct goals: no loss with a known risk matrix; no loss with an unknown risk matrix; minimum Bayes risk under a constraint on the dimension (min B_R subject to m ≤ m̂). [Figure: Bayes risk versus dimension for risk matrices R_1 and R_2, marking the no-loss dimensions m_1, m_2 and the constraint m̂ between 0 and n]

16 Zero Information / Risk Loss. We define two models: the Zero Information Loss model (ZIL), in which a statistic s of x preserves all the class information (C is independent of X given S); and the Zero Risk Loss model (ZRL), in which r(ĉ|x) = r(ĉ|s) for every x. For 2-class problems, ZRL and ZIL coincide. [Diagram: decomposition of x through A into the statistic s and a complement ζ]

17 No-loss constraint. We prove that: under ZIL, lossless SLFE with an unknown risk matrix leads to exactly d features; under ZRL, lossless SLFE with a known risk matrix leads to a subspace of dimension at most d. [Figure, Example 1 (ZIL): decision regions R1, R2, R3 in the (x1, x2) plane, with the statistic s and the complement ζ]

18 No-loss constraint. We prove that: under ZIL, lossless SLFE with an unknown risk matrix leads to exactly d features; under ZRL, lossless SLFE with a known risk matrix leads to a subspace of dimension at most d. [Figure, Example 2 (ZIL, ZRL): decision regions R_1a, R_1b, R_1c, R_2, R_3 in the (x_1, x_2) plane, with the statistic s and the complement ζ]

19 No-loss constraint. We prove that: under ZIL, lossless SLFE with an unknown risk matrix leads to exactly d features; under ZRL, lossless SLFE with a known risk matrix leads to a subspace of dimension at most d. [Figure, Example 3 (neither ZIL nor ZRL): decision regions R_1, R_1b, R_2, R_2b, R_3 in the (x_1, x_2) plane, with statistics s, s_2 and complements ζ, ζ_2]

20 SLFE goals: conclusions. Extracting linear features vs. extracting a feature subspace. Knowing the risk matrix is significant for optimal reduction. For 2-class problems, ZRL = ZIL.

21 Outline LDA HDA MMI relation

22 LDA - HDA - MMI relation. Three distinct criteria, all different from the Bayes criterion. Experimental evidence: LDA has small complexity and is efficient; HDA has moderate complexity and is superior to LDA; MMI has high complexity and is superior to LDA. Structural relation: what is the connection between these criteria? Relation in form, relation in optimality, derivation of one from the other.

23 Linear Discriminant Analysis (LDA), Fisher. Large variance of the inter-class means, small within-class variance. Optimality: the homoscedastic Gaussian model (HOG). [Figure: two classes with means µ_1, µ_2 in the (x_1, x_2) plane and the LDA direction A]
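For reference, a compact sketch of Fisher's LDA in its usual form (textbook formulation, not the thesis code): the projection directions are the leading eigenvectors of the generalized eigenproblem S_b a = λ S_w a, with between-class scatter S_b and within-class scatter S_w.

```python
import numpy as np
from scipy.linalg import eigh

def lda_directions(X, y, m):
    """Return the m LDA projection directions (as columns) for data X and labels y."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    n = X.shape[1]
    S_w = np.zeros((n, n))   # within-class scatter
    S_b = np.zeros((n, n))   # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        S_w += (Xc - mu_c).T @ (Xc - mu_c)
        d = (mu_c - mean_all)[:, None]
        S_b += len(Xc) * (d @ d.T)
    # Generalized eigenproblem S_b a = lambda S_w a; keep the top-m eigenvectors.
    evals, evecs = eigh(S_b, S_w)
    order = np.argsort(evals)[::-1]
    return evecs[:, order[:m]]

# Toy usage: two Gaussian classes sharing the same covariance (an HOG-like case).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], [1.0, 0.3], size=(200, 2)),
               rng.normal([2, 1], [1.0, 0.3], size=(200, 2))])
y = np.repeat([0, 1], 200)
A = lda_directions(X, y, m=1)
print(A.ravel())
```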

24 Heteroscedastic Discriminant Analysis (HDA), Kumar and Andreou 1996. There is a subspace in which both the class means and the class covariance matrices are identical. Optimality: the Kumar-Andreou heteroscedastic model (KAH). [Figure: two classes with µ_1 = µ_2 along the rejected directions in the (x_1, x_2) plane and the HDA direction A]

25 Maximization of Mutual Information (MMI), Shannon; Lewis 1962. Knowledge of the feature vector reduces the class uncertainty: I(C; X) = H(C) − H(C|X). Optimality: ZIL. [Figure: two-class data in the (x_1, x_2) plane]
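To make I(C; X) = H(C) − H(C|X) concrete, a small plug-in estimate on a single feature (the histogram estimator and the toy data are illustrative choices, not the estimator used in the thesis):

```python
import numpy as np

def mutual_information(feature, labels, bins=20):
    """Plug-in estimate of I(C; X) = H(C) - H(C|X) from a joint histogram."""
    joint, _, _ = np.histogram2d(feature, labels, bins=[bins, len(np.unique(labels))])
    p_xc = joint / joint.sum()                    # p(x, c) on the grid
    p_x = p_xc.sum(axis=1, keepdims=True)         # p(x)
    p_c = p_xc.sum(axis=0, keepdims=True)         # p(c)
    nz = p_xc > 0
    return float(np.sum(p_xc[nz] * np.log(p_xc[nz] / (p_x @ p_c)[nz])))

# Toy usage: a feature that separates the classes carries more information than noise.
rng = np.random.default_rng(1)
labels = np.repeat([0, 1], 500)
informative = np.concatenate([rng.normal(-1, 1, 500), rng.normal(1, 1, 500)])
noise = rng.normal(0, 1, 1000)
print(mutual_information(informative, labels), mutual_information(noise, labels))
```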

26 Original criteria formulation, Â = argmax_A of: LDA: log|cov(AX)| − log|cov(AX|C)|. HDA: −Σ_{k=1..K} p(c_k) log|cov(AX|c_k)| − log|cov(A_c X)| + 2 log|Ã|, where A_c denotes the complementary projection and Ã the full square matrix. MMI: I(C; AX).

27 Criteria reformulation, Â = argmax_A of: LDA: log( |cov(AX)| / |Σ_{k=1..K} p(c_k) cov(AX|c_k)| ). HDA: log( |cov(AX)| / Π_{k=1..K} |cov(AX|c_k)|^{p(c_k)} ). MMI: log( |cov(AX)| / Π_{k=1..K} |cov(AX|c_k)|^{p(c_k)} ) − 2 ( J(AX) − J(AX|C) ).
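A sketch evaluating the three reformulated objectives for a candidate projection A (simplification on my part: the projected features are treated as Gaussian, so the negentropy correction J(AX) − J(AX|C) in the MMI line vanishes and MMI coincides with HDA; the toy data and the choice of A are illustrative).

```python
import numpy as np

def logdet(M):
    """log-determinant of a covariance matrix."""
    _, val = np.linalg.slogdet(M)
    return val

def criteria(A, X, y):
    """LDA, HDA and Gaussian-case MMI objectives for the projection Y = X A."""
    Y = X @ A
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / len(y)
    total = logdet(np.cov(Y.T))
    within = [np.cov(Y[y == c].T) for c in classes]
    lda = total - logdet(sum(p * S for p, S in zip(priors, within)))
    hda = total - sum(p * logdet(S) for p, S in zip(priors, within))
    mmi = hda   # Gaussian features: the negentropy difference J(AX) - J(AX|C) is zero
    return lda, hda, mmi

# Toy usage: heteroscedastic two-class data, projected onto the first two axes.
rng = np.random.default_rng(2)
X = np.vstack([rng.multivariate_normal([0, 0, 0], np.diag([1.0, 1.0, 5.0]), 300),
               rng.multivariate_normal([2, 0, 0], np.diag([1.0, 3.0, 5.0]), 300)])
y = np.repeat([0, 1], 300)
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
print(criteria(A, X, y))
```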

28 Model hierarchy. We prove that: when a problem conforms to HOG, it also conforms to KAH; when a problem conforms to KAH, it also conforms to ZIL. A common source-plus-noise view applies to all three models. [Diagram: nested models, ZIL containing KAH containing HOG]

29 Criteria equivalence. Under ZIL: Bayes = MMI. Under KAH: Bayes = MMI = HDA. Under HOG: Bayes = MMI = HDA = LDA. This gives a derivation of LDA through information theory.

30 Simulations (1): success of all three criteria. [Figure: two-class data and the directions found by MMI, HDA and LDA]

31 Simulations (2): success of HDA and MMI. [Figure: two-class data and the directions found by MMI, HDA and LDA]

32 Simulations (3): success of MMI. [Figure: two-class data and the directions found by MMI, HDA and LDA]

33 LDA - HDA - MMI: conclusions. Hierarchical connection between the criteria. The MMI criterion is optimal over a wider range of problems. Derivation of the LDA criterion from information theory.

34 Outline LDA HDA MMI relation

35 Motivation. A change of the Bayes risk within a subspace implies the usefulness of that subspace: BR_var(A | C, X) = ∫_{R^n} | ∂r(ĉ|x)/∂A |^2 p(x) dx, with local sensitivity ∂r(ĉ|x)/∂A.

36 Subspace risk sensitivity. The Bayes risk sensitivity criterion: Maximization of Bayes Risk Sensitivity (MBRS); the optimal transform is the one that extracts features implying maximum Bayes risk sensitivity, Â_MBRS = argmax_A BR_var(A | C, X). The zero risk sensitivity loss model: Zero Risk Sensitivity Loss (ZRSL); ZRL plus differentiability implies ZRSL.

37 The SPCA algorithm: Supervised Principal Component Analysis (SPCA). Define the feature risk sensitivity matrix. Express the local sensitivity as the trace of a quadratic form. Solve an eigenvector-eigenvalue problem. Evaluate the sensitivities as derivative estimates via the Parzen method. Reduction to Principal Component Analysis (PCA) by replacing each sample with the corresponding sensitivity vector [ ∂r(ĉ|x_t)/∂x_1, ∂r(ĉ|x_t)/∂x_2, ..., ∂r(ĉ|x_t)/∂x_n ].
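A rough sketch of the SPCA procedure outlined above, under simplifying assumptions of mine: two classes with 0/1 costs, so r(ĉ|x) reduces to the smaller posterior; the posterior is estimated with a Gaussian Parzen window and its derivatives by finite differences rather than analytically; PCA is then applied to the per-sample sensitivity vectors instead of the samples themselves.

```python
import numpy as np

def parzen_density(x, samples, h):
    """Gaussian Parzen estimate of p(x) from the given samples (bandwidth h)."""
    d = samples.shape[1]
    k = np.exp(-0.5 * np.sum((samples - x) ** 2, axis=1) / h**2)
    return k.sum() / (len(samples) * (2 * np.pi * h**2) ** (d / 2))

def risk(x, X, y, h):
    """Estimated conditional Bayes risk r(c_hat|x) for 0/1 costs (two-class case):
    equal to the smaller of the two posteriors."""
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / len(y)
    joint = np.array([p * parzen_density(x, X[y == c], h) for p, c in zip(priors, classes)])
    return joint.min() / joint.sum()

def spca(X, y, h=0.5, eps=1e-3):
    """SPCA sketch: PCA-style eigen-analysis of numerically estimated risk sensitivities."""
    n = X.shape[1]
    sens = np.zeros_like(X)
    for t, x in enumerate(X):
        for i in range(n):                 # finite-difference gradient of the risk
            e = np.zeros(n)
            e[i] = eps
            sens[t, i] = (risk(x + e, X, y, h) - risk(x - e, X, y, h)) / (2 * eps)
    # Uncentered second moment of the sensitivities, matching the BR_var integral.
    cov = sens.T @ sens / len(X)
    evals, evecs = np.linalg.eigh(cov)
    return evecs[:, ::-1], evals[::-1]     # directions of largest risk variation first

# Toy usage: the class boundary varies only along the first axis,
# so the leading SPCA direction should align with it.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal([-1, 0], 1.0, (200, 2)), rng.normal([1, 0], 1.0, (200, 2))])
y = np.repeat([0, 1], 200)
directions, _ = spca(X, y)
print(directions[:, 0])
```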

38 Evaluation (1): methodology. Goals: no loss; 1, 2, 3 features. Framework: comparison against LDA; classifier: k-NN; 30 cross-validation runs; paired t-test (0.95). [Table: benchmark problems cancer, card, diabetes, glass, heart, horse, iris, sonar, soybean, xorrot, with their number of features, dimension, classes and samples; numeric entries not preserved]

39 Evaluation (2): artificial data, rotated XOR (XORROT). [Figure: the best 2 dimensions found on XORROT]

40 Evaluation (2): optimal dimensionality. Per problem, the optimal reduced dimension and the associated performance: cancer 1, 90%; card 2, 96%; diabetes 2, 75%; glass 6, 67%; heart 3, 91%; horse 2, 96%; iris 1, 75%; sonar 14, 77%; soybean 20, 76%; xorrot 2, 75%. [Table columns for the comparison, performance difference and t-test not preserved]

41 Evaluation (3): constrained number of dimensions. [Table: per problem (cancer, card, diabetes, glass, heart, horse, iris, sonar, soybean, xorrot), the performance difference and t-test result for 1, 2 and 3 extracted features; numeric entries not preserved]

42 Evaluation (4): performance against dimension. [Figures: classification performance versus number of features for LDA and SPCA on the soybean and sonar problems]

43 The MBRS criterion: conclusions. Equivalence between MBRS and the Bayes criterion under ZRSL. Formulation as an eigenvector-eigenvalue problem. Parzen-based sensitivity estimation. SPCA wins over LDA.

44 Outline LDA HDA MMI relation

45 Outline LDA HDA MMI relation

46 Motivation. The Bayes and MMI criteria do not account for the increase in generalization ability (the curse-of-dimensionality problem). For invertible transforms the Bayes and MMI criteria do not apply: B_R(C | f(X)) = B_R(C | X) and I(f(X); C) = I(X; C).

47 Experiment: impact of sphering. [Table: per problem (cancer, card, diabetes, gene, glass, heart, horse, iris, sonar, soybean, xorrot), performance in the original versus the sphered space, with difference and t-test; numeric entries not preserved]

48 Experiment: scaling vs rejection. Invertible linear transform A = D Q (diagonal scaling D times rotation Q); rejection as the limit A = lim_{d_l → 0, l ∈ L} D Q. [Figures: performance curves for LDA and SPCA on the card, diabetes and cancer problems]

49 Focusing in feature space. Focalization kernel K(s | z, Φ) ∝ e^{−z |Φ s|^2}, with focus magnitude z and focus angle matrix Φ. Effective density: P(X = x | z, Φ) = ∫_{R^n} K(s | z(x), Φ(x)) p(X = x + s) ds. [Figure: local focusing regions in the (X_1, X_2) plane]
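To illustrate the effective density: if p is itself represented by a Gaussian Parzen estimate, convolving it with the Gaussian focalization kernel K(s|z, Φ) ∝ e^{−z|Φs|^2} simply adds the kernel covariance (2 z ΦᵀΦ)⁻¹ to each Parzen component. The closed form below rests on that Gaussian-Parzen assumption (mine, for illustration), with z and Φ held constant over the space.

```python
import numpy as np
from scipy.stats import multivariate_normal

def effective_density(x, samples, h, z, Phi):
    """Effective density P(X = x | z, Phi): a Gaussian Parzen estimate of p convolved
    with the focalization kernel K(s|z, Phi) proportional to exp(-z ||Phi s||^2)."""
    n = samples.shape[1]
    cov_parzen = h**2 * np.eye(n)
    cov_focus = np.linalg.inv(2.0 * z * Phi.T @ Phi)   # covariance implied by the kernel
    cov = cov_parzen + cov_focus                       # convolving two Gaussians adds covariances
    return np.mean([multivariate_normal.pdf(x, mean=s, cov=cov) for s in samples])

# Toy usage: a large focus magnitude z means a narrow kernel, so the effective
# density approaches the plain Parzen estimate (the "infinite accuracy" limit).
rng = np.random.default_rng(4)
samples = rng.normal(0.0, 1.0, size=(300, 2))
x0 = np.array([0.5, -0.2])
Phi = np.eye(2)
for z in (0.1, 1.0, 100.0):
    print(z, effective_density(x0, samples, h=0.3, z=z, Phi=Phi))
```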

50 Effective random variable X̃_{z,Φ}: p(X̃_{z,Φ} = x) = P(X = x | z, Φ) / ∫_{R^n} P(X = x | z, Φ) dx. Effective functionals: the effective Bayes risk is B_R(C | X̃_{z,Φ}) and the effective information is I_{z,Φ}(X; C) = I(X̃_{z,Φ}; C). Limit of infinite accuracy: lim_{z→∞} X̃_{z,Φ} = X.

51 Generalized learning criteria. Space suitability criterion: stationarity, Φ(x) = Φ; isotropy, Φ = I_n. Minimization of Isotropic Risk (MIR): f̂ = argmin_{f ∈ F} B_R(C | f(X)). Maximization of Isotropic Information (MII): f̂ = argmax_{f ∈ F} I(f(X); C).

52 Generalized learning criteria: finding the optimal transform. Approximation: find locally optimal transforms, then aggregate them into a global transform. [Figure: local kernels K_1, K_2, K_3 around x_0 in the (X_1, X_2) plane, with classes ω_1, ω_2]

53 Generalized learning criteria: finding the optimal transform. Approximation: find locally optimal transforms, then aggregate them into a global transform.

54 Generalized learning criteria: finding the optimal transform. Approximation: find locally optimal transforms, then aggregate them into a global transform. [Figure: the locally transformed space (Y_1, Y_2) around f(x_0), with kernel K_2 and classes ω_1, ω_2]

55 Generalized criteria: conclusions. Integration of partial probability knowledge as a new random variable. Space isotropy as a desired property. Accounting for classifier limitations regarding local adaptation and accuracy. Dimensionality is not a problem on its own. Bridging of invertible and non-invertible learning criteria at the infinite-accuracy limit.

56 Outline LDA HDA MMI relation

57 Motivation. When is dimensionality reduction required? How can a transform be interpretable? When do we need non-linear transforms? [Figure: two classes c_1, c_2 in the (x_1, x_2) plane]

58 Motivation. When is dimensionality reduction required? How can a transform be interpretable? When do we need non-linear transforms? [Figure: alternating class regions c_1, c_2 in the (x_1, x_2) plane]

59 The non-linear feature scaling transform (NLFS): f : R^n → R^n, f(x) = [f_1(x_1), f_2(x_2), ..., f_n(x_n)], with stretching functions f_i'(x_i) > 0. Properties: preserves dimensionality; feature-oriented; no information loss.
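A minimal sketch of a feature-wise scaling transform of this kind, using the piecewise-linear grid form that appears a couple of slides below (the per-cell weighting scheme here is illustrative, not the ISONLFS rule): each coordinate passes through its own strictly increasing map, so the transform keeps the dimension, acts on each feature separately and is invertible.

```python
import numpy as np

def fit_grid_transform(x_col, weights, d=10):
    """Build a strictly increasing, piecewise-linear map for one feature.

    The feature range is split into d grid cells; each cell gets a positive
    stretching factor (here taken from `weights`, e.g. a usefulness score per cell).
    """
    edges = np.linspace(x_col.min(), x_col.max(), d + 1)
    lengths = np.maximum(weights, 1e-6) * np.diff(edges)   # positive lengths => monotone map
    new_edges = np.concatenate([[edges[0]], edges[0] + np.cumsum(lengths)])
    return lambda v: np.interp(v, edges, new_edges)

def nlfs(X, cell_weights):
    """Apply an independent grid transform f_i to every feature: f(x) = [f_1(x_1), ...]."""
    maps = [fit_grid_transform(X[:, i], cell_weights[i]) for i in range(X.shape[1])]
    return np.column_stack([m(X[:, i]) for i, m in enumerate(maps)])

# Toy usage: expand the middle of feature 0, leave feature 1 almost unchanged.
rng = np.random.default_rng(5)
X = rng.uniform(-1, 1, size=(500, 2))
weights = [np.array([0.2, 0.2, 0.2, 1.0, 3.0, 3.0, 1.0, 0.2, 0.2, 0.2]),
           np.ones(10)]
Y = nlfs(X, weights)
print(X[:3], Y[:3], sep="\n")
```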

60 Interpretability. Preserves the ordering of the samples; changes the distances between samples; shrinks or expands the space. [Figure: the density p(x_i) and its image p(y_i) under y_i = f(x_i); a distance relation d_AB > d_BC in X_i can become d_AB < d_BC in Y_i]

61 Grid transform. [Figure: a grid of resolution d = 10 over the (x_1, x_2) plane, before and after the transform; the cell lengths l_1, l_2 are modified (e.g. l_{2,7}, s_{1,4}) while the points A, B, G keep their ordering]

62 Transform learning: the isotropic NLFS (ISONLFS) algorithm. Application of the MII criterion. Locally optimal matrices are diagonal. Definition of a near-projection matrix. Weighted matrix summation with respect to the local near-projected effective information. Evaluation as effective feature information.

63 Evaluation (1) - Artificial data

64 Evaluation (2) - Real data. [Table: per problem (cancer, card, diabetes, glass, heart, horse, iris, sonar, soybean), comparison of SPCA and ISONLFS with difference and t-test; numeric entries not preserved]

65 Conclusions. NLFS: reversibility, interpretability, low complexity. Application of the maximization of isotropic information. A low-complexity learning algorithm. Classification performance improvement.

66 Contributions (1/2): 1) formulation of a concise framework of goals and models for comparing supervised feature-transform learning criteria; 2) establishment of a relation between the linear discriminant analysis, heteroscedastic discriminant analysis and mutual information criteria; 3) formulation of the new Bayes risk sensitivity criterion and derivation of the SPCA algorithm.

67 Contributions (2/2): 1) generalization of the Bayes and MMI criteria, unifying invertible and non-invertible learning criteria; 2) study of non-linear feature scaling transforms and derivation of the ISONLFS algorithm.

68 Conclusions. [Diagram: overview map relating the criteria (Bayes, MBRS, MIR, MII, MMI, HDA, LDA), the models (ZRL, ZRSL, ZIL, KAH, HOG) and the algorithms (SPCA, ISONLFS) along the axes of invertibility and linearity]

69 Future directions. Enhance the feature space models. Goal-independent relation between MMI and LDA. Impact of generalized entropies. Mutual information sensitivity analysis. Effective information: maximization with respect to the focalization magnitude. Adaptive grid resolution for ISONLFS. Integration of isotropicalization into classifiers.

70 Publications 1/4. S. Petridis and S. J. Perantonis. On the relation between discriminant analysis and mutual information for supervised linear feature extraction. Pattern Recognition, 37, 2004. S. Petridis and S. J. Perantonis. Isotropic information maximization for non-linear feature scaling. IEEE Transactions on Pattern Analysis and Machine Intelligence, submitted, 2004. S. Petridis and S. J. Perantonis. Feature deforming for improved similarity based learning. In Proceedings of the 3rd Hellenic Conference on Artificial Intelligence, volume 3025 of Lecture Notes in Artificial Intelligence, 2004.

71 Publications 2/4. S. Petridis, E. Charou, and S. J. Perantonis. Non-redundant feature selection of multi-band remotely sensed images for land cover classification. In Proceedings of the 2003 Tyrrhenian International Workshop on Remote Sensing, 2003. S. J. Perantonis, S. Petridis, and V. Virvilis. Supervised principal component analysis using a smooth classifier paradigm. In Proceedings of the 15th International Conference on Pattern Recognition, volume 2, 2000. E. Charou, S. Petridis, M. Stefouli, O. D. Mavrantza, and S. J. Perantonis. Innovative feature selection used in multispectral imagery classification for water quality monitoring.

72 Publications 3/4. B. Gatos, K. Ntzios, I. Pratikakis, S. Petridis, T. Konidaris, and S. J. Perantonis. An efficient segmentation-free approach to assist old Greek handwritten manuscript OCR. Pattern Analysis and Applications, 8(4), 2006. A. Agogino, J. Ghosh, S. J. Perantonis, V. Virvilis, S. Petridis, and P. J. G. Lisboa. The role of multiple, linear-projection based visualization techniques in RBF-based classification of high dimensional data. In Proceedings of the International Joint Conference on Neural Networks, volume 3, 2000. S. J. Perantonis, N. Ampazis, V. Virvilis, and S. Petridis. Open issues in feedforward neural network training. In LFTNC (Siena, Italy), 2001.

73 Publications 4/4. G. Petasis, S. Petridis, and G. Paliouras. Symbolic and neural learning for named-entity recognition. In G. Tselentis and H.-J. Zimmermann, editors, Advances in Computational Intelligence and Learning: Methods and Applications. Kluwer Academic Publishers, 2001. G. Petasis, S. Petridis, G. Paliouras, and V. Karkaletsis. Symbolic and neural learning for named-entity recognition. In Proceedings of the European Best Practice Workshops and Symposium on Computational Intelligence and Learning, 2000. B. Gatos, K. Ntzios, I. Pratikakis, S. Petridis, T. Konidaris, and S. J. Perantonis. A segmentation-free recognition technique to assist old Greek handwritten manuscript OCR. In Proceedings of Document Analysis Systems, volume 3163 of Lecture Notes in Computer Science, 2004.
