for Classification Problems Sergios Petridis
1 for Classification Problems — National and Kapodistrian University of Athens, School of Sciences, Department of Informatics and Telecommunications; NCSR Demokritos, Institute of Informatics and Telecommunications, Computational Intelligence Laboratory. March 2006
2 Outline LDA HDA MMI relation
3 Feature Vector: Observation → Feature Generation → $x \in \mathbb{R}^n$ → Feature Transform → $y \in \mathbb{R}^m$
4 Feature transforms: $y = f(x)$, $f : \mathbb{R}^n \to \mathbb{R}^m$. Roles: adjusting dimensionality, changing axes, reforming the space. Transform families: scaling, selection, linear, non-linear scaling, non-linear. Transform properties: linear, feature-oriented, invertible.
5 Supervised transform learning — Criterion: suitability for a classification problem. Features: more useful, less redundant, well balanced. Final goal: assist the classifier, facilitate visualization. [Figure: samples without class labels in the $(x_1, x_2)$ plane, with a projection direction $y$]
6 Supervised transform learning — Criterion: suitability for a classification problem. Features: more useful, less redundant, well balanced. Final goal: assist the classifier, facilitate visualization. [Figure: the same samples with class labels in the $(x_1, x_2)$ plane, with a projection direction $y$]
7 Thesis aims: (1) formulate a unifying framework for supervised learning of feature transforms, associating known criteria and methods; (2) engineer new learning algorithms based on general criteria, adapted to the transform type and properties.
8 Outline LDA HDA MMI relation
9 Outline LDA HDA MMI relation
10 Supervised Linear Feature Extraction (SLFE): $y = A\,x$, mapping $\mathbb{R}^n \to \mathbb{R}^m$ with $m < n$. Equivalent expressions: extraction of $m$ linear features; searching for $m$ projections; searching for the $n \times m$ matrix $A = [a_1\; a_2\; \cdots\; a_m]$. [Figure: network diagram with inputs $x_1, \ldots, x_n$, weights $a_{ij}$ and outputs $y_1, \ldots, y_m$]
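To make the equivalence of these views concrete, here is a minimal NumPy sketch (illustrative only; the data and the matrix are made up, and the column-stacked convention $y = A^{\top} x$ is an assumption about orientation):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 2                      # original and reduced dimensions, m < n
X = rng.normal(size=(100, n))    # 100 samples in R^n

# A stacks m column projections a_1 ... a_m; each feature y_j = a_j^T x
A = rng.normal(size=(n, m))
Y = X @ A                        # every row y_t lives in R^m
print(Y.shape)                   # (100, 2)
```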
11 Minimum Bayes risk: $B_R(C \mid X) \triangleq E\big[r(\hat{h}(x) \mid x)\big]$. The minimum risk depends on: the feature vector, i.e. the joint distribution $p(X = x, C = c)$; and the risk matrix $R = \begin{bmatrix} 0 & \alpha \\ \beta & 0 \end{bmatrix}$, here with $\alpha = \beta$. [Figure: class-conditional densities $p(x \mid c_1)$, $p(x \mid c_2)$ and the decision $h(x)$]
12 Minimum Bayes risk: $B_R(C \mid X) \triangleq E\big[r(\hat{h}(x) \mid x)\big]$. As before, the minimum risk depends on the joint distribution $p(X = x, C = c)$ and on the risk matrix, here $R = \begin{bmatrix} 0 & \alpha \\ \beta & 0 \end{bmatrix}$ with $\alpha > \beta$. [Figure: class-conditional densities $p(x \mid c_1)$, $p(x \mid c_2)$ and the decision $h(x)$]
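As a worked illustration (made-up densities, not from the slides), the sketch below computes the Bayes risk numerically for a two-class, one-dimensional problem with the asymmetric risk matrix above: at each $x$ the decision with the smaller expected risk is taken, and the residual risk is integrated over $p(x)$. The reading of the off-diagonal entries as the two misclassification costs is one conventional choice.

```python
import numpy as np
from scipy.stats import norm

alpha, beta = 2.0, 1.0            # R = [[0, alpha], [beta, 0]], alpha > beta
p_c = np.array([0.5, 0.5])        # class priors (assumed)

x = np.linspace(-8, 8, 4001)
dx = x[1] - x[0]
lik = np.vstack([norm.pdf(x, -1, 1), norm.pdf(x, +1, 1)])   # p(x | c_k)
joint = p_c[:, None] * lik                                   # p(x, c_k)

# expected risk of each decision: deciding c1 costs alpha when the truth is c2,
# deciding c2 costs beta when the truth is c1
risk_decide_c1 = alpha * joint[1]
risk_decide_c2 = beta * joint[0]
bayes_risk = np.sum(np.minimum(risk_decide_c1, risk_decide_c2)) * dx
print(f"Bayes risk: {bayes_risk:.4f}")
```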
13 The Bayes optimal transform — The Bayes criterion: the optimal transform is the one extracting features that imply minimum Bayes risk, $\hat{A} = \arg\min_{A} B_R(C \mid A X)$. Subspace optimality: optimality is not associated with individual features but with the subspace they span.
14 The number of features — Arguments in favor of dimensionality reduction: visualisation, computational complexity, the classifier's performance. However, dimensionality reduction will not lead to a reduction of the Bayes risk. [Figure: Bayes risk $B_R$ versus dimension, between its maximum at 0 and its minimum at $n$]
15 Goals — Three distinct goals: no loss with a known risk matrix; no loss with an unknown risk matrix; minimum Bayes risk under constraints on the dimension. [Figure: Bayes risk versus dimension for risk matrices $R_1$, $R_2$, with candidate dimensions $m_1$, $m_2$, $\hat{m}$ between 0 and $n$]
16 Zero Information / Risk Loss — We define two models: the Zero Information Loss model (ZIL) and the Zero Risk Loss model (ZRL), the latter requiring $\forall x,\; r(\hat{c} \mid x) = r(\hat{c} \mid s)$ for the reduced features $s$. ZRL together with a two-class problem implies ZIL. [Figure: decomposition of $x$ into an informative part $s$ and a complement $\zeta$]
17 Lossless extraction under constraints — We prove that: under ZIL, lossless SLFE with an unknown risk matrix leads to exactly d features; under ZRL, lossless SLFE with a known risk matrix leads to a subspace of dimension at most d. [Figure, Example 1 (ZIL): decision regions $R_1$, $R_2$, $R_3$ in the $(x_1, x_2)$ plane with directions $s$, $\zeta$ and angle $\varphi$]
18 Lossless extraction under constraints — We prove that: under ZIL, lossless SLFE with an unknown risk matrix leads to exactly d features; under ZRL, lossless SLFE with a known risk matrix leads to a subspace of dimension at most d. [Figure, Example 2 (ZIL, ZRL): decision regions $R_{1a}$, $R_{1b}$, $R_{1c}$, $R_2$, $R_3$ in the $(x_1, x_2)$ plane with direction $s$ and complement $\zeta$]
19 Lossless extraction under constraints — We prove that: under ZIL, lossless SLFE with an unknown risk matrix leads to exactly d features; under ZRL, lossless SLFE with a known risk matrix leads to a subspace of dimension at most d. [Figure, Example 3: decision regions $R_1$, $R_{1b}$, $R_2$, $R_{2b}$, $R_3$ in the $(x_1, x_2)$ plane with directions $s$, $s_2$, complements $\zeta$, $\zeta_2$ and angle $\varphi$]
20 SLFE goals: Conclusions — Extracting linear features versus extracting a feature subspace; knowing the risk matrix is significant for optimal reduction; for 2-class problems, ZRL = ZIL.
21 Outline LDA HDA MMI relation
22 LDA HDA MMI relation — Three distinct criteria, all different from the Bayes criterion. Experimental evidence: LDA has small complexity and is efficient; HDA has moderate complexity and is superior to LDA; MMI has high complexity and is superior to LDA. Structural relation: what is the connection between these criteria? Relation in form, relation in optimality, derivation of one from the other.
23 Linear Discriminant Analysis (LDA), Fisher: large inter-class (between-means) variance, small within-class variance. Optimality: the Homoscedastic Gaussian model (HOG). [Figure: two classes with means $\mu_1$, $\mu_2$ in the $(x_1, x_2)$ plane and the LDA direction $A$]
24 Heteroscedastic Discriminant Analysis (HDA), Kumar and Andreou 1996: there is a subspace where both the class means and the covariance matrices are identical. Optimality: the Kumar–Andreou Heteroscedastic model (KAH). [Figure: two classes in the $(x_1, x_2)$ plane with $\mu_1 = \mu_2$ along one direction and the HDA direction $A$]
25 Maximization of Mutual Information (MMI) (Shannon; Lewis 1962): knowledge of the feature vector reduces the class uncertainty, $I(C ; X) = H(C) - H(C \mid X)$. Optimality: the ZIL model. [Figure: samples in the $(x_1, x_2)$ plane]
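A small sketch (toy data, plug-in histogram estimates, all parameters placeholders) of how $I(C; X) = H(C) - H(C \mid X)$ can be estimated for a one-dimensional projected feature, which is the quantity the MMI criterion maximizes over the projection:

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def mutual_information(y, c, bins=20):
    """Plug-in estimate of I(C; Y) for a 1-D feature y and discrete labels c."""
    joint, _, _ = np.histogram2d(y, c, bins=[bins, len(np.unique(c))])
    joint /= joint.sum()
    p_y, p_c = joint.sum(axis=1), joint.sum(axis=0)
    # H(Y) + H(C) - H(Y, C) = H(C) - H(C | Y)
    return entropy(p_y) + entropy(p_c) - entropy(joint.ravel())

rng = np.random.default_rng(0)
c = rng.integers(0, 2, size=2000)
y = c + 0.8 * rng.normal(size=2000)       # an informative feature
print(mutual_information(y, c))
```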
26 Original criteria formulation, $\hat{A} = \arg\max_A$ of — LDA: $\log\dfrac{|\mathrm{cov}(A X)|}{|\mathrm{cov}(A X \mid C)|}$; HDA: $-\sum_{k=1}^{K} p(c_k)\,\log|\mathrm{cov}(A X \mid c_k)| + 2\log|\tilde{A}|$; MMI: $I(C ; A X)$.
27 Criteria reformulation, $\hat{A} = \arg\max_A$ of — LDA: $\log\dfrac{|\mathrm{cov}(A X)|}{\big|\sum_{k=1}^{K} p(c_k)\,\mathrm{cov}(A X \mid c_k)\big|}$; HDA: $\log\dfrac{|\mathrm{cov}(A X)|}{\prod_{k=1}^{K} |\mathrm{cov}(A X \mid c_k)|^{p(c_k)}}$; MMI: $\log\dfrac{|\mathrm{cov}(A X)|}{\prod_{k=1}^{K} |\mathrm{cov}(A X \mid c_k)|^{p(c_k)}} - 2\big(J(A X) - J(A X \mid C)\big)$.
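As a concrete reference point for the LDA criterion above, a compact sketch (toy data; the standard Fisher formulation with between- and within-class covariances, rather than the thesis notation) of solving it as a generalized eigenproblem:

```python
import numpy as np
from scipy.linalg import eigh

def lda_transform(X, c, m):
    """Return the n x m LDA projection maximizing between- vs within-class scatter."""
    classes, counts = np.unique(c, return_counts=True)
    priors = counts / len(c)
    mu = X.mean(axis=0)
    Sw = sum(p * np.cov(X[c == k].T) for k, p in zip(classes, priors))       # within-class
    Sb = sum(p * np.outer(X[c == k].mean(0) - mu, X[c == k].mean(0) - mu)
             for k, p in zip(classes, priors))                               # between-class
    # generalized eigenproblem Sb a = lambda Sw a; keep the m leading eigenvectors
    vals, vecs = eigh(Sb, Sw)
    return vecs[:, np.argsort(vals)[::-1][:m]]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 3)), rng.normal([2, 0, 0], 1, (100, 3))])
c = np.repeat([0, 1], 100)
A = lda_transform(X, c, 1)
Y = X @ A                      # the extracted feature(s)
```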
28 Model hierarchy — We prove that: when a problem conforms to HOG, it also conforms to KAH; when a problem conforms to KAH, it also conforms to ZIL. A common source–noise view covers all three models: HOG ⊂ KAH ⊂ ZIL.
29 Criteria equivalence — Under ZIL: Bayes = MMI. Under KAH: Bayes = MMI = HDA. Under HOG: Bayes = MMI = HDA = LDA. Hence a derivation of LDA through information theory.
30 Simulations (1): success of all three criteria. [Figure: two-class data in the $(x_1, x_2)$ plane with the directions found by LDA, HDA and MMI]
31 Simulations (2): success of HDA and MMI only. [Figure: two-class data in the $(x_1, x_2)$ plane with the directions found by LDA, HDA and MMI]
32 Simulations (3): success of MMI only. [Figure: two-class data in the $(x_1, x_2)$ plane with the directions found by LDA, HDA and MMI]
33 LDA HDA MMI: Conclusions — Hierarchical connection between the criteria; the MMI criterion is optimal over a wider range of problems; derivation of the LDA criterion from information theory.
34 Outline LDA HDA MMI relation
35 Motivation — A change in the Bayes risk within a subspace implies usefulness of the subspace. Local sensitivity: $\dfrac{\partial r(\hat{c} \mid x)}{\partial A}$. Subspace risk sensitivity: $B_R^{\mathrm{var}}(A \mid C, X) = \displaystyle\int_{\mathbb{R}^n} \left\| \frac{\partial r(\hat{c} \mid x)}{\partial A} \right\|^2 p(x)\, dx$.
36 Subspace risk sensitivity — Maximization of Bayes Risk Sensitivity (MBRS): the optimal transform is the one that extracts features implying maximum Bayes risk sensitivity, $\hat{A}_{\mathrm{MBRS}} = \arg\max_A B_R^{\mathrm{var}}(A \mid C, X)$. Optimality: the Zero Risk Sensitivity Loss model (ZRSL); ZRL plus differentiability implies ZRSL.
37 The SPCA algorithm — Supervised Principal Component Analysis (SPCA): define the feature risk sensitivity matrix; express the local sensitivity as the trace of a quadratic form; solve an eigenvector–eigenvalue problem; evaluate the sensitivities as derivative estimates via the Parzen method. SPCA reduces to Principal Component Analysis (PCA) by replacing each sample with the corresponding sample sensitivity $\left[\dfrac{\partial r(\hat{c}\mid x_t)}{\partial x_1},\, \dfrac{\partial r(\hat{c}\mid x_t)}{\partial x_2},\, \ldots,\, \dfrac{\partial r(\hat{c}\mid x_t)}{\partial x_n}\right]$.
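A rough sketch of that pipeline follows, under explicit assumptions that are not from the slides: a Gaussian Parzen window, a two-class problem, and numerical gradients of the estimated posterior standing in for the thesis's derivative estimators of the risk.

```python
import numpy as np

def posterior_c1(x, X, c, h=0.5):
    """Parzen estimate of p(c=1 | x) with a Gaussian window of width h."""
    w = np.exp(-0.5 * np.sum((X - x) ** 2, axis=1) / h**2)
    return np.sum(w[c == 1]) / np.sum(w)

def spca(X, c, m, h=0.5, eps=1e-3):
    n = X.shape[1]
    grads = np.zeros_like(X)
    for t, x in enumerate(X):                       # per-sample sensitivity (risk proxy)
        for i in range(n):
            e = np.zeros(n)
            e[i] = eps
            grads[t, i] = (posterior_c1(x + e, X, c, h) -
                           posterior_c1(x - e, X, c, h)) / (2 * eps)
    # "PCA on sensitivities": eigenvectors of the sensitivity scatter matrix
    S = grads.T @ grads / len(X)
    vals, vecs = np.linalg.eigh(S)
    return vecs[:, np.argsort(vals)[::-1][:m]]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (60, 3)), rng.normal([1.5, 0, 0], 1, (60, 3))])
c = np.repeat([0, 1], 60)
A = spca(X, c, m=2)
```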
38 Evaluation (1): methodology — Goals: no loss; 1, 2 and 3 features. Framework: comparison against LDA; classifier: k-NN; 30-fold cross-validation; paired t-test (0.95). [Table: features, dimension, classes and samples for the benchmark problems cancer, card, diabetes, glass, heart, horse, iris, sonar, soybean and xorrot]
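The comparison protocol can be sketched with scikit-learn and SciPy; here a k-NN classifier on raw features is compared against k-NN after an LDA transform, as a stand-in for the LDA-versus-SPCA comparison. The 30 repetitions and the 0.95 level follow the slide; the dataset (iris), the value of k and the 5-fold splits are placeholders.

```python
import numpy as np
from scipy.stats import ttest_rel
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5)                  # k is a placeholder
lda_knn = make_pipeline(LinearDiscriminantAnalysis(n_components=1), knn)

scores_raw, scores_lda = [], []
for seed in range(30):                                     # 30 repetitions, as on the slide
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    scores_raw.append(cross_val_score(knn, X, y, cv=cv).mean())
    scores_lda.append(cross_val_score(lda_knn, X, y, cv=cv).mean())

t, p = ttest_rel(scores_raw, scores_lda)                   # paired t-test at the 0.95 level
print(np.mean(scores_raw), np.mean(scores_lda), p < 0.05)
```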
39 Evaluation (2): artificial data — rotated XOR (XORROT). [Figure: the best 2 dimensions]
40 Evaluation (2): optimal dimensionality — problem (dimension, performance): cancer (1, 90%), card (2, 96%), diabetes (2, 75%), glass (6, 67%), heart (3, 91%), horse (2, 96%), iris (1, 75%), sonar (14, 77%), soybean (20, 76%), xorrot (2, 75%).
41 Evaluation (3): constrained number of dimensions. [Table: performance difference and t-test result for 1, 2 and 3 extracted features on cancer, card, diabetes, glass, heart, horse, iris, sonar, soybean and xorrot]
42 Evaluation (4): performance against dimension. [Figure: classification performance versus number of features for LDA and SPCA on soybean and sonar]
43 The MBRS criterion: Conclusions — Equivalence between MBRS and the Bayes criterion under ZRSL; formulation as an eigenvector–eigenvalue problem; Parzen-based estimation; SPCA wins over LDA.
44 Outline LDA HDA MMI relation
45 Outline LDA HDA MMI relation
46 Motivation — The Bayes and MMI criteria don't account for the increase in generalization ability (the curse of dimensionality problem). For invertible transforms, the Bayes and MMI criteria don't apply: $B_R(C \mid f(X)) = B_R(C \mid X)$ and $I(f(X); C) = I(X; C)$.
47 Experiment: impact of sphering. [Table: comparison of the original versus the sphered feature space (difference, t-test) on cancer, card, diabetes, gene, glass, heart, horse, iris, sonar, soybean and xorrot]
48 Experiment: scaling vs rejection — Invertible linear transform $A = D\,Q$ (a diagonal scaling $D$ times a matrix $Q$); rejection as the limit of scaling, $A = \lim_{d_l \to 0,\ l \in L} D\,Q$. [Figure: performance curves for LDA and SPCA on card, diabetes and cancer]
49 Focusing in feature space — Focalization kernel $K(s \mid z, \Phi) \propto e^{-z \|\Phi s\|^2}$, with focus magnitude $z$ and focus angle matrix $\Phi$. $P(X = x \mid z, \Phi) = \displaystyle\int_{\mathbb{R}^n} K(s \mid z(x), \Phi(x))\, p(X = x + s)\, ds$. [Figure: focusing regions around a point $x$ in the $(X_1, X_2)$ plane]
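A minimal numerical sketch of the focalization idea (one dimension, isotropic $\Phi = 1$; the density and the grid are made up): the effective distribution is the original density smoothed by the focus kernel, and a larger focus magnitude $z$ means a narrower kernel, i.e. less smoothing.

```python
import numpy as np

x = np.linspace(-5, 5, 1001)
dx = x[1] - x[0]
p = 0.5 * np.exp(-0.5 * (x + 1) ** 2) + 0.5 * np.exp(-0.5 * (x - 1) ** 2)
p /= np.sum(p) * dx                                   # a toy density p(x)

def effective_density(p, x, z):
    """P(X = x | z) = integral over s of K(s|z) p(x + s) ds, with K(s|z) ~ exp(-z s^2)."""
    s = x - x[:, None]                                # displacement s = x' - x for every pair
    K = np.exp(-z * s ** 2)
    K /= K.sum(axis=1, keepdims=True) * dx            # normalize the kernel per point
    q = (K * p[None, :]).sum(axis=1) * dx
    return q / (q.sum() * dx)

p_eff = effective_density(p, x, z=2.0)                # finite focus: a blurred version of p
p_sharp = effective_density(p, x, z=200.0)            # large z approaches the original p
```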
50 Effective random variable $\tilde{X}_{z,\Phi}$ — Defined by $p(\tilde{X}_{z,\Phi} = x) \triangleq \dfrac{P(X = x \mid z, \Phi)}{\int_{\mathbb{R}^n} P(X = x \mid z, \Phi)\, dx}$. Effective functionals: effective Bayes risk $\tilde{B}_R(C \mid X) \triangleq B_R(C \mid \tilde{X}_{z,\Phi})$ and effective information $\tilde{I}_{z,\Phi}(X ; C) \triangleq I(\tilde{X}_{z,\Phi} ; C)$. Limit of infinite accuracy: $\lim_{z \to \infty} \tilde{X}_{z,\Phi} = X$.
51 Generalized learning criteria — Space suitability: stationarity, $\Phi(x) = \Phi$; isotropy, $\Phi = I_n$. Minimization of Isotropic Risk (MIR): $\hat{f} = \arg\min_{f \in \mathcal{F}} \tilde{B}_R(C \mid f(X))$. Maximization of Isotropic Information (MII): $\hat{f} = \arg\max_{f \in \mathcal{F}} \tilde{I}(f(X) ; C)$.
52 Generalized learning criteria — Finding the optimal transform by approximation: finding locally optimal transforms, then aggregating them into a global transform. [Figure: local kernels $K_1$, $K_2$, $K_3$ around a point $x_0$ in the $(X_1, X_2)$ plane, classes $\omega_1$, $\omega_2$]
53 Generalized learning criteria — Finding the optimal transform by approximation: finding locally optimal transforms, then aggregating them into a global transform.
54 Generalized learning criteria — Finding the optimal transform by approximation: finding locally optimal transforms, then aggregating them into a global transform. [Figure: the transformed point $f(x_0)$ and kernel $K_2$ in the $(Y_1, Y_2)$ plane, classes $\omega_1$, $\omega_2$]
55 Generalized criteria: Conclusions — Integrating partial probability knowledge as a new random variable; space isotropy as a desired property; accounting for classifiers' limitations regarding local adaptation and accuracy; dimensionality is not a problem on its own; bridging invertible and non-invertible learning criteria at the infinite accuracy limit.
56 Outline LDA HDA MMI relation
57 Motivation — When is dimensionality reduction required? How can a transform be interpretable? When do we need non-linear transforms? [Figure: two classes $c_1$, $c_2$ in the $(x_1, x_2)$ plane]
58 Motivation — When is dimensionality reduction required? How can a transform be interpretable? When do we need non-linear transforms? [Figure: a second two-class configuration $c_1$, $c_2$ in the $(x_1, x_2)$ plane]
59 The Non-Linear Feature Scaling (NLFS) transform: $f : \mathbb{R}^n \to \mathbb{R}^n$, $f(x) = [f_1(x_1), f_2(x_2), \ldots, f_n(x_n)]$, with stretching function $\dfrac{\partial f_i(x_i)}{\partial x_i} > 0$. Properties: preserves dimensionality, feature-oriented, no information loss.
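A small sketch of what such a feature-oriented, invertible stretching transform looks like in code (the particular stretching functions are arbitrary examples, not the learned ones): each coordinate is mapped by its own strictly increasing $f_i$, so dimensionality, ordering and information are preserved.

```python
import numpy as np

# one strictly increasing stretching function per feature (illustrative choices only)
stretchers = [
    lambda x: np.log1p(np.exp(x)),      # softplus: compresses negative values, near-linear above
    lambda x: x ** 3 + x,               # stretches the tails, keeps ordering
]

def nlfs(X):
    """Apply f(x) = [f_1(x_1), ..., f_n(x_n)] column by column."""
    return np.column_stack([f(X[:, i]) for i, f in enumerate(stretchers)])

X = np.random.default_rng(0).normal(size=(5, 2))
Y = nlfs(X)
assert np.all(np.argsort(X[:, 0]) == np.argsort(Y[:, 0]))   # sample ordering preserved
```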
60 Interpretability — Preserving sample ordering; changing distances between samples; shrinking and expanding the space. [Figure: densities $p(x_i)$ and $p(y_i)$ before and after the transform $y_i = f_i(x_i)$, with $p(y_i) \propto 1/g(x_i)$; the distances satisfy $d_{AB} > d_{BC}$ before and $d_{AB} < d_{BC}$ after]
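The density relation hinted at in the figure follows from the usual change-of-variables argument; a short derivation (a standard result, stated under the assumption that each $f_i$ is differentiable and strictly increasing):

```latex
% For y_i = f_i(x_i) with f_i'(x_i) > 0, equating probability mass over
% corresponding intervals gives p_{Y_i}(y_i)\, dy_i = p_{X_i}(x_i)\, dx_i,
% and since dy_i = f_i'(x_i)\, dx_i:
\[
  p_{Y_i}(y_i) \;=\; \frac{p_{X_i}(x_i)}{f_i'(x_i)}, \qquad y_i = f_i(x_i).
\]
% Hence expanding a region (f_i' > 1) lowers its density, while shrinking it
% (f_i' < 1) raises it.
```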
61 Grid transform. [Figure: a grid of $d = 10$ cells per feature over the $(x_1, x_2)$ plane, shown before and after the transform; cell lengths $l_1$, $l_2$ (e.g. $l_{2,7}$), slope $s_{1,4}$ and reference points A, B, G]
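One plausible concrete reading of such a grid transform, sketched under explicit assumptions ($d = 10$ equal-width cells per feature as on the slide; the per-cell slopes are placeholders standing in for learned values): each feature is mapped piecewise-linearly over its grid, so the stretching function is piecewise-constant and the map stays monotone and invertible.

```python
import numpy as np

def grid_transform(x, lo, hi, slopes):
    """Piecewise-linear monotone map of one feature over a grid of len(slopes) cells."""
    d = len(slopes)
    edges = np.linspace(lo, hi, d + 1)
    # output grid: cumulative cell lengths scaled by the (positive) per-cell slopes
    out_edges = np.concatenate([[0.0], np.cumsum(slopes * np.diff(edges))])
    return np.interp(x, edges, out_edges)

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)
slopes = rng.uniform(0.2, 2.0, size=10)          # d = 10 cells, placeholder slopes > 0
y = grid_transform(x, 0.0, 1.0, slopes)
assert np.all(np.diff(y[np.argsort(x)]) >= 0)    # monotonicity preserved
```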
62 Transform learning — The Isotropic NLFS (ISONLFS) algorithm: application of the MII criterion; the locally optimal matrices are diagonal; definition of a near-projection matrix; weighted matrix summation with respect to the local near-projected effective information; evaluation as effective feature information.
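The algorithm is only summarized on the slide; the sketch below shows one way the aggregation step could be organized and is a loose structural illustration rather than the thesis algorithm. The local per-feature information estimate is a hypothetical stand-in (a simple neighborhood histogram estimate), and the neighborhood radius is a placeholder.

```python
import numpy as np

def _plugin_mi(y, c, bins=8):
    """Plug-in MI between a 1-D feature and discrete labels (as in the MMI sketch above)."""
    joint, _, _ = np.histogram2d(y, c, bins=[bins, max(len(np.unique(c)), 2)])
    joint = joint / joint.sum()
    h = lambda p: -np.sum(p[p > 0] * np.log(p[p > 0]))
    return h(joint.sum(1)) + h(joint.sum(0)) - h(joint.ravel())

def isonlfs_feature_weights(X, c, radius=1.0):
    """Loose structural sketch: aggregate local per-feature information into global weights."""
    n = X.shape[1]
    weights, total = np.zeros(n), 0.0
    for x0 in X:                                             # one neighborhood per sample
        mask = np.linalg.norm(X - x0, axis=1) < radius
        if mask.sum() < 10:                                  # skip near-empty neighborhoods
            continue
        local = np.array([_plugin_mi(X[mask, i], c[mask]) for i in range(n)])
        w = local.sum()                                      # weight by total local information
        weights += w * local
        total += w
    return weights / max(total, 1e-12)                       # one scaling weight per feature
```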
63 Evaluation (1) - Artificial data
64 Evaluation (2) - Real data. [Table: comparison of SPCA and ISONLFS (difference, t-test) on cancer, card, diabetes, glass, heart, horse, iris, sonar and soybean]
65 Conclusions — NLFS: reversibility, interpretability, low complexity; application of the maximization of isotropic information; a low-complexity learning algorithm; improved classification performance.
66 Contributions (1/2) — 1. Formulation of a concise framework of goals and models for comparing supervised feature-transform learning criteria. 2. Establishment of a relation between the linear discriminant analysis, heteroscedastic discriminant analysis and mutual information criteria. 3. Formulation of the new Bayes risk sensitivity criterion and derivation of the SPCA algorithm.
67 Contributions (2/2) — 1. Generalization of the Bayes and MMI criteria, unifying invertible and non-invertible learning criteria. 2. Study of non-linear feature scaling transforms and derivation of the ISONLFS algorithm.
68 Conclusions. [Diagram: the criteria (Bayes, MBRS, MMI, MIR, MII, HDA, LDA), the models (ZRL, ZRSL, ZIL, KAH, HOG) and the algorithms (SPCA, ISONLFS) arranged along the non-invertibility and linearity axes]
69 Future directions — Enhance the feature space models; a goal-independent relation between MMI and LDA; the impact of generalized entropies; mutual information sensitivity analysis; effective information: maximization with respect to the focalization magnitude; adaptive grid resolution for ISONLFS; integrate isotropicalization into classifiers.
70 Publications 1/4
S. Petridis and S. J. Perantonis. On the relation between discriminant analysis and mutual information for supervised linear feature extraction. Pattern Recognition, 37, 2004.
S. Petridis and S. J. Perantonis. Isotropic information maximization for non-linear feature scaling. IEEE Transactions on Pattern Analysis and Machine Intelligence, submitted, 2004.
S. Petridis and S. J. Perantonis. Feature deforming for improved similarity based learning. In Proceedings of the 3rd Hellenic Conference on Artificial Intelligence, volume 3025 of Lecture Notes in Artificial Intelligence, 2004.
71 Publications 2/4
S. Petridis, E. Charou, and S. J. Perantonis. Non-redundant feature selection of multi-band remotely sensed images for land cover classification. In Proceedings of the 2003 Tyrrhenian International Workshop on Remote Sensing, 2003.
S. J. Perantonis, S. Petridis, and V. Virvilis. Supervised principal component analysis using a smooth classifier paradigm. In Proceedings of the 15th International Conference on Pattern Recognition, volume 2, 2000.
E. Charou, S. Petridis, M. Stefouli, O. D. Mavrantza, and S. J. Perantonis. Innovative feature selection used in multispectral imagery classification for water quality monitoring.
72 Publications 3/4
B. Gatos, K. Ntzios, I. Pratikakis, S. Petridis, T. Konidaris, and S. J. Perantonis. An efficient segmentation-free approach to assist old Greek handwritten manuscript OCR. Pattern Analysis and Applications, 8(4), 2006.
A. Agogino, J. Gosh, S. J. Perantonis, V. Virvilis, S. Petridis, and P. J. G. Lisboa. The role of multiple, linear-projection based visualization techniques in RBF-based classification of high dimensional data. In Proceedings of the International Joint Conference on Neural Networks, volume 3, 2000.
S. J. Perantonis, N. Ampazis, V. Virvilis, and S. Petridis. Open issues in feedforward neural network training. In LFTNC (Siena, Italy), 2001.
73 Publications 4/4
G. Petasis, S. Petridis, and G. Paliouras. Symbolic and neural learning for named-entity recognition. In G. Tselentis and H.-J. Zimmermann, editors, Advances in Computational Intelligence and Learning: Methods and Applications. Kluwer Academic Publishers, 2001.
G. Petasis, S. Petridis, G. Paliouras, and V. Karkaletsis. Symbolic and neural learning for named-entity recognition. In Proceedings of the European Best Practice Workshops and Symposium on Computational Intelligence and Learning, 2000.
B. Gatos, K. Ntzios, I. Pratikakis, S. Petridis, T. Konidaris, and S. J. Perantonis. A segmentation-free recognition technique to assist old Greek handwritten manuscript OCR. In Proceedings of Document Analysis Systems, volume 3163 of