Spectral Clustering of Polarimetric SAR Data With Wishart-Derived Distance Measures
1 Spectral Clustering of Polarimetric SAR Data With Wishart-Derived Distance Measures. Stian Normann Anfinsen, Robert Jenssen, Torbjørn Eltoft. Computational Earth Observation and Machine Learning Laboratory, Department of Physics and Technology, University of Tromsø, Norway.
2 Outline: Motivation; Introduction to Spectral Clustering; Distance Measures for PolSAR Covariance Matrices; A New Algorithm; Results; Conclusions and Future Work.
3 Motivation. Seeking (near) optimal statistical classification. Disregarding covariance matrix structure (decomposition theory) and spatial information, for now. Improve on the Wishart classifier: Lee et al. (IJRS, 1994), Lee et al. (TGRS, 1999), Pottier & Lee (EUSAR, 2000), ... Apply modern pattern recognition tools: kernel methods, spectral clustering, information theoretic learning.
4 The Wishart Classifier Revisited. Initialisation: segmentation in H/A/α space, the Cloude-Pottier-Wishart (CPW) classifier. Class mean coherency matrices $\mathbf{V}_i$ are calculated from the initial partitioning of the data:
$$\mathbf{V}_i = \big\langle \mathbf{T}_j \mid \text{pixel } j \in \text{class } i \big\rangle, \quad i = 1,\dots,k,$$
$$\mathbf{T}_j = \langle \mathbf{k}\mathbf{k}^H \rangle, \quad \mathbf{k} = \tfrac{1}{\sqrt{2}}\,[S_{hh}+S_{vv},\; S_{hh}-S_{vv},\; 2S_{hv}]^T.$$
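As a rough, hypothetical sketch (not the authors' code), the Pauli scattering vector, the multilooked coherency matrix of a pixel, and the class mean matrices could be formed as follows; array layouts and function names are assumptions.

```python
import numpy as np

def pauli_vector(S_hh, S_hv, S_vv):
    """Pauli scattering vector k = (1/sqrt(2)) [S_hh + S_vv, S_hh - S_vv, 2 S_hv]^T."""
    return np.array([S_hh + S_vv, S_hh - S_vv, 2.0 * S_hv]) / np.sqrt(2.0)

def coherency_matrix(k_samples):
    """Multilook coherency matrix T = <k k^H>, averaged over the samples in a window."""
    return np.mean([np.outer(k, k.conj()) for k in k_samples], axis=0)

def class_means(T_pixels, labels, k):
    """Class mean coherency matrices V_i = <T_j | pixel j in class i>, i = 1, ..., k."""
    return [T_pixels[labels == i].mean(axis=0) for i in range(k)]
```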
5 The Wishart Classifier Revisited. Initialisation: segmentation in H/A/α space (CPW classifier), with class mean coherency matrices $\mathbf{V}_i$ calculated from the initial partitioning of the data. Iterative classification: minimum distance classification based on the Wishart distance between the pixel coherency matrix $\mathbf{T}$ and $\mathbf{V}_i$,
$$\omega_j = \arg\min_{i} d_W(\mathbf{T}, \mathbf{V}_i), \quad i \in \{1,\dots,k\},$$
followed by iterative reclassification and update of the class means.
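A minimal sketch of this minimum-distance iteration under the same assumptions; `T_pixels` holds one coherency matrix per pixel and the distance is the Wishart distance given later in the talk.

```python
import numpy as np

def wishart_distance(T, V):
    """d_W(T, V) = ln|V| + tr(V^{-1} T)."""
    _, logdet = np.linalg.slogdet(V)
    return logdet + np.trace(np.linalg.solve(V, T)).real

def wishart_classify(T_pixels, V, n_iter=10):
    """Iteratively reassign each pixel to the nearest class mean and update the means."""
    for _ in range(n_iter):
        d = np.array([[wishart_distance(T, Vi) for Vi in V] for T in T_pixels])
        labels = d.argmin(axis=1)
        V = [T_pixels[labels == i].mean(axis=0) for i in range(len(V))]
    return labels, V
```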
6 The Wishart Classifier Revisited. Delivers consistently good results. Few parameters, easy to use, computationally efficient, and approaches a ML solution, if it converges. But it has some drawbacks: the initialisation uses a fixed number of classes and is restricted to one class per predetermined zone in H/A/α space, and it inherits the well-known disadvantages of k-means, e.g., convergence is not guaranteed and may be slow. Conclusion: state-of-the-art algorithms from pattern recognition and machine learning should be tested.
7 Clustering by Pairwise Affinities. Based on distances $d_{ij}$ between all pixel pairs $(i,j)$. Propagates similarity from pixel to pixel. Yields flexible discrimination surfaces. Nonlinear mapping to kernel space, where clustering is done with linear methods; the mapping is found by eigendecomposition. [Figure: examples of capabilities, input space vs. kernel space.]
8 Spectral Clustering. Pairwise distances $d_{ij}$ are transformed to affinities, e.g.
$$a_{ij} = \exp\left\{-\frac{d_{ij}^2}{2\sigma^2}\right\}.$$
9 Spectral Clustering. Pairwise distances $d_{ij}$ are transformed to affinities, e.g.
$$a_{ij} = \exp\left\{-\frac{d_{ij}^2}{2\sigma^2}\right\}.$$
The pairwise affinities $a_{ij}$ between the $N$ data points are stored in an affinity matrix $\mathbf{A}$:
$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1N} \\ a_{21} & a_{22} & \cdots & a_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N1} & a_{N2} & \cdots & a_{NN} \end{bmatrix}.$$
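In code, going from a precomputed pairwise distance matrix to the Gaussian affinity matrix is a one-liner; this sketch assumes the N x N distance matrix D and the bandwidth sigma are already available.

```python
import numpy as np

def affinity_matrix(D, sigma):
    """Gaussian affinities a_ij = exp(-d_ij^2 / (2 sigma^2)) for an N x N distance matrix D."""
    return np.exp(-D**2 / (2.0 * sigma**2))
```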
10 Spectral Clustering. The optimal data partitioning is derived from the eigendecomposition of $\mathbf{A}$; hence, spectral clustering.
11 Spectral Clustering. There are different ways of using the eigenvalues and eigenvectors of $\mathbf{A}$ to obtain an optimal clustering.
12 Spectral Clustering. E.g., using the $u$ eigenvectors corresponding to the largest eigenvalues gives a new $u$-dimensional feature space (the eigenspace):
$$\begin{bmatrix} \mathbf{e}_1^T \\ \mathbf{e}_2^T \\ \vdots \\ \mathbf{e}_u^T \end{bmatrix} = [\boldsymbol{\phi}_1\ \boldsymbol{\phi}_2\ \cdots\ \boldsymbol{\phi}_N].$$
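A sketch of this eigendecomposition step (assuming $\mathbf{A}$ is symmetric): the columns of the returned matrix are the leading eigenvectors, and row $i$ is the eigenspace feature of data point $i$.

```python
import numpy as np

def eigenspace_features(A, u):
    """Return (E, lambdas): E is N x u with columns e_1, ..., e_u (row i is phi_i),
    lambdas are the u largest eigenvalues of A."""
    eigvals, eigvecs = np.linalg.eigh(A)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:u]      # indices of the u largest eigenvalues
    return eigvecs[:, order], eigvals[order]
```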
13 Spectral Clustering. We have a mapping from the input feature space to eigenspace: $\Phi: \mathbf{T}_i \mapsto \boldsymbol{\phi}_i$.
14 Spectral Clustering. The eigenspace feature set can be clustered by simple, linear discrimination methods, e.g. k-means with Euclidean distance.
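For that simple linear option, a sketch using scikit-learn's k-means on the eigenspace rows; the library choice is an assumption, not part of the talk.

```python
from sklearn.cluster import KMeans

def cluster_eigenspace(Phi, k):
    """Cluster the N x u eigenspace features with Euclidean k-means."""
    return KMeans(n_clusters=k, n_init=10).fit_predict(Phi)
```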
16 Spectral Clustering. Instead, we use an information theoretic method, which partitions the data by implicit maximization of the Cauchy-Schwarz divergence between the cluster pdfs in input space. The pdfs are estimated nonparametrically.
17 Spectral Clustering. Data points outside the size-$N$ sample can be mapped to eigenspace using the Nyström approximation:
$$\Phi_j(\mathbf{T}) \approx \frac{\sqrt{N}}{\lambda_j} \sum_{i=1}^{N} e_{ji}\, a(\mathbf{T}, \mathbf{T}_i), \quad j = 1,\dots,u.$$
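A sketch of this out-of-sample extension; it assumes E and eigvals come from the eigendecomposition above and that `affinity` evaluates the kernel a(., .) between two coherency matrices.

```python
import numpy as np

def nystrom_map(T_new, T_samples, E, eigvals, affinity):
    """Map an out-of-sample coherency matrix to eigenspace:
    Phi_j(T) ~ (sqrt(N) / lambda_j) * sum_i e_ji * a(T, T_i)."""
    N = len(T_samples)
    a = np.array([affinity(T_new, T_i) for T_i in T_samples])  # a(T, T_i), i = 1, ..., N
    return np.sqrt(N) / eigvals * (E.T @ a)                    # one coordinate per eigenvector
```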
18 Relation to Kernel Methods. May be related to Mercer kernel-based algorithms, such as Support Vector Machines, Kernel PCA, Kernel k-means, etc. The pairwise affinities are inner products in a Mercer kernel space: $a_{ij} = a(\mathbf{T}_i, \mathbf{T}_j) = \langle \boldsymbol{\phi}_i, \boldsymbol{\phi}_j \rangle$. Here $a(\mathbf{T}_i, \mathbf{T}_j)$ is a Mercer kernel function and $\mathbf{A}$ a Mercer kernel matrix iff $a(\mathbf{T}_i, \mathbf{T}_j)$ is positive semi-definite, symmetric, and continuous. With these restrictions, how do we select the distance measure?
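On a finite sample, the symmetry and positive semi-definiteness conditions can be checked numerically; a small sketch, with the tolerance as an assumption.

```python
import numpy as np

def is_mercer_kernel_matrix(A, tol=1e-10):
    """Check that the affinity matrix is symmetric and positive semi-definite."""
    symmetric = np.allclose(A, A.T)
    psd = np.all(np.linalg.eigvalsh((A + A.T) / 2.0) >= -tol)
    return symmetric and psd
```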
19 Coherency Matrix Distance Measures. Wishart distance (Lee et al., IJRS '94):
$$d_W(\mathbf{T}_1, \mathbf{T}_2) = \ln|\mathbf{T}_2| + \mathrm{tr}(\mathbf{T}_2^{-1}\mathbf{T}_1).$$
20 Coherency Matrix Distance Measures. The Wishart distance can be symmetrized, but $d_W(\mathbf{T}_i, \mathbf{T}_i)$ depends on $\mathbf{T}_i$. Not suitable!
21 Coherency Matrix Distance Measures. Bartlett distance (Conradsen et al., TGRS '03):
$$d_B(\mathbf{T}_1, \mathbf{T}_2) = \ln\!\left(\frac{|\mathbf{T}_1 + \mathbf{T}_2|^2}{|\mathbf{T}_1|\,|\mathbf{T}_2|}\right) - 2p\ln 2.$$
22 Coherency Matrix Distance Measures. The Bartlett distance is based on the log-likelihood ratio test of equality for two unknown covariance matrices.
23 Coherency Matrix Distance Measures. Symmetrized normalized log-likelihood (SNLL) distance (proposed here):
$$d_{\mathrm{SNLL}}(\mathbf{T}_1, \mathbf{T}_2) = \tfrac{1}{2}\,\mathrm{tr}\!\left(\mathbf{T}_1^{-1}\mathbf{T}_2 + \mathbf{T}_2^{-1}\mathbf{T}_1\right) - p.$$
24 Coherency Matrix Distance Measures. The SNLL distance is based on the log-likelihood ratio test of equality for one known and one unknown covariance matrix. It is a symmetrized version of the revised Wishart distance (Kersten et al., TGRS '05). Sketches of both selected distances are given below.
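Hedged sketches of the Bartlett and SNLL distances for p x p coherency matrices (p = 3 in the full-polarimetric case), using log-determinants for numerical stability; these follow the formulas above but are not the authors' implementation.

```python
import numpy as np

def bartlett_distance(T1, T2, p=3):
    """d_B = ln( |T1 + T2|^2 / (|T1| |T2|) ) - 2 p ln 2."""
    _, ld_sum = np.linalg.slogdet(T1 + T2)
    _, ld1 = np.linalg.slogdet(T1)
    _, ld2 = np.linalg.slogdet(T2)
    return 2.0 * ld_sum - ld1 - ld2 - 2.0 * p * np.log(2.0)

def snll_distance(T1, T2, p=3):
    """d_SNLL = (1/2) tr(T1^{-1} T2 + T2^{-1} T1) - p."""
    t = np.trace(np.linalg.solve(T1, T2) + np.linalg.solve(T2, T1)).real
    return 0.5 * t - p
```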
25 The New Algorithm: Summary. Replaces the H/A/α space initialisation with spectral clustering.
26 The New Algorithm: Summary. A subset of N pixels, randomly sampled from the image, is clustered.
27 The New Algorithm: Summary. Remaining pixels may be classified in kernel space (eigenspace), using the Nyström approximation.
28 The New Algorithm: Summary. Alternatively, remaining pixels may be classified in input space with the minimum distance Wishart classifier.
29 The New Algorithm: Summary. The latter solution has much lower computational cost; our experience is that the classification results are essentially equal.
30 The New Algorithm: Summary. Hence, only the initialisation of the CPW classifier is changed. A high-level sketch of the full procedure follows.
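Putting the pieces together, a high-level sketch of the proposed procedure; the helper functions refer to the illustrative snippets above and are assumptions, not the authors' implementation.

```python
import numpy as np

def segment_polsar(T_pixels, k, sigma, N, u=None, n_iter=10, distance=None):
    """Spectral-clustering initialisation of the Wishart classifier (sketch)."""
    distance = distance or snll_distance              # or bartlett_distance
    u = u or k                                        # eigenspace dimension fixed to k
    idx = np.random.choice(len(T_pixels), size=N, replace=False)
    sample = T_pixels[idx]
    D = np.array([[distance(Ti, Tj) for Tj in sample] for Ti in sample])
    A = affinity_matrix(D, sigma)                     # Gaussian affinities
    E, _ = eigenspace_features(A, u)                  # eigenspace features of the sample
    init_labels = cluster_eigenspace(E, k)            # cluster the sampled pixels
    V = [sample[init_labels == i].mean(axis=0) for i in range(k)]
    labels, _ = wishart_classify(T_pixels, V, n_iter) # classify all pixels in input space
    return labels
```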
31 The New Algorithm: Parameters. Number of clusters k: must be manually selected, but the effective number of classes in the classification result, k_eff, is data adaptive.
32 The New Algorithm: Parameters. Sample size N: trade-off with computational cost.
33 The New Algorithm: Parameters. Kernel bandwidth σ: a robust automatic selection rule is under investigation.
34 The New Algorithm: Parameters. Eigenspace dimension u: can be fixed to u = k for simplicity.
35 Test Data Set: Flevoland, L-band. A 200×320 subset of an AIRSAR L-band data set over an agricultural area in Flevoland, The Netherlands, acquired in August. Courtesy of NASA/JPL.
36 Ground Truth Data
37 Evaluation. Qualitative analysis (visual inspection). Quantitative analysis: we calculate a matching matrix M that relates predicted (P) and actual (A) class labels, and derive classification merits from M (Ferro-Famil et al., TGRS '01). Descriptivity D_i: the fraction of the dominant predicted class labels within an actual class (quantifies homogeneity). Compactness C_i: quantifies to what extent the dominant predicted class also dominates other actual classes. Representivity R_i: quantifies to what extent the dominant predicted class is predicted for other actual classes.
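As an illustration, the matching matrix and the descriptivity (the only merit whose formula is stated explicitly above) might be computed as follows; function names and array layouts are assumptions.

```python
import numpy as np

def matching_matrix(actual, predicted, n_actual, n_predicted):
    """M[i, j] = number of pixels with actual class i assigned to predicted class j."""
    M = np.zeros((n_actual, n_predicted), dtype=int)
    for a, p in zip(actual, predicted):
        M[a, p] += 1
    return M

def descriptivity(M):
    """D_i: fraction of the pixels in actual class i carrying its dominant predicted label."""
    return M.max(axis=1) / M.sum(axis=1)
```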
38 Qualitative Analysis: Cloude-Pottier-Wishart (CPW) Classifier. Parameters: k = 16, k_eff = 9, it = 10 (number of iterations in the Wishart classifier).
39 Qualitative Analysis: Cloude-Pottier-Wishart (CPW) Classifier. Observations: Classes 2 and 9 are covered by the same cluster. Classes 4 and 10 are covered by the same cluster. Homogeneous classification in the ground truth areas.
40 Qualitative Analysis: Bartlett Spectral Wishart (BSW) Classifier. Parameters: k = 16, k_eff = 12, σ = 0.42, N = 6400 (10%), it = 10.
41 Qualitative Analysis: Bartlett Spectral Wishart (BSW) Classifier. Observations: Classes 1 and 5 are covered by the same cluster. Some interference by a second cluster in classes 3 and 5. The classification is not as homogeneous as for the CPW classifier, but some areas are better delineated.
42 Qualitative Analysis: SNLL Spectral Wishart (SSW) Classifier. Parameters: k = 16, k_eff = 15, σ = 0.42, N = 6400 (10%), it = 10.
43 Qualitative Analysis: SNLL Spectral Wishart (SSW) Classifier. Observations: A unique dominant cluster for all ground truth areas. Less homogeneous classification than the other methods, largely due to the higher effective number of classes.
44 Matching matrix for the CPW classifier (rows: actual classes A1-A10; columns: predicted classes P1-P10; with per-class descriptivity D_i, compactness C_i and representivity R_i; table values not reproduced).
45 Matching matrix for the Bartlett distance classifier (same layout).
46 Matching matrix for the SNLL distance classifier (same layout).
47 Quantitative Analysis: Descriptivity
48 Quantitative Analysis: Compactness
49 Quantitative Analysis: Representivity
50 Quantitative Analysis: Effective no. classes
51 Convergence Speed
52 Conclusions and Future Work. We have selected two distance measures suited for the calculation of pairwise affinities between PolSAR coherency matrices. We have demonstrated how PolSAR data can be segmented by spectral clustering of coherency matrices. The algorithm improves the classification result of the CPW classifier while using the same information (derived from the statistics of a single pixel). Performance analysis shows that spectral clustering gives a better initialisation of the Wishart classifier than the H/A/α initialisation, both in terms of classification result and convergence speed.
53 Conclusions and Future Work. Further work will concentrate on methods for robust selection of the kernel bandwidth σ, and studies of the data-adaptive k_eff, in order to develop and verify a fully automatic segmentation algorithm. We will also study how spatial information and information from polarimetric decompositions can be included in the distance measure, to assimilate more prior information in the kernel function. The algorithm will be tested on different data sets.
54 Thank you! Stian Normann Anfinsen, Computational Earth Observation and Machine Learning Laboratory, University of Tromsø. URL: