Dimension Reduction: Self Organizing Maps and Multidimensional Scaling

Sta306b, May 21, 2012


Self Organizing Maps

A SOM represents the data by a set of prototypes (like K-means). These prototypes are topologically organized on a lattice structure. In the example, each circle represents a prototype; the points associated with each prototype have been randomly jittered for viewing.

[Figure: prototypes arranged on a 5 x 5 lattice.]

A SOM is a discrete version of a principal surface, where the surface is represented by a topologically constrained set of prototypes.

SOM Algorithm

Each prototype $m_k$ has a pair of integer lattice coordinates, e.g. $l_k = (2, 4)$. SOM algorithms are typically online:

- Initialize by placing a lattice of prototypes uniformly on the principal component plane.
- Loop until stabilization: for an observation $x_i$, identify the closest prototype $m_j$, then move every prototype toward $x_i$:
  $$m_k \leftarrow m_k + \alpha \, h(\|l_j - l_k\|)\,(x_i - m_k),$$
  where the neighborhood function $h$ dies down with increasing topological distance between prototypes.

Batch versions of the SOM algorithm are similar to K-means and principal surfaces.
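To make the update rule concrete, here is a minimal NumPy sketch of the online algorithm. The Gaussian neighborhood function, the fixed learning rate `alpha`, and the random-sample initialization are simplifying assumptions of this sketch (the slide initializes on the principal component plane, and practical SOMs decay `alpha` and `sigma` over time):

```python
import numpy as np

def som_fit(X, grid=(5, 5), n_iter=10000, alpha=0.05, sigma=1.0, seed=0):
    """Online SOM updates, following the slide's rule
        m_k <- m_k + alpha * h(||l_j - l_k||) * (x_i - m_k).
    """
    rng = np.random.default_rng(seed)
    # Integer lattice coordinates l_k for each prototype.
    lattice = np.array([(i, j) for i in range(grid[0])
                               for j in range(grid[1])], dtype=float)
    # Initialize prototypes at random data points (the slide instead
    # places them uniformly on the principal component plane).
    M = X[rng.choice(len(X), size=len(lattice), replace=False)].astype(float)
    for _ in range(n_iter):
        x = X[rng.integers(len(X))]                        # an observation x_i
        j = np.argmin(((M - x) ** 2).sum(axis=1))          # closest prototype m_j
        d = np.linalg.norm(lattice - lattice[j], axis=1)   # topological distance ||l_j - l_k||
        h = np.exp(-d ** 2 / (2.0 * sigma ** 2))           # neighborhood weight, dies down with d
        M += alpha * h[:, None] * (x - M)                  # move every m_k toward x_i
    return M, lattice
```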

SOM Example: Document Organization and Retrieval

Example taken (with permission from Kohonen et al.) from the WEBSOM homepage: http://websom.hut.fi/websom/. The data are 12,088 articles from the newsgroup comp.ai.neural-nets. Observations are represented by a term-document matrix: for each document, the features are the relative frequencies of each term in a dictionary (e.g. 50,000 words). The WEBSOM software uses a randomized version of the SVD to initially reduce the dimension of the data, and provides a zoom feature that allows one to interact with the map.
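WEBSOM's actual pipeline is not reproduced here, but the preprocessing it describes, term features followed by a randomized SVD, can be sketched with scikit-learn. The three stand-in documents and the use of tf-idf weights instead of raw relative frequencies are illustrative assumptions:

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Stand-in articles; the real data are the 12,088 newsgroup posts.
docs = [
    "neural nets learn connection weights",
    "backpropagation trains neural nets",
    "randomized svd reduces dimension",
]

X = TfidfVectorizer().fit_transform(docs)   # sparse term-document features

# Randomized SVD to cut the large term feature space down before
# fitting a SOM (WEBSOM reports a similar preprocessing step).
svd = TruncatedSVD(n_components=2, algorithm="randomized", random_state=0)
X_reduced = svd.fit_transform(X)            # one low-dimensional row per document
```

The reduced rows could then be passed to a SOM such as the sketch above.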

WEBSOM Heatmap

[Figure: the WEBSOM heatmap of the organized newsgroup articles.]

Multidimensional Scaling

Like principal surfaces and the SOM, MDS delivers a low-dimensional mapping of high-dimensional data. MDS requires only interpoint distances, while PS/SOM require coordinate data. MDS delivers low-dimensional coordinates for each observation; PS/SOM deliver a mapping function. XGvis is freely available (and wonderful) software that implements MDS (Buja, Swayne, Littman and Dean, 1998): http://www.research.att.com/areas/stat/xgobi

Example: in a consumer survey, respondents are asked to compare different products by giving pairwise similarity rankings. Based on the average dissimilarities, MDS will produce a two-dimensional mapping of the products.

MDS: Example

Political science students were asked to give similarities between countries, based on social and political environment (Kaufman and Rousseeuw, 1990). Average dissimilarities (lower triangle):

      BEL   BRA   CHI   CUB   EGY   FRA   IND   ISR   USA   USS   YUG
BRA  5.58
CHI  7.00  6.50
CUB  7.08  7.00  3.83
EGY  4.83  5.08  8.17  5.83
FRA  2.17  5.75  6.67  6.92  4.92
IND  6.42  5.00  5.58  6.00  4.67  6.42
ISR  3.42  5.50  6.42  6.42  5.00  3.92  6.17
USA  2.50  4.92  6.25  7.33  4.50  2.25  6.33  2.75
USS  6.08  6.67  4.25  2.67  6.00  6.17  6.17  6.92  6.17
YUG  5.25  6.83  4.50  3.75  5.75  5.42  6.08  5.83  6.67  3.67
ZAI  4.75  3.00  6.08  6.67  5.00  5.58  4.83  6.17  5.67  6.50  6.92
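As a sanity check, these dissimilarities can be fed to any stress-based MDS routine. The slides do not name their software, so this sketch uses scikit-learn's SMACOF-based metric MDS as a stand-in; the resulting configuration matches the slide's figure only up to rotation, reflection, and local minima:

```python
import numpy as np
from sklearn.manifold import MDS

countries = ["BEL", "BRA", "CHI", "CUB", "EGY", "FRA",
             "IND", "ISR", "USA", "USS", "YUG", "ZAI"]
# Lower-triangular dissimilarities from the table, rows BRA..ZAI.
tri = [
    [5.58],
    [7.00, 6.50],
    [7.08, 7.00, 3.83],
    [4.83, 5.08, 8.17, 5.83],
    [2.17, 5.75, 6.67, 6.92, 4.92],
    [6.42, 5.00, 5.58, 6.00, 4.67, 6.42],
    [3.42, 5.50, 6.42, 6.42, 5.00, 3.92, 6.17],
    [2.50, 4.92, 6.25, 7.33, 4.50, 2.25, 6.33, 2.75],
    [6.08, 6.67, 4.25, 2.67, 6.00, 6.17, 6.17, 6.92, 6.17],
    [5.25, 6.83, 4.50, 3.75, 5.75, 5.42, 6.08, 5.83, 6.67, 3.67],
    [4.75, 3.00, 6.08, 6.67, 5.00, 5.58, 4.83, 6.17, 5.67, 6.50, 6.92],
]
n = len(countries)
D = np.zeros((n, n))
for i, row in enumerate(tri, start=1):
    D[i, :i] = row
D += D.T                                     # symmetric, zero diagonal

# Least-squares (stress-minimizing) MDS on the precomputed dissimilarities.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
Z = mds.fit_transform(D)
for name, (z1, z2) in zip(countries, Z):
    print(f"{name}: ({z1:5.2f}, {z2:5.2f})")
```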

MDS Solution

[Figure: two-dimensional MDS solution, first vs. second MDS coordinate, with points labeled BEL, BRA, CHI, CUB, EGY, FRA, IND, ISR, USA, USS, YUG, ZAI. Colored sets denote the partitioning found by K-medoids (discussed earlier).]

Background

Given data $x_1, x_2, \ldots, x_n \in \mathbb{R}^p$, with interpoint distances $d_{ii'} = \|x_i - x_{i'}\|$, we seek values $z_1, z_2, \ldots, z_n \in \mathbb{R}^k$ with $k < p$ such that $d_{ii'} \approx \|z_i - z_{i'}\|$. This is similar in flavor to principal curves, where points close together in the original space should map to points close together in the reduced space. But in principal curves, points far apart could also map close together; not so in MDS, which tries to preserve all pairwise distances.

MDS Algorithms

Least squares (Kruskal-Shepard) metric scaling minimizes the "stress function"

$$\sum_{i \neq i'} \big( d_{ii'} - \|z_i - z_{i'}\| \big)^2$$

with respect to the coordinates $z_i$, using gradient descent. Not a nice function!

The Sammon mapping minimizes

$$\sum_{i \neq i'} \frac{\big( d_{ii'} - \|z_i - z_{i'}\| \big)^2}{d_{ii'}},$$

which puts more emphasis on preserving the smaller pairwise distances.

Shepard-Kruskal nonmetric scaling (which uses only ranks) minimizes

$$\frac{\sum_{i,i'} \big[ \theta(\|z_i - z_{i'}\|) - d_{ii'} \big]^2}{\sum_{i,i'} d_{ii'}^2}$$

over the $z_i$ and an arbitrary increasing function $\theta(\cdot)$. It uses isotonic regression to estimate $\theta$.

Classical MDS

A slightly different formulation leads to a simple eigenvalue problem. Define similarities as centered inner products, $S_{ii'} = \langle x_i - \bar{x},\, x_{i'} - \bar{x} \rangle$, and minimize

$$\sum_{i,i'} \big[ S_{ii'} - \langle z_i - \bar{z},\, z_{i'} - \bar{z} \rangle \big]^2.$$

Details of the solution are in Exercise 14.11 in the text. Classical scaling uses the identity

$$\|x_i - x_{i'}\|^2 = \|x_i\|^2 + \|x_{i'}\|^2 - 2\langle x_i, x_{i'} \rangle:$$

if the data are centered and lie on a circle, then $\|x_i\|^2$ is constant, and classical MDS is equivalent to least squares scaling. In general it is a good approximation.
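The eigenvalue solution referred to above (Exercise 14.11) is short enough to write out. This sketch follows the standard double-centering construction, recovering the centered inner products from squared distances and scaling the top-k eigenvectors:

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical scaling via the eigendecomposition of the
    double-centered squared-distance matrix:
        B = -0.5 * J @ (D**2) @ J, with J = I - (1/n) * 1 1^T,
    which recovers the centered inner products S_{ii'}; the top-k
    eigenvectors, scaled by sqrt(eigenvalue), give the z_i.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    lam, V = np.linalg.eigh(B)                 # eigenvalues in ascending order
    idx = np.argsort(lam)[::-1][:k]            # indices of the top-k eigenvalues
    lam_top = np.clip(lam[idx], 0.0, None)     # guard against tiny negative eigenvalues
    return V[:, idx] * np.sqrt(lam_top)
```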

Applications of MDS

- Social sciences: summarizing proximity data.
- Archeology: the similarity of two digging sites can be quantified by the frequency of shared features among artifacts found at each site.
- Classification problems: analyzing pairwise classification rates when there are many classes.
- Graph layout.

[Figure: three panels comparing classical scaling, the Sammon map, and isoMDS on 20 numbered points.]

[Figure: a second three-panel comparison of classical scaling, the Sammon map, and isoMDS on 20 numbered points.]

Classical vs Local MDS

[Figure: classical MDS (left panel) and local MDS (right panel) representations, described below.]

The orange points show data lying on a parabola, while the blue points show the one-dimensional multidimensional scaling representations. Classical multidimensional scaling (left panel) does not preserve the ordering of the points along the curve, because it judges points on opposite ends of the curve to be close together. In contrast, local multidimensional scaling (right panel) does a good job of preserving that ordering.
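The folding in the left panel is easy to reproduce: map parabola data to one dimension with the classical_mds sketch from the Classical MDS section and inspect the recovered ordering. (Local MDS itself involves extra machinery and is not reproduced here; this snippet only demonstrates the failure mode of classical scaling.)

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# 21 points along a parabola, ordered by t.
t = np.linspace(-3.0, 3.0, 21)
X = np.column_stack([t, t ** 2])
D = squareform(pdist(X))

# One-dimensional classical scaling (classical_mds from the sketch above).
z = classical_mds(D, k=1).ravel()

# The two arms of the parabola get folded on top of each other, so
# the recovered ordering follows |t| rather than t.
print(np.argsort(z))
```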

Glossary

Machine learning / AI                Statistics
---------------------------------    ---------------------------------
neural network                       model
self-organizing map                  principal surface
weights                              parameters
learning                             fitting
generalization                       test set performance
supervised learning                  regression/classification
unsupervised learning                density estimation
optimal brain damage                 model selection
large grant = $500,000               large grant = $5,000
nice place to have a meeting:        nice place to have a meeting:
Snowbird, Utah; the French Alps      Las Vegas in August