Correlation Preserving Unsupervised Discretization. Outline

Similar documents
Unsupervised Data Discretization of Mixed Data Types

ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015

ECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction

Machine Learning 2nd Edition

7. Variable extraction and dimensionality reduction

Supervised locally linear embedding

Lecture 4: Data preprocessing: Data Reduction-Discretization. Dr. Edgar Acuna. University of Puerto Rico- Mayaguez math.uprm.

Uncorrelated Multilinear Principal Component Analysis through Successive Variance Maximization

Dimensionality Reduction

LEC 3: Fisher Discriminant Analysis (FDA)

Dimensionality Reduction Techniques (DRT)

SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION

Unsupervised Learning. k-means Algorithm

Principal Component Analysis

PRINCIPAL COMPONENT ANALYSIS

CS5112: Algorithms and Data Structures for Applications

Machine learning for pervasive systems Classification in high-dimensional spaces

Face Recognition Using Laplacianfaces He et al. (IEEE Trans PAMI, 2005) presented by Hassan A. Kingravi

Discriminative Direction for Kernel Classifiers

Principal Components Analysis. Sargur Srihari University at Buffalo

Dimensionality Reduction and Principal Components

L11: Pattern recognition principles

PCA, Kernel PCA, ICA

A Modified Incremental Principal Component Analysis for On-line Learning of Feature Space and Classifier

Principal Components Analysis (PCA)

The exam is closed book, closed notes except your one-page (two sides) or two-page (one side) crib sheet.

Lecture 7: Continuous Latent Variable Models

Data Mining and Knowledge Discovery. Petra Kralj Novak. 2011/11/29

Lars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany

Preprocessing & dimensionality reduction

A Modified Incremental Principal Component Analysis for On-Line Learning of Feature Space and Classifier

Department of Computer Science and Engineering

CS145: INTRODUCTION TO DATA MINING

Principal Component Analysis -- PCA (also called Karhunen-Loeve transformation)

Dimensionality Reduction Using PCA/LDA. Hongyu Li School of Software Engineering TongJi University Fall, 2014

Heeyoul (Henry) Choi. Dept. of Computer Science Texas A&M University

Lecture 2: Linear Algebra Review

Data Mining and Analysis

Noise & Data Reduction

PRINCIPAL COMPONENTS ANALYSIS

Data Mining Techniques

What is semi-supervised learning?

Data Exploration and Unsupervised Learning with Clustering

CPSC 340: Machine Learning and Data Mining. More PCA Fall 2016

Real Estate Price Prediction with Regression and Classification CS 229 Autumn 2016 Project Final Report

Classification Using Decision Trees

Chapter XII: Data Pre and Post Processing

CS281 Section 4: Factor Analysis and PCA

Nonlinear Dimensionality Reduction. Jose A. Costa

MATH 829: Introduction to Data Mining and Analysis Principal component analysis

Basics of Multivariate Modelling and Data Analysis

From statistics to data science. BAE 815 (Fall 2017) Dr. Zifei Liu

Multivariate Statistics Fundamentals Part 1: Rotation-based Techniques

Lecture 8: Principal Component Analysis; Kernel PCA

CSE 5243 INTRO. TO DATA MINING

STA 414/2104: Lecture 8

Unsupervised Learning: K- Means & PCA

Linear & Non-Linear Discriminant Analysis! Hugh R. Wilson

Feature Engineering, Model Evaluations

Machine Learning (Spring 2012) Principal Component Analysis

Machine Learning (CSE 446): Unsupervised Learning: K-means and Principal Component Analysis

Unsupervised Image Segmentation Using Comparative Reasoning and Random Walks

Modern Information Retrieval

Why Spatial Data Mining?

When Dictionary Learning Meets Classification

Convergence of Eigenspaces in Kernel Principal Component Analysis

Unsupervised Machine Learning and Data Mining. DS 5230 / DS Fall Lecture 7. Jan-Willem van de Meent

Statistical Pattern Recognition

LEC 2: Principal Component Analysis (PCA) A First Dimensionality Reduction Approach

Course 495: Advanced Statistical Machine Learning/Pattern Recognition

What is Principal Component Analysis?

Density-Based Clustering

Methods for sparse analysis of high-dimensional data, II

Dimensionality Reduction and Principle Components

Unsupervised Learning: Dimensionality Reduction

CSE 5243 INTRO. TO DATA MINING

A Methodology for Direct and Indirect Discrimination Prevention in Data Mining

Detecting Anomalous and Exceptional Behaviour on Credit Data by means of Association Rules. M. Delgado, M.D. Ruiz, M.J. Martin-Bautista, D.

Dimension Reduction (PCA, ICA, CCA, FLD,

Classification. The goal: map from input X to a label Y. Y has a discrete set of possible values. We focused on binary Y (values 0 or 1).

Intelligent Data Analysis Lecture Notes on Document Mining

Expectation Maximization

Machine Learning. Principal Components Analysis. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012

Mathematical foundations - linear algebra

CS 6375 Machine Learning

Dimension reduction methods: Algorithms and Applications Yousef Saad Department of Computer Science and Engineering University of Minnesota

ISSN: (Online) Volume 3, Issue 5, May 2015 International Journal of Advance Research in Computer Science and Management Studies

Principal Component Analysis (PCA)

Machine Learning: Pattern Mining

Deriving Principal Component Analysis (PCA)

Dimensionality reduction

Dimensionality Reduction

On Improving the k-means Algorithm to Classify Unclassified Patterns

Recognition Using Class Specific Linear Projection. Magali Segal Stolrasky Nadav Ben Jakov April, 2015

Relations Between Adjacency And Modularity Graph Partitioning: Principal Component Analysis vs. Modularity Component Analysis

Lecture 13. Principal Component Analysis. Brett Bernstein. April 25, CDS at NYU. Brett Bernstein (CDS at NYU) Lecture 13 April 25, / 26

Assignment 3. Introduction to Machine Learning Prof. B. Ravindran

Covariance and Correlation Matrix

Linear Dimensionality Reduction

Face Detection and Recognition

Transcription:

Correlation Preserving Unsupervised Discretization
Jee Vang

Outline
- Paper References
- What is discretization?
- Motivation
- Principal Component Analysis (PCA)
- Association Mining
- Correlation Preserving Discretization
- Results
- Summary/Conclusion

Paper References
- S. Mehta, S. Parthasarathy, and H. Yang. "Toward Unsupervised Correlation Preserving Discretization," IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 8, 2005.
- S. Mehta, S. Parthasarathy, and H. Yang. "Correlation Preserving Discretization," Proceedings of the Fourth IEEE International Conference on Data Mining, 2004.

What is discretization?
- Data
  - Continuous (aka quantitative), e.g. weight (lbs): 155.2, 160.2, 199.3
  - Categorical (aka qualitative), e.g. color: red, blue, green, white
- Discretization: transformation of a continuous variable into a categorical variable
  - e.g. weight (lbs): 155.2, 160.2, 199.3 -> low, medium, high
  - Consequence: information (e.g. correlation) is lost

Dimensions of discretization
- Unsupervised vs. supervised
  - Unsupervised: no class label; only two main algorithms are widely known, equal-width and equal-frequency (see the sketch below)
  - Supervised: class label; many algorithms exist
- Dynamic vs. static
  - Dynamic: while learning takes place
  - Static: before learning takes place
- Local vs. global
  - Local: a subset of variables at a time
  - Global: all variables at a time
- Top-down vs. bottom-up
  - Top-down: start with an empty set of cutpoints and add cutpoints
  - Bottom-up: start with all data points as cutpoints and merge them

Why even bother with discretization?
- Many learning algorithms accept data only in categorical form
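The two unsupervised baselines mentioned above are simple to state in code. Below is a minimal sketch of equal-width and equal-frequency binning with NumPy; the function names, the bin count of 3, and the example weights are illustrative choices, not taken from the paper.

    import numpy as np

    def equal_width(x, k):
        """Split the range of x into k bins of equal width."""
        edges = np.linspace(x.min(), x.max(), k + 1)
        return np.digitize(x, edges[1:-1])        # bin index (0..k-1) per value

    def equal_frequency(x, k):
        """Split x into k bins holding roughly the same number of points."""
        edges = np.quantile(x, np.linspace(0, 1, k + 1))
        return np.digitize(x, edges[1:-1])

    weights = np.array([155.2, 160.2, 199.3, 120.0, 180.5, 140.7])
    print(equal_width(weights, 3))       # bins by value range
    print(equal_frequency(weights, 3))   # bins by count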

Motivation
- What if your dataset contains mixed data types, has no class label, and you want the discretization to preserve, in the categorical domain, the correlations present in the continuous domain?
- Are there algorithms that aim to solve this problem?
  - S.D. Bay, "Multivariate Discretization for Set Mining," Knowledge and Information Systems, Vol. 3, No. 4, pp. 491-512, 2001. No dimension reduction; computationally expensive.
  - M.C. Ludl and G. Widmer, "Relative Unsupervised Discretization for Association Rule Mining," Proc. Second Int'l Conf. Principles and Practice of Knowledge Discovery in Databases, pp. 148-158, 2000. No consideration of the interdependence between variables.
- Mehta et al.
  - Use PCA to deal with high dimensionality: PCA generates a set of n orthogonal vectors from an input data set of dimensionality N, where n < N and the n orthogonal directions preserve most of the variance in the input data.
  - Use association mining to handle mixed data types.

Principal Component Analysis
- PCA generates a set of n orthogonal vectors from an input data set of dimensionality N, where n < N and the n orthogonal directions preserve most of the variance in the input data.
- Steps (see the sketch below):
  1. Generate the n x n correlation matrix
  2. Compute the eigenvectors and eigenvalues of the correlation matrix
     - Eigenvectors represent orthogonal axes
     - Each eigenvalue represents the variance along its corresponding eigenvector
  3. Retain the k (k < n) eigenvectors corresponding to the largest eigenvalues, which together account for 90% of the variance
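The PCA step can be summarized in a few lines of NumPy. This is a minimal sketch, assuming a purely numeric input matrix X (rows = records, columns = variables) and a 90% variance target; the function and variable names are illustrative.

    import numpy as np

    def pca_correlation(X, variance_target=0.90):
        """Eigendecomposition of the correlation matrix of X; keep the leading
        eigenvectors that together explain `variance_target` of the variance."""
        # normalize and mean-center (z-scores), as in step 1 of the method
        Z = (X - X.mean(axis=0)) / X.std(axis=0)
        R = np.corrcoef(X, rowvar=False)            # n x n correlation matrix
        eigvals, eigvecs = np.linalg.eigh(R)        # symmetric matrix -> eigh
        order = np.argsort(eigvals)[::-1]           # sort by decreasing eigenvalue
        eigvals, eigvecs = eigvals[order], eigvecs[:, order]
        explained = np.cumsum(eigvals) / eigvals.sum()
        k = int(np.searchsorted(explained, variance_target)) + 1
        return eigvecs[:, :k], Z @ eigvecs[:, :k]   # retained axes, projected data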

Association Mining
- Association mining generates association rules.
- Let I = {i_1, i_2, i_3, ..., i_m} be a set of m distinct attributes (items).
- Let each transaction T in a database D contain a set of items such that T is a subset of I.
- An association rule is an expression of the form A => B, where A and B are proper subsets of I and A ∩ B = ∅.
- An itemset is said to have support S if S percent of the transactions T in D contain the itemset.
- Similarity metric of the association patterns generated by two data sets (or two samples of the same data set):
  - Let A and B be the frequent itemsets for database samples d1 and d2, respectively. For an element x, let supp_d1(x) be its frequency in d1 and supp_d2(x) its frequency in d2. Then

    sim(d1, d2) = ( Σ_{x ∈ A ∪ B} max(0, 1 − α · |supp_d1(x) − supp_d2(x)|) ) / |A ∪ B|

Correlation Preserving Discretization
1. Normalization and mean centering
2. Eigenvector computation
3. Data projection onto the eigenspace
4. Discretization in the eigenspace
   - Only continuous variables: apply clustering to generate n cutpoints
   - Mixed data:
     a. Compute the frequent itemsets A over all categorical variables
     b. Split each eigen-dimension into equal-frequency intervals and compute the frequent itemsets within each interval, restricted to subsets of A
     c. Compute the similarity between contiguous intervals and merge those above a user-defined threshold (see the sketch below)
5. Correlate the original dimensions with the eigenvectors
6. Reproject the eigenspace cutpoints back to the original dimensions
   - K-NN: find the k nearest neighbors of a cutpoint in the eigenspace, then take the mean/median of those neighbors and the cutpoint in the original representation space
   - Direct projection: cos(θ_ij) = e_i · o_j, where e_i is the i-th eigenvector and o_j is an N-dimensional unit vector along the j-th dimension; multiply each cutpoint by cos(θ_ij)
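To make the similarity-based merging of step 4 concrete, here is a small sketch of the itemset-similarity metric above and a greedy merge of contiguous intervals. The dictionaries mapping itemsets to supports, the default threshold, and the averaging of supports after a merge are assumptions for illustration only; the paper recomputes supports over the merged interval rather than averaging.

    def itemset_similarity(supp_d1, supp_d2, alpha=1.0):
        """sim(d1, d2) over the union of frequent itemsets of two intervals.
        supp_d1 and supp_d2 map each frequent itemset to its support."""
        union = set(supp_d1) | set(supp_d2)
        if not union:
            return 0.0
        total = sum(max(0.0, 1.0 - alpha * abs(supp_d1.get(x, 0.0) - supp_d2.get(x, 0.0)))
                    for x in union)
        return total / len(union)

    def merge_contiguous(interval_supports, threshold=0.8):
        """Greedily merge adjacent eigen-dimension intervals whose
        itemset profiles are more similar than `threshold`."""
        merged = [interval_supports[0]]
        for supp in interval_supports[1:]:
            if itemset_similarity(merged[-1], supp) >= threshold:
                # simplification: average the supports of the merged intervals
                keys = set(merged[-1]) | set(supp)
                merged[-1] = {x: (merged[-1].get(x, 0.0) + supp.get(x, 0.0)) / 2 for x in keys}
            else:
                merged.append(supp)
        return merged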

Results - Dataset

Results - Compared to MVD and ME-MDL

Results - Compared to MVD and ME-MDL
- Correlation preserving discretization is more meaningful
  - The population within an interval should exhibit similar properties
  - The populations in different intervals should exhibit different properties
- Intuitive cutpoints
  - Adult Age: cutpoints similar to MVD; meaningful cutpoints (marriage, retirement, education)
  - Adult Capital Gain: (low, medium, high)
  - Adult Capital Loss: (binary)
  - Adult Hours/week: (age correlated with hours)

Results - Classification

Results - Classification
- Compared to other classifiers: C4.5, IBk, PART, Bayes, OneR, kernel-based, SMO
- Shows the lowest error rate in 8 of 13 datasets
- With up to 30% of values missing at random, the datasets did not produce significant differences in error rate

Summary/Conclusion
- This method uses and preserves the correlation between all variables of mixed-type data to discretize
- The discretization is meaningful and intuitive
- Promising results on classification and missing-data problems