Constraint-based Subspace Clustering

1 Constraint-based Subspace Clustering
Elisa Fromont (1), Adriana Prado (2) and Céline Robardet (1)
(1) Université de Lyon, France
(2) Universiteit Antwerpen, Belgium
Thursday, April 30

2 Traditional Clustering
- Partitions the data into groups (clusters) of similar objects
- Similarity: based on distances or density
- Traditional methods use all features (dimensions) to identify clusters in the data

3 Clustering examples
[Figure: synthetic data; K-means clustering]

4 Problems
When dealing with high-dimensional data:
- "Curse of dimensionality" [Beyer et al., 1999]:
  - Distance-based: the distance to the nearest neighbor is nearly equal to the distance to the farthest neighbor
  - Density-based: it is difficult to determine dense regions in high-dimensional data
- Data may have many irrelevant dimensions
(Subspace Clustering for High Dimensional Data: A Review, Parsons et al., KDD Explorations 2004)
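
The distance-based effect can be observed with a small experiment. The sketch below is only an illustration, not part of the talk: it assumes points drawn uniformly from the unit hypercube, and the function name `nearest_to_farthest_ratio` and its parameters are hypothetical. It measures the ratio between the nearest and the farthest distance from one random query point; a ratio close to 1 means that the two distances are nearly equal.

```python
import math
import random


def nearest_to_farthest_ratio(dim, n_points=200, seed=0):
    """Return d_min / d_max from one random query to n_points uniform points in [0, 1]^dim."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))  # Euclidean distance
    return min(distances) / max(distances)


# The ratio is expected to move towards 1 as the dimensionality grows.
for dim in (2, 10, 100, 1000):
    print(dim, round(nearest_to_farthest_ratio(dim), 3))
```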

5 Solutions?
- Dimensionality reduction? (e.g. PCA)
  - Aims at discarding irrelevant dimensions
  - BUT: dimensions are often not "globally" irrelevant
(Detecting Clusters in Moderate-to-high Dimensional Data, A. Zimek, Tutorial on Subspace Clustering at KDD)

6 Gene Expression Data Analysis
- Columns: genes
- Rows: experiment conditions or samples
- Values: relative abundance of the mRNA of a gene under a specific condition
- Task: cluster the samples w.r.t. their similarity on gene expression values
- Samples may be clustered differently depending on the subset of genes considered

7 Gene Expression Data Analysis
Add instance-level constraints on pairs of samples:
- Some are known to result from similar experimental conditions and must belong to the same subspace cluster.
- Others come from different experimental conditions and cannot be linked by a subspace cluster.

9 Solution
Constraint-based Subspace Clustering!
- Techniques that automatically detect clusters in subspaces of the data while ensuring that the instance-level constraints are satisfied
- How can it be done efficiently?
Naïve solution
- Check whether each possible subspace of a d-dimensional dataset is a subspace cluster satisfying the instance-level constraints.
- Runtime complexity: $O(2^d)$. Infeasible!
Integrating instance-level constraints into the subspace clustering mining process
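
To make the $O(2^d)$ figure concrete, the following sketch enumerates every non-empty subspace of a small set of dimensions and prints how the count grows with $d$. The helper name `all_subspaces` and the dimension labels are hypothetical and only serve the illustration.

```python
from itertools import combinations


def all_subspaces(dimensions):
    """Yield every non-empty subset of the given dimensions."""
    for k in range(1, len(dimensions) + 1):
        for subspace in combinations(dimensions, k):
            yield subspace


dims = ["d1", "d2", "d3", "d4"]
subspaces = list(all_subspaces(dims))
print(len(subspaces))  # 2**4 - 1 non-empty subspaces

# The count roughly doubles with each extra dimension.
for d in (10, 20, 50):
    print(d, 2**d - 1)
```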

10 Outline of the talk
1 Subspace clustering
2 Constraint-based subspace clustering
3 Experimental results
4 Conclusion

11 Subspace Clustering Strategies
Top-down
- Start with an initial approximation of the clusters in the full feature space (e.g. k-medoids)
- Iteratively refine the current clustering by projecting clusters onto a lower-dimensional space
- Problem: does not guarantee the best clustering
Bottom-up
- First consider clusters in 1-dimensional spaces
- Iteratively join subspaces to form higher-dimensional ones
- Problem: complexity of the enumeration process, so try to prune the enumeration as much as possible!
- Use a clustering criterion that satisfies the downward closure property

14 CLIQUE [Agrawal et al., 98]
- Pioneering approach (several extensions already)
- Grid- and density-based approach
- Each dimension is partitioned into equal-sized intervals: 1-dimensional units
- A k-dimensional unit is the intersection of k units of different dimensions
- A k-dimensional unit is dense iff it contains at least σ objects
Anti-monotonic property
- If a k-dimensional unit is dense, then all the $(k-1)$-dimensional units it includes are also dense.
A subspace cluster is a maximal set of connected dense k-dimensional units
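
The grid and density definitions can be sketched as follows. This is a minimal illustration under stated assumptions, not the CLIQUE implementation: the points are made up, the intervals span the observed range of each dimension, and the names `one_dim_units` and `is_dense` are hypothetical.

```python
from collections import defaultdict


def one_dim_units(points, n_bins):
    """Map (dimension index, interval index) to the set of object indices it contains."""
    n_dims = len(points[0])
    units = defaultdict(set)
    for dim in range(n_dims):
        values = [p[dim] for p in points]
        low, high = min(values), max(values)
        width = (high - low) / n_bins or 1.0  # guard against a constant dimension
        for obj, value in enumerate(values):
            # Clamp so that the maximum value falls into the last interval.
            interval = min(int((value - low) / width), n_bins - 1)
            units[(dim, interval)].add(obj)
    return units


def is_dense(objects, sigma):
    """A unit is dense iff it contains at least sigma objects."""
    return len(objects) >= sigma


# Hypothetical 2-dimensional data, 2 intervals per dimension, sigma = 3.
points = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.4, 0.9), (0.9, 0.1), (1.0, 0.4)]
units = one_dim_units(points, n_bins=2)
dense = {u: objs for u, objs in units.items() if is_dense(objs, sigma=3)}

# A 2-dimensional unit is the intersection of two 1-dimensional units
# taken from different dimensions.
two_dim = units[(0, 0)] & units[(1, 0)]
print(sorted(dense), sorted(two_dim), is_dense(two_dim, sigma=3))
```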

15 CLIQUE: an example
- Number of units: 2 per dimension
- Dense units: units with at least 4 objects (σ = 4)
[Figure: raw dataset (objects $o_1$ to $o_{10}$, dimensions $d_1$ and $d_2$) and the corresponding grid, with units $u_{11}$ and $u_{12}$ along $d_1$]
- 1-dimensional dense unit: $(\{o_1, o_2, o_3, o_4\}, \{u_{11}\})$
- 1-dimensional dense unit: $(\{o_5, o_6, o_7, o_8, o_9, o_{10}\}, \{u_{12}\})$
- 1-dimensional dense unit: $(\{o_1, o_2, o_3, o_5, o_6, o_7, o_8\}, \{u_{21}\})$
- 2-dimensional dense unit: $(\{o_5, o_6, o_7, o_8\}, \{u_{12}, u_{21}\})$

16 CLIQUE: Mining k-dimensional units
- Find 1-dimensional dense units
- At iteration k > 1: generate k-dimensional dense units
  - Merge a pair of $(k-1)$-dimensional dense units differing in only one dimension
  - Prune k-dimensional units having a $(k-1)$-dimensional projection that is not dense
- Output: subspace clusters (O, D), where O is a set of objects and D a k-dimensional unit
- Post-processing: connected k-dimensional units are merged to generate maximal subspace clusters
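
The level-wise steps above can be sketched on the unit memberships of the CLIQUE example (σ = 4). This is a simplified illustration, not the original CLIQUE code: the function name `mine_dense_units` is hypothetical, the post-processing step is left out, and the content of `u22` is an assumption (the objects of the example that are not in `u21`), since the slide does not list it.

```python
from itertools import combinations


def mine_dense_units(unit_objects, unit_dim, sigma):
    """Return {frozenset of 1-D unit names: set of objects} for every dense unit."""
    # Level 1: dense 1-dimensional units.
    level = {frozenset([u]): objs for u, objs in unit_objects.items() if len(objs) >= sigma}
    dense = dict(level)
    k = 2
    while level:
        candidates = {}
        for a, b in combinations(level, 2):
            merged = a | b
            # Merge two (k-1)-dimensional units that differ in exactly one 1-D unit...
            if len(merged) != k:
                continue
            # ...and that use k different dimensions.
            if len({unit_dim[u] for u in merged}) != k:
                continue
            # Prune when some (k-1)-dimensional projection is not dense.
            if any(merged - {u} not in level for u in merged):
                continue
            objs = level[a] & level[b]
            if len(objs) >= sigma:
                candidates[merged] = objs
        dense.update(candidates)
        level = candidates
        k += 1
    return dense


unit_objects = {
    "u11": {1, 2, 3, 4},
    "u12": {5, 6, 7, 8, 9, 10},
    "u21": {1, 2, 3, 5, 6, 7, 8},
    "u22": {4, 9, 10},  # assumption: the remaining objects along d2
}
unit_dim = {"u11": "d1", "u12": "d1", "u21": "d2", "u22": "d2"}

dense = mine_dense_units(unit_objects, unit_dim, sigma=4)
for units, objs in dense.items():
    print(sorted(units), sorted(objs))
```

On this input the only dense 2-dimensional unit is the intersection of `u12` and `u21`, which matches the slide.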

18 Motivation and Goal
Subspace clustering relies on the monotonicity of constraints to improve efficiency.
We propose to:
- Integrate background knowledge into the subspace clustering process in the form of instance-level constraints: must-link and cannot-link
- Investigate whether these new constraints can make the process not only more accurate but also more efficient

19 Definitions of Instance-Level constraints
Cannot-link constraint $CL(o_i, o_j)$
- A cannot-link constraint between two objects $o_i$ and $o_j$ is satisfied by a subspace cluster $(O, D)$ iff $\{o_i, o_j\} \not\subseteq O$.
Must-link constraint $ML(o_i, o_j)$
- A must-link constraint between two objects $o_i$ and $o_j$ is satisfied by a subspace cluster $(O, D)$ iff $\{o_i, o_j\} \subseteq O$ or $\{o_i, o_j\} \cap O = \emptyset$.
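
Both definitions translate directly into set tests on the object set $O$ of a cluster. The function names below are hypothetical and the object labels are only examples.

```python
def satisfies_cannot_link(objects, oi, oj):
    """CL(oi, oj) holds iff the cluster does not contain both objects."""
    return not {oi, oj} <= set(objects)


def satisfies_must_link(objects, oi, oj):
    """ML(oi, oj) holds iff the cluster contains both objects or neither of them."""
    objects = set(objects)
    return {oi, oj} <= objects or not ({oi, oj} & objects)


cluster_objects = {"o1", "o2", "o3"}
print(satisfies_cannot_link(cluster_objects, "o1", "o2"))  # both inside: violated
print(satisfies_cannot_link(cluster_objects, "o1", "o4"))  # only one inside: satisfied
print(satisfies_must_link(cluster_objects, "o1", "o2"))    # both inside: satisfied
print(satisfies_must_link(cluster_objects, "o1", "o4"))    # split: violated
print(satisfies_must_link(cluster_objects, "o4", "o5"))    # both outside: satisfied
```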

20 Monotonicity properties of Instance-Level constraints
Cannot-link is anti-monotonic:
- $\forall P \subseteq O:\ \{o_i, o_j\} \not\subseteq O \Rightarrow \{o_i, o_j\} \not\subseteq P$
Must-link is a disjunction of a monotonic and an anti-monotonic constraint:
- $\forall P \subseteq O:\ \{o_i, o_j\} \subseteq P \Rightarrow \{o_i, o_j\} \subseteq O$
- $\forall P \subseteq O:\ \{o_i, o_j\} \cap O = \emptyset \Rightarrow \{o_i, o_j\} \cap P = \emptyset$
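
These properties can be checked exhaustively on a tiny universe. The sketch below is a brute-force sanity check, not a proof; all function names are hypothetical, and the constraint is placed on the pair (`a`, `b`) of a three-object universe.

```python
from itertools import chain, combinations


def subsets(items):
    """All subsets of items, as tuples."""
    items = list(items)
    return chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))


def cannot_link(objects):
    return not {"a", "b"} <= set(objects)


def both_in(objects):  # first disjunct of must-link
    return {"a", "b"} <= set(objects)


def none_in(objects):  # second disjunct of must-link
    return not ({"a", "b"} & set(objects))


def must_link(objects):
    return both_in(objects) or none_in(objects)


def is_anti_monotonic(pred, universe):
    """If O satisfies pred, then every subset P of O satisfies it too."""
    return all(pred(P) for O in subsets(universe) if pred(O) for P in subsets(O))


def is_monotonic(pred, universe):
    """If a subset P of O satisfies pred, then O satisfies it too."""
    return all(pred(O) for O in subsets(universe) for P in subsets(O) if pred(P))


universe = ["a", "b", "c"]
print(is_anti_monotonic(cannot_link, universe))
print(is_monotonic(both_in, universe), is_anti_monotonic(none_in, universe))
print(is_monotonic(must_link, universe), is_anti_monotonic(must_link, universe))
```

Taken as a whole, must-link is neither monotonic nor anti-monotonic on this universe, which is why it is handled as a disjunction of the two parts.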

21 SC-MINER: main characteristics
Need for an algorithm that can handle both monotonic and anti-monotonic constraints.
SC-MINER (Subspace Clustering Miner):
- Assumes that the dimensions are divided into units beforehand
- Enumerates the candidate subspace clusters in a depth-first way
- Can handle monotonic and anti-monotonic constraints
- Mines closed subspace clusters directly

22 Candidate Generation
A candidate $\langle X, Y \rangle$ consists of two pairs of sets:
- $X = (O, D)$: the set of objects $O$ and the set of units $D$ contained in all the subspace clusters under construction
- $Y = (O', D')$: the set of objects $O'$ and the set of units $D'$ that still need to be enumerated
At each iteration, SC-MINER picks an element $z$ from $Y$ (from $O'$ or $D'$) and makes two recursive calls:
- one for the candidate $\langle X \cup \{z\}, Y \setminus \{z\} \rangle$
- one for the candidate $\langle X, Y \setminus \{z\} \rangle$
Recursion stops when a candidate and all its descendants can be pruned, or when $Y = \emptyset$. In the latter case, we have found a valid subspace cluster $X = (O, D)$.
For the first call, the candidate is $\langle (\emptyset, \emptyset), (\mathcal{O}, \mathcal{D}) \rangle$, where $\mathcal{O}$ and $\mathcal{D}$ are the sets of all objects and all units.

23 Subspace cluster constraint evaluation
Subspace clusters $(O, D)$ are made of objects and units that are in relation:
- Each object in $O$ must belong to all units of $D$
- Each unit in $D$ must contain all objects of $O$
Instead of enumerating candidates and checking whether they satisfy this property, SC-MINER maintains it dynamically (propagation of constraints):
- When an element $z$ is moved from $Y$ to $X$ (first recursive call), all elements of $Y$ not in relation with $z$ are removed.
Evaluation of the density constraint:
- If $|O \cup O'| < \sigma$, the recursion is stopped: none of the descendants of the current subspace cluster candidate can be dense.
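
The enumeration, the propagation of the relation and the density pruning can be put together in a short recursive sketch. It is a simplified illustration under assumptions, not the authors' SC-MINER: closedness and instance-level constraints are left out, units are always enumerated before objects, and the name `enumerate_clusters` is hypothetical. The unit memberships are the ones that can be read off the candidate generation example with σ = 3.

```python
def enumerate_clusters(contains, sigma):
    """contains maps each unit to the set of objects it contains."""
    all_objects = frozenset().union(*contains.values())
    results = []

    def recurse(O, D, O_rest, D_rest):
        # Density pruning: no descendant can reach sigma objects.
        if len(O | O_rest) < sigma:
            return
        # Y is empty: X = (O, D) is a valid subspace cluster.
        if not O_rest and not D_rest:
            results.append((O, D))
            return
        if D_rest:
            z = min(D_rest)
            rest = D_rest - {z}
            # First call: z joins X, objects of Y not in relation with z are removed.
            recurse(O, D | {z}, frozenset(o for o in O_rest if o in contains[z]), rest)
            # Second call: z is discarded.
            recurse(O, D, O_rest, rest)
        else:
            z = min(O_rest)
            rest = O_rest - {z}
            # Same propagation towards units; it removes nothing here because
            # all units have already been enumerated at this point.
            recurse(O | {z}, D, rest, frozenset(u for u in D_rest if z in contains[u]))
            recurse(O, D, rest, D_rest)

    recurse(frozenset(), frozenset(), all_objects, frozenset(contains))
    return results


contains = {
    "d1": {"o2", "o3"},
    "d2": {"o2", "o3", "o4"},
    "d3": {"o1", "o3"},
}
clusters = enumerate_clusters(contains, sigma=3)
with_units = [(O, D) for O, D in clusters if D]
for O, D in with_units:
    print(sorted(O), sorted(D))
```

Without a closedness check the sketch also returns object sets with an empty unit set; they are filtered out in `with_units`.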

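24 Candidate Generation example
σ = 3
[Table: objects $o_1$ to $o_4$ against units $d_1$, $d_2$, $d_3$; cell values not available]
Enumeration tree:
$\langle (\emptyset, \emptyset), (o_1 o_2 o_3 o_4, d_1 d_2 d_3) \rangle$
  on $d_1$:
    $\langle (\emptyset, d_1), (o_2 o_3, d_2 d_3) \rangle$
    $\langle (\emptyset, \emptyset), (o_1 o_2 o_3 o_4, d_2 d_3) \rangle$
      on $d_3$:
        $\langle (\emptyset, d_3), (o_1 o_3, d_2) \rangle$
        $\langle (\emptyset, \emptyset), (o_1 o_2 o_3 o_4, d_2) \rangle$
          on $d_2$:
            $\langle (\emptyset, d_2), (o_2 o_3 o_4, \emptyset) \rangle$
              $\langle (o_2 o_3 o_4, d_2), (\emptyset, \emptyset) \rangle$
            $\langle (\emptyset, \emptyset), (o_1 o_2 o_3 o_4, \emptyset) \rangle$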

25 Propagation of Instance-Level constraints
Cannot-link constraint: $CL(o_i, o_j)$ or $CL(o_j, o_i)$
- When the candidate $\langle X \cup \{o_i\}, Y \setminus \{o_i\} \rangle$ is generated, $o_j$ is removed from $Y$.
Must-link constraint: $ML(o_i, o_j)$ or $ML(o_j, o_i)$
- When the candidate $\langle X \cup \{o_i\}, Y \setminus \{o_i\} \rangle$ is generated, $o_j$ is moved from $Y$ into $X$ and the elements of $Y$ not in relation with $o_j$ are removed.
- When the candidate $\langle X, Y \setminus \{o_i\} \rangle$ is generated, $o_j$ is also removed from $Y$.
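
The propagation rules can be sketched as two helpers, one per recursive call. This is an illustration under assumptions: the names `include_object` and `exclude_object` are hypothetical, constraints are given as lists of pairs, chains of must-link constraints are not followed, and returning `None` when a must-link partner is no longer available is a choice of this sketch rather than something the slides specify.

```python
def include_object(O, O_rest, D_rest, oi, contains, cannot_link, must_link):
    """Move oi from Y to X and propagate the relation and the CL/ML constraints."""
    O = set(O) | {oi}
    O_rest = set(O_rest) - {oi}
    # Relation: units of Y that do not contain oi are removed.
    D_rest = {u for u in D_rest if oi in contains[u]}
    for a, b in cannot_link:
        if oi in (a, b):
            # Cannot-link: the partner can no longer join this cluster.
            O_rest.discard(b if a == oi else a)
    for a, b in must_link:
        if oi in (a, b):
            partner = b if a == oi else a
            if partner in O:
                continue
            if partner not in O_rest:
                return None  # assumption: the candidate cannot satisfy ML anymore
            # Must-link: the partner is moved from Y into X as well.
            O_rest.discard(partner)
            O.add(partner)
            D_rest = {u for u in D_rest if partner in contains[u]}
    return O, O_rest, D_rest


def exclude_object(O, O_rest, D_rest, oi, must_link):
    """Drop oi from Y; its must-link partners are dropped from Y too."""
    O_rest = set(O_rest) - {oi}
    for a, b in must_link:
        if oi in (a, b):
            O_rest.discard(b if a == oi else a)
    return set(O), O_rest, set(D_rest)


contains = {"d1": {"o2", "o3"}, "d2": {"o2", "o3", "o4"}, "d3": {"o1", "o3"}}
objects = {"o1", "o2", "o3", "o4"}
units = set(contains)

cl_state = include_object(set(), objects, units, "o2", contains, [("o2", "o4")], [])
ml_state = include_object(set(), objects, units, "o2", contains, [], [("o3", "o2")])
ex_state = exclude_object(set(), objects, units, "o2", [("o3", "o2")])
print(cl_state, ml_state, ex_state, sep="\n")
```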

26 Closedness constraint
- Avoids redundant clusters
- Is neither monotonic nor anti-monotonic
- We check whether any element $z$ that was previously enumerated in $X$ is in relation with all the elements of $(O \cup O')$ or $(D \cup D')$
- If so, the current candidate is not closed and can be safely pruned
- This can be checked efficiently by keeping track of all previously enumerated elements during the recursion
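
For intuition, closedness can also be tested after the fact on a finished cluster: a cluster is closed when no further object or unit can be added without breaking the relation. The sketch below uses this usual definition of a closed pair; it is not the incremental check described on the slide, and the name `is_closed` and the example memberships are assumptions.

```python
def is_closed(O, D, contains):
    """True iff no outside object or unit is in relation with the whole cluster."""
    O, D = set(O), set(D)
    all_objects = set().union(*contains.values())
    # An outside object that lies in every unit of D could still be added.
    for obj in all_objects - O:
        if all(obj in contains[u] for u in D):
            return False
    # An outside unit that contains every object of O could still be added.
    for unit in set(contains) - D:
        if O <= contains[unit]:
            return False
    return True


contains = {"d1": {"o2", "o3"}, "d2": {"o2", "o3", "o4"}, "d3": {"o1", "o3"}}
print(is_closed({"o2", "o3", "o4"}, {"d2"}, contains))  # nothing can be added
print(is_closed({"o2", "o3"}, {"d2"}, contains))        # o4 (and d1) could be added
print(is_closed({"o2", "o3"}, {"d1", "d2"}, contains))  # nothing can be added
```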

28 Subspace Clustering examples
[Figure: synthetic data; subspace clustering]

29 Experimental Results
Datasets:
- Four benchmark datasets with numerical attributes
- Real high-dimensional gene expression data: Plasmodium [Bozdech, 2003]
Constraints:
- Instance-level (IL) constraints are generated randomly from examples according to the class attribute (cf. [Struyf et al., 2007])
- Results are averaged over 60 different generations of constraints

30 Efficiency
[Figure: number of candidate subspace clusters of SC-MINER for different numbers of IL constraints]
- The number of candidates decreases in inverse proportion to the number of IL constraints!

31 Accuracy evaluation
- Coverage: percentage of objects present in any of the subspace clusters
- Quality [Assent et al., 2007]: purity of the final clustering w.r.t. the class values
- The quality increases!
- However, the coverage decreases. Why? Too many constraints to validate robustness!
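
The two measures can be sketched as follows. Coverage follows the definition given above. For quality, the exact formula of [Assent et al., 2007] is not given in the slides, so the sketch assumes a simple size-weighted majority-class purity; the function names, the toy labels and the toy clusters are all hypothetical.

```python
from collections import Counter


def coverage(clusters, n_objects):
    """Percentage of objects that appear in at least one subspace cluster."""
    covered = set().union(*clusters) if clusters else set()
    return 100.0 * len(covered) / n_objects


def purity(clusters, labels):
    """Share of cluster members that carry the majority class of their cluster.

    Assumes every cluster is non-empty.
    """
    majority = sum(max(Counter(labels[o] for o in c).values()) for c in clusters)
    total = sum(len(c) for c in clusters)
    return majority / total


labels = {0: "Ring", 1: "Ring", 2: "Ring", 3: "Schizont", 4: "Schizont", 5: "Trophozoite"}
clusters = [{0, 1, 3}, {3, 4}]  # object sets of two overlapping subspace clusters

print(round(coverage(clusters, n_objects=len(labels)), 2))
print(purity(clusters, labels))
```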

32 Gene Expression Data
- Each column: expression profile of a given gene of Plasmodium falciparum (a parasite), evaluated during its developmental cycle (DC). Total: 476 genes (476 dimensions)
- Each row corresponds to a specific hour of the developmental cycle of Plasmodium falciparum. Total: 48 hours (48 objects), divided into 3 different stages: Ring, Trophozoite or Schizont (class attribute)

33 Meaningful Clusters?
- Parameters: bins = 4, σ = 26%, 50 constraints, and clusters containing at least 35 dimensions (genes)
- 26 subspace clusters were obtained, with 77.18% quality and 91.3% coverage
- We compared our results with the biological results in [Bozdech, 2003]
- We observed that the clusters were formed by genes whose functions are known to be active during the corresponding samples (objects):

Functional group            Ring     Trophozoite  Schizont  Schizont+beginning
cytoplasmic translation     15.000   10.500       9.375     13.045
transcription machinery     4.143    3.500        1.875     2.331
proteasome                  2.286    3.500        2.0       2.981
ribonucleotide synthesis    1.143    1.5          0.625     1.513
deoxynucleotide synthesis   0.000    0.000        1.250     0.000
DNA replication             2.143    2.000        5.00      4.558
plastid genome              1.286    1.0          1.75      0.481

35 Conclusion and Future work
Conclusion
- We proposed to extend the common framework of bottom-up subspace clustering to also consider IL constraints
- IL constraints can increase not only the efficiency of the techniques but also the quality of the resulting clustering
- Can be integrated into an Inductive Database framework
Future work
- On clustering:
  - Integration of soft constraints (to take noisy data into account)
  - Integration in a real inductive database
- On constraint-based data mining:
  - Continue to investigate how constraints can help both users and data mining algorithms


More information

Mining chains of relations

Mining chains of relations Mining chains of relations Foto Aftrati 1, Gautam Das 2, Aristides Gionis 3, Heikki Mannila 4, Taneli Mielikäinen 5, and Panayiotis Tsaparas 6 1 National Technical University of Athens, afrati@softlab.ece.ntua.gr

More information

Unsupervised Learning: K- Means & PCA

Unsupervised Learning: K- Means & PCA Unsupervised Learning: K- Means & PCA Unsupervised Learning Supervised learning used labeled data pairs (x, y) to learn a func>on f : X Y But, what if we don t have labels? No labels = unsupervised learning

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning Christoph Lampert Spring Semester 2015/2016 // Lecture 12 1 / 36 Unsupervised Learning Dimensionality Reduction 2 / 36 Dimensionality Reduction Given: data X = {x 1,..., x

More information

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison 10-810: Advanced Algorithms and Models for Computational Biology microrna and Whole Genome Comparison Central Dogma: 90s Transcription factors DNA transcription mrna translation Proteins Central Dogma:

More information

Applying cluster analysis to 2011 Census local authority data

Applying cluster analysis to 2011 Census local authority data Applying cluster analysis to 2011 Census local authority data Kitty.Lymperopoulou@manchester.ac.uk SPSS User Group Conference November, 10 2017 Outline Basic ideas of cluster analysis How to choose variables

More information

Subspace Clustering using CLIQUE: An Exploratory Study

Subspace Clustering using CLIQUE: An Exploratory Study Subspace Clustering using CLIQUE: An Exploratory Study Jyoti Yadav, Dharmender Kumar Abstract Traditional clustering algorithms like K-means, CLARANS, BIRCH, DBSCAN etc. are not able to handle higher dimensional

More information

A Bi-clustering Framework for Categorical Data

A Bi-clustering Framework for Categorical Data A Bi-clustering Framework for Categorical Data Ruggero G. Pensa 1,Céline Robardet 2, and Jean-François Boulicaut 1 1 INSA Lyon, LIRIS CNRS UMR 5205, F-69621 Villeurbanne cedex, France 2 INSA Lyon, PRISMa

More information

Chapter 6. Frequent Pattern Mining: Concepts and Apriori. Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining

Chapter 6. Frequent Pattern Mining: Concepts and Apriori. Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining Chapter 6. Frequent Pattern Mining: Concepts and Apriori Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining Pattern Discovery: Definition What are patterns? Patterns: A set of

More information

CS145: INTRODUCTION TO DATA MINING

CS145: INTRODUCTION TO DATA MINING CS145: INTRODUCTION TO DATA MINING 4: Vector Data: Decision Tree Instructor: Yizhou Sun yzsun@cs.ucla.edu October 10, 2017 Methods to Learn Vector Data Set Data Sequence Data Text Data Classification Clustering

More information

Chris Bishop s PRML Ch. 8: Graphical Models

Chris Bishop s PRML Ch. 8: Graphical Models Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular

More information

Data Mining Lab Course WS 2017/18

Data Mining Lab Course WS 2017/18 Data Mining Lab Course WS 2017/18 L. Richter Department of Computer Science Technische Universität München Wednesday, Dec 20th L. Richter DM Lab WS 17/18 1 / 14 1 2 3 4 L. Richter DM Lab WS 17/18 2 / 14

More information

Mining Non-Redundant High Order Correlations in Binary Data

Mining Non-Redundant High Order Correlations in Binary Data Mining Non-Redundant High Order Correlations in Binary Data Xiang Zhang 1, Feng Pan 1, Wei Wang 1, and Andrew Nobel 2 1 Department of Computer Science, 2 Department of Statistics and Operations Research

More information

CS570 Introduction to Data Mining

CS570 Introduction to Data Mining CS570 Introduction to Data Mining Department of Mathematics and Computer Science Li Xiong Data Exploration and Data Preprocessing Data and Attributes Data exploration Data pre-processing Data cleaning

More information

Un nouvel algorithme de génération des itemsets fermés fréquents

Un nouvel algorithme de génération des itemsets fermés fréquents Un nouvel algorithme de génération des itemsets fermés fréquents Huaiguo Fu CRIL-CNRS FRE2499, Université d Artois - IUT de Lens Rue de l université SP 16, 62307 Lens cedex. France. E-mail: fu@cril.univ-artois.fr

More information

Subspace Clustering and Visualization of Data Streams

Subspace Clustering and Visualization of Data Streams Ibrahim Louhi 1,2, Lydia Boudjeloud-Assala 1 and Thomas Tamisier 2 1 Laboratoire d Informatique Théorique et Appliquée, LITA-EA 3097, Université de Lorraine, Ile du Saucly, Metz, France 2 e-science Unit,

More information

Removing trivial associations in association rule discovery

Removing trivial associations in association rule discovery Removing trivial associations in association rule discovery Geoffrey I. Webb and Songmao Zhang School of Computing and Mathematics, Deakin University Geelong, Victoria 3217, Australia Abstract Association

More information

Outlier Detection in High-Dimensional Data

Outlier Detection in High-Dimensional Data Tutorial Arthur Zimek 1,2, Erich Schubert 2, Hans-Peter Kriegel 2 1 University of Alberta Edmonton, AB, Canada 2 Ludwig-Maximilians-Universität München Munich, Germany PAKDD 2013, Gold Coast, Australia

More information

Local search algorithms. Chapter 4, Sections 3 4 1

Local search algorithms. Chapter 4, Sections 3 4 1 Local search algorithms Chapter 4, Sections 3 4 Chapter 4, Sections 3 4 1 Outline Hill-climbing Simulated annealing Genetic algorithms (briefly) Local search in continuous spaces (very briefly) Chapter

More information

Notes on Machine Learning for and

Notes on Machine Learning for and Notes on Machine Learning for 16.410 and 16.413 (Notes adapted from Tom Mitchell and Andrew Moore.) Learning = improving with experience Improve over task T (e.g, Classification, control tasks) with respect

More information

Editorial Manager(tm) for Data Mining and Knowledge Discovery Manuscript Draft

Editorial Manager(tm) for Data Mining and Knowledge Discovery Manuscript Draft Editorial Manager(tm) for Data Mining and Knowledge Discovery Manuscript Draft Manuscript Number: Title: Summarizing transactional databases with overlapped hyperrectangles, theories and algorithms Article

More information

Genome 541! Unit 4, lecture 2! Transcription factor binding using functional genomics

Genome 541! Unit 4, lecture 2! Transcription factor binding using functional genomics Genome 541 Unit 4, lecture 2 Transcription factor binding using functional genomics Slides vs chalk talk: I m not sure why you chose a chalk talk over ppt. I prefer the latter no issues with readability

More information

CS 584 Data Mining. Association Rule Mining 2

CS 584 Data Mining. Association Rule Mining 2 CS 584 Data Mining Association Rule Mining 2 Recall from last time: Frequent Itemset Generation Strategies Reduce the number of candidates (M) Complete search: M=2 d Use pruning techniques to reduce M

More information

Scalable Algorithms for Distribution Search

Scalable Algorithms for Distribution Search Scalable Algorithms for Distribution Search Yasuko Matsubara (Kyoto University) Yasushi Sakurai (NTT Communication Science Labs) Masatoshi Yoshikawa (Kyoto University) 1 Introduction Main intuition and

More information

On Improving the k-means Algorithm to Classify Unclassified Patterns

On Improving the k-means Algorithm to Classify Unclassified Patterns On Improving the k-means Algorithm to Classify Unclassified Patterns Mohamed M. Rizk 1, Safar Mohamed Safar Alghamdi 2 1 Mathematics & Statistics Department, Faculty of Science, Taif University, Taif,

More information

Ranking Interesting Subspaces for Clustering High Dimensional Data

Ranking Interesting Subspaces for Clustering High Dimensional Data Ranking Interesting Subspaces for Clustering High Dimensional Data Karin Kailing, Hans-Peter Kriegel, Peer Kröger, and Stefanie Wanka Institute for Computer Science University of Munich Oettingenstr. 67,

More information

Applications of the Lopsided Lovász Local Lemma Regarding Hypergraphs

Applications of the Lopsided Lovász Local Lemma Regarding Hypergraphs Regarding Hypergraphs Ph.D. Dissertation Defense April 15, 2013 Overview The Local Lemmata 2-Coloring Hypergraphs with the Original Local Lemma Counting Derangements with the Lopsided Local Lemma Lopsided

More information

Generating p-extremal graphs

Generating p-extremal graphs Generating p-extremal graphs Derrick Stolee Department of Mathematics Department of Computer Science University of Nebraska Lincoln s-dstolee1@math.unl.edu August 2, 2011 Abstract Let f(n, p be the maximum

More information

On the Mining of Numerical Data with Formal Concept Analysis

On the Mining of Numerical Data with Formal Concept Analysis On the Mining of Numerical Data with Formal Concept Analysis Thèse de doctorat en informatique Mehdi Kaytoue 22 April 2011 Amedeo Napoli Sébastien Duplessis Somewhere... in a temperate forest... N 2 /

More information

Preprocessing & dimensionality reduction

Preprocessing & dimensionality reduction Introduction to Data Mining Preprocessing & dimensionality reduction CPSC/AMTH 445a/545a Guy Wolf guy.wolf@yale.edu Yale University Fall 2016 CPSC 445 (Guy Wolf) Dimensionality reduction Yale - Fall 2016

More information

Sparse representation classification and positive L1 minimization

Sparse representation classification and positive L1 minimization Sparse representation classification and positive L1 minimization Cencheng Shen Joint Work with Li Chen, Carey E. Priebe Applied Mathematics and Statistics Johns Hopkins University, August 5, 2014 Cencheng

More information

Decision Trees Entropy, Information Gain, Gain Ratio

Decision Trees Entropy, Information Gain, Gain Ratio Changelog: 14 Oct, 30 Oct Decision Trees Entropy, Information Gain, Gain Ratio Lecture 3: Part 2 Outline Entropy Information gain Gain ratio Marina Santini Acknowledgements Slides borrowed and adapted

More information

Lecture 23 Branch-and-Bound Algorithm. November 3, 2009

Lecture 23 Branch-and-Bound Algorithm. November 3, 2009 Branch-and-Bound Algorithm November 3, 2009 Outline Lecture 23 Modeling aspect: Either-Or requirement Special ILPs: Totally unimodular matrices Branch-and-Bound Algorithm Underlying idea Terminology Formal

More information

Optimization of Submodular Functions Tutorial - lecture I

Optimization of Submodular Functions Tutorial - lecture I Optimization of Submodular Functions Tutorial - lecture I Jan Vondrák 1 1 IBM Almaden Research Center San Jose, CA Jan Vondrák (IBM Almaden) Submodular Optimization Tutorial 1 / 1 Lecture I: outline 1

More information

Distributed Mining of Frequent Closed Itemsets: Some Preliminary Results

Distributed Mining of Frequent Closed Itemsets: Some Preliminary Results Distributed Mining of Frequent Closed Itemsets: Some Preliminary Results Claudio Lucchese Ca Foscari University of Venice clucches@dsi.unive.it Raffaele Perego ISTI-CNR of Pisa perego@isti.cnr.it Salvatore

More information

17 Non-collinear alignment Motivation A B C A B C A B C A B C D A C. This exposition is based on:

17 Non-collinear alignment Motivation A B C A B C A B C A B C D A C. This exposition is based on: 17 Non-collinear alignment This exposition is based on: 1. Darling, A.E., Mau, B., Perna, N.T. (2010) progressivemauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One 5(6):e11147.

More information

Structural Learning and Integrative Decomposition of Multi-View Data

Structural Learning and Integrative Decomposition of Multi-View Data Structural Learning and Integrative Decomposition of Multi-View Data, Department of Statistics, Texas A&M University JSM 2018, Vancouver, Canada July 31st, 2018 Dr. Gen Li, Columbia University, Mailman

More information

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 6

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 6 Data Mining: Concepts and Techniques (3 rd ed.) Chapter 6 Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University 2013 Han, Kamber & Pei. All rights

More information

The Lovász Local Lemma: constructive aspects, stronger variants and the hard core model

The Lovász Local Lemma: constructive aspects, stronger variants and the hard core model The Lovász Local Lemma: constructive aspects, stronger variants and the hard core model Jan Vondrák 1 1 Dept. of Mathematics Stanford University joint work with Nick Harvey (UBC) The Lovász Local Lemma

More information

Orbitopes. Marc Pfetsch. joint work with Volker Kaibel. Zuse Institute Berlin

Orbitopes. Marc Pfetsch. joint work with Volker Kaibel. Zuse Institute Berlin Orbitopes Marc Pfetsch joint work with Volker Kaibel Zuse Institute Berlin What this talk is about We introduce orbitopes. A polyhedral way to break symmetries in integer programs. Introduction 2 Orbitopes

More information

Finding Non-Redundant, Statistically Signicant Regions in High Dimensional Data: a Novel Approach to Projected and Subspace Clustering

Finding Non-Redundant, Statistically Signicant Regions in High Dimensional Data: a Novel Approach to Projected and Subspace Clustering Finding Non-Redundant, Statistically Signicant Regions in High Dimensional Data: a Novel Approach to Projected and Subspace Clustering ABSTRACT Gabriela Moise Dept. of Computing Science University of Alberta

More information

Differential Modeling for Cancer Microarray Data

Differential Modeling for Cancer Microarray Data Differential Modeling for Cancer Microarray Data Omar Odibat Department of Computer Science Feb, 01, 2011 1 Outline Introduction Cancer Microarray data Problem Definition Differential analysis Existing

More information

Encyclopedia of Machine Learning Chapter Number Book CopyRight - Year 2010 Frequent Pattern. Given Name Hannu Family Name Toivonen

Encyclopedia of Machine Learning Chapter Number Book CopyRight - Year 2010 Frequent Pattern. Given Name Hannu Family Name Toivonen Book Title Encyclopedia of Machine Learning Chapter Number 00403 Book CopyRight - Year 2010 Title Frequent Pattern Author Particle Given Name Hannu Family Name Toivonen Suffix Email hannu.toivonen@cs.helsinki.fi

More information

Learning Decision Trees

Learning Decision Trees Learning Decision Trees CS194-10 Fall 2011 Lecture 8 CS194-10 Fall 2011 Lecture 8 1 Outline Decision tree models Tree construction Tree pruning Continuous input features CS194-10 Fall 2011 Lecture 8 2

More information

Summarizing Transactional Databases with Overlapped Hyperrectangles

Summarizing Transactional Databases with Overlapped Hyperrectangles Noname manuscript No. (will be inserted by the editor) Summarizing Transactional Databases with Overlapped Hyperrectangles Yang Xiang Ruoming Jin David Fuhry Feodor F. Dragan Abstract Transactional data

More information

CARE: Finding Local Linear Correlations in High Dimensional Data

CARE: Finding Local Linear Correlations in High Dimensional Data CARE: Finding Local Linear Correlations in High Dimensional Data Xiang Zhang, Feng Pan, and Wei Wang Department of Computer Science University of North Carolina at Chapel Hill Chapel Hill, NC 27599, USA

More information

Data Exploration and Unsupervised Learning with Clustering

Data Exploration and Unsupervised Learning with Clustering Data Exploration and Unsupervised Learning with Clustering Paul F Rodriguez,PhD San Diego Supercomputer Center Predictive Analytic Center of Excellence Clustering Idea Given a set of data can we find a

More information

Directed and Undirected Graphical Models

Directed and Undirected Graphical Models Directed and Undirected Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Last Lecture Refresher Lecture Plan Directed

More information