Less is More: Non-Redundant Subspace Clustering
|
|
- Virgil Neal
- 5 years ago
- Views:
Transcription
1 Less is More: Non-Redundant Subspace Clustering Ira Assent Emmanuel Müller Stephan Günnemann Ralph Krieger Thomas Seidl Aalborg University, Denmark DATA MANAGEMENT AND DATA EXPLORATION GROUP RWTH Prof. Dr. Aachen rer. nat. University, Thomas Germany Seidl MultiClust Workshop at SIGKDD 2010 July 25, 2010
2 Detection of Non-Redundant Subspace Clusters I C 3 # boats in Miami C 1 internet usage C 5 C 2 C 4 income C 6 sportive activities Hidden clusters are described by different attribute sets Each object might be grouped in multiple clusters Novel challenges for subspace clustering Less is More: Non-Redundant Subspace Clustering 1 / 11
3 Detection of Non-Redundant Subspace Clusters II C 3 # boats in Miami C 1 Subspace Cluster: (rich; boat owner; car fan; globetrotter; horse fan) # horses freq. flyer miles # cars Exp. many projections (rich) (boat owner) (rich; globetrotter)... C 4 income Huge amount of redundant clusters Typically number of clusters number of objects Detection of all and only non-redundant subspace clusters Less is More: Non-Redundant Subspace Clustering 2 / 11
4 Overview Main question How can you use/extend non-redundant clustering... In this talk, we present A survey of our contributions so far The generality of our techniques Our open source initiatives for the community Research questions arise in the areas of: 1 Effective Models 2 Efficient Computation 3 Evaluation and Exploration of Results Less is More: Non-Redundant Subspace Clustering 3 / 11
5 Notions and Related Work Abstract subspace clustering definition Definition of object set O clustered in subspace S C = (O, S) with O DB, S DIM Selection of result set M a subset of all valid subspace clusters ALL 1,2,3,4 1,2,3 1,2,4 1,3,4 2,3,4 1,2 1,3 1,4 2,3 2,4 3, M = {(O 1, S 1 )... (O n, S n )} ALL Related work Subspace clustering: focus on definition of (O, S) Output all valid subspace cluster M = ALL ( too many) Projected clustering: focus on definition of disjoint clusters in M Unable to detect objects in multiple clusters ( too few) Less is More: Non-Redundant Subspace Clustering 4 / 11
6 Non-Redundant Subspace Clustering Models Select M ALL: Exclude redundant subspace clusters... C 2 C 2 C 3 C 1 C 1 Local (pairwise) redundancy elimination [1][2] (O, S) is non-redundant iff (O, S ) with O O S S O R O Excludes large number of redundant subspace clusters [1] Assent, Krieger, Müller and Seidl: DUSC: Dimensionality Unbiased Subspace Clustering, in ICDM [2] Assent, Krieger, Müller and Seidl: INSCY: Indexing Subspace Clusters with In-Process-Removal of Redundancy, in ICDM Less is More: Non-Redundant Subspace Clustering 5 / 11
7 Generalization of Redundancy Elimination Relevant subspace clustering model [3] Include the most interesting subspace clusters Exclude redundant subspace clusters Provide most relevant subspace clusters in result set Extract novel knowledge with each cluster all possible clusters ALL interestingness of clusters relevance model redundancy of clusters relevant clustering M ALL Given any definition of subspace clusters C = (O, S) Choose optimal subset M = {C 1,..., C n } ALL Proof: Such an optimization is an NP-hard problem [3] Müller, Assent, Günnemann, Krieger and Seidl: Relevant Subspace Clustering: Mining the Most Interesting Non-Redundant Concepts in High Dimensional Data, in ICDM Less is More: Non-Redundant Subspace Clustering 6 / 11
8 Redundancy Pruning by Depth-First Processing Pruning Applicable for local redundancy (simple pairwise model) Enables in-process pruning of redundant clusters. 1,2,3,4 1,2,3 1,2,4 1,3,4 2,3,4 1,2 1,3 1,4 2,3 2,4 3, breadth-first depth-first 1,2,3,4 1,2,3 1,2,4 1,3,4 2,3,4 1,2 1,3 1,4 2,3 2,4 3, Step-by-step processing (k-d (k + 1)-D subspace!) Scalability to high dimensional data? Less is More: Non-Redundant Subspace Clustering 7 / 11
9 Scalable Subspace Processing direct jump 1,2,3,4 1,2,3 1,2,4 1,3,4 2,3,4 1,2 1,3 1,4 2,3 2,4 3, Dimension interval 1 interval 2 Dimension Dimension 4 Key idea: density estimation + steered jumps Subspace clusters are represented by many low dimensional projections Use 2-D projections to estimate density in higher subspace regions [4] Use k-d projections to jump directly to (k + x)-d subspaces [x 1] Best-first search: Intelligent steering to promising subspace regions [4] Müller, Assent, Krieger, Günnemann and Seidl: DensEst: Density Estimation for Data Mining in High Dimensional Spaces, in SDM Less is More: Non-Redundant Subspace Clustering 8 / 11
10 Challenges in Evaluation and Exploration General challenge for clustering No ground truth available for clustering Subjective evaluation by exploration requires visualization techniques and interactive exploration tools Objective evaluations are incomparable using different implementations, databases and quality measures We provide broad evaluation study & interactive exploration framework Evaluation Study [5] Characterization of major paradigms Providing comparable baseline implementations Evaluation based on broad set of data sets, quality measures and parameter settings [5] Müller, Günnemann, Assent and Seidl: Evaluating Clustering in Subspace Projections of High Dimensional Data, in VLDB Less is More: Non-Redundant Subspace Clustering 9 / 11
11 Open Source Framework OpenSubspace framework Framework for research, education and application [6][7][8][9] Baselines for algorithm and evaluation measure development OpenSubspace algo. A algo. B algo. C unified algorithm repository algo. D rare case: common implementation eval. 1 eval. 2 eval. 3 re-implementation eval. 1 eval. 2 eval. 3 unified evaluation repository [6] Müller, Assent, Krieger, Jansen and Seidl: Morpheus: Interactive Exploration of Subspace Clustering, in KDD [7] Assent, Müller, Krieger, Jansen and Seidl: Pleiades: Subspace Clustering and Evaluation, in PKDD [8] Günnemann, Färber, Kremer, Seidl: CoDA: Interactive Cluster Based Concept Discovery, in VLDB 2010 [9] Müller, Schiffer, Gerwert, Hannen, Jansen, Seidl: SOREX: Subspace Outlier Ranking Exploration Toolkit, in PKDD Less is More: Non-Redundant Subspace Clustering 10 / 11
12 Conclusion and Future Work Subspace clustering is still an emerging research field... Is the basis for a lot of further research Alternative subspace clustering Evaluation measures for subspace clustering Benchmark databases for subspace clustering... Less is More: Non-Redundant Subspace Clustering 11 / 11
13 Conclusion and Future Work Subspace clustering is still an emerging research field... Is the basis for a lot of further research Alternative subspace clustering Evaluation measures for subspace clustering Benchmark databases for subspace clustering... Thank you for your attention. Questions? Less is More: Non-Redundant Subspace Clustering 11 / 11
P leiades: Subspace Clustering and Evaluation
P leiades: Subspace Clustering and Evaluation Ira Assent, Emmanuel Müller, Ralph Krieger, Timm Jansen, and Thomas Seidl Data management and exploration group, RWTH Aachen University, Germany {assent,mueller,krieger,jansen,seidl}@cs.rwth-aachen.de
More informationAnticipatory DTW for Efficient Similarity Search in Time Series Databases
Anticipatory DTW for Efficient Similarity Search in Time Series Databases Ira Assent Marc Wichterich Ralph Krieger Hardy Kremer Thomas Seidl RWTH Aachen University, Germany Aalborg University, Denmark
More informationSubspace Correlation Clustering: Finding Locally Correlated Dimensions in Subspace Projections of the Data
Subspace Correlation Clustering: Finding Locally Correlated Dimensions in Subspace Projections of the Data Stephan Günnemann, Ines Färber, Kittipat Virochsiri, and Thomas Seidl RWTH Aachen University,
More informationPattern-Based Decision Tree Construction
Pattern-Based Decision Tree Construction Dominique Gay, Nazha Selmaoui ERIM - University of New Caledonia BP R4 F-98851 Nouméa cedex, France {dominique.gay, nazha.selmaoui}@univ-nc.nc Jean-François Boulicaut
More informationWhen Pattern Met Subspace Cluster
When Pattern Met Subspace Cluster a Relationship Story Jilles Vreeken 1 and Arthur Zimek 2 1 Advanced Database Research and Modeling, Universiteit Antwerpen, Belgium http://www.adrem.ua.ac.be/~jvreeken
More informationChapter 6. Frequent Pattern Mining: Concepts and Apriori. Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining
Chapter 6. Frequent Pattern Mining: Concepts and Apriori Meng Jiang CSE 40647/60647 Data Science Fall 2017 Introduction to Data Mining Pattern Discovery: Definition What are patterns? Patterns: A set of
More informationVisual Quality Assessment of Subspace Clusterings
Visual Quality Assessment of Subspace Clusterings Michael Hund1, Ines Färber2, Michael Behrisch1, Andrada Tatu1, Tobias Schreck3, Daniel A. Keim1, Thomas Seidl4 1 2 University of Konstanz, Germany {lastname@dbvis.inf.uni-konstanz.de}
More informationCSE 5243 INTRO. TO DATA MINING
CSE 5243 INTRO. TO DATA MINING Mining Frequent Patterns and Associations: Basic Concepts (Chapter 6) Huan Sun, CSE@The Ohio State University Slides adapted from Prof. Jiawei Han @UIUC, Prof. Srinivasan
More informationUnbiased Multivariate Correlation Analysis
Unbiased Multivariate Correlation Analysis Yisen Wang,, Simone Romano, Vinh Nguyen, James Bailey, Xingjun Ma, Shu-Tao Xia Dept. of Computer Science and Technology, Tsinghua University, China Dept. of Computing
More informationCSE 5243 INTRO. TO DATA MINING
CSE 5243 INTRO. TO DATA MINING Mining Frequent Patterns and Associations: Basic Concepts (Chapter 6) Huan Sun, CSE@The Ohio State University 10/17/2017 Slides adapted from Prof. Jiawei Han @UIUC, Prof.
More informationA Novel Dencos Model For High Dimensional Data Using Genetic Algorithms
A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms T. Vijayakumar 1, V.Nivedhitha 2, K.Deeba 3 and M. Sathya Bama 4 1 Assistant professor / Dept of IT, Dr.N.G.P College of Engineering
More informationCluster Evaluation of Density Based Subspace Clustering Rahmat Widia Sembiring, Jasni Mohamad Zain
WWW.JORNALOFCOMPTING.ORG 1 Cluster Evaluation of Density Based Subspace Clustering Rahmat Widia Sembiring, Jasni Mohamad Zain Abstract Clustering real world data often faced with curse of dimensionality,
More informationSubspace Clustering and Visualization of Data Streams
Ibrahim Louhi 1,2, Lydia Boudjeloud-Assala 1 and Thomas Tamisier 2 1 Laboratoire d Informatique Théorique et Appliquée, LITA-EA 3097, Université de Lorraine, Ile du Saucly, Metz, France 2 e-science Unit,
More information4S: Scalable Subspace Search Scheme Overcoming Traditional Apriori Processing
4S: Scalable Subspace Search Scheme Overcoming Traditional Apriori Processing Hoang Vu guyen Emmanuel Müller Klemens Böhm Karlsruhe Institute of Technology (KIT), Germany University of Antwerp, Belgium
More informationMining Spatial and Spatio-temporal Patterns in Scientific Data
Mining Spatial and Spatio-temporal Patterns in Scientific Data Hui Yang Dept. of Comp. Sci. & Eng. The Ohio State University Columbus, OH, 43210 yanghu@cse.ohio-state.edu Srinivasan Parthasarathy Dept.
More informationData Analytics Beyond OLAP. Prof. Yanlei Diao
Data Analytics Beyond OLAP Prof. Yanlei Diao OPERATIONAL DBs DB 1 DB 2 DB 3 EXTRACT TRANSFORM LOAD (ETL) METADATA STORE DATA WAREHOUSE SUPPORTS OLAP DATA MINING INTERACTIVE DATA EXPLORATION Overview of
More informationHigh Frequency Rough Set Model based on Database Systems
High Frequency Rough Set Model based on Database Systems Kartik Vaithyanathan kvaithya@gmail.com T.Y.Lin Department of Computer Science San Jose State University San Jose, CA 94403, USA tylin@cs.sjsu.edu
More informationNetBox: A Probabilistic Method for Analyzing Market Basket Data
NetBox: A Probabilistic Method for Analyzing Market Basket Data José Miguel Hernández-Lobato joint work with Zoubin Gharhamani Department of Engineering, Cambridge University October 22, 2012 J. M. Hernández-Lobato
More informationAlgorithmic Methods of Data Mining, Fall 2005, Course overview 1. Course overview
Algorithmic Methods of Data Mining, Fall 2005, Course overview 1 Course overview lgorithmic Methods of Data Mining, Fall 2005, Course overview 1 T-61.5060 Algorithmic methods of data mining (3 cp) P T-61.5060
More informationText Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University
Text Mining Dr. Yanjun Li Associate Professor Department of Computer and Information Sciences Fordham University Outline Introduction: Data Mining Part One: Text Mining Part Two: Preprocessing Text Data
More informationInvestigating Measures of Association by Graphs and Tables of Critical Frequencies
Investigating Measures of Association by Graphs Investigating and Tables Measures of Critical of Association Frequencies by Graphs and Tables of Critical Frequencies Martin Ralbovský, Jan Rauch University
More informationConstraint-based Subspace Clustering
Constraint-based Subspace Clustering Elisa Fromont 1, Adriana Prado 2 and Céline Robardet 1 1 Université de Lyon, France 2 Universiteit Antwerpen, Belgium Thursday, April 30 Traditional Clustering Partitions
More informationApriori algorithm. Seminar of Popular Algorithms in Data Mining and Machine Learning, TKK. Presentation Lauri Lahti
Apriori algorithm Seminar of Popular Algorithms in Data Mining and Machine Learning, TKK Presentation 12.3.2008 Lauri Lahti Association rules Techniques for data mining and knowledge discovery in databases
More informationSubspace Search and Visualization to Make Sense of Alternative Clusterings in High-Dimensional Data
Subspace Search and Visualization to Make Sense of Alternative Clusterings in High-Dimensional Data Andrada Tatu University of Konstanz Tobias Schreck University of Konstanz Fabian Maaß University of Konstanz
More informationAlgorithms for Characterization and Trend Detection in Spatial Databases
Published in Proceedings of 4th International Conference on Knowledge Discovery and Data Mining (KDD-98) Algorithms for Characterization and Trend Detection in Spatial Databases Martin Ester, Alexander
More informationModeling and Estimation from High Dimensional Data TAKASHI WASHIO THE INSTITUTE OF SCIENTIFIC AND INDUSTRIAL RESEARCH OSAKA UNIVERSITY
Modeling and Estimation from High Dimensional Data TAKASHI WASHIO THE INSTITUTE OF SCIENTIFIC AND INDUSTRIAL RESEARCH OSAKA UNIVERSITY Department of Reasoning for Intelligence, Division of Information
More informationBiclustering Gene-Feature Matrices for Statistically Significant Dense Patterns
Biclustering Gene-Feature Matrices for Statistically Significant Dense Patterns Mehmet Koyutürk, Wojciech Szpankowski, and Ananth Grama Dept. of Computer Sciences, Purdue University West Lafayette, IN
More informationData Mining. Preamble: Control Application. Industrial Researcher s Approach. Practitioner s Approach. Example. Example. Goal: Maintain T ~Td
Data Mining Andrew Kusiak 2139 Seamans Center Iowa City, Iowa 52242-1527 Preamble: Control Application Goal: Maintain T ~Td Tel: 319-335 5934 Fax: 319-335 5669 andrew-kusiak@uiowa.edu http://www.icaen.uiowa.edu/~ankusiak
More informationRare Event Discovery And Event Change Point In Biological Data Stream
Rare Event Discovery And Event Change Point In Biological Data Stream T. Jagadeeswari 1 M.Tech(CSE) MISTE, B. Mahalakshmi 2 M.Tech(CSE)MISTE, N. Anusha 3 M.Tech(CSE) Department of Computer Science and
More informationFlexible and Adaptive Subspace Search for Outlier Analysis
Flexible and Adaptive Subspace Search for Outlier Analysis Fabian Keller Emmanuel Müller Andreas Wixler Klemens Böhm Karlsruhe Institute of Technology (KIT), Germany {fabian.keller, emmanuel.mueller, klemens.boehm}@kit.edu
More informationStreaming multiscale anomaly detection
Streaming multiscale anomaly detection DATA-ENS Paris and ThalesAlenia Space B Ravi Kiran, Université Lille 3, CRISTaL Joint work with Mathieu Andreux beedotkiran@gmail.com June 20, 2017 (CRISTaL) Streaming
More informationAn Optimized Interestingness Hotspot Discovery Framework for Large Gridded Spatio-temporal Datasets
IEEE Big Data 2015 Big Data in Geosciences Workshop An Optimized Interestingness Hotspot Discovery Framework for Large Gridded Spatio-temporal Datasets Fatih Akdag and Christoph F. Eick Department of Computer
More informationPrivacy Preserving Frequent Itemset Mining. Workshop on Privacy, Security, and Data Mining ICDM - Maebashi City, Japan December 9, 2002
Privacy Preserving Frequent Itemset Mining Stanley R. M. Oliveira 1,2 Osmar R. Zaïane 2 1 oliveira@cs.ualberta.ca zaiane@cs.ualberta.ca Embrapa Information Technology Database Systems Laboratory Andre
More informationAlternative Clustering, Multiview Clustering: What Can We Learn From Each Other?
LUDWIG- MAXIMILIANS- UNIVERSITÄT MÜNCHEN INSTITUTE FOR INFORMATICS DATABASE Subspace Clustering, Ensemble Clustering, Alternative Clustering, Multiview Clustering: What Can We Learn From Each Other? MultiClust@KDD
More informationLocal Outlier Detection with Interpretation
Local Outlier Detection with Interpretation Xuan Hong Dang, Barbora Micenková, Ira Assent, and Raymond T. Ng 2 Aarhus University, Denmark, {dang,barbora,ira}@cs.au.dk 2 University of British Columbia,
More informationLocal Outlier Detection with Interpretation
Local Outlier Detection with Interpretation Xuan Hong Dang, Barbora Micenková, Ira Assent, and Raymond T. Ng 2 Aarhus University, Denmark, {dang,barbora,ira}@cs.au.dk 2 University of British Columbia,
More informationDistributed Data Mining for Pervasive and Privacy-Sensitive Applications. Hillol Kargupta
Distributed Data Mining for Pervasive and Privacy-Sensitive Applications Hillol Kargupta Dept. of Computer Science and Electrical Engg, University of Maryland Baltimore County http://www.cs.umbc.edu/~hillol
More informationExploring Spatial Relationships for Knowledge Discovery in Spatial Data
2009 International Conference on Computer Engineering and Applications IPCSIT vol.2 (2011) (2011) IACSIT Press, Singapore Exploring Spatial Relationships for Knowledge Discovery in Spatial Norazwin Buang
More informationProblem orientable evaluations as L-subsets
Problem orientable evaluations as L-subsets Adalbert Kerber 1 Neuchatel 26.6.2018 1 kerber@uni-bayreuth.de, Dep. of Math. Introduction Suppose We consider evaluations as sets over a lattice. E.g.: G. Restrepo
More informationSpatial Co-location Patterns Mining
Spatial Co-location Patterns Mining Ruhi Nehri Dept. of Computer Science and Engineering. Government College of Engineering, Aurangabad, Maharashtra, India. Meghana Nagori Dept. of Computer Science and
More informationMatrix factorization models for patterns beyond blocks. Pauli Miettinen 18 February 2016
Matrix factorization models for patterns beyond blocks 18 February 2016 What does a matrix factorization do?? A = U V T 2 For SVD that s easy! 3 Inner-product interpretation Element (AB) ij is the inner
More informationClassification Based on Logical Concept Analysis
Classification Based on Logical Concept Analysis Yan Zhao and Yiyu Yao Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada S4S 0A2 E-mail: {yanzhao, yyao}@cs.uregina.ca Abstract.
More informationThe final publication is available at IEEE via
This manuscript is a post-print copy of the following article Title: Mining Approximate Multi-Relational Patterns Authors: Eirini Spyropoulou; Tijl De Bie The final publication is available at IEEE via
More informationROUGH SETS THEORY AND DATA REDUCTION IN INFORMATION SYSTEMS AND DATA MINING
ROUGH SETS THEORY AND DATA REDUCTION IN INFORMATION SYSTEMS AND DATA MINING Mofreh Hogo, Miroslav Šnorek CTU in Prague, Departement Of Computer Sciences And Engineering Karlovo Náměstí 13, 121 35 Prague
More informationData Mining: Concepts and Techniques. (3 rd ed.) Chapter 6
Data Mining: Concepts and Techniques (3 rd ed.) Chapter 6 Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign & Simon Fraser University 2013 Han, Kamber & Pei. All rights
More informationMining Molecular Fragments: Finding Relevant Substructures of Molecules
Mining Molecular Fragments: Finding Relevant Substructures of Molecules Christian Borgelt, Michael R. Berthold Proc. IEEE International Conference on Data Mining, 2002. ICDM 2002. Lecturers: Carlo Cagli
More informationSurprise Detection in Science Data Streams Kirk Borne Dept of Computational & Data Sciences George Mason University
Surprise Detection in Science Data Streams Kirk Borne Dept of Computational & Data Sciences George Mason University kborne@gmu.edu, http://classweb.gmu.edu/kborne/ Outline Astroinformatics Example Application:
More informationOnline Estimation of Discrete Densities using Classifier Chains
Online Estimation of Discrete Densities using Classifier Chains Michael Geilke 1 and Eibe Frank 2 and Stefan Kramer 1 1 Johannes Gutenberg-Universtität Mainz, Germany {geilke,kramer}@informatik.uni-mainz.de
More informationMining Rank Data. Sascha Henzgen and Eyke Hüllermeier. Department of Computer Science University of Paderborn, Germany
Mining Rank Data Sascha Henzgen and Eyke Hüllermeier Department of Computer Science University of Paderborn, Germany {sascha.henzgen,eyke}@upb.de Abstract. This paper addresses the problem of mining rank
More informationLECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity
LECTURE 10 Introduction to Econometrics Multicollinearity & Heteroskedasticity November 22, 2016 1 / 23 ON PREVIOUS LECTURES We discussed the specification of a regression equation Specification consists
More informationKnowledge Discovery and Data Mining I
Ludwig-Maximilians-Universität München Lehrstuhl für Datenbanksysteme und Data Mining Prof. Dr. Thomas Seidl Knowledge Discovery and Data Mining I Winter Semester 2018/19 Agenda 1. Introduction 2. Basics
More informationClasSi: Measuring Ranking Quality in the Presence of Object Classes with Similarity Information
ClasSi: Measuring Ranking Quality in the Presence of Object Classes with Similarity Information Anca Maria Ivanescu, Marc Wichterich, and Thomas Seidl Data Management and Data Exploration Group Informatik
More informationA Neural Network learning Relative Distances
A Neural Network learning Relative Distances Alfred Ultsch, Dept. of Computer Science, University of Marburg, Germany. ultsch@informatik.uni-marburg.de Data Mining and Knowledge Discovery aim at the detection
More informationData Mining Chapter 4: Data Analysis and Uncertainty Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University
Data Mining Chapter 4: Data Analysis and Uncertainty Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Why uncertainty? Why should data mining care about uncertainty? We
More informationTomasz F. Stepinski. Jean-Philippe Nicot* Bureau of Economic Geology, Jackson School of Geosciences University of Texas at Austin Austin, TX 78712
Finding Regional Co-location Patterns for Sets of Continuous Variables in Spatial Datasets Christoph F. Eick, Rachana Parmar, Wei Ding Department of Computer Science University of Houston Houston, TX 77204-3010
More informationAstroinformatics in the data-driven Astronomy
Astroinformatics in the data-driven Astronomy Massimo Brescia 2017 ICT Workshop Astronomy vs Astroinformatics Most of the initial time has been spent to find a common language among communities How astronomers
More informationUnit II Association Rules
Unit II Association Rules Basic Concepts Frequent Pattern Analysis Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set Frequent Itemset
More informationUsing Rank Propagation and Probabilistic Counting for Link-based Spam Detection
Using Rank Propagation and Probabilistic for Link-based Luca Becchetti, Carlos Castillo,Debora Donato, Stefano Leonardi and Ricardo Baeza-Yates 2. Università di Roma La Sapienza Rome, Italy 2. Yahoo! Research
More informationOutlier Detection and Trend Detection: Two Sides of the Same Coin Preprint publisher version available at
Outlier Detection and Trend Detection: Two Sides of the Same Coin Preprint publisher version available at http://dx.doi.org/10.1109/icdmw.2015.79 Erich Schubert, Michael Weiler, Arthur Zimek Ludwig-Maximilians-Universität
More informationPositive Borders or Negative Borders: How to Make Lossless Generator Based Representations Concise
Positive Borders or Negative Borders: How to Make Lossless Generator Based Representations Concise Guimei Liu 1,2 Jinyan Li 1 Limsoon Wong 2 Wynne Hsu 2 1 Institute for Infocomm Research, Singapore 2 School
More informationQuantitative Association Rule Mining on Weighted Transactional Data
Quantitative Association Rule Mining on Weighted Transactional Data D. Sujatha and Naveen C. H. Abstract In this paper we have proposed an approach for mining quantitative association rules. The aim of
More information22c:145 Artificial Intelligence
22c:145 Artificial Intelligence Fall 2005 Propositional Logic Cesare Tinelli The University of Iowa Copyright 2001-05 Cesare Tinelli and Hantao Zhang. a a These notes are copyrighted material and may not
More informationHierarchy in Meritocracy: Community Building and Code Production in the ASF. Oscar Castañeda Student Delft University of Technology
Hierarchy in Meritocracy: Community Building and Code Production in the ASF Oscar Castañeda Student Delft University of Technology This talk started with a project proposal... Overview Institutions in
More informationSYSTEMATIC CONSTRUCTION OF ANOMALY DETECTION BENCHMARKS FROM REAL DATA. Outlier Detection And Description Workshop 2013
SYSTEMATIC CONSTRUCTION OF ANOMALY DETECTION BENCHMARKS FROM REAL DATA Outlier Detection And Description Workshop 2013 Authors Andrew Emmott emmott@eecs.oregonstate.edu Thomas Dietterich tgd@eecs.oregonstate.edu
More informationChapters 6 & 7, Frequent Pattern Mining
CSI 4352, Introduction to Data Mining Chapters 6 & 7, Frequent Pattern Mining Young-Rae Cho Associate Professor Department of Computer Science Baylor University CSI 4352, Introduction to Data Mining Chapters
More informationDistributed Mining of Frequent Closed Itemsets: Some Preliminary Results
Distributed Mining of Frequent Closed Itemsets: Some Preliminary Results Claudio Lucchese Ca Foscari University of Venice clucches@dsi.unive.it Raffaele Perego ISTI-CNR of Pisa perego@isti.cnr.it Salvatore
More informationMining Statistically Important Equivalence Classes and Delta-Discriminative Emerging Patterns
Mining Statistically Important Equivalence Classes and Delta-Discriminative Emerging Patterns Jinyan Li Institute for Infocomm Research (I 2 R) & Nanyang Technological University, Singapore jyli@ntu.edu.sg
More informationMining Approximate Top-K Subspace Anomalies in Multi-Dimensional Time-Series Data
Mining Approximate Top-K Subspace Anomalies in Multi-Dimensional -Series Data Xiaolei Li, Jiawei Han University of Illinois at Urbana-Champaign VLDB 2007 1 Series Data Many applications produce time series
More informationA computer program for mathematical discovery
A computer program for mathematical discovery Ermelinda DeLaViña University of Houston-Downtown March 29, 2008 Some main practices of a mathematician: Concept formation Conjecture making Constructing counterexamples
More informationMulti-Plant Photovoltaic Energy Forecasting Challenge with Regression Tree Ensembles and Hourly Average Forecasts
Multi-Plant Photovoltaic Energy Forecasting Challenge with Regression Tree Ensembles and Hourly Average Forecasts Kathrin Bujna 1 and Martin Wistuba 2 1 Paderborn University 2 IBM Research Ireland Abstract.
More informationDisplay Advertising Optimization by Quantum Annealing Processor
Display Advertising Optimization by Quantum Annealing Processor Shinichi Takayanagi*, Kotaro Tanahashi*, Shu Tanaka *Recruit Communications Co., Ltd. Waseda University, JST PRESTO Overview 1. Introduction
More informationDetecting Correlated Columns in Relational Databases with Mixed Data Types
Detecting Correlated Columns in Relational Databases with Mixed Data Types Hoang Vu Nguyen Emmanuel Müller Periklis Andritsos Klemens Böhm Karlsruhe Institute of Technology (KIT), Germany {hoang.nguyen,
More informationAn Extended Branch and Bound Search Algorithm for Finding Top-N Formal Concepts of Documents
An Extended Branch and Bound Search Algorithm for Finding Top-N Formal Concepts of Documents Makoto HARAGUCHI and Yoshiaki OKUBO Division of Computer Science Graduate School of Information Science and
More informationMolecular Fragment Mining for Drug Discovery
Molecular Fragment Mining for Drug Discovery Christian Borgelt 1, Michael R. Berthold 2, and David E. Patterson 2 1 School of Computer Science, tto-von-guericke-university of Magdeburg, Universitätsplatz
More informationLars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
Syllabus Fri. 21.10. (1) 0. Introduction A. Supervised Learning: Linear Models & Fundamentals Fri. 27.10. (2) A.1 Linear Regression Fri. 3.11. (3) A.2 Linear Classification Fri. 10.11. (4) A.3 Regularization
More informationTowards the discovery of exceptional local models: descriptive rules relating molecules and their odors
Towards the discovery of exceptional local models: descriptive rules relating molecules and their odors Guillaume Bosc 1, Mehdi Kaytoue 1, Marc Plantevit 1, Fabien De Marchi 1, Moustafa Bensafi 2, Jean-François
More informationFinding Hierarchies of Subspace Clusters
Proc. 10th Europ. Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD), Berlin, Germany, 2006 Finding Hierarchies of Subspace Clusters Elke Achtert, Christian Böhm, Hans-Peter Kriegel,
More informationCourse overview. Heikki Mannila Laboratory for Computer and Information Science Helsinki University of Technology
Course overview Heikki Mannila Laboratory for Computer and Information Science Helsinki University of Technology Heikki.Mannila@tkk.fi Algorithmic methods of data mining, Autumn 2007, Course overview 1
More informationAlgorithmic methods of data mining, Autumn 2007, Course overview0-0. Course overview
Algorithmic methods of data mining, Autumn 2007, Course overview0-0 Course overview Heikki Mannila Laboratory for Computer and Information Science Helsinki University of Technology Heikki.Mannila@tkk.fi
More informationLecture Notes for Chapter 6. Introduction to Data Mining
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004
More information15 Introduction to Data Mining
15 Introduction to Data Mining 15.1 Introduction to principle methods 15.2 Mining association rule see also: A. Kemper, Chap. 17.4, Kifer et al.: chap 17.7 ff 15.1 Introduction "Discovery of useful, possibly
More informationarxiv: v1 [stat.ml] 28 Oct 2015
Universal Dependency Analysis Hoang-Vu Nguyen Panagiotis Mandros Jilles Vreeken arxiv:5.8389v [stat.ml] 28 Oct 25 Abstract Most data is multi-dimensional. Discovering whether any subset of dimensions,
More informationUsing HDDT to avoid instances propagation in unbalanced and evolving data streams
Using HDDT to avoid instances propagation in unbalanced and evolving data streams IJCNN 2014 Andrea Dal Pozzolo, Reid Johnson, Olivier Caelen, Serge Waterschoot, Nitesh V Chawla and Gianluca Bontempi 07/07/2014
More informationCS60021: Scalable Data Mining. Dimensionality Reduction
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 1 CS60021: Scalable Data Mining Dimensionality Reduction Sourangshu Bhattacharya Assumption: Data lies on or near a
More informationSelecting a Right Interestingness Measure for Rare Association Rules
Selecting a Right Interestingness Measure for Rare Association Rules Akshat Surana R. Uday Kiran P. Krishna Reddy Center for Data Engineering International Institute of Information Technology-Hyderabad
More informationBelieve it Today or Tomorrow? Detecting Untrustworthy Information from Dynamic Multi-Source Data
SDM 15 Vancouver, CAN Believe it Today or Tomorrow? Detecting Untrustworthy Information from Dynamic Multi-Source Data Houping Xiao 1, Yaliang Li 1, Jing Gao 1, Fei Wang 2, Liang Ge 3, Wei Fan 4, Long
More informationTowards Detecting Protein Complexes from Protein Interaction Data
Towards Detecting Protein Complexes from Protein Interaction Data Pengjun Pei 1 and Aidong Zhang 1 Department of Computer Science and Engineering State University of New York at Buffalo Buffalo NY 14260,
More informationFinding Regional Co-location Patterns for Sets of Continuous Variables in Spatial Datasets
Finding Regional Co-location Patterns for Sets of Continuous Variables in Spatial Datasets Christoph F. Eick, Rachana Parmar Department of Computer Science University of Houston Houston, TX 77204-3010
More informationComputer science research seminar: VideoLectures.Net recommender system challenge: presentation of baseline solution
Computer science research seminar: VideoLectures.Net recommender system challenge: presentation of baseline solution Nino Antulov-Fantulin 1, Mentors: Tomislav Šmuc 1 and Mile Šikić 2 3 1 Institute Rudjer
More informationOn the Mining of Numerical Data with Formal Concept Analysis
On the Mining of Numerical Data with Formal Concept Analysis Thèse de doctorat en informatique Mehdi Kaytoue 22 April 2011 Amedeo Napoli Sébastien Duplessis Somewhere... in a temperate forest... N 2 /
More informationExploiting Statistically Significant Dependent Rules for Associative Classification
Exploiting Statistically Significant Depent Rules for Associative Classification Jundong Li Computer Science and Engineering Arizona State University, USA jundongl@asuedu Osmar R Zaiane Department of Computing
More informationMTH 464: Computational Linear Algebra
MTH 464: Computational Linear Algebra Lecture Outlines Exam 2 Material Prof. M. Beauregard Department of Mathematics & Statistics Stephen F. Austin State University March 2, 2018 Linear Algebra (MTH 464)
More informationMatrices, Vector Spaces, and Information Retrieval
Matrices, Vector Spaces, and Information Authors: M. W. Berry and Z. Drmac and E. R. Jessup SIAM 1999: Society for Industrial and Applied Mathematics Speaker: Mattia Parigiani 1 Introduction Large volumes
More informationLecture 6: Linear Regression (continued)
Lecture 6: Linear Regression (continued) Reading: Sections 3.1-3.3 STATS 202: Data mining and analysis October 6, 2017 1 / 23 Multiple linear regression Y = β 0 + β 1 X 1 + + β p X p + ε Y ε N (0, σ) i.i.d.
More informationCS 572: Information Retrieval
CS 572: Information Retrieval Lecture 11: Topic Models Acknowledgments: Some slides were adapted from Chris Manning, and from Thomas Hoffman 1 Plan for next few weeks Project 1: done (submit by Friday).
More informationLinear and Non-Linear Dimensionality Reduction
Linear and Non-Linear Dimensionality Reduction Alexander Schulz aschulz(at)techfak.uni-bielefeld.de University of Pisa, Pisa 4.5.215 and 7.5.215 Overview Dimensionality Reduction Motivation Linear Projections
More informationDimensionality Reduction Techniques (DRT)
Dimensionality Reduction Techniques (DRT) Introduction: Sometimes we have lot of variables in the data for analysis which create multidimensional matrix. To simplify calculation and to get appropriate,
More informationUnsupervised Image Segmentation Using Comparative Reasoning and Random Walks
Unsupervised Image Segmentation Using Comparative Reasoning and Random Walks Anuva Kulkarni Carnegie Mellon University Filipe Condessa Carnegie Mellon, IST-University of Lisbon Jelena Kovacevic Carnegie
More informationComparing Apples and Oranges
Comparing Apples and Oranges Measuring Differences between Data Mining Results Nikolaj Tatti and Jilles Vreeken Advanced Database Research and Modeling Universiteit Antwerpen {nikolaj.tatti,jilles.vreeken}@ua.ac.be
More informationBipartite spectral graph partitioning to co-cluster varieties and sound correspondences
Bipartite spectral graph partitioning to co-cluster varieties and sound correspondences Martijn Wieling Department of Computational Linguistics, University of Groningen Seminar in Methodology and Statistics
More information