Rough Sets Used in the Measurement of Similarity of Mixed Mode Data
Sarah Coppock and Lawrence Mazlack
Applied Artificial Intelligence Laboratory, ECECS Department, University of Cincinnati, Cincinnati, Ohio

Abstract

Similarity is important in knowledge discovery. Cluster analysis, classification, and granulation each involve some notion or definition of similarity. The measurement of similarity is selected based on the domain and distribution of the data. Even within a specific domain, some similarity metrics may be considered more useful than others. There is an amount of uncertainty in quantitatively measuring the similarity between records of mixed data; the uncertainty develops from the lack of scale that both nominal and ordinal data have. Rough set theory is one tool developed for handling uncertainty, and rough sets can be used in dissimilarity analysis of qualitative data. It would seem that rough sets could be applied in measuring similarity between records containing both quantitative and qualitative data for the purpose of clustering the records.

1 Introduction

Similarity metrics are used in many fields. When determining the similarity between records in a data set that contains different kinds of data, a certain amount of uncertainty is introduced. While metrics such as the Euclidean metric and its generalized Minkowski metrics can be used when all of the data is quantitative (both discrete and continuous), it is not as easy to usefully combine them with scalar metrics representing qualitative data (both nominal and ordinal). Data can be categorized into qualitative and quantitative data; qualitative data can be further described as either ordinal or nominal. Ordinal data has order without scale, e.g., small, medium, large. Nominal data has no order and no scale, e.g., Cincinnati, Tampa, Atlanta. Data such as cities and colors can be argued as having some "order", e.g., latitudes or longitudes and frequencies; however, in the case of unsupervised learning, such knowledge is not explicitly known by the learning algorithms. For a more detailed discussion of data varieties, see [1] and [2]. Similarity is important in knowledge discovery.
Cluster analysis, classification, and granulation each involve some notion or definition of similarity. Measuring similarity between multidimensional, multi-modal data is difficult, but offers information. The information provided by clustering records based on similarity measurements includes an overall distribution of the data and the discovery of possible outliers. The measurement of similarity may be appropriately selected based on the domain and distribution of the data. Even within a domain, there may be some similarity metrics considered more useful than others. There is an amount of uncertainty in quantitatively measuring similarity (or dissimilarity) between records of mixed kinds of data. The uncertainty develops from the fact that both nominal and ordinal data lack a natural, fixed scale. For example, should we say that the similarity between red and orange is more or less than (or equal to) the similarity between blue and green? Some metrics assign a Boolean value for whether the values match; the similarities would then be considered equal. Rough set theory is one of the tools developed for handling uncertainty. Pawlak [4] demonstrates how rough sets can be used in dissimilarity analysis of qualitative data. It would seem that rough sets could be applied in measuring similarity between records containing both quantitative and qualitative data.

2 Rough sets in dissimilarity analysis

Rough sets are built on the notion of discernibility. There is an equivalence relation imposed on the items based on attribute values. For example, in Table 1, records that agree on every attribute in a set B are indiscernible with respect to B; the induced equivalence relation is denoted IND(B). A similar idea is used in similarity metrics for nominal and ordinal data: a simple matching technique, where a Boolean 0 or 1 is assigned based on whether two attribute values are the same for two records. Pawlak [4] describes applying rough sets to measure dissimilarity between records of Boolean values. A brief description of Pawlak's method to measure dissimilarity using rough sets follows, using his Middle East Situation example. This example is given in Table 1.
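The indiscernibility relation and the simple matching technique can be sketched in a few lines. This is an illustrative Python sketch, not code from the paper; the attribute names and records are made up for the example.

```python
from collections import defaultdict

def ind_partition(records, attrs):
    """Partition record indices into equivalence classes of IND(attrs):
    two records are indiscernible iff they agree on every attribute in attrs."""
    classes = defaultdict(list)
    for i, rec in enumerate(records):
        key = tuple(rec[a] for a in attrs)
        classes[key].append(i)
    return list(classes.values())

def simple_match(r1, r2, attrs):
    """Simple matching: a Boolean 1/0 per attribute, summed over the attributes."""
    return sum(1 if r1[a] == r2[a] else 0 for a in attrs)

# Hypothetical nominal/ordinal records:
records = [
    {"drink": "Coke", "grade": "B"},
    {"drink": "Coke", "grade": "C"},
    {"drink": "Pepsi", "grade": "B"},
]
print(ind_partition(records, ["drink"]))                       # → [[0, 1], [2]]
print(simple_match(records[0], records[2], ["drink", "grade"]))  # → 1
```

Records 0 and 1 fall into one equivalence class of IND({drink}) because they agree on that attribute; the matching count between records 0 and 2 is 1 because only the grade matches.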
Any attributes that have the same values as another attribute for all records, i.e., attributes that are equivalent to one another, are disregarded. For example, three attributes in Table 2 have the same value for each record; only one of the three would then be considered further in the process, but not all three. This is because once one of the attributes is taken into account, the other two do not offer any more information in computing the dissimilarity. In computing similarity, it would seem desirable to take into account that multiple attributes are equal; that is, the more values two records have in common, the greater the similarity between the records. Attributes that have the same value for all records are disregarded. For example, a5 in Table 2 would be disregarded, since there is only one value, 1, for all records; the attribute does not offer any information in discerning between any of the records. Attributes that are the negation of another, such as a4 with a8 in Table 2, are also disregarded; only one of a4 or a8 in Table 2 would be considered.

[Table 1. Middle East Situation example.]

[Table 2. Small example.]

[Table 4. Core values for Table 1.]

Table 1 is modified for measuring dissimilarity; one of the possible resulting modified sets is given in Table 3.

[Table 3. Possible modified set from Table 1.]

A graph is then constructed from the modified table. There is a node for each record, and a labeled edge between two nodes if removing the attribute in the label would put the records in the same equivalence class; that is, an edge labeled with an attribute connects two records that would be in the same equivalence class under the indiscernibility relation with that attribute removed. Figure 1 shows the graph for Table 3. The dissimilarity between two records is computed by determining the length of the shortest path between the corresponding nodes in the graph; for example, two records joined by a shortest path of two edges have a dissimilarity of 2.

[Figure 1. Graph for Table 3.]

Dissimilarity is the complement of similarity. Because of this relationship between dissimilarity and similarity, we could modify the above approach to quantify similarity between records. A consideration to make in modifying this approach is the generalization to multi-valued attributes, for example, an attribute with more than two or three values, such as the make of a car: {Ford, GM, Toyota, Nissan, BMW}.

3 Converting quantitative attributes to qualitative

A common approach to measuring similarity between records containing mixed data is to add the measurements of qualitative similarity and of quantitative similarity. Without knowledge of the domain, and specifically a description of the data set, finding an appropriate weighting that gives reasonable results would be computationally expensive. Methods to cluster quantitative data have been developed. One possibility for the discovery of similar records in multi-modal data would be to convert the quantitative attributes to one qualitative attribute according to the natural clusters in the quantitative attributes. The modified rough set dissimilarity analysis approach can then be applied.
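The graph-based dissimilarity of Section 2 can be sketched as follows. This is a hedged Python sketch of a straightforward implementation, not the paper's code: for Boolean tables, two records differ on exactly one attribute precisely when removing that attribute would merge them into one equivalence class, and that is the edge rule assumed below; dissimilarity is then a breadth-first shortest-path length.

```python
from collections import deque

def build_graph(records, attrs):
    """Add an edge between records i and j, labeled a, when they differ only on
    attribute a (so dropping a would put them in the same IND equivalence class)."""
    adj = {i: [] for i in range(len(records))}
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            diff = [a for a in attrs if records[i][a] != records[j][a]]
            if len(diff) == 1:
                adj[i].append((j, diff[0]))
                adj[j].append((i, diff[0]))
    return adj

def dissimilarity(adj, src, dst):
    """Shortest-path length between two record nodes via BFS.
    Returns None when the nodes are disconnected (the situation the paper
    observes once multi-valued attributes enter)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        for v, _label in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None

# Tiny made-up Boolean table for illustration:
recs = [{"a1": 0, "a2": 0}, {"a1": 1, "a2": 0}, {"a1": 1, "a2": 1}]
adj = build_graph(recs, ["a1", "a2"])
print(dissimilarity(adj, 0, 2))  # → 2 (path r0 - r1 - r2)
```

Here records 0 and 1 differ only on a1 and records 1 and 2 only on a2, so the shortest path from record 0 to record 2 has length 2, matching the paper's example of a dissimilarity of 2.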
Table 5 gives an example mixed data set with one nominal attribute, one ordinal attribute (values {A, B, C, D, F}, in order), and one discrete quantitative attribute. Table 6 is the modified data set, with the quantitative attribute in Table 5 clustered into c1 and c2; the attribute a3 is the label of the cluster to which the value for the record belongs.

Table 5. Example mixed data set

  (nominal)   (ordinal)   (quantitative)
  Coke        B           4
  Coke        C           2
  Pepsi       B           1
  Pepsi       A           1
  Bud         F           2
  Heineken    B           3

Table 6. Modified data set from Table 5

  (nominal)   (ordinal)   a3
  Coke        B           c2
  Coke        C           c1
  Pepsi       B           c1
  Pepsi       A           c1
  Bud         F           c1
  Heineken    B           c2

A modified approach, as in Section 2, may now be applied to determine pair-wise record similarity. Figure 2 shows the graph associated with Table 6. From the graph it can be seen that some modification to handle multi-valued attributes needs to be made: the graph is not connected. Table 7, which provides the similarities, also demonstrates this need. The similarities are computed as (Dmax - Dij)/Dmax, where Dmax is the maximum dissimilarity over all pairs and Dij is the dissimilarity between ri and rj. After following the method in a straightforward manner to this point, it is unclear whether the difficulty of normalizing between attributes is handled. Regardless of the method used to cluster or granulate the data, it is difficult to evaluate whether the results are reasonable; that is, if we gave a small data set such as the one in our example to a number of students, how would they group the records?
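The label conversion of Section 3 and the similarity normalization used for Table 7 can be sketched together. This is an illustrative Python sketch: the threshold split into c1/c2 below is a stand-in for whatever clustering method actually supplies the labels, and the dissimilarity values fed to the normalizer are made up.

```python
from fractions import Fraction

def to_labels(values, threshold):
    """Illustrative discretization: values <= threshold -> 'c1', else 'c2'.
    (Any clustering of the quantitative attribute could supply these labels.)"""
    return ["c1" if v <= threshold else "c2" for v in values]

def normalized_similarity(dissim):
    """Convert pairwise dissimilarities {(i, j): Dij} to similarities
    (Dmax - Dij) / Dmax, as used for Table 7."""
    d_max = max(dissim.values())
    return {pair: Fraction(d_max - d, d_max) for pair, d in dissim.items()}

print(to_labels([4, 2, 1, 1, 2, 3], threshold=2))
# → ['c2', 'c1', 'c1', 'c1', 'c1', 'c2']  (the labeling shown in Table 6)

sims = normalized_similarity({(0, 1): 1, (0, 2): 2, (1, 2): 3})
print(sims[(0, 2)])  # → 1/3
```

With a maximum dissimilarity of 3, a pair at dissimilarity 2 gets similarity (3 - 2)/3 = 1/3, and the most dissimilar pair gets similarity 0.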
It seems that this situation is more suited to having a fuzzy measure associated with a single particular grouping or a clustering of records. The kind of measure, and how to define the measure function, is unclear at this point. In this case, heuristics such as those used in [3] and [6] can be used to restrict the search space to those groupings that would be more likely to have a higher measure of certainty.

[Figure 2. Graph associated with Table 6.]

[Table 7. Pairwise similarities for Table 5; a symmetric matrix with values in {0, 1/3, 2/3, 1}.]

4 Fusing quantitative and qualitative information

Metrics and methods have been developed to cluster data records that have only quantitative or only qualitative data. It is possible that information can be extracted by the fusion of the methods' results or of the different measures. Metrics are defined on, or can be normalized to, the interval [0, 1]. Quantitative measures lie on the whole continuous interval, while qualitative measures lie on a discrete linear subset of the interval.

4.1 Fusing quantitative and qualitative partitions

Metrics and methods have been developed to cluster records containing only one type of data. The results of these methods and metrics have different meanings; the characteristics that contribute to the similarity measures are different. It is possible that rough sets can be used in the fusion of the results of existing methods for the two sets of dimensions. Let Cq(X) denote a clustering method for the quantitative dimensions of the data set X and Cn(X) denote the clustering method for the qualitative dimensions of X; Cq clusters X based only on the quantitative attributes and Cn clusters X based only on the qualitative attributes. Let Cq(X) = {q1, q2, ..., qk} and Cn(X) = {n1, n2, ..., nm}, where the sets qi and ni are the clusters that result according to the quantitative and qualitative attributes, respectively. Table 8 shows one possibility for the results of Cq and Cn applied to the simple example data set in Table 5. Note that the qi are arbitrary and are not the result of any specific metric or method. There is one qi and one ni for every record. Let si = qi ∩ ni for a given ri. The set si contains all of the records considered similar to the record ri according to both a quantitative and a qualitative metric or method. There may be some order of the elements in qi according to their similarity to ri; that is, given qi = {qi(1), qi(2), ..., qi(k)}, where qi(j) is the j-th record in the set qi, it may be the case that s(ri, qi(j)) >= s(ri, qi(k)) >= s(ri, qi(m)) for j <= k <= m. The same may be true for the set ni.

[Table 8. Possible Cq and Cn for Table 5.]

[Table 9. The si for Table 8.]

Table 9 gives the si for the example from Table 8. We can infer from the si that one pair of records is more similar than another when the pair has the same membership in a greater number of the si (here, in all of the si) than the other pair, which differ in s1. We can also infer that two records belong together in the overall clustering of the data set when, for each si, they have the same membership. Thus far we have not addressed a weighting of attributes. For example, if there are 2 qualitative dimensions and 10 quantitative dimensions, it seems reasonable that the qi would have more weight in determining the overall clusters; the overall clustering would be more like the resulting quantitative clusters. The fact that there may exist an order to each of the sets leads to the idea that rough sets may be used in the development of a fuzzy measure. The measure may be of either a specific group identified as being similar or an overall clustering of mixed data.

4.2 Fusing qualitative and quantitative measures

The sets Cq and Cn provide less information than having pair-wise similarity measurements. Suppose we are given the following: Cn = {{x1}, {x2, x3}, {x5}} and Cq = {{x1, x2, x3, x5}, {x4, x6}}. Both {x1} and {x5} are different clusters. Suppose that the qualitative similarity involving {x1} is maximal, while the qualitative similarities between {x2, x3} and {x5} are less than maximal. Suppose also that the quantitative similarity involving {x1} is minimal, while the quantitative similarity involving {x5} is greater than minimal. We are not able to compare these similarities to determine if either pair should be kept together. It may be more useful to consider the pair-wise qualitative and quantitative similarities.

One can consider "rough sets" from the perspective of each record. In other words, there are those records which definitely belong in the same cluster as the record (the lower approximation), those that definitely do not belong in the same cluster, and those for which it is uncertain whether they belong in the same cluster (the boundary). Each of these can be determined by given similarity values. For example, we can say that for any two records, if the similarity measurement is less than some threshold, then they are not in each other's cluster approximation. One can define a similar threshold for those records that definitely belong in the same cluster. What these thresholds should be is subjective, both to a particular domain and to the metric that is used. Suppose we have the similarity matrices for the qualitative and quantitative dimensions given in Table 10 and Table 11, respectively. The qualitative measure is computed as:

  (number of matching attribute values) / (number of qualitative attributes)

The quantitative measure is computed as:

  1 - (1/|Q|) * sum over k in Q of |xik - xjk| / Rk

where Q is the set of quantitative attributes, xmk is the k-th attribute value for record m, and Rk is the range of attribute k.

Table 10. Qualitative similarities for Table 5

       r1    r2    r3    r4    r5    r6
  r1   1     1/2   1/2   0     0     1/2
  r2   1/2   1     0     0     0     0
  r3   1/2   0     1     1/2   0     1/2
  r4   0     0     1/2   1     0     0
  r5   0     0     0     0     1     0
  r6   1/2   0     1/2   0     0     1

Table 11. Quantitative similarities for Table 5

       r1    r2    r3    r4    r5    r6
  r1   1     1/3   0     0     1/3   2/3
  r2   1/3   1     2/3   2/3   1     2/3
  r3   0     2/3   1     1     2/3   1/3
  r4   0     2/3   1     1     2/3   1/3
  r5   1/3   1     2/3   2/3   1     2/3
  r6   2/3   2/3   1/3   1/3   2/3   1

Table 12 and Table 13 give the approximations, with the lower threshold of 1/2 and the upper approximation threshold of 9/10: 0 denotes that the record is not in the approximation, 1 denotes that the record is in the lower approximation, and '--' denotes that the record is in the boundary. For example, in both Table 12 and Table 13, r6 is in the boundary for r1. From Table 12 and Table 13 we can see that, for the cluster including r1, the most likely record in the same cluster would be r6, since it is in both approximations. One could use a similar idea to Section 4.1 and use the union of the upper approximations to determine likely clusters, for example {r1, r6}, based on the sets for both tables.

Table 12. Approximations for the qualitative attributes

       r1    r2    r3    r4    r5    r6
  r1   1     --    --    0     0     --
  r2   --    1     0     0     0     0
  r3   --    0     1     --    0     --
  r4   0     0     --    1     0     0
  r5   0     0     0     0     1     0
  r6   --    0     --    0     0     1

Table 13. Approximations for the quantitative attributes

       r1    r2    r3    r4    r5    r6
  r1   1     0     0     0     0     --
  r2   0     1     --    --    1     --
  r3   0     --    1     1     --    0
  r4   0     --    1     1     --    0
  r5   0     1     --    --    1     --
  r6   --    --    0     0     --    1

The difficulty in comparing different measures is still present, because there still exists the problem of which approximation a resulting cluster should be more like. For example, since the thresholds, and therefore the equivalence relations, are based on two different measures, we cannot infer which of the candidate clusterings would be a likely result. For this reason it would seem that a fuzzy measure is needed for the unsupervised discovery of similar records in mixed data.

Summary

This paper discussed two approaches for determining similarity between records of mixed data. From both ideas it can be seen that the uncertainty and vagueness of qualitative data, and the difficulty of trying to combine metrics, leave rough set theory as an optional tool to be used. As concluded in the discussion, an additional or different approach is needed for the discovery of similar groups of records within data sets of mixed data.

References

[1] Everitt, B. Cluster Analysis, 3rd ed. Hodder & Stoughton, London, 1993.
[2] Han, J. and Kamber, M. Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco, 2001.
[3] He, A. Unsupervised Data Mining by Recursive Partitioning. Master's Thesis, University of Cincinnati.
[4] Pawlak, Z. Rough Sets: Theoretical Aspects of Reasoning About Data. Kluwer Academic Publishers, Dordrecht, 1991.
[5] Sneath, P. and Sokal, R. Numerical Taxonomy. W. H. Freeman, San Francisco, 1973.
[6] Zhu, Y. Unsupervised Database Discovery Based on Artificial Intelligence Techniques. Master's Thesis, University of Cincinnati.
More informationResearch Article Special Approach to Near Set Theory
Mathematical Problems in Engineering Volume 2011, Article ID 168501, 10 pages doi:10.1155/2011/168501 Research Article Special Approach to Near Set Theory M. E. Abd El-Monsef, 1 H. M. Abu-Donia, 2 and
More informationAveraging of the inelastic cross-section measured by the CDF and the E811 experiments.
Averagg of the astic cross-section measured by the CDF and the E8 experiments. S. Klimenko, J. Konigsberg, T. Liss. Introduction In un II the Tevatron lumosity is measured usg the system of Cherenkov Lumosity
More informationCS570 Introduction to Data Mining
CS570 Introduction to Data Mining Department of Mathematics and Computer Science Li Xiong Data Exploration and Data Preprocessing Data and Attributes Data exploration Data pre-processing Data cleaning
More informationVirtual Control Policy for Binary Ordered Resources Petri Net Class
sensors Article Virtual Control Policy Bary Ordered Resources Petri Net Class Carlos A. Rovet, Tomás J. Concepción Elia Esr Cano Computer Systems Engeerg Department, Technological University Panama, 0819-07289,
More informationA Working Distance Formula for Night Vision Devices Quality Preliminary Information *
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 6 No 3 Sofia 2006 A Workg Distance Formula for Night Vision Devices Quality Prelimary Information * Daniela Borissova Ivan
More informationA Patent Document Retrieval System Addressing Both Semantic and Syntactic Properties
A Patent Document Retrieval System Addressg Both Semantic and Syntactic Properties Liang Chen Naoyuki Tokuda Hisahiro Adachi Computer Science Department University of Northern British Columbia Prce George,
More informationTURBULENT VORTEX SHEDDING FROM TRIANGLE CYLINDER USING THE TURBULENT BODY FORCE POTENTIAL MODEL
Proceedgs of ASME FEDSM ASME 2 Fluids Engeerg Division Summer Meetg June 11-15, 2 Boston, Massachusetts FEDSM2-11172 TURBULENT VORTEX SHEDDING FROM TRIANGLE CYLINDER USING THE TURBULENT BODY FORCE POTENTIAL
More information2.4 The Smith Chart. Reading Assignment: pp The Smith Chart. The Smith Chart provides: The most important fact about the Smith Chart is:
2/7/2005 2_4 The Smith Chart 1/2 2.4 The Smith Chart Readg Assignment: pp. 64-73 The Smith Chart The Smith Chart provides: 1) 2) The most important fact about the Smith Chart is: HO: The Complex Γ plane
More informationSpectral Clustering. Zitao Liu
Spectral Clustering Zitao Liu Agenda Brief Clustering Review Similarity Graph Graph Laplacian Spectral Clustering Algorithm Graph Cut Point of View Random Walk Point of View Perturbation Theory Point of
More informationThe Lefthanded Local Lemma characterizes chordal dependency graphs
The Lefthanded Local Lemma characterizes chordal dependency graphs Wesley Pegden March 30, 2012 Abstract Shearer gave a general theorem characterizing the family L of dependency graphs labeled with probabilities
More informationMATHEMATICS OF DATA FUSION
MATHEMATICS OF DATA FUSION by I. R. GOODMAN NCCOSC RDTE DTV, San Diego, California, U.S.A. RONALD P. S. MAHLER Lockheed Martin Tactical Defences Systems, Saint Paul, Minnesota, U.S.A. and HUNG T. NGUYEN
More informationInteraction Analysis of Spatial Point Patterns
Interaction Analysis of Spatial Point Patterns Geog 2C Introduction to Spatial Data Analysis Phaedon C Kyriakidis wwwgeogucsbedu/ phaedon Department of Geography University of California Santa Barbara
More informationText Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University
Text Mining Dr. Yanjun Li Associate Professor Department of Computer and Information Sciences Fordham University Outline Introduction: Data Mining Part One: Text Mining Part Two: Preprocessing Text Data
More information2.4 Parsing. Computer Science 332. Compiler Construction. Chapter 2: A Simple One-Pass Compiler : Parsing. Top-Down Parsing
Computer Science 332 Compiler Construction Chapter 2: A Simple One-Pass Compiler 2.4-2.5: Parsg 2.4 Parsg Parsg : the process of determg whether a strg S is generated by a grammar G Short answer is yes/no
More informationInderjit Dhillon The University of Texas at Austin
Inderjit Dhillon The University of Texas at Austin ( Universidad Carlos III de Madrid; 15 th June, 2012) (Based on joint work with J. Brickell, S. Sra, J. Tropp) Introduction 2 / 29 Notion of distance
More informationLecture 12 : Graph Laplacians and Cheeger s Inequality
CPS290: Algorithmic Foundations of Data Science March 7, 2017 Lecture 12 : Graph Laplacians and Cheeger s Inequality Lecturer: Kamesh Munagala Scribe: Kamesh Munagala Graph Laplacian Maybe the most beautiful
More informationComparison of Rough-set and Interval-set Models for Uncertain Reasoning
Yao, Y.Y. and Li, X. Comparison of rough-set and interval-set models for uncertain reasoning Fundamenta Informaticae, Vol. 27, No. 2-3, pp. 289-298, 1996. Comparison of Rough-set and Interval-set Models
More informationData Mining 4. Cluster Analysis
Data Mining 4. Cluster Analysis 4.2 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Data Structures Interval-Valued (Numeric) Variables Binary Variables Categorical Variables Ordinal Variables Variables
More informationStatistical Pattern Recognition
Statistical Pattern Recognition Feature Extraction Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi, Payam Siyari Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Agenda Dimensionality Reduction
More informationLecture notes 1: ECEN 489
Lecture notes : ECEN 489 Power Management Circuits and Systems Department of Electrical & Computer Engeerg Texas A&M University Jose Silva-Martez January 207 Copyright Texas A&M University. All rights
More informationSTATISTICS 407 METHODS OF MULTIVARIATE ANALYSIS TOPICS
STATISTICS 407 METHODS OF MULTIVARIATE ANALYSIS TOPICS Principal Component Analysis (PCA): Reduce the, summarize the sources of variation in the data, transform the data into a new data set where the variables
More informationDecision Tree Analysis for Classification Problems. Entscheidungsunterstützungssysteme SS 18
Decision Tree Analysis for Classification Problems Entscheidungsunterstützungssysteme SS 18 Supervised segmentation An intuitive way of thinking about extracting patterns from data in a supervised manner
More informationBranch-and-Cut for the Split Delivery Vehicle Routing Problem with Time Windows
Gutenberg School of Management and Economics & Research Unit Interdisciplary Public Policy Discussion Paper Series Branch-and-Cut for the Split Delivery Vehicle Routg Problem with Time Wdows Nicola Bianchessi
More informationChapter 6: Classification
Chapter 6: Classification 1) Introduction Classification problem, evaluation of classifiers, prediction 2) Bayesian Classifiers Bayes classifier, naive Bayes classifier, applications 3) Linear discriminant
More informationType of data Interval-scaled variables: Binary variables: Nominal, ordinal, and ratio variables: Variables of mixed types: Attribute Types Nominal: Pr
Foundation of Data Mining i Topic: Data CMSC 49D/69D CSEE Department, e t, UMBC Some of the slides used in this presentation are prepared by Jiawei Han and Micheline Kamber Data Data types Quality of data
More informationi jand Y U. Let a relation R U U be an
Dependency Through xiomatic pproach On Rough Set Theory Nilaratna Kalia Deptt. Of Mathematics and Computer Science Upendra Nath College, Nalagaja PIN: 757073, Mayurbhanj, Orissa India bstract: The idea
More informationA Logical Formulation of the Granular Data Model
2008 IEEE International Conference on Data Mining Workshops A Logical Formulation of the Granular Data Model Tuan-Fang Fan Department of Computer Science and Information Engineering National Penghu University
More informationarxiv: v1 [cs.ai] 7 Sep 2016
Equilibrium Graphs Pedro Cabalar, Carlos Pérez, and Gilberto Pérez Department of Computer Science University of Corunna, Spa {cabalar,c.pramil,gperez}@udc.es arxiv:1609.02010v1 [cs.ai] 7 Sep 2016 Abstract.
More informationExtended breadth-first search algorithm in practice
Proceedings of the 9 th International Conference on Applied Informatics Eger, Hungary, January 29 February 1, 2014. Vol. 1. pp. 59 66 doi: 10.14794/ICAI.9.2014.1.59 Extended breadth-first search algorithm
More informationHierarchical Clustering via Spreading Metrics
Journal of Machine Learning Research 18 2017) 1-35 Submitted 2/17; Revised 5/17; Published 8/17 Hierarchical Clustering via Spreading Metrics Aurko Roy College of Computing Georgia Institute of Technology
More informationRough operations on Boolean algebras
Rough operations on Boolean algebras Guilin Qi and Weiru Liu School of Computer Science, Queen s University Belfast Belfast, BT7 1NN, UK Abstract In this paper, we introduce two pairs of rough operations
More informationHigh Frequency Rough Set Model based on Database Systems
High Frequency Rough Set Model based on Database Systems Kartik Vaithyanathan kvaithya@gmail.com T.Y.Lin Department of Computer Science San Jose State University San Jose, CA 94403, USA tylin@cs.sjsu.edu
More information3) Aft bolted connection analysis: (See Figure 1.0)
Given: Both static and dynamic (fatigue) failure criteria will be used. A mimum factor of safety =2 will be adhered to. For fatigue analysis the ASME elliptic model with Von Mises equivalent stress will
More informationIndex. C, system, 8 Cech distance, 549
Index PF(A), 391 α-lower approximation, 340 α-lower bound, 339 α-reduct, 109 α-upper approximation, 340 α-upper bound, 339 δ-neighborhood consistent, 291 ε-approach nearness, 558 C, 443-2 system, 8 Cech
More informationThe three-dimensional matching problem in Kalmanson matrices
DOI 10.1007/s10878-011-9426-y The three-dimensional matching problem in Kalmanson matrices Sergey Polyakovskiy Frits C.R. Spieksma Gerhard J. Woeginger The Author(s) 2011. This article is published with
More informationClustering Lecture 1: Basics. Jing Gao SUNY Buffalo
Clustering Lecture 1: Basics Jing Gao SUNY Buffalo 1 Outline Basics Motivation, definition, evaluation Methods Partitional Hierarchical Density-based Mixture model Spectral methods Advanced topics Clustering
More information