Rough Sets Used in the Measurement of Similarity of Mixed Mode Data

Sarah Coppock, Lawrence Mazlack
Applied Artificial Intelligence Laboratory, ECECS Department, University of Cincinnati, Cincinnati, Ohio

Abstract
Similarity is important in knowledge discovery. Cluster analysis, classification, and granulation each involve some notion or definition of similarity. The measurement of similarity is selected based on the domain and distribution of the data. Even within a specific domain, some similarity metrics may be considered more useful than others. There is an amount of uncertainty in quantitatively measuring the similarity between records of mixed data. The uncertainty develops from the lack of scale that both nominal and ordinal data have. Rough set theory is one tool developed for handling uncertainty. Rough sets can be used in dissimilarity analysis of qualitative data. It would seem that rough sets could be applied in measuring similarity between records containing both quantitative and qualitative data for the purpose of clustering the records.

1 Introduction
Similarity metrics are used in many fields. When determining the similarity between records in a data set that contains different kinds of data, a certain amount of uncertainty is introduced. While metrics such as the Euclidean metric and its generalization, the Minkowski metrics, can be used when all of the data is quantitative (both discrete and continuous), it is not as easy to usefully combine scalar metrics representing qualitative data (both nominal and ordinal).

Data can be categorized into qualitative and quantitative data. Qualitative data can be further described as either ordinal or nominal. Ordinal data has order without scale, e.g., small, medium, large. Nominal data has no order and no scale, e.g., Cincinnati, Tampa, Atlanta. Data such as cities and colors can be argued as having some "order", e.g., latitudes or longitudes and frequencies. However, in the case of unsupervised learning, such knowledge is not explicitly known by the learning algorithms. For a more detailed discussion of data varieties, see [1], [2].

Similarity is important in knowledge discovery.
Cluster analysis, classification, and granulation each involve some notion or definition of similarity. Measuring similarity between multidimensional, multi-modal data is difficult but offers information. The information provided by clustering records based on similarity measurements includes an overall distribution of the data and the discovery of possible outliers. The measurement of similarity may be appropriately selected based on the domain and distribution of the data. Even within a domain, there may be some similarity metrics considered more useful than others.

There is an amount of uncertainty in quantitatively measuring similarity (or dissimilarity) between records of mixed kinds of data. The uncertainty develops from the fact that both nominal and ordinal data lack a natural fixed scale. For example, should we say that the similarity between red and orange is more or less than (or equal to) the similarity between blue and green? Some metrics assign a Boolean value for whether the values match; under such a metric, the two similarities would be considered equal. Rough set theory is one of the tools developed for handling uncertainty. Pawlak [4] demonstrates how rough sets can be used in dissimilarity analysis of qualitative data. It would seem that rough sets could be applied in measuring similarity between records containing both quantitative and qualitative data.

2 Rough sets in dissimilarity analysis
Rough sets are built on the notion of indiscernibility. An equivalence relation is imposed on the items based on attribute values. For example, in Table 1, … and … are considered indiscernible with respect to 0, denoted IND(0). A similar idea is used in similarity metrics for nominal and ordinal data: a simple matching technique, where a Boolean 0 or 1 is assigned based on whether two attribute values are the same for two records. Pawlak [4] describes applying rough sets to measure dissimilarity between records of Boolean values. A brief description of Pawlak's method to measure dissimilarity using rough sets follows, using his Middle East Situation example. This example is given in Table 1.
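
The indiscernibility relation IND(B) and the simple matching idea can be illustrated in a few lines of Python. This is a minimal sketch, not code from the paper: it assumes records are stored as dictionaries keyed by attribute name, and the record identifiers r1 to r6, the attribute names a1 and a2, and the functions `ind_classes` and `simple_matching` are hypothetical labels. The data are the two qualitative columns of the beverage example in Table 5.

```python
from collections import defaultdict

# Qualitative columns of the Table 5 example (a1 nominal, a2 ordinal); ids r1..r6 are hypothetical labels.
RECORDS = {
    "r1": {"a1": "Coke", "a2": "B"},
    "r2": {"a1": "Coke", "a2": "C"},
    "r3": {"a1": "Pepsi", "a2": "B"},
    "r4": {"a1": "Pepsi", "a2": "A"},
    "r5": {"a1": "Bud", "a2": "F"},
    "r6": {"a1": "Heineken", "a2": "B"},
}


def ind_classes(records, attrs):
    """Equivalence classes of IND(attrs): records with identical values on every attribute in attrs."""
    classes = defaultdict(list)
    for rid, row in records.items():
        classes[tuple(row[a] for a in attrs)].append(rid)
    # Sort inside and across classes so the result does not depend on hashing order.
    return sorted(sorted(members) for members in classes.values())


def simple_matching(x, y, attrs):
    """Simple matching: each attribute scores 1 if the two values are equal, else 0; return the mean."""
    return sum(x[a] == y[a] for a in attrs) / len(attrs)


print(ind_classes(RECORDS, ["a2"]))  # [['r1', 'r3', 'r6'], ['r2'], ['r4'], ['r5']]
print(simple_matching(RECORDS["r1"], RECORDS["r3"], ["a1", "a2"]))  # 0.5
```

With both attributes every record falls into its own class, so IND({a1, a2}) discerns all six records, while simple matching still gives r1 and r3 partial credit for sharing the value B.
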
Any attributes that have the same values as another attribute for all records, i.e., attributes equivalent to one another, are disregarded. For example, in Table 2, … have the same value for each record. Only one of them would then be considered further in the process, but not all three. This is because once one of the attributes is taken into account, the other two do not offer any more information in computing the dissimilarity. In computing similarity, however, it would seem desirable to take into account that multiple attributes are equal. That is, the more values two records have in common, the greater the similarity between the records. Attributes that have the same value for all records are also disregarded. For example, a5 in Table 2 would be disregarded since there is only one value, 1, for all records; the attribute does not offer any information in discerning between any of the records. Attributes that are the negation of another, such as a4 in Table 2, are also disregarded: only one of a4 or a8 in Table 2 would be considered.

Table 1: Middle East Situation example
Table 2: Small example
Table 4: Core values for Table …

Table 1 is modified for measuring dissimilarity. One of the possible resulting modified sets is given in Table 3.

Table 3: Possible modified set from Table 1

A graph is then constructed from the modified table. There is a node for each record and a labeled edge between two nodes if removing an attribute would put the records in the same equivalence class. For example, an edge is between … and … with the label …, since these records would be in the same equivalence class in IND(…). Figure 1 shows the graph for Table 3. The dissimilarity between two records is computed by determining the length of the shortest path between the nodes in the graph corresponding to the records. For example, the dissimilarity between … and … would be 2.

Figure 1: Graph for Table 3

Dissimilarity is the complement of similarity. Because of this relationship, we could modify the above approach to quantify similarity between records. One consideration in modifying this approach is the generalization to multi-valued attributes, for example an attribute with more than two or three values, such as the make of a car: {Ford, GM, Toyota, Nissan, BMW}.

3 Converting quantitative attributes to qualitative
A common approach to measuring similarity between records containing mixed data is to add the measurements of qualitative similarity and of quantitative similarity. Without knowledge of the domain, and specifically of the data set description, finding an appropriate weighting that gives reasonable results would be computationally expensive. Methods to cluster quantitative data have been developed. One possibility for the discovery of similar records in multi-modal data would be to convert the quantitative attributes into one qualitative attribute according to the natural clusters in the quantitative attributes. The modified rough set dissimilarity analysis approach can then be applied.
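
The dissimilarity procedure from Section 2, which this step relies on, can be sketched in Python as follows. This is an illustration under stated assumptions, not Pawlak's own code: the Boolean table below is hypothetical (the Middle East data of Table 1 is not reproduced), attributes are 0/1 valued, and the function names are ours. The sketch drops constant, duplicate, and negated attributes, links two records when they differ on exactly one remaining attribute, and measures dissimilarity as breadth-first shortest-path length, then applies the $(D_{max} - D_{ij})/D_{max}$ normalization used in Section 3.

```python
from collections import deque
from itertools import combinations

# Hypothetical Boolean table (not Pawlak's Table 1): b1 is constant, b3 duplicates b2, b4 negates b2.
TABLE = {
    "x1": {"b1": 1, "b2": 0, "b3": 0, "b4": 1, "b5": 0},
    "x2": {"b1": 1, "b2": 1, "b3": 1, "b4": 0, "b5": 0},
    "x3": {"b1": 1, "b2": 1, "b3": 1, "b4": 0, "b5": 1},
    "x4": {"b1": 1, "b2": 0, "b3": 0, "b4": 1, "b5": 1},
}


def column(table, a):
    """Values of attribute a, in record order."""
    return [row[a] for row in table.values()]


def reduce_attributes(table, attrs):
    """Drop constant attributes and attributes that duplicate or negate an attribute already kept."""
    kept = []
    for a in attrs:
        col = column(table, a)
        if len(set(col)) == 1:
            continue  # one value for every record: discerns nothing
        if any(col == column(table, k) or col == [1 - v for v in column(table, k)] for k in kept):
            continue  # equivalent or negated (0/1) attribute: adds no discerning information
        kept.append(a)
    return kept


def build_graph(table, attrs):
    """Edge r--s labelled a when r and s differ only on a, i.e. dropping a makes them indiscernible."""
    graph = {r: {} for r in table}
    for r, s in combinations(table, 2):
        diff = [a for a in attrs if table[r][a] != table[s][a]]
        if len(diff) == 1:
            graph[r][s] = graph[s][r] = diff[0]
    return graph


def dissimilarity(graph, src, dst):
    """Shortest-path length from src to dst by breadth-first search; None if dst is unreachable."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None


attrs = reduce_attributes(TABLE, ["b1", "b2", "b3", "b4", "b5"])
graph = build_graph(TABLE, attrs)
D = {(r, s): dissimilarity(graph, r, s) for r, s in combinations(TABLE, 2)}
# Similarity (D_max - D_ij) / D_max; this assumes a connected graph (no None values in D).
d_max = max(D.values())
S = {pair: (d_max - d) / d_max for pair, d in D.items()}

print(attrs)            # ['b2', 'b5']
print(D[("x1", "x3")])  # 2
print(S[("x1", "x2")])  # 0.5
```

When the graph is not connected, `dissimilarity` returns None for some pairs and $D_{max}$ is undefined; how to handle that case is the open question the text raises for multi-valued data.
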

Table 5 gives an example mixed data set with one nominal (a1), one ordinal (a2: {A, B, C, D, F}, in order), and one discrete quantitative (a3) attribute. Table 6 is the modified data set, with the quantitative attribute in Table 5 clustered into c1 and c2. The attribute a3 is now the label of the cluster to which the record's value belongs.

Table 5: Example mixed data set
        a1         a2   a3
  r1    Coke       B    4
  r2    Coke       C    2
  r3    Pepsi      B    1
  r4    Pepsi      A    1
  r5    Bud        F    2
  r6    Heineken   B    3

Table 6: Modified data set from Table 5
        a1         a2   a3
  r1    Coke       B    c2
  r2    Coke       C    c1
  r3    Pepsi      B    c1
  r4    Pepsi      A    c1
  r5    Bud        F    c1
  r6    Heineken   B    c2

A modified approach, as in Section 2, may now be applied to determine pair-wise record similarity. Figure 2 shows the graph associated with Table 6. From the graph it can be seen that some modification to handle multi-valued attributes needs to be made: the graph is not connected. Table 7, which provides the similarities, also demonstrates this need. The similarities are computed as $(D_{max} - D_{ij})/D_{max}$, where $D_{max}$ is the maximum dissimilarity over all pairs and $D_{ij}$ is the dissimilarity between $r_i$ and $r_j$. After following the method in a straightforward manner, it is unclear at this point whether the difficulty in normalizing between attributes is handled. Regardless of the method used to cluster or granulize the data, it is difficult to evaluate whether the results are reasonable. That is, if we give a small data set such as the one in our example to a number of students, how would they group the records?
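
The text does not say which method produced the clusters c1 and c2 for a3. One simple choice, assumed here, is one-dimensional 2-means seeded at the smallest and largest value; on the a3 column of Table 5 it happens to reproduce the grouping in Table 6, with values 1 and 2 in c1 and values 3 and 4 in c2. The function name `two_means_1d` and the seeding rule are our own, hypothetical choices.

```python
def two_means_1d(values, max_iter=100):
    """1-D k-means with k = 2, seeded at the smallest and largest value.

    Returns one label per value: 'c1' for the lower cluster, 'c2' for the upper one.
    """
    lo, hi = float(min(values)), float(max(values))
    if lo == hi:
        return ["c1"] * len(values)  # a single distinct value: nothing to split
    labels = []
    for _ in range(max_iter):
        # Assign each value to the nearer centroid (ties go to the lower cluster).
        labels = ["c1" if abs(v - lo) <= abs(v - hi) else "c2" for v in values]
        low = [v for v, lab in zip(values, labels) if lab == "c1"]
        high = [v for v, lab in zip(values, labels) if lab == "c2"]
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return labels


a3 = [4, 2, 1, 1, 2, 3]  # quantitative attribute of Table 5, records r1..r6
print(two_means_1d(a3))  # ['c2', 'c1', 'c1', 'c1', 'c1', 'c2']
```

Any other clustering of the quantitative dimensions could be substituted; the rough set step only needs the resulting cluster labels.
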
It seems that this situation is more suited to having a fuzzy measure associated with a single particular grouping or a clustering of records. The kind of measure, and how to define the measure function, is unclear at this point. In this case, heuristics such as those used in [3], [6] can be used to restrict the search space to those groupings that would be more likely to have a higher measure of certainty.

Figure 2: Graph associated with Table 5
Table 7: Pairwise similarities for Table 5

4 Fusing quantitative and qualitative information
Metrics and methods have been developed to cluster data records that have only quantitative or only qualitative data. It is possible that information can be extracted by the fusion of the methods' results or of the different measures. Metrics are defined on, or can be normalized to, the interval [0, 1]. Quantitative measures lie on the whole continuous interval, while qualitative measures lie on a discrete linear subset of the interval.

4.1 Fusing quantitative and qualitative partitions
Metrics and methods have been developed to cluster records containing only one type of data. The results of these methods and metrics have different meanings; the characteristics that contribute to the similarity measures are different. It is possible that rough sets can be used in the fusion of the results of existing methods for the two sets of dimensions. Let $C_q(X)$ denote a clustering method for the quantitative dimensions of the data set X, and let $C_n(X)$ denote the clustering method for the qualitative dimensions of X. $C_q$ clusters X based only on the quantitative attributes; $C_n$ clusters X based only on the qualitative attributes. Let $C_q(X) = \{q_1, q_2, \ldots, q_k\}$ and $C_n(X) = \{n_1, n_2, \ldots, n_m\}$, where the $q_i$ and the $n_i$ are the clusters that result according to the quantitative and qualitative attributes, respectively. Table 8 shows one possibility for the results of $C_q$ applied to the simple example data set in Table 5. Note that the $q_i$ are arbitrary and are not a result of any specific metric or method. There is one $q_i$ and one $n_i$ for every record.

Let $s_i = q_i \cup n_i$ for a given $r_i$. The set $s_i$ contains all of the records considered similar to the record $r_i$ according to some quantitative and/or some qualitative metric or method. There may be some order of the elements of $q_i$ according to their similarity to $r_i$. That is, given $q_i = \{q_{i(1)}, q_{i(2)}, \ldots, q_{i(k)}\}$, where $q_{i(j)}$ is the j-th record in the set $q_i$, it may be the case that $s(r_i, q_{i(j)}) \ge s(r_i, q_{i(k)}) \ge s(r_i, q_{i(m)})$ for any $j < k < m$. The same may be true for the set $n_i$.

Table 8: Possible $C_q$ for Table 6
Table 9: $s_i$ for Table 8

Table 9 gives the $s_i$ for the example from Table 8. We can infer from the $s_i$ that the similarity between … and … is greater than the similarity between … and …: records … and … have the same membership in a greater number of the $s_i$ (all of the $s_i$) than the pair … and …, which differ in $s_1$. We can also infer that … and … belong together in the overall clustering of the data set, since for each $s_i$, … and … have the same membership.

Thus far we have not addressed a weighting of attributes. For example, if there are 2 qualitative dimensions and 10 quantitative dimensions, it seems reasonable that the $q_i$ would have more weight in determining the overall clusters; the overall clustering would be more like the resulting quantitative clusters. The fact that there may exist an order on each of the sets leads to the idea that rough sets may be used in the development of a fuzzy measure. The measure may be of either a specific group identified as being similar or an overall clustering of mixed data.

4.2 Fusing qualitative and quantitative measures
The sets $C_q$ and $C_n$ provide less information than pair-wise similarity measurements. Suppose we are given the following: $C_n = \{\{x_1\}, \{x_2, x_3\}, \{x_5\}\}$ and $C_q = \{\{x_1, x_2, x_3, x_5\}, \{x_4, x_6\}\}$. Both $\{x_1\}$ and $\{x_5\}$ are clusters that differ from those in $C_q$. Suppose that the qualitative similarity between $\{x_1\}$ and … is maximal, while the qualitative similarities between $\{x_2, x_3\}$ and $\{x_5\}$ are less than maximal. Suppose also that the quantitative similarity between $\{x_1\}$ and … is minimal, while the quantitative similarity between $\{x_5\}$ and … is greater than minimal. We are not able to compare these similarities to determine whether either pair should be kept together.

It may be more useful to consider the pair-wise qualitative and quantitative similarities. One can consider "rough sets" from the perspective of each record. In other words, there are those records which definitely belong in the same cluster as the record (lower approximation), those that definitely do not belong in the same cluster, and those for which it is uncertain whether they belong in the same cluster (boundary). Each of these can be determined from given similarity values. For example, we can say that for any two records, if the similarity measurement is less than some threshold, then they are not in each other's cluster approximation. One can define a similar threshold for those records that definitely belong in the same cluster. What these thresholds should be is subjective, both to a particular domain and to the metric that is used.

Suppose we have the following similarity matrices for the qualitative and quantitative dimensions, respectively: Table 10 and Table 11. The qualitative measure is computed as

$$s_{ij} = \frac{\text{number of matching attribute values}}{\text{number of qualitative attributes}}$$

The quantitative measure is computed as

$$s_{ij} = 1 - \sum_{k \in \text{quantitative}} \frac{|x_{ik} - x_{jk}|}{R_k}$$

where $x_{mk}$ is the k-th attribute value for record m and $R_k$ is the range of attribute k. Table 12 and Table 13 give the approximations with the lower threshold of 1/2 and the upper approximation threshold of 9/10. A 0 denotes that the record is not in the approximation, a 1 denotes that the record is in the lower approximation, and '--' denotes that the record is in the boundary. For example, in both Table 12 and Table 13, … is in the boundary for …. From Table 12 and Table 13 we can see that for the cluster including …, the most likely record in the same cluster would be …, since it is in both approximations. One could use an idea similar to Section 4.1 and use the union of the upper approximations to determine likely clusters, for example {…}, based on the sets for both tables.
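
The union $s_i = q_i \cup n_i$ from Section 4.1, and the comparison of two records' memberships across the $s_i$, can be sketched as follows. This is a hypothetical illustration: `C_Q` reuses the c1/c2 grouping of a3 from Table 6, while `C_N` is an invented qualitative clustering, not the paper's Table 8, and the `agreement` count is only one possible way to make "the same membership in a greater number of the $s_i$" concrete. Both partitions must cover every record.

```python
from itertools import combinations

RECORDS = ["r1", "r2", "r3", "r4", "r5", "r6"]
C_Q = [{"r2", "r3", "r4", "r5"}, {"r1", "r6"}]      # clusters c1, c2 of a3 from Table 6
C_N = [{"r1", "r2"}, {"r3", "r4"}, {"r5"}, {"r6"}]  # hypothetical qualitative clustering


def block_of(partition, r):
    """The cluster (block of the partition) that contains record r."""
    return next(block for block in partition if r in block)


def fuse(c_q, c_n, records):
    """s_i = q_i | n_i (set union): records grouped with r_i by either clustering."""
    return {r: block_of(c_q, r) | block_of(c_n, r) for r in records}


def agreement(s, a, b):
    """Number of sets s_i in which a and b have the same membership (both in or both out)."""
    return sum((a in s_i) == (b in s_i) for s_i in s.values())


s = fuse(C_Q, C_N, RECORDS)
scores = {(a, b): agreement(s, a, b) for a, b in combinations(RECORDS, 2)}
print(sorted(s["r1"]))       # ['r1', 'r2', 'r6']
print(scores[("r3", "r4")])  # 6
print(scores[("r1", "r3")])  # 1
```

Here r3 and r4 have the same membership in all six $s_i$, while r1 and r3 agree in only one, so under these assumptions r3 and r4 would be judged the more similar pair.
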

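The two similarity measures of Section 4.2 and the threshold rule for the approximations can be checked with a short Python sketch; exact fractions avoid rounding at the thresholds. Assumptions: the record ids r1 to r6 and the column layout are ours; the quantitative formula is applied as written (with several quantitative attributes it can drop below 0, and averaging over the attributes, as Gower's coefficient does, would be one possible fix, again our assumption); and a similarity exactly equal to 9/10 is counted in the lower approximation.

```python
from fractions import Fraction

# Table 5 with hypothetical record ids; columns a1 (nominal), a2 (ordinal), a3 (quantitative).
TABLE5 = {
    "r1": ("Coke", "B", 4),
    "r2": ("Coke", "C", 2),
    "r3": ("Pepsi", "B", 1),
    "r4": ("Pepsi", "A", 1),
    "r5": ("Bud", "F", 2),
    "r6": ("Heineken", "B", 3),
}
QUAL, QUANT = (0, 1), (2,)  # column positions of the qualitative and quantitative attributes
RANGE = {k: max(r[k] for r in TABLE5.values()) - min(r[k] for r in TABLE5.values()) for k in QUANT}
T_OUT = Fraction(1, 2)   # below this: outside the approximation
T_IN = Fraction(9, 10)   # at or above this: lower approximation (assumed inclusive)


def qual_sim(x, y):
    """Number of matching qualitative values / number of qualitative attributes."""
    return Fraction(sum(x[k] == y[k] for k in QUAL), len(QUAL))


def quant_sim(x, y):
    """1 - sum over quantitative k of |x_k - y_k| / R_k, as stated in the text."""
    return 1 - sum(Fraction(abs(x[k] - y[k]), RANGE[k]) for k in QUANT)


def approx(sim):
    """1: lower approximation, '--': boundary, 0: outside the approximation."""
    if sim < T_OUT:
        return 0
    return 1 if sim >= T_IN else "--"


def upper(r, sim_fn):
    """Other records in the upper approximation (lower approximation or boundary) of r."""
    return {s for s in TABLE5 if s != r and approx(sim_fn(TABLE5[r], TABLE5[s])) != 0}


both = upper("r3", qual_sim) & upper("r3", quant_sim)
either = upper("r3", qual_sim) | upper("r3", quant_sim)
print(sorted(both))    # ['r4']
print(sorted(either))  # ['r1', 'r2', 'r4', 'r5', 'r6']
```

Under these assumptions, r4 is the only record in the upper approximation of r3 for both measures, and the union of the two upper approximations of r3 is {r1, r2, r4, r5, r6}.

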
Table 10: Qualitative similarities for Table 5
        r1    r2    r3    r4    r5    r6
  r1    1
  r2    1/2   1
  r3    1/2   0     1
  r4    0     0     1/2   1
  r5    0     0     0     0     1
  r6    1/2   0     1/2   0     0     1

Table 11: Quantitative similarities for Table 5
        r1    r2    r3    r4    r5    r6
  r1    1
  r2    1/3   1
  r3    0     2/3   1
  r4    0     2/3   1     1
  r5    1/3   1     2/3   2/3   1
  r6    2/3   2/3   1/3   1/3   2/3   1

The difficulty in comparing different measures is still present, because there is still the problem of which approximation a resulting cluster should be more like. For example, since the thresholds, and therefore the equivalence relations, are based on two different measures, we cannot infer whether a likely result would be {…}{…}{…}{…} or {…}. For this reason, it would seem that a fuzzy measure is needed for the unsupervised discovery of similar records in mixed data.

Table 12: Approximations for qualitative attributes
Table 13: Approximations for quantitative attributes

Summary
This paper discussed two approaches for determining similarity between records of mixed data. From both ideas, it can be seen that the uncertainty and vagueness of qualitative data, and of trying to combine metrics, leave rough set theory as an optional tool to be used. As concluded in the discussion, an additional or different approach is needed for the discovery of similar groups of records within data sets of mixed data.

References
[1] Everitt, B., Cluster Analysis, 3rd ed., Hodder & Stoughton, London, 1993.
[2] Han, J. and Kamber, M., Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, 2001.
[3] He, A., Unsupervised Data Mining by Recursive Partitioning, Master's Thesis, University of Cincinnati, June.
[4] Pawlak, Z., Rough Sets: Theoretical Aspects of Reasoning About Data, Kluwer Academic Publishers, Dordrecht, 1991.
[5] Sneath, P. and Sokal, R., Numerical Taxonomy, W. H. Freeman, San Francisco, 1973.
[6] Zhu, Y., Unsupervised Database Discovery Based on Artificial Intelligence Techniques, Master's Thesis, University of Cincinnati, June.


More information

Discriminative Direction for Kernel Classifiers

Discriminative Direction for Kernel Classifiers Discriminative Direction for Kernel Classifiers Polina Golland Artificial Intelligence Lab Massachusetts Institute of Technology Cambridge, MA 02139 polina@ai.mit.edu Abstract In many scientific and engineering

More information

Bayesian Classification. Bayesian Classification: Why?

Bayesian Classification. Bayesian Classification: Why? Bayesian Classification http://css.engineering.uiowa.edu/~comp/ Bayesian Classification: Why? Probabilistic learning: Computation of explicit probabilities for hypothesis, among the most practical approaches

More information

"Critical Experiment"

Critical Experiment TECHNICAL UNIVERSITY DRESDEN Institute of Power Engeerg Trag Reactor Reactor Trag Course Experiment "Critical Experiment" Instruction for Experiment Critical Experiment Content: 1... Motivation 2... Tasks

More information

Issues in Modeling for Data Mining

Issues in Modeling for Data Mining Issues in Modeling for Data Mining Tsau Young (T.Y.) Lin Department of Mathematics and Computer Science San Jose State University San Jose, CA 95192 tylin@cs.sjsu.edu ABSTRACT Modeling in data mining has

More information

2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51

2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 Star Joins A common structure for data mining of commercial data is the star join. For example, a chain store like Walmart keeps a fact table whose tuples each

More information

UPPER AND LOWER SET FORMULAS: RESTRICTION AND MODIFICATION OF THE DEMPSTER-PAWLAK FORMALISM

UPPER AND LOWER SET FORMULAS: RESTRICTION AND MODIFICATION OF THE DEMPSTER-PAWLAK FORMALISM Int. J. Appl. Math. Comput. Sci., 2002, Vol.12, No.3, 359 369 UPPER AND LOWER SET FORMULAS: RESTRICTION AND MODIFICATION OF THE DEMPSTER-PAWLAK FORMALISM ISMAIL BURHAN TÜRKŞEN Knowledge/Intelligence Systems

More information

Uncertainty Analysis of the Temperature Resistance Relationship of Temperature Sensing Fabric

Uncertainty Analysis of the Temperature Resistance Relationship of Temperature Sensing Fabric fibers Article Uncertaty Analysis Temperature Resistance Relationship Temperature Sensg Fabric Muhammad Dawood Husa 1, Ozgur Atalay 2,3, *, Asli Atalay 3,4 Richard Kennon 5 1 Textile Engeerg Department,

More information

Uncertain Fuzzy Rough Sets. LI Yong-jin 1 2

Uncertain Fuzzy Rough Sets. LI Yong-jin 1 2 Uncertain Fuzzy Rough Sets LI Yong-jin 1 2 (1. The Institute of Logic and Cognition, Zhongshan University, Guangzhou 510275, China; 2. Department of Mathematics, Zhongshan University, Guangzhou 510275,

More information

Banacha Warszawa Poland s:

Banacha Warszawa Poland  s: Chapter 12 Rough Sets and Rough Logic: A KDD Perspective Zdzis law Pawlak 1, Lech Polkowski 2, and Andrzej Skowron 3 1 Institute of Theoretical and Applied Informatics Polish Academy of Sciences Ba ltycka

More information

APPLICATION FOR LOGICAL EXPRESSION PROCESSING

APPLICATION FOR LOGICAL EXPRESSION PROCESSING APPLICATION FOR LOGICAL EXPRESSION PROCESSING Marcin Michalak, Michał Dubiel, Jolanta Urbanek Institute of Informatics, Silesian University of Technology, Gliwice, Poland Marcin.Michalak@polsl.pl ABSTRACT

More information

Achievement of Course Outcomes in Basic Thermodynamics Course based on Students Perception

Achievement of Course Outcomes in Basic Thermodynamics Course based on Students Perception Achievement of Course Outcomes Basic Thermodynamics Course based on Students Perception SITI ROZAIMAH SHEIKH ABDULLAH,, MOHD SHAHBUDDIN MASTAR & HASSIMI ABU HASSAN Centre for Engeerg Education Research,

More information

c(a) = X c(a! Ø) (13.1) c(a! Ø) ˆP(A! Ø A) = c(a)

c(a) = X c(a! Ø) (13.1) c(a! Ø) ˆP(A! Ø A) = c(a) Chapter 13 Statistical Parsg Given a corpus of trees, it is easy to extract a CFG and estimate its parameters. Every tree can be thought of as a CFG derivation, and we just perform relative frequency estimation

More information

Research Article Special Approach to Near Set Theory

Research Article Special Approach to Near Set Theory Mathematical Problems in Engineering Volume 2011, Article ID 168501, 10 pages doi:10.1155/2011/168501 Research Article Special Approach to Near Set Theory M. E. Abd El-Monsef, 1 H. M. Abu-Donia, 2 and

More information

Averaging of the inelastic cross-section measured by the CDF and the E811 experiments.

Averaging of the inelastic cross-section measured by the CDF and the E811 experiments. Averagg of the astic cross-section measured by the CDF and the E8 experiments. S. Klimenko, J. Konigsberg, T. Liss. Introduction In un II the Tevatron lumosity is measured usg the system of Cherenkov Lumosity

More information

CS570 Introduction to Data Mining

CS570 Introduction to Data Mining CS570 Introduction to Data Mining Department of Mathematics and Computer Science Li Xiong Data Exploration and Data Preprocessing Data and Attributes Data exploration Data pre-processing Data cleaning

More information

Virtual Control Policy for Binary Ordered Resources Petri Net Class

Virtual Control Policy for Binary Ordered Resources Petri Net Class sensors Article Virtual Control Policy Bary Ordered Resources Petri Net Class Carlos A. Rovet, Tomás J. Concepción Elia Esr Cano Computer Systems Engeerg Department, Technological University Panama, 0819-07289,

More information

A Working Distance Formula for Night Vision Devices Quality Preliminary Information *

A Working Distance Formula for Night Vision Devices Quality Preliminary Information * BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 6 No 3 Sofia 2006 A Workg Distance Formula for Night Vision Devices Quality Prelimary Information * Daniela Borissova Ivan

More information

A Patent Document Retrieval System Addressing Both Semantic and Syntactic Properties

A Patent Document Retrieval System Addressing Both Semantic and Syntactic Properties A Patent Document Retrieval System Addressg Both Semantic and Syntactic Properties Liang Chen Naoyuki Tokuda Hisahiro Adachi Computer Science Department University of Northern British Columbia Prce George,

More information

TURBULENT VORTEX SHEDDING FROM TRIANGLE CYLINDER USING THE TURBULENT BODY FORCE POTENTIAL MODEL

TURBULENT VORTEX SHEDDING FROM TRIANGLE CYLINDER USING THE TURBULENT BODY FORCE POTENTIAL MODEL Proceedgs of ASME FEDSM ASME 2 Fluids Engeerg Division Summer Meetg June 11-15, 2 Boston, Massachusetts FEDSM2-11172 TURBULENT VORTEX SHEDDING FROM TRIANGLE CYLINDER USING THE TURBULENT BODY FORCE POTENTIAL

More information

2.4 The Smith Chart. Reading Assignment: pp The Smith Chart. The Smith Chart provides: The most important fact about the Smith Chart is:

2.4 The Smith Chart. Reading Assignment: pp The Smith Chart. The Smith Chart provides: The most important fact about the Smith Chart is: 2/7/2005 2_4 The Smith Chart 1/2 2.4 The Smith Chart Readg Assignment: pp. 64-73 The Smith Chart The Smith Chart provides: 1) 2) The most important fact about the Smith Chart is: HO: The Complex Γ plane

More information

Spectral Clustering. Zitao Liu

Spectral Clustering. Zitao Liu Spectral Clustering Zitao Liu Agenda Brief Clustering Review Similarity Graph Graph Laplacian Spectral Clustering Algorithm Graph Cut Point of View Random Walk Point of View Perturbation Theory Point of

More information

The Lefthanded Local Lemma characterizes chordal dependency graphs

The Lefthanded Local Lemma characterizes chordal dependency graphs The Lefthanded Local Lemma characterizes chordal dependency graphs Wesley Pegden March 30, 2012 Abstract Shearer gave a general theorem characterizing the family L of dependency graphs labeled with probabilities

More information

MATHEMATICS OF DATA FUSION

MATHEMATICS OF DATA FUSION MATHEMATICS OF DATA FUSION by I. R. GOODMAN NCCOSC RDTE DTV, San Diego, California, U.S.A. RONALD P. S. MAHLER Lockheed Martin Tactical Defences Systems, Saint Paul, Minnesota, U.S.A. and HUNG T. NGUYEN

More information

Interaction Analysis of Spatial Point Patterns

Interaction Analysis of Spatial Point Patterns Interaction Analysis of Spatial Point Patterns Geog 2C Introduction to Spatial Data Analysis Phaedon C Kyriakidis wwwgeogucsbedu/ phaedon Department of Geography University of California Santa Barbara

More information

Text Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University

Text Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University Text Mining Dr. Yanjun Li Associate Professor Department of Computer and Information Sciences Fordham University Outline Introduction: Data Mining Part One: Text Mining Part Two: Preprocessing Text Data

More information

2.4 Parsing. Computer Science 332. Compiler Construction. Chapter 2: A Simple One-Pass Compiler : Parsing. Top-Down Parsing

2.4 Parsing. Computer Science 332. Compiler Construction. Chapter 2: A Simple One-Pass Compiler : Parsing. Top-Down Parsing Computer Science 332 Compiler Construction Chapter 2: A Simple One-Pass Compiler 2.4-2.5: Parsg 2.4 Parsg Parsg : the process of determg whether a strg S is generated by a grammar G Short answer is yes/no

More information

Inderjit Dhillon The University of Texas at Austin

Inderjit Dhillon The University of Texas at Austin Inderjit Dhillon The University of Texas at Austin ( Universidad Carlos III de Madrid; 15 th June, 2012) (Based on joint work with J. Brickell, S. Sra, J. Tropp) Introduction 2 / 29 Notion of distance

More information

Lecture 12 : Graph Laplacians and Cheeger s Inequality

Lecture 12 : Graph Laplacians and Cheeger s Inequality CPS290: Algorithmic Foundations of Data Science March 7, 2017 Lecture 12 : Graph Laplacians and Cheeger s Inequality Lecturer: Kamesh Munagala Scribe: Kamesh Munagala Graph Laplacian Maybe the most beautiful

More information

Comparison of Rough-set and Interval-set Models for Uncertain Reasoning

Comparison of Rough-set and Interval-set Models for Uncertain Reasoning Yao, Y.Y. and Li, X. Comparison of rough-set and interval-set models for uncertain reasoning Fundamenta Informaticae, Vol. 27, No. 2-3, pp. 289-298, 1996. Comparison of Rough-set and Interval-set Models

More information

Data Mining 4. Cluster Analysis

Data Mining 4. Cluster Analysis Data Mining 4. Cluster Analysis 4.2 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Data Structures Interval-Valued (Numeric) Variables Binary Variables Categorical Variables Ordinal Variables Variables

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Feature Extraction Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi, Payam Siyari Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Agenda Dimensionality Reduction

More information

Lecture notes 1: ECEN 489

Lecture notes 1: ECEN 489 Lecture notes : ECEN 489 Power Management Circuits and Systems Department of Electrical & Computer Engeerg Texas A&M University Jose Silva-Martez January 207 Copyright Texas A&M University. All rights

More information

STATISTICS 407 METHODS OF MULTIVARIATE ANALYSIS TOPICS

STATISTICS 407 METHODS OF MULTIVARIATE ANALYSIS TOPICS STATISTICS 407 METHODS OF MULTIVARIATE ANALYSIS TOPICS Principal Component Analysis (PCA): Reduce the, summarize the sources of variation in the data, transform the data into a new data set where the variables

More information

Decision Tree Analysis for Classification Problems. Entscheidungsunterstützungssysteme SS 18

Decision Tree Analysis for Classification Problems. Entscheidungsunterstützungssysteme SS 18 Decision Tree Analysis for Classification Problems Entscheidungsunterstützungssysteme SS 18 Supervised segmentation An intuitive way of thinking about extracting patterns from data in a supervised manner

More information

Branch-and-Cut for the Split Delivery Vehicle Routing Problem with Time Windows

Branch-and-Cut for the Split Delivery Vehicle Routing Problem with Time Windows Gutenberg School of Management and Economics & Research Unit Interdisciplary Public Policy Discussion Paper Series Branch-and-Cut for the Split Delivery Vehicle Routg Problem with Time Wdows Nicola Bianchessi

More information

Chapter 6: Classification

Chapter 6: Classification Chapter 6: Classification 1) Introduction Classification problem, evaluation of classifiers, prediction 2) Bayesian Classifiers Bayes classifier, naive Bayes classifier, applications 3) Linear discriminant

More information

Type of data Interval-scaled variables: Binary variables: Nominal, ordinal, and ratio variables: Variables of mixed types: Attribute Types Nominal: Pr

Type of data Interval-scaled variables: Binary variables: Nominal, ordinal, and ratio variables: Variables of mixed types: Attribute Types Nominal: Pr Foundation of Data Mining i Topic: Data CMSC 49D/69D CSEE Department, e t, UMBC Some of the slides used in this presentation are prepared by Jiawei Han and Micheline Kamber Data Data types Quality of data

More information

i jand Y U. Let a relation R U U be an

i jand Y U. Let a relation R U U be an Dependency Through xiomatic pproach On Rough Set Theory Nilaratna Kalia Deptt. Of Mathematics and Computer Science Upendra Nath College, Nalagaja PIN: 757073, Mayurbhanj, Orissa India bstract: The idea

More information

A Logical Formulation of the Granular Data Model

A Logical Formulation of the Granular Data Model 2008 IEEE International Conference on Data Mining Workshops A Logical Formulation of the Granular Data Model Tuan-Fang Fan Department of Computer Science and Information Engineering National Penghu University

More information

arxiv: v1 [cs.ai] 7 Sep 2016

arxiv: v1 [cs.ai] 7 Sep 2016 Equilibrium Graphs Pedro Cabalar, Carlos Pérez, and Gilberto Pérez Department of Computer Science University of Corunna, Spa {cabalar,c.pramil,gperez}@udc.es arxiv:1609.02010v1 [cs.ai] 7 Sep 2016 Abstract.

More information

Extended breadth-first search algorithm in practice

Extended breadth-first search algorithm in practice Proceedings of the 9 th International Conference on Applied Informatics Eger, Hungary, January 29 February 1, 2014. Vol. 1. pp. 59 66 doi: 10.14794/ICAI.9.2014.1.59 Extended breadth-first search algorithm

More information

Hierarchical Clustering via Spreading Metrics

Hierarchical Clustering via Spreading Metrics Journal of Machine Learning Research 18 2017) 1-35 Submitted 2/17; Revised 5/17; Published 8/17 Hierarchical Clustering via Spreading Metrics Aurko Roy College of Computing Georgia Institute of Technology

More information

Rough operations on Boolean algebras

Rough operations on Boolean algebras Rough operations on Boolean algebras Guilin Qi and Weiru Liu School of Computer Science, Queen s University Belfast Belfast, BT7 1NN, UK Abstract In this paper, we introduce two pairs of rough operations

More information

High Frequency Rough Set Model based on Database Systems

High Frequency Rough Set Model based on Database Systems High Frequency Rough Set Model based on Database Systems Kartik Vaithyanathan kvaithya@gmail.com T.Y.Lin Department of Computer Science San Jose State University San Jose, CA 94403, USA tylin@cs.sjsu.edu

More information

3) Aft bolted connection analysis: (See Figure 1.0)

3) Aft bolted connection analysis: (See Figure 1.0) Given: Both static and dynamic (fatigue) failure criteria will be used. A mimum factor of safety =2 will be adhered to. For fatigue analysis the ASME elliptic model with Von Mises equivalent stress will

More information

Index. C, system, 8 Cech distance, 549

Index. C, system, 8 Cech distance, 549 Index PF(A), 391 α-lower approximation, 340 α-lower bound, 339 α-reduct, 109 α-upper approximation, 340 α-upper bound, 339 δ-neighborhood consistent, 291 ε-approach nearness, 558 C, 443-2 system, 8 Cech

More information

The three-dimensional matching problem in Kalmanson matrices

The three-dimensional matching problem in Kalmanson matrices DOI 10.1007/s10878-011-9426-y The three-dimensional matching problem in Kalmanson matrices Sergey Polyakovskiy Frits C.R. Spieksma Gerhard J. Woeginger The Author(s) 2011. This article is published with

More information

Clustering Lecture 1: Basics. Jing Gao SUNY Buffalo

Clustering Lecture 1: Basics. Jing Gao SUNY Buffalo Clustering Lecture 1: Basics Jing Gao SUNY Buffalo 1 Outline Basics Motivation, definition, evaluation Methods Partitional Hierarchical Density-based Mixture model Spectral methods Advanced topics Clustering

More information