A Methodology for Direct and Indirect Discrimination Prevention in Data Mining
1 A Methodology for Direct and Indirect Discrimination Prevention in Data Mining
Sara Hajian and Josep Domingo-Ferrer, IEEE Transactions on Knowledge and Data Engineering, 2013
Presented by Polina Rozenshtein
2 Outline
- Problem addressed: direct and indirect discrimination
- Background, definitions and measures
- Approach proposed: discrimination measurement and data transformation
- Algorithms and running time
- Experimental results
3 Problem
Discrimination can be direct or indirect.
- Direct discrimination: decisions are made based on sensitive attributes.
- Indirect discrimination (redlining): decisions are made based on nonsensitive attributes that are strongly correlated with biased sensitive ones.
Discrimination is analyzed through decision rules.
4 Definitions
- Data set: a collection of records.
- Item: an attribute with its value, e.g., Race = Black.
- Item set: a collection of items, e.g., $X$ = {Foreign worker = Yes; City = NYC}.
- Classification rule: $X \to C$, where $C$ is a class item (yes/no), e.g., {Foreign worker = Yes; City = NYC} $\to$ Hire = No.
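5 Definitions
- Support, $\mathrm{supp}(X)$: the fraction of records that contain $X$.
- Confidence, $\mathrm{conf}(X \to C)$: how often $C$ appears in records that contain $X$, that is, $\mathrm{conf}(X \to C) = \dfrac{\mathrm{supp}(X, C)}{\mathrm{supp}(X)}$.
- Frequent classification rule: $\mathrm{supp}(X, C) > s$ and $\mathrm{conf}(X \to C) > c$, for a minimum support $s$ and a minimum confidence $c$.
- Negated item set: if $X$ = {Foreign worker = Yes}, then $\neg X$ = {Foreign worker = No}.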
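The support and confidence definitions above can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the toy records and the helper names `supp` and `conf` are assumptions, and exact fractions are used so the values can be checked by hand.

```python
from fractions import Fraction

# Toy records; attribute names and values are made up for illustration.
records = [
    {"Foreign worker": "Yes", "City": "NYC", "Hire": "No"},
    {"Foreign worker": "Yes", "City": "NYC", "Hire": "No"},
    {"Foreign worker": "Yes", "City": "NYC", "Hire": "Yes"},
    {"Foreign worker": "No", "City": "NYC", "Hire": "Yes"},
    {"Foreign worker": "No", "City": "LA", "Hire": "No"},
]

def supp(itemset, data):
    """Fraction of records that contain every item (attribute = value) of itemset."""
    hits = sum(all(r.get(a) == v for a, v in itemset.items()) for r in data)
    return Fraction(hits, len(data))

def conf(premise, conclusion, data):
    """conf(X -> C) = supp(X, C) / supp(X); returns 0 when X never occurs."""
    s_x = supp(premise, data)
    if s_x == 0:
        return Fraction(0)
    return supp({**premise, **conclusion}, data) / s_x

X = {"Foreign worker": "Yes", "City": "NYC"}
C = {"Hire": "No"}
print(supp(X, records))     # 3/5
print(conf(X, C, records))  # 2/3
```

With a minimum support of, say, 1/5 and a minimum confidence of 1/2 (both hypothetical thresholds), this rule would count as frequent.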
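6 Classification rules
$DI_s$: the predetermined discriminatory items, e.g., $DI_s$ = {Foreign worker = Yes; Race = Black}.
- $X \to C$ is potentially discriminatory (PD) if $X = A, B$, where $A \subseteq DI_s$ is a nonempty discriminatory item set and $B$ is a nondiscriminatory item set. Example: {Foreign worker = Yes; City = NYC} $\to$ Hire = No.
- $X \to C$ is potentially nondiscriminatory (PND) if $X = D, B$, where both $D$ and $B$ are nondiscriminatory item sets. Example: {Zip = 10451; City = NYC} $\to$ Hire = No.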
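A small sketch of the PD/PND split, assuming rule premises are represented as sets of (attribute, value) pairs. The names `DI_S`, `split_premise` and `rule_type` are hypothetical and chosen only for this illustration.

```python
# Predetermined discriminatory items DI_s, written as (attribute, value) pairs.
DI_S = {("Foreign worker", "Yes"), ("Race", "Black")}

def split_premise(premise):
    """Split a rule premise X into A (items in DI_s) and B (all remaining items)."""
    a = {item for item in premise if item in DI_S}
    b = set(premise) - a
    return a, b

def rule_type(premise):
    """Return 'PD' if the premise contains a discriminatory item, else 'PND'."""
    a, _ = split_premise(premise)
    return "PD" if a else "PND"

pd_premise = {("Foreign worker", "Yes"), ("City", "NYC")}
pnd_premise = {("Zip", "10451"), ("City", "NYC")}
print(rule_type(pd_premise))   # PD
print(rule_type(pnd_premise))  # PND
```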
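7 Direct Discrimination Measure
Extended lift (elift): $\mathrm{elift}(A, B \to C) = \dfrac{\mathrm{conf}(A, B \to C)}{\mathrm{conf}(B \to C)}$, with $A \subseteq DI_s$.
- $A, B \to C$ is α-protective if $\mathrm{elift}(A, B \to C) < \alpha$.
- $A, B \to C$ is α-discriminatory if $\mathrm{elift}(A, B \to C) \ge \alpha$.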
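As a hedged illustration of the measure, the sketch below computes elift from record counts. The function names and the counts are assumptions made for this example; only the formula itself comes from the slide.

```python
from fractions import Fraction

def elift(n_abc, n_ab, n_bc, n_b):
    """elift(A,B -> C) = conf(A,B -> C) / conf(B -> C), computed from record counts."""
    conf_abc = Fraction(n_abc, n_ab)
    conf_bc = Fraction(n_bc, n_b)
    if conf_bc == 0:
        raise ValueError("elift is undefined when conf(B -> C) = 0")
    return conf_abc / conf_bc

def is_alpha_discriminatory(value, alpha):
    """A rule is alpha-discriminatory when its elift is at least alpha."""
    return value >= alpha

# Hypothetical counts: 40 records match B and 10 of them have class C;
# 8 records match A and B, and 6 of them have class C.
e = elift(n_abc=6, n_ab=8, n_bc=10, n_b=40)
print(e)                                           # 3
print(is_alpha_discriminatory(e, Fraction(6, 5)))  # True
```

Here the class is three times as likely within the protected group as within the whole context B, so the rule is α-discriminatory for any α up to 3.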
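8 Indirect Discrimination Measure
Theorem: Let $r: D, B \to C$ be a PND rule, with $\gamma = \mathrm{conf}(r: D, B \to C)$ and $\delta = \mathrm{conf}(B \to C) > 0$. Let $A \subseteq DI_s$, and let $\beta_1$, $\beta_2$ be such that $\mathrm{conf}(r_{b1}: A, B \to D) \ge \beta_1$ and $\mathrm{conf}(r_{b2}: D, B \to A) \ge \beta_2 > 0$. Define
$f(x) = \dfrac{\beta_1}{\beta_2}(\beta_2 + x - 1)$,
$\mathrm{elb}(x, y) = f(x)/y$ if $f(x) > 0$, and $\mathrm{elb}(x, y) = 0$ otherwise.
If $\mathrm{elb}(\gamma, \delta) \ge \alpha$, then the PD rule $r': A, B \to C$ is α-discriminatory.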
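The lower bound in the theorem is easy to evaluate numerically. The following is a minimal sketch, assuming the confidences are given as exact fractions; the function name `elb` mirrors the slide, while the input values are invented for illustration.

```python
from fractions import Fraction

def elb(gamma, delta, beta1, beta2):
    """elb(gamma, delta) = f(gamma) / delta if f(gamma) > 0, else 0,
    with f(x) = (beta1 / beta2) * (beta2 + x - 1)."""
    if delta <= 0 or beta2 <= 0:
        raise ValueError("the theorem requires delta > 0 and beta2 > 0")
    f = beta1 / beta2 * (beta2 + gamma - 1)
    return f / delta if f > 0 else Fraction(0)

# Hypothetical confidences: gamma = conf(D,B -> C), delta = conf(B -> C),
# beta1 and beta2 are lower bounds from the background knowledge rules.
bound = elb(Fraction(9, 10), Fraction(1, 4), Fraction(4, 5), Fraction(4, 5))
print(bound)  # 14/5

# When f(gamma) is not positive, the bound is 0 and nothing can be concluded.
print(elb(Fraction(1, 10), Fraction(1, 4), Fraction(4, 5), Fraction(1, 2)))  # 0
```

For α = 6/5, the first bound (14/5) is at least α, so the corresponding PD rule would be α-discriminatory under these assumed values.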
9 Indirect Discrimination or not
- A PND rule $r: D, B \to C$ is a redlining rule if it could yield an α-discriminatory rule $r': A, B \to C$ in combination with the available background knowledge rules $r_{b1}: A, B \to D$ and $r_{b2}: D, B \to A$, where $A \subseteq DI_s$. Example: {Zip = 10451; City = NYC} $\to$ Hire = No.
- A PND rule $r: D, B \to C$ is a nonredlining rule if it cannot yield an α-discriminatory rule $r': A, B \to C$ in combination with the available background knowledge rules $r_{b1}: A, B \to D$ and $r_{b2}: D, B \to A$, where $A \subseteq DI_s$. Example: {Experience = Low; City = NYC} $\to$ Hire = No.
10 The Approach
Discrimination measurement:
- Find the PD and PND rules.
- Direct discrimination: among the PD rules, find the α-discriminatory ones using elift().
- Indirect discrimination: among the PND rules, find the redlining ones using elb() and background knowledge.
Data transformation:
- Alter the data set to remove discriminatory biases.
- Keep the impact on the data and on legitimate rules to a minimum.
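11 Direct rules protection
Given an α-discriminatory rule $r': A, B \to C$ with $A \subseteq DI_s$, that is, $\mathrm{elift}(r': A, B \to C) \ge \alpha$, we wish to obtain $\dfrac{\mathrm{conf}(A, B \to C)}{\mathrm{conf}(B \to C)} < \alpha$.
Method 1: decrease $\mathrm{conf}(A, B \to C) = \dfrac{\mathrm{supp}(A, B, C)}{\mathrm{supp}(A, B)}$ by increasing $\mathrm{supp}(A, B)$.
Transformation: change records supporting $\neg A, B \to \neg C$ into records supporting $A, B \to \neg C$. $\mathrm{supp}(A, B, C)$ remains the same.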
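The effect of Method 1 can be simulated on aggregate counts. This is only a sketch under stated assumptions: the counts are hypothetical, records are moved one at a time, and no attempt is made to choose records by their impact on other rules as the paper's algorithm does.

```python
from fractions import Fraction

# Hypothetical counts of the records matching B, split by A / not A and C / not C.
counts = {("A", "C"): 6, ("A", "notC"): 2, ("notA", "C"): 4, ("notA", "notC"): 28}

def elift(c):
    """elift(A,B -> C) computed from the four counts above."""
    conf_abc = Fraction(c[("A", "C")], c[("A", "C")] + c[("A", "notC")])
    conf_bc = Fraction(c[("A", "C")] + c[("notA", "C")], sum(c.values()))
    return conf_abc / conf_bc

def drp_method1(c, alpha):
    """Move records from (not A, not C) to (A, not C) until elift < alpha."""
    c = dict(c)  # work on a copy
    changed = 0
    while elift(c) >= alpha and c[("notA", "notC")] > 0:
        c[("notA", "notC")] -= 1
        c[("A", "notC")] += 1
        changed += 1
    return c, changed

after, changed = drp_method1(counts, alpha=2)
print(elift(counts))  # 3
print(changed)        # 5
print(elift(after))   # 24/13
```

The count of records with A, B and C stays at 6 throughout, which matches the remark that supp(A, B, C) is unchanged.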
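12 Direct rules protection 2
Given an α-discriminatory rule $r': A, B \to C$, that is, $\mathrm{elift}(r': A, B \to C) \ge \alpha$, we wish to obtain $\dfrac{\mathrm{conf}(A, B \to C)}{\mathrm{conf}(B \to C)} < \alpha$.
Method 2: increase $\mathrm{conf}(B \to C) = \dfrac{\mathrm{supp}(B, C)}{\mathrm{supp}(B)}$ by increasing $\mathrm{supp}(B, C)$.
Transformation: change records supporting $\neg A, B \to \neg C$ into records supporting $\neg A, B \to C$. $\mathrm{supp}(B)$ remains the same.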
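A short worked inequality shows how many class changes Method 2 needs. The symbols $n_B$ (records containing $B$), $n_{BC}$ (records containing $B$ and $C$) and $k$ (records changed) are introduced here for illustration only, and the numbers are hypothetical. Since the changed records contain $\neg A$, $\mathrm{conf}(A, B \to C)$ is not affected.

```latex
\[
\frac{\mathrm{conf}(A, B \to C)}{(n_{BC} + k)/n_B} < \alpha
\iff
k > \frac{n_B \cdot \mathrm{conf}(A, B \to C)}{\alpha} - n_{BC}
\]
% Example with n_B = 40, n_{BC} = 10, conf(A,B -> C) = 3/4 and alpha = 2:
\[
k > \frac{40 \cdot 3/4}{2} - 10 = 5,
\qquad\text{so } k = 6 \text{ gives }
\mathrm{elift} = \frac{3/4}{16/40} = \frac{15}{8} < 2 .
\]
```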
13 Direct rules generalization
PD: {Foreign worker = Yes; City = NYC} $\to$ Hire = No.
PND: {Experience = Low; City = NYC} $\to$ Hire = No.
If $\mathrm{conf}(r: D, B \to C) \ge \mathrm{conf}(r': A, B \to C)$ and $\mathrm{conf}(A, B \to D) = 1$, then the PD rule $r': A, B \to C$ is an instance of the PND rule $r: D, B \to C$.
14 Direct rules generalization
PD: {Foreign worker = Yes; City = NYC} $\to$ Hire = No.
PND: {Experience = Low; City = NYC} $\to$ Hire = No.
If
1) $\mathrm{conf}(r: D, B \to C) \ge p \cdot \mathrm{conf}(r': A, B \to C)$, and
2) $\mathrm{conf}(A, B \to D) \ge p$,
then the PD rule $r': A, B \to C$ is a $p$-instance of the PND rule $r: D, B \to C$.
Goal: change each α-discriminatory rule so that it becomes a $p$-instance of some PND rule $r: D, B \to C$.
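15 Direct rules generalization
Condition 2 is satisfied, but Condition 1 is not:
- Wish: $\mathrm{conf}(r: D, B \to C) \ge p \cdot \mathrm{conf}(r': A, B \to C)$.
- Decrease $\mathrm{conf}(r': A, B \to C)$ while preserving $\mathrm{conf}(A, B \to D)$.
- Transformation: change records supporting $A, B, \neg D \to C$ into records supporting $A, B, \neg D \to \neg C$.
Condition 1 is satisfied, but Condition 2 is not:
- Wish: $\mathrm{conf}(A, B \to D) \ge p$.
- Increasing $\mathrm{conf}(A, B \to D)$ while preserving $\mathrm{conf}(r: D, B \to C) \ge p \cdot \mathrm{conf}(r': A, B \to C)$ is impossible.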
16 Direct rules generalization
- Use generalization when possible, to increase the number of PND rules.
- Use generalization when at least Condition 2 is satisfied.
- After generalization is done, use the methods for direct rule protection.
- Try to perform the minimum transformation.
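The two $p$-instance conditions and the decision between rule generalization and direct rule protection can be sketched as follows. The function names and the returned labels are hypothetical, and the confidences are made-up values; the sketch only encodes the conditions stated on the slides.

```python
from fractions import Fraction

def generalization_conditions(conf_r, conf_r_prime, conf_ab_d, p):
    """Return (Condition 1, Condition 2) of the p-instance definition."""
    cond1 = conf_r >= p * conf_r_prime  # conf(r: D,B -> C) >= p * conf(r': A,B -> C)
    cond2 = conf_ab_d >= p              # conf(A,B -> D) >= p
    return cond1, cond2

def choose_action(conf_r, conf_r_prime, conf_ab_d, p):
    """Pick a treatment for an alpha-discriminatory rule r'."""
    cond1, cond2 = generalization_conditions(conf_r, conf_r_prime, conf_ab_d, p)
    if cond1 and cond2:
        return "already a p-instance"
    if cond2:
        # Only Condition 1 fails: decrease conf(r') by rule generalization.
        return "rule generalization"
    # Condition 2 fails: fall back to direct rule protection.
    return "direct rule protection"

p = Fraction(4, 5)
print(choose_action(Fraction(7, 10), Fraction(3, 4), Fraction(9, 10), p))
# already a p-instance
print(choose_action(Fraction(1, 2), Fraction(3, 4), Fraction(9, 10), p))
# rule generalization
print(choose_action(Fraction(7, 10), Fraction(3, 4), Fraction(1, 2), p))
# direct rule protection
```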
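17 Indirect Rule Protection
The same strategy as for direct rule protection. Given a redlining rule $r: D, B \to C$ with $\mathrm{elb}(\mathrm{conf}(r: D, B \to C), \mathrm{conf}(B \to C)) \ge \alpha$, we wish to obtain
$\dfrac{\mathrm{conf}(r_{b1}: A, B \to D)}{\mathrm{conf}(r_{b2}: D, B \to A)} \cdot \dfrac{\mathrm{conf}(r_{b2}: D, B \to A) + \mathrm{conf}(r: D, B \to C) - 1}{\mathrm{conf}(B \to C)} < \alpha$.
- Method 1: decrease $\mathrm{conf}(A, B \to D)$ by changing records supporting $\neg A, B, \neg D \to \neg C$ into records supporting $A, B, \neg D \to \neg C$.
- Method 2: increase $\mathrm{conf}(B \to C)$ by changing records supporting $\neg A, B, \neg D \to \neg C$ into records supporting $\neg A, B, \neg D \to C$.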
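To make Method 2 concrete, the sketch below counts how many class changes are needed before the bound drops below α. It is an illustration under assumptions: the function name, its parameters and all numbers are hypothetical, and the changed records are taken to contain $\neg A$ and $\neg D$, so only $\mathrm{conf}(B \to C)$ moves.

```python
from fractions import Fraction

def irp_method2_flips(gamma, beta1, beta2, n_bc, n_b, alpha, max_flips):
    """Smallest number k of records changed from (not A, B, not D, not C) to
    (not A, B, not D, C) such that elb(gamma, (n_bc + k) / n_b) < alpha.
    Returns None if max_flips changes are not enough."""
    f = beta1 / beta2 * (beta2 + gamma - 1)
    if f <= 0:
        return 0  # elb is already 0, nothing to do
    for k in range(max_flips + 1):
        delta = Fraction(n_bc + k, n_b)
        if delta > 0 and f / delta < alpha:
            return k
    return None

# Hypothetical setting: 40 records match B, 10 of them have class C,
# and at most 20 records are available for changing.
k = irp_method2_flips(
    gamma=Fraction(9, 10), beta1=Fraction(4, 5), beta2=Fraction(4, 5),
    n_bc=10, n_b=40, alpha=2, max_flips=20,
)
print(k)  # 5
```

With five changes, $\mathrm{conf}(B \to C)$ rises from 1/4 to 3/8 and the bound falls from 14/5 to 28/15.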
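18 Simultaneous direct and indirect discrimination prevention
Direct Rule Protection (DRP):
- Method 1: $\neg A, B \to \neg C \;\Rightarrow\; A, B \to \neg C$
- Method 2: $\neg A, B \to \neg C \;\Rightarrow\; \neg A, B \to C$
Indirect Rule Protection (IRP):
- Method 1: $\neg A, B, \neg D \to \neg C \;\Rightarrow\; A, B, \neg D \to \neg C$
- Method 2: $\neg A, B, \neg D \to \neg C \;\Rightarrow\; \neg A, B, \neg D \to C$
Lemma 1. Method 1 for DRP cannot be used for simultaneous DRP and IRP: Method 1 for DRP might undo the protection provided by Method 1 for IRP.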
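19 Simultaneous direct and indirect discrimination prevention
Direct Rule Protection (DRP):
- Method 1: $\neg A, B \to \neg C \;\Rightarrow\; A, B \to \neg C$
- Method 2: $\neg A, B \to \neg C \;\Rightarrow\; \neg A, B \to C$
Indirect Rule Protection (IRP):
- Method 1: $\neg A, B, \neg D \to \neg C \;\Rightarrow\; A, B, \neg D \to \neg C$
- Method 2: $\neg A, B, \neg D \to \neg C \;\Rightarrow\; \neg A, B, \neg D \to C$
Lemma 2. Method 2 for IRP is beneficial for Method 2 for DRP, and Method 2 for DRP is at worst neutral for Method 2 for IRP, because both methods increase $\mathrm{conf}(B \to C)$.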
20 Simultaneous direct and indirect discrimination prevention
- Transform PD rules into PND rules when possible.
- Run Method 2 for IRP on the PND rules and Method 2 for DRP on the remaining PD rules.
21 Algorithms
- DB: the database
- FR: the frequent rules
- MR: the direct discriminatory rules
- $DI_s$: the discriminatory item set
22 Computational Cost
- $m$: the number of records in DB
- $k$: the number of rules in FR
- $h$: the number of records in the subset $DB_c$
- $n$: the number of discriminatory rules in MR
Cost for each discriminatory rule:
- $O(m)$ to get $DB_c$
- $O(kh)$ to get $\mathrm{impact}(db_c)$ for all $db_c \in DB_c$
- $O(h \log h)$ for sorting
- $O(dm)$ for modification
Total: $O(n(m + kh + h \log h + dm))$
23 Experiments
Data sets: the German credit data set and the Adult data set.
- Direct discrimination prevention degree (DDPD): the percentage of α-discriminatory rules that are no longer α-discriminatory after the transformation.
- Direct discrimination protection preservation (DDPP): the percentage of α-protective rules that remain α-protective after the transformation.
- IDPD and IDPP: the same measures for redlining rules.
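Both measures reduce to set arithmetic over rule identifiers. The sketch below is a minimal reading of the definitions on this slide, with hypothetical function names and made-up rule labels; it does not reproduce the paper's experiments.

```python
def ddpd(disc_before, disc_after):
    """Percentage of rules that were alpha-discriminatory before the
    transformation and are no longer alpha-discriminatory after it."""
    return 100 * len(disc_before - disc_after) / len(disc_before)

def ddpp(prot_before, prot_after):
    """Percentage of rules that were alpha-protective before the
    transformation and remain alpha-protective after it."""
    return 100 * len(prot_before & prot_after) / len(prot_before)

# Made-up rule labels for illustration.
disc_before = {"r1", "r2", "r3", "r4"}
disc_after = {"r4"}
prot_before = {"p1", "p2", "p3", "p4", "p5"}
prot_after = {"p1", "p2", "p3", "p4", "r1", "r2", "r3"}

print(ddpd(disc_before, disc_after))  # 75.0
print(ddpp(prot_before, prot_after))  # 80.0
```

The same two functions would give IDPD and IDPP if the sets held redlining and nonredlining rules instead.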
24 German credit data set
- Minimum support 5%, minimum confidence 10%.
- Mined: frequent classification rules and background knowledge rules.
- 37 redlining rules, 42 indirect and 991 direct discriminations.
25 Information loss
- Misses cost (MC): the percentage of rules lost by the transformation.
- Ghost cost (GC): the percentage of rules introduced by the transformation.
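A minimal sketch of the two information loss measures, assuming MC is taken relative to the rules mined from the original data and GC relative to the rules mined from the transformed data. The function names and rule labels are hypothetical.

```python
def misses_cost(rules_original, rules_transformed):
    """Percentage of rules mined from the original data that can no longer
    be mined from the transformed data."""
    lost = rules_original - rules_transformed
    return 100 * len(lost) / len(rules_original)

def ghost_cost(rules_original, rules_transformed):
    """Percentage of rules mined from the transformed data that could not
    be mined from the original data."""
    introduced = rules_transformed - rules_original
    return 100 * len(introduced) / len(rules_transformed)

# Made-up rule labels for illustration.
original = {"a", "b", "c", "d"}
transformed = {"a", "b", "c", "e", "f"}

print(misses_cost(original, transformed))  # 25.0
print(ghost_cost(original, transformed))   # 40.0
```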
26 Conclusions
- Considers frequent classification rule mining.
- Defines direct and indirect discrimination.
- Proposes measures of discrimination.
- Proposes methods to modify the data set to avoid discrimination.
- Reports meaningful qualitative results.