Data Mining. Chapter 1. What's it all about?

DM & ML
- Ubiquitous computing environment: an excessive amount of data (data flooding)
- A gap between the generation of data and our understanding of it
- Looking for structural patterns in data, i.e., intelligently analyzed data
- Data mining: the process of discovering patterns in data
- Patterns are useful for making predictions on new data
- Structural patterns capture the decision structure explicitly

DM & ML
Structural patterns
- Table 1.1: the contact lens data; the recommendation is soft, hard, or none
- Example rule: if tear production rate = reduced, then recommendation = none
- Rules can generalize to combinations of attribute values that are missing from the table
- Table 1.1 has no missing (null) values, unlike real-life data sets
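To see how a single rule covers many rows at once, the sketch below enumerates every combination of attribute values and applies the tear-production rule. Table 1.1 is not reproduced here, so the attribute names and value sets (age, spectacle prescription, astigmatism, tear production rate) are assumptions based on the standard contact lens data.

```python
from itertools import product

# Attribute value sets ASSUMED from the standard contact lens data
# (Table 1.1 itself is not reproduced in these slides).
ATTRIBUTES = {
    "age": ["young", "pre-presbyopic", "presbyopic"],
    "spectacle_prescription": ["myope", "hypermetrope"],
    "astigmatism": ["no", "yes"],
    "tear_production_rate": ["reduced", "normal"],
}


def tear_rule(instance):
    """If tear production rate = reduced, then recommendation = none."""
    if instance["tear_production_rate"] == "reduced":
        return "none"
    return None  # the rule does not fire; other rules would have to decide


def all_instances():
    """Yield every combination of attribute values as a dict."""
    names = list(ATTRIBUTES)
    for values in product(*(ATTRIBUTES[n] for n in names)):
        yield dict(zip(names, values))


instances = list(all_instances())
covered = [inst for inst in instances if tear_rule(inst) is not None]
print(len(instances), len(covered))  # 24 12
```

One short rule settles half of the 24 combinations without listing them individually, which is what it means for a rule to generalize beyond individual rows.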

Simple examples
- Attributes (features) measure different aspects of an instance
- The weather problem (Table 1.2): four attributes and an outcome (play)
- Possible combinations of attribute values: 3 x 3 x 2 x 2 = 36
- A rule learned from the data, e.g.: if outlook = overcast then play = yes
- Decision list: a set of rules interpreted in sequence
- Table 1.3: the same data with some numeric attribute values
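The count of 36 and the idea of a decision list can both be checked by brute force. The sketch enumerates all 3 x 3 x 2 x 2 combinations of the weather attributes and classifies each with a decision list whose rules are tried in order. The particular rules below are an illustrative decision list for the weather data, assumed here since the slide does not list one.

```python
from collections import Counter
from itertools import product

# Nominal attribute values of the weather problem (Table 1.2).
OUTLOOK = ["sunny", "overcast", "rainy"]
TEMPERATURE = ["hot", "mild", "cool"]
HUMIDITY = ["high", "normal"]
WINDY = [False, True]


def decision_list(outlook, temperature, humidity, windy):
    """Illustrative (assumed) decision list: the first rule that fires decides.

    temperature is accepted for completeness but no rule tests it.
    """
    if outlook == "sunny" and humidity == "high":
        return "no"
    if outlook == "rainy" and windy:
        return "no"
    if outlook == "overcast":
        return "yes"
    if humidity == "normal":
        return "yes"
    return "yes"  # default: none of the rules above fired


combos = list(product(OUTLOOK, TEMPERATURE, HUMIDITY, WINDY))
print(len(combos))  # 36
predictions = Counter(decision_list(*c) for c in combos)
print(predictions["yes"], predictions["no"])  # 24 12
```

Because the rules are interpreted in sequence, order matters: moving the humidity = normal rule above the rainy-and-windy rule would change the prediction for rainy, windy days with normal humidity from no to yes.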

Simple examples
Table 1.2 The weather data

Outlook   Temperature  Humidity  Windy  Play
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         High      False  Yes
Rainy     Cool         Normal    False  Yes
Rainy     Cool         Normal    True   No
Overcast  Cool         Normal    True   Yes
Sunny     Mild         High      False  No
Sunny     Cool         Normal    False  Yes
Rainy     Mild         Normal    False  Yes
Sunny     Mild         Normal    True   Yes
Overcast  Mild         High      True   Yes
Overcast  Hot          Normal    False  Yes
Rainy     Mild         High      True   No
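Encoding Table 1.2 directly makes it easy to check how a single rule performs on the data. The sketch counts the classes and measures how many rows the rule "if outlook = overcast then play = yes" covers and how many of those it classifies correctly. The tuple layout and lowercase value names are choices made for this example.

```python
from collections import Counter

# Table 1.2, one tuple per row: (outlook, temperature, humidity, windy, play)
WEATHER = [
    ("sunny", "hot", "high", False, "no"),
    ("sunny", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("rainy", "mild", "high", False, "yes"),
    ("rainy", "cool", "normal", False, "yes"),
    ("rainy", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),
    ("sunny", "mild", "high", False, "no"),
    ("sunny", "cool", "normal", False, "yes"),
    ("rainy", "mild", "normal", False, "yes"),
    ("sunny", "mild", "normal", True, "yes"),
    ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),
    ("rainy", "mild", "high", True, "no"),
]

class_counts = Counter(row[-1] for row in WEATHER)

# Rule: if outlook = overcast then play = yes.
covered = [row for row in WEATHER if row[0] == "overcast"]
correct = [row for row in covered if row[-1] == "yes"]

print(class_counts["yes"], class_counts["no"])  # 9 5
print(len(covered), len(correct))  # 4 4
```

The rule fires on 4 of the 14 days and is right every time, but it says nothing about the other 10, which is why a complete classifier needs further rules.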

Simple examples
Table 1.3 The weather data with some numeric attributes

Outlook   Temperature  Humidity  Windy  Play
Sunny     85           85        False  No
Sunny     80           90        True   No
Overcast  83           86        False  Yes
Rainy     70           96        False  Yes
Rainy     68           80        False  Yes
Rainy     65           70        True   No
Overcast  64           65        True   Yes
Sunny     72           95        False  No
Sunny     69           70        False  Yes
Rainy     75           80        False  Yes
Sunny     75           70        True   Yes
Overcast  72           90        True   Yes
Overcast  81           75        False  Yes
Rainy     71           91        True   No
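With numeric attributes, rule conditions become threshold tests instead of equality tests. The sketch checks one such rule, "if outlook = sunny and humidity > 83 then play = no", and treats every other sunny day as play = yes. The threshold 83 is an illustrative value chosen here, not one stated on the slide.

```python
# Table 1.3, one tuple per row: (outlook, temperature, humidity, windy, play)
WEATHER_NUMERIC = [
    ("sunny", 85, 85, False, "no"),
    ("sunny", 80, 90, True, "no"),
    ("overcast", 83, 86, False, "yes"),
    ("rainy", 70, 96, False, "yes"),
    ("rainy", 68, 80, False, "yes"),
    ("rainy", 65, 70, True, "no"),
    ("overcast", 64, 65, True, "yes"),
    ("sunny", 72, 95, False, "no"),
    ("sunny", 69, 70, False, "yes"),
    ("rainy", 75, 80, False, "yes"),
    ("sunny", 75, 70, True, "yes"),
    ("overcast", 72, 90, True, "yes"),
    ("overcast", 81, 75, False, "yes"),
    ("rainy", 71, 91, True, "no"),
]

HUMIDITY_THRESHOLD = 83  # illustrative split point (assumption)


def predict_sunny(humidity):
    """For sunny days: humidity above the threshold -> no, otherwise yes."""
    return "no" if humidity > HUMIDITY_THRESHOLD else "yes"


sunny_rows = [row for row in WEATHER_NUMERIC if row[0] == "sunny"]
hits = sum(predict_sunny(row[2]) == row[-1] for row in sunny_rows)
print(f"{hits}/{len(sunny_rows)} sunny rows classified correctly")  # 5/5 sunny rows classified correctly
```

Any threshold from 70 up to (but not including) 85 separates these five rows equally well, so a learner handling numeric attributes has to pick one split point from a range of equally good candidates.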

Simple examples
Classification rules vs. association rules
- Association rules capture strong associations between different attribute values, e.g.:
  IF humidity = normal and windy = false THEN play = yes
  IF outlook = sunny and play = no THEN humidity = high
- Unlike classification rules, association rules can predict any of the attributes, not just the class
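Association rules are usually judged by how many instances they apply to and how often they hold. The sketch computes support (counted here as a number of instances; some texts use a fraction instead) and confidence for the two example rules on Table 1.2. Because either side of a rule may test any attribute, including play, each row is stored as a dict; the helper name `support_confidence` is hypothetical.

```python
COLUMNS = ("outlook", "temperature", "humidity", "windy", "play")
# Table 1.2; windy is kept as a nominal string so every attribute is treated alike.
ROWS = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]
DATA = [dict(zip(COLUMNS, row)) for row in ROWS]


def support_confidence(antecedent, consequent, data):
    """Return (support, confidence) for the rule antecedent -> consequent.

    support    = number of instances matching both sides
    confidence = support / number of instances matching the antecedent
    """
    lhs = [d for d in data if all(d[k] == v for k, v in antecedent.items())]
    if not lhs:
        return 0, 0.0
    both = [d for d in lhs if all(d[k] == v for k, v in consequent.items())]
    return len(both), len(both) / len(lhs)


print(support_confidence({"humidity": "normal", "windy": "false"}, {"play": "yes"}, DATA))  # (4, 1.0)
print(support_confidence({"outlook": "sunny", "play": "no"}, {"humidity": "high"}, DATA))  # (3, 1.0)
```

Both example rules hold on every instance they apply to; the second one uses the class attribute play as a condition, which a classification rule could not do.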

Simple examples
More examples
- Contact lenses, weather problem
- Irises: a classic numeric dataset; the attributes are numeric and the outcome is a category
- Computer configurations: the outcome is CPU performance, a numeric value, so the task is regression (numeric prediction)
- Labor negotiations (a realistic dataset): the outcome is whether the contract is acceptable to both labor and management; "?" marks unknown or missing values

Simple examples
Soybean classification
- 35 attributes, 19 disease categories
- Domain knowledge, e.g.: IF leaf condition = normal and leaf malformation = absent and ... THEN diagnosis = rhizoctonia-root-rot
- The computer-generated rules outperformed the expert-derived rules (97.5% vs. 72%)

Fielded applications
- Web mining: how to rank web pages
- Decisions involving judgment: whether to lend you money
  - 1,000 training examples of borderline cases
  - 20 attributes: age, years with current employer, ...
  - Solution: reject all borderline cases? No! Borderline applicants are the most active customers
  - The learned rules were correct on 70% of cases, versus only 50% for human experts
  - The rules improved the success rate of loan decisions and made it possible to explain the reasons behind each decision

Fielded applications
Screening images: detecting oil slicks
- Oil slicks appear as dark regions of changing size and shape, and there are few training examples
- Screening is an expensive process requiring highly trained personnel
- Input: a set of raw pixel images from a radar satellite
- Output: the images that contain putative oil slicks
- Attributes: size of region, shape, area, intensity, ...

Fielded applications
Load forecasting
- An automated load forecasting assistant that determines future demand for power, built for a utility supplier in the electricity industry
- Given: a manually constructed load model that assumes normal climatic conditions
- Problem: adjusting the forecast for actual weather conditions
- Attributes: temperature, humidity, wind speed, ...
- Data: 15 years of collected records
- Far quicker (seconds) than trained human forecasters (hours)

Fielded applications
Diagnosis
- One of the principal application areas of expert systems
- Preventative maintenance of electromechanical devices (e.g., 600 faults)
- The learned rules were slightly superior to the handcrafted ones
- The system was put into use because the domain expert approved of the rules

Fielded applications
Marketing and sales
- Customer loyalty: identifying customers who are likely to defect by detecting changes in their behavior (e.g., banks, phone companies)
- Special offers: identifying profitable customers (e.g., reliable credit card holders who need extra money during the holiday season)
- Market basket analysis: finding groups of items that tend to occur together in transactions, using supermarket checkout data
- Other areas: manufacturing, customer support and service, scientific applications, monitoring, etc.
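The core of market basket analysis is counting which items occur together. The sketch below does this by brute force for item pairs only; the transactions are made up for illustration, and `frequent_pairs` is a hypothetical helper. Practical systems use algorithms such as Apriori to scale to larger itemsets and real checkout volumes, which this simple count does not attempt.

```python
from collections import Counter
from itertools import combinations

# Hypothetical checkout transactions, invented for illustration only.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"beer", "chips", "milk"},
]


def frequent_pairs(transactions, min_count):
    """Count each unordered item pair once per transaction; keep pairs seen at least min_count times."""
    counts = Counter()
    for basket in transactions:
        # sorted() gives every pair a canonical order, so (a, b) and (b, a) are counted together
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


print(frequent_pairs(TRANSACTIONS, min_count=2))
# {('bread', 'milk'): 2, ('bread', 'butter'): 2, ('beer', 'chips'): 2}
```

Raising `min_count` prunes the result to the most strongly co-occurring groups; here no pair appears in three or more baskets.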

The data mining process (diagram not reproduced)

http://cis.catholic.ac.kr/sunoh