Data Mining. Preamble: Control Application. Practitioner's Approach. Industrial Researcher's Approach. Goal: Maintain T ~ Td

Data Mining
Andrew Kusiak
2139 Seamans Center, Iowa City, Iowa 52242-1527
Tel: 319-335-5934, Fax: 319-335-5669
andrew-kusiak@uiowa.edu
http://www.icaen.uiowa.edu/~ankusiak
Partially based on material provided by J. Han and M. Kamber

Preamble: Control Application (HVAC)
Goal: Maintain T ~ Td

Practitioner's Approach
Adjust one variable at a time: the flow rate f, the outside air inflow, and only rarely the air temperature t.

Industrial Researcher's Approach
Select the desired value of the fan speed. For an on/off outside air inlet position and a fixed t, evaluate the impact of various values of f on T, and select the most suitable flow rate.

Classical Science (Physical) Approach
Approximate the load model as L ~ F(f, Δt). Model the demand D. Match the load L with the demand D. Optimize the model to determine the flow value f.
Note: Another equation is needed to obtain the fan speed or a control set point.

Design of Experiments Approach
For a given level of outside air, a constant temperature t, and the flow rate f set at different levels, monitor the actual temperature T and select the flow rate f resulting in T ~ Td.
Note: The fan speed needs to be selected separately.

Data Mining Approach
Collect data on the fan speed, f, and t, on different values of outside air flow, and on the corresponding T. Develop a model T = F(fan speed, f, t, outside air flow) and optimize the extracted model.
Note: Set points or parameter values can be derived explicitly as needed. The data mining model can act as the process controller, and the use of actual data lets the DM-based controller reflect the process as it actually behaves.
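The sketch below is one way to realize this data mining approach using only numpy: fit a model T = F(fan speed, f, t, outside air flow) by least squares, then search over flow-rate values for the one that brings the predicted T closest to Td. The synthetic data, the linear model form, the fixed set points, and the grid search are illustrative assumptions; the slides do not prescribe a particular model or optimizer.

```python
# Sketch of the data mining approach to the HVAC control example (assumed setup).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical historical records: fan speed, flow rate f, supply temp t, outside air flow.
n = 500
fan_speed = rng.uniform(0.2, 1.0, n)
flow_rate = rng.uniform(0.1, 2.0, n)
supply_t = rng.uniform(12.0, 18.0, n)
outside_air = rng.uniform(0.0, 0.5, n)
# Observed room temperature T (unknown true process plus noise).
T = 30 - 4.0 * flow_rate - 3.0 * fan_speed + 0.4 * supply_t + 2.0 * outside_air \
    + rng.normal(0, 0.2, n)

# Step 1: extract a model T = F(fan speed, f, t, outside air flow) by least squares.
X = np.column_stack([np.ones(n), fan_speed, flow_rate, supply_t, outside_air])
coef, *_ = np.linalg.lstsq(X, T, rcond=None)

# Step 2: optimize the extracted model: pick the flow rate whose predicted T is closest to Td.
Td = 21.0
candidate_f = np.linspace(0.1, 2.0, 200)
pred_T = coef @ np.array([np.ones_like(candidate_f),
                          np.full_like(candidate_f, 0.6),   # fan speed set point (assumed)
                          candidate_f,
                          np.full_like(candidate_f, 15.0),  # supply temperature t (assumed)
                          np.full_like(candidate_f, 0.2)])  # outside air flow (assumed)
best_f = candidate_f[np.argmin(np.abs(pred_T - Td))]
print(f"selected flow rate f = {best_f:.2f}")
```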

Extensions: DM Approach
Assume an analog control loop. An additional performance metric, CO2, can easily be considered. Classical approaches cannot be easily extended to handle the new problem.

Classification and Prediction (Outline)
Learning; classification and prediction; classification by decision tree induction; classification by backpropagation; other classification methods; prediction; classification accuracy.
http://www.kdnuggets.com/

Learning
What is learning? The extraction of knowledge and the creation of patterns. The basis of learning is a training data set.

Classification vs. Prediction
Classification: predicts categorical class labels; classifies data (constructs a model) based on the training set and the values (class labels) of a classifying attribute, and uses the model to classify cases with unknown outcomes.
Prediction: models continuous-valued functions, i.e., predicts unknown or missing values.
http://www.twocrows.com/glossary.htm

Classification and Decision Making
Step 1: Model construction (e.g., knowledge extraction). Step 2: Model usage (e.g., decision making).

Classification Process: Model Construction
The training data and a learning algorithm produce a classifier (model).

Training Data:
NAME     RANK            YEARS  TENURED
Mike     Assistant Prof  3      no
Mary     Assistant Prof  7      yes
Bill     Professor       2      yes
Jim      Associate Prof  7      yes
Dave     Assistant Prof  6      no
Anne     Associate Prof  3      no

Classifier (Model): IF rank = professor OR years > 6 THEN tenured = yes

Classification Process: Use the Model in Prediction
The classifier (model), e.g., a set of decision rules, is applied to the testing data and then to unseen data.

Testing Data:
NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       6      yes
Joseph   Assistant Prof  7      yes

Unseen Data: (Professor, 4). Tenured? The model answers yes, since rank = professor.

Training Data Set
No.  F1    F2      F3    F4      D
1    1.02  Red     2.98  High    Good
2    2.03  Black   1.04  Low     Bad
3    0.99  Blue    3.04  High    Good
4    2.03  Blue    3.11  High    Good
5    0.03  Orange  0.96  Low     Bad
6    0.04  Blue    1.04  Medium  Bad
7    0.99  Orange  1.04  Medium  Good
8    1.02  Red     0.94  Low     Bad
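A minimal sketch of the two-step process on the tenure example, assuming scikit-learn is available. Encoding RANK as an integer, ignoring NAME, and using a decision tree as the learning algorithm are illustration choices rather than something the slides prescribe.

```python
# Step 1: construct a model from the training data; Step 2: use it on test and unseen data.
from sklearn.tree import DecisionTreeClassifier

rank_code = {"Assistant Prof": 0, "Associate Prof": 1, "Professor": 2}

# Step 1: model construction (NAME is ignored).
train = [("Assistant Prof", 3, "no"), ("Assistant Prof", 7, "yes"),
         ("Professor", 2, "yes"), ("Associate Prof", 7, "yes"),
         ("Assistant Prof", 6, "no"), ("Associate Prof", 3, "no")]
X_train = [[rank_code[rank], years] for rank, years, _ in train]
y_train = [row[2] for row in train]
model = DecisionTreeClassifier().fit(X_train, y_train)

# Step 2: model usage: check against the testing data, then classify unseen data.
test = [("Assistant Prof", 2, "no"), ("Associate Prof", 7, "no"),
        ("Professor", 6, "yes"), ("Assistant Prof", 7, "yes")]
X_test = [[rank_code[rank], years] for rank, years, _ in test]
y_test = [row[2] for row in test]
print("test accuracy:", model.score(X_test, y_test))
print("(Professor, 4) ->", model.predict([[rank_code["Professor"], 4]])[0])
```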

Extracted Rules
From the training data set above, the following rules are extracted (the numbers in brackets list the objects covered by each rule):

Rule 1. IF (F4 = High) THEN (D = Good); [1, 3, 4]
Rule 2. IF (F4 = Medium) AND (F2 = Blue) THEN (D = Bad); [6]
Rule 3. IF (F4 = Medium) AND (F2 = Orange) THEN (D = Good); [7]
Rule 4. IF (F4 = Low) THEN (D = Bad); [2, 5, 8]

Patterns
Each object in the training data set is covered by exactly one rule:

No.  F1    F2      F3    F4      D     Rule
1    1.02  Red     2.98  High    Good  1
2    2.03  Black   1.04  Low     Bad   4
3    0.99  Blue    3.04  High    Good  1
4    2.03  Blue    3.11  High    Good  1
5    0.03  Orange  0.96  Low     Bad   4
6    0.04  Blue    1.04  Medium  Bad   2
7    0.99  Orange  1.04  Medium  Good  3
8    1.02  Red     0.94  Low     Bad   4

Decision Making
A new object No. 9 with F2 = Blue and F4 = Medium (F1 and F3 unknown) matches Rule 2, so it is classified as D = Bad.

Supervised vs. Unsupervised Learning
Supervised learning (classification): Supervision means that the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations; new data are classified based on the training set.
Unsupervised learning (clustering): The class labels of the training data are unknown; given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data.
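The extracted rule set above can be applied directly. A minimal sketch, using an illustrative dict-based record format, that encodes the four rules and classifies the new object No. 9:

```python
# Apply the four extracted rules to a new object given as a dict of feature values.
def classify(obj):
    """Return the decision D for an object, or None if no rule covers it."""
    if obj.get("F4") == "High":                                   # Rule 1
        return "Good"
    if obj.get("F4") == "Medium" and obj.get("F2") == "Blue":     # Rule 2
        return "Bad"
    if obj.get("F4") == "Medium" and obj.get("F2") == "Orange":   # Rule 3
        return "Good"
    if obj.get("F4") == "Low":                                    # Rule 4
        return "Bad"
    return None

print(classify({"F2": "Blue", "F4": "Medium"}))  # -> Bad (Rule 2)
```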

Classification by Decision Tree Induction
A decision tree is a flow-chart-like tree structure: an internal node denotes a test on an attribute, a branch represents an outcome of the test, and leaf nodes represent class labels or class distributions.
Decision tree generation consists of two phases:
Tree construction: at the start, all the training examples are at the root; the examples are then partitioned recursively based on selected attributes.
Tree pruning: identify and remove branches that reflect noise or outliers.
Use of a decision tree: to classify an unknown sample, test the attribute values of the sample against the decision tree.

Quinlan's Data Set (Decision Tree Algorithm)

age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31..40  high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31..40  low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31..40  medium  no       excellent      yes
31..40  high    yes      fair           yes
>40     medium  no       excellent      no

http://www.megaputer.com/products/pa/algorithms/dt.php3

Output: A Decision Tree for buys_computer
age? <=30: student? (no: no, yes: yes); 31..40: yes; >40: credit rating? (excellent: no, fair: yes).

Information Gain (C4.5)
Let a set S of examples (rows) contain p examples of class P and n examples of class N. The expected information needed to classify an example in S is

I(p, n) = -(p / (p + n)) log2(p / (p + n)) - (n / (p + n)) log2(n / (p + n))
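A minimal sketch of the information measure I(p, n) defined above; it reproduces I(9, 5) = 0.940 for Quinlan's data set, which has p = 9 "yes" and n = 5 "no" examples of buys_computer.

```python
# Expected information (entropy) of a set with p positive and n negative examples.
from math import log2

def info(p, n):
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:  # the 0 * log2(0) term is taken as 0
            frac = count / total
            result -= frac * log2(frac)
    return result

print(round(info(9, 5), 3))  # 0.94
```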

Information Gain in Decision Tree Induction
Assume that using attribute A a set S is partitioned into sets {S1, S2, ..., Sv}, i.e., attribute A has v values. If Si contains pi examples of P and ni examples of N, the entropy, or the expected information needed to classify objects in all subtrees Si, is

E(A) = sum over i = 1..v of ((pi + ni) / (p + n)) * I(pi, ni)

The encoding information that would be gained by branching on A is

Gain(A) = I(p, n) - E(A)

Continuous attribute values are handled by choosing a split value.

Attribute Selection by Information Gain Computation
Class P: buys_computer = yes. Class N: buys_computer = no. Information: I(p, n) = I(9, 5) = 0.940.
Compute the entropy for age:

age     pi  ni  I(pi, ni)
<=30    2   3   0.971
31..40  4   0   0
>40     3   2   0.971

E(age) = (5/14) I(2, 3) + (4/14) I(4, 0) + (5/14) I(3, 2) = 0.694
Gain(age) = I(9, 5) - E(age) = 0.246
Also: Gain(income) = 0.029, Gain(student) = 0.151, Gain(credit_rating) = 0.048.

Decision Tree Revisited
age? <=30: student? (no: no, yes: yes); 31..40: yes; >40: credit rating? (excellent: no, fair: yes). The class label (yes/no) appears at each terminal node.
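A minimal sketch of attribute selection by information gain on Quinlan's data set; it recomputes the gains quoted above (matching them up to rounding in the last digit). The data encoding and helper names are illustrative.

```python
# Compute Gain(A) = I(p, n) - E(A) for each attribute of the buys_computer data set.
from collections import Counter, defaultdict
from math import log2

data = [  # (age, income, student, credit_rating, buys_computer)
    ("<=30", "high", "no", "fair", "no"),          ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),        (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),           (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),   ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),          (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),  ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),       (">40", "medium", "no", "excellent", "no"),
]
attrs = ["age", "income", "student", "credit_rating"]

def info(counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

def gain(col):
    overall = info(list(Counter(row[-1] for row in data).values()))   # I(9, 5)
    subsets = defaultdict(list)
    for row in data:
        subsets[row[col]].append(row[-1])
    expected = sum(len(labels) / len(data) * info(list(Counter(labels).values()))
                   for labels in subsets.values())                    # E(A)
    return overall - expected

for i, name in enumerate(attrs):
    print(f"Gain({name}) = {gain(i):.3f}")
```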

Neural Networks
Advantages: prediction accuracy is generally high; robust, works when training examples contain errors; the output may be discrete, real-valued, or a vector of several discrete or real-valued attributes; fast evaluation of the learned target function.
Disadvantages: long training time; the learned function (the weights) is difficult to understand; domain knowledge is not easy to incorporate.
http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html

A Neuron
The n-dimensional input vector x = (x0, x1, ..., xn) with weight vector w = (w0, w1, ..., wn) is mapped into the output y by means of the scalar product, a threshold μk, and a nonlinear activation function f:

y = f(sum over i of wi xi - μk)

Neural Network Training
The ultimate objective of training is to obtain a set of weights that makes almost all the tuples in the training data classified correctly. Steps:
Initialize the weights with random values.
Feed the input tuples into the network one by one.
For each unit: compute the net input to the unit as a linear combination of all the inputs to the unit; compute the output value using the activation function; compute the error; update the weights and the bias.

Multi-Layer Perceptron
A multi-layer perceptron maps the input vector xi through input nodes, hidden nodes, and output nodes to an output vector. For unit j:

Ij = sum over i of wij Oi + θj
Oj = 1 / (1 + e^(-Ij))
Errj = Oj (1 - Oj)(Tj - Oj) for an output unit
Errj = Oj (1 - Oj) sum over k of Errk wjk for a hidden unit
wij = wij + (l) Errj Oi
θj = θj + (l) Errj
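A minimal sketch of one backpropagation training step for a tiny 2-input, 2-hidden, 1-output network, following the update formulas above. The training tuple, the initial weights and biases, and the learning rate l are illustrative values.

```python
# One forward pass and one weight/bias update following the slide's formulas.
import math

def sigmoid(I):
    return 1.0 / (1.0 + math.exp(-I))

w_ih = [[0.2, -0.3], [0.4, 0.1]]   # w_ih[i][j]: input i -> hidden j (assumed values)
w_ho = [0.3, -0.2]                 # hidden j -> output
theta_h = [0.1, 0.2]
theta_o = -0.1
l = 0.5                            # learning rate
x, target = [1.0, 0.0], 1.0        # one training tuple

# Forward pass: net input I_j, output O_j = 1 / (1 + e^(-I_j)).
O_h = [sigmoid(sum(w_ih[i][j] * x[i] for i in range(2)) + theta_h[j]) for j in range(2)]
O_out = sigmoid(sum(w_ho[j] * O_h[j] for j in range(2)) + theta_o)

# Backward pass: Err_j = O_j (1 - O_j)(T_j - O_j) at the output,
# Err_j = O_j (1 - O_j) * sum_k Err_k w_jk at a hidden unit.
err_out = O_out * (1 - O_out) * (target - O_out)
err_h = [O_h[j] * (1 - O_h[j]) * err_out * w_ho[j] for j in range(2)]

# Updates: w_ij += l * Err_j * O_i and theta_j += l * Err_j.
for j in range(2):
    w_ho[j] += l * err_out * O_h[j]
    theta_h[j] += l * err_h[j]
    for i in range(2):
        w_ih[i][j] += l * err_h[j] * x[i]
theta_o += l * err_out

print("output before update:", round(O_out, 3))
```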

Other Classification Methods
k-nearest neighbor classifier; case-based reasoning; genetic algorithms; the rough set approach; the fuzzy set approach.

Genetic Algorithms
GA: based on an analogy to biological evolution. Each rule is represented by a string of bits, and an initial population is created consisting of randomly generated rules; e.g., IF A1 AND NOT A2 THEN C2 can be encoded as 100. Based on the notion of survival of the fittest, a new population is formed to consist of the fittest rules and their offspring. The fitness of a rule is measured by its classification accuracy on a set of training examples. Offspring are generated by crossover and mutation.
http://www4.ncsu.edu/eos/users/d/dhloughl/public/stable.htm

Rough Set Approach
Rough sets are used to approximately or "roughly" define equivalence classes. A rough set for a given class C is approximated by two sets: a lower approximation (certain to be in C) and an upper approximation (cannot be described as not belonging to C). Finding the minimal subsets (reducts) of attributes for feature reduction is NP-hard, but a discernibility matrix is used to reduce the computational effort.

Regression Analysis and Log-Linear Models in Prediction
Linear regression: Y = α + β X. The two parameters α and β specify the line and are estimated from the data at hand by applying the least squares criterion to the known values Y1, Y2, ..., and X1, X2, ....
Multiple regression: Y = b0 + b1 X1 + b2 X2. Many nonlinear functions can be transformed into the above.
Log-linear models: the multi-way table of joint probabilities is approximated by a product of lower-order tables, e.g., p(a, b, c, d) = αab βac χad δbcd.
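A minimal sketch of least-squares estimation of the two parameters in Y = α + β X, using the closed-form estimates; the (X, Y) values are illustrative.

```python
# Closed-form least-squares estimates of alpha and beta for Y = alpha + beta * X.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
beta = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
       sum((x - x_bar) ** 2 for x in xs)
alpha = y_bar - beta * x_bar
print(f"Y = {alpha:.2f} + {beta:.2f} X")
```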

Classification Accuracy
The classification accuracy (CA) of a rule set is the ratio of the number of correctly classified objects from the test set to the number of all objects in the test set.

Confusion matrix (rows: actual result, columns: predicted result):

          Predicted +  Predicted -
Actual +  A            B
Actual -  C            D

Accuracy = (A + D) / (A + B + C + D)
Sensitivity (true positive rate) = A / (A + B)
Specificity (true negative rate) = D / (C + D)
False positive rate = C / (C + D) = 1 - Specificity (Type I error)
False negative rate = B / (A + B) = 1 - Sensitivity (Type II error)
Positive predicted value = A / (A + C)
Negative predicted value = D / (B + D)

Classification Accuracy: Estimating Error Rates
Partition (training-and-testing): use two independent data sets, e.g., a training set (2/3) and a test set (1/3); used for data sets with a large number of samples.
Cross-validation: divide the data set into k subsamples, use k-1 subsamples as training data and one subsample as test data (k-fold cross-validation); used for data sets of moderate size.
Bootstrapping (leave-one-out): used for small data sets.
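A minimal sketch computing the accuracy measures above from the four confusion-matrix counts A (true positives), B (false negatives), C (false positives), and D (true negatives); the counts themselves are illustrative.

```python
# Accuracy measures derived from the confusion-matrix counts defined above.
A, B, C, D = 40, 10, 5, 45  # illustrative counts

accuracy = (A + D) / (A + B + C + D)
sensitivity = A / (A + B)            # true positive rate
specificity = D / (C + D)            # true negative rate
false_positive_rate = C / (C + D)    # 1 - specificity
false_negative_rate = B / (A + B)    # 1 - sensitivity
ppv = A / (A + C)                    # positive predicted value
npv = D / (B + D)                    # negative predicted value

for name, value in [("accuracy", accuracy), ("sensitivity", sensitivity),
                    ("specificity", specificity), ("PPV", ppv), ("NPV", npv)]:
    print(f"{name}: {value:.3f}")
```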

Reference
Han, J. and Kamber, M. (2000). Data Mining: Concepts and Techniques. Academic Press, San Diego, CA.