CC283 Intelligent Problem Solving, 05/11/2010. Edward Tsang (all rights reserved)


Machine Learning Overview
Supervised learning: train on observed data, test, then apply the learned model to unseen data.
Observed data x1, x2, ..., xn (e.g. rows such as 1.6 7.1 ... 2.7; 1.4 6.8 ... 3.1; 2.1 5.4 ... 2.8) with a target y (Buy, Sell, Hold, ...).
Machine learning finds patterns y = f(x): machine learning is function fitting.
Does function f exist? What y to learn? How to prepare x? What should f look like? How to evaluate f? How to find f?

What to learn?
We could try to predict the price tomorrow.
We could try to predict whether prices will go up (or down) by a margin, e.g. will the price go up by r% within n days?
Notes: always ask whether the market can be predicted at all; there is no magic in machine learning; the harder the task, the less chance of success.

[Figure: the FTSE index from 2009.08.18 to 2010.10.22 (roughly 4,500 to 5,900), shown as the raw price and together with its 7-day and 14-day moving averages.]

Moving Average Rules to Find
Let mMA be the m-day moving average and nMA be the n-day moving average, with m < n.
Possible rules to find (a Python sketch of these crossover rules follows the confusion matrix below):
If mMA <= nMA on day d, but mMA > nMA on day d+1, then buy.
If mMA >= nMA on day d, but mMA < nMA on day d+1, then sell.

Confusion Matrix
             Prediction -   Prediction +   Total
Actual -          5              2            7
Actual +          1              2            3
Total             6              4           10
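Relating to the Moving Average Rules slide above, here is a minimal Python sketch (my own illustration, not from the lecture). The function names and the synthetic price series are assumptions; the m = 7 and n = 14 windows simply mirror the moving averages in the FTSE chart.

```python
# Illustrative sketch: moving-average crossover buy/sell signals as described above.
import random

def moving_average(prices, window):
    """Simple window-day moving average; None until enough days exist."""
    return [None if i + 1 < window else sum(prices[i + 1 - window:i + 1]) / window
            for i in range(len(prices))]

def crossover_signals(prices, m=7, n=14):
    """Buy when the short MA crosses above the long MA; sell when it crosses below."""
    short_ma, long_ma = moving_average(prices, m), moving_average(prices, n)
    signals = []
    for d in range(1, len(prices)):
        if None in (short_ma[d - 1], long_ma[d - 1], short_ma[d], long_ma[d]):
            continue
        if short_ma[d - 1] <= long_ma[d - 1] and short_ma[d] > long_ma[d]:
            signals.append((d, "buy"))
        elif short_ma[d - 1] >= long_ma[d - 1] and short_ma[d] < long_ma[d]:
            signals.append((d, "sell"))
    return signals

# Made-up random-walk prices, only to exercise the rules.
random.seed(0)
prices, p = [], 5000.0
for _ in range(60):
    p += random.uniform(-20, 20)
    prices.append(p)
print(crossover_signals(prices, m=7, n=14))
```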

Performance Measures
Ideal predictions:
             Prediction -   Prediction +   Total
Actual -          7              0            7
Actual +          0              3            3
Total             7              3           10

Actual predictions, example:
             Prediction -   Prediction +   Total
Actual -          5              2            7
Actual +          1              2            3
Total             6              4           10
Accuracy (rate of correct predictions) = (5 + 2) / 10 = 70%
Precision = 2 / 4 = 50%
Recall = 2 / 3 = 67%

Matthews Correlation Coefficient (MCC)
MCC measures the quality of binary classifications; it ranges from -1 to 1 and is related to chi-square (chi^2 = n * MCC^2).
Writing the counts as below (O-, O+ are the actual totals; P-, P+ the predicted totals; n the grand total):
             Prediction -   Prediction +   Total
Actual -         TN             FP           O-
Actual +         FN             TP           O+
Total            P-             P+            n
MCC = (TP * TN - FP * FN) / sqrt(O+ * O- * P+ * P-)

More on Performance Measures
             Prediction -   Prediction +   Total
Actual -       TN = 5         FP = 2          7
Actual +       FN = 1         TP = 2          3
Total             6              4           10
TN = True Negative; FN = False Negative = Miss = Type II Error; FP = False Positive = False Alarm = Type I Error; TP = True Positive.
False Positive Rate = FP / (FP + TN) = 2 / (2 + 5) = 29%
True Positive Rate = Recall = TP / (TP + FN) = 2 / (2 + 1) = 67%

Receiver Operating Characteristic (ROC)
Each prediction measure occupies one point in ROC space (False Positive Rate on the x-axis, True Positive Rate on the y-axis).
Points on the diagonal represent random predictions.
A curve can be fitted on multiple measures (note: the points may not cover the space widely).
The area under the curve measures the classifier's performance.
[Figure: ROC space plotting True Positive Rate against False Positive Rate, with perfect classification at the top-left corner and the diagonal of random classifications.]

Receiver Operating Characteristic (ROC), continued
A classifier may occupy only one area in the ROC space.
Other classifiers may occupy multiple points.
Some classifiers may be tuned to occupy different parts of the curve.
[Figure: the same ROC space, with cautious classifiers towards the lower-left of the curve and liberal classifiers towards the upper-right.]

Scarce opportunities 1
Ideal predictions:
             Prediction -   Prediction +
Actual -        9,900            0         (99% of cases)
Actual +            0          100          (1% of cases)
Ideal prediction: Accuracy = Precision = Recall = 100%.
It is easy to achieve high accuracy here, and such predictions can still be worthless:
Easy predictions (every case predicted negative):
             Prediction -   Prediction +
Actual -        9,900            0         (99% of cases)
Actual +          100            0          (1% of cases)
Easy score on accuracy: Accuracy = 99%, Precision = ? (no positive predictions), Recall = 0%.
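The measures above are simple functions of the four confusion-matrix counts. Below is an illustrative Python sketch (my own code, not from the slides); the binary_measures name is an assumption. It reproduces the worked example (TN = 5, FP = 2, FN = 1, TP = 2) and the "easy predictions" case from the scarce-opportunities slide above.

```python
# Illustrative sketch: performance measures from the confusion-matrix counts.
from math import sqrt

def binary_measures(tn, fp, fn, tp):
    n = tn + fp + fn + tp
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else None   # undefined with no + predictions
    recall = tp / (tp + fn) if (tp + fn) else None      # true positive rate
    fpr = fp / (fp + tn) if (fp + tn) else None         # false positive rate
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "false positive rate": fpr, "MCC": mcc}

# Worked example: accuracy 0.70, precision 0.50, recall ~0.67, FPR ~0.29, MCC ~0.36.
print(binary_measures(tn=5, fp=2, fn=1, tp=2))
# Scarce opportunities, "easy" predictor (always -): accuracy 0.99 but recall 0.
print(binary_measures(tn=9_900, fp=0, fn=100, tp=0))
```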
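For the ROC slides, the sketch below (my own illustration; the toy scores and labels are made up) traces (false positive rate, true positive rate) points for a scoring classifier by sweeping the decision threshold, and estimates the area under the curve with the trapezium rule.

```python
# Illustrative sketch: ROC points and area under the curve for a toy scorer.
def roc_points(scores, labels):
    """Return (FPR, TPR) points for every threshold, sorted by FPR."""
    pos, neg = labels.count(1), labels.count(0)
    points = set()
    for threshold in sorted(set(scores)) + [max(scores) + 1]:
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.add((fp / neg, tp / pos))
    return sorted(points)

def auc(points):
    """Trapezium-rule area under the ROC points."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]   # made-up classifier scores
labels = [1,   1,   0,   1,   0,    1,   0,   0]     # made-up true classes
pts = roc_points(scores, labels)
print(pts)
print("AUC ~", auc(pts))
```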

Scarce opportunities 2
Easy predictions (every case predicted negative):
             Prediction -   Prediction +
Actual -        9,900            0         (99% of cases)
Actual +          100            0          (1% of cases)
Easy score on accuracy: Accuracy = 99%, Precision = ?, Recall = 0%.
Random predictions (1% of cases, chosen at random, moved from - to +):
             Prediction -   Prediction +
Actual -        9,801           99         (99% of cases)
Actual +           99            1          (1% of cases)
Random move from - to +: Accuracy = 98.02%, Precision = Recall = 1%.
This is an unintelligent improvement of recall: random positive predictions.

Scarce opportunities 3
Random predictions (as above): Accuracy = 98.02%, Precision = Recall = 1%.
Better moves from - to +:
             Prediction -   Prediction +
Actual -        9,810           90         (99% of cases)
Actual +           90           10          (1% of cases)
Accuracy = 98.2%, Precision = Recall = 10%.
A more useful prediction: it sacrifices accuracy (from 99% in the easy prediction).

Machine Learning Techniques
Early work: Quinlan's ID3 / C4.5 / C5, building decision trees.
Neural networks: a solution is represented by a network; learning through feedback.
Evolutionary computation: keep a population of solutions; learning through survival of the fittest.

ID3 for machine learning
ID3 is an algorithm invented by Quinlan.
ID3 performs supervised learning: it builds decision trees.
It fits the training data perfectly; like other machine learning techniques, there is no guarantee that it fits the testing data.
Danger of over-fitting.

Example Classification Problem: Play Tennis?
Day   Outlook    Temper.   Humid.   Wind     Play?
1     Sunny      Hot       High     Weak     No
2     Sunny      Hot       High     Strong   No
3     Overcast   Hot       High     Weak     Yes
4     Rain       Mild      High     Weak     Yes
5     Rain       Cool      Normal   Weak     Yes
6     Rain       Cool      Normal   Strong   No
7     Overcast   Cool      Normal   Strong   Yes
8     Sunny      Mild      High     Weak     No
9     Sunny      Cool      Normal   Weak     Yes
10    Rain       Mild      Normal   Weak     Yes
11    Sunny      Mild      Normal   Strong   Yes
12    Overcast   Mild      High     Strong   Yes
13    Overcast   Hot       Normal   Weak     Yes
14    Rain       Mild      High     Strong   No

Example Decision Tree
Outlook = Sunny -> test Humidity: High -> No; Normal -> Yes.
Outlook = Overcast -> Yes.
Outlook = Rain -> test Wind: Strong -> No; Weak -> Yes.
Decision to make: play tennis?
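To make the tree concrete, here is a small Python sketch (my own illustration, not part of the lecture): the example decision tree written as a nested dictionary, with a helper that walks it to a class label. The TREE layout and the classify name are my own conventions.

```python
# Illustrative sketch: the example decision tree above as a nested structure.
TREE = ("Outlook", {
    "Sunny":    ("Humidity", {"High": "No", "Normal": "Yes"}),
    "Overcast": "Yes",
    "Rain":     ("Wind", {"Strong": "No", "Weak": "Yes"}),
})

def classify(tree, example):
    """Follow the tree until a class label (a plain string) is reached."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree

# Day 1 of the table above: Sunny, Hot, High, Weak -> No.
print(classify(TREE, {"Outlook": "Sunny", "Temperature": "Hot",
                      "Humidity": "High", "Wind": "Weak"}))
# A new, unseen day -> Yes.
print(classify(TREE, {"Outlook": "Rain", "Temperature": "Mild",
                      "Humidity": "Normal", "Wind": "Weak"}))
```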

Example: ID3 Picks an Attribute
Pick an attribute A and compute Gain(S, A):
Outlook: 0.246; Humidity: 0.151; Wind: 0.048; Temperature: 0.029.
Outlook is picked.
Outlook = Sunny: D1-, D2-, D8-, D9+, D11+ [2+, 3-]
Outlook = Overcast: D3+, D7+, D12+, D13+ [4+, 0-]
Outlook = Rain: D4+, D5+, D6-, D10+, D14- [3+, 2-]

Simplified ID3 in Action (1)
1) Not all examples agree on the conclusion.
2) Pick one attribute: Outlook is picked.
3) Divide the examples according to their values of Outlook (as above).
4) Build each branch recursively; the Overcast branch is already pure, so it becomes a Yes leaf.

Simplified ID3 in Action (2)
1) Expand Outlook = Sunny (D1-, D2-, D8-, D9+, D11+).
2) Not all examples agree.
3) Pick one attribute: Humidity is picked.
4) Divide the examples.
5) Build two branches: No for High (D1-, D2-, D8-); Yes for Normal (D9+, D11+).

Simplified ID3 in Action (3)
1) Expand Outlook = Rain (D4+, D5+, D6-, D10+, D14-).
2) Not all examples agree.
3) Pick one attribute: Wind is picked.
4) Divide the examples.
5) Build two branches: No for Strong (D6-, D14-); Yes for Weak (D4+, D5+, D10+).

ID3 in Action: Tree Built
Outlook = Sunny -> Humidity: High -> No; Normal -> Yes.
Outlook = Overcast -> Yes (D3+, D7+, D12+, D13+).
Outlook = Rain -> Wind: Strong -> No; Weak -> Yes (D4+, D5+, D10+).

Remarks on ID3
Decision trees are easy to understand and easy to use.
But what if the data has noise, i.e. contradictory results were observed under the same situation?
Besides, what if some values are missing from the decision tree, e.g. Humidity = Low?
These issues are handled by C4.5 and See5 (beyond our syllabus).
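The slides above walk through ID3 by hand; a compact Python sketch of the same procedure is given below (my own code, not the lecture's implementation). The table is the PlayTennis data from the previous page, and the printed gains and tree should match the figures quoted above.

```python
# Illustrative sketch: minimal ID3 on the PlayTennis table.
from collections import Counter
from math import log2

ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]
DATA = [row.split() for row in (          # last field is the Play? class label
    "Sunny Hot High Weak No",          "Sunny Hot High Strong No",
    "Overcast Hot High Weak Yes",      "Rain Mild High Weak Yes",
    "Rain Cool Normal Weak Yes",       "Rain Cool Normal Strong No",
    "Overcast Cool Normal Strong Yes", "Sunny Mild High Weak No",
    "Sunny Cool Normal Weak Yes",      "Rain Mild Normal Weak Yes",
    "Sunny Mild Normal Strong Yes",    "Overcast Mild High Strong Yes",
    "Overcast Hot Normal Weak Yes",    "Rain Mild High Strong No")]

def entropy(rows):
    counts = Counter(r[-1] for r in rows)
    return -sum(c / len(rows) * log2(c / len(rows)) for c in counts.values())

def gain(rows, i):
    """Information gain of splitting rows on attribute index i."""
    split = Counter(r[i] for r in rows)
    remainder = sum(n / len(rows) * entropy([r for r in rows if r[i] == v])
                    for v, n in split.items())
    return entropy(rows) - remainder

def id3(rows, attr_indices):
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:                        # 1) all examples agree: leaf
        return labels[0]
    if not attr_indices:                             # no attribute left: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attr_indices, key=lambda i: gain(rows, i))        # 2) pick attribute
    rest = [i for i in attr_indices if i != best]
    return (ATTRS[best],                                         # 3) divide, 4) recurse
            {v: id3([r for r in rows if r[best] == v], rest)
             for v in sorted(set(r[best] for r in rows))})

for i, name in enumerate(ATTRS):
    print(f"Gain(S, {name}) = {gain(DATA, i):.3f}")
# Outlook ~0.247, Temperature ~0.029, Humidity ~0.152, Wind ~0.048: Outlook is picked.
print(id3(DATA, list(range(len(ATTRS)))))
# ('Outlook', {'Overcast': 'Yes',
#              'Rain': ('Wind', {'Strong': 'No', 'Weak': 'Yes'}),
#              'Sunny': ('Humidity', {'High': 'No', 'Normal': 'Yes'})})
```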

Exercise Classification Problem: Admit?
Student   Maths   English   Physics   IT   Exam
1         A       A         C         B    Admit
2         A       A         A         C    Admit
3         A       B         C         A    Admit
4         A       A         B         A    Admit
5         B       A         A         C    Admit
6         B       B         C         A    Admit
7         A       B         B         B    Admit
8         B       A         B         C    Admit
9         B       B         A         B    Admit
10        C       A         B         A    Admit
11        C       A         C         A    Reject
12        A       C         C         B    Reject
13        C       A         B         C    Reject
14        B       C         A         C    Reject
Suppose the rule is: only accept students with at least three (A or B)s, including at least one A. Could ID3 find it?

ID3 Exercise: Admit or Reject
English is picked at the root:
English = B: 3+, 6+, 7+, 9+ -> Admit
English = C: 12-, 14- -> Reject
English = A: split further on Maths:
  Maths = A: 1+, 2+, 4+ -> Admit
  Maths = B: 5+, 8+ -> Admit
  Maths = C: 10+, 11-, 13- (the examples do not all agree)
Will the rule found admit students with (B, B, B, B), or with (C, A, A, B)?

Lessons from the exercise
Without the right columns in the database, one cannot learn the underlying rule.
It would help if there were a column recording the number of A or B grades.
Rules generalise; they may not be correct.
Some branches may not be covered (not shown in the slide above).

Machine Learning Summary
Machine learning is function fitting.
First question: what function to find? Do such functions exist? If not, all efforts are futile.
How to measure success?
Finally, find such functions (ID3, NN, EA); this is no more important than the questions above.
Caution: avoid too much faith; there is too much hype!

Artificial Neural Network
[Figure: a network of input units, hidden units and output units; the output is compared with the target, and the feedback is used to update the weights.]

Terminology in GA
A string corresponds to a chromosome; its positions are genes and their values are alleles; substrings form building blocks.
Evaluation corresponds to fitness.
Example of a candidate solution in binary representation (vs real coding): 1 0 0 0 1 1 0
A population of candidate solutions is maintained.
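As a side check on the exercise, the stated target rule can be written as a one-line test. The sketch below (my own code; the admit name is an assumption) implements that stated rule and applies it to the two query students from the slide. Note it tests the target rule itself, not the tree that ID3 learns.

```python
# Illustrative sketch: the target rule "at least three grades of A or B,
# including at least one A", as stated in the exercise.
def admit(grades):
    """grades: (Maths, English, Physics, IT), each 'A', 'B' or 'C'."""
    a_or_b = sum(1 for g in grades if g in ("A", "B"))
    return a_or_b >= 3 and "A" in grades

print(admit(("B", "B", "B", "B")))   # False: four (A or B)s but no A
print(admit(("C", "A", "A", "B")))   # True: three (A or B)s including an A
```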

Example Problem for GA
To maximize f(x) = 100 + 28x - x^2.
Optimal solution: x = 14, with f(14) = 296.
Use a 5-bit representation, e.g. binary 01101 = decimal 13, with f(13) = 295.
Note: the representation issue can be tricky; the choice of representation is crucial.

Example Initial Population
     String   Decimal   f(x)   Weight   Accum.
1    01010    10        280    28        28
2    01101    13        295    29        57
3    11000    24        196    19        76
4    10101    21        247    24       100
(Weight = share of the roulette wheel, in %; Accum. = cumulative weight.)
Sum of f(x): 1,018; average: 254.5.

Selection in Evolutionary Computation
Roulette wheel method: the fitter a string, the more chance it has to be selected.
Wheel shares from the population above: 01010 28%, 01101 29%, 11000 19%, 10101 24%.

Crossover
Selected parent strings exchange substrings to form offspring.
Parent pairs (10101, 01010) and (01101, 01010) produce offspring 10010 (18), 01101 (13), 01110 (14) and 01001 (9), with f(x) = 280, 295, 296 and 271.
Sum of fitness: 1,142; average fitness: 285.5.

Evolutionary Computation: Model-based Generate & Test
Model (to evolve) -> Generate (select, create/mutate vectors or trees) -> Candidate Solution -> Test -> Observed Performance -> Feedback (to update the model) -> back to the Model.
A Model could be a population of solutions, or a probability model.
A Candidate Solution could be a vector of variables, or a tree.
The Fitness of a solution is application-dependent, e.g. drug testing.
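The whole GA example can be run end to end. Below is an illustrative Python sketch (my own code, not the lecture's implementation) with roulette-wheel selection and one-point crossover on 5-bit strings; mutation is omitted, and the random seed and number of generations are arbitrary choices.

```python
# Illustrative sketch: the toy GA problem above, run for a few generations.
import random

def fitness(bits):
    x = int(bits, 2)
    return 100 + 28 * x - x * x           # maximized at x = 14, f = 296

def roulette_select(population):
    """The fitter a string, the more chance it has to be selected."""
    weights = [fitness(s) for s in population]
    return random.choices(population, weights=weights, k=1)[0]

def crossover(parent_a, parent_b):
    """One-point crossover of two equal-length bit strings."""
    point = random.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:], parent_b[:point] + parent_a[point:]

random.seed(1)
population = ["01010", "01101", "11000", "10101"]      # the initial population above
print([fitness(s) for s in population])                # 280, 295, 196, 247 (sum 1,018)

for generation in range(5):
    next_population = []
    while len(next_population) < len(population):
        child_a, child_b = crossover(roulette_select(population), roulette_select(population))
        next_population.extend([child_a, child_b])
    population = next_population
    print(generation, population, [fitness(s) for s in population])
```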