CC283 Intelligent Problem Solving 28/10/2013


Machine Learning
What is the research agenda? How to measure success? How to learn?

Machine Learning Overview
Machine learning divides into unsupervised learning and supervised learning. In supervised learning, training data consists of observed inputs x1, x2, ..., xn (e.g. 1.6, 7.1, ..., 2.7) paired with a target y (e.g. Buy, Sell or Hold). The learned pattern y = f(x) is then tested on unseen data to make predictions.

Supervised learning is function fitting
Does the function f exist? What y to learn? How to prepare x? What should f look like? How to evaluate f? How to find f?

What to learn?
We could try to predict the price tomorrow. We could try to predict whether prices will go up (or down) by a margin, e.g. will the price go up by r% within n days?
Notes: Always ask: can the market be predicted? There is no magic in machine learning. The harder the task, the less the chance of success.

[Figure: FTSE index from 2009.08.18 to 2010.10.22, showing the price together with its 7-day and 14-day moving averages.]
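The target just described ("will the price go up by r% within n days?") has to be turned into a label for each trading day before any learning starts. Below is a minimal sketch of that labelling step; the function name, the parameter values and the toy prices are made up for illustration, not taken from the slides.

```python
def label_days(prices, r=0.02, n=5):
    """Label day t as 1 if the price rises by at least a fraction r
    (e.g. 2%) within the next n days, else 0.  The last n days cannot
    be labelled and are skipped."""
    labels = []
    for t in range(len(prices) - n):
        target = prices[t] * (1 + r)
        future = prices[t + 1:t + 1 + n]
        labels.append(1 if max(future) >= target else 0)
    return labels

# Toy usage with invented prices (not FTSE data):
prices = [100.0, 101.0, 99.5, 103.2, 104.0, 102.5, 105.1, 104.8]
print(label_days(prices, r=0.02, n=3))
```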

Moving Average Rules to Find
Let the m-MA be the m-day moving average and the n-MA be the n-day moving average, with m < n. Possible rules to find:
If the m-MA <= n-MA on day d, but the m-MA > n-MA on day d+1, then buy.
If the m-MA >= n-MA on day d, but the m-MA < n-MA on day d+1, then sell.

Performance Measures
Summarising results with a confusion matrix; basic measures; Matthews correlation coefficient; Receiver Operating Characteristic (ROC); types of classifiers; dealing with scarce opportunities; unintelligent versus intelligent improvement.

Confusion Matrix
Rows are the actual classes, columns are the predictions.

            Predicted -   Predicted +   Total
Actual -         5             2           7
Actual +         1             2           3
Total            6             4          10

Performance Measures
Ideal predictions would put every case on the diagonal:

            Predicted -   Predicted +   Total
Actual -         7             0           7
Actual +         0             3           3
Total            7             3          10

For the actual predictions in the example:
Rate of correctness (accuracy) = (5 + 2) / 10 = 70%
Precision = 2 / 4 = 50%
Recall = 2 / 3 = 67%

More on Performance Measures
In the same matrix, TN = 5, FP = 2, FN = 1 and TP = 2, where
TN = True Negative
FN = False Negative = Miss = Type II Error
FP = False Positive = False Alarm = Type I Error
TP = True Positive
False Positive Rate = FP / (FP + TN) = 2 / (2 + 5) = 29%
True Positive Rate = Recall = TP / (TP + FN) = 2 / (2 + 1) = 67%

Receiver Operating Characteristic (ROC)
Each prediction measure occupies one point in ROC space (False Positive Rate on the x-axis, True Positive Rate on the y-axis). Points on the diagonal represent random predictions. A curve can be fitted on multiple measures. Note: the points may not cover the space widely. The area under the curve measures the classifier's performance.

[Figure: ROC space, True Positive Rate against False Positive Rate; perfect classification sits at the top-left corner, random classifications lie on the diagonal.]
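As a check on the worked example above, here is a small sketch (not from the slides) that computes the basic measures, plus the Matthews correlation coefficient mentioned earlier, from the four counts TN, FP, FN and TP.

```python
import math

def confusion_measures(tn, fp, fn, tp):
    """Basic measures derived from a 2x2 confusion matrix."""
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total                     # rate of correctness
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0    # true positive rate
    fpr = fp / (fp + tn) if (fp + tn) else 0.0       # false positive rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, precision, recall, fpr, mcc

# The example matrix from the slides: TN = 5, FP = 2, FN = 1, TP = 2
acc, prec, rec, fpr, mcc = confusion_measures(tn=5, fp=2, fn=1, tp=2)
print(f"accuracy={acc:.0%}  precision={prec:.0%}  recall={rec:.0%}  "
      f"FPR={fpr:.0%}  MCC={mcc:.2f}")
```

With the slide's counts this reproduces accuracy 70%, precision 50% and recall 67%.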

Receiver Operating Characteristic (ROC), continued
A classifier may occupy only one area in the ROC space. Other classifiers may occupy multiple points. Some classifiers may be tuned to occupy different parts of the curve.

[Figure: ROC space again; cautious classifiers sit towards the bottom-left, liberal classifiers towards the top-right, perfect classification at the top-left corner, random classifications on the diagonal.]

Scarce Opportunities (1)
Suppose only 1% of 10,000 cases are positive: 9,900 negatives (99%) and 100 positives (1%).

Ideal predictions:
            Predicted -   Predicted +
Actual -       9,900           0
Actual +           0         100
Accuracy = Precision = Recall = 100%. It is easy to achieve high accuracy here, yet the predictions may still be worthless.

Easy predictions (always predict negative):
            Predicted -   Predicted +
Actual -       9,900           0
Actual +         100           0
An easy score on accuracy: Accuracy = 99%, Precision is undefined, Recall = 0%.

Scarce Opportunities (2)
Random predictions (move 1% of the predictions from - to + at random):
            Predicted -   Predicted +
Actual -       9,801          99
Actual +          99           1
Accuracy = 98.02%, Precision = Recall = 1%. This is an unintelligent improvement of recall: random positive predictions sacrifice accuracy (from 99% in the easy prediction).

Scarce Opportunities (3)
Better moves from - to + (more intelligent predictions):
            Predicted -   Predicted +
Actual -       9,810          90
Actual +          90          10
Accuracy = 98.2%, Precision = Recall = 10%. A more useful prediction: some accuracy is still sacrificed, but recall genuinely improves.

Machine Learning Techniques
Early work: Quinlan's ID3 / C4.5 / C5, which build decision trees.
Neural networks: a solution is represented by a network; learning is driven by feedback.
Evolutionary computation: keep a population of solutions; learning through survival of the fittest.

ID3 for Machine Learning
ID3 is an algorithm invented by Quinlan. ID3 performs supervised learning: it builds decision trees that fit the training data perfectly. Like other machine learning techniques, there is no guarantee that it fits the testing data: there is a danger of over-fitting.
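A quick numeric check of the three scarce-opportunity scenarios above; this is a small sketch using the standard formulas, with the counts copied from the tables.

```python
def scores(tn, fp, fn, tp):
    """Accuracy, precision and recall for a 2x2 confusion matrix."""
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total
    precision = tp / (tp + fp) if (tp + fp) else float("nan")  # undefined if no + predictions
    recall = tp / (tp + fn)
    return accuracy, precision, recall

scenarios = {
    "easy (always -)":   dict(tn=9900, fp=0,  fn=100, tp=0),
    "random moves to +": dict(tn=9801, fp=99, fn=99,  tp=1),
    "better moves to +": dict(tn=9810, fp=90, fn=90,  tp=10),
}
for name, counts in scenarios.items():
    acc, prec, rec = scores(**counts)
    print(f"{name:20s} accuracy={acc:.2%}  precision={prec:.0%}  recall={rec:.0%}")
```

Accuracy barely moves (99% down to about 98%), while recall climbs from 0% to 10% only when the extra positive predictions are well chosen.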

Example Classification Problem: Play Tennis?

Day   Outlook    Temper.   Humid.   Wind     Play?
1     Sunny      Hot       High     Weak     No
2     Sunny      Hot       High     Strong   No
3     Overcast   Hot       High     Weak     Yes
4     Rain       Mild      High     Weak     Yes
5     Rain       Cool      Normal   Weak     Yes
6     Rain       Cool      Normal   Strong   No
7     Overcast   Cool      Normal   Strong   Yes
8     Sunny      Mild      High     Weak     No
9     Sunny      Cool      Normal   Weak     Yes
10    Rain       Mild      Normal   Weak     Yes
11    Sunny      Mild      Normal   Strong   Yes
12    Overcast   Mild      High     Strong   Yes
13    Overcast   Hot       Normal   Weak     Yes
14    Rain       Mild      High     Strong   No

Example Decision Tree
Decision to make: play tennis?
Outlook = Sunny    -> test Humidity: High -> No; Normal -> Yes
Outlook = Overcast -> Yes
Outlook = Rain     -> test Wind: Strong -> No; Weak -> Yes

Example: ID3 Picks an Attribute
Pick an attribute A and compute Gain(S, A):
Outlook: 0.246
Humidity: 0.151
Wind: 0.048
Temperature: 0.029
Outlook is picked.

Simplified ID3 in Action (1)
1) Not all examples agree on the conclusion.
2) Pick one attribute: Outlook is picked.
3) Divide the examples according to their Outlook values:
   Sunny:    D1-, D2-, D8-, D9+, D11+   [2+, 3-]
   Overcast: D3+, D7+, D12+, D13+       [4+, 0-]
   Rain:     D4+, D5+, D6-, D10+, D14-  [3+, 2-]
4) Build each branch recursively; Yes for Overcast, whose examples all agree.

Simplified ID3 in Action (2)
1) Expand Outlook = Sunny (D1-, D2-, D8-, D9+, D11+).
2) Not all examples agree.
3) Pick one attribute: Humidity is picked.
4) Divide the examples: High (D1-, D2-, D8-), Normal (D9+, D11+).
5) Build two branches: No for High, Yes for Normal.

Simplified ID3 in Action (3)
1) Expand Outlook = Rain (D4+, D5+, D6-, D10+, D14-).
2) Not all examples agree.
3) Pick one attribute: Wind is picked.
4) Divide the examples: Strong (D6-, D14-), Weak (D4+, D5+, D10+).
5) Build two branches: No for Strong, Yes for Weak.
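The gain figures quoted above (0.246 for Outlook, 0.151 for Humidity, 0.048 for Wind, 0.029 for Temperature) can be reproduced with a short sketch of the entropy and information-gain calculation. The code below is illustrative, not the course's implementation; the function names are my own.

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def gain(rows, attribute, target="Play"):
    """Information gain of splitting `rows` (a list of dicts) on `attribute`."""
    g = entropy([r[target] for r in rows])
    for value in set(r[attribute] for r in rows):
        subset = [r[target] for r in rows if r[attribute] == value]
        g -= (len(subset) / len(rows)) * entropy(subset)
    return g

# The 14 PlayTennis examples from the table above.
data = [
    ("Sunny", "Hot", "High", "Weak", "No"), ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"), ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"), ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"), ("Rain", "Mild", "High", "Strong", "No"),
]
names = ["Outlook", "Temperature", "Humidity", "Wind", "Play"]
rows = [dict(zip(names, row)) for row in data]

for attr in ["Outlook", "Humidity", "Wind", "Temperature"]:
    print(f"Gain(S, {attr}) = {gain(rows, attr):.3f}")
```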

ID3 in Action: The Tree Built
Root: Outlook
  Sunny    -> Humidity: High -> No (D1-, D2-, D8-); Normal -> Yes (D9+, D11+)
  Overcast -> Yes (D3+, D7+, D12+, D13+)
  Rain     -> Wind: Strong -> No (D6-, D14-); Weak -> Yes (D4+, D5+, D10+)

Remarks on ID3
Decision trees are easy to understand and easy to use. But what if the data has noise, i.e. contradictory results were observed under the same situation? And what if some values are missing from the decision tree, e.g. Humidity = Low? These cases are handled by C4.5 and See5 (beyond our syllabus).

Exercise Classification Problem: Admit?

Student   Maths   English   Physics   IT   Exam
1         A       A         C         B    Admit
2         A       A         A         C    Admit
3         A       B         C         A    Admit
4         A       A         B         A    Admit
5         B       A         A         C    Admit
6         B       B         C         A    Admit
7         A       B         B         B    Admit
8         B       A         B         C    Admit
9         B       B         A         B    Admit
10        C       A         B         A    Admit
11        C       A         C         A    Reject
12        A       C         C         B    Reject
13        C       A         B         C    Reject
14        B       C         A         C    Reject

Suppose the rule is: only accept students with at least three grades of A or B, including at least one A. Could ID3 find it? (See the ID3 sketch at the end of this section.)

Unsupervised Learning in Trading and Markets Mechanism Design
I may not know what a solution looks like, but I can tell what is good or bad. The experimenter works with CHASM, an artificial market platform:
1. Modelling: model the agents (Agent 1, Agent 2, ..., Agent n) of an artificial market.
2. Interaction: let the agents interact, e.g. supply prices.
3. Observe: record prices, wealth, etc.
4. Compare and contrast with observed market data ("stylized facts").
5. Modify the model and repeat.

Machine Learning Summary
Supervised learning is function fitting. First question: what function to find? Do such functions exist? If not, all efforts are futile. How to measure success? Finally, find such functions (ID3, NN, EA). Caution: avoid too much faith.
Unsupervised learning is generate and test: one knows what is good or bad when one sees it.

Artificial Neural Network
[Figure: a feed-forward network of input units, hidden units and output units; the output is compared with the target and the weights are updated.]
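Returning to the admission exercise above, a compact recursive ID3-style learner can be run on the 14 rows to see what tree it builds. This is a minimal sketch under the usual information-gain definition, not the course's code; whether the resulting tree expresses the intended rule (at least three A/B grades including an A) is the point of the exercise.

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def best_attribute(rows, attributes, target):
    """Pick the attribute with the highest information gain."""
    def gain(attr):
        g = entropy([r[target] for r in rows])
        for value in set(r[attr] for r in rows):
            subset = [r[target] for r in rows if r[attr] == value]
            g -= (len(subset) / len(rows)) * entropy(subset)
        return g
    return max(attributes, key=gain)

def id3(rows, attributes, target):
    """Return a class label (leaf) or an (attribute, {value: subtree}) pair."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:            # all examples agree: leaf
        return labels[0]
    if not attributes:                   # nothing left to split on: majority class
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, attributes, target)
    remaining = [a for a in attributes if a != attr]
    branches = {value: id3([r for r in rows if r[attr] == value], remaining, target)
                for value in sorted(set(r[attr] for r in rows))}
    return (attr, branches)

# The 14 rows of the admission exercise.
names = ["Maths", "English", "Physics", "IT", "Exam"]
data = [
    ("A","A","C","B","Admit"), ("A","A","A","C","Admit"), ("A","B","C","A","Admit"),
    ("A","A","B","A","Admit"), ("B","A","A","C","Admit"), ("B","B","C","A","Admit"),
    ("A","B","B","B","Admit"), ("B","A","B","C","Admit"), ("B","B","A","B","Admit"),
    ("C","A","B","A","Admit"), ("C","A","C","A","Reject"), ("A","C","C","B","Reject"),
    ("C","A","B","C","Reject"), ("B","C","A","C","Reject"),
]
rows = [dict(zip(names, row)) for row in data]
tree = id3(rows, ["Maths", "English", "Physics", "IT"], target="Exam")
print(tree)   # fits the 14 rows; whether it captures the underlying rule is another matter
```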

Terminology in GA
A string made of building blocks, e.g. 1 0 0 0 1 1 0, with values, is evaluated for its fitness. In GA terms: a chromosome, made of genes, with alleles. This is an example of a candidate solution in binary representation (as opposed to real coding). A population of such solutions is maintained.

Example Problem for GA
Maximize f(x) = 100 + 28x - x^2.
Optimal solution: x = 14, with f(x) = 296.
Use a 5-bit representation, e.g. binary 01101 = decimal 13, f(x) = 295.
Note: the representation issue can be tricky; the choice of representation is crucial.

Example Initial Population
To maximize f(x) = 100 + 28x - x^2:

    String   Decimal   f(x)   Weight   Accumulated
1   01010    10        280    28        28
2   01101    13        295    29        57
3   11000    24        196    19        76
4   10101    21        247    24       100
                  Sum: 1,018   Average: 254.5

Selection in Evolutionary Computation
Roulette wheel method: the fitter a string, the more chance it has of being selected.
[Figure: roulette wheel with slices 01010 (28%), 01101 (29%), 11000 (19%), 10101 (24%).]

Crossover
Pairs of selected parents exchange their tails at a cut point; for example, crossing 10101 and 01010 after the second bit gives 10010 (decimal 18, f(x) = 280). In the example the new population is 10010 (18, f = 280), 01101 (13, f = 295), 01110 (14, f = 296) and 01001 (9, f = 271); the sum of fitness rises to 1,142 and the average to 285.5.

Evolutionary Computation: Model-based Generate & Test
A model (to evolve) could be a population of solutions or a probability model. Generate: select, then create or mutate vectors or trees to produce a candidate solution. A candidate solution could be a vector of variables or a tree. Test: observe its performance; the fitness of a solution is application-dependent (e.g. drug testing). Feedback from the test is used to update the model.
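A compact sketch of the GA described above: 5-bit strings, fitness f(x) = 100 + 28x - x^2, roulette-wheel selection and one-point crossover. The mutation rate, the number of generations and the random seed are my own assumptions, since the slides do not specify them.

```python
import random

def fitness(bits):
    """f(x) = 100 + 28x - x^2, where x is the decimal value of the 5-bit string."""
    x = int("".join(map(str, bits)), 2)
    return 100 + 28 * x - x * x

def roulette(population, fits):
    """Roulette-wheel selection: probability proportional to fitness."""
    r = random.uniform(0, sum(fits))
    acc = 0.0
    for individual, f in zip(population, fits):
        acc += f
        if acc >= r:
            return individual
    return population[-1]

def crossover(parent1, parent2):
    """One-point crossover of two equal-length bit strings."""
    point = random.randint(1, len(parent1) - 1)
    return parent1[:point] + parent2[point:], parent2[:point] + parent1[point:]

def mutate(bits, rate=0.02):
    """Flip each bit with a small probability (rate assumed, not from the slides)."""
    return [1 - b if random.random() < rate else b for b in bits]

random.seed(0)
population = [[0,1,0,1,0], [0,1,1,0,1], [1,1,0,0,0], [1,0,1,0,1]]  # initial strings from the slide
for generation in range(20):
    fits = [fitness(ind) for ind in population]
    next_gen = []
    while len(next_gen) < len(population):
        child1, child2 = crossover(roulette(population, fits), roulette(population, fits))
        next_gen += [mutate(child1), mutate(child2)]
    population = next_gen

best = max(population, key=fitness)
print("best string:", "".join(map(str, best)),
      "x =", int("".join(map(str, best)), 2),
      "f(x) =", fitness(best))   # the problem's optimum is x = 14, f(x) = 296
```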