Incremental Learning and Concept Drift: Overview


Outline
- Incremental learning
- The algorithm ID5R
- Taxonomy of incremental learning
- Concept drift

Motivation

Batch scenario:
- all examples are available at once
- a single run over all examples
- just one hypothesis

Stream scenario:
- new examples arrive all the time
- a series of hypotheses
- it is not a good idea to collect all data and build a hypothesis from scratch every time
- better: reuse the already built hypothesis and modify it

Another point: in some situations, accuracy is not the main concern:
- sometimes one wants a quick estimate / rough approximation very quickly
- one wants to keep track of the process of hypothesis creation

Desirable properties
- the average cost of updating the tree should be much lower than the average cost of building a new tree from scratch
- the update cost should be independent of the number of training examples
- the tree should depend only on the set of examples, not on their order

The algorithm ID5R

Developed by Utgoff. Basic ideas:
- keep counts at each node (these allow computing entropies)
- when a new example arrives and the tree needs to be changed: restructure the tree (pull up) instead of simply deleting subtrees
- leaves maintain lists of the examples seen

Assumption: binary classification problem.

Decision Trees created by ID5R

Extend the definition of a tree:
- leaf nodes: additionally contain a set of instance descriptions
- non-leaf nodes (decision nodes): additionally contain the set of potential candidate tests, each with positive and negative counts for each possible outcome
- branches: additionally contain positive and negative counts for their outcome

Example: [figure]

Classification is done as usual.
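A minimal sketch of this extended node structure in Python; the names are illustrative choices of mine, not from Utgoff's original implementation:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Counts:
    pos: int = 0  # number of positive examples seen
    neg: int = 0  # number of negative examples seen

@dataclass
class Node:
    # Leaf: `test` is None and `instances` holds the stored examples.
    test: Optional[str] = None
    instances: list = field(default_factory=list)
    # Decision node: per candidate test, per outcome, positive/negative counts.
    # candidate_counts[test][value] -> Counts
    candidate_counts: dict = field(default_factory=dict)
    # One branch per observed outcome of `test`: children[value] -> Node
    children: dict = field(default_factory=dict)
    # Per-branch counts: branch_counts[value] -> Counts
    branch_counts: dict = field(default_factory=dict)

    def is_leaf(self) -> bool:
        return self.test is None
```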

ID5R: The algorithm

Start with an empty tree n.

ID5R(n, instance):
1. If n is empty: create a leaf, set the leaf's class to the class of instance, and set its set of instances to the singleton set consisting of instance.
2. If n is a leaf and instance is of the same class: add instance to the set of instances.
3. Otherwise:
   (a) If n is a leaf: expand it one level (choosing the test arbitrarily).
   (b) For all candidate tests at node n: update the counts.
   (c) If the current test is not the best one among all candidate tests:
       i. Restructure the tree n so that the best test is now at the root ("Pull-Up").
       ii. Recursively re-establish the best test in each subtree.
   (d) Recursively update the subtree along the path the instance takes. Grow a new branch if necessary.
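A hedged Python sketch of this update loop, building on the Node structure above; `expand_leaf`, `best_test`, `pull_up`, and `reestablish_best_tests` are assumed helpers (`pull_up` is sketched after the Pull-Up slide below), and the count bookkeeping is simplified compared to the full algorithm:

```python
def id5r_update(node, instance, label):
    """One incremental ID5R update with the example (instance, label).
    instance: dict mapping attribute name -> value; label: bool."""
    # Step 1: empty tree -> create a leaf holding just this example.
    if node is None:
        return Node(instances=[(instance, label)])

    if node.is_leaf():
        # Step 2: leaf of the same class -> just remember the example.
        if all(lbl == label for _, lbl in node.instances):
            node.instances.append((instance, label))
            return node
        # Step 3(a): classes now mixed -> expand one level (arbitrary test).
        node = expand_leaf(node)

    # Step 3(b): update the pos/neg counts of every candidate test here.
    for test, per_value in node.candidate_counts.items():
        counts = per_value.setdefault(instance[test], Counts())
        if label:
            counts.pos += 1
        else:
            counts.neg += 1

    # Step 3(c): if another candidate now has higher gain, pull it up.
    best = best_test(node)
    if best != node.test:
        node = pull_up(node, best)                      # i. restructure
        for value, child in node.children.items():
            node.children[value] = reestablish_best_tests(child)  # ii.

    # Step 3(d): recursively update along the branch the example takes;
    # `children.get` returning None grows a new branch for unseen values.
    value = instance[node.test]
    node.children[value] = id5r_update(node.children.get(value),
                                       instance, label)
    return node
```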

ID5R: An example

First example: (-, height=short, hair=blond, eyes=brown)
Hypothesis: [figure]

Second example: (-, height=tall, hair=dark, eyes=brown)
Hypothesis: [figure]

ID5R: An example (2)

Third example: (+, height=tall, hair=blond, eyes=blue)
(a) Expand the tree one level, choosing an arbitrary test.
(b) Update the counts. Note that height=... is held implicitly.
(c) If the current test is not the best one among all candidate tests:
    i. Restructure the tree so that the best test is now at the root ("Pull-Up").
    ii. Recursively re-establish the best test in each subtree.

What does "best" mean here? How do we restructure trees?

ID5R's measure

ID3 uses the information gain measure:

$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$$

$$\mathrm{Entropy}(S) = -\frac{|S^{+}|}{|S|}\log_2\frac{|S^{+}|}{|S|} - \frac{|S^{-}|}{|S|}\log_2\frac{|S^{-}|}{|S|}$$

where S is the training set, A a test, S^+ (S^-) the set of all positive (negative) examples in S, and S_v the subset of all examples in S for which test A has value v.

ID5R uses the same measure (but omits computing Entropy(S), since it is the same for all tests in a node).
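As a concrete illustration, a small sketch of computing the gain directly from the stored positive/negative counts (function names are mine, not from the lecture):

```python
import math

def entropy(pos: int, neg: int) -> float:
    """Binary entropy of a node with `pos` positive and `neg` negative examples."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for count in (pos, neg):
        if count > 0:
            p = count / total
            h -= p * math.log2(p)
    return h

def gain(pos: int, neg: int, per_value_counts: dict) -> float:
    """Information gain of a test whose outcome counts are given as
    {value: (pos_v, neg_v)}; pos/neg are the totals at the node."""
    total = pos + neg
    expected = sum((p + n) / total * entropy(p, n)
                   for p, n in per_value_counts.values())
    return entropy(pos, neg) - expected

# The candidate test `eyes` after the first three examples of the running
# example (1 positive, 2 negative): blue -> (1, 0), brown -> (0, 2).
print(gain(1, 2, {"blue": (1, 0), "brown": (0, 2)}))  # 0.918... (a perfect split)
```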

How to restructure trees?

Pull-Up of a test A:
1. If test A is already at the root, do nothing.
2. Otherwise:
   (a) recursively pull A up to the root of each immediate subtree (expanding leaves if necessary);
   (b) transpose the tree so that A is now at the root and the old root attribute is at the roots of all immediate subtrees.
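A hedged sketch of the transposition step (b), reusing the Node structure from above; recomputing the candidate-test counts after the swap is elided here, since ID5R recovers them from the statistics kept in the nodes:

```python
def transpose(node: Node, attr: str) -> Node:
    """Swap the root test with `attr`, assuming step (a) already pulled
    `attr` to the root of every immediate subtree."""
    old_test = node.test
    # Collect every outcome of `attr` observed anywhere below the root.
    attr_values = {v for child in node.children.values()
                   for v in child.children}
    new_root = Node(test=attr)
    for a in attr_values:
        # The subtree for attr=a tests the old root attribute; its branch
        # old_test=b reuses the old grandchild reached via (b, a).
        sub = Node(test=old_test)
        for b, child in node.children.items():
            if a in child.children:
                sub.children[b] = child.children[a]
        new_root.children[a] = sub
    return new_root
```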

ID5R: An example (3)

Back in our example, the situation is: [figure]

The best candidate test (by Gain) is eyes. Pull up eyes.
First, recursively pull up eyes to the root of each immediate subtree (expanding leaves if necessary).
Then transpose the tree: [figure]

ID5R: An example (4)

Next, recursively re-establish the best test in each subtree. Situation before: [figure]

Currently, height is the test in the subtree just by chance; we have to descend into that subtree. Just by chance, height is also the best test in the right subtree, so we get: [figure]

Note: the right subtree could be collapsed, but ID5R does not collapse it. Why? It is possible that the tree will be expanded again, and experiments showed that the effort would not pay off.

ID5R: An example (5)

4th example: (-, height=tall, hair=dark, eyes=blue)
- eyes is still the best attribute at the root
- expand the left node, e.g. by the test height
- the best test here is hair
- restructure the left subtree; this gives: [figure]

ID5R: An example (6)

5th example: (-, height=short, hair=dark, eyes=blue)
- the best test at the root is now hair
- we need to pull up hair and re-establish the best tests in the subtrees
- result: [figure]

ID5R: An example (7)

6th example: (-, height=tall, hair=red, eyes=blue)
- the best test at the root is still hair, but there is a new value: red
- grow a new branch; result: [figure]

ID5R: An example (finished)

The remaining examples finally lead to: [figure]
This is equivalent to: [figure]

Properties of ID5R

ID5R computes the same hypothesis as ID3!

Complexities (d = number of attributes, b = maximum number of possible values per test, n = number of training instances):

                             ID3         ID3'         ID5R
instance-count additions     O(n d^2)    O(n^2 d^2)   O(n d b^d)
entropy computations         O(b^d)      O(n b^d)     O(n b^d)

ID3: only one hypothesis is constructed!
ID3': for each new example, a new tree is computed from scratch.

Taxonomy of incremental algorithms

Full example memory: store all examples
- allows efficient restructuring
- good accuracy
- huge storage needed
- Examples: ID5, ID5R, IDL, ITI, BOAT

No example memory: only store statistical information in the nodes
- loss of accuracy (depending on the information stored; or, again, huge storage needed)
- relatively low storage space
- Examples: ID4, MSC, G, TG

Partial example memory: only store selected examples
- trade-off between storage space and accuracy
- Examples: HILLARY, FLORA, MetaL, AQ-PM, INC-DT

Partial Example Memory Learning

Problem: how to select the examples?

Obvious approach: only keep the last n examples (windowing). How large should n be?
- First variant: let n be fixed.
- Second variant: adapt n depending on the data situation.

Other ideas for which examples to store (a sketch of the windowing approach follows below):
- store only examples that have been classified incorrectly
- store only examples believed to be important:
  - AQ-PM: store examples that are close to the separating hyperplanes
  - INC-DT: store examples where the classifier believes it has weak competence
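A minimal sketch of the fixed-size windowing variant, assuming some incremental learner with `update` and `remove` operations (the `remove` operation is an assumption of this sketch, not something the slides specify):

```python
from collections import deque

class FixedWindowLearner:
    """Keep only the last n examples; the model is always trained
    on exactly the window's contents."""
    def __init__(self, base_learner, n: int):
        self.base = base_learner
        self.window = deque(maxlen=n)

    def observe(self, instance, label):
        # If the window is full, unlearn the oldest example before it
        # is evicted by the bounded deque.
        if len(self.window) == self.window.maxlen:
            old_instance, old_label = self.window[0]
            self.base.remove(old_instance, old_label)  # assumed operation
        self.window.append((instance, label))
        self.base.update(instance, label)
```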

Concept Drift

Usual assumption in ML: examples are drawn i.i.d. (independently and identically distributed) w.r.t. a fixed distribution.

In real-world situations, the target concept as well as the data distribution change over time. Example: spam filtering.

- Concept drift: slow changes
- Concept shift: rapid changes

Approaches to cope with concept drift/shift

Aging: weight data according to its age (and/or its utility for the classification task).

Windowing:
- fixed size
- adaptive size: in phases with no or little concept drift, enlarge the window; in phases with concept drift, decrease the window size (drop the oldest examples)

How can we detect concept drift/shift? (A sketch combining the first idea with an adaptive window follows below.)
- performance measures, e.g. accuracy
- properties of the classification model, e.g. complexity of the hypothesis
- properties of the data, e.g. class distribution, attribute-value distributions, current top attributes
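To make the adaptive-window idea concrete, here is a hedged sketch of accuracy-based drift detection; the thresholds and the shrink/grow factors are illustrative choices, not values from the lecture, and `remove` is again an assumed operation of the base learner:

```python
from collections import deque

class AdaptiveWindowLearner:
    """Monitor accuracy on recent examples; shrink the window when
    accuracy drops (suspected drift), grow it while performance is stable."""
    def __init__(self, base_learner, n_min=50, n_max=2000, drop=0.1):
        self.base = base_learner
        self.n = n_min                       # current window size
        self.n_min, self.n_max = n_min, n_max
        self.drop = drop                     # accuracy drop signalling drift
        self.window = deque()
        self.recent_correct = deque(maxlen=n_min)
        self.best_accuracy = 0.0

    def observe(self, instance, label):
        # Test-then-train: evaluate before learning from the example.
        self.recent_correct.append(self.base.predict(instance) == label)
        accuracy = sum(self.recent_correct) / len(self.recent_correct)
        self.best_accuracy = max(self.best_accuracy, accuracy)

        if self.best_accuracy - accuracy > self.drop:
            # Suspected drift: halve the window, reset the baseline.
            self.n = max(self.n_min, self.n // 2)
            self.best_accuracy = accuracy
        else:
            self.n = min(self.n_max, self.n + 1)  # stable: grow slowly

        self.window.append((instance, label))
        while len(self.window) > self.n:
            old_instance, old_label = self.window.popleft()
            self.base.remove(old_instance, old_label)  # assumed operation
        self.base.update(instance, label)
```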