CSCE 478/878 Lecture 2: Concept Learning and the General-to-Specific Ordering


Outline

(Stephen D. Scott; adapted from Tom Mitchell's slides)

[read Chapter 2]
- Learning from examples
- General-to-specific ordering over hypotheses
- Version spaces and the candidate elimination algorithm
- Picking new examples (making queries)
- The need for inductive bias

A Concept Learning Task: EnjoySport

Sky    AirTemp  Humid   Wind    Water  Forecast  EnjoySport
Sunny  Warm     Normal  Strong  Warm   Same      Yes
Sunny  Warm     High    Strong  Warm   Same      Yes
Rainy  Cold     High    Strong  Warm   Change    No
Sunny  Warm     High    Strong  Cool   Change    Yes

Goal: output a hypothesis to predict the labels of future examples. Note: this simple approach assumes no noise; it illustrates the key concepts.

How to Represent the Hypothesis?

There are many possible representations. Here, a hypothesis h will be a conjunction of constraints on attributes. Each constraint can be:
- a specific value (e.g., Water = Warm)
- "don't care" (i.e., Water = ?)
- no value allowed (i.e., Water = Ø)

E.g., <Sunny, ?, ?, Strong, ?, Same> means: if Sky = Sunny and Wind = Strong and Forecast = Same, then predict Yes, else predict No.

Prototypical Concept Learning Task

Given:
- Instance space X, e.g., possible days, each described by the attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast [all possible attribute values are listed in the textbook's table]
- Hypothesis class H, e.g., conjunctions of literals, such as <?, Cold, High, ?, ?, ?>
- Training examples D: positive and negative examples of the target function c: <x_1, c(x_1)>, ..., <x_m, c(x_m)>, where x_i ∈ X and c : X → {0, 1}, e.g., c = EnjoySport

Determine: a hypothesis h ∈ H such that h(x) = c(x) for all x ∈ X.

Typically X is exponentially or infinitely large, so in general we can never be sure that h(x) = c(x) for all x ∈ X (this is possible only in special restricted, theoretical cases). Instead, we settle for a good approximation, e.g., h(x) = c(x) for all x ∈ D.

The inductive learning hypothesis: any hypothesis found to approximate the target function well over a sufficiently large set of training examples D will also approximate the target function well over other unobserved examples.
We will study this more quantitatively later.
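As a concreteness check, the hypothesis representation above can be sketched in Python. This is a minimal sketch, not from the lecture: the tuple encoding ('?' for "don't care", None for the empty constraint Ø) and the names `predict`, `h`, `x_pos`, `x_neg` are illustrative assumptions.

```python
# A hypothesis is a tuple of attribute constraints over
# (Sky, AirTemp, Humid, Wind, Water, Forecast):
#   '?'  = don't care, any value satisfies the constraint
#   None = empty constraint Ø, satisfied by no value

def predict(h, x):
    """Return True (Yes) iff instance x satisfies every constraint in h."""
    # None never equals a value, so an Ø constraint correctly rejects all x
    return all(c == "?" or c == v for c, v in zip(h, x))

# <Sunny, ?, ?, Strong, ?, Same> from the slide
h = ("Sunny", "?", "?", "Strong", "?", "Same")

x_pos = ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")   # labeled Yes
x_neg = ("Rainy", "Cold", "High", "Strong", "Warm", "Change")   # labeled No

print(predict(h, x_pos))  # True
print(predict(h, x_neg))  # False
```

Note that `predict` encodes exactly the "if Sky = Sunny and Wind = Strong and Forecast = Same then Yes else No" reading: every non-'?' constraint must match.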

The More-General-Than Relation

Definition: h_j ≥_g h_k iff (∀x ∈ X) [(h_k(x) = 1) → (h_j(x) = 1)]. So ≥_g induces a partial order on the hypotheses in H; >_g (strictly more general) is defined similarly.

E.g., consider the hypotheses
h_1 = <Sunny, ?, ?, Strong, ?, ?>
h_2 = <Sunny, ?, ?, ?, ?, ?>
h_3 = <Sunny, ?, ?, ?, Cool, ?>
and the instances
x_1 = <Sunny, Warm, High, Strong, Cool, Same>
x_2 = <Sunny, Warm, High, Light, Warm, Same>
Then h_2 ≥_g h_1 and h_2 ≥_g h_3, while h_1 and h_3 are incomparable.

Find-S Algorithm (Find Maximally Specific Hypothesis)

1. Initialize h to <Ø, Ø, Ø, Ø, Ø, Ø>, the most specific hypothesis in H
2. For each positive training instance x:
   For each attribute constraint a_i in h:
     If the constraint a_i in h is satisfied by x, then do nothing
     Else replace a_i in h by the next more general constraint that is satisfied by x
3. Output hypothesis h

Hypothesis Space Search by Find-S

Training examples:
x_1 = <Sunny, Warm, Normal, Strong, Warm, Same>, +
x_2 = <Sunny, Warm, High, Strong, Warm, Same>, +
x_3 = <Rainy, Cold, High, Strong, Warm, Change>, -
x_4 = <Sunny, Warm, High, Strong, Cool, Change>, +

Hypothesis sequence:
h_0 = <Ø, Ø, Ø, Ø, Ø, Ø>
h_1 = <Sunny, Warm, Normal, Strong, Warm, Same>
h_2 = <Sunny, Warm, ?, Strong, Warm, Same>
h_3 = <Sunny, Warm, ?, Strong, Warm, Same>
h_4 = <Sunny, Warm, ?, Strong, ?, ?>

Why can we ignore the negative examples?

Complaints about Find-S

- Assuming there exists some function in H consistent with D, Find-S will find one. But Find-S cannot detect whether there are other consistent hypotheses, or how many there are. In other words, if c ∈ H, has Find-S found it?
- Is a maximally specific hypothesis really the best one? Depending on H, there might be several maximally specific hypotheses, and Find-S doesn't backtrack.
- It is not robust against errors or noise, and it ignores negative examples.

Version Spaces

A hypothesis h is consistent with a set of training examples D of target concept c if and only if h(x) = c(x) for each training example <x, c(x)> in D:

Consistent(h, D) ≡ (∀<x, c(x)> ∈ D) h(x) = c(x)

The version space VS_{H,D}, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with all training examples in D:

VS_{H,D} ≡ {h ∈ H : Consistent(h, D)}

The List-Then-Eliminate Algorithm

1. VersionSpace ← a list containing every hypothesis in H
2. For each training example <x, c(x)>: remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
3. Output the list of hypotheses in VersionSpace

Problem: this requires Ω(|H|) time to enumerate all hypotheses. We can address many of these concerns by tracking the entire set of consistent hypotheses more compactly.
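The Find-S algorithm from a few slides back is simple enough to sketch directly. This is a minimal sketch under the same assumed tuple encoding ('?' for "don't care", None for Ø); the function name is illustrative.

```python
# Find-S: start at the most specific hypothesis and generalize just enough
# to cover each positive example, ignoring negatives.

def find_s(examples):
    """examples: list of (instance_tuple, is_positive) pairs."""
    n = len(examples[0][0])
    h = [None] * n                      # h_0 = <Ø, Ø, ..., Ø>
    for x, positive in examples:
        if not positive:                # Find-S ignores negative examples
            continue
        for i, v in enumerate(x):
            if h[i] is None:            # Ø -> the observed value
                h[i] = v
            elif h[i] != v and h[i] != "?":
                h[i] = "?"              # next more general constraint
    return tuple(h)

train = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
print(find_s(train))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```

The output matches h_4 from the hypothesis-space-search trace above.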

Representing Version Spaces

The general boundary G of version space VS_{H,D} is the set of its maximally general members. The specific boundary S is the set of its maximally specific members. Every member of the version space lies between these boundaries:

VS_{H,D} = {h ∈ H : (∃s ∈ S)(∃g ∈ G)(g ≥_g h ≥_g s)}

Example version space:
S: {<Sunny, Warm, ?, Strong, ?, ?>}
G: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}

Candidate Elimination Algorithm

G ← set of maximally general hypotheses in H
S ← set of maximally specific hypotheses in H
For each training example d ∈ D, do:
  If d is a positive example:
    Remove from G any hypothesis inconsistent with d
    For each hypothesis s ∈ S that is not consistent with d:
      Remove s from S
      Add to S all minimal generalizations h of s such that
        1. h is consistent with d, and
        2. some member of G is more general than h
      Remove from S any hypothesis that is more general than another hypothesis in S
  If d is a negative example:
    Remove from S any hypothesis inconsistent with d
    For each hypothesis g ∈ G that is not consistent with d:
      Remove g from G
      Add to G all minimal specializations h of g such that
        1. h is consistent with d, and
        2. some member of S is more specific than h
      Remove from G any hypothesis that is less general than another hypothesis in G

Example Trace

S_0: {<Ø, Ø, Ø, Ø, Ø, Ø>}
G_0: {<?, ?, ?, ?, ?, ?>}

Training examples:
1. <Sunny, Warm, Normal, Strong, Warm, Same>, EnjoySport = Yes
   S_1: {<Sunny, Warm, Normal, Strong, Warm, Same>}
   G_1 = G_0: {<?, ?, ?, ?, ?, ?>}
2. <Sunny, Warm, High, Strong, Warm, Same>, EnjoySport = Yes
   S_2: {<Sunny, Warm, ?, Strong, Warm, Same>}
   G_2 = G_1: {<?, ?, ?, ?, ?, ?>}
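The candidate elimination loop above can be sketched for the conjunctive language. This is a sketch under stated assumptions, not the lecture's code: the per-attribute value sets in `domains` are assumed (the lecture defers them to the textbook's table), and all function and variable names are illustrative.

```python
# Candidate elimination over conjunctive hypotheses:
#   '?'  = don't care, None = empty constraint Ø.

def matches(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def more_general_eq(h1, h2):
    """h1 >=_g h2 in the conjunctive language."""
    return all(a == "?" or b is None or a == b for a, b in zip(h1, h2))

def min_generalization(s, x):
    """The unique minimal generalization of s that covers positive x."""
    return tuple(v if c is None else (c if c == v else "?")
                 for c, v in zip(s, x))

def candidate_elimination(examples, domains):
    n = len(domains)
    S = [tuple([None] * n)]             # maximally specific boundary
    G = [tuple(["?"] * n)]              # maximally general boundary
    for x, positive in examples:
        if positive:
            G = [g for g in G if matches(g, x)]
            S = [min_generalization(s, x) for s in S]
            # keep only generalizations still bounded above by some g in G
            S = [s for s in S if any(more_general_eq(g, s) for g in G)]
            S = [s for s in S
                 if not any(t != s and more_general_eq(s, t) for t in S)]
        else:
            S = [s for s in S if not matches(s, x)]
            new_G = []
            for g in G:
                if not matches(g, x):
                    new_G.append(g)
                    continue
                for i in range(n):      # minimal specializations of g
                    if g[i] != "?":
                        continue
                    for v in domains[i] - {x[i]}:
                        h = g[:i] + (v,) + g[i + 1:]
                        # keep only if some member of S is more specific
                        if any(more_general_eq(h, s) for s in S):
                            new_G.append(h)
            G = [g for g in new_G
                 if not any(h != g and more_general_eq(h, g) for h in new_G)]
    return S, G

train = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
domains = [                             # assumed per-attribute value sets
    {"Sunny", "Cloudy", "Rainy"},       # Sky
    {"Warm", "Cold"},                   # AirTemp
    {"Normal", "High"},                 # Humid
    {"Strong", "Light"},                # Wind
    {"Warm", "Cool"},                   # Water
    {"Same", "Change"},                 # Forecast
]
S, G = candidate_elimination(train, domains)
print(S)        # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print(sorted(G))
```

Running this reproduces the trace: after the negative example, G holds the three hypotheses shown for G_3 (the candidate <?, ?, Normal, ?, ?, ?> is dropped because no member of S is more specific than it), and the final boundaries match S_4 and G_4.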

Example Trace (continued)

3. <Rainy, Cold, High, Strong, Warm, Change>, EnjoySport = No
   S_3 = S_2: {<Sunny, Warm, ?, Strong, Warm, Same>}
   G_3: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}
4. <Sunny, Warm, High, Strong, Cool, Change>, EnjoySport = Yes
   S_4: {<Sunny, Warm, ?, Strong, ?, ?>}
   G_4: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}

Why does G_3 contain only three hypotheses? E.g., why isn't <?, ?, Normal, ?, ?, ?> ∈ G_3?

Final Version Space

S: {<Sunny, Warm, ?, Strong, ?, ?>}
G: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}

Aside: Asking Queries

What if the learner can ask queries, i.e., present an example x and have a teacher (oracle) give its classification? [This is like running experiments.] Why is <Sunny, Warm, Normal, Light, Warm, Same> a good query to make? In general, what is a good query strategy?

Generalizing Beyond Training Data

Given the final S and G above, consider classifying:
(1) <Sunny, Warm, Normal, Strong, Cool, Change> [unanimous Yes over the version space]
(2) <Rainy, Cool, Normal, Light, Warm, Same> [unanimous No over the version space]
(3) <Sunny, Warm, Normal, Light, Warm, Same> [half No, half Yes]
Why do we believe we can accurately classify (1) and (2), but not (3)?

An UNBiased Learner

What if we assumed nothing about the structure of c? Then learning becomes rote memorization. E.g., suppose c can be any boolean function over three variables, with
D = {<000, +>, <011, +>, <100, ->, <101, ->}.
Then the version space is defined by the boundaries S = {(000) ∨ (011)} and G = {¬((100) ∨ (101))}.

Originally VS = 2^X, the power set of X; now it is the set of truth tables consistent with the following:

x    c(x)
000  +
001  ?
010  ?
011  +
100  -
101  -
110  ?
111  ?

Since there are 4 holes, |VS| = 2^4 = 16, the number of ways to fill the holes, and for any as-yet-unclassified example, exactly half of the hypotheses in VS classify it as + and half as -. Thus, we cannot generalize without bias!
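The unbiased-learner count above can be verified by brute force. This sketch (names assumed) enumerates all truth tables over the 8 instances of three boolean variables, keeps those consistent with D, and checks the size and vote splits.

```python
from itertools import product

# the 8 instances over three boolean variables: '000', '001', ..., '111'
instances = ["".join(bits) for bits in product("01", repeat=3)]

# labeled data from the slide: 1 = positive, 0 = negative
D = {"000": 1, "011": 1, "100": 0, "101": 0}

# with no bias, a hypothesis is an arbitrary labeling of all 8 instances;
# the version space is every labeling that agrees with D
version_space = []
for labels in product([0, 1], repeat=len(instances)):
    h = dict(zip(instances, labels))
    if all(h[x] == y for x, y in D.items()):
        version_space.append(h)

print(len(version_space))  # 16, i.e. 2^4 ways to fill the four holes

# every unlabeled instance gets a perfect split of positive votes
for x in instances:
    if x not in D:
        votes = sum(h[x] for h in version_space)
        print(x, votes, "of", len(version_space))  # always 8 of 16
```

Each unlabeled instance corresponds to a free bit in the truth table, so exactly half of the surviving labelings set it to +: the version space gives no guidance at all off the training data.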

Inductive Bias

Consider a concept learning algorithm L, instances X, a target concept c, and training examples D_c = {<x, c(x)>}. Let L(x_i, D_c) denote the classification assigned to instance x_i by L after training on data D_c.

Definition: the inductive bias of L is any minimal set of assertions B such that for any target concept c and corresponding training examples D_c,

(∀x_i ∈ X) [(B ∧ D_c ∧ x_i) ⊢ L(x_i, D_c)]

where y ⊢ z means y logically entails z.

Inductive Systems and Equivalent Deductive Systems

Inductive system: training examples and a new instance go into the candidate elimination algorithm using hypothesis space H, which outputs a classification of the new instance, or "don't know".

Equivalent deductive system: the same training examples and new instance, together with the assertion "H contains the target concept", go into a theorem prover, which outputs a classification of the new instance, or "don't know". Here the inductive bias is made explicit.

Three Learners with Different Biases

1. Rote learner: store the examples; classify x iff it matches a previously observed example.
2. Version space candidate elimination algorithm.
3. Find-S.

Generally, a stronger bias means the ability to generalize on more examples from X, but the correctness of the learner depends on the correctness of the bias!
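The contrast between the first and third learners above can be made concrete. This is a small self-contained illustration (the tuple encoding, Find-S restated here for self-containment, and all names are assumptions): the rote learner's empty bias yields no generalization, while Find-S's conjunctive bias commits to an answer on an unseen instance.

```python
def rote_classify(memory, x):
    """No bias: answer only for instances seen verbatim in training."""
    for xi, label in memory:
        if xi == x:
            return "Yes" if label else "No"
    return "don't know"

def find_s(examples):
    """Conjunctive bias: assumes the target concept is a conjunction."""
    h = None
    for x, positive in examples:
        if positive:
            h = x if h is None else tuple(
                a if a == b else "?" for a, b in zip(h, x))
    return h

def conj_classify(h, x):
    return "Yes" if all(a == "?" or a == v for a, v in zip(h, x)) else "No"

train = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
# the query instance on which the version space splits evenly
x_new = ("Sunny", "Warm", "Normal", "Light", "Warm", "Same")

print(rote_classify(train, x_new))           # don't know
print(conj_classify(find_s(train), x_new))   # No (Wind = Light violates h)
```

Find-S commits to "No" here even though the version space is split on this instance; candidate elimination, with its weaker bias, would answer "don't know". Whether that commitment is a feature or a bug depends on whether the conjunctive bias is correct for c.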