Applied inductive learning - Lecture 4

1 Applied inductive learning - Lecture 4
Louis Wehenkel
Department of Electrical Engineering and Computer Science, University of Liège
Montefiore - Liège - October 10, 2005
Find slides: lwh/aia/
Louis Wehenkel AIA... (1/12)

2 Batch-mode Supervised Learning

3 Batch-mode Supervised Learning (Notations)
Objects (or observations): $LS = \{o_1, \ldots, o_N\}$
Attribute vector: $a^i = (a_1(o_i), \ldots, a_n(o_i))^T$, $i = 1, \ldots, N$.
Outputs: $y^i = y(o_i)$ or $c^i = c(o_i)$, $i = 1, \ldots, N$.
LS Table:
o     a_1(o)   a_2(o)   ...   a_n(o)   y(o)
1     a^1_1    a^1_2    ...   a^1_n    y^1
2     a^2_1    a^2_2    ...   a^2_n    y^2
...   ...      ...      ...   ...      ...
N     a^N_1    a^N_2    ...   a^N_n    y^N

4 Nearest neighbor methods
Intuition: similar objects should have similar output values.
NB: all inputs are numerical scalars.
Define a distance measure in the input space:
$d_a(o, o') = (a(o) - a(o'))^T (a(o) - a(o')) = \sum_{i=1}^{n} (a_i(o) - a_i(o'))^2$
Nearest neighbor: $NN_a(o, LS) = \arg\min_{o' \in LS} d_a(o, o')$
Extrapolate output from nearest neighbor: $\hat{y}_{NN}(o) = y(NN_a(o, LS))$
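The nearest-neighbor rule on this slide can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the course's implementation: the function names (`sq_dist`, `nearest_neighbor`, `predict_1nn`) and the toy learning set are invented for the example.

```python
from typing import Sequence, Tuple

# One LS entry: (attribute vector a(o'), output y(o')). Hypothetical layout.
Sample = Tuple[Sequence[float], float]

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """d_a(o, o'): sum of squared attribute differences."""
    return sum((x - z) ** 2 for x, z in zip(a, b))

def nearest_neighbor(query: Sequence[float], ls: Sequence[Sample]) -> Sample:
    """NN_a(o, LS): the stored object minimising d_a(o, o')."""
    return min(ls, key=lambda sample: sq_dist(query, sample[0]))

def predict_1nn(query: Sequence[float], ls: Sequence[Sample]) -> float:
    """y_NN(o): copy the output of the nearest neighbor."""
    return nearest_neighbor(query, ls)[1]

# Toy learning set with made-up values.
LS = [((0.0, 0.0), 1.0), ((1.0, 0.0), 2.0), ((0.0, 2.0), 3.0)]
print(predict_1nn((0.9, 0.2), LS))  # the closest stored object is (1.0, 0.0)
```

"Training" here is nothing more than keeping `LS` in memory; all the work happens at prediction time, when the query is compared with every stored object.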

5 [Figure: scatter plot of 3000 learning states, axes Pu (MW) and Qu (Mvars); zoom around state 4984 with the nearest neighbor (state 2276) marked "+"; annotations Pu=1090MW, Qu=-20Mvar.]

6 Computational
Training: storage of the LS ($n \times N$)
Testing: $N$ distance computations, i.e. $N \times n$ computations
Accuracy
Asymptotically ($N \to \infty$): suboptimal (except if problem is deterministic)
Strong dependence on choice of attributes, hence weighting of attributes
$d^w_a(o, o') = \sum_{i=1}^{n} w_i (a_i(o) - a_i(o'))^2$
or attribute selection...
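The dependence on the choice of attributes can be made concrete with a small sketch of the weighted distance $d^w_a$. The names (`weighted_sq_dist`, `predict_weighted_1nn`), the toy data and the weight values are assumptions chosen for illustration; setting a weight to zero is used here as a simple stand-in for attribute selection.

```python
from typing import Sequence, Tuple

def weighted_sq_dist(a: Sequence[float], b: Sequence[float],
                     w: Sequence[float]) -> float:
    """d^w_a(o, o'): squared differences weighted per attribute."""
    return sum(wi * (x - z) ** 2 for wi, x, z in zip(w, a, b))

def predict_weighted_1nn(query: Sequence[float],
                         ls: Sequence[Tuple[Sequence[float], str]],
                         w: Sequence[float]) -> str:
    """1-NN prediction under the weighted distance (N distance computations)."""
    return min(ls, key=lambda s: weighted_sq_dist(query, s[0], w))[1]

# Made-up data: the second attribute has a much larger numeric range.
LS_W = [((1.0, 100.0), "A"), ((5.0, 10.0), "B")]
QUERY_W = (1.2, 20.0)

# Equal weights: the large-scale second attribute dominates the distance.
print(predict_weighted_1nn(QUERY_W, LS_W, (1.0, 1.0)))
# Weight 0 on the second attribute drops it (attribute selection).
print(predict_weighted_1nn(QUERY_W, LS_W, (1.0, 0.0)))
```

With the same learning set and the same query, the two weight vectors select different neighbors, which is why the weights (or the retained attributes) need to be chosen with care.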

7 1. The k-NN method:
Instead of using only the nearest neighbor, one uses the k (a number to be determined) nearest neighbors:
$kNN_a(o, LS) = \mathrm{First}(k, \mathrm{Sort}(LS, d_a(o, \cdot)))$
Extrapolate from the k nearest neighbors, e.g. for regression
$\hat{y}_{kNN}(o) = k^{-1} \sum_{o' \in kNN_a(o, LS)} y(o')$
and majority class for classification.
k makes it possible to control overfitting (like pruning of trees).
Asymptotically ($N \to \infty$): $k(N) \to \infty$ and $k(N)/N \to 0$ give an optimal method (minimum error).
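A minimal sketch of the k-NN rule, following the First(k, Sort(...)) definition literally. The helper names and the toy learning sets are assumptions made for the example; sorting the whole LS is the simplest reading of the formula, not necessarily the most efficient one.

```python
from collections import Counter
from typing import Any, List, Sequence, Tuple

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """d_a(o, o'): sum of squared attribute differences."""
    return sum((x - z) ** 2 for x, z in zip(a, b))

def knn(query: Sequence[float], ls: Sequence[Tuple[Sequence[float], Any]],
        k: int) -> List[Tuple[Sequence[float], Any]]:
    """kNN_a(o, LS) = First(k, Sort(LS, d_a(o, .)))."""
    return sorted(ls, key=lambda s: sq_dist(query, s[0]))[:k]

def knn_regress(query: Sequence[float], ls, k: int) -> float:
    """Average output of the k nearest neighbors."""
    neighbors = knn(query, ls, k)
    return sum(y for _, y in neighbors) / len(neighbors)

def knn_classify(query: Sequence[float], ls, k: int):
    """Majority class among the k nearest neighbors.

    Ties go to the class met first among the sorted neighbors,
    an arbitrary choice made for this sketch.
    """
    votes = Counter(c for _, c in knn(query, ls, k))
    return votes.most_common(1)[0][0]

# Made-up one-dimensional learning sets.
LS_REG = [((0.0,), 0.0), ((1.0,), 1.0), ((2.0,), 4.0), ((3.0,), 9.0)]
LS_CLS = [((0.0,), "c1"), ((1.0,), "c2"), ((2.0,), "c2"), ((3.0,), "c1")]

print(knn_regress((1.2,), LS_REG, k=1))   # copies the single nearest output
print(knn_regress((1.2,), LS_REG, k=2))   # averages the two nearest outputs
print(knn_classify((1.2,), LS_CLS, k=3))  # majority vote over three neighbors
```

Increasing k averages over more neighbors and smooths the prediction, which is the overfitting control mentioned on the slide.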

8 2. Condensing and editing of the LS:
Condensing: remove useless objects from LS
Editing: remove outliers from LS
Apply first editing then condensing (see notes)
3. Automatic tuning of the weight vector w
Parzen windows and/or kernel methods:
$\hat{y}_K(o) = \sum_{o' \in LS} y(o') K(o, o')$
where $K(o, o')$ is a measure of similarity
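The kernel estimate $\hat{y}_K$ can be sketched as follows. The slide does not specify a particular kernel, so everything below is an assumption made for illustration: a Gaussian similarity with a hypothetical bandwidth parameter `h`, normalised so that the weights $K(o, o')$ sum to one over the LS.

```python
import math
from typing import Sequence, Tuple

def gaussian_similarity(a: Sequence[float], b: Sequence[float], h: float) -> float:
    """Unnormalised similarity exp(-d_a / (2 h^2)); h is an assumed bandwidth."""
    d = sum((x - z) ** 2 for x, z in zip(a, b))
    return math.exp(-d / (2.0 * h * h))

def kernel_predict(query: Sequence[float],
                   ls: Sequence[Tuple[Sequence[float], float]],
                   h: float) -> float:
    """y_K(o) = sum over LS of y(o') K(o, o').

    K(o, o') is taken as the Gaussian similarity divided by the sum of
    similarities over the LS (an assumption of this sketch).
    """
    sims = [gaussian_similarity(query, a, h) for a, _ in ls]
    total = sum(sims)
    if total == 0.0:
        raise ValueError("query too far from every LS object for this bandwidth")
    return sum(y * s / total for (_, y), s in zip(ls, sims))

# Made-up learning set with two objects.
LS_K = [((0.0,), 0.0), ((2.0,), 10.0)]
print(kernel_predict((1.0,), LS_K, h=1.0))  # equidistant query: equal weights
print(kernel_predict((0.1,), LS_K, h=0.2))  # small h: close to a 1-NN answer
```

A small bandwidth makes the estimate behave like the nearest-neighbor rule, while a large one averages over most of the LS, so `h` plays a role similar to k in the k-NN method.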

9 Nearest neighbor, editing and condensing
[Figure: three panels: Initial LS, Edited LS, Condensed LS.]

10 Kernel defined by a regression tree:
Let $L_i$, $i = 1, \ldots, |T|$ denote the leaves of $T$.
Let $N_i$ denote the number of objects in the sub-LS of $L_i$.
Let $K_T(o, o')$ be equal to $N_i^{-1}$ if $o$ and $o'$ reach the same leaf $L_i$, and 0 otherwise.
Then the approximation of the regression tree may be written as
$\hat{y}_T(o) = \sum_{o' \in LS} y(o') K_T(o, o')$.
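The tree kernel can be checked numerically on a toy example: summing $y(o') K_T(o, o')$ over the LS gives the average output of the leaf reached by $o$. The tree below is a hypothetical three-leaf tree with fixed thresholds, hard-coded in `leaf_of`; the function names and data are assumptions made for the illustration.

```python
from collections import Counter
from typing import Sequence, Tuple

def leaf_of(a: Sequence[float]) -> int:
    """Index of the leaf reached by attribute vector a.

    Hypothetical tree on the first attribute, thresholds 1.0 and 2.0.
    """
    x = a[0]
    if x < 1.0:
        return 0
    return 1 if x < 2.0 else 2

def tree_kernel(a: Sequence[float], b: Sequence[float], leaf_sizes: Counter) -> float:
    """K_T(o, o'): 1/N_i if both objects reach the same leaf L_i, else 0."""
    leaf = leaf_of(a)
    return 1.0 / leaf_sizes[leaf] if leaf == leaf_of(b) else 0.0

def tree_predict(query: Sequence[float],
                 ls: Sequence[Tuple[Sequence[float], float]]) -> float:
    """y_T(o) = sum over LS of y(o') K_T(o, o')."""
    leaf_sizes = Counter(leaf_of(a) for a, _ in ls)  # N_i for each leaf
    return sum(y * tree_kernel(query, a, leaf_sizes) for a, y in ls)

# Made-up learning set: leaves hold outputs {1, 3}, {10} and {20, 30}.
LS_T = [((0.2,), 1.0), ((0.8,), 3.0), ((1.5,), 10.0),
        ((2.5,), 20.0), ((2.9,), 30.0)]

print(tree_predict((0.5,), LS_T))  # average of 1 and 3
print(tree_predict((2.2,), LS_T))  # average of 20 and 30
```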

11 Scalar product representation of tree kernels
Kernel defined by a regression tree:
Let $L_i$, $i = 1, \ldots, |T|$ denote the leaves of $T$.
Let $N_i$ denote the number of objects in the sub-LS of $L_i$.
For each leaf, define an attribute function $a_{L_i}(o)$ by $a_{L_i}(o) = N_i^{-1/2}$ if $o$ reaches $L_i$, and zero otherwise.
Let $a_T(o) = (a_{L_1}(o), \ldots, a_{L_{|T|}}(o))^T$.
Then we have that $K_T(o, o') = a_T^T(o) a_T(o')$
and $\hat{y}_T(o) = \sum_{o' \in LS} y(o') a_T^T(o) a_T(o')$.
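The scalar product form can be verified on a small example by building the leaf attribute vector $a_T(o)$ explicitly. As a sketch under stated assumptions: `leaf_of` encodes a hypothetical three-leaf tree, the data are made up, and every leaf is assumed to contain at least one LS object (which holds for a tree grown on the LS).

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

N_LEAVES = 3  # |T| for the hypothetical tree below

def leaf_of(a: Sequence[float]) -> int:
    """Leaf index for a hypothetical tree with thresholds 1.0 and 2.0."""
    x = a[0]
    if x < 1.0:
        return 0
    return 1 if x < 2.0 else 2

def leaf_features(a: Sequence[float], leaf_sizes: Counter) -> List[float]:
    """a_T(o): N_i^(-1/2) in the coordinate of the leaf reached, 0 elsewhere."""
    vec = [0.0] * N_LEAVES
    i = leaf_of(a)
    vec[i] = 1.0 / math.sqrt(leaf_sizes[i])  # assumes leaf i is non-empty
    return vec

def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))

def tree_predict_dot(query: Sequence[float],
                     ls: Sequence[Tuple[Sequence[float], float]]) -> float:
    """y_T(o) = sum over LS of y(o') a_T(o)^T a_T(o')."""
    sizes = Counter(leaf_of(a) for a, _ in ls)
    fq = leaf_features(query, sizes)
    return sum(y * dot(fq, leaf_features(a, sizes)) for a, y in ls)

# Made-up learning set: leaves hold outputs {1, 3}, {10} and {20, 30}.
LS_F = [((0.2,), 1.0), ((0.8,), 3.0), ((1.5,), 10.0),
        ((2.5,), 20.0), ((2.9,), 30.0)]
SIZES_F = Counter(leaf_of(a) for a, _ in LS_F)

# Same leaf: scalar product is 1/N_i; different leaves: 0.
print(dot(leaf_features((0.2,), SIZES_F), leaf_features((0.8,), SIZES_F)))
print(dot(leaf_features((0.2,), SIZES_F), leaf_features((2.5,), SIZES_F)))
print(tree_predict_dot((0.5,), LS_F))  # leaf average, up to rounding
```

Writing the tree as an explicit feature map is what makes it comparable to other scalar-product (kernel) methods.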

12 Let us consider a two-class classification problem, and define $y(o) = 1$ if $c(o) = c_1$ and $y(o) = -1$ if $c(o) = c_2$.
Let us construct a simple classifier:
Center of class 1: $c_+ = N_+^{-1} \sum_{o' \in LS_+} a(o')$
Center of class 2: $c_- = N_-^{-1} \sum_{o' \in LS_-} a(o')$
Classifier: $\hat{y}(o) = 1$ if $d(c_+, a(o)) < d(c_-, a(o))$.
Define $c = \frac{c_+ + c_-}{2}$ and $\Delta c = c_+ - c_-$.
With these notations we have $\hat{y}(o) = \mathrm{sgn}((a(o) - c)^T \Delta c)$.
In other words:
$\hat{y}(o) = \mathrm{sgn}\left( N_+^{-1} \sum_{o' \in LS_+} a^T(o') a(o) - N_-^{-1} \sum_{o' \in LS_-} a^T(o') a(o) + b \right)$
where $b = \frac{1}{2} \left( \|c_-\|^2 - \|c_+\|^2 \right)$.
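The equivalence between the nearest-center rule and the scalar-product form can be checked numerically. The sketch below uses made-up two-dimensional data and hypothetical function names; ties (a query exactly on the boundary) are mapped to -1 in both functions, a convention chosen for this example only.

```python
from typing import List, Sequence

def mean_vec(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Class center: component-wise mean of the attribute vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))

def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(u, v))

def centroid_rule(a, ls_plus, ls_minus) -> int:
    """+1 if a is closer to c_+ than to c_-, else -1."""
    c_plus, c_minus = mean_vec(ls_plus), mean_vec(ls_minus)
    return 1 if sq_dist(c_plus, a) < sq_dist(c_minus, a) else -1

def scalar_product_rule(a, ls_plus, ls_minus) -> int:
    """Sign of the averaged scalar products with each class, plus offset b."""
    c_plus, c_minus = mean_vec(ls_plus), mean_vec(ls_minus)
    b = 0.5 * (dot(c_minus, c_minus) - dot(c_plus, c_plus))
    score = (sum(dot(ap, a) for ap in ls_plus) / len(ls_plus)
             - sum(dot(am, a) for am in ls_minus) / len(ls_minus)
             + b)
    return 1 if score > 0 else -1

# Made-up attribute vectors for the two classes.
LS_PLUS = [(2.0, 2.0), (3.0, 1.0), (4.0, 3.0)]       # center (3, 2)
LS_MINUS = [(-1.0, 0.0), (0.0, -2.0), (-2.0, -1.0)]  # center (-1, -1)

for query in [(2.5, 1.0), (-0.5, -0.5), (0.0, 3.0)]:
    print(query,
          centroid_rule(query, LS_PLUS, LS_MINUS),
          scalar_product_rule(query, LS_PLUS, LS_MINUS))
```

The second form only touches the data through scalar products $a^T(o') a(o)$, which is the property that allows them to be replaced by a kernel $K(o, o')$.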
