Release from Active Learning / Model Selection Dilemma: Optimizing Sample Points and Models at the Same Time

Size: px

Start display at page:

Download "Release from Active Learning / Model Selection Dilemma: Optimizing Sample Points and Models at the Same Time"

Beryl Wilcox
5 years ago
Views:

1 IJCNN2002 May 12-17, 2002 Release from Active Learning / Model Selection Dilemma: Optimizing Sample Points and Models at the Same Time Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan Masashi Sugiyama Hidemitsu Ogawa

2 Supervised Learning: 2 Function Approximation y 1 y 2 x1 x2 L L x M y M f (x) :Learning target f ˆ( x ) :Learned result { } M : Samples x y m, m m= 1 y = f ( ) + ε m x m m { } M m ˆ x From x m, y m = 1, find f ( ) so that it is as close to f (x) as possible

3 Active Learning 3 Target function Learned result x1 x2 x1 x2 Location of sample points AFFECTS heavily { } M m Determine x m = 1 for optimal generalization min J { xm} G J G : Generalization error

4 Model Selection 4 Target function Learned result Too simple Appropriate Too complex Choice of models AFFECTS heavily (Model refers to, e.g., order of polynomials) Select a model S for optimal generalization min J S C G C : Set of model candidates J G : Generalization error

5 Simultaneous Optimization of 5 Sample Points and Models So far, active learning and model selection have been studied thoroughly, but INDEPENDENTLY Simultaneously determine sample points { x } M m m 1 and a model for optimal generalization min JG { }, x m S C = S C : Set of model candidates J G : Generalization error

6 Active Learning / Model Selection 6 Dilemma We can NOT directly optimize sample points and models simultaneously by simply combining existing active learning and model selection methods Because Model should be fixed for active learning Sample points should be fixed for model selection

7 How to Dissolve the Dilemma 7 Model candidates : C = { S1, S2, S3} A set of sample points arg min J G for S1 arg min for S { x m } { x m } J G 2 { x m } (OPT) { x m } = arg min J { x for m } all G S C arg min { x m } J G for S 3 { } M m ( OPT ) 1. Find sample points x m that are = 1 commonly optimal for all models 2. Just perform model selection as usual

8 Is It Just Idealistic? 8 No! Commonly optimal sample points surely exist for trigonometric polynomial models Trigonometric polynomial model fˆ ( x) From here on, we assume Least mean squares (LMS) estimate Generalization measure: n ( θ + ) 2 p sin px θ2 p+ 1 px = θ1 + cos p= 1 J G = E fˆ( x) f ( x) dx π π of order E :Expectation over n noise

9 Theorem 9 For all trigonometric polynomial models that include learning target function, equidistance sampling gives the optimal generalization capability 1-dimensional input x1 x2 x3 L x M π 2π M π M :Number of samples

10 10 Multi-Dimensional Input Cases 2-dimensional input L x 15 x1 x2 L Sampling on regular grid is optimal

11 Computer Simulations 11 (Artificial, Realizable) Learning target function: f S 50 S n : Trigonometric polynomial model of order n Model candidates: C Generalization measure: Sampling schemes: Equidistance sampling Random sampling = { S0, S1, S2, K, S100} J G = fˆ( x) f ( x) dx π 1 2π π

12 12 Simulation Results (Large Samples) Number of samples = 500 Noise variance = 0.02 Noise variance = Horizontal: Vertical: Order of models Generalization error Averaged over 100 trials Equidistance sampling outperforms random sampling for all models!

13 13 Simulation Results (Small Samples) Number of samples = 230 Noise variance = 0.02 Noise variance = Horizontal: Order of models Vertical: Generalization error Averaged over 100 trials With small samples, equidistance sampling performs excellently for all models!

14 Computer Simulations 14 (Unrealizable) Interpolate 600 chaotic series (red) from noisy samples (blue) Model candidates: C = { S0, S1, S2, K, S40} S n : Trigonometric polynomial model of order n

15 Simulation Results 15 (Unrealizable) ( M, σ 2 ) = (300, 0.04) ( M, σ 2 ) = (100, 0.07) Horizontal: Vertical: Order of models Averaged over 100 trials Test error at all 600 points Equidistance sampling outperforms random sampling for all models!

16 16 Interpolated Chaotic Series After model selection with equidistance sampling, Selected model : S 13

17 Compared with True Series Compared with True Series We obtained good estimates from sparse data! 17

18 Conclusions 18 Active learning / model selection dilemma: Sample points and models can not be simultaneously optimized by simply combining existing active learning and model selection methods How to dissolve the dilemma: Find commonly optimal sample points for all models Is it realistic? Commonly optimal sample points surely exist for trigonometric polynomial models: equidistance sampling Is it practical? Computer simulations showed that the proposed method works excellently even in unrealizable cases

Designing Kernel Functions Using the Karhunen-Loève Expansion

July 7, 2004. Designing Kernel Functions Using the Karhunen-Loève Expansion 2 1 Fraunhofer FIRST, Germany Tokyo Institute of Technology, Japan 1,2 2 Masashi Sugiyama and Hidemitsu Ogawa Learning with Kernels