The Dynamics of Learning Vector Quantization


The Dynamics of Learning Vector Quantization
Barbara Hammer, TU Clausthal-Zellerfeld, Institute of Computing Science
Michael Biehl, Anarta Ghosh, Rijksuniversiteit Groningen, Mathematics and Computing Science

Introduction
- prototype-based learning from example data: representation, classification
- Vector Quantization (VQ)
- Learning Vector Quantization (LVQ)
The dynamics of learning
- a model situation: randomized data
- learning algorithms for VQ and LVQ
- analysis and comparison: dynamics, success of learning
Summary
Outlook

Vector Quantization (VQ)
aim: representation of large amounts of data by (few) prototype vectors
example: identification and grouping in clusters of similar data
assignment of feature vector ξ to the closest prototype w (similarity or distance measure, e.g. Euclidean distance)

unsupervised competitive learning
- initialize K prototype vectors
- present a single example
- identify the closest prototype, i.e. the so-called winner
- move the winner even closer toward the example
intuitively clear, plausible procedure
- places prototypes in areas with high density of data
- identifies the most relevant combinations of features
- (stochastic) on-line gradient descent with respect to the cost function ...

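quantization error
$$H_{VQ} = \sum_{j=1}^{K} \sum_{\mu=1}^{P} \left(\xi^\mu - w_j\right)^2 \prod_{k \neq j}^{K} \Theta\left(d_k^\mu - d_j^\mu\right)$$
here: Euclidean distance $d_j^\mu = (\xi^\mu - w_j)^2$ between prototype $w_j$ and data point $\xi^\mu$; the product of step functions Θ is 1 only if $w_j$ is the winner!
aim: faithful representation (in general: clustering)
The result depends on
- the number of prototype vectors
- the distance measure / metric used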
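The quantization error and the winner update can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names `quantization_error` and `vq_online_step` are hypothetical, the distance is assumed to be squared Euclidean, and the step size `eta` is applied directly (the slides later scale it as η/N).

```python
import numpy as np

def quantization_error(data, prototypes):
    """H_VQ: sum over all examples of the squared distance to the closest prototype."""
    # d[mu, j] = squared Euclidean distance between example mu and prototype j
    d = ((data[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).sum()

def vq_online_step(prototypes, xi, eta):
    """Move only the winner (closest prototype) a fraction eta toward the example xi."""
    d = ((prototypes - xi) ** 2).sum(axis=1)
    winner = int(np.argmin(d))
    updated = prototypes.copy()
    updated[winner] += eta * (xi - updated[winner])
    return updated

data = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0]])
protos = np.array([[0.0, 0.0], [3.0, 0.0]])
print(quantization_error(data, protos))               # 0 + 1 + 1 = 2.0
print(vq_online_step(protos, np.array([4.0, 0.0]), 0.5))  # second prototype moves to [3.5, 0]
```

Taking the minimum over prototypes plays the role of the product of step functions: only the winner's distance contributes for each example.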

Learning Vector Quantization (LVQ)
aim: classification of data, learning from examples
example situation: 3 classes, 3 prototypes
classification: assignment of a vector ξ to the class of the closest prototype w
learning: choice of prototypes according to example data
aim: generalization ability, i.e. correct classification of novel data after training

mostly: heuristically motivated variations of competitive learning
prominent example [Kohonen]: LVQ 2.1
- initialize prototype vectors (for different classes)
- present a single example
- identify the closest correct and the closest wrong prototype
- move the corresponding winner toward / away from the example
known convergence / stability problems, e.g. for infrequent classes

LVQ algorithms ...
- appear plausible, intuitive, flexible
- are fast, easy to implement
- are frequently applied in a variety of problems involving the classification of structured data, a few examples:
  - real time speech recognition
  - medical diagnosis, e.g. from histological data
  - gene expression data analysis
  - texture recognition and classification
  - ...


illustration: microscopic images of (pig) semen cells after freezing and storage, c/o Lidia Sanchez-Gonzalez, Leon/Spain
[Figure panels: healthy cells, damaged cells, prototypes obtained by LVQ (1)]

LVQ algorithms ...
- are often based on purely heuristic arguments, or derived from a cost function with unclear relation to the generalization ability
- almost exclusively use the Euclidean distance measure, inappropriate for heterogeneous data
- lack, in general, a thorough theoretical understanding of dynamics, convergence properties, performance w.r.t. generalization, etc.

In the following: analysis of LVQ algorithms w.r.t.
- dynamics of the learning process
- performance, i.e. generalization ability
- asymptotic behavior in the limit of many examples
typical behavior in a model situation
- randomized, high-dimensional data
- essential features of LVQ learning
aim:
- contribute to the theoretical understanding
- develop efficient LVQ schemes
- test in applications

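model situation: two clusters of N-dimensional data
random vectors $\xi \in \mathbb{R}^N$ according to a mixture of two Gaussians:
$$P(\xi) = \sum_{\sigma=\pm 1} p_\sigma P(\xi \mid \sigma), \qquad P(\xi \mid \sigma) = \frac{1}{(2\pi)^{N/2}} \exp\left[-\frac{1}{2}\left(\xi - \ell B_\sigma\right)^2\right]$$
- orthonormal center vectors: $B_+, B_- \in \mathbb{R}^N$, $(B_\sigma)^2 = 1$, $B_+ \cdot B_- = 0$
- prior weights of classes $p_+, p_-$ with $p_+ + p_- = 1$
- separation ℓ: clusters centered at $\ell B_+$ (weight $p_+$) and $\ell B_-$ (weight $p_-$)
independent components:
$$\langle \xi_j \rangle_\sigma = \ell B_{\sigma,j}, \qquad \langle \xi_j^2 \rangle_\sigma - \langle \xi_j \rangle_\sigma^2 = 1, \qquad \langle \xi^2 \rangle = \sum_{j=1}^{N} \langle \xi_j^2 \rangle = N + \ell^2$$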

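high-dimensional data (formally: $N \to \infty$)
400 examples $\xi^\mu \in \mathbb{R}^N$, N=200, ℓ=1, $p_+$=0.6
[Figure, left: projections into the plane of the center vectors, $y_+ = B_+ \cdot \xi$ and $y_- = B_- \cdot \xi$; the two clusters contain 240 and 160 examples.]
[Figure, right: projections in two independent random directions $w_{1,2}$, $x_1 = w_1 \cdot \xi$ and $x_2 = w_2 \cdot \xi$; same 240 and 160 examples.]
Note: model for studying typical behavior of LVQ algorithms, not: density-estimation based classification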
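To make the model concrete, the following sketch draws labelled examples from the two-cluster mixture and computes both kinds of projections. It is an illustration under stated assumptions: `sample_mixture` is a hypothetical helper, the center vectors are taken as the first two unit vectors (any orthonormal pair would do), and class labels are drawn at random, so the class counts are only approximately 240 and 160.

```python
import numpy as np

def sample_mixture(n_examples, N, ell, p_plus, rng):
    """Draw labelled examples from the mixture of two unit-variance Gaussians."""
    # Orthonormal center vectors: an arbitrary illustrative choice.
    B_plus = np.zeros(N)
    B_plus[0] = 1.0
    B_minus = np.zeros(N)
    B_minus[1] = 1.0
    sigma = np.where(rng.random(n_examples) < p_plus, 1, -1)
    centers = np.where(sigma[:, None] == 1, ell * B_plus, ell * B_minus)
    xi = centers + rng.standard_normal((n_examples, N))
    return xi, sigma, B_plus, B_minus

rng = np.random.default_rng(0)
xi, sigma, B_plus, B_minus = sample_mixture(400, 200, 1.0, 0.6, rng)

# Projections onto the cluster centers: the two classes separate.
y_plus, y_minus = xi @ B_plus, xi @ B_minus

# Projections onto two random directions of roughly unit length: no visible structure.
w1, w2 = rng.standard_normal((2, 200)) / np.sqrt(200)
x1, x2 = xi @ w1, xi @ w2

# The mean squared length should be close to N + ell^2 = 201.
print(np.mean(np.sum(xi ** 2, axis=1)))
```

Plotting `(y_plus, y_minus)` and `(x1, x2)` coloured by `sigma` should show two separated clouds in the first case and a single structureless cloud in the second.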

dynamics of on-line training
sequence of independent random data $\xi^\mu$ ($\mu = 1, 2, 3, \ldots$) acc. to $P(\xi)$
update of prototype vectors:
$$w_S^\mu = w_S^{\mu-1} + \frac{\eta}{N} f_S\left[d_+^\mu, d_-^\mu, S, \sigma^\mu, \ldots\right] \left(\xi^\mu - w_S^{\mu-1}\right), \qquad S, \sigma^\mu = \pm 1, \qquad d_S^\mu = \left(\xi^\mu - w_S^{\mu-1}\right)^2$$
- η: learning rate, step size
- $f_S[\ldots]$: competition, direction of update, etc.
- $(\xi^\mu - w_S^{\mu-1})$: change of prototype toward or away from the current data
above examples:
- unsupervised Vector Quantization, "The Winner Takes It All" (classes irrelevant/unknown): $f_S[\ldots] = \Theta(d_{-S}^\mu - d_S^\mu)$
- Learning Vector Quantization 2.1 (here: two prototypes, no explicit competition): $f_S[\ldots] = S \sigma^\mu$, i.e. +1 (correct class) or −1 (wrong class)
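The generic update, with the modulation function deciding which prototype moves and in which direction, can be written as a single step function. This is a sketch under assumptions: the names `online_step`, `f_vq` and `f_lvq21` are hypothetical, prototypes are stored in a dict keyed by the label S = +1 or −1, and ties in distance are resolved by moving neither prototype.

```python
import numpy as np

def f_vq(S, sigma, d):
    """VQ: only the winner (the closer prototype) moves; the class label is ignored."""
    return 1.0 if d[S] < d[-S] else 0.0

def f_lvq21(S, sigma, d):
    """LVQ 2.1: both prototypes move, toward (+1) or away from (-1) the example."""
    return float(S * sigma)

def online_step(w, xi, sigma, eta, f):
    """One on-line step; w maps the prototype label S (+1 or -1) to its vector."""
    N = xi.shape[0]
    # Squared Euclidean distances to both prototypes before the update.
    d = {S: float(np.sum((xi - w[S]) ** 2)) for S in (+1, -1)}
    return {S: w[S] + (eta / N) * f(S, sigma, d) * (xi - w[S]) for S in (+1, -1)}

w = {+1: np.zeros(2), -1: np.array([2.0, 0.0])}
xi = np.array([0.0, 1.0])                        # closer to w[+1]
print(online_step(w, xi, +1, 1.0, f_vq))     # only w[+1] moves, to [0, 0.5]
print(online_step(w, xi, +1, 1.0, f_lvq21))  # w[-1] is pushed away, to [3, -0.5]
```

Passing the modulation function as an argument keeps the update itself identical across algorithms, which mirrors how the slides treat all variants through one recursion.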

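mathematical analysis of the learning dynamics
1. description in terms of a few characteristic quantities (here: $\mathbb{R}^{2N} \to \mathbb{R}^7$)
- $R_{S\sigma}^\mu = w_S^\mu \cdot B_\sigma$: projections in the $(B_+, B_-)$-plane
- $Q_{ST}^\mu = w_S^\mu \cdot w_T^\mu$: length and relative position of prototypes
with $S, T, \sigma \in \{+1, -1\}$
the update $w_S^\mu = w_S^{\mu-1} + \frac{\eta}{N} f_S[\ldots] (\xi^\mu - w_S^{\mu-1})$ yields the recursions
$$\frac{R_{S\sigma}^\mu - R_{S\sigma}^{\mu-1}}{1/N} = \eta f_S[\ldots] \left(y_\sigma^\mu - R_{S\sigma}^{\mu-1}\right)$$
$$\frac{Q_{ST}^\mu - Q_{ST}^{\mu-1}}{1/N} = \eta f_S[\ldots] \left(x_T^\mu - Q_{ST}^{\mu-1}\right) + \eta f_T[\ldots] \left(x_S^\mu - Q_{ST}^{\mu-1}\right) + \eta^2 f_S[\ldots] f_T[\ldots] + O(1/N)$$
the random vector $\xi^\mu$ enters only in the form of
- projections $x_S^\mu = w_S^{\mu-1} \cdot \xi^\mu$, $y_\tau^\mu = B_\tau \cdot \xi^\mu$
- distances $d_S^\mu = (\xi^\mu - w_S^{\mu-1})^2 = (\xi^\mu)^2 - 2 x_S^\mu + Q_{SS}^{\mu-1}$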

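2. average over the current example
random vector acc. to $P(\xi^\mu \mid \sigma)$; in the thermodynamic limit $N \to \infty$ the projections $x_S^\mu = w_S^{\mu-1} \cdot \xi^\mu$ and $y_\tau^\mu = B_\tau \cdot \xi^\mu$ are correlated Gaussian random quantities, completely specified in terms of first and second moments (w/o indices μ):
$$\langle x_S \rangle_\sigma = \sum_{j=1}^{N} w_{S,j} \langle \xi_j \rangle_\sigma = \ell \sum_{j=1}^{N} w_{S,j} B_{\sigma,j} = \ell R_{S\sigma}, \qquad \langle y_\tau \rangle_\sigma = \ell \delta_{\tau\sigma} \quad (\ell \text{ if } \tau = \sigma, \; 0 \text{ else})$$
$$\langle y_\rho y_\tau \rangle_\sigma - \langle y_\rho \rangle_\sigma \langle y_\tau \rangle_\sigma = \delta_{\rho\tau}, \qquad \langle x_S x_T \rangle_\sigma - \langle x_S \rangle_\sigma \langle x_T \rangle_\sigma = Q_{ST}, \qquad \langle x_S y_\tau \rangle_\sigma - \langle x_S \rangle_\sigma \langle y_\tau \rangle_\sigma = R_{S\tau}$$
averaged recursions, $\langle \ldots \rangle = \sum_{\sigma=\pm 1} p_\sigma \langle \ldots \rangle_\sigma$, closed in $\{R_{S\sigma}, Q_{ST}\}$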

3. self-averaging property
characteristic quantities $Q_{ST}$, $R_{S\sigma}$
- depend on the random sequence of example data
- their variance vanishes with $N \to \infty$ (here: $\propto N^{-1}$)
learning dynamics is completely described in terms of averages
4. continuous learning time
$\alpha = \mu / N$: number of examples, i.e. number of learning steps, per degree of freedom
recursions → coupled, ordinary differential equations
evolution of projections $Q_{ST}(\alpha)$, $R_{S\sigma}(\alpha)$

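5. learning curve
probability for misclassification of a novel example:
$$\varepsilon_g = p_+ \langle \Theta(d_+ - d_-) \rangle_+ + p_- \langle \Theta(d_- - d_+) \rangle_- = \sum_{\sigma=\pm 1} p_\sigma \, \Phi\left(\frac{Q_{\sigma\sigma} - Q_{-\sigma-\sigma} - 2\ell \left(R_{\sigma\sigma} - R_{-\sigma\sigma}\right)}{2\sqrt{Q_{++} - 2Q_{+-} + Q_{--}}}\right)$$
with $\Phi(z) = \int_{-\infty}^{z} \frac{dx}{\sqrt{2\pi}} e^{-x^2/2}$
generalization error $\varepsilon_g(\alpha)$ after training with αN examples
investigation and comparison of given algorithms
- repulsive/attractive fixed points of the dynamics
- asymptotic behavior for $\alpha \to \infty$
- dependence on learning rate, separation, initialization
- ...
optimization and development of new prescriptions
- time-dependent learning rate η(α)
- variational optimization w.r.t. $f_S[\ldots]$
- ...
both chosen to maximize the rate of decrease of the generalization error, $-d\varepsilon_g / d\alpha$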
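The generalization error is a function of the seven characteristic quantities only. The sketch below evaluates it under the model assumptions of these slides (unit-variance clusters): an example of class σ is misclassified when $d_\sigma > d_{-\sigma}$, and since $d_S = \xi^2 - 2x_S + Q_{SS}$ with Gaussian $x_S$, that probability is a Gaussian integral Φ. The function name and the dict layout for R and Q are hypothetical choices for illustration, and the code assumes $w_+ \neq w_-$.

```python
import math

def Phi(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def generalization_error(R, Q, ell, p_plus):
    """eps_g from R[(S, sigma)] = w_S . B_sigma and Q[(S, T)] = w_S . w_T (labels +1/-1)."""
    p = {+1: p_plus, -1: 1.0 - p_plus}
    # |w_+ - w_-|: standard deviation of x_- - x_+ (assumed non-zero)
    norm = math.sqrt(Q[(1, 1)] - 2.0 * Q[(1, -1)] + Q[(-1, -1)])
    eps = 0.0
    for s in (+1, -1):
        num = Q[(s, s)] - Q[(-s, -s)] - 2.0 * ell * (R[(s, s)] - R[(-s, s)])
        eps += p[s] * Phi(num / (2.0 * norm))
    return eps

# Example: prototypes placed exactly on the cluster centers, w_S = ell * B_S.
ell = 1.0
R = {(1, 1): ell, (1, -1): 0.0, (-1, 1): 0.0, (-1, -1): ell}
Q = {(1, 1): ell ** 2, (1, -1): 0.0, (-1, -1): ell ** 2}
print(generalization_error(R, Q, ell, p_plus=0.3))
```

For this symmetric configuration the argument of Φ is $-\ell/\sqrt{2}$ for both classes, so the result does not depend on the prior weights.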

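optimal classification with minimal generalization error
in the model situation (equal variances of clusters): separation of classes by the plane with $p_- P(\xi \mid \sigma = -1) = p_+ P(\xi \mid \sigma = +1)$
[Figure: clusters centered at $\ell B_+$ (weight $p_+$) and $\ell B_-$ (weight $p_-$), drawn for $p_- > p_+$, with the separating plane and the excess error marked.]
[Figure: minimal $\varepsilon_g$ as a function of the prior weight $p_+$ (0 to 1.0) for ℓ = 0, 1, 2; vertical axis from 0 to 0.50.]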
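For the equal-variance model the minimal error has a closed form, which the following sketch evaluates. The derivation is an assumption-labelled reconstruction rather than a formula from the slides: projecting onto the unit vector $(B_+ - B_-)/\sqrt{2}$ gives two unit-variance Gaussians with means $\pm a$, $a = \ell/\sqrt{2}$, and the condition $p_- P(\xi \mid -1) = p_+ P(\xi \mid +1)$ fixes the threshold $t = \ln(p_-/p_+)/(2a)$. The function name `optimal_error` is hypothetical.

```python
import math

def Phi(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def optimal_error(ell, p_plus):
    """Minimal generalization error for two unit-variance clusters at ell*B_+ and ell*B_-."""
    p_minus = 1.0 - p_plus
    if ell == 0.0 or p_plus in (0.0, 1.0):
        # No separation or a single class: always predict the more frequent class.
        return min(p_plus, p_minus)
    a = ell / math.sqrt(2.0)                    # half the distance between the centers
    t = math.log(p_minus / p_plus) / (2.0 * a)  # threshold along (B_+ - B_-)/sqrt(2)
    # Class + is misclassified below t, class - above t.
    return p_plus * Phi(t - a) + p_minus * Phi(-t - a)

for ell in (0.0, 1.0, 2.0):
    print(ell, [round(optimal_error(ell, p), 3) for p in (0.1, 0.3, 0.5)])
```

The error is largest at $p_+ = 1/2$ and never exceeds $\min\{p_+, p_-\}$, the error of always predicting the more frequent class.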

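LVQ 2.1
update the correct and wrong winner:
$$w_S^\mu = w_S^{\mu-1} + \frac{\eta}{N} S \sigma^\mu \left(\xi^\mu - w_S^{\mu-1}\right)$$
[Seo, Obermayer]: LVQ 2.1 corresponds to a cost function (likelihood ratio)
prior weights $p_\sigma = (1 + m\sigma)/2$ with $m > 0$
(analytical) integration for $w_S(0) = 0$:
$$R_{++} = \ell \frac{1+m}{2m} \left(1 - e^{-\eta m \alpha}\right), \qquad R_{+-} = -\ell \frac{1-m}{2m} \left(1 - e^{-\eta m \alpha}\right)$$
$$R_{-+} = \ell \frac{1+m}{2m} \left(1 - e^{+\eta m \alpha}\right), \qquad R_{--} = -\ell \frac{1-m}{2m} \left(1 - e^{+\eta m \alpha}\right)$$
$$Q_{++} = \ldots$$
theory and simulation (N=100), $p_+$=0.8, ℓ=1, η=0.5, average over 100 independent runs
[Figure: characteristic quantities versus α from 0 to 10, vertical axis from −6 to 6.]
- $R_{++}$, $R_{+-}$, $Q_{++}$ remain finite
- $R_{--}$, $R_{-+}$, $Q_{--}$, $Q_{+-}$ diverge with $\alpha \to \infty$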
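The closed-form projections for LVQ 2.1 can be cross-checked numerically. Averaging the recursion for $R_{S\tau}$ with $f_S = S\sigma$, and using $\langle y_\tau \rangle_\sigma = \ell \delta_{\tau\sigma}$ and $p_+ - p_- = m$, gives the linear ODE $dR_{S\tau}/d\alpha = \eta S (\ell \tau p_\tau - m R_{S\tau})$. The sketch below compares its exact solution for $w_S(0) = 0$ with a plain Euler integration; both function names are hypothetical and the Euler scheme is just a convenient check, not the authors' procedure.

```python
import math

def lvq21_R(alpha, eta, ell, m):
    """Closed-form R[(S, tau)](alpha) for LVQ 2.1 with w_S(0) = 0, p_tau = (1 + m*tau)/2."""
    R = {}
    for S in (+1, -1):
        # S = +1 saturates, S = -1 grows exponentially in magnitude.
        growth = 1.0 - math.exp(-S * eta * m * alpha)
        for tau in (+1, -1):
            p_tau = (1.0 + m * tau) / 2.0
            R[(S, tau)] = ell * tau * p_tau / m * growth
    return R

def lvq21_R_euler(alpha, eta, ell, m, steps=20000):
    """Euler integration of dR/dalpha = eta * S * (ell * tau * p_tau - m * R)."""
    R = {(S, tau): 0.0 for S in (+1, -1) for tau in (+1, -1)}
    h = alpha / steps
    for _ in range(steps):
        for (S, tau) in R:
            p_tau = (1.0 + m * tau) / 2.0
            R[(S, tau)] += h * eta * S * (ell * tau * p_tau - m * R[(S, tau)])
    return R

# Parameters from the slide: p_+ = 0.8 (m = 0.6), ell = 1, eta = 0.5.
exact = lvq21_R(10.0, 0.5, 1.0, 0.6)
print({k: round(v, 3) for k, v in exact.items()})
```

The S = +1 projections saturate, while the S = −1 projections grow without bound, which is the instability of the less frequent class's prototype.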

problem: instability of the algorithm due to repulsion of wrong prototypes (figure drawn for $p_+ > p_-$)
trivial classification for $\alpha \to \infty$: every example is assigned to the more frequent class, $\varepsilon_g = \min\{p_+, p_-\}$
strategies:
- selection of data in a window close to the current decision boundary: slows down the repulsion, system remains unstable
- Soft Robust Learning Vector Quantization [Seo & Obermayer]: density-estimation based cost function
  limiting case "Learning from mistakes": LVQ 2.1 step only if the example is currently misclassified; slow learning, poor generalization

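The winner takes it all
I) LVQ 1 [Kohonen]
$$w_S^\mu = w_S^{\mu-1} + \frac{\eta}{N} \Theta\left(d_{-S}^\mu - d_S^\mu\right) S \sigma^\mu \left(\xi^\mu - w_S^{\mu-1}\right)$$
only the winner $w_S$ is updated, toward or away from the example according to the class membership ($S\sigma^\mu = \pm 1$)
numerical integration for $w_S(0) = 0$
theory and simulation (N=200), $p_+$=0.2, ℓ=1.2, η=1.2, averaged over 100 indep. runs
[Figure, left: $R_{++}$, $R_{--}$, $R_{-+}$, $Q_{++}$, $Q_{+-}$, $Q_{--}$ versus α.]
[Figure, right: trajectories of $w_+$ and $w_-$ in the $(B_+, B_-)$-plane, axes $R_{S+}$ and $R_{S-}$, with the centers $\ell B_+$ and $\ell B_-$; markers at α = 20, 40, ..., 140; the optimal decision boundary and the asymptotic position are indicated.]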

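learning curve ($p_+$=0.2, ℓ=1.2)
[Figure: $\varepsilon_g$ versus α (0 to 300) for several learning rates η, including η=1.2; vertical axis from 0.14 to 0.26.]
role of the learning rate
- stationary state: $\varepsilon_g(\alpha \to \infty)$ grows linearly with η
- variable rate η(α)!?
- well-defined asymptotics for $\eta \to 0$, $\alpha \to \infty$, $(\eta\alpha) \to \infty$ (ODE linear in η)
[Figure: $\varepsilon_g$ versus (ηα) from 0 to 50, vertical axis from 0.14 to 0.26; label: suboptimal min. $\varepsilon_g$.]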

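The winner takes it all
II) LVQ+ (only positive steps without repulsion)
$$w_S^\mu = w_S^{\mu-1} + \frac{\eta}{N} \Theta\left(d_{-S}^\mu - d_S^\mu\right) \delta_{S,\sigma^\mu} \left(\xi^\mu - w_S^{\mu-1}\right)$$
the winner is updated only if it is correct ($w_S$ is updated only from class S)
[Figure: asymptotic positions of $w_+$ and $w_-$ relative to $\ell B_+$ and $\ell B_-$ for $\alpha \to \infty$; $p_+$=0.2, ℓ=1.2, η=1.2.]
asymptotic configuration symmetric about $\ell (B_+ + B_-)/2$
classification scheme and the achieved generalization error are independent of the prior weights $p_\pm$ (and optimal for $p_\pm = 1/2$)
LVQ+ ≈ VQ within the classes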
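Both winner-takes-all rules can be compared in a small Monte Carlo run of the model. This is a sketch under assumptions, not the authors' simulation code: the function name `simulate`, the choice of the first two unit vectors as centers, the default parameters, and the tie-breaking at $w_S(0) = 0$ (the first prototype wins a tie) are all illustrative choices.

```python
import numpy as np

def simulate(rule, N=100, alpha_max=20.0, eta=1.0, ell=1.0, p_plus=0.2, seed=0):
    """Monte Carlo run of LVQ1 or LVQ+; returns R (2x2) and Q (2x2) at alpha_max.

    Row/column index 0 stands for the label +1, index 1 for the label -1.
    """
    rng = np.random.default_rng(seed)
    B = np.zeros((2, N))
    B[0, 0] = 1.0                       # B_+
    B[1, 1] = 1.0                       # B_-
    w = np.zeros((2, N))                # w_+ and w_-, both start at the origin
    for _ in range(int(alpha_max * N)):
        c = 0 if rng.random() < p_plus else 1      # class index of the example
        xi = ell * B[c] + rng.standard_normal(N)
        d = np.sum((xi - w) ** 2, axis=1)
        win = int(np.argmin(d))                    # index of the winner
        if rule == "lvq1":
            sign = 1.0 if win == c else -1.0       # S * sigma
        elif rule == "lvq+":
            sign = 1.0 if win == c else 0.0        # delta_{S, sigma}
        else:
            raise ValueError(f"unknown rule: {rule}")
        w[win] += (eta / N) * sign * (xi - w[win])
    return w @ B.T, w @ w.T

for rule in ("lvq1", "lvq+"):
    R, Q = simulate(rule)
    print(rule, np.round(R, 2), np.round(Q, 2), sep="\n")
```

A single run at finite N fluctuates around the theoretical curves; averaging R and Q over many seeds, as done on the slides, reduces these fluctuations.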

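learning curves
[Figure: $\varepsilon_g$ versus α for LVQ+ and LVQ1; $p_+$=0.2, ℓ=1.0, η=1.0.]
asymptotics: $\eta \to 0$, $(\eta\alpha) \to \infty$
[Figure: asymptotic $\varepsilon_g$ versus $p_+$ for LVQ 2.1, LVQ 1, LVQ+ and the optimal classification; the curve $\min\{p_+, p_-\}$ is shown for comparison.]
- LVQ 2.1: trivial assignment to the more frequent class
- LVQ 1: here close to optimal classification
- LVQ+: min-max solution, $p_\pm$-independent classification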

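Vector Quantization
competitive learning:
$$w_S^\mu = w_S^{\mu-1} + \frac{\eta}{N} \Theta\left(d_{-S}^\mu - d_S^\mu\right) \left(\xi^\mu - w_S^{\mu-1}\right)$$
only the winner $w_S$ is updated; class membership is unknown or identical for all data
numerical integration for $w_S(0) \approx 0$ ($p_+$=0.2, ℓ=1.0, η=1.2)
[Figure, left: $\varepsilon_g$ versus α (0 to 300) for LVQ+, VQ and LVQ1.]
[Figure, right: $R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$ versus α.]
the system is invariant under exchange of the prototypes: weakly repulsive fixed points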

interpretations:
- VQ, unsupervised learning: unlabelled data
- LVQ, two prototypes of the same class, identical labels
- LVQ, different classes, but labels are not used in training
[Figure: asymptotic $\varepsilon_g$ ($\alpha \to \infty$, $\eta \to 0$, $\eta\alpha \to \infty$) versus $p_+$ from 0 to 1.]
- low quantization error
- high gen. error $\varepsilon_g$

Summary
- prototype-based learning: Vector Quantization and Learning Vector Quantization
- a model scenario: two clusters, two prototypes; dynamics of online training
- comparison of algorithms:
  - LVQ 2.1: instability, trivial (stationary) classification
  - LVQ 1: close to optimal asymptotic generalization
  - LVQ+: min-max solution w.r.t. asymptotic generalization
  - VQ: symmetry breaking, representation
work in progress, outlook
- regularization of LVQ 2.1, Robust Soft LVQ [Seo, Obermayer]
- model: different cluster variances, more clusters/prototypes
- optimized procedures: learning rate schedules, variational approach / density estimation / Bayes optimal on-line
- several classes and prototypes

Perspectives
- Generalized Relevance LVQ [Hammer & Villmann]: adaptive metrics, e.g. distance measure $d_\lambda(w, \xi) = \sum_{i=1}^{N} \lambda_i (\xi_i - w_i)^2$
- Self-Organizing Map (SOM): (many) N-dim. prototypes form a (low) d-dimensional grid; representation of data in a topology preserving map
- neighborhood preserving training: SOM, Neural Gas (distance based)
- applications
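The relevance-weighted distance mentioned for Generalized Relevance LVQ is simple to state in code. This is only a sketch of the distance measure itself, with a hypothetical function name; how the relevance factors $\lambda_i$ are adapted during training is not covered here.

```python
import numpy as np

def relevance_distance(w, xi, lam):
    """Weighted squared distance: sum_i lam_i * (xi_i - w_i)^2."""
    return float(np.sum(lam * (xi - w) ** 2))

w = np.array([0.0, 0.0, 0.0])
xi = np.array([1.0, 2.0, 3.0])
print(relevance_distance(w, xi, np.ones(3)))                # 14.0, plain squared Euclidean
print(relevance_distance(w, xi, np.array([1.0, 0.0, 0.0])))  # 1.0, only feature 0 counts
```

With all $\lambda_i = 1$ the measure reduces to the squared Euclidean distance used throughout the analysis above; setting a factor to zero removes that feature from the comparison.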