
Improved Multidimensional Scaling Analysis Using Neural Networks with Distance-Error Backpropagation

Lluis Garrido (1), Sergio Gomez (2) and Jaume Roca (1)

(1) Departament d'Estructura i Constituents de la Materia / IFAE, Universitat de Barcelona, Diagonal 647, E-08028 Barcelona, Spain
(2) Departament d'Enginyeria Informatica (ETSE), Universitat Rovira i Virgili, Crta. de Salou s/n (Complex educatiu), E-43006 Tarragona, Spain

Abstract

We show that neural networks, with a suitable error function for backpropagation, can be successfully used for Metric Multidimensional Scaling (MDS) (i.e. dimensional reduction while trying to preserve the original distances between patterns) and are in fact able to outdo the standard algebraic approach to MDS, known as Classical Scaling (CLS).

Published in Neural Computation 11 (1999) 595-600.

1 Introduction

A standard problem in multidimensional scaling analysis consists in trying to map a collection of patterns, represented as points in an $n$-dimensional space $\{x_a \in R^n,\ a = 1, \ldots, p\}$, to a lower-dimensional space in such a way that the distances between the projected points resemble as closely as possible the distances between the original ones.

More precisely, given the collection $\{x_a\}$, with Euclidean distances between pairs $(a, b)$ of patterns,

$$ d^{(n)}_{ab} = \sqrt{(x_a - x_b)^2}, $$

one has to find a map $\varphi : R^n \to R^m$, with $m < n$, such that it minimizes the quadratic distance-error function

$$ E_\varphi = \sum_{a<b} \left( d^{(n)}_{ab} - d^{(m)}_{ab} \right)^2, $$

where $d^{(m)}_{ab}$ are the Euclidean distances computed in the projected space,

$$ d^{(m)}_{ab} = \sqrt{(\varphi(x_a) - \varphi(x_b))^2}. $$

Typically, $m$ is chosen to be two or three in order to make available a graphical representation of the projected configuration. This can help visualize an underlying structure that might be obscured by the cluttering of data in the original space. It is not known in general how to find the exact expression of the best map $\varphi$. Yet, there is a standard method to approximate it, known as Classical Scaling (CLS), which involves the diagonalization of the symmetric matrix $S$ of scalar products,

$$ S_{ab} = x_a \cdot x_b, $$

by means of an orthogonal matrix $C$. Taking $\{x_a\}$ to be centered at the origin, i.e. $\sum_a x_a = 0$, and assuming that $p > n$, it is simple to show that $S$ can have at most $n$ non-zero eigenvalues. Each of these eigenvalues can be regarded as the typical scale of a principal direction. If we denote by $\lambda_1, \ldots, \lambda_m$ the $m$ largest eigenvalues, the resultant mapping to $R^m$ is given by

$$ \varphi^{CLS}_\alpha(x_a) = \lambda_\alpha^{1/2}\, C_{a\alpha}, \qquad \alpha = 1, \ldots, m. $$

See, for example, [1] and references therein for a detailed description of this method. It should be said that Classical Scaling can be used in a broader context, when only a matrix of dissimilarities is known, as a tool to assign coordinates to the patterns. It can be shown that once coordinates are already known for the patterns, as in our case, Classical Scaling actually reduces to Principal Component Analysis (PCA).
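Since in our setting the pattern coordinates are known, the CLS mapping above reduces to PCA and can be sketched in a few lines. The following is an illustrative reconstruction, not the authors' code; the function name and toy data are our own:

```python
import numpy as np

def classical_scaling(X, m=2):
    """Sketch of Classical Scaling for known coordinates (equivalent to PCA).
    X holds one pattern x_a per row."""
    Xc = X - X.mean(axis=0)             # centre the patterns: sum_a x_a = 0
    S = Xc @ Xc.T                       # scalar products S_ab = x_a . x_b
    evals, C = np.linalg.eigh(S)        # diagonalize S with an orthogonal C
    idx = np.argsort(evals)[::-1][:m]   # keep the m largest eigenvalues
    # phi_CLS(x_a)_alpha = lambda_alpha^{1/2} * C_{a,alpha}
    return C[:, idx] * np.sqrt(np.maximum(evals[idx], 0.0))

# Toy usage: project p = 5 random patterns from R^4 down to R^2
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
Y = classical_scaling(X, m=2)
print(Y.shape)  # (5, 2)
```

Because the resulting map is a linear orthogonal projection, the projected pairwise distances can only shrink relative to the original ones, which is precisely the trade-off measured by $E_\varphi$.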

2 Multidimensional scaling with neural networks

In this note we provide an alternative solution to this problem which involves the use of neural networks. The main idea consists in building a net with $n$ input units, a number of hidden layers containing a bottle-neck layer with only $m$ units, and an output layer with $n$ units. A modified version of the standard backpropagation algorithm [2] is then invoked. In addition to the quadratic error term between input and output, it contains a new term which is introduced to minimize the difference between the distances of pairs in the input and neck layers. Finally, when enough iterations have been performed, the projected configuration is read out from the neck layer.

In order to use the net in the most efficient way it is convenient to perform a translation and global scaling of the initial data,

$$ x_a \longrightarrow \xi^{in}_a = a_{in}\,(x_a - x_0), $$

so as to make $\xi^{in}_a \in [0,1]^n$. Then one can use $\xi^{in}_a$ as the input to the net. The outcome of the neck layer, $\xi^{nk}_a$, lives in the region $[0,1]^m$ since we are using sigmoid activation functions. This implies that $0 \le d^{nk}_{ab} \le \sqrt{m}$ while $0 \le d^{in}_{ab} \le \sqrt{n}$ for any pair of input points $(\xi^{in}_a, \xi^{in}_b)$, where $d^{nk}_{ab}$ and $d^{in}_{ab}$ stand for the distances between patterns $a$ and $b$ in the neck and initial layers, respectively.

The error function that we have considered in the backpropagation method is given by

$$ E = \lambda\, E_1 + (1 - \lambda)\, E_2, $$

where

$$ E_1 = \sum_a \left( \xi^{out}_a - \xi^{in}_a \right)^2 \qquad \text{and} \qquad E_2 = \sum_{a<b} \left( \frac{d^{in}_{ab}}{\sqrt{n}} - \frac{d^{nk}_{ab}}{\sqrt{m}} \right)^2, $$

and $\lambda \in [0,1]$ controls the relative contribution of each part. The term $E_1$ favours those maps for which the representation in the bottle-neck layer can be most accurately inverted to recover the original configuration. The second term, $E_2$, is the most important one, since it forces this representation in the bottle-neck to inherit, as closely as possible, the metric structure of the original configuration. The different scalings for $d^{in}_{ab}$ and $d^{nk}_{ab}$ in this term are introduced in order to have both numbers in the same range. In this way we can guarantee that all possible configurations can still be covered with the use of sigmoids.
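For concreteness, the combined error above can be evaluated directly from the layer activations. This is a minimal numerical sketch, not the authors' implementation; the array shapes and the name `mds_error` are our assumptions:

```python
import numpy as np

def mds_error(xi_in, xi_nk, xi_out, lam):
    """E = lam*E1 + (1-lam)*E2 for scaled inputs xi_in (p, n), neck
    activations xi_nk (p, m) and network outputs xi_out (p, n)."""
    p, n = xi_in.shape
    m = xi_nk.shape[1]
    # E1: self-supervised (input/output) reconstruction term
    E1 = np.sum((xi_out - xi_in) ** 2)
    # pairwise distances over pairs a < b in the input and neck layers
    iu = np.triu_indices(p, k=1)
    d_in = np.linalg.norm(xi_in[:, None] - xi_in[None, :], axis=-1)[iu]
    d_nk = np.linalg.norm(xi_nk[:, None] - xi_nk[None, :], axis=-1)[iu]
    # E2: distance-error term, with both distances rescaled into [0, 1]
    E2 = np.sum((d_in / np.sqrt(n) - d_nk / np.sqrt(m)) ** 2)
    return lam * E1 + (1 - lam) * E2

# Toy usage with random activations in [0, 1]
rng = np.random.default_rng(1)
xi_in, xi_nk, xi_out = rng.random((4, 3)), rng.random((4, 2)), rng.random((4, 3))
E = mds_error(xi_in, xi_nk, xi_out, lam=0.5)
```

In training, it is the gradient of this $E$ with respect to the weights that the modified backpropagation descends; letting $\lambda$ decay to zero recovers the pure distance-error term $E_2$.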

The various scalings involved in this process make the outcome of the neck layer not directly interpretable as the final answer; we can bring it back to the original scale by setting

$$ \varphi^{NN}(x_a) = a_{out}\, \xi^{nk}_a, $$

with $a_{out} = \sqrt{n/m}\,/\,a_{in}$. However, a slightly better solution can be obtained by choosing instead

$$ a_{out} = \frac{\sum_{a<b} d^{(n)}_{ab}\, d^{nk}_{ab}}{\sum_{a<b} \left( d^{nk}_{ab} \right)^2}, $$

since this is the value of $a$ that minimizes the function $E(a) = \sum_{a<b} (d^{(n)}_{ab} - a\, d^{nk}_{ab})^2$ for the given neck configuration, which is what we are ultimately trying to achieve with the whole procedure.

In the practical use of the neural network we have noticed that the best results are obtained by letting the parameter $\lambda$ fall to zero as the learning grows, so that the error function $E$ reduces to $E_2$ after a certain number of iterations. Actually, a non-zero value of $\lambda$ is only useful in the early stages of the learning, in order to speed up convergence. In this situation, i.e. with $E = E_2$, it is easy to prove analytically that the configuration minimizing $E$ differs from the one minimizing directly $\sum_{a<b} (d^{in}_{ab} - d^{nk}_{ab})^2$ only by a global scaling $\sqrt{n/m}$ of all coordinates. Thus, the (otherwise technically convenient) scalings that we have introduced above are completely harmless for the purpose of searching for the best mapped configuration.

It is well known that a network with just the input, output and neck layers, with linear activation functions and subject to self-supervised backpropagation, is equivalent to PCA [3]. Our approach goes beyond PCA, not only because of the use of sigmoid (non-linear) activation functions and the addition of a number of hidden layers, but essentially because of the presence of this new distance-term contribution, $E_2$, which favors those configurations in the neck layer that better approximate the original distances. One may wonder how our method compares to Non-Linear PCA (NLPCA) [4, 5, 6]. Actually, NLPCA can be recovered as a particular case of our approach by setting $\lambda = 1$ in the error function (i.e. with $E = E_1$). Clearly, NLPCA will generally do better than ordinary PCA in the minimization of the term $E_1$ because of its ability to model non-linear configurations.
However, NLPCA does not cre t ll of the distnces between ptterns in the bottle-neck representtion: ny two neck congurtion re eqully good for NLPCA if both provide the sme result in the output lyer. Hence, the comprison of NLPCA with our pproch is relly inpproprite becuse both methods re in fct designed for dierent purposes (minimizing E nd 4

E, respectively). On the contrry, the projected congurtion of stndrd PCA still retins prt of the metric structure of the initil congurtion since it is just liner orthogonl projection onto the lrgest-vrince xes nd hence it produces better results for E thn NLPCA. This is why we will compre performnce of our method with CLS (i.e. PCA) nd not with NLPCA. A comprtive nlysis of both pproches over severl types of congurtions shows tht our method produces better results in the \tougher" situtions, i.e. when some of the discrded directions in the CLS method still hve reltively lrge ssocited eigenvlues. Finlly, it is worth stressing tht CLS provides just liner orthogonl projection wheres the neurl net is le to produce more generl (non-liner) mppings. An exmple As n illustrtion of both procedures we hve considered dt set consisting of dierent niml species, chrcterized by n = 7 ttributes ech (5 boolen + numericl). The coordintes x nd distnces hve been obtined fter scling the numericl ttributes to the rnge [0; ] in order to ssign n equl weight to ll ttributes (implying tht in this cse we simply hve in = x ). When using the neurl net, the best scling for the two-dimensionl neck representtion is given by out out = p 7=. = :946, which is in less thn : % disgreement with the expected vlue of The projected congurtions obtined with ech method re drwn in gure. Ptterns re represented by their lel. As shown in the plot, both pproches produce firly similr congurtion. However, the computtion of the overll reltive error, i.e. P? d(m) " = 0 B @ P C A for ech method shows tht the neurl network is giving out slightly better result, " CLS = 0:78; " NN = 0:346; which mounts to 4:00 % improvement over the CLS method. Extrcted from the \Zoo Dtse" creted by Richrd S. Forsyth (990) (ftp://ftp.ics.uci.edu: ; ~ /pub/mchine-lerning-dtses/zoo). 5
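The two quantities compared in this example, the least-squares output scale $a_{out}$ and the overall relative error $\varepsilon$, can both be sketched in code. This is an illustrative sketch under our own naming, assuming matched 1-d arrays of pairwise distances are available:

```python
import numpy as np

def optimal_output_scale(d_orig, d_nk):
    """a_out minimizing E(a) = sum_{a<b} (d_orig - a*d_nk)^2, i.e. the
    one-parameter least-squares fit a* = sum(d_orig*d_nk) / sum(d_nk^2)."""
    d_orig, d_nk = np.asarray(d_orig, float), np.asarray(d_nk, float)
    return float(np.dot(d_orig, d_nk) / np.dot(d_nk, d_nk))

def relative_error(d_orig, d_proj):
    """Overall relative error eps = sqrt(sum (d_orig - d_proj)^2 / sum d_orig^2)."""
    d_orig, d_proj = np.asarray(d_orig, float), np.asarray(d_proj, float)
    return float(np.sqrt(np.sum((d_orig - d_proj) ** 2) / np.sum(d_orig ** 2)))

# Toy usage: neck distances that are exactly one third of the originals
d_orig = np.array([3.0, 6.0, 9.0])
d_nk = np.array([1.0, 2.0, 3.0])
a = optimal_output_scale(d_orig, d_nk)
print(a)                                  # 3.0
print(relative_error(d_orig, a * d_nk))   # 0.0
```

Rescaling the neck configuration by $a_{out}$ before computing $\varepsilon$ is exactly the "bring it back to the original scale" step described above.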

Acknowledgements

This work has been supported in part by CICYT contract AEN95-0590 and by URV project URV96-GNI-3. J.R. also thanks the Ministerio de Educacion y Cultura of Spain for financial support.

References

[1] T.F. Cox and M.A.A. Cox, Multidimensional Scaling. Chapman & Hall, London, 1994.

[2] D.E. Rumelhart, G.E. Hinton and R.J. Williams 1986. Learning representations by back-propagating errors, Nature 323, 533.

[3] T.D. Sanger 1989. Optimal unsupervised learning in a single-layer linear feedforward neural network, Neural Networks 2, 459.

[4] M.A. Kramer 1991. Non-linear principal component analysis using autoassociative neural networks, AIChE Journal 37, 233.

[5] D. DeMers and G. Cottrell 1994. Non-linear dimensionality reduction, in NIPS 5; N. Kambhatla and T.K. Leen 1995. Fast non-linear dimension reduction, in NIPS 6.

[6] Ll. Garrido, V. Gaitan, M. Serra-Ricart and X. Calbet 1995. Use of multilayer feedforward neural nets as a display method for multidimensional distributions, Int. J. Neural Systems 6, 273; Ll. Garrido, S. Gomez, V. Gaitan and M. Serra-Ricart 1996. A regularization term to avoid the saturation of the sigmoids in multilayer neural networks, Int. J. Neural Systems 7, 257.

Figure 1. Projected two-dimensional configurations obtained with CLS (left) and with the neural network (NN, right); patterns are represented by their labels.