Non-linear Canonical Correlation Analysis Using a RBF Network


[ESANN'2002 proceedings - European Symposium on Artificial Neural Networks, Bruges (Belgium), 24-26 April 2002, d-side publi., ISBN 2-930307-02-1, pp. 507-512]

Sukhbinder Kumar, Elaine B. Martin and Julian Morris

Centre for Process Analytics and Control Technology, University of Newcastle, Newcastle upon Tyne, NE1 7RU, England

Abstract: A non-linear version of the multivariate statistical technique of canonical correlation analysis (CCA) is proposed through the integration of a radial basis function (RBF) network. The advantage of the RBF network is that the solution of linear CCA can be used to train the network, hence the training effort is minimal. Also, the canonical variables can be extracted simultaneously. It is shown that the proposed technique can be used to extract non-linear structures inherent within a data set.

1. Introduction

Over the past decade, a number of techniques have been proposed for the extraction of non-linear features inherent within process data, including the multivariate statistical technique of principal component analysis [1-4]. More recently a non-linear variant of Canonical Correlation Analysis (CCA) has been proposed [5] through the integration of a Multi-Layer Perceptron (MLP) network. A drawback of this approach is that the optimisation problem is non-linear and thus suffers from the potential problem of becoming trapped within a local minimum. Hsieh [5] addressed this issue by training an ensemble of neural networks. Although not a serious limitation of the methodology, it does require a major training effort. The other limitation is that when using an MLP network, the canonical variables cannot be extracted simultaneously. This has two repercussions. First, the number of MLP networks to be trained (and hence the training effort) increases with the number of canonical variables; secondly, since the MLP networks are trained on the residuals, the extraction of subsequent canonical variables becomes difficult because of the reduction in signal to noise ratio. In this paper an alternative method of implementing non-linear CCA using a Radial Basis Function (RBF) network is proposed.

2. Linear Canonical Correlation Analysis

Canonical Correlation Analysis (CCA) is a multivariate statistical technique that identifies a linear relationship between two sets of variables x ∈ R^m and y ∈ R^n. Linear CCA seeks to find vectors a ∈ R^m and b ∈ R^n such that the linear combinations:

    u = a^T x  and  v = b^T y    (1)

have maximum correlation. The vectors a and b are the canonical correlation vectors, and u and v are the canonical variables. The above problem is solved as follows. Let Σ_xx and Σ_yy be the covariance matrices of x and y respectively and Σ_xy be the cross-covariance matrix between x and y. Let the matrix K be defined as:

    K = Σ_xx^(-1/2) Σ_xy Σ_yy^(-1/2)    (2)

If k is the rank of the matrix K, then by singular value decomposition, K can be decomposed as:

    K = (α_1, α_2, ..., α_k) D (β_1, β_2, ..., β_k)^T    (3)

where α_i and β_i are the eigenvectors of the matrices KK^T and K^T K respectively, and D is a diagonal matrix comprising the square roots of the k non-zero eigenvalues. Letting:

    a_i = Σ_xx^(-1/2) α_i  and  b_i = Σ_yy^(-1/2) β_i,  for i = 1, ..., k    (4)

then a_i and b_i are the (k) canonical vectors.
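As an illustration, a minimal NumPy sketch of equations (2)-(4) follows. The function names and the use of an eigendecomposition for the inverse matrix square roots are our choices, not the paper's; the covariance matrices are assumed to be positive definite.

    # Minimal sketch of linear CCA via SVD, following equations (2)-(4).
    import numpy as np

    def linear_cca(X, Y, k):
        """X: (N, m), Y: (N, n) data matrices. Returns matrices A, B whose
        columns a_i, b_i give the canonical variables u_i = X a_i, v_i = Y b_i,
        plus the first k canonical correlations."""
        X = X - X.mean(axis=0)
        Y = Y - Y.mean(axis=0)
        N = X.shape[0]
        Sxx = X.T @ X / (N - 1)
        Syy = Y.T @ Y / (N - 1)
        Sxy = X.T @ Y / (N - 1)

        def inv_sqrt(S):
            # Inverse matrix square root of a symmetric positive definite matrix.
            w, V = np.linalg.eigh(S)
            return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

        Sxx_ih, Syy_ih = inv_sqrt(Sxx), inv_sqrt(Syy)
        K = Sxx_ih @ Sxy @ Syy_ih              # equation (2)
        U, d, Vt = np.linalg.svd(K)            # equation (3): alpha_i, D, beta_i
        A = Sxx_ih @ U[:, :k]                  # equation (4): a_i
        B = Syy_ih @ Vt.T[:, :k]               #               b_i
        return A, B, d[:k]

The singular values d returned by the decomposition are the canonical correlations between successive pairs of canonical variables.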

3. Non-linear CCA using a RBF Network

Non-linear Canonical Correlation Analysis (CCA) is similar to linear CCA except that the linear transformation applied to the variables x and y is replaced by a non-linear transformation. In this paper a RBF network provides the non-linear transformation. Non-linear CCA is performed in two stages. First the variables x and y are projected from the higher dimensional space down onto a lower dimensional space, and then the latent variables are transformed back to the original variables. The second step is termed self-consistency [6].

3.1 Mapping from the Original Data Space to the Canonical Variables

The mapping of x and y to the canonical variables u and v is from R^m to R^k and from R^n to R^k respectively. For simplicity, k, the number of canonical variables, is taken to be unity. The situation where k is greater than unity is a straightforward extension of the described methodology. Given the centres c_x, c_y and the widths σ_x, σ_y of the radial basis functions for the two mappings, the canonical variables u and v can be defined as:

    u = w_x^T f(x) = Σ_{i=1..p} w_xi f_i(x)  and  v = w_y^T g(y) = Σ_{j=1..q} w_yj g_j(y)    (5)

where f = [f_1, f_2, ..., f_p]^T and g = [g_1, g_2, ..., g_q]^T are the RBF vectors, and w_x = [w_x1, w_x2, ..., w_xp]^T and w_y = [w_y1, w_y2, ..., w_yq]^T are the weight vectors for the mappings from x to u and from y to v respectively. Non-linear CCA then reduces to finding the weight vectors w_x and w_y such that there is maximum correlation between u and v. This problem is similar to the linear case except that the vectors x and y are replaced by f and g respectively. If A_xx and A_yy denote the covariance matrices of the radial basis functions f and g respectively and A_xy is the cross-covariance matrix, then, by analogy with equations (2) and (3), the matrix:

    M = A_xx^(-1/2) A_xy A_yy^(-1/2)    (6)

can be decomposed using singular value decomposition:

    M = [p_1, p_2, ..., p_k] Λ [q_1, q_2, ..., q_k]^T    (7)

The weight vectors w_x and w_y are calculated as follows:

    w_x = A_xx^(-1/2) p_1    (8)
    w_y = A_yy^(-1/2) q_1    (9)

In the case where more than one canonical variable is required, the weight vectors of the networks transforming the variables x and y into the successive canonical variables are obtained by using the vectors p_i and q_i, for i = 2, ..., k, in equations (8) and (9) respectively. Thus the canonical variables can be obtained simultaneously without solving any non-linear optimization problem. The basis functions for the mapping from x to u are chosen such that y is predicted from x, that is:

    y = Σ_{i=1..p} γ_i f_i(x) + ε

where ε is the prediction error and γ_i is a coefficient. Similarly, the basis functions for the mapping from y to v are selected so that x is predicted from y with minimum error. There exist many techniques for adjusting the centres and the widths σ_x and σ_y of the radial basis functions [7-8]. Here the centres are determined by fitting a Gaussian mixture model with circular covariances using the EM algorithm, with the widths set equal to the maximum inter-centre distance.
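The projection stage can be sketched as below, reusing linear_cca from the sketch in section 2. The Gaussian form of the basis functions and the externally supplied centres are assumptions for illustration; the paper obtains the centres from an EM-fitted Gaussian mixture, whereas here they are simply passed in (e.g. from a clustering step).

    # Sketch of the projection stage (equations (5)-(9)).
    import numpy as np

    def gaussian_features(X, centres, width):
        # f_i(x) = exp(-||x - c_i||^2 / (2 width^2)), one column per centre.
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2.0 * width ** 2))

    def max_intercentre_width(centres):
        # Width heuristic from the paper: the maximum inter-centre distance.
        d2 = ((centres[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        return np.sqrt(d2.max())

    def rbf_cca(X, Y, cx, cy, k):
        # Replace x and y by the RBF feature vectors f(x), g(y), then solve
        # the same SVD problem (eqs (6)-(9)) for the weight vectors w_x, w_y.
        F = gaussian_features(X, cx, max_intercentre_width(cx))   # (N, p)
        G = gaussian_features(Y, cy, max_intercentre_width(cy))   # (N, q)
        Wx, Wy, corr = linear_cca(F, G, k)
        U = (F - F.mean(axis=0)) @ Wx       # canonical variables u_1 .. u_k
        V = (G - G.mean(axis=0)) @ Wy       # canonical variables v_1 .. v_k
        return U, V, corr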

3.2 Mapping from the Canonical Variables to the Original Data Space

The scores u and v calculated in the first stage of the algorithm should be a good approximation of the original vectors x and y. The next stage is to apply an inverse transformation, again using a RBF network. The parameters of the mappings from the scores u to x and from v to y are adjusted such that the sums of squared prediction errors are minimised:

    E_x = Σ_{i=1..N} ||x_i - x̂_i||^2  and  E_y = Σ_{i=1..N} ||y_i - ŷ_i||^2    (10)

The centres and the widths are calculated as described in section 3.1, and the weights are determined by least squares. To avoid overfitting, a regularization term is added to the sum of squares of the errors while finding the parameters of the network.
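A sketch of this inverse mapping follows, reusing gaussian_features from the previous sketch. A ridge penalty on the output weights stands in for the unspecified regularizer, and the strength lam is an assumed value; the paper does not give one.

    # Sketch of the inverse mapping of section 3.2 with ridge regularization.
    import numpy as np

    def fit_inverse_map(U, X, centres, width, lam=1e-3):
        # Solve min ||X - Phi W||^2 + lam ||W||^2 for the output weights W,
        # i.e. the regularized least-squares step described above.
        Phi = gaussian_features(U, centres, width)     # (N, p) design matrix
        p = Phi.shape[1]
        return np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T @ X)

    def reconstruct(U, centres, width, W):
        # x-hat in equation (10): the network's approximation of x from u.
        return gaussian_features(U, centres, width) @ W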

4. Test Example

The proposed approach to non-linear CCA is applied to the test problem given in [5]. The variables x and y are three dimensional, each the sum of two modes: x = [x_1^(1) + x_1^(2), x_2^(1) + x_2^(2), x_3^(1) + x_3^(2)]^T and y = [y_1^(1) + y_1^(2), y_2^(1) + y_2^(2), y_3^(1) + y_3^(2)]^T, where

    x_1^(1) = t - 0.3t^2,  x_2^(1) = t + 0.3t^3,  x_3^(1) = t^2;    (11)
    y_1^(1) = t^3,  y_2^(1) = -t + 0.3t^3,  y_3^(1) = t + 0.3t^2;    (12)
    x_1^(2) = -s - 0.3s^2,  x_2^(2) = s - 0.3s^3,  x_3^(2) = -s^4;    (13)
    y_1^(2) = sech(4s),  y_2^(2) = s + 0.3s^3,  y_3^(2) = s + 0.3s^2;    (14)

and t and s are independent and uniformly distributed over [0, 1]. The plots of modes 1 and 2 in x-space and y-space are shown in Fig. 1. The data set was generated by setting the variance of the second canonical variable to one third of that of the first canonical variable. Gaussian noise with standard deviation equal to a fixed percentage of the signal standard deviation was added. The variables were then auto-scaled and non-linear CCA was applied. The number of neurons in the projection stage was optimised through cross-validation to reproduce the vectors x and y; for the test problem, the number of neurons was fifteen. Two canonical variables explained approximately 95% of the variance in X and Y. In the inverse mapping, from the canonical variables to the original variables, the number of neurons was twelve. The plots of mode 1 and mode 2 in x- and y-space extracted from the data are shown in Figs. 2 and 3 respectively. Comparing Fig. 1 with Figs. 2 and 3, the proposed technique is able to extract Mode 1 and Mode 2 reasonably well from the data.

Fig. 1. Modes 1 and 2 in x-space (LHS) and y-space (RHS). (data; o Mode 1; Mode 2)

Fig. 2. Extraction of Mode 1 in x-space (LHS) and y-space (RHS). (Data; o Extracted Mode 1; Actual Mode 1)

Fig. 3. Extraction of Mode 2 in x-space (LHS) and y-space (RHS). (Data; o Extracted Mode 2; Actual Mode 2)
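For concreteness, the synthetic data of equations (11)-(14) can be generated as below. The sample size, the noise percentage, and the exact way the one-third variance scaling is applied are our assumptions; the paper's values are not recoverable from the scanned text.

    # Sketch of the synthetic test data of section 4 (equations (11)-(14)).
    import numpy as np

    rng = np.random.default_rng(0)
    N = 500                            # sample size: assumed, not from the paper
    t = rng.uniform(0.0, 1.0, N)       # mode 1 signal
    s = rng.uniform(0.0, 1.0, N)       # mode 2 signal

    X1 = np.column_stack([t - 0.3 * t**2, t + 0.3 * t**3, t**2])     # eq. (11)
    Y1 = np.column_stack([t**3, -t + 0.3 * t**3, t + 0.3 * t**2])    # eq. (12)
    X2 = np.column_stack([-s - 0.3 * s**2, s - 0.3 * s**3, -s**4])   # eq. (13)
    Y2 = np.column_stack([1.0 / np.cosh(4 * s),                      # eq. (14)
                          s + 0.3 * s**3, s + 0.3 * s**2])

    def third_variance(A, ref):
        # Scale the second mode so its variance is one third of the first
        # mode's (one reading of the paper's description).
        return A * np.sqrt(ref.var() / (3.0 * A.var()))

    X = X1 + third_variance(X2, X1)
    Y = Y1 + third_variance(Y2, Y1)

    # Additive Gaussian noise (10% of the signal std is assumed here),
    # followed by auto-scaling of each variable.
    X = X + rng.normal(0.0, 0.1 * X.std(axis=0), X.shape)
    Y = Y + rng.normal(0.0, 0.1 * Y.std(axis=0), Y.shape)
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)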

The correlation between u_1 and v_1 is 0.997 and that between u_2 and v_2 is 0.9844. The MSE of x after the extraction of the first canonical variable is 0.875, and for y it is 0.69. After extraction of both canonical variables, the MSEs in x and y are 0.96 and 0.59 respectively. After the non-linear CCA model is built, it is used to predict y from given values of x. The average MSE for the prediction of y for new data sets, given x, is 0.9. These results are comparable with those reported in [5].

5. Conclusions

In this paper non-linear CCA using a radial basis function network has been proposed. For this method the training effort is small because of the near-linear nature of the problem. Also, the canonical variables can be extracted simultaneously. However, the issue of how many canonical variables should be retained to build the model remains unresolved. The model has been tested on synthetic data. The aim is to apply the methodology to fault detection and diagnosis.

6. Acknowledgements

S. Kumar would like to acknowledge the Centre for Process Analytics and Control Technology (CPACT) and the EPSRC for financial support.

7. References

1. M. A. Kramer: Nonlinear principal component analysis using autoassociative neural networks. AIChE Journal, 37(2), 233-243 [1991].
2. D. Dong, T. J. McAvoy: Nonlinear principal component analysis based on principal curves and neural networks. Computers and Chemical Engineering, 20(1), 65-78 [1996].
3. S. Tan, M. L. Mavrovouniotis: Reducing data dimensionality through optimizing neural network inputs. AIChE Journal, 41(6), 1471-1480 [1995].
4. F. Jia, E. B. Martin, A. J. Morris: Non-linear principal components analysis with application to process fault detection. International Journal of Systems Science, 31(11), 1473-1487 [2000].
5. W. W. Hsieh: Nonlinear canonical correlation analysis by neural networks. Neural Networks, 13, 1095-1105 [2000].
6. D. J. H. Wilson, G. W. Irwin, G. Lightbody: RBF principal manifolds for process monitoring. IEEE Transactions on Neural Networks, 10(6), 1424-1434 [1999].
7. C. M. Bishop: Neural networks for pattern recognition. Oxford University Press [1995].
8. T. Kohonen: Self-organization and associative memory. Springer Series in Information Sciences. Springer-Verlag [1984].