ANALYSIS OF NONLINEAR PARTIAL LEAST SQUARES ALGORITHMS

S. Kumar, U. Kruger,¹ E. B. Martin and A. J. Morris

Centre of Process Analytics and Process Technology, University of Newcastle, NE1 7RU, U.K.
Intelligent Systems and Control Group, Queen's University Belfast, BT9 5AH, U.K.

¹ Corresponding Author: uwe.kruger@ee.qub.ac.uk; Tel.: +44 (0)890 74059; Fax: +44 (0)890 66703

Abstract: This paper presents an analysis of nonlinear extensions to Partial Least Squares (PLS) that rely on error-based minimization techniques. The analysis reveals that such algorithms maximize the accuracy with which the response variables are predicted. They are therefore nonlinear reduced rank regression algorithms rather than nonlinear PLS algorithms.

Keywords: Nonlinear Systems, Identification Algorithms, Models, Gradient Methods, Prediction

1. INTRODUCTION

A number of extensions to conventional PLS have been proposed for identifying nonlinear relationships between two variable sets, usually referred to as the predictor set, which is predictive, and the response set, which is predicted. These extensions take advantage of the iterative nature of PLS to reduce the modelling task to the identification of score models in the latent variable space. The score models rely on projecting recorded observations of the predictor and response variables onto respective one-dimensional subspaces, defined by weight vectors. The projections of the response variables, further referred to as u-scores, are then predicted by nonlinear functions of the projections of the predictor variables, denoted as t-scores. (Wold et al., 1989) suggested identifying the score models using second-order polynomials. More flexible nonlinear models were proposed by (Frank, 1990) and (Wold, 1992). Whilst the former work included the use of a smoothing procedure, the technique by (Wold, 1992) incorporated spline functions. (Qin and McAvoy, 1992; Holcomb and Morari, 1992; Wilson et al., 1997) used artificial neural networks and, more recently, (Patwardhan and Lakshminarayan, 1998) exploited the use of Hammerstein and Wiener models, (Hiden et al., 1998) employed Genetic Programming and (Li et al., 2001) utilized the Box-Tidwell transformation. However, (Berglund and Wold, 1997) argued that artificial neural networks and spline functions (i) possess a considerable degree of flexibility and (ii) may therefore lead to overfitting. A revision of the work by (Wold et al., 1989), incorporating low-order polynomials, was given by (Baffi et al., 1999a), where a number of modifications were introduced to produce a simpler algorithm.

In this article, the work by (Wold et al., 1989) and the modifications that (Baffi et al., 1999a) introduced are investigated.
This analysis revealed the following contributions, summarized in the presented work. Firstly, if a linear structure for each score model is considered, the technique by (Baffi et al., 1999a) is equivalent to reduced rank regression (RRR). Therefore, each score model maximally reduces the variation of the response variables. Secondly, if nonlinear score models are identified, the algorithm by (Baffi et al., 1999a) is consequently a nonlinear RRR (NLRRR) technique. Thus, it should not be considered a nonlinear PLS technique, since minimizing the variation of the prediction error of the response variables does not guarantee maximum covariance between pairs of score variables. A maximum covariance, however, is the criterion with which a conventional PLS model is identified. Thirdly, using NLRRR, the direction onto which the response variables are projected is constrained to be equivalent to the direction for predicting the response variables by the predicted u-scores.

2. PARTIAL LEAST SQUARES PRELIMINARIES

PLS is designed to analyze the relationships between the predictor and the response variables. The observations of each variable are stored in matrices, i.e. the predictor matrix $\mathbf{X} \in \mathbb{R}^{K \times N}$ and the response matrix $\mathbf{Y} \in \mathbb{R}^{K \times M}$. Each column refers to a particular predictor or response variable and each row refers to a particular observation; $K$ represents the number of observations, $N$ is the number of predictor variables and $M$ is the number of response variables. Typically, $\mathbf{X}$ and $\mathbf{Y}$ are mean centered and appropriately scaled prior to the application of PLS. It is assumed throughout this article that $K \geq N, M$ and that the rank of $\mathbf{X}$ and $\mathbf{Y}$ is $N$ and $M$, respectively. Note, however, that these assumptions are for the convenience of the presentation rather than imposing restrictions on the presented methods.

2.1 PLS Algorithms

PLS determines the $k$th pair of weight vectors, $\mathbf{v}_k$ and $\mathbf{w}_k$, and score vectors, $\mathbf{t}_k$ and $\mathbf{u}_k$, i.e. the vectors in which the t- and u-scores are stored, by maximizing $J_k(\mathbf{v}_k, \mathbf{w}_k)$:

$$J_k(\mathbf{v}_k, \mathbf{w}_k) = \mathbf{t}_k^T \mathbf{u}_k = \mathbf{w}_k^T \mathbf{X}_k^T \mathbf{Y}_k \mathbf{v}_k, \qquad (1)$$

which is subject to the constraints $G_k^{(v)}$ and $G_k^{(w)}$ (Höskuldsson, 1988):

$$G_k^{(v)} = \|\mathbf{v}_k\|^2 - 1 = 0 \quad \text{and} \quad G_k^{(w)} = \|\mathbf{w}_k\|^2 - 1 = 0, \qquad (2)$$

with $\|\cdot\|^2$ being the squared norm of a vector. For determining subsequent weight and score vectors, PLS uses a deflation procedure, in which the contribution of the $k$th pair of score vectors is subtracted, or deflated, from the predictor and response matrices (Geladi and Kowalski, 1986):

$$\mathbf{X}_{k+1} = \mathbf{X}_k - \mathbf{t}_k \mathbf{p}_k^T \qquad \mathbf{Y}_{k+1} = \mathbf{Y}_k - \mathbf{t}_k \mathbf{q}_k^T, \qquad (3)$$

where $\mathbf{X}_k$, $\mathbf{Y}_k$ and $\mathbf{X}_{k+1}$, $\mathbf{Y}_{k+1}$ are the predictor and response matrices before and after the $k$th deflation step, respectively, and $\mathbf{p}_k$ and $\mathbf{q}_k$ are regression or loading vectors that represent the contribution of the t-score vector to the predictor and response matrices, respectively. The loading vectors are determined as follows:

$$\mathbf{p}_k = \frac{\mathbf{X}_k^T \mathbf{t}_k}{\mathbf{t}_k^T \mathbf{t}_k} \qquad \mathbf{q}_k = \frac{\mathbf{Y}_k^T \mathbf{t}_k}{\mathbf{t}_k^T \mathbf{t}_k}. \qquad (4)$$

After computing $n$ pairs of weight, score and loading vectors, often denoted as latent variables (LVs), a parametric regression matrix, $\mathbf{B}_{PLS}^{(n)}$, can be calculated:

$$\mathbf{B}_{PLS}^{(n)} = \mathbf{W}_n \left[\mathbf{P}_n^T \mathbf{W}_n\right]^{-1} \mathbf{Q}_n^T, \qquad (5)$$

where $\mathbf{W}_n$, $\mathbf{P}_n$ and $\mathbf{Q}_n$ are matrices in which the $n$ w-weight and p- and q-loading vectors are stored as column vectors in successive order. The following properties of PLS are of importance in subsequent sections.

Remark 1. The vectors $\mathbf{v}_k$, $\mathbf{q}_k$ and $\mathbf{p}_k$ are functions of $\mathbf{w}_k$.

Remark 2. The vectors $\mathbf{v}_k$ and $\mathbf{q}_k$ point in the same direction in the response space, i.e. $\mathbf{v}_k \propto \mathbf{q}_k$.

Remark 3. The elements of each pair of score vectors, $\mathbf{t}_k$ and $\mathbf{u}_k$, are determined to have maximum covariance.

It follows from Remark 1 that the w-weight vector completely characterizes PLS, i.e. using $\mathbf{w}_k$, the score, loading and v-weight vectors can be computed.
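The computations of Section 2.1 can be summarized in a short numerical sketch. The following Python fragment is a minimal illustration, not the authors' code; the function name nipals_pls, the initialization of the u-score and the convergence tolerance are assumptions. It extracts a given number of latent variables with a NIPALS-type iteration, applies the deflation of Equation (3), the loadings of Equation (4) and the regression matrix of Equation (5).

```python
import numpy as np

def nipals_pls(X, Y, n_lv, tol=1e-10, max_iter=500):
    """Minimal NIPALS-type PLS sketch: weights, loadings and regression matrix."""
    X_k, Y_k = X.copy(), Y.copy()
    W, P, Q = [], [], []
    for _ in range(n_lv):
        u = Y_k[:, [0]]                                # initialize u-score with a response column
        for _ in range(max_iter):
            w = X_k.T @ u; w /= np.linalg.norm(w)      # w-weight vector
            t = X_k @ w                                # t-score vector
            v = Y_k.T @ t; v /= np.linalg.norm(v)      # v-weight vector (same direction as q, Remark 2)
            u_new = Y_k @ v                            # u-score vector
            if np.linalg.norm(u_new - u) < tol:
                u = u_new; break
            u = u_new
        p = X_k.T @ t / (t.T @ t)                      # Eq. (4): p-loading
        q = Y_k.T @ t / (t.T @ t)                      # Eq. (4): q-loading
        X_k = X_k - t @ p.T                            # Eq. (3): deflate X
        Y_k = Y_k - t @ q.T                            # Eq. (3): deflate Y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.hstack(W), np.hstack(P), np.hstack(Q)
    B = W @ np.linalg.inv(P.T @ W) @ Q.T               # Eq. (5): regression matrix
    return W, P, Q, B

# usage sketch on random, mean-centred data
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 4)); Y = rng.standard_normal((100, 3))
X -= X.mean(0); Y -= Y.mean(0)
W, P, Q, B = nipals_pls(X, Y, n_lv=2)
```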
3. NONLINEAR PARTIAL LEAST SQUARES

Various nonlinear extensions of PLS have been proposed that rely on a nonlinear mapping, where the u-scores are predicted using a nonlinear function of the t-scores:

$$\mathbf{u}_k = f_k(\mathbf{t}_k) + \mathbf{e}_k, \qquad (6)$$

where $\mathbf{e}_k$ is the residual vector of $\mathbf{u}_k$. The following functions, $f_k(\cdot)$, were introduced: second-order polynomials (Wold et al., 1989; Baffi et al., 1999a), smoothing procedures (Frank, 1990), spline functions (Wold, 1992), artificial neural networks (Qin and McAvoy, 1992; Holcomb and Morari, 1992; Wilson et al., 1997; Baffi et al., 1999b) and Box-Tidwell transformations (Li et al., 2001).
It follows from Remark 1 that the algorithms by (Wold et al., 1989; Baffi et al., 1999a; Baffi et al., 1999b) exhibit a close relation to PLS. More precisely, $\mathbf{t}_k$, $\mathbf{v}_k$, $\mathbf{u}_k$, $\mathbf{p}_k$ and $\mathbf{q}_k$ are linear or nonlinear functions of $\mathbf{w}_k$. The update procedure of the $i$th iteration step for the w-weight vector, $\mathbf{w}_{i,k}$, is given by (Baffi et al., 1999a):

$$\Delta\mathbf{w}_{i+1,k} = \left[\mathbf{Z}_{i,k}^T \mathbf{Z}_{i,k}\right]^{\dagger} \mathbf{Z}_{i,k}^T \mathbf{e}_{i,k} \qquad \mathbf{w}_{i+1,k} = \frac{\mathbf{w}_{i,k} + \Delta\mathbf{w}_{i+1,k}}{\left\|\mathbf{w}_{i,k} + \Delta\mathbf{w}_{i+1,k}\right\|}, \qquad (7)$$

where $\dagger$ represents the generalized inverse of a matrix and $\mathbf{Z}_{i,k}$ is a matrix in which the derivatives of the nonlinear function of the $k$th t-scores with respect to $\mathbf{w}_{i,k}$ are stored. For example, for a second-order polynomial, $u_{i,j} = a_{i,1} + a_{i,2}\, t_{i,j} + a_{i,3}\, t_{i,j}^2 + e_{i,j}$, the $j$th row of this matrix has the elements $\mathbf{z}_{i,j} = \left(a_{i,2} + 2\, a_{i,3}\, t_{i,j}\right)\mathbf{x}_j$, where $a_{i,1}$, $a_{i,2}$ and $a_{i,3}$ are coefficients that are determined using a standard least squares procedure, for instance, and $\mathbf{x}_j$ is the $j$th row vector of $\mathbf{X}_k$.

The algorithm by (Baffi et al., 1999a) is based on the work by (Wold et al., 1989). More precisely, (Baffi et al., 1999a) introduced some simplifications to enhance the algorithm by (Wold et al., 1989). The simplified algorithm was later extended to include the application of neural networks (Baffi et al., 1999b). The error-based algorithm by (Baffi et al., 1999a; Baffi et al., 1999b) is analyzed in the next sections.

4. ANALYSIS OF ERROR-BASED NONLINEAR PLS

Since the update $\mathbf{w}_{i+1,k} = \frac{\mathbf{w}_{i,k} + \Delta\mathbf{w}_{i+1,k}}{\|\mathbf{w}_{i,k} + \Delta\mathbf{w}_{i+1,k}\|}$ is based on minimizing the residuals of the $k$th score model, $\mathbf{e}_{i,k}$, it is not guaranteed that the elements in each pair of score vectors are determined to have maximum covariance after convergence. This can be seen by analyzing the cost function with which the w-weight vector is determined:

$$J_k(\mathbf{w}_k) = \min_{\mathbf{w}_k} \|\mathbf{e}_k\|^2 = \min_{\mathbf{w}_k} \left\|\mathbf{u}_k - f_k(\mathbf{t}_k)\right\|^2 = \min_{\mathbf{w}_k} \left\|\mathbf{Y}_k \mathbf{v}_k - f_k(\mathbf{X}_k \mathbf{w}_k)\right\|^2, \qquad (8)$$

which is subject to the following constraints:

$$G_k^{(v)} = \mathbf{v}_k - \frac{\mathbf{Y}_k^T f_k(\mathbf{X}_k \mathbf{w}_k)}{\left\|\mathbf{Y}_k^T f_k(\mathbf{X}_k \mathbf{w}_k)\right\|} = \mathbf{0} \quad \text{and} \quad G_k^{(w)} = \|\mathbf{w}_k\|^2 - 1 = 0. \qquad (9)$$

To simplify the analysis of the above error-based algorithm, the function $f_k(\mathbf{t}_k)$ is assumed to be linear. This allows a strict comparison between conventional PLS and linear error-based PLS. Using a linear function gives rise to the following Theorem, which is proven in Appendix A.

Theorem 1. If a linear function of the t-scores is used to predict the u-scores, i.e. $\mathbf{u}_k = a\,\mathbf{t}_k + \mathbf{e}_k$ with $a$ being a regression coefficient, the weight vector $\mathbf{w}_k$ is the most dominant eigenvector of the matrix expression $\left[\mathbf{X}_k^T \mathbf{X}_k\right]^{\dagger} \mathbf{X}_k^T \mathbf{Y}_k \mathbf{Y}_k^T \mathbf{X}_k$.

This solution for obtaining $\mathbf{w}_k$, however, is equivalent to the solution of the RRR cost function, which gives rise to the following Conclusion.

Conclusion 1. If a linear relationship between the score variables is considered, the error-based iteration procedure by (Baffi et al., 1999a; Baffi et al., 1999b) produces, in fact, a RRR solution.

For completeness, the cost function of RRR and its solution are provided in Appendix B. Conclusion 1 can be reformulated as follows.

Conclusion 2. Since the error-based iteration procedure converges to produce a RRR solution, it is not guaranteed that the elements in the score vectors, $\mathbf{t}_k$ and $\mathbf{u}_k$, describe a maximum covariance.

An example to support Conclusion 2 is given in Section 5. The equivalence between a linear version of error-based PLS and RRR follows from the v-weight vector being a function of the w-weight vector.

Theorem 2. By incorporating the constraint $G_k^{(v)} = \mathbf{v}_k - \frac{\mathbf{Y}_k^T \hat{\mathbf{u}}_k}{\|\mathbf{Y}_k^T \hat{\mathbf{u}}_k\|} = \mathbf{0}$, with $\hat{\mathbf{u}}_k = f_k(\mathbf{t}_k)$ being the predicted u-score vector, the error-based cost function minimizes the residuals of the response variables instead of minimizing the residuals of the $k$th score variables.

The proof of Theorem 1 can also be applied to prove Theorem 2. Note that the error-based procedure complies with Bellman's principle of optimality (Bellman, 1957) for predicting the response variables.
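To make the error-based update of Equation (7) concrete, the following Python sketch implements one reading of the procedure for a quadratic inner function. It is illustrative only; the function name error_based_weight_update, the initialization of the u-score and the convergence tolerance are assumptions and not part of the published algorithm description.

```python
import numpy as np

def error_based_weight_update(X, Y, tol=1e-8, max_iter=200):
    """Sketch of the error-based (Baffi-type) procedure with a quadratic score model."""
    u = Y[:, [0]]                                  # initialize u-score with a response column
    w = X.T @ u; w /= np.linalg.norm(w)            # initial w-weight vector
    for _ in range(max_iter):
        t = X @ w                                  # t-scores
        F = np.hstack([np.ones_like(t), t, t**2])  # quadratic basis [1, t, t^2]
        a = np.linalg.lstsq(F, u, rcond=None)[0]   # coefficients a1, a2, a3 (least squares)
        u_hat = F @ a                              # predicted u-scores, f(t)
        v = Y.T @ u_hat; v /= np.linalg.norm(v)    # constraint (9): v is a function of w
        u = Y @ v                                  # updated u-score vector
        e = u - F @ a                              # score-model residuals
        Z = (a[1] + 2.0 * a[2] * t) * X            # rows z_j = (a2 + 2 a3 t_j) x_j
        dw = np.linalg.pinv(Z.T @ Z) @ Z.T @ e     # Eq. (7): generalized-inverse step
        w_new = (w + dw) / np.linalg.norm(w + dw)  # Eq. (7): normalized update
        if np.linalg.norm(w_new - w) < tol:
            return w_new, v
        w = w_new
    return w, v
```

After convergence of one such weight vector, deflation analogous to Equation (3) would be applied before extracting the next set of LVs.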
By applying a nonlinear mapping between the score variables, as shown in Equation (6), the $k$th u-score variable is predicted more accurately if the underlying process exhibits nonlinear relationships between the predictor and response variables. On the basis of Theorem 2, this increased accuracy translates into an increased accuracy for predicting the response variables. This is, again, a result of the constraint applied to the v-weight vector, which gives rise to the following conclusion.

Conclusion 3. For nonlinear score models, the error-based iteration procedure converges to minimize the residual variance of the response variables.

Conclusion 3 can be argued on the basis of the following Lemmas.

Lemma 1. The constraint on the v-weight vector, used to calculate the u-score vector, is determined to maximize the contribution of the predicted u-score vector to the response matrix.

Lemma 2. After the iteration procedure has converged, the q-loading vector is obtained to maximize the contribution of the predicted u-score vector to the response variables and points in the same direction as the v-weight vector.

Lemma 3. The error-based iteration procedure converges to minimize the variance of the score model prediction error and does so, after convergence, under the constraint of the v-weight vector being a function of the w-weight vector.

Lemmas 1 and 2 are proven in Appendix C and Lemma 3 follows from Equation (9). With respect to PLS, Conclusions 2 and 3 give rise to the following conclusion.

Conclusion 4. The error-based iteration procedure by (Baffi et al., 1999a; Baffi et al., 1999b) is a nonlinear extension of RRR rather than a nonlinear extension of conventional PLS.

The iterative error-based procedure is, in fact, a gradient descent approach for minimizing Equation (8) by incorporating the constraints of Equation (9). It should therefore be noted that the solution that is computed may be suboptimal, i.e. it may represent a local minimum rather than a global minimum. This might be circumvented by applying a Genetic Algorithm. Although computationally expensive, this technique was shown to provide more accurate NLRRR models (Sharma et al., 2003). Another problem is that the series of iteration steps may not converge for each set of LVs. This, however, has not yet been reported, and the applications of NLRRR in Section 5 also showed convergence. In contrast, the linear error-based iteration procedure converges to compute the most dominant eigenvector of a matrix expression. This, in turn, implies that (i) the results are optimal and (ii) convergence problems will usually not arise (Golub and van Loan, 1996).

5. APPLICATION STUDIES

In this section, the NLRRR algorithm is applied to a simulation study to illustrate the above findings. In addition, the covariance of the elements of each pair of t- and u-scores is determined for PLS and NLRRR. The simulated process contained 4 predictor variables, denoted as $x_1$, $x_2$, $x_3$ and $x_4$, and 3 output variables, referred to as $y_1$, $y_2$ and $y_3$. The predictor variables were generated as follows:

$$x_1 = 0.81\, t_1 + 0.444\, t_2 \qquad x_2 = 0.615\, t_1 + 0.744\, t_2 \qquad x_3 = 0.91\, t_1 + 0.738\, t_2 \qquad x_4 = 0.176\, t_1 + 0.405\, t_2, \qquad (10)$$

where $t_1$ and $t_2$ were normally distributed sequences of zero mean and unit variance, i.e. $t_1, t_2 \sim \mathcal{N}(0, 1)$. Using the 4 predictor variables, the response variables were then computed to be:

$$y_1 = \exp\left(x_1 \sin(\pi x_4)\right) + \sin(x_2 x_3) \qquad y_2 = x_1 + \sin(\pi x_2 x_3) + x_4^4 \qquad y_3 = \exp\left(x_2 \cos(\pi x_4)\right) + x_1 x_4. \qquad (11)$$

A data set of 1000 samples was obtained as described above, and an identically and independently distributed sequence of zero mean and variance 0.01 was superimposed on each variable to represent measurement noise. A division of the recorded data set was then carried out as follows. The first 800 samples were used as reference data to identify a NLRRR model and the remaining 200 samples were used to test the identified model. Prior to the identification of the NLRRR model, the variables of the reference data set were normalized, i.e. mean centered and scaled to unit variance. The nonlinear functions were selected to be second-order polynomials.
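A data set of this kind can be reproduced along the following lines. This Python sketch follows Equations (10) and (11) as reconstructed above; the random seed and the exact noise handling are assumptions, and the printed coefficients are used as given.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 1000

# latent driving signals, t1, t2 ~ N(0, 1)
t1, t2 = rng.standard_normal(K), rng.standard_normal(K)

# Eq. (10): predictor variables as linear combinations of t1 and t2
x1 = 0.81 * t1 + 0.444 * t2
x2 = 0.615 * t1 + 0.744 * t2
x3 = 0.91 * t1 + 0.738 * t2
x4 = 0.176 * t1 + 0.405 * t2

# Eq. (11): nonlinear response variables
y1 = np.exp(x1 * np.sin(np.pi * x4)) + np.sin(x2 * x3)
y2 = x1 + np.sin(np.pi * x2 * x3) + x4 ** 4
y3 = np.exp(x2 * np.cos(np.pi * x4)) + x1 * x4

X = np.column_stack([x1, x2, x3, x4])
Y = np.column_stack([y1, y2, y3])

# measurement noise of zero mean and variance 0.01 on every variable
X += rng.normal(scale=0.1, size=X.shape)
Y += rng.normal(scale=0.1, size=Y.shape)

# 800 reference samples, 200 test samples; normalize with reference statistics
X_ref, X_tst = X[:800], X[800:]
Y_ref, Y_tst = Y[:800], Y[800:]
mu_x, sd_x = X_ref.mean(0), X_ref.std(0)
mu_y, sd_y = Y_ref.mean(0), Y_ref.std(0)
X_ref, X_tst = (X_ref - mu_x) / sd_x, (X_tst - mu_x) / sd_x
Y_ref, Y_tst = (Y_ref - mu_y) / sd_y, (Y_tst - mu_y) / sd_y
```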
To highlight that the NLRRR algorithm may not calculate pairs of t- and u-scores that have maximum covariance, each pair of score vectors was evaluated for PLS and NLRRR. The results of this comparison are summarized in Table 1.

Table 1. Covariance captured by the score models for PLS and NLRRR

  #LV    PLS Model    NLRRR Model
   1      1.1083        0.1614
   2      0.180         0.0319
   3      0.0003        0.000
   4      0.0000        0.0000

It can clearly be seen from Table 1 that the PLS model determined pairs of t- and u-scores that described a significantly larger covariance than those computed by the NLRRR model.
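The covariance comparison of Table 1 can be carried out with a few lines of code once the score vectors of both models are available. In the following sketch, scikit-learn's PLSRegression is used merely as a convenient stand-in for a linear PLS implementation, and the placeholder data, variable names and helper function are assumptions; the NLRRR score vectors would come from the error-based procedure sketched in Section 4.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def score_covariances(T, U):
    """Covariance between each pair of t- and u-score vectors (columns of T and U)."""
    K = T.shape[0]
    return [float(T[:, k] @ U[:, k]) / (K - 1) for k in range(T.shape[1])]

# placeholder data; in the study these would be the normalized reference data X_ref, Y_ref
rng = np.random.default_rng(0)
X_ref = rng.standard_normal((800, 4))
Y_ref = rng.standard_normal((800, 3))

pls = PLSRegression(n_components=4, scale=False).fit(X_ref, Y_ref)
T, U = pls.transform(X_ref, Y_ref)      # t- and u-score matrices of the linear PLS model
print(score_covariances(T, U))          # corresponds to the "PLS Model" column of Table 1
# applying score_covariances to the NLRRR t- and u-score vectors gives the second column
```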

The model performance of the NLRRR model was evaluated on the reference data using the percentage contribution of each set of LVs, and their cumulative percentage contribution, captured in the predictor and response matrices. Furthermore, the mean squared prediction error (MSPE) was also analyzed. Table 2 summarizes the performance of the identified NLRRR model.

Table 2. Performance of the NLRRR model

  #LV    % to X    Tot. % to X    % to Y    Tot. % to Y    MSPE
   1      88.38       88.38        42.86       42.86       1.711
   2      11.50       99.88         8.09       50.95       1.4698
   3       0.07       99.95         0.17       51.12       1.4648
   4       0.05      100.00         0.08       51.20       1.466

Given the construction of the predictor variables, the predictor matrix without the introduction of measurement noise has rank 2. Thus, only two sets of LVs were required to represent the structure between the predictor and response variables. This could also be revealed by the NLRRR model, as the variance contribution of the 3rd and 4th sets of LVs was insignificant for (i) predicting the response matrix and (ii) reconstructing the predictor matrix. Hence, only two score models were required to represent the variation of the predictor and response variables, respectively.

6. CONCLUSIONS

In this paper, nonlinear extensions of PLS, introduced by (Wold et al., 1989) and (Baffi et al., 1999a), were analyzed. This analysis revealed, firstly, that these algorithms in fact determine a linear RRR model if the nonlinear functions incorporated in the score models are replaced by linear functions. Secondly, these algorithms should therefore be seen as nonlinear extensions of RRR, instead of nonlinear extensions of PLS. Thirdly, the weight and loading vectors of the response variables point in the same direction. The analysis further showed that the algorithm by (Baffi et al., 1999a) is based on a gradient descent technique and may therefore produce a sub-optimal solution. In addition, the series of iteration steps may not converge for each set of LVs. Convergence problems, however, have not yet been reported and were also not experienced in this work. To alleviate the determination of an optimal solution for the associated cost functions and their constraints, (Sharma et al., 2003) introduced the application of Genetic Algorithm strategies. The above findings were demonstrated using an application study on a synthetic example.

Appendix A. LINEAR ERROR-BASED PARTIAL LEAST SQUARES

For simplicity, the subscripts on matrices and vectors that indicate which LV is currently being determined are omitted. For the $i$th iteration step, the dependency of the weight vector $\mathbf{v}_i$, the regression coefficient $b_i$ and the residual vector $\mathbf{e}_i$ upon the weight vector $\mathbf{w}_i$ is as follows:

$$\mathbf{v}_i = \frac{\mathbf{Y}^T \mathbf{X} \mathbf{w}_i}{\left\|\mathbf{Y}^T \mathbf{X} \mathbf{w}_i\right\|} \qquad b_i = \frac{\mathbf{w}_i^T \mathbf{X}^T \mathbf{Y} \mathbf{Y}^T \mathbf{X} \mathbf{w}_i}{\left(\mathbf{w}_i^T \mathbf{X}^T \mathbf{X} \mathbf{w}_i\right) \left\|\mathbf{Y}^T \mathbf{X} \mathbf{w}_i\right\|} \qquad \mathbf{e}_i = \frac{\mathbf{Y} \mathbf{Y}^T \mathbf{X} \mathbf{w}_i}{\left\|\mathbf{Y}^T \mathbf{X} \mathbf{w}_i\right\|} - \frac{\mathbf{X} \mathbf{w}_i\, \mathbf{w}_i^T \mathbf{X}^T \mathbf{Y} \mathbf{Y}^T \mathbf{X} \mathbf{w}_i}{\left(\mathbf{w}_i^T \mathbf{X}^T \mathbf{X} \mathbf{w}_i\right) \left\|\mathbf{Y}^T \mathbf{X} \mathbf{w}_i\right\|}, \qquad \text{(A.1)}$$

which implies that the update of the weight vector $\mathbf{w}_i$ is given by:

$$\Delta\mathbf{w}_{i+1} = \frac{\left[\mathbf{X}^T \mathbf{X}\right]^{\dagger} \mathbf{X}^T \mathbf{Y} \mathbf{Y}^T \mathbf{X} \mathbf{w}_i}{b_i \left\|\mathbf{Y}^T \mathbf{X} \mathbf{w}_i\right\|} - \mathbf{w}_i. \qquad \text{(A.2)}$$

Consequently, the w-weight vector for the $(i+1)$th iteration step is calculated as:

$$\mathbf{w}_{i+1} = \frac{\left[\mathbf{X}^T \mathbf{X}\right]^{\dagger} \mathbf{X}^T \mathbf{Y} \mathbf{Y}^T \mathbf{X} \mathbf{w}_i}{\left\|\left[\mathbf{X}^T \mathbf{X}\right]^{\dagger} \mathbf{X}^T \mathbf{Y} \mathbf{Y}^T \mathbf{X} \mathbf{w}_i\right\|}. \qquad \text{(A.3)}$$

The above iteration is equivalent to the Power method (Golub and van Loan, 1996) for determining the most dominant eigenvector of the positive definite or semi-definite matrix $\left[\mathbf{X}^T \mathbf{X}\right]^{\dagger} \mathbf{X}^T \mathbf{Y} \mathbf{Y}^T \mathbf{X}$. On convergence, the most dominant eigenvalue is equal to $\mathbf{w}^T \left[\mathbf{X}^T \mathbf{X}\right]^{\dagger} \mathbf{X}^T \mathbf{Y} \mathbf{Y}^T \mathbf{X} \mathbf{w}$.

Appendix B. COST FUNCTION AND SOLUTION FOR REDUCED RANK REGRESSION

The residuals of the response matrix are given by:

$$\mathbf{E} = \mathbf{Y} - \mathbf{t} \mathbf{t}^T \mathbf{Y}, \qquad \text{(B.1)}$$

which is based on the following constraints:

$$G^{(t)} = \mathbf{t} - \mathbf{X} \mathbf{w} = \mathbf{0} \qquad G^{(w)} = \mathbf{w}^T \mathbf{X}^T \mathbf{X} \mathbf{w} - 1 = 0. \qquad \text{(B.2)}$$

Using the above equations, the RRR cost function for $\mathbf{w}$ is:

$$J(\mathbf{w}) = \min_{\mathbf{w}} \operatorname{trace}\left\{\mathbf{E}^T \mathbf{E}\right\} = \min_{\mathbf{w}} \left[\operatorname{trace}\left\{\mathbf{Y}^T \mathbf{Y}\right\} - \operatorname{trace}\left\{\mathbf{Y}^T \mathbf{t} \mathbf{t}^T \mathbf{Y}\right\}\right]. \qquad \text{(B.3)}$$

The solution of the above cost function can be obtained as follows:

$$\frac{\partial J(\mathbf{w})}{\partial \mathbf{t}} \frac{\partial \mathbf{t}}{\partial \mathbf{w}} - \lambda \frac{\partial G^{(w)}}{\partial \mathbf{w}} = \mathbf{0} \qquad \text{(B.4)}$$

and is given by:

$$\mathbf{w}^T \mathbf{X}^T \mathbf{Y} \mathbf{Y}^T \mathbf{X} - \lambda\, \mathbf{w}^T \mathbf{X}^T \mathbf{X} = \mathbf{0} \quad \Longrightarrow \quad \lambda\, \mathbf{w} = \left[\mathbf{X}^T \mathbf{X}\right]^{\dagger} \mathbf{X}^T \mathbf{Y} \mathbf{Y}^T \mathbf{X} \mathbf{w}. \qquad \text{(B.5)}$$

Consequently, $\mathbf{w}$ is the eigenvector associated with the largest eigenvalue of $\left[\mathbf{X}^T \mathbf{X}\right]^{\dagger} \mathbf{X}^T \mathbf{Y} \mathbf{Y}^T \mathbf{X}$.
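The linear analysis of Appendices A and B can be checked numerically. The following Python sketch (illustrative only; the simulated data and variable names are assumptions) runs the power-method iteration of Equation (A.3) and compares the result with the dominant eigenvector of $\left[\mathbf{X}^T\mathbf{X}\right]^{\dagger}\mathbf{X}^T\mathbf{Y}\mathbf{Y}^T\mathbf{X}$ obtained by direct eigendecomposition.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 4))
Y = X @ rng.standard_normal((4, 3)) + 0.1 * rng.standard_normal((500, 3))
X -= X.mean(0); Y -= Y.mean(0)

M = np.linalg.pinv(X.T @ X) @ X.T @ Y @ Y.T @ X   # matrix of Theorem 1 / Eq. (A.3)

# power method, Eq. (A.3)
w = rng.standard_normal((4, 1)); w /= np.linalg.norm(w)
for _ in range(200):
    w = M @ w
    w /= np.linalg.norm(w)

# direct eigendecomposition for comparison
eigvals, eigvecs = np.linalg.eig(M)
w_eig = np.real(eigvecs[:, [np.argmax(np.real(eigvals))]])
w_eig /= np.linalg.norm(w_eig)

# the two directions agree up to sign
print(abs(float(w.T @ w_eig)))   # close to 1.0
```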

Appendix C. DETERMINATION OF THE V-WEIGHT AND Q-LOADING VECTOR

The response matrix can be predicted using the prediction of the u-score vector, $f(\mathbf{t})$, and the loading vector $\mathbf{q}$:

$$\mathbf{Y} = f(\mathbf{t})\, \mathbf{q}^T + \mathbf{E}, \qquad \text{(C.1)}$$

where $\mathbf{E}$ is a residual matrix. Given the case that $f(\mathbf{t})$ is predetermined, $\mathbf{q}$ can be regarded as an independent vector and the minimum variance of $\mathbf{E}$ is given by:

$$\mathbf{q}^T = \left(f(\mathbf{t})^T f(\mathbf{t})\right)^{-1} f(\mathbf{t})^T \mathbf{Y}, \qquad \text{(C.2)}$$

which is equal to:

$$\mathbf{q} = \frac{\mathbf{Y}^T f(\mathbf{t})}{f(\mathbf{t})^T f(\mathbf{t})}, \qquad \text{(C.3)}$$

as $f(\mathbf{t})^T f(\mathbf{t})$ is a scalar. The v-weight vector is also proportional to $\mathbf{Y}^T f(\mathbf{t})$, which implies that (i) the v-weight and the q-loading vector lie in the same direction and (ii) both are obtained to minimize the variance of the residual matrix $\mathbf{E}$.

REFERENCES

Baffi, G., E. B. Martin and A. J. Morris (1999a). Non-linear projection to latent structures revisited: the quadratic PLS algorithm. Computers and Chemical Engineering 23, 395–411.

Baffi, G., E. B. Martin and A. J. Morris (1999b). Non-linear projection to latent structures revisited: the neural network PLS algorithm. Computers and Chemical Engineering 23(9), 1293–1307.

Bellman, R. (1957). Dynamic Programming. Princeton University Press. Princeton, N.J., U.S.A.

Berglund, A. and S. Wold (1997). INLR, implicit non-linear latent variable regression. Journal of Chemometrics 11, 141–156.

Frank, I. E. (1990). A nonlinear PLS model. Chemometrics and Intelligent Laboratory Systems 8, 109–119.

Geladi, P. and B. R. Kowalski (1986). Partial least squares regression: A tutorial. Analytica Chimica Acta 185, 31–46.

Golub, G. H. and C. F. van Loan (1996). Matrix Computations. 3rd ed. Johns Hopkins University Press. Baltimore.

Hiden, H., B. McKay, M. Willis and G. Montague (1998). Non-linear partial least squares using genetic programming. In: Proceedings of the 2nd Annual Conference on Genetic Programming (GP'98). Madison, Wisconsin, U.S.A.

Holcomb, T. R. and M. Morari (1992). PLS/neural networks. Computers and Chemical Engineering 16(4), 393–411.

Höskuldsson, A. (1988). PLS regression models. Journal of Chemometrics 2, 211–228.

Li, B., E. B. Martin and A. J. Morris (2001). Box-Tidwell transformation based partial least squares regression. Computers and Chemical Engineering 25, 1219–1233.

Patwardhan, R. and S. Lakshminarayan (1998). Constrained nonlinear MPC using Hammerstein and Wiener models. AIChE Journal 44(7), 1611–1622.

Qin, S. J. and T. J. McAvoy (1992). Nonlinear PLS modelling using neural networks. Computers and Chemical Engineering 16(4), 379–391.

Sharma, S., U. Kruger and G. W. Irwin (2003). Genetic learning methods for enhanced nonlinear partial least squares. In: Proceedings of the IFAC International Conference on Intelligent Control Systems and Signal Processing (ICONS 2003). pp. 37 4. Faro, Portugal.

Wilson, D. J. H., G. W. Irwin and G. Lightbody (1997). Nonlinear PLS modelling using radial basis functions. In: Proceedings of the American Control Conference. Albuquerque, New Mexico.

Wold, S. (1992). Non-linear partial least squares modelling. II. Spline inner functions. Chemometrics and Intelligent Laboratory Systems 14, 71–84.

Wold, S., N. Kettaneh-Wold and B. Skagerberg (1989). Non-linear PLS modelling. Chemometrics and Intelligent Laboratory Systems 7, 53–65.