SOME APPLICATIONS: NONLINEAR REGRESSIONS BASED ON KERNEL METHOD IN SOCIAL SCIENCES AND ENGINEERING
1 SOME APPLICATIONS: NONLINEAR REGRESSIONS BASED ON KERNEL METHOD IN SOCIAL SCIENCES AND ENGINEERING
Antoni Wibowo
Farewell Lecture, PPI Ibaraki, 27 June 2009

EDUCATION BACKGROUND
- Dr.Eng., Social Systems and Management, Graduate School of Systems and Information Engineering, University of Tsukuba.
- M.Eng., Social Systems Engineering, Graduate School of Systems and Information Engineering, University of Tsukuba.
- M.Sc., Computer Science, University of Indonesia.
- B.Sc./B.Eng., Mathematics Engineering, Sebelas Maret University, 1995.
2 TABLE OF CONTENTS
Introduction. Ordinary Linear Regression (OLR). Principal Component Regression and Ridge Regression. Motivations. Kernel Principal Component Analysis. Kernel Principal Component Regression (KPCR). Kernel Ridge Regression (KRR). Weighted Least Squares-KPCR. Weighted Least Squares-KRR. Robust KPCR. Robust KRR. Numerical Examples. Conclusions.

Ordinary Linear Regression (OLR)
Regression analysis: a model of the relationship between Y (the response variable) and x1, x2, ..., xp (the regressor variables).

Notation:
- Y_i : the response variable in the i-th observation,
- x_ij : the i-th observation of regressor j (j = 1, ..., p),
- ε_i : the random error on the i-th observation (a random variable), i = 1, ..., n,
- β_0, ..., β_p : the regression coefficients,
- R : the set of real numbers.
3 Ordinary Linear Regression (OLR)
The standard OLR model corresponding to model (1.1), in matrix form: Y = Xβ + ε, where X is the regressor matrix.
Assumption: E[ε] = 0 and Cov[ε] = σ²I_N, where I_N is the N ⅹ N identity matrix.
The aim of regression analysis: to find the estimator of β, say β̂, such that β̂ minimizes ||Y - Xβ||².  (1.3)
The solution of (1.3) is given by β̂ = (XᵀX)⁻¹XᵀY.  (1.4)
4 Ordinary Linear Regression (OLR)
Let y be the observed data corresponding to Y, and let β̂(y) be the value of β̂ when Y is replaced by y in (1.4). Under the assumption that the column vectors of X are linearly independent:
- β̂(y) = (XᵀX)⁻¹Xᵀy,
- the prediction value of y: ŷ = Xβ̂(y),
- the residual between y and ŷ: e = y - ŷ.
Root Mean Square Error (RMSE): RMSE = sqrt((1/N) Σ_i (y_i - ŷ_i)²).
Figure: the prediction by OLR.
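The estimator and RMSE above can be sketched in a few lines of NumPy; this is a minimal illustration on toy data (the function names and the data are mine, not from the lecture):

```python
import numpy as np

def fit_olr(X, y):
    """OLR estimator: the minimizer of ||y - X b||^2.

    Equivalent to (X^T X)^{-1} X^T y when the columns of X are linearly
    independent; lstsq is used instead of an explicit inverse for stability.
    """
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat

def rmse(y, y_hat):
    """Root mean square error between observed y and the prediction."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

# Toy data: y = 1 + 2 x + small noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 1.0 + 2.0 * x + rng.normal(0, 0.1, size=50)

X = np.column_stack([np.ones_like(x), x])   # regressor matrix with intercept
beta_hat = fit_olr(X, y)
y_pred = X @ beta_hat                       # prediction value of y
residual = y - y_pred                       # residual between y and y_hat
```

Using `lstsq` rather than forming (XᵀX)⁻¹ avoids squaring the condition number, which matters once multicollinearity enters the picture below.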
5 OLR Limitations
- OLR does not yield a nonlinear prediction.
- The existence of multicollinearity (collinearity) in X can seriously deteriorate the prediction by OLR: the variance of β̂ becomes large, and we cannot be confident whether x_j makes a contribution to the prediction by OLR or not.

Remarks:
- Collinearity is said to exist in X if XᵀX is a singular matrix.
- Multicollinearity is said to exist in X if XᵀX is a nearly singular matrix, i.e., some eigenvalues of XᵀX are close to zero.
- Eigenvalues of XᵀX are nonnegative real numbers.
- A vector a ≠ 0 is called an eigenvector of XᵀX if XᵀXa = λa for some scalar λ; the scalar λ is called an eigenvalue of XᵀX.

Example 01: The Household Consumption Data
Table 1: The household consumption data.
- y_i : the i-th household consumption expenditure,
- x_i1 : the i-th household income,
- x_i2 : the i-th household wealth.
The OLR of the household consumption data:
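The eigenvalue diagnostic from the remarks can be run directly; this sketch uses two hypothetical, nearly proportional regressors rather than the actual Table 1 data:

```python
import numpy as np

def collinearity_report(X):
    """Eigenvalues of X^T X (nonnegative real numbers).

    Eigenvalue ratios lambda_j / lambda_max close to zero signal
    multicollinearity; an exact zero signals collinearity.
    """
    eigvals = np.linalg.eigvalsh(X.T @ X)   # ascending order
    ratios = eigvals / eigvals[-1]
    return eigvals, ratios

# Two nearly proportional regressors, income/wealth-like.
rng = np.random.default_rng(1)
x1 = rng.uniform(0, 100, size=20)
x2 = 10.0 * x1 + rng.normal(0, 0.01, size=20)   # almost a multiple of x1
X = np.column_stack([x1, x2])

eigvals, ratios = collinearity_report(X)
# ratios[0] is tiny, so multicollinearity exists in X
```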
6 Example 01: The Household Consumption Data
Multicollinearity/collinearity exists in X. Eigenvalues of XᵀX: λ1 = 3.4032e+7, λ2 = 6.7952e+1, λ3 = , λ2/λ1 = 1.9967e-6, λ3/λ1 = 2.9868e-8.
Applying OLR to the consumption data, the 95% confidence interval of β2 is [ , 0.1485]: we cannot be confident whether x2 makes a contribution to this prediction or not.
7 PCR AND RR
To overcome the effects of multicollinearity (collinearity):
1. Principal Component Regression (PCR),
2. Ridge Regression (RR).

PCR
1. Principal Component Regression (PCR): OLR + PCA = PCR.
Principal Component Analysis (PCA): what is PCA?
8 PCA
PCA: an orthogonal transformation.
PCA's procedure; PCR = OLR + PCA.
PCR's procedure: how to choose r? r is the retained number of principal components for PCR.
Estimator of PCR's regression coefficients.
Limitation: the prediction by PCR is a linear model.
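PCR's procedure (center X, keep the top r principal components, regress on the scores, map the coefficients back) can be sketched as follows; the toy data and helper names are illustrative assumptions, not from the lecture:

```python
import numpy as np

def fit_pcr(X, y, r):
    """Principal component regression, a minimal sketch.

    Center X, take the top-r eigenvectors of X^T X, regress y on the
    r principal-component scores, then map the coefficients back to
    the original regressors.
    """
    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    eigvals, V = np.linalg.eigh(Xc.T @ Xc)   # ascending order
    Vr = V[:, ::-1][:, :r]                   # top-r eigenvectors
    Z = Xc @ Vr                              # principal-component scores
    gamma, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)
    beta = Vr @ gamma                        # back to the x-space
    intercept = y.mean() - x_mean @ beta
    return intercept, beta

# Nearly collinear toy regressors, as in the consumption example.
rng = np.random.default_rng(2)
x1 = rng.uniform(0, 10, 30)
x2 = 2.0 * x1 + rng.normal(0, 1e-4, 30)      # nearly a multiple of x1
y = 3.0 + x1 + x2 + rng.normal(0, 0.01, 30)

b0, b = fit_pcr(np.column_stack([x1, x2]), y, r=1)
pred = b0 + np.column_stack([x1, x2]) @ b
```

With r = 1 the near-singular direction is discarded, so the coefficients stay stable even though x1 and x2 are almost proportional.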
9 Example 01: The Household Consumption Data
Eigenvalues: λ1 = e+5, λ2 = , λ2/λ1 = 1.9953e-5; r = 1.
Applying PCR to the household consumption data:
- the effects of multicollinearity/collinearity are avoided,
- but the prediction is still a linear regression.
95% confidence interval of β1: [0.0409, 0.0581].

RR
2. Ridge Regression (RR): replace the OLR normal equations (XᵀX)β̂ = Xᵀy by (XᵀX + qI)β̂ = Xᵀy for some q > 0. An appropriate q can be obtained by cross validation or the holdout method.
Prediction by ridge regression: ŷ = X(XᵀX + qI)⁻¹Xᵀy.
Limitation: the prediction by RR is a linear model.
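Ridge regression and the cross-validation choice of q can be sketched as follows; the fold count and the candidate grid for q are my choices, not the lecture's:

```python
import numpy as np

def fit_ridge(X, y, q):
    """Ridge estimator: beta = (X^T X + q I)^{-1} X^T y, q > 0."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + q * np.eye(p), X.T @ y)

def cv_ridge(X, y, qs, k=5):
    """Pick q from the candidates qs by k-fold cross validation
    (smallest mean squared prediction error on held-out folds)."""
    n = len(y)
    folds = np.array_split(np.arange(n), k)
    errors = []
    for q in qs:
        err = 0.0
        for fold in folds:
            train = np.setdiff1d(np.arange(n), fold)
            beta = fit_ridge(X[train], y[train], q)
            err += np.mean((y[fold] - X[fold] @ beta) ** 2)
        errors.append(err / k)
    return qs[int(np.argmin(errors))]

# Toy data with a known coefficient vector.
rng = np.random.default_rng(3)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, 40)

best_q = cv_ridge(X, y, qs=[0.01, 0.1, 1.0, 10.0])
beta = fit_ridge(X, y, best_q)
```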
10 Example 01: The Household Consumption Data
Applying RR to the household consumption data: q = 20.
11 MOTIVATIONS
- Motivation 1: OLR, PCR and RR yield only linear predictions.
- Motivation 2: equal variance of the random errors is assumed. What happens if the random errors have unequal variances and the observed data contain multicollinearity/collinearity?
- Motivation 3: what happens if the observed data contain outliers? (Outliers: observed data whose residuals are large.)

Motivation 1: Linearity.
To overcome the limitation of PCR, Kernel Principal Component Regression (KPCR) was proposed by Rosipal et al. (Neural Computing and Applications [2001], Journal of Machine Learning Research [2002]), Jade et al. (Chemical Engineering Science [2003]), and Hoegaerts et al. (Neurocomputing [2005]). However, the existing KPCR has theoretical difficulties in the procedure to obtain its prediction. We revise the existing KPCR.
12 Motivation 2: Equal Variances
Standard OLR model versus the feasible WLS model, in which W_N is a diagonal matrix.
Weighted Least Squares (WLS) is a widely used technique. Limitations: WLS yields a linear prediction, and there is no guarantee that multicollinearity can be avoided. KPCR (KRR) can be inappropriate here, since they are constructed based on the standard OLR model.
We propose two methods: a combination of WLS and KPCR (WLS-KPCR), and a combination of WLS and KRR (WLS-KRR).

Motivation 3: Sensitivity to Outliers
OLR, PCR, RR, KPCR and KRR can be inappropriate. (Kernel Ridge Regression (KRR) is proposed to overcome the limitation of Ridge Regression.)
M-estimation is a widely used technique to eliminate the effect of outliers. Limitation: M-estimation yields a linear prediction. Famenko et al. [2006] proposed a nonlinear prediction based on M-estimation, but it needs a specific nonlinear model in advance.
We propose two methods: a combination of M-estimation and KPCR (R-KPCR), and a combination of M-estimation and KRR (R-KRR). No need to specify a nonlinear model in advance.
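The feasible WLS fit with a diagonal W_N can be sketched as follows; the heteroscedastic toy data and the inverse-variance weights are illustrative assumptions:

```python
import numpy as np

def fit_wls(X, y, w):
    """Weighted least squares: beta = (X^T W X)^{-1} X^T W y, W = diag(w)."""
    XtW = X.T * w            # broadcasts the weights across columns of X^T
    return np.linalg.solve(XtW @ X, XtW @ y)

# Heteroscedastic toy data: the noise standard deviation grows with x.
rng = np.random.default_rng(5)
x = np.linspace(1, 10, 80)
y = 2.0 + 0.5 * x + rng.normal(0, 0.1 * x)

X = np.column_stack([np.ones_like(x), x])
beta = fit_wls(X, y, w=1.0 / (0.1 * x) ** 2)   # weights = 1 / variance
```

When the weights are the reciprocals of the error variances, WLS restores the efficiency that ordinary least squares loses under unequal variances; the prediction, however, is still linear in x, which is what motivates WLS-KPCR and WLS-KRR.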
13 Remarks: R-KPCR = Robust Kernel Principal Component Regression; R-KRR = Robust Kernel Ridge Regression.

MOTIVATIONS: overview of methods by model type.
- Ordinary least squares: linear — Ordinary Linear Regression (OLR); nonlinear (kernel) — KPCR, Revised KPCR (Chapter 4).
- Ridge: linear — Ridge Regression (RR); nonlinear (kernel) — KRR.
- Weighted Least Squares (WLS): linear — WLS Linear Regression (WLS-LR), Jukic's regression [2004]; nonlinear (kernel) — WLS-KPCR, WLS-KRR (Chapters 4-5).
- Robust M-estimation: linear — M-estimation, Famenko [2006]; nonlinear (kernel) — R-KPCR, R-KRR (Chapters 4-5).
- Nonparametric: Nadaraya [1964], Watson [1964].
14 KERNEL PRINCIPAL COMPONENT ANALYSIS (KPCA)
The data are mapped into a feature space F, which is assumed to be a Euclidean space of higher dimension, say p_F >> p. Conceptual KPCA: PCA carried out in F.
The mapping into F is unknown explicitly.
15 KERNEL PRINCIPAL COMPONENT ANALYSIS (KPCA)
Problem: we don't know K explicitly. Use Mercer's Theorem:
16 KERNEL PRINCIPAL COMPONENT ANALYSIS (KPCA)
Choose a symmetric, continuous and positive semidefinite (p.s.d.) function κ. Then there exists φ such that κ(x, z) = φ(x)ᵀφ(z) for any x, z ∈ Rᵖ. Instead of choosing ψ explicitly, we employ φ as ψ. κ is called the kernel function, and K is now known explicitly.

KPCA: finding the eigenvalues/eigenvectors of K via the kernel κ.
Conceptual KPCA's procedure: conceptual KPCA via the kernel κ; the normalized eigenvectors, obtained via κ, are known explicitly.
17 KPCA
Actual KPCA's procedure: everything is computed via the kernel κ, so it is known explicitly. When the centering assumption does not hold, K is replaced by K_N = K - EK - KE + EKE, where E is the N ⅹ N matrix with all entries equal to 1/N. The result is the nonlinear principal components corresponding to κ.
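The actual KPCA procedure, including the centering K_N = K - EK - KE + EKE, can be sketched as follows (Gaussian kernel assumed; the function names are mine):

```python
import numpy as np

def gaussian_kernel(X, Z, sigma):
    """kappa(x, z) = exp(-||x - z||^2 / (2 sigma^2)), as a Gram matrix."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kpca_scores(X, n_components, sigma):
    """Actual KPCA: center K with K_N = K - EK - KE + EKE (E = [1/N]),
    eigendecompose K_N, normalize the eigenvectors by sqrt(lambda),
    and return the nonlinear principal-component scores."""
    N = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    E = np.full((N, N), 1.0 / N)
    KN = K - E @ K - K @ E + E @ K @ E
    eigvals, V = np.linalg.eigh(KN)                  # ascending order
    eigvals, V = eigvals[::-1], V[:, ::-1]           # descending order
    A = V[:, :n_components] / np.sqrt(eigvals[:n_components])
    return KN @ A                                    # one row of scores per point

# Points on a parabola: a curve that linear PCA cannot straighten out.
X = np.column_stack([np.linspace(-1, 1, 20), np.linspace(-1, 1, 20) ** 2])
scores = kpca_scores(X, n_components=2, sigma=1.0)
```

Note that φ never appears: every quantity is computed from the Gram matrix alone, which is the whole point of the kernel trick.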
18 REVISED KPCR
Conceptual revised KPCR via the kernel κ; the required quantities are known explicitly.
As in PCR, we form an estimator of the regression coefficients; here it is the estimator of the revised KPCR's regression coefficients, obtained by regressing on the nonlinear principal components.
19 REVISED KPCR (via kernel κ)
r: the retained number of principal components for the revised KPCR.
K and its eigenvectors are known explicitly (via the kernel κ), so Eqs. (3.3) and (3.5) are known explicitly.

Actual revised KPCR. Summary of the revised KPCR's procedure:
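A minimal sketch of fitting a KPCR model and computing its prediction might look like this; it follows the generic KPCA-plus-least-squares construction with a Gaussian kernel, and does not claim to reproduce every detail of the revised method:

```python
import numpy as np

def gaussian_kernel(X, Z, sigma):
    """kappa(x, z) = exp(-||x - z||^2 / (2 sigma^2)), as a Gram matrix."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_kpcr(X, y, r, sigma):
    """Fit: center the Gram matrix, keep the top-r nonlinear PCs,
    then least squares of the centered y on the PC scores."""
    N = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    E = np.full((N, N), 1.0 / N)
    KN = K - E @ K - K @ E + E @ K @ E               # centered Gram matrix
    eigvals, V = np.linalg.eigh(KN)                  # ascending order
    eigvals, V = eigvals[::-1], V[:, ::-1]           # descending order
    A = V[:, :r] / np.sqrt(eigvals[:r])              # normalized eigenvectors
    Z = KN @ A                                       # training PC scores
    gamma, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)
    return {"X": X, "K": K, "E": E, "A": A, "gamma": gamma,
            "y_mean": y.mean(), "sigma": sigma}

def predict_kpcr(m, X_new):
    """Predict: center the cross-kernel the same way as in training,
    project onto the retained components, apply the coefficients."""
    Kx = gaussian_kernel(X_new, m["X"], m["sigma"])
    N = m["X"].shape[0]
    En = np.full((X_new.shape[0], N), 1.0 / N)
    Kxc = Kx - En @ m["K"] - Kx @ m["E"] + En @ m["K"] @ m["E"]
    return m["y_mean"] + (Kxc @ m["A"]) @ m["gamma"]

# Nonlinear toy data: the sinc function used in Example 02.
x = np.linspace(-5, 5, 60).reshape(-1, 1)
y = np.sinc(x).ravel()
model = fit_kpcr(x, y, r=20, sigma=1.0)
y_hat = predict_kpcr(model, x)
```

The choice of r plays the same role here as in PCR; in the examples below it is selected together with the kernel parameter.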
20 EXAMPLES
Kernels: Gaussian, sigmoid, polynomial.
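The three kernels can be written out explicitly; the slide's exact parameterizations were lost in transcription, so these are the standard forms with assumed parameter names:

```python
import numpy as np

def gaussian(x, z, sigma=5.0):
    """Gaussian kernel: kappa(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

def sigmoid(x, z, a=1.0, b=0.0):
    """Sigmoid kernel: kappa(x, z) = tanh(a x^T z + b).
    Note: p.s.d. (a Mercer kernel) only for some choices of a and b."""
    return np.tanh(a * np.dot(x, z) + b)

def polynomial(x, z, d=2, c=1.0):
    """Polynomial kernel: kappa(x, z) = (x^T z + c)^d."""
    return (np.dot(x, z) + c) ** d

x = np.array([1.0, 2.0])
z = np.array([2.0, 0.0])
```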
21 EXAMPLE 01: The Household Consumption Data
Applying the revised KPCR (Gaussian kernel, parameter = 5) to the household consumption data:
- nonlinear prediction regression,
- the effects of multicollinearity (collinearity) are avoided.
Selection of the best model by AIC (the model with the smallest AIC is the best model):
- OLR: RMSE = 5.6960, AIC = ;
- PCR: RMSE = 6.2008, AIC = ;
- RR: RMSE = , AIC = ;
- revised KPCR: RMSE = , AIC = .

EXAMPLES: The prediction by Nadaraya-Watson regression is a kernel-weighted average of the observed responses. In our examples p = 1, and the bandwidth h1 is estimated by Bowman-Azzalini's method (h1ba) or by Silverman's method (h1s).
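The Nadaraya-Watson predictor can be sketched with a Silverman-type bandwidth; this uses the common simplified rule 1.06 σ̂ n^(-1/5), which may differ in detail from the bandwidth selectors used in the lecture:

```python
import numpy as np

def silverman_bandwidth(x):
    """Silverman's rule of thumb for the kernel bandwidth (p = 1)."""
    n = len(x)
    return 1.06 * np.std(x, ddof=1) * n ** (-1.0 / 5.0)

def nadaraya_watson(x_train, y_train, x_new, h):
    """Nadaraya-Watson estimate: a Gaussian-kernel weighted mean of y."""
    w = np.exp(-((x_new[:, None] - x_train[None, :]) ** 2) / (2.0 * h ** 2))
    return (w @ y_train) / w.sum(axis=1)

# Noisy sinc data, as in Example 02.
x = np.linspace(-5, 5, 100)
y = np.sinc(x) + np.random.default_rng(4).normal(0, 0.05, 100)

h = silverman_bandwidth(x)
y_hat = nadaraya_watson(x, y, x, h)
```

Because the estimate is a weighted mean, each prediction lies between the smallest and largest observed response; the quality of the fit rests entirely on the bandwidth, which is why the examples compare two selectors.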
22 EXAMPLE 02: Sinc Function
Two figures: training data (standard deviation of noise = 0.2) and testing data (standard deviation of noise = 0.5). Legend: black circles: original data; black dots: original data with noise; green: OLR; blue: Nadaraya-Watson with Bowman-Azzalini's method (h1ba = 0.6967); red: revised KPCR with Gaussian kernel parameter = 5.
23 EXAMPLE 02: Sinc Function
Two figures: training data (standard deviation of noise = 0.2) and testing data (standard deviation of noise = 0.5). Legend: black circles: original data; black dots: original data with noise; green: OLR; blue: Nadaraya-Watson with Silverman's method (h1s = ); red: revised KPCR with Gaussian kernel parameter = 5.
24 EXAMPLE 02: Sinc Function
Table 2: Comparison of OLR, Nadaraya-Watson regression and the revised KPCR for the sinc function (#: N-W with Bowman-Azzalini's method; : N-W with Silverman's method). r is the retained number of PCs for the revised KPCR.

EXAMPLE 03: Stock of Cars
Jukic et al. [2003] used the Gompertz function to fit this data.
Table 3: The stock of cars (in thousands) in the Netherlands.
Table 4: Comparison of OLR, Nadaraya-Watson regression (#: N-W with Bowman-Azzalini's method; : N-W with Silverman's method) and the revised KPCR for the stock of cars.
Figure legend: black circles: original data; green: OLR; blue: (a) N-W with Bowman-Azzalini's method (h1ba = ), (b) N-W with Silverman's method (h1s = ); red: revised KPCR with Gaussian kernel parameter = 5.
25 EXAMPLE 04: The Weight of Chickens
Jukic et al. [2003] used the Gompertz function to fit this data.
Table 5: The weight of female chickens.
Table 6: Comparison of OLR, Nadaraya-Watson regression (#: N-W with Bowman-Azzalini's method; : N-W with Silverman's method) and the revised KPCR for the female chickens.
Figure legend: black circles: original data; green: OLR; blue: (a) N-W with Bowman-Azzalini's method (h1ba = ), (b) N-W with Silverman's method (h1s = 2.4715); red: revised KPCR with Gaussian kernel parameter = 5.

EXAMPLE 05: Growth of the Son
Table 7: Growth of the son [Seber et al., 1998, Nonlinear Programming].
Table 8: Comparison of OLR, Nadaraya-Watson regression (#: N-W with Bowman-Azzalini's method; : N-W with Silverman's method) and the revised KPCR for the growth of the son.
Figure legend: black circles: original data; green: OLR; blue: (a) N-W with Bowman-Azzalini's method (h1ba = ), (b) N-W with Silverman's method (h1s = 2.8747); red: revised KPCR with Gaussian kernel parameter = 5.
26 EXAMPLE 06: The Puromycin Data
Table 9: The Puromycin data [Montgomery, 2006, Introduction to Linear Regression Analysis]. x_i: the i-th substrate concentration of the puromycin; y_i: the i-th reaction velocity of the puromycin.
Table 10: Comparison of OLR, Nadaraya-Watson regression (#: N-W with Bowman-Azzalini's method; : N-W with Silverman's method) and the revised KPCR for the puromycin data.
Figure legend: black circles: original data; green: OLR; blue: (a) N-W with Bowman-Azzalini's method (h1ba = ), (b) N-W with Silverman's method (h1s = 0.2571); red: revised KPCR with Gaussian kernel parameter = 5.

EXAMPLE 07: Radioactive Tracer Data
Table 11: Radioactive tracer [Seber et al., 1998, Nonlinear Programming]. x_i: the i-th time; y_i: the i-th radioactive tracer measurement.
Table 12: Comparison of OLR, Nadaraya-Watson regression (#: N-W with Bowman-Azzalini's method; : N-W with Silverman's method) and the revised KPCR for the radioactive tracer.
Figure legend: black circles: original data; green: OLR; blue: (a) N-W with Bowman-Azzalini's method (h1ba = ), (b) N-W with Silverman's method (h1s = 1.1079); red: revised KPCR with Gaussian kernel parameter = 5.
27 CONCLUSIONS
(KPCR = Kernel Principal Component Regression; KRR = Kernel Ridge Regression.)
KPCR is a novel method to perform nonlinear prediction in regression analysis. We showed that previous works on KPCR have theoretical difficulties in deriving the prediction and in obtaining the retained number of PCs. We revised the previous KPCR and showed that these difficulties are eliminated by the revised KPCR. In our case studies, the revised KPCR with the Gaussian kernel gives better results than Jukic's regression, and, with an appropriate Gaussian kernel parameter, better results than Nadaraya-Watson regression.
28 Thank you for your attention.
29 EXAMPLE 08: Sinc Function + Outliers (Robust-KPCR)
Figure legend: black dots: original data + noise; green: OLR; magenta: M-estimation; blue: revised KPCR with Gaussian kernel parameter = 5; red: robust KPCR with Gaussian kernel parameter = 5.

EXAMPLE 09: Sine Function + Outliers (Robust-KRR)
Figure legend: black dots: original data + noise; green: OLR; magenta: M-estimation; blue: revised KPCR with Gaussian kernel parameter = 5; red: robust KRR with Gaussian kernel parameter = 5.
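The M-estimation step underlying the robust variants can be sketched with Huber weights and iteratively reweighted least squares (IRLS); the cutoff c = 1.345 and the MAD scale are standard choices, not necessarily the lecture's:

```python
import numpy as np

def huber_weights(r, c=1.345):
    """Huber weight function: 1 for small standardized residuals,
    c / |r| beyond the cutoff, so outliers are downweighted."""
    a = np.abs(r)
    return np.where(a <= c, 1.0, c / np.maximum(a, 1e-12))

def fit_m_estimation(X, y, n_iter=20):
    """M-estimation of linear regression coefficients by IRLS."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)        # OLS start
    for _ in range(n_iter):
        r = y - X @ beta
        s = np.median(np.abs(r - np.median(r))) / 0.6745  # robust MAD scale
        w = huber_weights(r / max(s, 1e-12))
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ y)
    return beta

# A line with planted outliers: OLS is pulled away, the M-estimate is not.
rng = np.random.default_rng(6)
x = np.linspace(0, 10, 50)
y = 1.0 + 2.0 * x + rng.normal(0, 0.1, 50)
y[::10] += 20.0                                          # plant 5 outliers

X = np.column_stack([np.ones_like(x), x])
beta_m = fit_m_estimation(X, y)
```

In R-KPCR and R-KRR, the same reweighting idea is applied to the kernel-based regression instead of the linear one, which is what removes the need to specify a nonlinear model in advance.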
More informationDIMENSION REDUCTION OF THE EXPLANATORY VARIABLES IN MULTIPLE LINEAR REGRESSION. P. Filzmoser and C. Croux
Pliska Stud. Math. Bulgar. 003), 59 70 STUDIA MATHEMATICA BULGARICA DIMENSION REDUCTION OF THE EXPLANATORY VARIABLES IN MULTIPLE LINEAR REGRESSION P. Filzmoser and C. Croux Abstract. In classical multiple
More informationII. Linear Models (pp.47-70)
Notation: Means pencil-and-paper QUIZ Means coding QUIZ Agree or disagree: Regression can be always reduced to classification. Explain, either way! A certain classifier scores 98% on the training set,
More informationExercises * on Principal Component Analysis
Exercises * on Principal Component Analysis Laurenz Wiskott Institut für Neuroinformatik Ruhr-Universität Bochum, Germany, EU 4 February 207 Contents Intuition 3. Problem statement..........................................
More information1 Kernel methods & optimization
Machine Learning Class Notes 9-26-13 Prof. David Sontag 1 Kernel methods & optimization One eample of a kernel that is frequently used in practice and which allows for highly non-linear discriminant functions
More informationMachine Learning - MT & 14. PCA and MDS
Machine Learning - MT 2016 13 & 14. PCA and MDS Varun Kanade University of Oxford November 21 & 23, 2016 Announcements Sheet 4 due this Friday by noon Practical 3 this week (continue next week if necessary)
More informationLECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity
LECTURE 10 Introduction to Econometrics Multicollinearity & Heteroskedasticity November 22, 2016 1 / 23 ON PREVIOUS LECTURES We discussed the specification of a regression equation Specification consists
More informationMLCC 2015 Dimensionality Reduction and PCA
MLCC 2015 Dimensionality Reduction and PCA Lorenzo Rosasco UNIGE-MIT-IIT June 25, 2015 Outline PCA & Reconstruction PCA and Maximum Variance PCA and Associated Eigenproblem Beyond the First Principal Component
More informationSupport Vector Regression (SVR) Descriptions of SVR in this discussion follow that in Refs. (2, 6, 7, 8, 9). The literature
Support Vector Regression (SVR) Descriptions of SVR in this discussion follow that in Refs. (2, 6, 7, 8, 9). The literature suggests the design variables should be normalized to a range of [-1,1] or [0,1].
More informationEcon 510 B. Brown Spring 2014 Final Exam Answers
Econ 510 B. Brown Spring 2014 Final Exam Answers Answer five of the following questions. You must answer question 7. The question are weighted equally. You have 2.5 hours. You may use a calculator. Brevity
More informationResponse Surface Methodology
Response Surface Methodology Process and Product Optimization Using Designed Experiments Second Edition RAYMOND H. MYERS Virginia Polytechnic Institute and State University DOUGLAS C. MONTGOMERY Arizona
More informationLecture 6 Sept Data Visualization STAT 442 / 890, CM 462
Lecture 6 Sept. 25-2006 Data Visualization STAT 442 / 890, CM 462 Lecture: Ali Ghodsi 1 Dual PCA It turns out that the singular value decomposition also allows us to formulate the principle components
More informationMultilevel modeling and panel data analysis in educational research (Case study: National examination data senior high school in West Java)
Multilevel modeling and panel data analysis in educational research (Case study: National examination data senior high school in West Java) Pepi Zulvia, Anang Kurnia, and Agus M. Soleh Citation: AIP Conference
More informationLinear Regression and Its Applications
Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationRegression. Simple Linear Regression Multiple Linear Regression Polynomial Linear Regression Decision Tree Regression Random Forest Regression
Simple Linear Multiple Linear Polynomial Linear Decision Tree Random Forest Computational Intelligence in Complex Decision Systems 1 / 28 analysis In statistical modeling, regression analysis is a set
More informationComputer Vision Group Prof. Daniel Cremers. 2. Regression (cont.)
Prof. Daniel Cremers 2. Regression (cont.) Regression with MLE (Rep.) Assume that y is affected by Gaussian noise : t = f(x, w)+ where Thus, we have p(t x, w, )=N (t; f(x, w), 2 ) 2 Maximum A-Posteriori
More information6.036 midterm review. Wednesday, March 18, 15
6.036 midterm review 1 Topics covered supervised learning labels available unsupervised learning no labels available semi-supervised learning some labels available - what algorithms have you learned that
More informationLinear Regression Linear Regression with Shrinkage
Linear Regression Linear Regression ith Shrinkage Introduction Regression means predicting a continuous (usually scalar) output y from a vector of continuous inputs (features) x. Example: Predicting vehicle
More informationManifold Learning: Theory and Applications to HRI
Manifold Learning: Theory and Applications to HRI Seungjin Choi Department of Computer Science Pohang University of Science and Technology, Korea seungjin@postech.ac.kr August 19, 2008 1 / 46 Greek Philosopher
More information4 Bias-Variance for Ridge Regression (24 points)
2 count = 0 3 for x in self.x_test_ridge: 4 5 prediction = np.matmul(self.w_ridge,x) 6 ###ADD THE COMPUTED MEAN BACK TO THE PREDICTED VECTOR### 7 prediction = self.ss_y.inverse_transform(prediction) 8
More informationIntroduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones
Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 Last week... supervised and unsupervised methods need adaptive
More informationLecture Notes on Support Vector Machine
Lecture Notes on Support Vector Machine Feng Li fli@sdu.edu.cn Shandong University, China 1 Hyperplane and Margin In a n-dimensional space, a hyper plane is defined by ω T x + b = 0 (1) where ω R n is
More informationMachine Learning. Lecture 6: Support Vector Machine. Feng Li.
Machine Learning Lecture 6: Support Vector Machine Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Warm Up 2 / 80 Warm Up (Contd.)
More informationbelow, kernel PCA Eigenvectors, and linear combinations thereof. For the cases where the pre-image does exist, we can provide a means of constructing
Kernel PCA Pattern Reconstruction via Approximate Pre-Images Bernhard Scholkopf, Sebastian Mika, Alex Smola, Gunnar Ratsch, & Klaus-Robert Muller GMD FIRST, Rudower Chaussee 5, 12489 Berlin, Germany fbs,
More informationMATH 829: Introduction to Data Mining and Analysis Principal component analysis
1/11 MATH 829: Introduction to Data Mining and Analysis Principal component analysis Dominique Guillot Departments of Mathematical Sciences University of Delaware April 4, 2016 Motivation 2/11 High-dimensional
More informationLinear Algebra Practice Problems
Linear Algebra Practice Problems Page of 7 Linear Algebra Practice Problems These problems cover Chapters 4, 5, 6, and 7 of Elementary Linear Algebra, 6th ed, by Ron Larson and David Falvo (ISBN-3 = 978--68-78376-2,
More informationMultiple Regression Analysis
1 OUTLINE Basic Concept: Multiple Regression MULTICOLLINEARITY AUTOCORRELATION HETEROSCEDASTICITY REASEARCH IN FINANCE 2 BASIC CONCEPTS: Multiple Regression Y i = β 1 + β 2 X 1i + β 3 X 2i + β 4 X 3i +
More informationLinear Dimensionality Reduction
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Principal Component Analysis 3 Factor Analysis
More informationNonrobust and Robust Objective Functions
Nonrobust and Robust Objective Functions The objective function of the estimators in the input space is built from the sum of squared Mahalanobis distances (residuals) d 2 i = 1 σ 2(y i y io ) C + y i
More informationMachine Learning. Support Vector Machines. Manfred Huber
Machine Learning Support Vector Machines Manfred Huber 2015 1 Support Vector Machines Both logistic regression and linear discriminant analysis learn a linear discriminant function to separate the data
More informationLinear Models 1. Isfahan University of Technology Fall Semester, 2014
Linear Models 1 Isfahan University of Technology Fall Semester, 2014 References: [1] G. A. F., Seber and A. J. Lee (2003). Linear Regression Analysis (2nd ed.). Hoboken, NJ: Wiley. [2] A. C. Rencher and
More informationComputational Methods. Eigenvalues and Singular Values
Computational Methods Eigenvalues and Singular Values Manfred Huber 2010 1 Eigenvalues and Singular Values Eigenvalues and singular values describe important aspects of transformations and of data relations
More informationIntroduction to Linear regression analysis. Part 2. Model comparisons
Introduction to Linear regression analysis Part Model comparisons 1 ANOVA for regression Total variation in Y SS Total = Variation explained by regression with X SS Regression + Residual variation SS Residual
More informationLinear Regression Linear Regression with Shrinkage
Linear Regression Linear Regression ith Shrinkage Introduction Regression means predicting a continuous (usually scalar) output y from a vector of continuous inputs (features) x. Example: Predicting vehicle
More informationLinear Algebra for Machine Learning. Sargur N. Srihari
Linear Algebra for Machine Learning Sargur N. srihari@cedar.buffalo.edu 1 Overview Linear Algebra is based on continuous math rather than discrete math Computer scientists have little experience with it
More informationAn Introduction to Independent Components Analysis (ICA)
An Introduction to Independent Components Analysis (ICA) Anish R. Shah, CFA Northfield Information Services Anish@northinfo.com Newport Jun 6, 2008 1 Overview of Talk Review principal components Introduce
More information