SOME APPLICATIONS: NONLINEAR REGRESSIONS BASED ON KERNEL METHOD IN SOCIAL SCIENCES AND ENGINEERING
1 SOME APPLICATIONS: NONLINEAR REGRESSIONS BASED ON KERNEL METHOD IN SOCIAL SCIENCES AND ENGINEERING
Antoni Wibowo
Farewell Lecture, PPI Ibaraki, 27 June 2009

EDUCATION BACKGROUND
- Dr.Eng., Social Systems and Management, Graduate School of Systems and Information Engineering, University of Tsukuba.
- M.Eng., Social Systems Engineering, Graduate School of Systems and Information Engineering, University of Tsukuba.
- M.Sc., Computer Science, University of Indonesia.
- B.Sc./B.Eng., Mathematics Engineering, Sebelas Maret University, 1995.
2 TABLE OF CONTENTS
Introduction. Ordinary Linear Regression (OLR). Principal Component Regression and Ridge Regression. Motivations. Kernel Principal Component Analysis. Kernel Principal Component Regression (KPCR). Kernel Ridge Regression (KRR). Weighted Least Squares-KPCR. Weighted Least Squares-KRR. Robust KPCR. Robust KRR. Numerical Examples. Conclusions.

Ordinary Linear Regression (OLR)
Regression analysis: a model of the relationship between Y (the response variable) and x1, x2, ..., xp (the regressor variables).

Notation:
- Y_i : the response variable in the i-th observation,
- x_ij : the i-th observation of regressor j (j = 1, ..., p),
- ε_i : the random error on the i-th observation (a random variable), i = 1, ..., n,
- β_0, ..., β_p : the regression coefficients,
- R : the set of real numbers.
3 Ordinary Linear Regression (OLR)
The standard OLR model corresponding to model (1.1), in matrix form: Y = Xβ + ε, where X is the regressor matrix.
Assumption: E[ε] = 0 and Cov[ε] = σ²I_N, where I_N is the N ⅹ N identity matrix.
The aim of regression analysis: to find the estimator of β, say β̂, such that β̂ minimizes ||Y - Xβ||².  (1.3)
The solution of (1.3) is given by β̂ = (XᵀX)⁻¹XᵀY.  (1.4)
4 Ordinary Linear Regression (OLR)
Let y be the observed data corresponding to Y, and let β̂(y) be the value of β̂ when Y is replaced by y in (1.4). Under the assumption that the column vectors of X are linearly independent:
- β̂(y) = (XᵀX)⁻¹Xᵀy,
- the prediction value of y: ŷ = Xβ̂(y),
- the residual between y and ŷ: e = y - ŷ.
Root Mean Square Error (RMSE): RMSE = sqrt((1/N) Σ_i (y_i - ŷ_i)²).
Figure: the prediction by OLR.
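The estimator and RMSE above can be sketched in a few lines of NumPy; this is a minimal illustration on toy data (the function names and the data are mine, not from the lecture):

```python
import numpy as np

def fit_olr(X, y):
    """OLR estimator: the minimizer of ||y - X b||^2.

    Equivalent to (X^T X)^{-1} X^T y when the columns of X are linearly
    independent; lstsq is used instead of an explicit inverse for stability.
    """
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat

def rmse(y, y_hat):
    """Root mean square error between observed y and the prediction."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

# Toy data: y = 1 + 2 x + small noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 1.0 + 2.0 * x + rng.normal(0, 0.1, size=50)

X = np.column_stack([np.ones_like(x), x])   # regressor matrix with intercept
beta_hat = fit_olr(X, y)
y_pred = X @ beta_hat                       # prediction value of y
residual = y - y_pred                       # residual between y and y_hat
```

Using `lstsq` rather than forming (XᵀX)⁻¹ avoids squaring the condition number, which matters once multicollinearity enters the picture below.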
5 OLR Limitations
- OLR does not yield a nonlinear prediction.
- The existence of multicollinearity (collinearity) in X can seriously deteriorate the prediction by OLR: the variance of β̂ becomes large, and we cannot be confident whether x_j makes a contribution to the prediction by OLR or not.

Remarks:
- Collinearity is said to exist in X if XᵀX is a singular matrix.
- Multicollinearity is said to exist in X if XᵀX is a nearly singular matrix, i.e., some eigenvalues of XᵀX are close to zero.
- Eigenvalues of XᵀX are nonnegative real numbers.
- A vector a ≠ 0 is called an eigenvector of XᵀX if XᵀXa = λa for some scalar λ; the scalar λ is called an eigenvalue of XᵀX.

Example 01: The Household Consumption Data
Table 1: The household consumption data.
- y_i : the i-th household consumption expenditure,
- x_i1 : the i-th household income,
- x_i2 : the i-th household wealth.
The OLR of the household consumption data:
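The eigenvalue diagnostic from the remarks can be run directly; this sketch uses two hypothetical, nearly proportional regressors rather than the actual Table 1 data:

```python
import numpy as np

def collinearity_report(X):
    """Eigenvalues of X^T X (nonnegative real numbers).

    Eigenvalue ratios lambda_j / lambda_max close to zero signal
    multicollinearity; an exact zero signals collinearity.
    """
    eigvals = np.linalg.eigvalsh(X.T @ X)   # ascending order
    ratios = eigvals / eigvals[-1]
    return eigvals, ratios

# Two nearly proportional regressors, income/wealth-like.
rng = np.random.default_rng(1)
x1 = rng.uniform(0, 100, size=20)
x2 = 10.0 * x1 + rng.normal(0, 0.01, size=20)   # almost a multiple of x1
X = np.column_stack([x1, x2])

eigvals, ratios = collinearity_report(X)
# ratios[0] is tiny, so multicollinearity exists in X
```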
6 Example 01: The Household Consumption Data
Multicollinearity/collinearity exists in X. Eigenvalues of XᵀX: λ1 = 3.4032e+7, λ2 = 6.7952e+1, λ3 = , λ2/λ1 = 1.9967e-6, λ3/λ1 = 2.9868e-8.
Applying OLR to the consumption data, the 95% confidence interval of β2 is [ , 0.1485]: we cannot be confident whether x2 makes a contribution to this prediction or not.
7 PCR AND RR
To overcome the effects of multicollinearity (collinearity):
1. Principal Component Regression (PCR),
2. Ridge Regression (RR).

PCR
1. Principal Component Regression (PCR): OLR + PCA = PCR.
Principal Component Analysis (PCA): what is PCA?
8 PCA
PCA: an orthogonal transformation.
PCA's procedure; PCR = OLR + PCA.
PCR's procedure: how to choose r? r is the retained number of principal components for PCR.
Estimator of PCR's regression coefficients.
Limitation: the prediction by PCR is a linear model.
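PCR's procedure (center X, keep the top r principal components, regress on the scores, map the coefficients back) can be sketched as follows; the toy data and helper names are illustrative assumptions, not from the lecture:

```python
import numpy as np

def fit_pcr(X, y, r):
    """Principal component regression, a minimal sketch.

    Center X, take the top-r eigenvectors of X^T X, regress y on the
    r principal-component scores, then map the coefficients back to
    the original regressors.
    """
    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    eigvals, V = np.linalg.eigh(Xc.T @ Xc)   # ascending order
    Vr = V[:, ::-1][:, :r]                   # top-r eigenvectors
    Z = Xc @ Vr                              # principal-component scores
    gamma, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)
    beta = Vr @ gamma                        # back to the x-space
    intercept = y.mean() - x_mean @ beta
    return intercept, beta

# Nearly collinear toy regressors, as in the consumption example.
rng = np.random.default_rng(2)
x1 = rng.uniform(0, 10, 30)
x2 = 2.0 * x1 + rng.normal(0, 1e-4, 30)      # nearly a multiple of x1
y = 3.0 + x1 + x2 + rng.normal(0, 0.01, 30)

b0, b = fit_pcr(np.column_stack([x1, x2]), y, r=1)
pred = b0 + np.column_stack([x1, x2]) @ b
```

With r = 1 the near-singular direction is discarded, so the coefficients stay stable even though x1 and x2 are almost proportional.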
9 Example 01: The Household Consumption Data
Eigenvalues: λ1 = e+5, λ2 = , λ2/λ1 = 1.9953e-5; r = 1.
Applying PCR to the household consumption data:
- the effects of multicollinearity/collinearity are avoided,
- but the prediction is still a linear regression.
95% confidence interval of β1: [0.0409, 0.0581].

RR
2. Ridge Regression (RR): replace the OLR normal equations (XᵀX)β̂ = Xᵀy by (XᵀX + qI)β̂ = Xᵀy for some q > 0. An appropriate q can be obtained by cross validation or the holdout method.
Prediction by ridge regression: ŷ = X(XᵀX + qI)⁻¹Xᵀy.
Limitation: the prediction by RR is a linear model.
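Ridge regression and the cross-validation choice of q can be sketched as follows; the fold count and the candidate grid for q are my choices, not the lecture's:

```python
import numpy as np

def fit_ridge(X, y, q):
    """Ridge estimator: beta = (X^T X + q I)^{-1} X^T y, q > 0."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + q * np.eye(p), X.T @ y)

def cv_ridge(X, y, qs, k=5):
    """Pick q from the candidates qs by k-fold cross validation
    (smallest mean squared prediction error on held-out folds)."""
    n = len(y)
    folds = np.array_split(np.arange(n), k)
    errors = []
    for q in qs:
        err = 0.0
        for fold in folds:
            train = np.setdiff1d(np.arange(n), fold)
            beta = fit_ridge(X[train], y[train], q)
            err += np.mean((y[fold] - X[fold] @ beta) ** 2)
        errors.append(err / k)
    return qs[int(np.argmin(errors))]

# Toy data with a known coefficient vector.
rng = np.random.default_rng(3)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, 40)

best_q = cv_ridge(X, y, qs=[0.01, 0.1, 1.0, 10.0])
beta = fit_ridge(X, y, best_q)
```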
10 Example 01: The Household Consumption Data
Applying RR to the household consumption data: q = 20.
11 MOTIVATIONS
- Motivation 1: OLR, PCR and RR yield only linear predictions.
- Motivation 2: equal variance of the random errors is assumed. What happens if the random errors have unequal variances and the observed data contain multicollinearity/collinearity?
- Motivation 3: what happens if the observed data contain outliers? (Outliers: observed data whose residuals are large.)

Motivation 1: Linearity.
To overcome the limitation of PCR, Kernel Principal Component Regression (KPCR) was proposed by Rosipal et al. (Neural Computing and Applications [2001], Journal of Machine Learning Research [2002]), Jade et al. (Chemical Engineering Science [2003]), and Hoegaerts et al. (Neurocomputing [2005]). However, the existing KPCR has theoretical difficulties in the procedure to obtain its prediction. We revise the existing KPCR.
12 Motivation 2: Equal Variances
Standard OLR model versus the feasible WLS model, in which W_N is a diagonal matrix.
Weighted Least Squares (WLS) is a widely used technique. Limitations: WLS yields a linear prediction, and there is no guarantee that multicollinearity can be avoided. KPCR (KRR) can be inappropriate here, since they are constructed based on the standard OLR model.
We propose two methods: a combination of WLS and KPCR (WLS-KPCR), and a combination of WLS and KRR (WLS-KRR).

Motivation 3: Sensitivity to Outliers
OLR, PCR, RR, KPCR and KRR can be inappropriate. (Kernel Ridge Regression (KRR) is proposed to overcome the limitation of Ridge Regression.)
M-estimation is a widely used technique to eliminate the effect of outliers. Limitation: M-estimation yields a linear prediction. Famenko et al. [2006] proposed a nonlinear prediction based on M-estimation, but it needs a specific nonlinear model in advance.
We propose two methods: a combination of M-estimation and KPCR (R-KPCR), and a combination of M-estimation and KRR (R-KRR). No need to specify a nonlinear model in advance.
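The feasible WLS fit with a diagonal W_N can be sketched as follows; the heteroscedastic toy data and the inverse-variance weights are illustrative assumptions:

```python
import numpy as np

def fit_wls(X, y, w):
    """Weighted least squares: beta = (X^T W X)^{-1} X^T W y, W = diag(w)."""
    XtW = X.T * w            # broadcasts the weights across columns of X^T
    return np.linalg.solve(XtW @ X, XtW @ y)

# Heteroscedastic toy data: the noise standard deviation grows with x.
rng = np.random.default_rng(5)
x = np.linspace(1, 10, 80)
y = 2.0 + 0.5 * x + rng.normal(0, 0.1 * x)

X = np.column_stack([np.ones_like(x), x])
beta = fit_wls(X, y, w=1.0 / (0.1 * x) ** 2)   # weights = 1 / variance
```

When the weights are the reciprocals of the error variances, WLS restores the efficiency that ordinary least squares loses under unequal variances; the prediction, however, is still linear in x, which is what motivates WLS-KPCR and WLS-KRR.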
13 Remarks: R-KPCR = Robust Kernel Principal Component Regression; R-KRR = Robust Kernel Ridge Regression.

MOTIVATIONS: overview of methods by model type.
- Ordinary least squares: linear — Ordinary Linear Regression (OLR); nonlinear (kernel) — KPCR, Revised KPCR (Chapter 4).
- Ridge: linear — Ridge Regression (RR); nonlinear (kernel) — KRR.
- Weighted Least Squares (WLS): linear — WLS Linear Regression (WLS-LR), Jukic's regression [2004]; nonlinear (kernel) — WLS-KPCR, WLS-KRR (Chapters 4-5).
- Robust M-estimation: linear — M-estimation, Famenko [2006]; nonlinear (kernel) — R-KPCR, R-KRR (Chapters 4-5).
- Nonparametric: Nadaraya [1964], Watson [1964].
14 KERNEL PRINCIPAL COMPONENT ANALYSIS (KPCA)
The data are mapped into a feature space F, which is assumed to be a Euclidean space of higher dimension, say p_F >> p. Conceptual KPCA: PCA carried out in F.
The mapping into F is unknown explicitly.
15 KERNEL PRINCIPAL COMPONENT ANALYSIS (KPCA)
Problem: we don't know K explicitly. Use Mercer's Theorem:
16 KERNEL PRINCIPAL COMPONENT ANALYSIS (KPCA)
Choose a symmetric, continuous and positive semidefinite (p.s.d.) function κ. Then there exists φ such that κ(x, z) = φ(x)ᵀφ(z) for any x, z ∈ Rᵖ. Instead of choosing ψ explicitly, we employ φ as ψ. κ is called the kernel function, and K is now known explicitly.

KPCA: finding the eigenvalues/eigenvectors of K via the kernel κ.
Conceptual KPCA's procedure: conceptual KPCA via the kernel κ; the normalized eigenvectors, obtained via κ, are known explicitly.
17 KPCA
Actual KPCA's procedure: everything is computed via the kernel κ, so it is known explicitly. When the centering assumption does not hold, K is replaced by K_N = K - EK - KE + EKE, where E is the N ⅹ N matrix with all entries equal to 1/N. The result is the nonlinear principal components corresponding to κ.
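The actual KPCA procedure, including the centering K_N = K - EK - KE + EKE, can be sketched as follows (Gaussian kernel assumed; the function names are mine):

```python
import numpy as np

def gaussian_kernel(X, Z, sigma):
    """kappa(x, z) = exp(-||x - z||^2 / (2 sigma^2)), as a Gram matrix."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kpca_scores(X, n_components, sigma):
    """Actual KPCA: center K with K_N = K - EK - KE + EKE (E = [1/N]),
    eigendecompose K_N, normalize the eigenvectors by sqrt(lambda),
    and return the nonlinear principal-component scores."""
    N = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    E = np.full((N, N), 1.0 / N)
    KN = K - E @ K - K @ E + E @ K @ E
    eigvals, V = np.linalg.eigh(KN)                  # ascending order
    eigvals, V = eigvals[::-1], V[:, ::-1]           # descending order
    A = V[:, :n_components] / np.sqrt(eigvals[:n_components])
    return KN @ A                                    # one row of scores per point

# Points on a parabola: a curve that linear PCA cannot straighten out.
X = np.column_stack([np.linspace(-1, 1, 20), np.linspace(-1, 1, 20) ** 2])
scores = kpca_scores(X, n_components=2, sigma=1.0)
```

Note that φ never appears: every quantity is computed from the Gram matrix alone, which is the whole point of the kernel trick.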
18 REVISED KPCR
Conceptual revised KPCR via the kernel κ; the required quantities are known explicitly.
As in PCR, we form an estimator of the regression coefficients; here it is the estimator of the revised KPCR's regression coefficients, obtained by regressing on the nonlinear principal components.
19 REVISED KPCR (via kernel κ)
r: the retained number of principal components for the revised KPCR.
K and its eigenvectors are known explicitly (via the kernel κ), so Eqs. (3.3) and (3.5) are known explicitly.

Actual revised KPCR. Summary of the revised KPCR's procedure:
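A minimal sketch of fitting a KPCR model and computing its prediction might look like this; it follows the generic KPCA-plus-least-squares construction with a Gaussian kernel, and does not claim to reproduce every detail of the revised method:

```python
import numpy as np

def gaussian_kernel(X, Z, sigma):
    """kappa(x, z) = exp(-||x - z||^2 / (2 sigma^2)), as a Gram matrix."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_kpcr(X, y, r, sigma):
    """Fit: center the Gram matrix, keep the top-r nonlinear PCs,
    then least squares of the centered y on the PC scores."""
    N = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    E = np.full((N, N), 1.0 / N)
    KN = K - E @ K - K @ E + E @ K @ E               # centered Gram matrix
    eigvals, V = np.linalg.eigh(KN)                  # ascending order
    eigvals, V = eigvals[::-1], V[:, ::-1]           # descending order
    A = V[:, :r] / np.sqrt(eigvals[:r])              # normalized eigenvectors
    Z = KN @ A                                       # training PC scores
    gamma, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)
    return {"X": X, "K": K, "E": E, "A": A, "gamma": gamma,
            "y_mean": y.mean(), "sigma": sigma}

def predict_kpcr(m, X_new):
    """Predict: center the cross-kernel the same way as in training,
    project onto the retained components, apply the coefficients."""
    Kx = gaussian_kernel(X_new, m["X"], m["sigma"])
    N = m["X"].shape[0]
    En = np.full((X_new.shape[0], N), 1.0 / N)
    Kxc = Kx - En @ m["K"] - Kx @ m["E"] + En @ m["K"] @ m["E"]
    return m["y_mean"] + (Kxc @ m["A"]) @ m["gamma"]

# Nonlinear toy data: the sinc function used in Example 02.
x = np.linspace(-5, 5, 60).reshape(-1, 1)
y = np.sinc(x).ravel()
model = fit_kpcr(x, y, r=20, sigma=1.0)
y_hat = predict_kpcr(model, x)
```

The choice of r plays the same role here as in PCR; in the examples below it is selected together with the kernel parameter.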
20 EXAMPLES
Kernels: Gaussian, sigmoid, polynomial.
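The three kernels can be written out explicitly; the slide's exact parameterizations were lost in transcription, so these are the standard forms with assumed parameter names:

```python
import numpy as np

def gaussian(x, z, sigma=5.0):
    """Gaussian kernel: kappa(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

def sigmoid(x, z, a=1.0, b=0.0):
    """Sigmoid kernel: kappa(x, z) = tanh(a x^T z + b).
    Note: p.s.d. (a Mercer kernel) only for some choices of a and b."""
    return np.tanh(a * np.dot(x, z) + b)

def polynomial(x, z, d=2, c=1.0):
    """Polynomial kernel: kappa(x, z) = (x^T z + c)^d."""
    return (np.dot(x, z) + c) ** d

x = np.array([1.0, 2.0])
z = np.array([2.0, 0.0])
```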
21 EXAMPLE 01: The Household Consumption Data
Applying the revised KPCR (Gaussian kernel, parameter = 5) to the household consumption data:
- nonlinear prediction regression,
- the effects of multicollinearity (collinearity) are avoided.
Selection of the best model by AIC (the model with the smallest AIC is the best model):
- OLR: RMSE = 5.6960, AIC = ;
- PCR: RMSE = 6.2008, AIC = ;
- RR: RMSE = , AIC = ;
- revised KPCR: RMSE = , AIC = .

EXAMPLES: The prediction by Nadaraya-Watson regression is a kernel-weighted average of the observed responses. In our examples p = 1, and the bandwidth h1 is estimated by Bowman-Azzalini's method (h1ba) or by Silverman's method (h1s).
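The Nadaraya-Watson predictor can be sketched with a Silverman-type bandwidth; this uses the common simplified rule 1.06 σ̂ n^(-1/5), which may differ in detail from the bandwidth selectors used in the lecture:

```python
import numpy as np

def silverman_bandwidth(x):
    """Silverman's rule of thumb for the kernel bandwidth (p = 1)."""
    n = len(x)
    return 1.06 * np.std(x, ddof=1) * n ** (-1.0 / 5.0)

def nadaraya_watson(x_train, y_train, x_new, h):
    """Nadaraya-Watson estimate: a Gaussian-kernel weighted mean of y."""
    w = np.exp(-((x_new[:, None] - x_train[None, :]) ** 2) / (2.0 * h ** 2))
    return (w @ y_train) / w.sum(axis=1)

# Noisy sinc data, as in Example 02.
x = np.linspace(-5, 5, 100)
y = np.sinc(x) + np.random.default_rng(4).normal(0, 0.05, 100)

h = silverman_bandwidth(x)
y_hat = nadaraya_watson(x, y, x, h)
```

Because the estimate is a weighted mean, each prediction lies between the smallest and largest observed response; the quality of the fit rests entirely on the bandwidth, which is why the examples compare two selectors.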
22 EXAMPLE 02: Sinc Function
Two figures: training data (standard deviation of noise = 0.2) and testing data (standard deviation of noise = 0.5). Legend: black circles: original data; black dots: original data with noise; green: OLR; blue: Nadaraya-Watson with Bowman-Azzalini's method (h1ba = 0.6967); red: revised KPCR with Gaussian kernel parameter = 5.
23 EXAMPLE 02: Sinc Function
Two figures: training data (standard deviation of noise = 0.2) and testing data (standard deviation of noise = 0.5). Legend: black circles: original data; black dots: original data with noise; green: OLR; blue: Nadaraya-Watson with Silverman's method (h1s = ); red: revised KPCR with Gaussian kernel parameter = 5.
24 EXAMPLE 02: Sinc Function
Table 2: Comparison of OLR, Nadaraya-Watson regression and the revised KPCR for the sinc function (#: N-W with Bowman-Azzalini's method; : N-W with Silverman's method). r is the retained number of PCs for the revised KPCR.

EXAMPLE 03: Stock of Cars
Jukic et al. [2003] used the Gompertz function to fit this data.
Table 3: The stock of cars (in thousands) in the Netherlands.
Table 4: Comparison of OLR, Nadaraya-Watson regression (#: N-W with Bowman-Azzalini's method; : N-W with Silverman's method) and the revised KPCR for the stock of cars.
Figure legend: black circles: original data; green: OLR; blue: (a) N-W with Bowman-Azzalini's method (h1ba = ), (b) N-W with Silverman's method (h1s = ); red: revised KPCR with Gaussian kernel parameter = 5.
25 EXAMPLE 04: The Weight of Chickens
Jukic et al. [2003] used the Gompertz function to fit this data.
Table 5: The weight of female chickens.
Table 6: Comparison of OLR, Nadaraya-Watson regression (#: N-W with Bowman-Azzalini's method; : N-W with Silverman's method) and the revised KPCR for the female chickens.
Figure legend: black circles: original data; green: OLR; blue: (a) N-W with Bowman-Azzalini's method (h1ba = ), (b) N-W with Silverman's method (h1s = 2.4715); red: revised KPCR with Gaussian kernel parameter = 5.

EXAMPLE 05: Growth of the Son
Table 7: Growth of the son [Seber et al., 1998, Nonlinear Programming].
Table 8: Comparison of OLR, Nadaraya-Watson regression (#: N-W with Bowman-Azzalini's method; : N-W with Silverman's method) and the revised KPCR for the growth of the son.
Figure legend: black circles: original data; green: OLR; blue: (a) N-W with Bowman-Azzalini's method (h1ba = ), (b) N-W with Silverman's method (h1s = 2.8747); red: revised KPCR with Gaussian kernel parameter = 5.
26 EXAMPLE 06: The Puromycin Data
Table 9: The Puromycin data [Montgomery, 2006, Introduction to Linear Regression Analysis]. x_i: the i-th substrate concentration of the puromycin; y_i: the i-th reaction velocity of the puromycin.
Table 10: Comparison of OLR, Nadaraya-Watson regression (#: N-W with Bowman-Azzalini's method; : N-W with Silverman's method) and the revised KPCR for the puromycin data.
Figure legend: black circles: original data; green: OLR; blue: (a) N-W with Bowman-Azzalini's method (h1ba = ), (b) N-W with Silverman's method (h1s = 0.2571); red: revised KPCR with Gaussian kernel parameter = 5.

EXAMPLE 07: Radioactive Tracer Data
Table 11: Radioactive tracer [Seber et al., 1998, Nonlinear Programming]. x_i: the i-th time; y_i: the i-th radioactive tracer measurement.
Table 12: Comparison of OLR, Nadaraya-Watson regression (#: N-W with Bowman-Azzalini's method; : N-W with Silverman's method) and the revised KPCR for the radioactive tracer.
Figure legend: black circles: original data; green: OLR; blue: (a) N-W with Bowman-Azzalini's method (h1ba = ), (b) N-W with Silverman's method (h1s = 1.1079); red: revised KPCR with Gaussian kernel parameter = 5.
27 CONCLUSIONS
(KPCR = Kernel Principal Component Regression; KRR = Kernel Ridge Regression.)
KPCR is a novel method to perform nonlinear prediction in regression analysis. We showed that previous works on KPCR have theoretical difficulties in deriving the prediction and in obtaining the retained number of PCs. We revised the previous KPCR and showed that these difficulties are eliminated by the revised KPCR. In our case studies, the revised KPCR with the Gaussian kernel gives better results than Jukic's regression, and, with an appropriate Gaussian kernel parameter, better results than Nadaraya-Watson regression.
28 Thank you for your attention.
29 EXAMPLE 08: Sinc Function + Outliers (Robust-KPCR)
Figure legend: black dots: original data + noise; green: OLR; magenta: M-estimation; blue: revised KPCR with Gaussian kernel parameter = 5; red: robust KPCR with Gaussian kernel parameter = 5.

EXAMPLE 09: Sine Function + Outliers (Robust-KRR)
Figure legend: black dots: original data + noise; green: OLR; magenta: M-estimation; blue: revised KPCR with Gaussian kernel parameter = 5; red: robust KRR with Gaussian kernel parameter = 5.
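The M-estimation step underlying the robust variants can be sketched with Huber weights and iteratively reweighted least squares (IRLS); the cutoff c = 1.345 and the MAD scale are standard choices, not necessarily the lecture's:

```python
import numpy as np

def huber_weights(r, c=1.345):
    """Huber weight function: 1 for small standardized residuals,
    c / |r| beyond the cutoff, so outliers are downweighted."""
    a = np.abs(r)
    return np.where(a <= c, 1.0, c / np.maximum(a, 1e-12))

def fit_m_estimation(X, y, n_iter=20):
    """M-estimation of linear regression coefficients by IRLS."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)        # OLS start
    for _ in range(n_iter):
        r = y - X @ beta
        s = np.median(np.abs(r - np.median(r))) / 0.6745  # robust MAD scale
        w = huber_weights(r / max(s, 1e-12))
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ y)
    return beta

# A line with planted outliers: OLS is pulled away, the M-estimate is not.
rng = np.random.default_rng(6)
x = np.linspace(0, 10, 50)
y = 1.0 + 2.0 * x + rng.normal(0, 0.1, 50)
y[::10] += 20.0                                          # plant 5 outliers

X = np.column_stack([np.ones_like(x), x])
beta_m = fit_m_estimation(X, y)
```

In R-KPCR and R-KRR, the same reweighting idea is applied to the kernel-based regression instead of the linear one, which is what removes the need to specify a nonlinear model in advance.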
More informationDIMENSION REDUCTION OF THE EXPLANATORY VARIABLES IN MULTIPLE LINEAR REGRESSION. P. Filzmoser and C. Croux
Pliska Stud. Math. Bulgar. 003), 59 70 STUDIA MATHEMATICA BULGARICA DIMENSION REDUCTION OF THE EXPLANATORY VARIABLES IN MULTIPLE LINEAR REGRESSION P. Filzmoser and C. Croux Abstract. In classical multiple
More informationII. Linear Models (pp.47-70)
Notation: Means pencil-and-paper QUIZ Means coding QUIZ Agree or disagree: Regression can be always reduced to classification. Explain, either way! A certain classifier scores 98% on the training set,
More informationExercises * on Principal Component Analysis
Exercises * on Principal Component Analysis Laurenz Wiskott Institut für Neuroinformatik Ruhr-Universität Bochum, Germany, EU 4 February 207 Contents Intuition 3. Problem statement..........................................
More information1 Kernel methods & optimization
Machine Learning Class Notes 9-26-13 Prof. David Sontag 1 Kernel methods & optimization One eample of a kernel that is frequently used in practice and which allows for highly non-linear discriminant functions
More informationMachine Learning - MT & 14. PCA and MDS
Machine Learning - MT 2016 13 & 14. PCA and MDS Varun Kanade University of Oxford November 21 & 23, 2016 Announcements Sheet 4 due this Friday by noon Practical 3 this week (continue next week if necessary)
More informationLECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity
LECTURE 10 Introduction to Econometrics Multicollinearity & Heteroskedasticity November 22, 2016 1 / 23 ON PREVIOUS LECTURES We discussed the specification of a regression equation Specification consists
More informationMLCC 2015 Dimensionality Reduction and PCA
MLCC 2015 Dimensionality Reduction and PCA Lorenzo Rosasco UNIGE-MIT-IIT June 25, 2015 Outline PCA & Reconstruction PCA and Maximum Variance PCA and Associated Eigenproblem Beyond the First Principal Component
More informationSupport Vector Regression (SVR) Descriptions of SVR in this discussion follow that in Refs. (2, 6, 7, 8, 9). The literature
Support Vector Regression (SVR) Descriptions of SVR in this discussion follow that in Refs. (2, 6, 7, 8, 9). The literature suggests the design variables should be normalized to a range of [-1,1] or [0,1].
More informationEcon 510 B. Brown Spring 2014 Final Exam Answers
Econ 510 B. Brown Spring 2014 Final Exam Answers Answer five of the following questions. You must answer question 7. The question are weighted equally. You have 2.5 hours. You may use a calculator. Brevity
More informationResponse Surface Methodology
Response Surface Methodology Process and Product Optimization Using Designed Experiments Second Edition RAYMOND H. MYERS Virginia Polytechnic Institute and State University DOUGLAS C. MONTGOMERY Arizona
More informationLecture 6 Sept Data Visualization STAT 442 / 890, CM 462
Lecture 6 Sept. 25-2006 Data Visualization STAT 442 / 890, CM 462 Lecture: Ali Ghodsi 1 Dual PCA It turns out that the singular value decomposition also allows us to formulate the principle components
More informationMultilevel modeling and panel data analysis in educational research (Case study: National examination data senior high school in West Java)
Multilevel modeling and panel data analysis in educational research (Case study: National examination data senior high school in West Java) Pepi Zulvia, Anang Kurnia, and Agus M. Soleh Citation: AIP Conference
More informationLinear Regression and Its Applications
Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationRegression. Simple Linear Regression Multiple Linear Regression Polynomial Linear Regression Decision Tree Regression Random Forest Regression
Simple Linear Multiple Linear Polynomial Linear Decision Tree Random Forest Computational Intelligence in Complex Decision Systems 1 / 28 analysis In statistical modeling, regression analysis is a set
More informationComputer Vision Group Prof. Daniel Cremers. 2. Regression (cont.)
Prof. Daniel Cremers 2. Regression (cont.) Regression with MLE (Rep.) Assume that y is affected by Gaussian noise : t = f(x, w)+ where Thus, we have p(t x, w, )=N (t; f(x, w), 2 ) 2 Maximum A-Posteriori
More information6.036 midterm review. Wednesday, March 18, 15
6.036 midterm review 1 Topics covered supervised learning labels available unsupervised learning no labels available semi-supervised learning some labels available - what algorithms have you learned that
More informationLinear Regression Linear Regression with Shrinkage
Linear Regression Linear Regression ith Shrinkage Introduction Regression means predicting a continuous (usually scalar) output y from a vector of continuous inputs (features) x. Example: Predicting vehicle
More informationManifold Learning: Theory and Applications to HRI
Manifold Learning: Theory and Applications to HRI Seungjin Choi Department of Computer Science Pohang University of Science and Technology, Korea seungjin@postech.ac.kr August 19, 2008 1 / 46 Greek Philosopher
More information4 Bias-Variance for Ridge Regression (24 points)
2 count = 0 3 for x in self.x_test_ridge: 4 5 prediction = np.matmul(self.w_ridge,x) 6 ###ADD THE COMPUTED MEAN BACK TO THE PREDICTED VECTOR### 7 prediction = self.ss_y.inverse_transform(prediction) 8
More informationIntroduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones
Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 Last week... supervised and unsupervised methods need adaptive
More informationLecture Notes on Support Vector Machine
Lecture Notes on Support Vector Machine Feng Li fli@sdu.edu.cn Shandong University, China 1 Hyperplane and Margin In a n-dimensional space, a hyper plane is defined by ω T x + b = 0 (1) where ω R n is
More informationMachine Learning. Lecture 6: Support Vector Machine. Feng Li.
Machine Learning Lecture 6: Support Vector Machine Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Warm Up 2 / 80 Warm Up (Contd.)
More informationbelow, kernel PCA Eigenvectors, and linear combinations thereof. For the cases where the pre-image does exist, we can provide a means of constructing
Kernel PCA Pattern Reconstruction via Approximate Pre-Images Bernhard Scholkopf, Sebastian Mika, Alex Smola, Gunnar Ratsch, & Klaus-Robert Muller GMD FIRST, Rudower Chaussee 5, 12489 Berlin, Germany fbs,
More informationMATH 829: Introduction to Data Mining and Analysis Principal component analysis
1/11 MATH 829: Introduction to Data Mining and Analysis Principal component analysis Dominique Guillot Departments of Mathematical Sciences University of Delaware April 4, 2016 Motivation 2/11 High-dimensional
More informationLinear Algebra Practice Problems
Linear Algebra Practice Problems Page of 7 Linear Algebra Practice Problems These problems cover Chapters 4, 5, 6, and 7 of Elementary Linear Algebra, 6th ed, by Ron Larson and David Falvo (ISBN-3 = 978--68-78376-2,
More informationMultiple Regression Analysis
1 OUTLINE Basic Concept: Multiple Regression MULTICOLLINEARITY AUTOCORRELATION HETEROSCEDASTICITY REASEARCH IN FINANCE 2 BASIC CONCEPTS: Multiple Regression Y i = β 1 + β 2 X 1i + β 3 X 2i + β 4 X 3i +
More informationLinear Dimensionality Reduction
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Principal Component Analysis 3 Factor Analysis
More informationNonrobust and Robust Objective Functions
Nonrobust and Robust Objective Functions The objective function of the estimators in the input space is built from the sum of squared Mahalanobis distances (residuals) d 2 i = 1 σ 2(y i y io ) C + y i
More informationMachine Learning. Support Vector Machines. Manfred Huber
Machine Learning Support Vector Machines Manfred Huber 2015 1 Support Vector Machines Both logistic regression and linear discriminant analysis learn a linear discriminant function to separate the data
More informationLinear Models 1. Isfahan University of Technology Fall Semester, 2014
Linear Models 1 Isfahan University of Technology Fall Semester, 2014 References: [1] G. A. F., Seber and A. J. Lee (2003). Linear Regression Analysis (2nd ed.). Hoboken, NJ: Wiley. [2] A. C. Rencher and
More informationComputational Methods. Eigenvalues and Singular Values
Computational Methods Eigenvalues and Singular Values Manfred Huber 2010 1 Eigenvalues and Singular Values Eigenvalues and singular values describe important aspects of transformations and of data relations
More informationIntroduction to Linear regression analysis. Part 2. Model comparisons
Introduction to Linear regression analysis Part Model comparisons 1 ANOVA for regression Total variation in Y SS Total = Variation explained by regression with X SS Regression + Residual variation SS Residual
More informationLinear Regression Linear Regression with Shrinkage
Linear Regression Linear Regression ith Shrinkage Introduction Regression means predicting a continuous (usually scalar) output y from a vector of continuous inputs (features) x. Example: Predicting vehicle
More informationLinear Algebra for Machine Learning. Sargur N. Srihari
Linear Algebra for Machine Learning Sargur N. srihari@cedar.buffalo.edu 1 Overview Linear Algebra is based on continuous math rather than discrete math Computer scientists have little experience with it
More informationAn Introduction to Independent Components Analysis (ICA)
An Introduction to Independent Components Analysis (ICA) Anish R. Shah, CFA Northfield Information Services Anish@northinfo.com Newport Jun 6, 2008 1 Overview of Talk Review principal components Introduce
More information