SOME APPLICATIONS: NONLINEAR REGRESSIONS BASED ON KERNEL METHOD IN SOCIAL SCIENCES AND ENGINEERING

1 SOME APPLICATIONS: NONLINEAR REGRESSIONS BASED ON KERNEL METHOD IN SOCIAL SCIENCES AND ENGINEERING. Antoni Wibowo. Farewell Lecture, PPI Ibaraki, 27 June 2009. EDUCATION BACKGROUND: Dr.Eng., Social Systems and Management, Graduate School of Systems and Information Engineering, University of Tsukuba; M.Eng., Social Systems Engineering, Graduate School of Systems and Information Engineering, University of Tsukuba; M.Sc., Computer Science, University of Indonesia; B.Sc./B.Eng., Mathematics Engineering, Sebelas Maret University, 1995.

2 TABLE OF CONTENTS: Introduction. Ordinary Linear Regression (OLR). Principal Component Regression and Ridge Regression. Motivations. Kernel Principal Component Analysis. Kernel Principal Component Regression (KPCR). Kernel Ridge Regression (KRR). Weighted Least Squares-KPCR. Weighted Least Squares-KRR. Robust KPCR. Robust KRR. Numerical Examples. Conclusions. Ordinary Linear Regression (OLR). Regression analysis: a model of the relationship between Y (the response variable) and x_1, x_2, ..., x_p (the regressor variables). Notation for the OLR model: ε denotes the random error (a random variable); β_0, β_1, ..., β_p the regression coefficients; Y_i the response variable in the i-th observation; x_ij the i-th observation of the j-th regressor (j = 1, ..., p); ε_i the random error in the i-th observation (i = 1, ..., N); and R the set of real numbers.

3 Ordinary Linear Regression (OLR). The standard OLR model corresponding to model (1.1) is Y = Xβ + ε, where X is the regressor matrix. Assumption: E(ε) = 0 and Cov(ε) = σ^2 I_N, where I_N is the N × N identity matrix. The aim of regression analysis: to find an estimator of β, say β̂, that minimizes the residual sum of squares (1.3). The solution of (1.3) is given by β̂ = (X^T X)^{-1} X^T Y (1.4).

4 Ordinary Linear Regression (OLR). Let y be the observed data corresponding to Y, and let β̂ be the value of the estimator when Y is replaced by y in (1.4). Under the assumption that the column vectors of X are linearly independent, X^T X is invertible, the prediction value of y is ŷ = Xβ̂, and the residual between y and ŷ is y − ŷ. The Root Mean Square Error (RMSE) of the prediction by OLR is RMSE = sqrt((1/N) Σ_i (y_i − ŷ_i)^2).
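To make the least-squares formulas above concrete, here is a minimal NumPy sketch of the OLR estimator β̂ = (X^T X)^{-1} X^T y, the prediction ŷ = Xβ̂, and the RMSE. The toy data are invented for illustration and are not the household consumption data of Example 01.

```python
import numpy as np

def fit_olr(X, y):
    """Ordinary linear regression: beta_hat = (X^T X)^{-1} X^T y.

    X is assumed to contain a leading column of ones for the intercept
    and to have linearly independent columns, as required in (1.4).
    """
    XtX = X.T @ X
    beta_hat = np.linalg.solve(XtX, X.T @ y)   # more stable than an explicit inverse
    return beta_hat

def rmse(y, y_hat):
    """Root mean square error between observed y and prediction y_hat."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

# Toy data (invented, for illustration only)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=30)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=30)

X = np.column_stack([np.ones_like(x), x])      # regressor matrix with intercept
beta_hat = fit_olr(X, y)
y_hat = X @ beta_hat                           # prediction value of y
print("beta_hat:", beta_hat, "RMSE:", rmse(y, y_hat))
```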

5 OLR Limitations. OLR does not yield a nonlinear prediction. The existence of multicollinearity (collinearity) in X can seriously deteriorate the prediction by OLR: the variance of β̂ becomes large, and we cannot be confident whether x_j makes a contribution to the prediction by OLR or not. Remarks: Collinearity is said to exist in X if X^T X is a singular matrix. Multicollinearity is said to exist in X if X^T X is a nearly singular matrix, i.e., some eigenvalues of X^T X are close to zero. The eigenvalues of X^T X are nonnegative real numbers. A nonzero vector a is called an eigenvector of X^T X if X^T X a = λa for some scalar λ; the scalar λ is called an eigenvalue of X^T X. Example 01: The Household Consumption Data. Table 1: the household consumption data, where y_i is the i-th household consumption expenditure, x_i1 is the i-th household income, and x_i2 is the i-th household wealth. The OLR is then fitted to the household consumption data.
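As a small illustration of the multicollinearity check described in the remarks above, the sketch below inspects the eigenvalues of X^T X. The two nearly proportional regressors are synthetic, standing in for the income/wealth pattern of Table 1 (whose actual values are not reproduced in this transcription).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 25
income = rng.uniform(50, 150, size=n)
wealth = 10.0 * income + rng.normal(scale=0.5, size=n)   # nearly proportional -> near collinearity

X = np.column_stack([np.ones(n), income, wealth])
eigvals = np.linalg.eigvalsh(X.T @ X)                    # nonnegative, in ascending order
print("eigenvalues of X^T X:", eigvals)
print("ratio smallest/largest:", eigvals[0] / eigvals[-1])
# A ratio close to zero (equivalently, a huge condition number) signals that
# X^T X is nearly singular, i.e. multicollinearity exists in X.
```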

6 Example 01: The Household Consumption Data. Table 1: the household consumption data. Multicollinearity/collinearity exists in X: the eigenvalues of X^T X are λ_1 = 3.4032e+7, λ_2 = 6.7952e+1, and a much smaller λ_3, with λ_2/λ_1 = 1.9967e-6 and λ_3/λ_1 = 2.9868e-8, so X^T X is nearly singular. Applying OLR to the consumption data gives a 95% confidence interval for β_2 of [ , 0.1485]; we cannot be confident whether x_2 makes a contribution to this prediction or not. TABLE OF CONTENTS: Introduction. Ordinary Linear Regression (OLR). Principal Component Regression and Ridge Regression. Motivations. Kernel Principal Component Analysis. Kernel Principal Component Regression (KPCR). Kernel Ridge Regression (KRR). Weighted Least Squares-KPCR. Weighted Least Squares-KRR. Robust KPCR. Robust KRR. Numerical Examples. Conclusions.

7 PCR AND RR. To overcome the effects of multicollinearity (collinearity), we consider: 1. Principal Component Regression (PCR); 2. Ridge Regression (RR). PCR: Principal Component Regression (PCR) = OLR + Principal Component Analysis (PCA). What is PCA?

8 PCA. PCA: an orthogonal transformation of the data. PCA's procedure. PCR = OLR + PCA. PCR's procedure: how to choose r? Here r is the retained number of principal components for PCR, and the estimator of PCR's regression coefficients is obtained by OLR on the retained components. Limitation: the prediction by PCR is a linear model.
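A minimal sketch of the PCR procedure just outlined, assuming the common variant in which the regressors are centered, the first r principal components are retained, and OLR is run on the component scores; the choice of r and the data below are illustrative only, not the thesis procedure verbatim.

```python
import numpy as np

def pcr_fit(X, y, r):
    """Principal component regression: PCA on X, then OLR on the first r scores."""
    x_mean = X.mean(axis=0)
    Xc = X - x_mean                                  # center the regressors
    eigvals, V = np.linalg.eigh(Xc.T @ Xc)           # columns of V are principal directions
    order = np.argsort(eigvals)[::-1]                # sort eigenvalues descending
    V_r = V[:, order[:r]]                            # retain r principal components
    Z = Xc @ V_r                                     # principal component scores
    Z1 = np.column_stack([np.ones(len(y)), Z])
    gamma = np.linalg.solve(Z1.T @ Z1, Z1.T @ y)     # OLR on the scores
    return x_mean, V_r, gamma

def pcr_predict(X_new, x_mean, V_r, gamma):
    Z = (X_new - x_mean) @ V_r
    return gamma[0] + Z @ gamma[1:]

# Illustrative data with two highly correlated regressors
rng = np.random.default_rng(2)
x1 = rng.uniform(0, 10, 40)
x2 = 3.0 * x1 + rng.normal(scale=0.05, size=40)
X = np.column_stack([x1, x2])
y = 1.0 + 0.4 * x1 + rng.normal(scale=0.2, size=40)

params = pcr_fit(X, y, r=1)
print(pcr_predict(X[:3], *params))
```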

9 Example 01: The Household Consumption Data. Table 1: the household consumption data. Eigenvalues: λ_1 (of order e+5) and λ_2, with λ_2/λ_1 = 1.9953e-5, so r = 1 component is retained. Applying PCR to the household consumption data: the effects of multicollinearity/collinearity are avoided, but the prediction is still a linear regression. 95% confidence interval of β_1: [0.0409, 0.0581]. RR: 2. Ridge Regression (RR) replaces the OLR estimator (X^T X)^{-1} X^T y by the ridge estimator (X^T X + qI)^{-1} X^T y for some q > 0; an appropriate q can be obtained by the cross-validation/holdout method. Limitation: the prediction by RR is a linear model.
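A small sketch of the ridge estimator described above, β̂_RR = (X^T X + qI)^{-1} X^T y, with q chosen by a simple holdout split; the data and the candidate grid for q are assumptions for illustration, not the values used in the thesis.

```python
import numpy as np

def ridge_fit(X, y, q):
    """Ridge regression estimator: (X^T X + q I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + q * np.eye(p), X.T @ y)

# Illustrative data (not the household consumption data)
rng = np.random.default_rng(3)
x1 = rng.uniform(0, 10, 60)
x2 = 2.0 * x1 + rng.normal(scale=0.1, size=60)         # nearly collinear regressors
X = np.column_stack([np.ones(60), x1, x2])
y = 5.0 + 0.3 * x1 + rng.normal(scale=0.5, size=60)

# Holdout selection of q: fit on the first 40 points, evaluate on the last 20
X_tr, y_tr, X_te, y_te = X[:40], y[:40], X[40:], y[40:]
best_q, best_err = None, np.inf
for q in [0.01, 0.1, 1.0, 10.0, 20.0, 100.0]:
    beta = ridge_fit(X_tr, y_tr, q)
    err = np.mean((y_te - X_te @ beta) ** 2)
    if err < best_err:
        best_q, best_err = q, err
print("selected q:", best_q)
```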

10 Example 01: The Household Consumption Data. Table 1: the household consumption data. Applying RR to the household consumption data with q = 20. TABLE OF CONTENTS: Introduction. Ordinary Linear Regression (OLR). Principal Component Regression and Ridge Regression. Motivations. Kernel Principal Component Analysis. Kernel Principal Component Regression (KPCR). Kernel Ridge Regression (KRR). Weighted Least Squares-KPCR. Weighted Least Squares-KRR. Robust KPCR. Robust KRR. Numerical Examples. Conclusions.

11 MOTIVATIONS. Outliers: observations whose residuals are large. Motivation 1: OLR, PCR, and RR yield only linear predictions. Motivation 2: equal variances of the random errors are assumed; what happens if the random errors have unequal variances and the observed data contain multicollinearity/collinearity? Motivation 3: what happens if the observed data contain outliers? Motivation 1: Linearity. To overcome the limitation of PCR, Rosipal et al. (Neural Computing and Application [2001]; Journal of Machine Learning [2002]), Jade et al. (Chemical Engineering Sciences [2003]), and Hoegaerts et al. (Neurocomputing [2005]) proposed Kernel Principal Component Regression (KPCR). However, the existing KPCR has theoretical difficulties in the procedure used to obtain the KPCR prediction. We revise the existing KPCR.

12 Motivation 2: Unequal Variances. The standard OLR model assumes equal error variances; the feasible WLS model allows unequal variances described by a diagonal matrix W_N. Weighted Least Squares (WLS) is a widely used technique for this setting. Limitations: WLS yields a linear prediction, and there is no guarantee that multicollinearity can be avoided. KPCR (and KRR) can be inappropriate here, since they are constructed on the standard OLR model. We propose two methods: a combination of WLS and KPCR (WLS-KPCR), and a combination of WLS and KRR (WLS-KRR). Motivation 3: Sensitivity to Outliers. OLR, PCR, RR, KPCR, and KRR can be inappropriate when the data contain outliers. M-estimation is a widely used technique to eliminate the effect of the outliers. Limitation: M-estimation yields a linear prediction. Famenko et al. [2006] proposed a nonlinear prediction based on M-estimation, but it needs a specific nonlinear model in advance. Kernel Ridge Regression (KRR) is proposed to overcome the limitation of Ridge Regression. We propose two methods: a combination of M-estimation and KPCR (R-KPCR), and a combination of M-estimation and KRR (R-KRR); there is no need to specify a nonlinear model in advance.
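As a hedged illustration of the weighted least squares step underlying WLS-KPCR and WLS-KRR, the sketch below computes the textbook WLS estimator β̂ = (X^T W X)^{-1} X^T W y with a diagonal weight matrix W. How the thesis actually estimates W_N (the feasible step) is not shown in this transcription, so the weights here are simply assumed known (inverse error variances).

```python
import numpy as np

def wls_fit(X, y, w):
    """Weighted least squares with known observation weights w (diagonal of W)."""
    W = np.diag(w)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Illustrative heteroscedastic data: noise standard deviation grows with x
rng = np.random.default_rng(4)
x = np.linspace(1, 10, 50)
sigma = 0.1 * x                                    # unequal error standard deviations
y = 2.0 + 0.7 * x + sigma * rng.normal(size=50)

X = np.column_stack([np.ones_like(x), x])
weights = 1.0 / sigma**2                           # weight = inverse error variance
print("WLS estimate:", wls_fit(X, y, weights))
print("OLS estimate:", np.linalg.solve(X.T @ X, X.T @ y))
```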

13 Remarks: R-KPCR = Robust Kernel Principal Components Regression, R-KRR = Robust Kernel Ridge Regression. MOTIVATIONS (summary of methods by model type). Linear methods (non-kernel): OLS gives Ordinary Linear Regression (OLR); Ridge gives Ridge Regression (RR); Weighted Least Squares (WLS) gives WLS Linear Regression (WLS-LR); Robust M-estimation gives M-estimation. Nonlinear methods (non-kernel): Jukic's regression [2004]; Famenko [2006] M-estimation; the nonparametric Nadaraya [1964]-Watson [1964] regression. Nonlinear methods (kernel): KPCR and the revised KPCR (Chapter 4); KRR; WLS-KPCR and WLS-KRR (Chapters 4-5); R-KPCR and R-KRR (Chapters 4-5). TABLE OF CONTENTS: Introduction. Kernel Principal Component Analysis. Kernel Principal Component Regression (KPCR). Kernel Ridge Regression (KRR). Weighted Least Squares-KPCR. Weighted Least Squares-KRR. Robust KPCR. Robust KRR. Numerical Examples. Conclusions.

14 KERNEL PRINCIPAL COMPONENT ANALYSIS (KPCA). The data are mapped into a feature space F, which is assumed to be a Euclidean space of higher dimension, say p_F >> p. Conceptual KPCA: PCA carried out in F. The mapping into F is not known explicitly.

15 KERNEL PRINCIPAL COMPONENT ANALYSIS (KPCA). Problem: we do not know K explicitly. Use Mercer's Theorem:

16 KERNEL PRINCIPAL COMPONENT ANALYSIS (KPCA). Choose a symmetric, continuous, and positive semidefinite (p.s.d.) function κ; then there exists φ such that κ(x, z) = φ(x)^T φ(z) for any x, z in R^p. Instead of choosing ψ explicitly, we employ φ as ψ. The function κ is called the kernel function, and K is now known explicitly. KPCA then amounts to finding the eigenvalues and normalized eigenvectors of K, which, via the kernel κ, are known explicitly. The conceptual KPCA procedure thus becomes conceptual KPCA via the kernel κ.
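To illustrate how K becomes known explicitly once a kernel function κ is chosen, here is a sketch that builds the N × N kernel (Gram) matrix for the Gaussian kernel. The parameterization κ(x, z) = exp(-||x - z||^2 / c) with c = 5 is an assumption matching the "Gaussian parameter = 5" used later; the exact form on the slide is not reproduced in this transcription.

```python
import numpy as np

def gaussian_kernel_matrix(X, c=5.0):
    """Gram matrix K with K[i, j] = exp(-||x_i - x_j||^2 / c)."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-sq_dists / c)

X = np.random.default_rng(5).normal(size=(6, 2))   # six points in R^2
K = gaussian_kernel_matrix(X)
print(K.shape)                                     # (6, 6), symmetric and p.s.d.
```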

17 KPCA. Consider the same eigenproblem via the kernel κ; it is known explicitly. Actual KPCA's procedure: when the centering assumption (that the mapped data have zero mean in the feature space) does not hold, K is replaced by K_N = K − EK − KE + EKE, where E is the N × N matrix whose entries all equal 1/N. The eigenvectors of K_N give the nonlinear principal components corresponding to κ.
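A minimal sketch of the actual KPCA procedure stated above: form K, center it as K_N = K − EK − KE + EKE with E = [1/N], take the leading eigenvectors of K_N, and normalize them. The normalization by the square root of the eigenvalue is the standard KPCA convention and is assumed here.

```python
import numpy as np

def kpca(K, r):
    """Kernel PCA from an N x N kernel matrix K, retaining r components."""
    N = K.shape[0]
    E = np.full((N, N), 1.0 / N)
    K_N = K - E @ K - K @ E + E @ K @ E             # centered kernel matrix
    eigvals, A = np.linalg.eigh(K_N)                # ascending eigenvalues
    eigvals, A = eigvals[::-1], A[:, ::-1]          # reorder: largest first
    eigvals, A = eigvals[:r], A[:, :r]
    A = A / np.sqrt(eigvals)                        # normalize eigenvectors (standard KPCA scaling)
    scores = K_N @ A                                # nonlinear principal components of the training data
    return scores, A, eigvals

# Example with the Gaussian kernel matrix from the previous sketch
def gaussian_kernel_matrix(X, c=5.0):
    sq = np.sum(X**2, axis=1)
    return np.exp(-(sq[:, None] + sq[None, :] - 2.0 * X @ X.T) / c)

X = np.random.default_rng(6).normal(size=(20, 2))
scores, A, eigvals = kpca(gaussian_kernel_matrix(X), r=3)
print(scores.shape)                                # (20, 3)
```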

18 TABLE OF CONTENTS: Introduction. Kernel Principal Component Analysis. Kernel Principal Component Regression (KPCR). Kernel Ridge Regression (KRR). Weighted Least Squares-KPCR. Weighted Least Squares-KRR. Robust KPCR. Robust KRR. Numerical Examples. Conclusions. REVISED KPCR. Conceptual revised KPCR: via the kernel κ, the required quantities are known explicitly. As in PCR, an estimator of PCR's regression coefficients is formed, and analogously an estimator of the revised KPCR's regression coefficients.

19 REVISED KPCR (via kernel κ). Let r denote the retained number of principal components for the revised KPCR. K and the related quantities are known explicitly via the kernel κ, so Eq. (3.3) and (3.5) are known explicitly. Actual revised KPCR: summary of the revised KPCR's procedure.
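Since the detailed equations (3.3) and (3.5) of the revised KPCR are not reproduced in this transcription, the sketch below only illustrates the general KPCR idea they rely on: compute the nonlinear principal component scores by KPCA on the centered kernel matrix and run ordinary least squares of y on the r retained scores. It should be read as an assumed, simplified stand-in, not as the revised KPCR procedure itself.

```python
import numpy as np

def gaussian_kernel_matrix(X, c=5.0):
    sq = np.sum(X**2, axis=1)
    return np.exp(-(sq[:, None] + sq[None, :] - 2.0 * X @ X.T) / c)

def kpcr_fit(X, y, r, c=5.0):
    """Simplified KPCR: OLS of y on the first r kernel principal component scores."""
    N = len(y)
    K = gaussian_kernel_matrix(X, c)
    E = np.full((N, N), 1.0 / N)
    K_N = K - E @ K - K @ E + E @ K @ E                  # centered kernel matrix
    eigvals, A = np.linalg.eigh(K_N)
    A = A[:, ::-1][:, :r] / np.sqrt(eigvals[::-1][:r])   # normalized leading eigenvectors
    Z = np.column_stack([np.ones(N), K_N @ A])           # intercept + nonlinear PC scores
    gamma = np.linalg.solve(Z.T @ Z, Z.T @ y)            # OLS on the scores
    return gamma, A, K_N

# Illustrative nonlinear data: y = sinc(x) + noise
rng = np.random.default_rng(7)
x = np.linspace(-10, 10, 60)
y = np.sinc(x / np.pi) + rng.normal(scale=0.2, size=60)
gamma, A, K_N = kpcr_fit(x[:, None], y, r=10)
fitted = np.column_stack([np.ones(60), K_N @ A]) @ gamma
print("training RMSE:", np.sqrt(np.mean((y - fitted) ** 2)))
```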

20 TABLE OF CONTENTS: Introduction. Kernel Principal Component Analysis. Kernel Principal Component Regression (KPCR). Kernel Ridge Regression (KRR). Weighted Least Squares-KPCR. Weighted Least Squares-KRR. Robust KPCR. Robust KRR. Numerical Examples. Conclusions. EXAMPLES: the kernels used are the Gaussian kernel, the sigmoid kernel, and the polynomial kernel.
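The exact kernel formulas on this slide are not reproduced in the transcription; the sketch below uses the standard textbook forms, with parameter names (c, a, b, d) chosen here only for illustration: Gaussian κ(x, z) = exp(-||x - z||^2 / c), sigmoid κ(x, z) = tanh(a·x^T z + b), and polynomial κ(x, z) = (x^T z + b)^d.

```python
import numpy as np

def gaussian_kernel(x, z, c=5.0):
    return np.exp(-np.sum((x - z) ** 2) / c)

def sigmoid_kernel(x, z, a=1.0, b=0.0):
    # Note: the sigmoid "kernel" is not p.s.d. for all choices of (a, b).
    return np.tanh(a * np.dot(x, z) + b)

def polynomial_kernel(x, z, b=1.0, d=3):
    return (np.dot(x, z) + b) ** d

x, z = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(gaussian_kernel(x, z), sigmoid_kernel(x, z), polynomial_kernel(x, z))
```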

21 EXAMPLE 01: The Household Consumption Data. Applying the revised KPCR (Gaussian kernel with parameter 5) to the household consumption data, the best model is selected by AIC (the model with the smallest AIC is the best model). The reported errors include RMSE of OLR = 5.6960 and RMSE of PCR = 6.2008, together with the RMSE and AIC values of RR and of the revised KPCR; the revised KPCR yields a nonlinear prediction regression, and the effects of multicollinearity (collinearity) are avoided. EXAMPLES: the prediction by Nadaraya-Watson regression is also compared; in our examples p = 1, and the bandwidth h_1 is estimated by the Bowman-Azzalini method (h_1ba) and by Silverman's method (h_1s).
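A short sketch of the Nadaraya-Watson regression estimator used as the comparison method, m̂(x) = Σ_i K_h(x − x_i) y_i / Σ_i K_h(x − x_i), with a Gaussian weight function. The bandwidth below uses the rule-of-thumb h = 1.06 · σ̂ · N^(-1/5) as a Silverman-type stand-in; the exact Bowman-Azzalini and Silverman formulas used in the thesis are not reproduced in this transcription.

```python
import numpy as np

def nadaraya_watson(x_query, x_train, y_train, h):
    """Nadaraya-Watson estimator with a Gaussian weight function and bandwidth h."""
    x_query = np.atleast_1d(x_query)
    # Gaussian weights K_h(u) = exp(-u^2 / (2 h^2)); constant factors cancel in the ratio
    w = np.exp(-(x_query[:, None] - x_train[None, :]) ** 2 / (2.0 * h**2))
    return (w @ y_train) / w.sum(axis=1)

# Illustrative data: noisy sinc, p = 1
rng = np.random.default_rng(8)
x = np.linspace(-10, 10, 100)
y = np.sinc(x / np.pi) + rng.normal(scale=0.2, size=100)

h = 1.06 * np.std(x) * len(x) ** (-1 / 5)      # Silverman-type rule-of-thumb bandwidth
print(nadaraya_watson(np.array([0.0, 2.5]), x, y, h))
```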

22 EXAMPLE 02: Sinc Function. Training data (standard deviation of noise = 0.2). Black circles: original data; black dots: original data with noise; green: OLR; blue: Nadaraya-Watson with the Bowman-Azzalini method (h_1ba = 0.6967); red: revised KPCR with Gaussian parameter 5. EXAMPLE 02: Sinc Function. Testing data (standard deviation of noise = 0.5). Black circles: original data; black dots: original data with noise; green: OLR; blue: Nadaraya-Watson with the Bowman-Azzalini method (h_1ba = 0.6967); red: revised KPCR with Gaussian parameter 5.

23 EXAMPLE 02: Sinc Function. Training data (standard deviation of noise = 0.2). Black circles: original data; black dots: original data with noise; green: OLR; blue: Nadaraya-Watson with Silverman's method; red: revised KPCR with Gaussian parameter 5. EXAMPLE 02: Sinc Function. Testing data (standard deviation of noise = 0.5). Black circles: original data; black dots: original data with noise; green: OLR; blue: Nadaraya-Watson with Silverman's method; red: revised KPCR with Gaussian parameter 5.

24 EXAMPLE 02: Sinc Function. Table 2: comparison of OLR, Nadaraya-Watson regression (reported both with the Bowman-Azzalini method, marked #, and with Silverman's method), and the revised KPCR for the sinc function, together with the retained number of PCs for the revised KPCR. EXAMPLE 03: Stock of Cars. Jukic et al. [2003] used the Gompertz function to fit these data. Table 3: the stock of cars (expressed in thousands) in the Netherlands. Table 4: comparison of OLR, Nadaraya-Watson regression (reported both with the Bowman-Azzalini method, marked #, and with Silverman's method), and the revised KPCR for the stock of cars. Figure: black circles: original data; green: OLR; blue: (a) N-W with the Bowman-Azzalini method, (b) N-W with Silverman's method; red: revised KPCR with Gaussian parameter 5.

25 Jukic et al. [2003] used the Gompertz function to fit these data. EXAMPLE 04: The Weight of Chickens. Table 5: the weight of female chickens. Table 6: comparison of OLR, Nadaraya-Watson regression (reported both with the Bowman-Azzalini method, marked #, and with Silverman's method), and the revised KPCR for the female chickens. Figure: black circles: original data; green: OLR; blue: (a) N-W with the Bowman-Azzalini method, (b) N-W with Silverman's method (h_1s = 2.4715); red: revised KPCR with Gaussian parameter 5. EXAMPLE 05: Growth of the Son. Table 7: growth of the son [Seber et al., 1998, Nonlinear Programming]. Table 8: comparison of OLR, Nadaraya-Watson regression (reported both with the Bowman-Azzalini method, marked #, and with Silverman's method), and the revised KPCR for the growth of the son. Figure: black circles: original data; green: OLR; blue: (a) N-W with the Bowman-Azzalini method, (b) N-W with Silverman's method (h_1s = 2.8747); red: revised KPCR with Gaussian parameter 5.

26 EXAMPLE 06: The Puromycin Data. Table 9: the puromycin data [Montgomery, 2006, Introduction to Linear Regression Analysis], where x_i is the i-th substrate concentration of the puromycin and y_i is the i-th reaction velocity of the puromycin. Table 10: comparison of OLR, Nadaraya-Watson regression (reported both with the Bowman-Azzalini method, marked #, and with Silverman's method), and the revised KPCR for the puromycin data. Figure: black circles: original data; green: OLR; blue: (a) N-W with the Bowman-Azzalini method, (b) N-W with Silverman's method (h_1s = 0.2571); red: revised KPCR with Gaussian parameter 5. EXAMPLE 07: Radioactive Tracer Data. Table 11: the radioactive tracer data [Seber et al., 1998, Nonlinear Programming], where x_i is the i-th time and y_i is the i-th radioactive tracer value. Table 12: comparison of OLR, Nadaraya-Watson regression (reported both with the Bowman-Azzalini method, marked #, and with Silverman's method), and the revised KPCR for the radioactive tracer data. Figure: black circles: original data; green: OLR; blue: (a) N-W with the Bowman-Azzalini method, (b) N-W with Silverman's method (h_1s = 1.1079); red: revised KPCR with Gaussian parameter 5.

27 TABLE OF CONTENTS: Introduction. Kernel Principal Component Analysis. Kernel Principal Component Regression (KPCR). Kernel Ridge Regression (KRR). Weighted Least Squares-KPCR. Weighted Least Squares-KRR. Robust KPCR. Robust KRR. Numerical Examples. Conclusions. CONCLUSIONS. Remark: KPCR = Kernel Principal Components Regression; KRR = Kernel Ridge Regression. KPCR is a novel method for performing nonlinear prediction in regression analysis. We showed that the previous works on KPCR have theoretical difficulties in deriving the prediction and in obtaining the retained number of PCs. We revised the previous KPCR and showed that the difficulties of the previous KPCR are eliminated by the revised KPCR. In our case studies, the revised KPCR with the Gaussian kernel gives better results than Jukic's regression does, and the revised KPCR with an appropriate parameter of the Gaussian kernel gives better results than Nadaraya-Watson regression does.

28 TABLE OF CONTENTS: Introduction. Kernel Principal Component Analysis. Kernel Principal Component Regression (KPCR). Kernel Ridge Regression (KRR). Weighted Least Squares-KPCR. Weighted Least Squares-KRR. Robust KPCR. Robust KRR. Numerical Examples. Conclusions. Thank you for your attention.

29 EXAMPLE 08: Sinc Function + Outliers (Robust KPCR). Black dots: original data with noise; green: OLR; magenta: M-estimation; blue: revised KPCR with Gaussian parameter 5; red: robust KPCR with Gaussian parameter 5. EXAMPLE 09: Sine Function + Outliers (Robust KRR). Black dots: original data with noise; green: OLR; magenta: M-estimation; blue: revised KPCR with Gaussian parameter 5; red: robust KRR with Gaussian parameter 5.
