Content. Learning Goal. Regression vs Classification. Support Vector Machines. SVM Context


Content

Andrew Kusiak, Intelligent Systems Laboratory, 2139 Seamans Center, The University of Iowa, Iowa City, IA 52242-1527, andrew-kusiak@uiowa.edu, http://www.icaen.uiowa.edu/~ankusiak (based on the material provided by Professor V. Kecman)

- Introduction to learning from examples
- Support Vector Machines vs. neural networks
- Quadratic programming (QP)-based learning
- Linear programming (LP)-based learning
- Regression and classification by linear programming
- Examples

Learning Goal

Learning from data, i.e., examples, samples, measurements, records, observations, patterns: getting the data, transferring it, filtering it, compressing it, using it, reusing it, etc.

Regression vs Classification

Regression, a.k.a. function approximation, and classification, a.k.a. pattern recognition.

Support Vector Machines

- SVMs for multi-class problems (Weston and Watkins 1998; Kindermann and Paass)
- SVMs for density estimation (Smola and Schoelkopf 1998)
- The theory of VC bounds (Vapnik 1995 and 1998)

SVM Context

The relationship between SVMs, NNs, and classical techniques such as Fourier series and polynomial approximations.

Fourier Series

Fourier Series Represented in NN Form: the AMPLITUDES and PHASES of the sine (cosine) waves are not known, but the frequencies are known [because Joseph Fourier has selected the frequencies for us], and they are INTEGER multiples of some pre-selected base frequency:

F(x) = Σ_k a_k sin(kx), or Σ_k b_k cos(kx), or both.

In the NN form the hidden-layer weights v_ji, i.e., the frequencies, are prescribed, so learning the output-layer weights (the amplitudes) is linear. Note: learning the frequencies is nonlinear.

Example (1)

Assume that the following model, y = 2.5 sin(1.5x), is to be learned as the Fourier series model o = y = w2 sin(w1 x).

Example (2)

Known: the function is a sine. Not known: the frequency and the amplitude of o = y = w2 sin(w1 x). [Diagram: a network with a single hidden neuron, x -> net_HL -> o_HL -> net_o -> o, trained on the error o - d.]

Example (3)

Use an NN model with a single neuron in the hidden layer (having a sine as the activation function), use the training data set {x, d}, and learn the Fourier series model o = y = w2 sin(w1 x).

Example (4)

The cost function is J = sum(e^2) = sum((d - o)^2) = sum((d - w2 sin(w1 x))^2). [Figure: the dependence of the cost function J upon the amplitude A (dashed) and the frequency w (solid).]

Example (5)

[Figure: the cost surface J over the weight vector w = [A; w], i.e., the amplitude A = w2 and the frequency w1.]
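To make the linear/nonlinear distinction concrete, here is a minimal Python sketch (my own addition, not from the slides; the target amplitude 2.5 and frequency 1.5 follow the example as reconstructed above). For each candidate frequency the optimal amplitude is a closed-form linear least-squares solution, while the frequency itself has to be found by a nonlinear search, here a plain grid scan over the cost J.

```python
# A minimal sketch: learn o = w2*sin(w1*x) from samples of y = 2.5*sin(1.5*x).
# For a FIXED frequency w1 the optimal amplitude w2 is a linear least-squares
# solution; the frequency must be found by a nonlinear search (grid scan here).
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 100)
d = 2.5 * np.sin(1.5 * x)                      # desired outputs

best = (np.inf, None, None)                    # (cost J, w1, w2)
for w1 in np.linspace(0.1, 3.0, 300):          # nonlinear parameter: scan it
    phi = np.sin(w1 * x)                       # fixed basis for this frequency
    w2 = (phi @ d) / (phi @ phi)               # closed-form linear LS amplitude
    J = np.sum((d - w2 * phi) ** 2)            # sum-of-error-squares cost
    if J < best[0]:
        best = (J, w1, w2)

J, w1, w2 = best
print(f"J = {J:.4f} at frequency w1 = {w1:.3f}, amplitude w2 = {w2:.3f}")
```

Scanning the frequency axis is what the wavy J-versus-frequency plot on the slide suggests: the cost is convex in the amplitude but has many local minima along the frequency.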

Example: Polynomial Approximation

F(x) = Σ_{i=0}^N w_i x^i, where the basis functions x^i are prescribed. Prescribing the (integer) exponents results in a LINEAR APPROXIMATION SCHEME: the model is linear in the parameters w_i to be learned, but not in terms of the resulting approximation function, which is nonlinear (NL) for i > 1.

Example: RBF Approximation

RBF = radial basis function. Approximation of a 1D NL function by a Gaussian radial basis function (RBF) network:

F(x) = Σ_{i=1}^N w_i φ_i(x, c_i).

In the 1D case, ignore the two inputs shown in the network diagram; they only denote that the basic structure of the NN is the same for an ANY-DIMENSIONAL INPUT. [Figure: a 1D nonlinear function F(x), known only through the data (measurements, images, records, observations), approximated by a Gaussian RBF NN: bumps φ_1, ..., φ_N with centers c_i and widths σ_i, scaled up by positive weights (w_{i+1} > 0) and down by negative ones (w_{i+2} < 0).]

SVMs and NNs

The learning machine that determines the APPROXIMATION FUNCTION (regression) or the SEPARATION BOUNDARY (classification, pattern recognition) is the same for high-dimensional data sets. For FIXED Gaussian RBFs, LEARNING is LINEAR. If the centers and the covariance matrices are to be learned as well, the problem becomes NONLINEAR (extremely difficult).
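The linearity of learning with fixed RBFs fits in a few lines; the sketch below (my own, assuming hand-picked centers, a common width, and np.sinc as a stand-in target) fits the weights by ordinary least squares via the pseudoinverse.

```python
# A minimal sketch of approximating a 1D nonlinear function by a Gaussian RBF
# network F(x) = sum_i w_i * phi_i(x, c_i). With the centers c_i and width sigma
# FIXED, learning the weights w is LINEAR: an ordinary least-squares problem.
import numpy as np

def gaussian_design(x, centers, sigma):
    """One Gaussian bump phi_i(x) = exp(-(x - c_i)^2 / (2 sigma^2)) per column."""
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2.0 * sigma**2))

x = np.linspace(-3.0, 3.0, 60)
y = np.sinc(x)                                  # some 1D nonlinear target function

centers = np.linspace(-3.0, 3.0, 10)            # fixed RBF centers
Phi = gaussian_design(x, centers, sigma=0.7)
w = np.linalg.pinv(Phi) @ y                     # linear learning: w = Phi^+ y

y_hat = Phi @ w
print(f"max approximation error: {np.max(np.abs(y - y_hat)):.4f}")
```

Moving the centers c_i or the width σ into the trainable parameters would turn this into the nonlinear problem the slide warns about.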

Neural Network vs Support Vector Machine

A neural network and a support vector machine implement the same model,

F(x) = Σ_{j=1}^J w_j φ_j(x, c_j, Σ_j).

There are no structural differences between NNs and SVMs, i.e., no differences in representational capacity. The important differences are in LEARNING.

Note

Identification, estimation, regression, classification, pattern recognition, function approximation, curve fitting, surface fitting, etc. Question: do the new learning concepts differ from classical statistical inference?

Classical Regression

The classical regression and (Bayesian) classification statistical techniques are based on the strict assumption that the probability distribution models (probability-density functions) are known.

Statistical Inference

- Data can be modeled by a set of linear-in-parameters functions; this is the foundation of the parametric paradigm in learning from experimental data.
- In most real-life problems, the stochastic component of the data follows the normal probability distribution law, i.e., the underlying joint probability distribution is Gaussian.
- Due to the second assumption, the induction paradigm for parameter estimation is the maximum likelihood method, which in most engineering applications reduces to the minimization of the sum-of-errors-squares cost function.

Why SVM?

All three assumptions of the classical statistical paradigm are inappropriate for many contemporary real-life problems (Vapnik 1998). The reasons for SVMs:

- Modern problems are high-dimensional, and the underlying mapping is often not smooth, so the linear paradigm calls for an exponentially increasing number of terms with an increasing dimensionality of the input space X, i.e., with an increase in the number of independent variables. This is known as the curse of dimensionality.
- The underlying data-generation laws may not follow the normal distribution, and a model builder must consider this in the construction of an effective learning algorithm.
- From the first two reasons it follows that the maximum likelihood estimator (and consequently the sum-of-errors-squares cost function) should be replaced by a new induction paradigm that is uniformly better, in order to model non-Gaussian distributions.

It Is Also True That (1)

The probability-density functions are unknown, and a question arises: HOW TO PERFORM a distribution-free REGRESSION or CLASSIFICATION?

It Is Also True That (2)

What is available are EXPERIMENTAL DATA (examples, training patterns, samples, observations, records), which are high-dimensional and scarce. High-dimensional spaces are often terrifyingly empty, and learning algorithms (i.e., machines) should be able to operate in such spaces and to learn from sparse data. There is an old saying that redundancy provides knowledge. Stated more simply: the more data at hand, the better the results.
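To back up the claim in the Statistical Inference list that maximum likelihood reduces to the sum-of-errors-squares cost, here is the standard derivation (my addition, not on the slides) for i.i.d. Gaussian noise of variance σ²:

```latex
% Model: d_i = f(x_i, w) + \varepsilon_i, \varepsilon_i ~ N(0, \sigma^2) i.i.d., l data pairs.
\begin{align}
  L(w) &= \prod_{i=1}^{l} \frac{1}{\sqrt{2\pi}\,\sigma}
          \exp\!\Big(-\frac{(d_i - f(x_i, w))^2}{2\sigma^2}\Big), \\
  -\ln L(w) &= \frac{1}{2\sigma^2}\sum_{i=1}^{l} \big(d_i - f(x_i, w)\big)^2
               + l\,\ln\!\big(\sqrt{2\pi}\,\sigma\big), \\
  \hat{w}_{\mathrm{ML}} &= \arg\min_{w} \sum_{i=1}^{l} \big(d_i - f(x_i, w)\big)^2 .
\end{align}
```

The constant term does not depend on w, so maximizing the likelihood is exactly minimizing the sum of squared errors; the reduction breaks down as soon as the noise is not Gaussian.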

Terrifying Emptiness and/or Data Sparseness

Consider a 1D function y = f(x), a 2D function z = f(x, y), and a 3D function u = f(x, y, z), each given by the same number of samples (points) over its domain. Illustrative example: the density of the samples in the 1D, 2D, and 3D cases decreases as the dimension increases, and the average distance between the points increases with the dimensionality!

Error Analysis

[Figure: the dependence of the modeling error on the size l of the training data set; for both a noisy and a noiseless data set the error falls from the small-sample through the medium-sample to the large-sample regime, approaching a final error that is higher for the noisy set.]

Error Analysis: Glivenko-Cantelli-Kolmogorov Results

The Glivenko-Cantelli theorem states that the empirical distribution function P_emp(x) -> P(x) as the number of data l -> infinity. However, for both regression and classification we need probability-density functions p(x), i.e., p(x|w), rather than the distribution P(x). Therefore, a question arises whether the probability density p_emp(x) -> p(x) as the number of data l -> infinity. The answer is neither straightforward nor guaranteed, despite the fact that integrating p(x) yields P(x). (An analogy to the classical inverse problem: Ax = y, x = ?, x = A^(-1) y.)

The theory of UNIFORM CONVERGENCE is needed for the set of functions implemented by a model, i.e., by a learning machine (Vapnik and Chervonenkis, in the 1960s and 1970s). Nonlinear and nonparametric models, as illustrated by NNs and SVMs, are discussed. Nonlinear implies: 1) the model class is not restricted to linear input-output maps, and 2) the cost function that measures the goodness of a model is nonlinear with respect to the unknown parameters.
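A quick numerical sketch of this emptiness (my own; the sample size of 10 points and the unit-cube domain are assumptions, since the slide's numbers were lost in transcription): holding the number of points fixed while raising the dimension makes the average nearest-neighbour distance grow.

```python
# Illustration of the "terrifying emptiness": for a fixed number of points l,
# the average nearest-neighbour distance grows with the dimensionality.
import numpy as np

rng = np.random.default_rng(1)
l = 10                                          # assumed sample size

for dim in (1, 2, 3, 10):
    pts = rng.uniform(0.0, 1.0, size=(l, dim))  # l points in the unit cube [0, 1]^dim
    # pairwise Euclidean distances, with the diagonal masked out
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)
    print(f"dim = {dim:2d}: mean nearest-neighbour distance = {dist.min(axis=1).mean():.3f}")
```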

Note that the second nonlinearity, that of the cost function in the parameters, is the component of modeling that causes most of the computational problems. Nonparametric does not imply that the models have no parameters at all. On the contrary, parameter learning (meaning selection, identification, estimation, fitting, or tuning) is the crucial issue here. However, unlike in classical statistical inference, the parameters are not predefined; rather, their number depends on the training data used. In other words, the parameters that define the capacity of the model are data-driven in such a way as to match the model capacity to the data complexity. This is the basic paradigm of structural risk minimization (SRM), introduced by Vapnik and Chervonenkis and their coworkers.

The main characteristic of all MODERN problems is the mapping between high-dimensional spaces.

Pattern Recognition

Gender recognition problem: are these two faces female or male? F or M? M or F? Each face is represented by 18 input variables (features). (Problem from Brunelli and Poggio 1993.)

Approximation and classification are the same for any dimensionality of the input space. Nothing but the size changes, but the change is DRAMATIC. High dimensionality means both an EXPLOSION in the number OF PARAMETERS to learn and SPARSITY of the training data set. High-dimensional spaces appear to be terrifyingly empty.

[Figure: approximation and classification for a plant with inputs T and P and output H.] For a single input, N data points may suffice; however, for two inputs (T and P), are 2N or N^2 data needed? For n inputs, the required amount of data grows toward N^n.

CURSE of DIMENSIONALITY and SPARSITY OF DATA

The recent promising tool FOR WORKING UNDER THESE CONSTRAINTS is the SUPPORT VECTOR MACHINE, based on STATISTICAL LEARNING THEORY (VAPNIK and CHERVONENKIS). WHAT IS THE contemporary BASIC LEARNING PROBLEM? LEARN THE DEPENDENCY (FUNCTION, MAPPING) from SPARSE DATA, under NOISE, in a HIGH-DIMENSIONAL SPACE! Recall: redundancy provides the knowledge. A lot of data makes for an easy problem.

CURSE of DIMENSIONALITY

Illustrate THE IMPACT OF THE DATA SET SIZE ON THE SIMPLEST RECOGNITION PROBLEM: BINARY CLASSIFICATION, i.e., DICHOTOMIZATION.

CLASSIFICATION (PATTERN RECOGNITION) EXAMPLE

Assume normally distributed classes with the same covariance matrices. The solution is easy: the decision boundary is linear and defined by the parameter vector w = X^+ D in the case where there is plenty of data (approaching infinity). X^+ denotes the PSEUDOINVERSE of the data matrix X, and D holds the desired outputs, d = +1 for one class and d = -1 for the other.
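A minimal Python sketch of this example (my own construction; the slides show a figure, not code): two Gaussian classes with equal covariance and plenty of data, and the linear decision boundary obtained from w = X^+ D.

```python
# Two normally distributed classes with equal covariance; the linear decision
# boundary comes from w = X^+ D, where X^+ is the pseudoinverse of the data
# matrix and D holds the desired outputs +1 / -1.
import numpy as np

rng = np.random.default_rng(2)
n = 100                                          # plenty of data -> a good boundary
class1 = rng.normal(loc=[+1.0, +1.0], scale=0.5, size=(n, 2))   # d = +1
class2 = rng.normal(loc=[-1.0, -1.0], scale=0.5, size=(n, 2))   # d = -1

# augment the inputs with a constant 1 so that w also carries the bias term
X = np.hstack([np.vstack([class1, class2]), np.ones((2 * n, 1))])
D = np.hstack([np.ones(n), -np.ones(n)])

w = np.linalg.pinv(X) @ D                        # w = X^+ D
pred = np.sign(X @ w)
print(f"w = {w}, training accuracy = {(pred == D).mean():.2%}")
```

With an abundant sample the boundary lands near the optimal one between the class means; the next page shows how this breaks down for small samples.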

CLASSIFICATION (PATTERN RECOGNITION) EXAMPLE

Note that this solution follows from the last two assumptions of classical inference: Gaussian data and minimization of the sum-of-errors-squares, which together yield w = X^+ D.

Example 1(3): [Worked example: for a given small data matrix X, w_opt = X^+ D defines the separation boundary; the numerical values were lost in transcription.] For a small sample, the solution defined by w = X^+ D is NO LONGER A GOOD ONE, because for this data set the separation line shown is obtained.

Examples 2(3) and 3(3): For a different data set, another separation line is obtained. Again, for a small sample, the solution defined by w = X^+ D is NO LONGER A GOOD ONE.

What is common to both separation lines, the red and the blue one? Both have a SMALL MARGIN. WHAT'S WRONG WITH A SMALL MARGIN? Look at the BLUE line: it is very likely that new examples (x1, x2) will be wrongly classified.
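A small numerical illustration of this point (my own; the slides' data sets were lost in transcription, so the four points below are hypothetical): on a small sample, w = X^+ D still separates the classes, but with a visibly smaller margin than the best separating line.

```python
# On a small sample the least-squares solution w = X^+ D separates the classes
# but can leave a needlessly SMALL margin.
import numpy as np

X = np.array([[0.5, 0.0], [4.0, 4.0], [-0.5, 0.0], [-4.0, -4.0]])
D = np.array([1.0, 1.0, -1.0, -1.0])

Xa = np.hstack([X, np.ones((4, 1))])             # augment inputs with a bias term
w = np.linalg.pinv(Xa) @ D                       # least-squares solution w = X^+ D

# geometric margin: smallest signed distance of the samples to the boundary
margin = np.min(D * (Xa @ w)) / np.linalg.norm(w[:2])
print(f"w = X^+ D gives {w}, margin = {margin:.3f}")
print("the vertical line x1 = 0 would give margin = 0.500 on the same sample")
```

Here the least-squares line is tilted by the two far-away points, shrinking the margin from the achievable 0.5 to about 0.38.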

SVM

STATISTICAL LEARNING THEORY IS DEVELOPED TO SOLVE THE PROBLEM of FINDING THE OPTIMAL SEPARATION HYPERPLANE for small samples. The question is how to FIND the OPTIMAL SEPARATION HYPERPLANE GIVEN (scarce) DATA SAMPLES.

MAXIMAL MARGIN CLASSIFIER

The maximal margin classifier is an alternative to the perceptron: it also assumes that the data are linearly separable, but it aims at finding the separating hyperplane with the maximal geometric margin (and not just any one, which is typical of perceptron solutions). The OPTIMAL SEPARATION HYPERPLANE is the one that has the LARGEST MARGIN on the given DATA SET.

[Figure: two classes, y = +1 and y = -1, with two separating lines, i.e., decision boundaries (hyperplanes): one with a small margin and one with a large margin.] The larger the margin, the smaller the probability of misclassification.
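A hedged sketch of the maximal margin classifier (my addition; the slides name no software, and scikit-learn is assumed to be available): a linear SVM with a very large C approximates the hard-margin optimal separating hyperplane on linearly separable data.

```python
# A linear SVM with a large C approximates the hard-margin maximal margin
# classifier on linearly separable data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
class1 = rng.normal(loc=[+2.0, +2.0], scale=0.4, size=(20, 2))   # y = +1
class2 = rng.normal(loc=[-2.0, -2.0], scale=0.4, size=(20, 2))   # y = -1
X = np.vstack([class1, class2])
y = np.hstack([np.ones(20), -np.ones(20)])

svm = SVC(kernel="linear", C=1e6).fit(X, y)       # large C ~ hard margin
w, b = svm.coef_[0], svm.intercept_[0]
margin = 2.0 / np.linalg.norm(w)                  # geometric margin width
print(f"hyperplane: w = {w}, b = {b:.3f}")
print(f"margin = {margin:.3f}, support vectors: {len(svm.support_vectors_)}")
```

Only the points closest to the boundary end up as support vectors; they alone determine the optimal separating hyperplane.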
