Issues and Techniques in Pattern Classification
1 Issues and Techniques in Pattern Classification Carlotta Domeniconi Machine Learning Given a collection of data, a machine learner explains the underlying process that generated the data in a general and simple fashion. Different learning paradigms: supervised learning, unsupervised learning, semi-supervised learning, reinforcement learning.
2 Supervised Learning Sample data comprises input vectors along with the corresponding target values (labeled data) Supervised learning uses the given labeled data to find a model (hypothesis) that predicts the target values for previously unseen data Supervised Learning: Classification Each element in the sample is labeled as belonging to some class (e.g., apple or orange). The learner builds a model to predict classes for all input data. There is no order among classes.
3 Classification Example: Handwriting Recognition You've been given a set of N pictures of digits. For each picture, you're told the digit number. Discover a set of rules which, when applied to pictures you've never seen, correctly identifies the digits in those pictures. Supervised Learning: Regression Each element in the sample is associated with one or more continuous variables. The learner builds a model to predict the value(s) for all input data. Unlike classes, values have an order among them.
4 Regression Example: Automated Steering A CMU team trained a neural network to drive a car by feeding in many pictures of roads, plus the value according to which the graduate student was turning the steering wheel at the time. Eventually the neural network learned to predict the correct value for previously unseen pictures of roads. "No Hands Across America" Unsupervised Learning The given data consists of input vectors without any corresponding target values. The goal is to discover groups of similar examples within the data (clustering), or to determine the distribution of data within the input space (density estimation).
5 Clustering Goal: Grouping a collection of objects (data points) into subsets or clusters, such that those within each cluster are more closely related to one another than objects assigned to different clusters. Fundamental to all clustering techniques is the choice of distance or dissimilarity measure between two objects.
6 Semi-supervised Learning In many data mining applications, unlabeled data are easily available, while labeled ones are expensive to obtain because they require human effort. For example: Web page classification; content-based image retrieval. Semi-supervised learning is a recent learning paradigm: it exploits unlabeled examples, in addition to labeled ones, to improve the generalization ability of the resulting classifier. Reinforcement Learning The problem here is to find suitable actions to take in a given situation in order to maximize a reward. Trial and error: no examples of optimal outputs are given. Trade-off between exploration (try new actions to see how effective they are) and exploitation (use actions that are known to give high reward).
7 A Classification Example (from Pattern Classification by Duda, Hart & Stork, Second Edition, 2001) A fish-packing plant wants to automate the process of sorting incoming fish according to species. As a pilot project, it is decided to try to separate sea bass from salmon using optical sensing.
8 To solve this problem, we adopt a machine learning approach: we use a training set to tune the parameters of an adaptive model. Features to explore for use in our classifier: length, lightness, width, position of the mouth.
9 Preprocessing: images of different fishes are isolated from one another and from the background. Feature extraction: the information of a single fish is then sent to a feature extractor that measures certain features or properties. Classification: the values of these features are passed to a classifier that evaluates the evidence presented, and builds a model to discriminate between the two species. Domain knowledge: a sea bass is generally longer than a salmon. Feature: Length. Model: sea bass have some typical length, and this is greater than the length of a salmon.
10 Classification rule: if Length >= l* then sea bass, otherwise salmon. How to choose l*? Use length values of sample data (training data): from the histograms of the length feature for the two categories, choose the l* that leads to the smallest number of errors on average. We cannot reliably separate sea bass from salmon by length alone!
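As a sketch of how l* could be chosen from training data, here is a minimal Python example. The length distributions below are hypothetical synthetic stand-ins; the actual histograms from the slides are not available.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical training lengths: salmon tend to be shorter than sea bass,
# with some overlap (means and spreads are invented for illustration).
salmon_len = rng.normal(loc=50, scale=8, size=200)
bass_len = rng.normal(loc=65, scale=8, size=200)

def training_errors(l_star):
    """Count mistakes of the rule: length >= l* -> sea bass, else salmon."""
    errors = np.sum(salmon_len >= l_star)   # salmon misclassified as sea bass
    errors += np.sum(bass_len < l_star)     # sea bass misclassified as salmon
    return int(errors)

# Pick the l* that minimizes the training error over candidate thresholds.
candidates = np.linspace(30, 90, 601)
l_star = candidates[np.argmin([training_errors(c) for c in candidates])]
```

Because the two histograms overlap, even the best l* leaves some training errors, which is exactly the slide's point about length alone.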
11 New Feature: Average lightness of the fish scales. From the histograms of the lightness feature for the two categories, choose the threshold that leads to the smallest number of errors on average. The two classes are much better separated!
12 Histograms of the lightness feature for the two categories. If classifying a sea bass as salmon costs more, we move the threshold to reduce the number of sea bass classified as salmon. If our actions are equally costly, we keep the threshold that minimizes the average number of mistakes. In Summary: the overall task is to come up with a decision rule (i.e., a decision boundary) so as to minimize the cost (which is equal to the average number of mistakes for equally costly actions).
13 No single feature gives satisfactory results. We must rely on using more than one feature. We observe that sea bass usually are wider than salmon. Two features: lightness x1 and width x2. Resulting fish representation: x = (x1, x2). Decision rule: classify the fish as a sea bass if its feature vector falls above the decision boundary shown, and as salmon otherwise. Should we be satisfied with this result?
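The two-feature decision rule above can be sketched as a check against a linear boundary. The weights and offset below are hypothetical placeholders, not the boundary from the figure:

```python
def classify(lightness, width, w=(1.0, 1.0), b=-10.0):
    """Classify as 'sea bass' if the feature vector x = (lightness, width)
    falls above the (hypothetical) linear boundary w[0]*x1 + w[1]*x2 + b = 0,
    and as 'salmon' otherwise."""
    score = w[0] * lightness + w[1] * width + b
    return "sea bass" if score > 0 else "salmon"
```

In practice the boundary parameters would themselves be fit from the labeled training data, e.g. by the linear discriminant methods discussed later.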
14 Options we have: Consider additional features. Which ones? Some features may be redundant (e.g., if eye color perfectly correlates with width, then we gain no information by adding eye color as a feature). It may be costly to attain more features. Too many features may hurt the performance! Use a more complex model: all training data are perfectly separated. Should we be satisfied now?? We must consider: which decisions will the classifier take on novel patterns, i.e., fish not yet seen? Will the classifier suggest the correct actions? This is the issue of GENERALIZATION.
15 Generalization A good classifier should be able to generalize, i.e., perform well on unseen data. The classifier should capture the underlying characteristics of the categories. The classifier should NOT be tuned to the specific (accidental) characteristics of the training data. Training data in practice contain some noise. As a consequence: we are better off with a slightly poorer performance on the training examples if this means that our classifier will have better performance on novel patterns.
16 The decision boundary shown may represent the optimal tradeoff between accuracy on the training set and on new patterns. How can we determine automatically when the optimal tradeoff has been reached? Generalization The idea is to use a model with an intermediate complexity, which gets most of the points right, without putting too much trust in any individual point. The goal of statistical learning theory is to formalize these arguments by studying mathematical properties of learning machines, i.e., properties of the class of functions that the learning machine can implement (formalization of the complexity of the model).
17 Tradeoff between performance on training and novel examples: plotted against the complexity of the model, the error on training data keeps decreasing, while the generalization error first decreases and then rises again; the optimal complexity lies in between. Evaluation of the classifier on novel data is important to avoid over-fitting. Example: Regression Estimation Regularity: t = sin(2πx). Data: input-output pairs (x_i, t_i) ∈ R × R, with (x_1, t_1), ..., (x_N, t_N) drawn from a distribution P(x, t). Learning: choose a function f: R → R such that a given error function is minimized.
18 Polynomial Curve Fitting We fit the data using a polynomial function of the form: y(x, w) = w_0 + w_1 x + w_2 x^2 + ... + w_M x^M = Σ_{j=0}^{M} w_j x^j. It is linear in the unknown parameters: a linear model. Fit the polynomial to the training data so that a given error function is minimized: E(w) = (1/2) Σ_{n=1}^{N} (y(x_n, w) − t_n)^2. Polynomial Curve Fitting Geometrical interpretation of the sum-of-squares error function E(w): half the sum of the squared vertical displacements of each prediction y(x_n, w) from its target t_n.
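The least-squares fit above has a closed-form solution; here is a minimal sketch on synthetic data from the sin(2πx) example (`fit_polynomial` and `predict` are illustrative names):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 10
x = np.linspace(0, 1, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=N)  # noisy targets

def fit_polynomial(x, t, M):
    """Minimize E(w) = 1/2 * sum_n (y(x_n, w) - t_n)^2 in closed form.
    Columns of the design matrix are x^0, x^1, ..., x^M."""
    Phi = np.vander(x, M + 1, increasing=True)
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return w

def predict(x, w):
    """Evaluate the fitted polynomial y(x, w) at the given inputs."""
    return np.vander(np.atleast_1d(x), len(w), increasing=True) @ w

w3 = fit_polynomial(x, t, M=3)
```

A cubic (M = 3) already tracks one period of the sine much better than a constant (M = 0), which is the model-selection story the slides go on to tell.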
19 Polynomial Curve Fitting Model selection: we need to choose the order M of the polynomial y(x, w) = Σ_{j=0}^{M} w_j x^j; too large an M leads to over-fitting. Polynomial Curve Fitting Model selection: we can investigate the dependence of the generalization performance on the order M via the root-mean-square error E_RMS = sqrt(2 E(w*) / N).
20 Polynomial Curve Fitting We can also examine the behavior of a given model (e.g., M = 9) as the size of the data set changes. For a given model, the over-fitting problem becomes less severe as the number of training data increases. How to dodge the over-fitting problem? Assumption: we have a limited number of training data available, and we wish to use relatively complex and flexible models. One possible solution: add a penalty term to the error function to discourage the coefficients from reaching large values, e.g.: Ẽ(w) = (1/2) Σ_{n=1}^{N} (y(x_n, w) − t_n)^2 + (λ/2) ||w||^2, where ||w||^2 is the sum of all squared coefficients. This is an example of a regularization technique. [shrinkage methods, ridge regression, weight decay]
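A minimal sketch of the penalized fit, using the standard closed-form ridge solution w = (Φᵀ Φ + λ I)⁻¹ Φᵀ t (a textbook identity, not stated explicitly on the slide):

```python
import numpy as np

def fit_polynomial_ridge(x, t, M, lam):
    """Minimize E~(w) = 1/2 * sum_n (y(x_n, w) - t_n)^2 + lam/2 * ||w||^2.
    Closed form: w = (Phi^T Phi + lam * I)^(-1) Phi^T t."""
    Phi = np.vander(x, M + 1, increasing=True)
    A = Phi.T @ Phi + lam * np.eye(M + 1)
    return np.linalg.solve(A, Phi.T @ t)

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=10)
w_plain = fit_polynomial_ridge(x, t, M=9, lam=0.0)   # unregularized M=9 fit
w_reg = fit_polynomial_ridge(x, t, M=9, lam=1e-3)    # penalized fit
```

With M = 9 and only 10 points, the unregularized fit interpolates the noise with huge coefficients; even a small λ shrinks them drastically, which is exactly how λ controls the effective complexity.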
21 The impact of Regularization With M = 9 and the regularized error Ẽ(w) = (1/2) Σ_{n=1}^{N} (y(x_n, w) − t_n)^2 + (λ/2) ||w||^2, the parameter λ controls the effective complexity of the model and hence determines the degree of over-fitting.
22 Model selection Partition available data into a training set, used to determine the coefficients w, and a separate validation set (also called hold-out), used to optimize the model complexity (M or λ). If the model design is iterated several times using a limited number of data, some over-fitting to the validation data can occur. Solution: set aside a third test set on which performance of the selected model is finally evaluated. Model selection Often we have limited data available, and we wish to use as much of the available data as possible for training to build good models. However, if we use a small set for validation, we obtain noisy estimates of predictive performance. One solution to this problem is cross-validation.
23 Cross-validation: Example 4-fold cross-validation allows us to use 3/4 of the available data for training, while making use of all of the data to assess performance. k-fold Cross-validation In general: we perform k runs. Each run uses (k−1)/k of the available data for training. If the number of data is very limited, we can set k = N (total number of data points). This gives the leave-one-out cross-validation technique.
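A minimal k-fold cross-validation sketch for the polynomial model above (the helper names are illustrative):

```python
import numpy as np

def k_fold_indices(n, k, seed=0):
    """Split the indices 0..n-1 into k roughly equal folds after shuffling."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

def cross_val_error(x, t, M, k=4):
    """Average held-out mean squared error of a degree-M polynomial fit;
    each of the k runs trains on (k-1)/k of the data and tests on the rest."""
    folds = k_fold_indices(len(x), k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        Phi_tr = np.vander(x[train], M + 1, increasing=True)
        w, *_ = np.linalg.lstsq(Phi_tr, t[train], rcond=None)
        Phi_te = np.vander(x[test], M + 1, increasing=True)
        errors.append(np.mean((Phi_te @ w - t[test]) ** 2))
    return float(np.mean(errors))
```

Setting k = len(x) turns this into leave-one-out cross-validation, at the cost of one training run per data point.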
24 k-fold Cross-validation: Drawbacks Computationally expensive: the number of training runs is increased by a factor of k. A single model may have multiple complexity parameters: exploring combinations of settings could require a number of training runs that is exponential in the number of parameters. Curse of Dimensionality Real world applications deal with spaces with high dimensionality. High dimensionality poses serious challenges for the design of pattern recognition techniques. Oil flow data in two dimensions: red = homogeneous, green = annular, blue = laminar.
25 Curse of Dimensionality A simple classification approach: divide the input space into a regular grid of cells and classify a new point according to the training points in its cell. Going higher in dimensionality, however, the number of regions of a regular grid grows exponentially with the dimensionality D of the space.
26 Curse of Dimensionality Consider a sample of size N = 500 uniformly distributed in the unit cube [0, 1]^q, and let DMAX and DMIN be the largest and smallest distances from a given point. The distribution of the ratio DMAX/DMIN converges to 1 as the dimensionality increases.
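The convergence of DMAX/DMIN toward 1 can be checked empirically; a sketch (reference point, seed, and dimensions chosen arbitrarily):

```python
import numpy as np

def max_min_ratio(q, N=500, seed=0):
    """Ratio of the farthest to the nearest Euclidean distance from the
    origin, for N points uniform in the unit cube [0, 1]^q."""
    X = np.random.default_rng(seed).random((N, q))
    d = np.linalg.norm(X, axis=1)
    return d.max() / d.min()

# The ratio shrinks toward 1 as the dimensionality grows.
ratios = {q: max_min_ratio(q) for q in (2, 10, 100)}
```

In low dimensions the nearest point can be almost arbitrarily close, while in high dimensions all distances concentrate around a typical value, so "nearest" and "farthest" become nearly indistinguishable.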
27 Curse of Dimensionality The variance of distances from a given point converges to 0 as the dimensionality increases.
28 Curse of Dimensionality Distance values from a given point flatten out as dimensionality increases. Computing radii of nearest neighborhoods.
29 Median radius of a nearest neighborhood, for a uniform distribution in the unit cube [−0.5, 0.5]^q. Curse-of-Dimensionality As dimensionality increases, the distance from the closest point increases faster. For a random sample of size N from a uniform distribution in the q-dimensional unit hypercube, the diameter of a K = 1 neighborhood using Euclidean distance is d(q, N) = O(N^(−1/q)). For large q relative to N, d(q, N) is large, leading to highly biased estimations.
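The growth of nearest-neighborhood radii with dimensionality can likewise be checked empirically; a brute-force sketch over the unit cube [−0.5, 0.5]^q (sample size and dimensions chosen arbitrarily):

```python
import numpy as np

def median_nn_distance(q, N=200, seed=0):
    """Median distance from each point to its nearest neighbor, for N
    points uniform in the unit cube [-0.5, 0.5]^q."""
    X = np.random.default_rng(seed).random((N, q)) - 0.5
    # All pairwise squared distances; mask the zero self-distances.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)
    return float(np.median(np.sqrt(d2.min(axis=1))))
```

For a fixed N, the median nearest-neighbor distance grows with q, consistent with d(q, N) = O(N^(−1/q)): a "local" neighborhood in high dimensions is not local at all.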
30 Curse-of-Dimensionality In high dimensional spaces data become extremely sparse and are far apart from each other. The curse of dimensionality affects any estimation problem with high dimensionality. Curse-of-Dimensionality It is a serious problem in many real-world applications: microarray data: 3,000-4,000 genes; documents: 10,000-20,000 words in a dictionary; images, face recognition, etc.
31 How can we deal with the curse of dimensionality? Curse-of-Dimensionality Effective techniques applicable to high dimensional spaces exist. Real data are often confined to regions of lower dimensionality.
32 Mean and covariance matrix Given data points x_1, ..., x_N ∈ R^q, the sample mean is μ = E[x] = (1/N) Σ_{i=1}^{N} x_i, and the covariance matrix is Σ = E[(x − μ)(x − μ)^T] = (1/N) Σ_{i=1}^{N} (x_i − μ)(x_i − μ)^T.
33 Written out entry by entry, the diagonal entries of Σ, (1/N) Σ_{i=1}^{N} (x_ij − μ_j)^2, are the variances of the individual features, while the off-diagonal entries, (1/N) Σ_{i=1}^{N} (x_ij − μ_j)(x_ik − μ_k), are the covariances between pairs of features.
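The covariance formula reads directly as code; a minimal sketch with a tiny hand-checkable data set:

```python
import numpy as np

def covariance_matrix(X):
    """Sample covariance: Sigma = 1/N * sum_i (x_i - mu)(x_i - mu)^T.
    Diagonal entries are the per-feature variances; off-diagonal entries
    are the pairwise covariances."""
    mu = X.mean(axis=0)
    Xc = X - mu
    return Xc.T @ Xc / len(X)

# Second feature is exactly twice the first, so the covariance matrix
# has perfectly correlated entries.
X = np.array([[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]])
Sigma = covariance_matrix(X)
```

This matches NumPy's built-in `np.cov` with `bias=True` (the 1/N normalization used above).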
34 Dimensionality Reduction Many dimensions are often interdependent (correlated); we can: reduce the dimensionality of problems; transform interdependent coordinates into significant and independent ones. Principal Component Analysis -- PCA (also called Karhunen-Loeve transformation) PCA transforms the original input space into a lower dimensional space, by constructing dimensions that are linear combinations of the given features; the objective is to consider independent dimensions along which data have largest variance (i.e., greatest variability).
35 Principal Component Analysis -- PCA PCA involves a linear algebra procedure that transforms a number of possibly correlated variables into a smaller number of uncorrelated variables called principal components. The first principal component accounts for as much of the variability in the data as possible; each succeeding component (orthogonal to the previous ones) accounts for as much of the remaining variability as possible. Principal Component Analysis -- PCA So: PCA finds n linearly transformed components s_1, s_2, ..., s_n so that they explain the maximum amount of variance. We can define PCA in an intuitive way using a recursive formulation:
36 Principal Component Analysis -- PCA Suppose data are first centered at the origin (i.e., their mean is 0). We define the direction of the first principal component, say w_1, as follows: w_1 = arg max_{||w|| = 1} E[(w^T x)^2], where w is of the same dimensionality q as the data vector x. Thus: the first principal component is the projection on the direction along which the variance of the projection is maximized. Principal Component Analysis -- PCA Having determined the first k−1 principal components, the k-th principal component is determined as the principal component of the data residual: w_k = arg max_{||w|| = 1} E{[w^T (x − Σ_{i=1}^{k−1} w_i w_i^T x)]^2}. The principal components are then given by: s_i = w_i^T x.
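In practice the recursive maximization above is solved all at once by taking the top eigenvectors of the covariance matrix of the centered data (a standard equivalence, not derived on the slide). A sketch on synthetic correlated data:

```python
import numpy as np

def pca(X, n_components):
    """Principal directions as the top eigenvectors of the covariance
    matrix of the centered data; the components are s_i = w_i^T x."""
    Xc = X - X.mean(axis=0)
    Sigma = Xc.T @ Xc / len(X)
    eigvals, eigvecs = np.linalg.eigh(Sigma)        # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    W = eigvecs[:, order]                           # columns w_1, ..., w_n
    return Xc @ W, W

rng = np.random.default_rng(3)
# Correlated 2-D data: almost all variance lies along one direction.
z = rng.normal(size=(200, 1))
X = np.hstack([z, 0.5 * z + 0.05 * rng.normal(size=(200, 1))])
S, W = pca(X, n_components=1)
```

The variance of the first component equals the largest eigenvalue of Σ, which is at least as large as the variance of any single original coordinate.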
37 Simple illustration of PCA First principal component of a two-dimensional data set. Simple illustration of PCA Second principal component of a two-dimensional data set.
38 PCA Geometric interpretation Basically: PCA rotates the data (centered at the origin) in such a way that the maximum variability is visible (i.e., aligned with the axes). Example: PCA does not always work.
39 Example First principal component of a two-class data set.
40 Example Projected onto the first principal component, the samples form a confused mixture of both classes.
41 How to classify a new data point? Along the first principal component, points from both classes are intermingled. We cannot predict with accuracy the class of the new unlabeled point!
42 Can we find a better projection? How about the horizontal dimension?
43 How about the horizontal dimension? Projected samples are well separated now.
44 How to classify a new data point? Projected samples are well separated now; the new point is much closer to the red samples than to the green ones.
45 How to classify a new data point? The new point is much closer to the red samples than to the green ones, so we can label the new point as red. What did we do? We found an orientation along which the projected samples are well separated. This is exactly the goal of linear discriminant analysis (LDA). In other words: we are after the linear projection that best separates the data, i.e., best discriminates data of different classes.
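The projection LDA looks for can be sketched with Fisher's classic two-class solution, w ∝ S_w⁻¹ (μ_1 − μ_0) (the standard formulation, not spelled out on the slide). The data below are synthetic, mimicking the example above where the classes separate along the horizontal axis:

```python
import numpy as np

def fisher_lda_direction(X0, X1):
    """Fisher's linear discriminant: w proportional to Sw^{-1} (mu1 - mu0),
    the direction that best separates the two class means relative to the
    within-class scatter Sw."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    w = np.linalg.solve(Sw, mu1 - mu0)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(4)
# Classes differ along x, but vary most along y: PCA would pick the
# vertical direction, while LDA recovers the discriminative horizontal one.
X0 = rng.normal(size=(100, 2)) * [1.0, 3.0] + [-4.0, 0.0]
X1 = rng.normal(size=(100, 2)) * [1.0, 3.0] + [4.0, 0.0]
w = fisher_lda_direction(X0, X1)
```

Unlike PCA's variance criterion, this direction is driven by the class labels, which is why it succeeds on the example where PCA fails.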
46 PCA vs. LDA Both PCA and LDA are linear techniques for dimensionality reduction: they project the data along directions that can be expressed as linear combinations of the input features. The appropriate transformation depends on the data and on the task we want to perform on the data. Note that LDA uses class labels, PCA does not. Non-linear extensions of PCA and LDA exist (e.g., Kernel-PCA, generalized LDA).
More informationKernel-Based Principal Component Analysis (KPCA) and Its Applications. Nonlinear PCA
Kernel-Based Principal Component Analysis (KPCA) and Its Applications 4//009 Based on slides originaly from Dr. John Tan 1 Nonlinear PCA Natural phenomena are usually nonlinear and standard PCA is intrinsically
More informationAssignment 3. Introduction to Machine Learning Prof. B. Ravindran
Assignment 3 Introduction to Machine Learning Prof. B. Ravindran 1. In building a linear regression model for a particular data set, you observe the coefficient of one of the features having a relatively
More informationMachine Learning! in just a few minutes. Jan Peters Gerhard Neumann
Machine Learning! in just a few minutes Jan Peters Gerhard Neumann 1 Purpose of this Lecture Foundations of machine learning tools for robotics We focus on regression methods and general principles Often
More informationFinal Overview. Introduction to ML. Marek Petrik 4/25/2017
Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,
More informationMachine Learning. Nonparametric Methods. Space of ML Problems. Todo. Histograms. Instance-Based Learning (aka non-parametric methods)
Machine Learning InstanceBased Learning (aka nonparametric methods) Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Non parametric CSE 446 Machine Learning Daniel Weld March
More informationWolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig
Multimedia Databases Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 14 Indexes for Multimedia Data 14 Indexes for Multimedia
More informationCMU-Q Lecture 24:
CMU-Q 15-381 Lecture 24: Supervised Learning 2 Teacher: Gianni A. Di Caro SUPERVISED LEARNING Hypotheses space Hypothesis function Labeled Given Errors Performance criteria Given a collection of input
More informationProbabilistic Methods in Bioinformatics. Pabitra Mitra
Probabilistic Methods in Bioinformatics Pabitra Mitra pabitra@cse.iitkgp.ernet.in Probability in Bioinformatics Classification Categorize a new object into a known class Supervised learning/predictive
More informationDecision Tree Learning
Decision Tree Learning Berlin Chen Department of Computer Science & Information Engineering National Taiwan Normal University References: 1. Machine Learning, Chapter 3 2. Data Mining: Concepts, Models,
More information20 Unsupervised Learning and Principal Components Analysis (PCA)
116 Jonathan Richard Shewchuk 20 Unsupervised Learning and Principal Components Analysis (PCA) UNSUPERVISED LEARNING We have sample points, but no labels! No classes, no y-values, nothing to predict. Goal:
More informationStatistical Geometry Processing Winter Semester 2011/2012
Statistical Geometry Processing Winter Semester 2011/2012 Linear Algebra, Function Spaces & Inverse Problems Vector and Function Spaces 3 Vectors vectors are arrows in space classically: 2 or 3 dim. Euclidian
More informationCS534 Machine Learning - Spring Final Exam
CS534 Machine Learning - Spring 2013 Final Exam Name: You have 110 minutes. There are 6 questions (8 pages including cover page). If you get stuck on one question, move on to others and come back to the
More informationStatistical Machine Learning
Statistical Machine Learning Christoph Lampert Spring Semester 2015/2016 // Lecture 12 1 / 36 Unsupervised Learning Dimensionality Reduction 2 / 36 Dimensionality Reduction Given: data X = {x 1,..., x
More informationUnsupervised Learning: K- Means & PCA
Unsupervised Learning: K- Means & PCA Unsupervised Learning Supervised learning used labeled data pairs (x, y) to learn a func>on f : X Y But, what if we don t have labels? No labels = unsupervised learning
More informationDimension Reduction Methods
Dimension Reduction Methods And Bayesian Machine Learning Marek Petrik 2/28 Previously in Machine Learning How to choose the right features if we have (too) many options Methods: 1. Subset selection 2.
More informationECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction
ECE 521 Lecture 11 (not on midterm material) 13 February 2017 K-means clustering, Dimensionality reduction With thanks to Ruslan Salakhutdinov for an earlier version of the slides Overview K-means clustering
More informationGeometric Modeling Summer Semester 2010 Mathematical Tools (1)
Geometric Modeling Summer Semester 2010 Mathematical Tools (1) Recap: Linear Algebra Today... Topics: Mathematical Background Linear algebra Analysis & differential geometry Numerical techniques Geometric
More informationWhen Dictionary Learning Meets Classification
When Dictionary Learning Meets Classification Bufford, Teresa 1 Chen, Yuxin 2 Horning, Mitchell 3 Shee, Liberty 1 Mentor: Professor Yohann Tendero 1 UCLA 2 Dalhousie University 3 Harvey Mudd College August
More informationLecture 13. Principal Component Analysis. Brett Bernstein. April 25, CDS at NYU. Brett Bernstein (CDS at NYU) Lecture 13 April 25, / 26
Principal Component Analysis Brett Bernstein CDS at NYU April 25, 2017 Brett Bernstein (CDS at NYU) Lecture 13 April 25, 2017 1 / 26 Initial Question Intro Question Question Let S R n n be symmetric. 1
More informationMachine Learning for Software Engineering
Machine Learning for Software Engineering Dimensionality Reduction Prof. Dr.-Ing. Norbert Siegmund Intelligent Software Systems 1 2 Exam Info Scheduled for Tuesday 25 th of July 11-13h (same time as the
More informationHoldout and Cross-Validation Methods Overfitting Avoidance
Holdout and Cross-Validation Methods Overfitting Avoidance Decision Trees Reduce error pruning Cost-complexity pruning Neural Networks Early stopping Adjusting Regularizers via Cross-Validation Nearest
More informationLearning Theory Continued
Learning Theory Continued Machine Learning CSE446 Carlos Guestrin University of Washington May 13, 2013 1 A simple setting n Classification N data points Finite number of possible hypothesis (e.g., dec.
More informationCOMP 551 Applied Machine Learning Lecture 13: Dimension reduction and feature selection
COMP 551 Applied Machine Learning Lecture 13: Dimension reduction and feature selection Instructor: Herke van Hoof (herke.vanhoof@cs.mcgill.ca) Based on slides by:, Jackie Chi Kit Cheung Class web page:
More informationMachine Learning Concepts in Chemoinformatics
Machine Learning Concepts in Chemoinformatics Martin Vogt B-IT Life Science Informatics Rheinische Friedrich-Wilhelms-Universität Bonn BigChem Winter School 2017 25. October Data Mining in Chemoinformatics
More informationPattern Recognition and Machine Learning. Learning and Evaluation of Pattern Recognition Processes
Pattern Recognition and Machine Learning James L. Crowley ENSIMAG 3 - MMIS Fall Semester 2016 Lesson 1 5 October 2016 Learning and Evaluation of Pattern Recognition Processes Outline Notation...2 1. The
More informationIterative Laplacian Score for Feature Selection
Iterative Laplacian Score for Feature Selection Linling Zhu, Linsong Miao, and Daoqiang Zhang College of Computer Science and echnology, Nanjing University of Aeronautics and Astronautics, Nanjing 2006,
More informationCSE 417T: Introduction to Machine Learning. Lecture 11: Review. Henry Chai 10/02/18
CSE 417T: Introduction to Machine Learning Lecture 11: Review Henry Chai 10/02/18 Unknown Target Function!: # % Training data Formal Setup & = ( ), + ),, ( -, + - Learning Algorithm 2 Hypothesis Set H
More informationRegularization. CSCE 970 Lecture 3: Regularization. Stephen Scott and Vinod Variyam. Introduction. Outline
Other Measures 1 / 52 sscott@cse.unl.edu learning can generally be distilled to an optimization problem Choose a classifier (function, hypothesis) from a set of functions that minimizes an objective function
More informationDATA MINING LECTURE 10
DATA MINING LECTURE 10 Classification Nearest Neighbor Classification Support Vector Machines Logistic Regression Naïve Bayes Classifier Supervised Learning 10 10 Illustrating Classification Task Tid Attrib1
More informationECE662: Pattern Recognition and Decision Making Processes: HW TWO
ECE662: Pattern Recognition and Decision Making Processes: HW TWO Purdue University Department of Electrical and Computer Engineering West Lafayette, INDIANA, USA Abstract. In this report experiments are
More informationMachine Learning 2017
Machine Learning 2017 Volker Roth Department of Mathematics & Computer Science University of Basel 21st March 2017 Volker Roth (University of Basel) Machine Learning 2017 21st March 2017 1 / 41 Section
More informationBeyond the Point Cloud: From Transductive to Semi-Supervised Learning
Beyond the Point Cloud: From Transductive to Semi-Supervised Learning Vikas Sindhwani, Partha Niyogi, Mikhail Belkin Andrew B. Goldberg goldberg@cs.wisc.edu Department of Computer Sciences University of
More informationLearning Tetris. 1 Tetris. February 3, 2009
Learning Tetris Matt Zucker Andrew Maas February 3, 2009 1 Tetris The Tetris game has been used as a benchmark for Machine Learning tasks because its large state space (over 2 200 cell configurations are
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table
More informationLecture 24: Other (Non-linear) Classifiers: Decision Tree Learning, Boosting, and Support Vector Classification Instructor: Prof. Ganesh Ramakrishnan
Lecture 24: Other (Non-linear) Classifiers: Decision Tree Learning, Boosting, and Support Vector Classification Instructor: Prof Ganesh Ramakrishnan October 20, 2016 1 / 25 Decision Trees: Cascade of step
More informationTutorial on Machine Learning for Advanced Electronics
Tutorial on Machine Learning for Advanced Electronics Maxim Raginsky March 2017 Part I (Some) Theory and Principles Machine Learning: estimation of dependencies from empirical data (V. Vapnik) enabling
More informationFoundations of Machine Learning
Introduction to ML Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu page 1 Logistics Prerequisites: basics in linear algebra, probability, and analysis of algorithms. Workload: about
More informationComputer Vision Group Prof. Daniel Cremers. 3. Regression
Prof. Daniel Cremers 3. Regression Categories of Learning (Rep.) Learnin g Unsupervise d Learning Clustering, density estimation Supervised Learning learning from a training data set, inference on the
More informationM.S. Project Report. Efficient Failure Rate Prediction for SRAM Cells via Gibbs Sampling. Yamei Feng 12/15/2011
.S. Project Report Efficient Failure Rate Prediction for SRA Cells via Gibbs Sampling Yamei Feng /5/ Committee embers: Prof. Xin Li Prof. Ken ai Table of Contents CHAPTER INTRODUCTION...3 CHAPTER BACKGROUND...5
More informationUncorrelated Multilinear Principal Component Analysis through Successive Variance Maximization
Uncorrelated Multilinear Principal Component Analysis through Successive Variance Maximization Haiping Lu 1 K. N. Plataniotis 1 A. N. Venetsanopoulos 1,2 1 Department of Electrical & Computer Engineering,
More informationQualifying Exam in Machine Learning
Qualifying Exam in Machine Learning October 20, 2009 Instructions: Answer two out of the three questions in Part 1. In addition, answer two out of three questions in two additional parts (choose two parts
More information