
Scuola di Calcolo Scientifico con MATLAB (SCSM) 2017, Palermo, 31 July - 4 August 2017. www.u4learn.it. Ing. Giuseppe La Tona

Outline: Machine Learning definition; Machine Learning problems; Artificial Neural Networks (ANN); Nearest Neighbor classification; Mixture Models and k-means; Graphical Models

Machine Learning "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E. (Tom M. Mitchell)

Example: histogram of salmon and sea bass counts as a function of length, with a candidate decision threshold l*.

Example: scatter plot of width versus lightness for salmon and sea bass.

Machine Learning Sub-Problems: Overfitting, Noise, Feature Extraction, Model Selection, Prior Knowledge, Missing Features. (Scatter plot of width versus lightness for salmon and sea bass, with an unlabelled query point.)

Styles of Machine Learning: Supervised Learning, Unsupervised Learning, Anomaly detection, On-line learning, Semi-supervised learning

Supervised Learning: Given a set of data D = {(x_n, y_n), n = 1, ..., N}, the task is to learn the relationship between the input x and the output y such that, when given a novel input x*, the predicted output y* is accurate. The pair (x*, y*) is not in D but is assumed to be generated by the same unknown process that generated D. To specify explicitly what accurate means, one defines a loss function L(y_pred, y_true) or, conversely, a utility function U = -L.
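As an illustration of the loss-function viewpoint, the minimal sketch below (not from the slides; the data and variable names are invented for the example) measures prediction accuracy on held-out pairs with the average squared loss and the zero-one loss in plain MATLAB.

    % A sketch: measuring prediction accuracy with a loss function.
    ytrue  = [1.0; 2.0; 3.0];           % held-out targets y*
    ypred  = [1.1; 1.8; 3.4];           % outputs of some learned model on the x*
    sqLoss = mean((ypred - ytrue).^2);  % average squared loss

    ctrue  = [1; 2; 2; 1];              % true class labels
    cpred  = [1; 2; 1; 1];              % predicted class labels
    zeroOneLoss = mean(cpred ~= ctrue); % fraction of misclassified points
    fprintf('squared loss %.3f, zero-one loss %.3f\n', sqLoss, zeroOneLoss);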

Supervised Learning Example: A father decides to teach his young son what a sports car is. Finding it difficult to explain in words, he decides to give some examples. They stand on a motorway bridge and, as each car passes underneath, the father cries out "that's a sports car!" when a sports car passes by. After ten minutes, the father asks his son if he's understood what a sports car is. The son says, "sure, it's easy". An old red VW Beetle passes by, and the son shouts "that's a sports car!". Dejected, the father asks "why do you say that?". "Because all sports cars are red!", replies the son.

Unsupervised Learning: Given a set of data D = {x_n, n = 1, ..., N}, in unsupervised learning we aim to find a plausible compact description of the data. An objective is used to quantify the accuracy of the description. In unsupervised learning there is no special prediction variable, so from a probabilistic perspective we are interested in modelling the distribution p(x). The likelihood of the model generating the data is a popular measure of the accuracy of the description.

Unsupervised Learning

Other Types of Learning. Anomaly Detection: detecting anomalous events in industrial processes (plant monitoring), engine monitoring, and unexpected buying behaviour patterns in customers all fall under the area of anomaly detection. Online Learning (supervised and unsupervised): in online learning data arrives sequentially and we continually update our model as new data becomes available. Semi-supervised learning.

Machine Learning Problems: Classification, Regression, Clustering, Density Estimation, Dimensionality Reduction

Exercise: A blog platform needs an automatic tagging service. From the text of a blog article, recommend a list of tags. How would you proceed? Which questions should you ask first?

Machine Learning Steps

Datasets: Training set, Validation set, Test set

Artificial Neural Networks. Neuron (network node): inputs x_1, ..., x_n with weights w_1, ..., w_n feed an activation function f, producing the output f(w_1 x_1 + w_2 x_2 + ... + w_n x_n). Black-box representation: a network F maps the inputs x_1, ..., x_n to the outputs y_1, ..., y_m.

Artificial Neural Networks. General network node: an integration function g combines the inputs and an activation function f is applied to the result, giving f(g(x_1, x_2, ..., x_n)). Binary threshold function: output 1 if the integrated input reaches the threshold θ, otherwise 0.

Artificial Neural Networks. Input space separation: a single binary threshold unit separates the input space with a hyperplane; with suitable weights and threshold the same unit computes the logical OR or AND of binary inputs.
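A minimal sketch (not from the slides) of such a threshold unit in plain MATLAB, with weights chosen by hand so that the same unit computes OR or AND of two binary inputs depending on the threshold θ.

    % Threshold unit: y = 1 if w'*x >= theta, else 0 (hand-picked weights for the example).
    threshold_unit = @(x, w, theta) double(w' * x >= theta);

    w = [1; 1];
    X = [0 0 1 1; 0 1 0 1];                      % the four binary input pairs, one per column
    for i = 1:size(X, 2)
        orOut  = threshold_unit(X(:, i), w, 1);  % theta = 1 realizes OR
        andOut = threshold_unit(X(:, i), w, 2);  % theta = 2 realizes AND
        fprintf('x = [%d %d]  OR = %d  AND = %d\n', X(1,i), X(2,i), orOut, andOut);
    end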

Feed-Forward ANN: n input sites, k hidden units, m output units. An extra site with constant value 1 provides the bias at each layer. The input layer is connected to the hidden layer by the connection matrix W1 (including the bias weights w_{n+1,k}) and the hidden layer to the output layer by the connection matrix W2 (including w_{k+1,m}).
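A minimal sketch of a forward pass through such a network (the sizes, random weights, and the logistic activation are assumptions chosen for the example, not taken from the slides).

    % Forward pass of a feed-forward network with one hidden layer (a sketch).
    n = 3; k = 4; m = 2;                 % input, hidden, and output sizes (arbitrary)
    W1 = randn(n + 1, k);                % connection matrix input -> hidden, last row = bias
    W2 = randn(k + 1, m);                % connection matrix hidden -> output, last row = bias
    sigmoid = @(z) 1 ./ (1 + exp(-z));   % activation function (an assumption)

    x = randn(n, 1);                     % an input vector
    h = sigmoid(W1' * [x; 1]);           % hidden activations (append the constant site 1)
    y = sigmoid(W2' * [h; 1]);           % network output
    disp(y');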

Recurrent ANN

Recurrent ANN. Dealing with Time Series: Meteorological forecast, Energy consumption, Order request forecast, Traffic forecast, Financial market forecast

Nonlinear Autoregressive Exogenous model (NARX). Exogenous inputs: temperature, hour of day.

Self-Organizing Maps: Nature-inspired. Autonomous units organize to adapt to an input space. The organization preserves the topology of the input space.

Kohonen's model: Multi-dimensional lattices of computing units. Each unit has an associated weight w, also called a prototype vector; w has the dimension of the input space. Each unit has lateral connections to several neighbors.

Kohonen's model: We have a training set D of vectors sampled from the input space. The network learns to adapt to the input space by updating the weights of its computing units.

Learning algorithm: Consider an n-dimensional input space. A one-dimensional SOM is a chain of computing units. When an input x is received, each unit i computes the Euclidean distance between x and its weight w_i. The unit k with the smallest distance (highest excitation) is selected (fires).

Learning algorithm: The neighbors of k are also updated. We define a neighborhood function φ(i, k), e.g. φ(i, k) = 1 if d(i, k) < r and φ(i, k) = 0 otherwise. (Figure: a chain of units 1, 2, 3, ..., m with weights w_1, w_2, ..., w_m and an input x; the neighborhood of unit 2 with radius 1.)

Learning algorithm.
Init: a learning constant η and a neighborhood function φ are selected; the m weight vectors are initialized randomly.
1. Select an input vector ξ using the desired probability distribution over the input space.
2. The unit k with the maximum excitation is selected (that is, the unit for which the distance between w_i and ξ is minimal, i = 1, ..., m).
3. The weight vectors are updated using the neighborhood function and the update rule w_i ← w_i + η φ(i, k)(ξ − w_i), for i = 1, ..., m.
4. Stop if the maximum number of iterations has been reached; otherwise modify η and φ as scheduled and continue with step 1.
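A minimal sketch of these steps for a one-dimensional SOM in plain MATLAB (the chain length, schedules, and uniform 2-D input data are assumptions chosen for the example; it relies on implicit expansion, available from MATLAB R2016b).

    % One-dimensional Kohonen SOM trained on 2-D inputs (a sketch of the steps above).
    m = 20;                      % number of units in the chain
    W = rand(2, m);              % weight vectors, one column per unit
    eta = 0.5; r = 3;            % initial learning constant and neighborhood radius
    nIter = 2000;
    for t = 1:nIter
        xi = rand(2, 1);                          % step 1: sample an input vector
        [~, k] = min(sum((W - xi).^2, 1));        % step 2: best matching unit
        phi = double(abs((1:m) - k) <= r);        % neighborhood function phi(i,k)
        W = W + eta * (xi - W) .* phi;            % step 3: update rule
        eta = eta * 0.999; r = max(1, r * 0.999); % step 4: shrink the schedules
    end
    plot(W(1,:), W(2,:), '-o');                   % the chain after training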

Learning algorithm: Each step attracts the weight of the excited unit toward the input. Repeating this process, we expect to arrive at a uniform distribution of weight vectors in input space (if the inputs have also been uniformly selected).

Effect on neighbors: The radius of the neighborhood is reduced according to a schedule. Each time a unit is updated, neighboring units are also updated. If the weight vector of a unit is attracted to a region in input space, the neighbors are also attracted, but to a lesser degree. During the learning process both the size of the neighborhood and the value of φ fall gradually, so that the influence of each unit upon its neighbors is reduced.

Schedule and learning constant: The learning constant controls the magnitude of the weight updates and is reduced gradually. The net effect of the selected schedule is to produce larger corrections at the beginning of training than at the end.

Linear SOM example (figure): The weight vectors reach a distribution which transforms each unit into a representative of a small region of input space. The unit in the lower corner responds with the largest excitation to vectors in the shaded region.

Bi-dimensional networks. (Fig. 15.8: Planar network with a knot.) Several proofs of convergence have been given for one-dimensional Kohonen networks in one-dimensional domains. There is no general proof of convergence for multidimensional networks. Mapping high-dimensional spaces: usually, when an empirical data set is selected, we do not know its real dimension. Even if the input vectors are of dimension n, it could be that the data concentrates on a manifold of lower dimension. In general it is not obvious which network dimension should be used for a given data set. This general problem led Kohonen to consider what happens when a low-dimensional network is used to map a higher-dimensional space. In this case the network must fold in order to fill the available space. Figure 15.9 shows, in the middle, the result of an experiment in which a two-dimensional network was used to chart a three-dimensional box. As can be seen, the network extends in the x and y dimensions and folds in the z direction. Animation: https://www.youtube.com/watch?v=qvi6l-kqst4

Mapping high-dimensional spaces: How a network of dimension n adapts to an input space of higher dimension: it must fold to fill the space. (Fig. 15.9: Two-dimensional map of a three-dimensional region.) The units in the network map alternately to one side or the other of input space (for the z dimension). A commonly cited example for this kind of structure in the human brain is the visual cortex. The brain actually processes not one but two visual images, one displaced with respect to the other. In this case the input domain consists of two planar regions (the two sides of the box of Figure 15.9). The planar cortex must fold in the same way in order to respond optimally to input from one or other side of the input domain. The result is the appearance of the stripes of ocular dominance studied by neurobiologists in recent years. Figure 15.10 shows a representation of the ocular dominance columns in LeVay's reconstruction [205]. It is interesting to compare these stripes with the ones found in our simple experiment with the Kohonen network.

What dimension for the network? In many cases we have experimental data which is coded using n real values, but whose effective dimension is much lower. Example: points on the surface of a sphere in three-dimensional space. The input vectors have three components, but a two-dimensional Kohonen network will do a better job of charting this input space.

Application: function approximation. Apply a planar grid to a surface P = {(x, y, f(x, y)) : x, y in [0, 1]}. After the learning algorithm is started, the planar network moves in the direction of P and distributes itself to cover the domain.

Application: function approximation. Example: f(θ, dθ/dt) = α sin θ + β dθ/dt, where θ is the angle measured from the vertical. The network is a kind of look-up table of the values of f. The table can be made as sparse or as dense as needed.

Nearest Neighbour Classification: Supervised method. Assign to a new input the class of the nearest input in the training set. Distances: Euclidean, Mahalanobis. (Figure 14.1: In nearest neighbour classification a new vector is assigned the label of the nearest vector in the training set. Here there are three classes, with training points given by the circles, along with their class. The dots indicate the class of the nearest training vector. The decision boundary is piecewise linear, with each segment corresponding to the perpendicular bisector between two datapoints belonging to different classes, giving rise to a Voronoi tessellation of the input space.) Algorithm 14.1 (nearest neighbour): to classify a vector x, given train data D = {(x_n, c_n), n = 1, ..., N}: 1. Calculate the dissimilarity of the test point x to each of the train points, d_n = d(x, x_n), n = 1, ..., N. 2. Find the training point x_{n*} nearest to x and assign x its class label c_{n*}.
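A minimal sketch of this algorithm in plain MATLAB with the Euclidean distance (the toy data and variable names are invented for the example; it relies on implicit expansion, R2016b or later).

    % Nearest neighbour classification with the Euclidean distance (a sketch).
    Xtrain = [0 0; 1 0; 0 1; 5 5; 6 5; 5 6];  % training inputs, one row per point
    ctrain = [1; 1; 1; 2; 2; 2];              % class labels c_n
    xtest  = [0.4 0.2];                       % new input x to classify

    d = sqrt(sum((Xtrain - xtest).^2, 2));    % d_n = d(x, x_n) for every training point
    [~, nStar] = min(d);                      % index of the nearest training point
    cpred = ctrain(nStar);                    % assign the class of the nearest neighbour
    fprintf('predicted class: %d\n', cpred);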

Nearest Neighbor Classification: The entire dataset must be stored. Distance calculation may be expensive. How to deal with missing data? How to incorporate prior knowledge?

K Nearest Neighbors: A more robust classifier. Consider the hypersphere centered on the test point that contains the k nearest train inputs, and take a majority vote among them. How to choose k? Cross validation.
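The previous sketch extends to k neighbours with a majority vote; again an illustrative sketch with the same toy data, and k chosen arbitrarily rather than by cross validation.

    % k nearest neighbours with a majority vote among the k closest points (a sketch).
    Xtrain = [0 0; 1 0; 0 1; 5 5; 6 5; 5 6];  % training inputs, one row per point
    ctrain = [1; 1; 1; 2; 2; 2];              % class labels
    xtest  = [0.4 0.2];                       % test point
    k = 3;

    d = sqrt(sum((Xtrain - xtest).^2, 2));    % distances to all training points
    [~, order] = sort(d);                     % sort training points by distance
    cpred = mode(ctrain(order(1:k)));         % majority class among the k nearest
    fprintf('k-NN predicted class: %d\n', cpred);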

Mixture models: A mixture model is one in which a set of component models is combined to produce a richer model: p(v) = Σ_{h=1}^{H} p(v|h) p(h). (Figure: example component and mixture densities, panels (a) and (b).)
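A minimal sketch of evaluating such a mixture density, here a two-component one-dimensional Gaussian mixture whose parameters are arbitrary values chosen for the example.

    % Mixture density p(v) = sum_h p(v|h) p(h) for a 1-D Gaussian mixture (a sketch).
    mu    = [-2  3];                 % component means
    sig   = [ 1  0.5];               % component standard deviations
    prior = [0.3 0.7];               % mixing weights p(h), summing to 1

    v = linspace(-10, 10, 500);
    p = zeros(size(v));
    for h = 1:numel(prior)
        % p(v|h): Gaussian density of component h
        pvh = exp(-(v - mu(h)).^2 / (2 * sig(h)^2)) / (sqrt(2*pi) * sig(h));
        p = p + prior(h) * pvh;      % accumulate p(v|h) p(h)
    end
    plot(v, p);                      % the resulting mixture density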

K-means clustering: Partition n observations into k clusters so that each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster.
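A minimal sketch of the standard alternating k-means iteration in plain MATLAB (the toy data, k, and the iteration count are assumptions; the Statistics and Machine Learning Toolbox also provides a ready-made kmeans function).

    % k-means clustering by alternating assignment and mean-update steps (a sketch).
    X = [randn(50, 2); randn(50, 2) + 5];      % toy data: two separated blobs
    k = 2;
    mu = X(randperm(size(X, 1), k), :);        % initialize means with k random points
    for it = 1:20
        % assignment step: each point goes to the cluster with the nearest mean
        d = zeros(size(X, 1), k);
        for j = 1:k
            d(:, j) = sum((X - mu(j, :)).^2, 2);
        end
        [~, idx] = min(d, [], 2);
        % update step: each mean becomes the average of its assigned points
        for j = 1:k
            if any(idx == j)
                mu(j, :) = mean(X(idx == j, :), 1);
            end
        end
    end
    disp(mu);                                  % the learned cluster prototypes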

Graphical models