Scuola di Calcolo Scientifico con MATLAB (SCSM) 2017 Palermo 31 Luglio - 4 Agosto 2017


1 Scuola di Calcolo Scientifico con MATLAB (SCSM) 2017, Palermo, 31 Luglio - 4 Agosto 2017. Ing. Giuseppe La Tona

2 Summary: Machine Learning definition, Machine Learning problems, Artificial Neural Networks (ANN), Nearest Neighbor classification, Mixture Models and k-means, Graphical Models

3 Machine Learning. "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E." (Tom M. Mitchell)

4 Example. [Figure: histograms of salmon and sea bass counts as a function of length, with a decision threshold l* on the length axis]

5 Example. [Figure: salmon and sea bass samples plotted by lightness and width]

6 Machine Learning Sub-Problems: Overfitting, Noise, Feature Extraction, Model Selection, Prior Knowledge, Missing Features. [Figure: salmon and sea bass in the lightness-width space, with a query point marked "?"]

7 Styles of Machine Learning: Supervised Learning, Unsupervised Learning, Anomaly detection, On-line learning, Semi-supervised learning

8 Supervised Learning. Given a set of data D = {(x^n, y^n), n = 1, ..., N}, the task is to learn the relationship between the input x and output y such that, when given a novel input x*, the predicted output y* is accurate. The pair (x*, y*) is not in D but is assumed to be generated by the same unknown process that generated D. To specify explicitly what accuracy means, one defines a loss function L(y_pred, y_true) or, conversely, a utility function U = -L.
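
As a concrete illustration of a loss function (an assumed example, not from the slides), here is how a squared loss and a zero-one loss could be evaluated in MATLAB on a single predicted/true pair:

  % Minimal sketch (illustrative values): two common loss functions.
  y_true = 1.0;                                 % true output y*
  y_pred = 0.8;                                 % predicted output for the novel input x*
  sq_loss = (y_pred - y_true)^2;                % squared loss, common in regression
  zo_loss = double(round(y_pred) ~= y_true);    % zero-one loss, common in classification
  fprintf('squared: %.3f, zero-one: %d\n', sq_loss, zo_loss);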

9 Supervised Learning Example: A father decides to teach his young son what a sports car is. Finding it difficult to explain in words, he decides to give some examples. They stand on a motorway bridge and, as each car passes underneath, the father cries out "that's a sports car!" when a sports car passes by. After ten minutes, the father asks his son if he's understood what a sports car is. The son says, "sure, it's easy". An old red VW Beetle passes by, and the son shouts "that's a sports car!". Dejected, the father asks "why do you say that?". "Because all sports cars are red!", replies the son.

10 Unsupervised Learning. Given a set of data D = {x^n, n = 1, ..., N}, in unsupervised learning we aim to find a plausible compact description of the data. An objective is used to quantify the accuracy of the description. In unsupervised learning there is no special prediction variable, so that, from a probabilistic perspective, we are interested in modelling the distribution p(x). The likelihood of the model generating the data is a popular measure of the accuracy of the description.

11 Unsupervised Learning

12 Other Types of Learning. Anomaly Detection: detecting anomalous events in industrial processes (plant monitoring), engine monitoring and unexpected buying behaviour patterns in customers all fall under the area of anomaly detection. Online Learning (supervised and unsupervised): in online learning data arrives sequentially and we continually update our model as new data becomes available. Semi-supervised learning

13 Machine Learning Problems: Classification, Regression, Clustering, Density Estimation, Dimensionality Reduction

14 Exercise. A blog platform needs an automatic tagging service that, from the text of a blog article, recommends a list of tags. How would you proceed? Which questions should you ask first?

15 Machine Learning Steps

16 Datasets: Training set, Validation set, Test set

17 Artificial Neural Networks. Neuron (network node): inputs x_1, ..., x_n arrive with weights w_1, ..., w_n, and the node outputs f(w_1 x_1 + w_2 x_2 + ... + w_n x_n). Black-box representation: the network as a whole is a function F mapping inputs x_1, ..., x_n to outputs y_1, ..., y_m.

18 Artificial Neural Networks. General network node: the node outputs f(g(x_1, x_2, ..., x_n)), where g aggregates the inputs and f is the activation function. Binary threshold function: f(z) = 1 if z ≥ θ, and f(z) = 0 otherwise.
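
Such a node is easy to sketch in a few lines of base MATLAB; the inputs, weights and threshold below are illustrative assumptions:

  % Minimal sketch: a network node with weighted-sum aggregation g
  % and a binary threshold activation f.
  x = [0.5; 1.0; -0.2];        % inputs x_1..x_n
  w = [0.8; 0.4;  1.5];        % weights w_1..w_n
  theta = 0.6;                 % threshold
  g = w' * x;                  % aggregation: w_1*x_1 + ... + w_n*x_n
  y = double(g >= theta);      % f: output 1 if g >= theta, else 0
  fprintf('g = %.2f, y = %d\n', g, y);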

19 Artificial Neural Networks. Input space separation: a unit with a binary threshold function divides its input space with a linear boundary. [Figure: the OR and AND Boolean functions realized by linear separations of the input space]

20 Feed-Forward ANN: n input sites, k hidden units and m output units. Connection matrix W_1 links the input layer to the hidden layer and connection matrix W_2 links the hidden layer to the output layer; each layer also receives a constant-1 bias site (site n+1 with weights w_{n+1,k}, and site k+1 with weights w_{k+1,m}).
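
A forward pass through such a network is a pair of matrix products. The base-MATLAB sketch below uses illustrative sizes and a sigmoid as a smooth stand-in for the threshold function, to show the role of the two connection matrices and of the constant-1 bias sites:

  % Minimal sketch: forward pass with n inputs, k hidden units, m outputs.
  n = 3; k = 4; m = 2;
  x  = rand(n, 1);                     % input vector
  W1 = randn(k, n + 1);                % connection matrix W_1 (last column: bias site n+1)
  W2 = randn(m, k + 1);                % connection matrix W_2 (last column: bias site k+1)
  sigmoid = @(z) 1 ./ (1 + exp(-z));   % activation function
  h = sigmoid(W1 * [x; 1]);            % hidden unit activations
  y = sigmoid(W2 * [h; 1]);            % network outputs y_1..y_m
  disp(y');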

21 Recurrent ANN

22 Recurrent ANN. Dealing with time series: meteorological forecast, energy consumption, order request forecast, traffic forecast, financial market forecast.

23 Nonlinear Autoregressive Exogenous model (NARX): the next value of a time series is predicted from its own past values and from an exogenous input, e.g. temperature or hour of day.
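
Assuming MATLAB's Neural Network Toolbox (now Deep Learning Toolbox) is available, a NARX network can be sketched as follows; the two series are illustrative placeholders for a target such as energy consumption and an exogenous input such as temperature:

  % Sketch, assuming the Neural Network / Deep Learning Toolbox: open-loop NARX.
  X = num2cell(rand(1, 100));            % exogenous input series (illustrative)
  T = num2cell(sin(0.1 * (1:100)));      % target series (illustrative)
  net = narxnet(1:2, 1:2, 10);           % 2 input delays, 2 feedback delays, 10 hidden units
  [Xs, Xi, Ai, Ts] = preparets(net, X, {}, T);   % shift the series to fill the delay lines
  net = train(net, Xs, Ts, Xi, Ai);
  Y = net(Xs, Xi, Ai);                   % one-step-ahead predictions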

24 Self Organizing Maps: nature-inspired. Autonomous units organize to adapt to an input space, and the organization maintains the topology.

25 Kohonen's model: multi-dimensional lattices of computing units. Each unit has an associated weight w, also called a prototype vector; w has the dimension of the input space. Each unit has lateral connections to several neighbors.

26 Kohonen's model: we have a training set D of vectors sampled from the input space. The network learns to adapt to the input space by updating the weights of its computing units.

27 Learning algorithm. Consider an n-dimensional input space; a one-dimensional SOM is a chain of computing units. When an input x is received, each unit m_i computes the Euclidean distance between x and its weight w_i. The unit k with the smallest distance (highest excitation) is selected (fires).

28 Learning algorithm. The neighbors of k are also updated. We define a neighborhood function φ(i,k), e.g. φ(i,k) = 1 if d(i,k) < r and φ(i,k) = 0 otherwise. [Figure: a chain of units w_1, ..., w_{m-1}, w_m with the neighborhood of unit 2 highlighted, and an input x]

29 Learning algorithm. Init: a learning constant η and a neighborhood function φ are selected; the m weight vectors are initialized randomly. Step 1: select an input vector ξ using the desired probability distribution over the input space. Step 2: the unit k with the maximum excitation is selected (that is, the one for which the distance between w_i and ξ is minimal, i = 1,...,m). Step 3: the weight vectors are updated using the neighborhood function and the update rule w_i ← w_i + η φ(i,k) (ξ − w_i), for i = 1,...,m. Step 4: stop if the maximum number of iterations has been reached; otherwise modify η and φ as scheduled and continue with step 1.
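
The loop below is a minimal base-MATLAB sketch of these steps for a one-dimensional SOM charting the unit square; the sizes, schedules and data are illustrative assumptions (the implicit array expansion requires MATLAB R2016b or later):

  % Minimal sketch: 1-D SOM (chain of m units) adapting to 2-D inputs.
  m = 20; n = 2;
  W = rand(m, n);                    % random init of the m weight vectors
  eta = 0.5; r = 3;                  % learning constant and neighborhood radius
  for t = 1:2000
      xi = rand(1, n);                          % step 1: sample an input xi
      [~, k] = min(sum((W - xi).^2, 2));        % step 2: winning unit k (min distance)
      phi = double(abs((1:m)' - k) <= r);       % neighborhood function phi(i,k)
      W = W + eta * phi .* (xi - W);            % step 3: w_i <- w_i + eta*phi*(xi - w_i)
      eta = 0.999 * eta; r = 0.999 * r;         % step 4: shrink the schedules, repeat
  end
  plot(W(:,1), W(:,2), '-o');        % the chain spreads over the square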

30 Learning algorithm. Each step attracts the weight of the excited unit toward the input. Repeating this process, we expect to arrive at a uniform distribution of weight vectors in input space (if the inputs have also been uniformly selected).

31 Effect on neighbors. The radius of the neighborhood is reduced according to a schedule. Each time a unit is updated, neighboring units are also updated: if the weight vector of a unit is attracted to a region in input space, the neighbors are also attracted, but to a lesser degree. During the learning process both the size of the neighborhood and the value of φ fall gradually, so that the influence of each unit upon its neighbors is reduced.

32 Schedule and learning constant. The learning constant controls the magnitude of the weight updates and is reduced gradually. The net effect of the selected schedule is to produce larger corrections at the beginning of training than at the end.

33 Linear SOM example The weight vectors reach a distribution which transforms each unit into a representative of a small region of input space. The unit in the lower corner responds with the largest excitation to vectors in the shaded region.

34 Bi-dimensional networks. [Fig.: planar network with a knot] Several proofs of convergence have been given for one-dimensional Kohonen networks in one-dimensional domains. There is no general proof of convergence for multidimensional networks. Mapping high-dimensional spaces: usually, when an empirical data set is selected, we do not know its real dimension. Even if the input vectors are of dimension n, it could be that the data concentrates on a manifold of lower dimension. In general it is not obvious which network dimension should be used for a given data set. This general problem led Kohonen to consider what happens when a low-dimensional network is used to map a higher-dimensional space. In this case the network must fold in order to fill the available space. Figure 15.9 shows, in the middle, the result of an experiment in which a two-dimensional network was used to chart a three-dimensional box. As can be seen, the network extends in the x and y dimensions and folds in the z direction. The units in the network try as hard as they can to map alternately to one side or the other of input space (for the z dimension). Animation: https://

35 Mapping high-dimensional spaces. How a network of dimension n adapts to an input space of higher dimension: it must fold to fill the space. [Fig.: two-dimensional map of a three-dimensional region] A commonly cited example for this kind of structure in the human brain is the visual cortex. The brain actually processes not one but two visual images, one displaced with respect to the other. In this case the input domain consists of two planar regions (the two sides of the box of Figure 15.9). The planar cortex must fold in the same way in order to respond optimally to input from one or the other side of the input domain. The result is the appearance of the stripes of ocular dominance studied by neurobiologists in recent years. The figure shows a representation of the ocular dominance columns in LeVay's reconstruction [205]. It is interesting to compare these stripes with the ones found in our simple experiment with the Kohonen network.

36 What dimension for the network? In many cases we have experimental data which is coded using n real values, but whose effective dimension is much lower. Example: points on the surface of a sphere in three-dimensional space. The input vectors have three components, but a two-dimensional Kohonen network will do a better job of charting this input space.

37 Application: function approximation. Apply a planar grid to the surface P = {(x, y, f(x,y)) : x, y in [0,1]}. After the learning algorithm is started, the planar network moves in the direction of P and distributes itself to cover the domain.

38 Application: function approximation. [Figure: a network charting the example function f(θ) = α sin θ + β dθ/dt] The network is a kind of look-up table of the values of f. The table can be made as sparse or as dense as needed.

39 Nearest Neighbour Classification. Supervised method: assign to a new input the class of the nearest input in the training set. Distances: Euclidean, Mahalanobis. [Figure 14.1: In nearest neighbour classification a new vector is assigned the label of the nearest vector in the training set. Here there are three classes, with training points given by the circles, along with their class. The dots indicate the class of the nearest training vector. The decision boundary is piecewise linear, with each segment corresponding to the perpendicular bisector between two datapoints belonging to different classes, giving rise to a Voronoi tessellation of the input space.] Algorithm 14.1 (nearest neighbour algorithm) to classify a vector x, given train data D = {(x^n, c^n), n = 1,...,N}: 1: calculate the dissimilarity of the test point x to each of the train points, d_n = d(x, x^n), n = 1,...,N; 2: assign x the class label of the train point with the smallest dissimilarity.

40 Nearest Neighbor Classification: the entire dataset must be stored, and distance calculation may be expensive. How to deal with missing data? How to incorporate prior knowledge?

41 K Nearest Neighbors: a more robust classifier. Consider the hypersphere centered on the test point that contains the k nearest train inputs, and assign the majority class among them. How to choose k? Cross validation.
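
A compact base-MATLAB sketch of k-nearest-neighbour classification on illustrative data (with the Statistics and Machine Learning Toolbox, fitcknn provides the same functionality):

  % Minimal sketch: k-NN with Euclidean distance and majority vote.
  Xtrain = [randn(20,2); randn(20,2) + 3];   % two illustrative training clusters
  ctrain = [ones(20,1); 2*ones(20,1)];       % class labels 1 and 2
  x = [2.5 2.5];                             % test point
  k = 5;
  d = sqrt(sum((Xtrain - x).^2, 2));         % dissimilarities d_n = d(x, x^n)
  [~, idx] = sort(d);                        % order train points by distance
  c = mode(ctrain(idx(1:k)));                % majority class among the k nearest
  fprintf('predicted class: %d\n', c);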

42 Mixture models. A mixture model is one in which a set of component models is combined to produce a richer model: p(v) = Σ_{h=1}^{H} p(v|h) p(h).
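
To make the formula concrete, the base-MATLAB example below (illustrative parameters) evaluates the density of a two-component Gaussian mixture:

  % Minimal sketch: p(v) = sum_h p(v|h) p(h) for a 1-D Gaussian mixture, H = 2.
  ph    = [0.3 0.7];                 % mixture weights p(h)
  mu    = [-1  2];                   % component means
  sigma = [0.5 1];                   % component standard deviations
  v  = linspace(-4, 6, 200);
  pv = zeros(size(v));
  for h = 1:2                        % combine the component models
      pv = pv + ph(h) * exp(-(v - mu(h)).^2 / (2*sigma(h)^2)) / (sqrt(2*pi)*sigma(h));
  end
  plot(v, pv);                       % a richer, bimodal density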

43 K-means clustering Partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster.
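
A minimal base-MATLAB sketch of the standard alternating iteration (Lloyd's algorithm) on illustrative data; the toolbox function kmeans does the same with better initialization:

  % Minimal sketch: k-means via alternating assignment and update steps.
  X = [randn(30,2); randn(30,2) + 4];   % n = 60 observations in 2-D
  k = 2;
  M = X(randperm(size(X,1), k), :);     % initialize the k means from data points
  for it = 1:50
      D = zeros(size(X,1), k);          % assignment step: nearest mean
      for j = 1:k
          D(:,j) = sum((X - M(j,:)).^2, 2);
      end
      [~, a] = min(D, [], 2);
      for j = 1:k                       % update step: means become cluster prototypes
          if any(a == j)                % guard against empty clusters
              M(j,:) = mean(X(a == j, :), 1);
          end
      end
  end
  disp(M);                              % the k cluster prototypes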

44 Graphical models
