Machine Learning. Nathalie Villa-Vialaneix - Formation INRA, Niveau 3
|
|
- Joshua Robbins
- 6 years ago
- Views:
Transcription
1 Machine Learning Nathalie Villa-Vialaneix - nathalie.villa@univ-paris1.fr IUT STID (Carcassonne) & SAMM (Université Paris 1) Formation INRA, Niveau 3 Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 1 / 37
2 Outline 1 Introduction Background and notations Undertting / Overtting Errors Use case 2 Neural networks Overview Multilayer perceptron Learning/Tuning 3 CART Introduction Learning Prediction 4 Random forest Overview Bootstrap/Bagging Random forest Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 2 / 37
3 Introduction Outline 1 Introduction Background and notations Undertting / Overtting Errors Use case 2 Neural networks Overview Multilayer perceptron Learning/Tuning 3 CART Introduction Learning Prediction 4 Random forest Overview Bootstrap/Bagging Random forest Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 3 / 37
4 Introduction Background and notations Background Purpose: predict Y from X ; Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 4 / 37
5 Introduction Background and notations Background Purpose: predict Y from X ; What we have: n observations of (X, Y ), (x 1, y 1 ),..., (x n, y n ); Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 4 / 37
6 Introduction Background and notations Background Purpose: predict Y from X ; What we have: n observations of (X, Y ), (x 1, y 1 ),..., (x n, y n ); What we want: estimate unknown Y from new X : x n+1,..., x m. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 4 / 37
7 Introduction Background and notations Background Purpose: predict Y from X ; What we have: n observations of (X, Y ), (x 1, y 1 ),..., (x n, y n ); What we want: estimate unknown Y from new X : x n+1,..., x m. X can be: numeric variables; or factors; or a combination of numeric variables and factors. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 4 / 37
8 Introduction Background and notations Background Purpose: predict Y from X ; What we have: n observations of (X, Y ), (x 1, y 1 ),..., (x n, y n ); What we want: estimate unknown Y from new X : x n+1,..., x m. X can be: numeric variables; or factors; or a combination of numeric variables and factors. Y can be: a numeric variable (Y R) (supervised) regression régression; a factor (supervised) classication discrimination. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 4 / 37
9 Introduction Undertting / Overtting Basics From (x i, y i ) i, denition of a machine, Φ n s.t.: ŷ new = Φ n (x new ). Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 5 / 37
10 Introduction Undertting / Overtting Basics From (x i, y i ) i, denition of a machine, Φ n s.t.: ŷ new = Φ n (x new ). if Y is numeric, Φ n is called a regression function fonction de classication; if Y is a factor, Φ n is called a classier classieur; Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 5 / 37
11 Introduction Undertting / Overtting Basics From (x i, y i ) i, denition of a machine, Φ n s.t.: ŷ new = Φ n (x new ). if Y is numeric, Φ n is called a regression function fonction de classication; if Y is a factor, Φ n is called a classier classieur; Φ n is said to be trained or learned from the observations (x i, y i ) i. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 5 / 37
12 Introduction Undertting / Overtting Basics From (x i, y i ) i, denition of a machine, Φ n s.t.: ŷ new = Φ n (x new ). if Y is numeric, Φ n is called a regression function fonction de classication; if Y is a factor, Φ n is called a classier classieur; Φ n is said to be trained or learned from the observations (x i, y i ) i. Desirable properties accuracy to the observations: predictions made on known data are close to observed values; Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 5 / 37
13 Introduction Undertting / Overtting Basics From (x i, y i ) i, denition of a machine, Φ n s.t.: ŷ new = Φ n (x new ). if Y is numeric, Φ n is called a regression function fonction de classication; if Y is a factor, Φ n is called a classier classieur; Φ n is said to be trained or learned from the observations (x i, y i ) i. Desirable properties accuracy to the observations: predictions made on known data are close to observed values; generalization ability: predictions made on new data are also accurate. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 5 / 37
14 Introduction Undertting / Overtting Basics From (x i, y i ) i, denition of a machine, Φ n s.t.: ŷ new = Φ n (x new ). if Y is numeric, Φ n is called a regression function fonction de classication; if Y is a factor, Φ n is called a classier classieur; Φ n is said to be trained or learned from the observations (x i, y i ) i. Desirable properties accuracy to the observations: predictions made on known data are close to observed values; generalization ability: predictions made on new data are also accurate. Conicting objectives!! Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 5 / 37
15 Introduction Undertting / Overtting Undertting/Overtting sous/sur - apprentissage Function x y to be estimated Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 6 / 37
16 Introduction Undertting / Overtting Undertting/Overtting sous/sur - apprentissage Observations we might have Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 6 / 37
17 Introduction Undertting / Overtting Undertting/Overtting sous/sur - apprentissage Observations we do have Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 6 / 37
18 Introduction Undertting / Overtting Undertting/Overtting sous/sur - apprentissage First estimation from the observations: undertting Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 6 / 37
19 Introduction Undertting / Overtting Undertting/Overtting sous/sur - apprentissage Second estimation from the observations: accurate estimation Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 6 / 37
20 Introduction Undertting / Overtting Undertting/Overtting sous/sur - apprentissage Third estimation from the observations: overtting Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 6 / 37
21 Introduction Undertting / Overtting Undertting/Overtting sous/sur - apprentissage Summary Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 6 / 37
22 Introduction Errors Errors training error (measures the accuracy to the observations) Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 7 / 37
23 Introduction Errors Errors training error (measures the accuracy to the observations) if y is a factor: misclassication rate {ŷ i y i, i = 1,..., n} n Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 7 / 37
24 Introduction Errors Errors training error (measures the accuracy to the observations) if y is a factor: misclassication rate {ŷ i y i, i = 1,..., n} if y is numeric: mean square error (MSE) n 1 n n (ŷ i y i ) 2 i=1 Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 7 / 37
25 Introduction Errors Errors training error (measures the accuracy to the observations) if y is a factor: misclassication rate {ŷ i y i, i = 1,..., n} if y is numeric: mean square error (MSE) n 1 n n (ŷ i y i ) 2 i=1 or root mean square error (RMSE) or pseudo-r 2 : 1 MSE/Var((y i ) i ) Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 7 / 37
26 Introduction Errors Errors training error (measures the accuracy to the observations) if y is a factor: misclassication rate {ŷ i y i, i = 1,..., n} if y is numeric: mean square error (MSE) n 1 n n (ŷ i y i ) 2 i=1 or root mean square error (RMSE) or pseudo-r 2 : 1 MSE/Var((y i ) i ) test error: a way to prevent overtting (estimates the generalization error) Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 7 / 37
27 Introduction Errors Errors training error (measures the accuracy to the observations) if y is a factor: misclassication rate {ŷ i y i, i = 1,..., n} if y is numeric: mean square error (MSE) n 1 n n (ŷ i y i ) 2 i=1 or root mean square error (RMSE) or pseudo-r 2 : 1 MSE/Var((y i ) i ) test error: a way to prevent overtting (estimates the generalization error) 1 split the data into training/test sets (usually 80%/20%) 2 train Φ n from the training dataset 3 calculate the test error from the remaining data simple validation Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 7 / 37
28 Introduction Errors Example Observations Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 8 / 37
29 Introduction Errors Example Training/Test datasets Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 8 / 37
30 Introduction Errors Example Training/Test errors Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 8 / 37
31 Introduction Errors Example Summary Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 8 / 37
32 Introduction Errors Linear vs Nonparametric Linear methods: Y = β T X + ɛ (a priori on the type of link between X and Y ) Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 9 / 37
33 Introduction Errors Linear vs Nonparametric Linear methods: Y = β T X + ɛ (a priori on the type of link between X and Y ) Here: nonparametric methods: where Φ is totally unknown. Y = Φ(X ) + ɛ Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 9 / 37
34 Introduction Errors Linear vs Nonparametric Linear methods: Y = β T X + ɛ (a priori on the type of link between X and Y ) Here: nonparametric methods: Y = Φ(X ) + ɛ where Φ is totally unknown. (ML) Objective: Build Φ n from the observations such that its generalization error ELΦ n is (asymptotically) optimal. Example (regression framework) whatever (X, Y ) distribution. ELΦ n := E [ (Φ n (X ) Y ) 2] n + inf Φ ELΦ Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 9 / 37
35 Introduction Use case Use case description Data kindly provided by Laurence Liaubet described in [Liaubet et al., 2011]: microarray data: expression of 272 selected genes over 56 individuals (pigs); a phenotype of interest (muscle ph) measured over the 56 invididuals (numerical variable). le 1: genes expressions le 2: muscle ph Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 10 / 37
36 Neural networks Outline 1 Introduction Background and notations Undertting / Overtting Errors Use case 2 Neural networks Overview Multilayer perceptron Learning/Tuning 3 CART Introduction Learning Prediction 4 Random forest Overview Bootstrap/Bagging Random forest Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 11 / 37
37 Neural networks Overview Basics [Bishop, 1995] Common properties (articial) Neural networks: general name for supervised and unsupervised methods developed in analogy to the brain; Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 12 / 37
38 Neural networks Overview Basics [Bishop, 1995] Common properties (articial) Neural networks: general name for supervised and unsupervised methods developed in analogy to the brain; combination (network) of simple elements (neurons). Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 12 / 37
39 Neural networks Overview Basics [Bishop, 1995] Common properties (articial) Neural networks: general name for supervised and unsupervised methods developed in analogy to the brain; combination (network) of simple elements (neurons). Example of graphical representation: INPUTS OUTPUTS Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 12 / 37
40 Neural networks Overview Main features A neural network is dened by: 1 the network structure; 2 the neuron type. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 13 / 37
41 Neural networks Overview Main features A neural network is dened by: 1 the network structure; 2 the neuron type. Standard examples Multilayer perceptrons (MLP) Perceptron multi-couches: dedicated to supervised problems (classication and regression); Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 13 / 37
42 Neural networks Overview Main features A neural network is dened by: 1 the network structure; 2 the neuron type. Standard examples Multilayer perceptrons (MLP) Perceptron multi-couches: dedicated to supervised problems (classication and regression); Radial basis function networks (RBF): same purpose but based on local smoothing; Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 13 / 37
43 Neural networks Overview Main features A neural network is dened by: 1 the network structure; 2 the neuron type. Standard examples Multilayer perceptrons (MLP) Perceptron multi-couches: dedicated to supervised problems (classication and regression); Radial basis function networks (RBF): same purpose but based on local smoothing; Self-organizing maps (SOM): dedicated to unsupervised problems (clustering), self-organized;... Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 13 / 37
44 Neural networks Overview MLP: Advantages/Drawbacks Advantages classication OR regression (i.e., Y can be a numeric variable or a factor); non parametric method: no prior assumption needed; accurate (universal approximation). Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 14 / 37
45 Neural networks Overview MLP: Advantages/Drawbacks Advantages classication OR regression (i.e., Y can be a numeric variable or a factor); non parametric method: no prior assumption needed; accurate (universal approximation). Drawbacks hard to train (high computational cost, especially when d is large); overt easily; black box models (hard to interpret) Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 14 / 37
46 Neural networks Multilayer perceptron Analogy to the brain Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 15 / 37
47 Neural networks Multilayer perceptron Analogy to the brain Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 15 / 37
48 Neural networks Multilayer perceptron Analogy to the brain Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 15 / 37
49 Neural networks Multilayer perceptron Analogy to the brain If > activation threshold then Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 15 / 37
50 Neural networks Multilayer perceptron Analogy to the brain If < activation threshold then Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 15 / 37
51 Neural networks Multilayer perceptron (articial) Perceptron Layers MLP have one input layer (X values), one output layer (Y values) and several hidden layers (only 1 is necessary); no connections within a layer; connections between two consecutive layers (feedforward). Example: INPUTS genes expressions Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 16 / 37
52 Neural networks Multilayer perceptron (articial) Perceptron Layers MLP have one input layer (X values), one output layer (Y values) and several hidden layers (only 1 is necessary); no connections within a layer; connections between two consecutive layers (feedforward). Example: INPUTS Weighted links genes expressions Layer 1 Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 16 / 37
53 Neural networks Multilayer perceptron (articial) Perceptron Layers MLP have one input layer (X values), one output layer (Y values) and several hidden layers (only 1 is necessary); no connections within a layer; connections between two consecutive layers (feedforward). Example: INPUTS genes expressions Layer 1 Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 16 / 37
54 Neural networks Multilayer perceptron (articial) Perceptron Layers MLP have one input layer (X values), one output layer (Y values) and several hidden layers (only 1 is necessary); no connections within a layer; connections between two consecutive layers (feedforward). Example: INPUTS genes expressions Layer 1 Layer 2 Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 16 / 37
55 Neural networks Multilayer perceptron (articial) Perceptron Layers MLP have one input layer (X values), one output layer (Y values) and several hidden layers (only 1 is necessary); no connections within a layer; connections between two consecutive layers (feedforward). Example: 2 hidden layers MLP INPUTS OUTPUTS genes expressions Layer 1 Layer 2 ph Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 16 / 37
56 Neural networks Multilayer perceptron A neuron v 3 w 3 v 2 v 1 w 2 w 1 + w 0 (Bias Biais) Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 17 / 37
57 Neural networks Multilayer perceptron A neuron v 3 w 3 w 2 v 2 w 2 + v 1 w 1 w 0 (Bias Biais) w 1 Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 17 / 37
58 Neural networks Multilayer perceptron A neuron v 3 w 3 w 2 v 2 w 2 + v 1 w 1 w 0 (Bias Biais) w 1 Standard activation functions fonctions de lien / d'activation Biologically inspired: Heaviside function Seuil { 0 if t < threshold; S(t) = 1 if not. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 17 / 37
59 Neural networks Multilayer perceptron A neuron v 3 w 3 w 2 v 2 w 2 + v 1 w 1 w 0 (Bias Biais) w 1 Standard activation functions Main issue with the Heaviside function: not continuous! Logistic function S(t) = 1 1+e t Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 17 / 37
60 Neural networks Multilayer perceptron Summary If Y is numeric, linear output: p x R d, F w (x) = j=1 w (2) S j ( d k=1 w (1) x k + w (0) kj j ). w (1) neuron 1 INPUTS x 1 x 2... x d w (1) dp No analytical expression!! +w (0) p neuron p w (2) 1 w (2) p F w (x) OUTPUTS Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 18 / 37
61 Neural networks Multilayer perceptron Summary If Y is a factor, logistic output: p x R d, P(X = C x = x) F w (x) = S j=1 w (2) S j ( d k=1 (with a maximum probability rule for the nal classication) ) w (1) x k + w (0) kj j w (1) neuron 1 INPUTS x 1 x 2... x d w (1) dp +w (0) p neuron p w (2) 1 w (2) p F w (x) OUTPUTS No analytical expression!! Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 18 / 37
62 Neural networks Multilayer perceptron Universal approximation [Hornik et al., 1989] (among others) For any given Φ, smooth enough and any precision ɛ, there exists a one-hidden layer perceptron (with sigmoïd activation functions) that approximates Φ with a precision at most ɛ. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 19 / 37
63 Neural networks Learning/Tuning Learning weights p is given Chose w s.t.: (MSE minimization) w n = arg min w 1 n i=1 n (y i F w (x i )) 2. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 20 / 37
64 Neural networks Learning/Tuning Learning weights p is given Chose w s.t.: (MSE minimization) Main issues: w n = arg min w 1 n i=1 n (y i F w (x i )) 2. 1 no exact solution ( approximation algorithms, e.g., Newton's method + backpropagation principle): local minima; Erreur Poids Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 20 / 37
65 Neural networks Learning/Tuning Learning weights p is given Chose w s.t.: (MSE minimization) Main issues: w n = arg min w 1 n i=1 n (y i F w (x i )) 2. 1 no exact solution ( approximation algorithms, e.g., Newton's method + backpropagation principle): local minima; 2 overtting: the larger p is, the more exible the perceptron is and the more it can overt the data. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 20 / 37
66 Neural networks Learning/Tuning Overtting Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 21 / 37
67 Neural networks Learning/Tuning Overtting Weight decay can help improve the generalization ability: w n 1 n = arg min (y i F w (x i )) 2 +λ w 2 w n i=1 Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 21 / 37
68 Neural networks Learning/Tuning Tuning p and λ p and λ are called hyperparameters hyper-paramètres (not learned from the data but tuned). Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 22 / 37
69 Neural networks Learning/Tuning Tuning p and λ p and λ are called hyperparameters hyper-paramètres (not learned from the data but tuned). grid search using a simple validation; Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 22 / 37
70 Neural networks Learning/Tuning Tuning p and λ p and λ are called hyperparameters hyper-paramètres (not learned from the data but tuned). grid search using a simple validation; grid search using a (K -fold) cross validation (better but computationally expensive when K is large). Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 22 / 37
71 Neural networks Learning/Tuning Cross validation Algorithm 1: Set the grid search for p, G p, and λ, G λ 2: Split the data into K groups 3: for p G p and λ G λ do 4: for group = 1..K do 5: Train model p,λ,group without observations in group 6: Test error, MSE p,λ,group for observations in group 7: end for 8: Average MSE p,λ,group over group MSE p,λ 9: end for 10: Select p and λ with minimum MSE p,λ Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 23 / 37
72 Neural networks Learning/Tuning Cross validation Algorithm 1: Set the grid search for p, G p, and λ, G λ 2: Split the data into K groups 3: for p G p and λ G λ do 4: for group = 1..K do 5: Train model p,λ,group without observations in group 6: Test error, MSE p,λ,group for observations in group 7: end for 8: Average MSE p,λ,group over group MSE p,λ 9: end for 10: Select p and λ with minimum MSE p,λ n-fold cross validation is called Leave-One-Out (LOO). Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 23 / 37
73 Neural networks Learning/Tuning Cross validation Algorithm 1: Set the grid search for p, G p, and λ, G λ 2: Split the data into K groups 3: for p G p and λ G λ do 4: for group = 1..K do 5: Train model p,λ,group without observations in group 6: Test error, MSE p,λ,group for observations in group 7: end for 8: Average MSE p,λ,group over group MSE p,λ 9: end for 10: Select p and λ with minimum MSE p,λ n-fold cross validation is called Leave-One-Out (LOO). Standard choice for K : LOO or 10-fold CV. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 23 / 37
74 CART Outline 1 Introduction Background and notations Undertting / Overtting Errors Use case 2 Neural networks Overview Multilayer perceptron Learning/Tuning 3 CART Introduction Learning Prediction 4 Random forest Overview Bootstrap/Bagging Random forest Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 24 / 37
75 CART Introduction Overview CART: Classication And Regression Trees introduced by [Breiman et al., 1984]. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 25 / 37
76 CART Introduction Overview CART: Classication And Regression Trees introduced by [Breiman et al., 1984]. Advantages classication OR regression (i.e., Y can be a numeric variable or a factor); non parametric method: no prior assumption needed; can deal with a large number of input variables, either numeric variables or factors (a variable selection is included in the method); provide an intuitive interpretation. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 25 / 37
77 CART Introduction Overview CART: Classication And Regression Trees introduced by [Breiman et al., 1984]. Advantages classication OR regression (i.e., Y can be a numeric variable or a factor); non parametric method: no prior assumption needed; can deal with a large number of input variables, either numeric variables or factors (a variable selection is included in the method); provide an intuitive interpretation. Drawbacks require a large training dataset to be ecient; as a consequence, are often too simple to provide accurate predictions. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 25 / 37
78 CART Introduction Example X = (Gender, Age, Height) and Y = Weight Root Height<1.60m Height>1.60m a split a node Gender=M Gender=F Age<30 Age>30 Y 1 Y 2 Y 3 a leaf (terminal node) Gender=M Gender=F Y 4 Y 5 Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 26 / 37
79 CART Learning CART learning process Algorithm 1: Start from root 2: repeat 3: move to a new node 4: if the node is homogeneous or small enough then 5: STOP 6: else 7: split the node into two child nodes with maximal homogeneity 8: end if 9: until all nodes are processed Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 27 / 37
80 CART Learning Further details Homogeneity? if Y is a numeric variable, variance of (y i ) i for the observations assigned to the node (Gini index is also sometimes used); Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 28 / 37
81 CART Learning Further details Homogeneity? if Y is a numeric variable, variance of (y i ) i for the observations assigned to the node (Gini index is also sometimes used); if Y is a factor, node purity: % of observations assigned to the node whose Y values are not the node majority class. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 28 / 37
82 CART Learning Further details Homogeneity? if Y is a numeric variable, variance of (y i ) i for the observations assigned to the node (Gini index is also sometimes used); if Y is a factor, node purity: % of observations assigned to the node whose Y values are not the node majority class. Stopping criteria? Minimum size node (generally 1 or 5) Minimum node purity or variance Maximum tree depth Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 28 / 37
83 CART Learning Further details Homogeneity? if Y is a numeric variable, variance of (y i ) i for the observations assigned to the node (Gini index is also sometimes used); if Y is a factor, node purity: % of observations assigned to the node whose Y values are not the node majority class. Stopping criteria? Minimum size node (generally 1 or 5) Minimum node purity or variance Maximum tree depth Hyperparameters can be tuned by cross-validation using a grid search. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 28 / 37
84 CART Learning Further details Homogeneity? if Y is a numeric variable, variance of (y i ) i for the observations assigned to the node (Gini index is also sometimes used); if Y is a factor, node purity: % of observations assigned to the node whose Y values are not the node majority class. Stopping criteria? Minimum size node (generally 1 or 5) Minimum node purity or variance Maximum tree depth Hyperparameters can be tuned by cross-validation using a grid search. An alternative approach is pruning... Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 28 / 37
85 CART Learning Choosing an optimal subtree Algorithm 1: Train the maximal tree, T 2: Pruning: Find an optimal subtrees sequence (T k ) k=1,...,k 3: By cross validation, nd the errors L(T k ) + λc(t k ) for k = 1,..., K where L is the error and C is a complexity measure (number of leafs) 4: Select the subtree s.t. L(T k ) + λc(t k ) is minimum Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 29 / 37
86 CART Prediction Making new predictions A new observation, x new is assigned to a leaf (straightforward); Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 30 / 37
87 CART Prediction Making new predictions A new observation, x new is assigned to a leaf (straightforward); the corresponding predicted ŷ new is if Y is numeric, the mean value of the observations (training set) assigned to the same leaf; Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 30 / 37
88 CART Prediction Making new predictions A new observation, x new is assigned to a leaf (straightforward); the corresponding predicted ŷ new is if Y is numeric, the mean value of the observations (training set) assigned to the same leaf; if Y is a factor, the majority class of the observations (training set) assigned to the same leaf. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 30 / 37
89 Random forest Outline 1 Introduction Background and notations Undertting / Overtting Errors Use case 2 Neural networks Overview Multilayer perceptron Learning/Tuning 3 CART Introduction Learning Prediction 4 Random forest Overview Bootstrap/Bagging Random forest Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 31 / 37
90 Random forest Overview Advantages/Drawbacks Random Forest: introduced by [Breiman, 2001]. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 32 / 37
91 Random forest Overview Advantages/Drawbacks Random Forest: introduced by [Breiman, 2001]. Advantages classication OR regression (i.e., Y can be a numeric variable or a factor); non parametric method (no prior assumption needed) and accurate; can deal with a large number of input variables, either numeric variables or factors; can deal with small samples. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 32 / 37
92 Random forest Overview Advantages/Drawbacks Random Forest: introduced by [Breiman, 2001]. Advantages classication OR regression (i.e., Y can be a numeric variable or a factor); non parametric method (no prior assumption needed) and accurate; can deal with a large number of input variables, either numeric variables or factors; can deal with small samples. Drawbacks black box model; is not supported by strong mathematical results (consistency...) until now. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 32 / 37
93 Random forest Bootstrap/Bagging Basic description A fact: When the sample size is small, you might be unable to estimate properly. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 33 / 37
94 Random forest Bootstrap/Bagging Basic description A fact: When the sample size is small, you might be unable to estimate properly. This issue is commonly tackled by bootstrapping and, more specically, bagging (Boostrap Aggregating). Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 33 / 37
95 Random forest Bootstrap/Bagging Basic description A fact: When the sample size is small, you might be unable to estimate properly. This issue is commonly tackled by bootstrapping and, more specically, bagging (Boostrap Aggregating). Bagging: combination of simple (and underecient) regression (or classication) functions; Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 33 / 37
96 Random forest Bootstrap/Bagging Basic description A fact: When the sample size is small, you might be unable to estimate properly. This issue is commonly tackled by bootstrapping and, more specically, bagging (Boostrap Aggregating). Bagging: combination of simple (and underecient) regression (or classication) functions; Random forest CART bagging. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 33 / 37
97 Random forest Bootstrap/Bagging Bootstrap Bootstrap sample: random sampling (with replacement) of the training dataset. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 34 / 37
98 Random forest Bootstrap/Bagging Bootstrap Bootstrap sample: random sampling (with replacement) of the training dataset. Standard bootstrap sample size: 2/3n. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 34 / 37
99 Random forest Bootstrap/Bagging Bootstrap Bootstrap sample: random sampling (with replacement) of the training dataset. Standard bootstrap sample size: 2/3n. General (and robust) approach to solve several problems: Estimating condence intervals (of X with no prior assumption on the distribution of X ) 1 Build P bootstrap samples from (x i ) i 2 Use them to estimate X P times 3 The condence interval is based on the percentiles of the empirical distribution of X Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 34 / 37
100 Random forest Bootstrap/Bagging Bootstrap Bootstrap sample: random sampling (with replacement) of the training dataset. Standard bootstrap sample size: 2/3n. General (and robust) approach to solve several problems: Estimating condence intervals (of X with no prior assumption on the distribution of X ) Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 34 / 37
101 Random forest Bootstrap/Bagging Bootstrap Bootstrap sample: random sampling (with replacement) of the training dataset. Standard bootstrap sample size: 2/3n. General (and robust) approach to solve several problems: Estimating condence intervals (of X with no prior assumption on the distribution of X ) Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 34 / 37
102 Random forest Bootstrap/Bagging Bootstrap Bootstrap sample: random sampling (with replacement) of the training dataset. Standard bootstrap sample size: 2/3n. General (and robust) approach to solve several problems: Estimating condence intervals (of X with no prior assumption on the distribution of X ) Also useful to estimate p-values, residuals,... Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 34 / 37
103 Random forest Bootstrap/Bagging Bagging Average the estimates of the regression (or the classication) function obtained from B bootstrap samples. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 35 / 37
104 Random forest Bootstrap/Bagging Bagging Average the estimates of the regression (or the classication) function obtained from B bootstrap samples. Bagging with regression trees 1: for b = 1,..., B do 2: Construct a bootstrap sample ξ b 3: Train a regression tree from ξ b, ˆφ b 4: end for 5: Estimate the regression function by ˆΦ n (x) = 1 B B ˆφ b (x). b=1 Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 35 / 37
105 Random forest Bootstrap/Bagging Bagging Average the estimates of the regression (or the classication) function obtained from B bootstrap samples. Bagging with regression trees 1: for b = 1,..., B do 2: Construct a bootstrap sample ξ b 3: Train a regression tree from ξ b, ˆφ b 4: end for 5: Estimate the regression function by ˆΦ n (x) = 1 B B ˆφ b (x). b=1 For classication, the predicted class is the majority vote class. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 35 / 37
106 Random forest Random forest Random forests CART bagging with additional disturbances 1 each node is based on a random (and dierent) subset of q variables (an advisable choice for q is p for classication and p/3 for regression). Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 36 / 37
107 Random forest Random forest Random forests CART bagging with additional disturbances 1 each node is based on a random (and dierent) subset of q variables (an advisable choice for q is p for classication and p/3 for regression). 2 the tree is restricted to the rst l nodes with l small. Hyperparameters those of the CART algorithm; those that are specic to the random forest: q, bootstrap sample size, number of trees. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 36 / 37
108 Random forest Random forest Random forests CART bagging with additional disturbances 1 each node is based on a random (and dierent) subset of q variables (an advisable choice for q is p for classication and p/3 for regression). 2 the tree is restricted to the rst l nodes with l small. Hyperparameters those of the CART algorithm; those that are specic to the random forest: q, bootstrap sample size, number of trees. Random forest are not very sensitive to hyper-parameters setting: default values for q and bootstrap sample size (2n/3) should work in most cases. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 36 / 37
109 Random forest Random forest Additional tools OOB (Out-Of Bags) error: error based on the observations not included in the bag Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 37 / 37
110 Random forest Random forest Additional tools OOB (Out-Of Bags) error: error based on the observations not included in the bag Stabilization of OOB error is a good indication that there is enough trees in the forest Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 37 / 37
111 Random forest Random forest Additional tools OOB (Out-Of Bags) error: error based on the observations not included in the bag Importance of a variable to help interpretation: for a given variable X j 1: randomize the values of the variable 2: make predictions from this new dataset 3: the importance is the mean decrease in accuracy (MSE or misclassication rate) Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 37 / 37
112 References Bishop, C. (1995). Neural Networks for Pattern Recognition. Oxford University Press, New York, USA. Breiman, L. (2001). Random forests. Machine Learning, 45(1):532. Breiman, L., Friedman, J., Olsen, R., and Stone, C. (1984). Classication and Regression Trees. Chapman and Hall, New York, USA. Hornik, K., Stinchcombe, M., and White, H. (1989). Multilayer feed-forward networks are universal approximators. Neural Networks, 2: Liaubet, L., Lobjois, V., Faraut, T., Tircazes, A., Benne, F., Iannuccelli, N., Pires, J., Glénisson, J., Robic, A., Le Roy, P., SanCristobal, M., and Cherel, P. (2011). Genetic variability or transcript abundance in pig peri-mortem skeletal muscle: eqtl localized genes involved in stress response, cell death, muscle disorders and metabolism. BMC Genomics, 12(548). Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 37 / 37
Machine Learning - TP
Machine Learning - TP Nathalie Villa-Vialaneix - nathalie.villa@univ-paris1.fr http://www.nathalievilla.org IUT STID (Carcassonne) & SAMM (Université Paris 1) Formation INRA, Niveau 3 Formation INRA (Niveau
More informationSF2930 Regression Analysis
SF2930 Regression Analysis Alexandre Chotard Tree-based regression and classication 20 February 2017 1 / 30 Idag Overview Regression trees Pruning Bagging, random forests 2 / 30 Today Overview Regression
More informationPATTERN CLASSIFICATION
PATTERN CLASSIFICATION Second Edition Richard O. Duda Peter E. Hart David G. Stork A Wiley-lnterscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim Brisbane Singapore Toronto CONTENTS
More informationNeural Networks and the Back-propagation Algorithm
Neural Networks and the Back-propagation Algorithm Francisco S. Melo In these notes, we provide a brief overview of the main concepts concerning neural networks and the back-propagation algorithm. We closely
More informationHoldout and Cross-Validation Methods Overfitting Avoidance
Holdout and Cross-Validation Methods Overfitting Avoidance Decision Trees Reduce error pruning Cost-complexity pruning Neural Networks Early stopping Adjusting Regularizers via Cross-Validation Nearest
More informationIntroduction to machine learning
1/59 Introduction to machine learning Victor Kitov v.v.kitov@yandex.ru 1/59 Course information Instructor - Victor Vladimirovich Kitov Tasks of the course Structure: Tools lectures, seminars assignements:
More informationPerceptron. (c) Marcin Sydow. Summary. Perceptron
Topics covered by this lecture: Neuron and its properties Mathematical model of neuron: as a classier ' Learning Rule (Delta Rule) Neuron Human neural system has been a natural source of inspiration for
More informationRetrieval of Cloud Top Pressure
Master Thesis in Statistics and Data Mining Retrieval of Cloud Top Pressure Claudia Adok Division of Statistics and Machine Learning Department of Computer and Information Science Linköping University
More informationIntroduction to Natural Computation. Lecture 9. Multilayer Perceptrons and Backpropagation. Peter Lewis
Introduction to Natural Computation Lecture 9 Multilayer Perceptrons and Backpropagation Peter Lewis 1 / 25 Overview of the Lecture Why multilayer perceptrons? Some applications of multilayer perceptrons.
More informationFinal Overview. Introduction to ML. Marek Petrik 4/25/2017
Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,
More informationARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92
ARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92 BIOLOGICAL INSPIRATIONS Some numbers The human brain contains about 10 billion nerve cells (neurons) Each neuron is connected to the others through 10000
More informationMultilayer Neural Networks
Multilayer Neural Networks Multilayer Neural Networks Discriminant function flexibility NON-Linear But with sets of linear parameters at each layer Provably general function approximators for sufficient
More informationCSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18
CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$
More informationStatistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Ensembles Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique Fédérale de Lausanne
More informationNeural Networks. Bishop PRML Ch. 5. Alireza Ghane. Feed-forward Networks Network Training Error Backpropagation Applications
Neural Networks Bishop PRML Ch. 5 Alireza Ghane Neural Networks Alireza Ghane / Greg Mori 1 Neural Networks Neural networks arise from attempts to model human/animal brains Many models, many claims of
More informationNONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function
More informationABC random forest for parameter estimation. Jean-Michel Marin
ABC random forest for parameter estimation Jean-Michel Marin Université de Montpellier Institut Montpelliérain Alexander Grothendieck (IMAG) Institut de Biologie Computationnelle (IBC) Labex Numev! joint
More informationAnalysis of Fast Input Selection: Application in Time Series Prediction
Analysis of Fast Input Selection: Application in Time Series Prediction Jarkko Tikka, Amaury Lendasse, and Jaakko Hollmén Helsinki University of Technology, Laboratory of Computer and Information Science,
More informationA Bias Correction for the Minimum Error Rate in Cross-validation
A Bias Correction for the Minimum Error Rate in Cross-validation Ryan J. Tibshirani Robert Tibshirani Abstract Tuning parameters in supervised learning problems are often estimated by cross-validation.
More informationFeed-forward Networks Network Training Error Backpropagation Applications. Neural Networks. Oliver Schulte - CMPT 726. Bishop PRML Ch.
Neural Networks Oliver Schulte - CMPT 726 Bishop PRML Ch. 5 Neural Networks Neural networks arise from attempts to model human/animal brains Many models, many claims of biological plausibility We will
More informationNonlinear Classification
Nonlinear Classification INFO-4604, Applied Machine Learning University of Colorado Boulder October 5-10, 2017 Prof. Michael Paul Linear Classification Most classifiers we ve seen use linear functions
More informationLecture 5: Logistic Regression. Neural Networks
Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture
More informationDeconstructing Data Science
econstructing ata Science avid Bamman, UC Berkeley Info 290 Lecture 6: ecision trees & random forests Feb 2, 2016 Linear regression eep learning ecision trees Ordinal regression Probabilistic graphical
More informationday month year documentname/initials 1
ECE471-571 Pattern Recognition Lecture 13 Decision Tree Hairong Qi, Gonzalez Family Professor Electrical Engineering and Computer Science University of Tennessee, Knoxville http://www.eecs.utk.edu/faculty/qi
More informationNeural Networks and Deep Learning
Neural Networks and Deep Learning Professor Ameet Talwalkar November 12, 2015 Professor Ameet Talwalkar Neural Networks and Deep Learning November 12, 2015 1 / 16 Outline 1 Review of last lecture AdaBoost
More informationBAGGING PREDICTORS AND RANDOM FOREST
BAGGING PREDICTORS AND RANDOM FOREST DANA KANER M.SC. SEMINAR IN STATISTICS, MAY 2017 BAGIGNG PREDICTORS / LEO BREIMAN, 1996 RANDOM FORESTS / LEO BREIMAN, 2001 THE ELEMENTS OF STATISTICAL LEARNING (CHAPTERS
More informationArtificial Intelligence
Artificial Intelligence Jeff Clune Assistant Professor Evolving Artificial Intelligence Laboratory Announcements Be making progress on your projects! Three Types of Learning Unsupervised Supervised Reinforcement
More informationArtificial Neural Networks
Artificial Neural Networks 鮑興國 Ph.D. National Taiwan University of Science and Technology Outline Perceptrons Gradient descent Multi-layer networks Backpropagation Hidden layer representations Examples
More informationMachine Learning Approaches to Crop Yield Prediction and Climate Change Impact Assessment
Machine Learning Approaches to Crop Yield Prediction and Climate Change Impact Assessment Andrew Crane-Droesch FCSM, March 2018 The views expressed are those of the authors and should not be attributed
More informationArtificial Neural Networks" and Nonparametric Methods" CMPSCI 383 Nov 17, 2011!
Artificial Neural Networks" and Nonparametric Methods" CMPSCI 383 Nov 17, 2011! 1 Todayʼs lecture" How the brain works (!)! Artificial neural networks! Perceptrons! Multilayer feed-forward networks! Error
More informationChapter 14 Combining Models
Chapter 14 Combining Models T-61.62 Special Course II: Pattern Recognition and Machine Learning Spring 27 Laboratory of Computer and Information Science TKK April 3th 27 Outline Independent Mixing Coefficients
More informationLinks between Perceptrons, MLPs and SVMs
Links between Perceptrons, MLPs and SVMs Ronan Collobert Samy Bengio IDIAP, Rue du Simplon, 19 Martigny, Switzerland Abstract We propose to study links between three important classification algorithms:
More informationArtificial Neural Networks. Edward Gatt
Artificial Neural Networks Edward Gatt What are Neural Networks? Models of the brain and nervous system Highly parallel Process information much more like the brain than a serial computer Learning Very
More informationOptmization Methods for Machine Learning Beyond Perceptron Feed Forward neural networks (FFN)
Optmization Methods for Machine Learning Beyond Perceptron Feed Forward neural networks (FFN) Laura Palagi http://www.dis.uniroma1.it/ palagi Dipartimento di Ingegneria informatica automatica e gestionale
More informationStatistics and learning: Big Data
Statistics and learning: Big Data Learning Decision Trees and an Introduction to Boosting Sébastien Gadat Toulouse School of Economics February 2017 S. Gadat (TSE) SAD 2013 1 / 30 Keywords Decision trees
More informationFundamentals of Neural Network
Chapter 3 Fundamentals of Neural Network One of the main challenge in actual times researching, is the construction of AI (Articial Intelligence) systems. These systems could be understood as any physical
More informationFeed-forward Network Functions
Feed-forward Network Functions Sargur Srihari Topics 1. Extension of linear models 2. Feed-forward Network Functions 3. Weight-space symmetries 2 Recap of Linear Models Linear Models for Regression, Classification
More informationMidterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric
More informationMultilayer Perceptrons and Backpropagation
Multilayer Perceptrons and Backpropagation Informatics 1 CG: Lecture 7 Chris Lucas School of Informatics University of Edinburgh January 31, 2017 (Slides adapted from Mirella Lapata s.) 1 / 33 Reading:
More informationNeural networks. Chapter 19, Sections 1 5 1
Neural networks Chapter 19, Sections 1 5 Chapter 19, Sections 1 5 1 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural networks Chapter 19, Sections 1 5 2 Brains 10
More informationMultilayer Perceptrons (MLPs)
CSE 5526: Introduction to Neural Networks Multilayer Perceptrons (MLPs) 1 Motivation Multilayer networks are more powerful than singlelayer nets Example: XOR problem x 2 1 AND x o x 1 x 2 +1-1 o x x 1-1
More informationArtificial Neural Networks (ANN) Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso
Artificial Neural Networks (ANN) Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Fall, 2018 Outline Introduction A Brief History ANN Architecture Terminology
More informationNeural Networks and Ensemble Methods for Classification
Neural Networks and Ensemble Methods for Classification NEURAL NETWORKS 2 Neural Networks A neural network is a set of connected input/output units (neurons) where each connection has a weight associated
More informationBANA 7046 Data Mining I Lecture 6. Other Data Mining Algorithms 1
BANA 7046 Data Mining I Lecture 6. Other Data Mining Algorithms 1 Shaobo Li University of Cincinnati 1 Partially based on Hastie, et al. (2009) ESL, and James, et al. (2013) ISLR Data Mining I Lecture
More information1/sqrt(B) convergence 1/B convergence B
The Error Coding Method and PICTs Gareth James and Trevor Hastie Department of Statistics, Stanford University March 29, 1998 Abstract A new family of plug-in classication techniques has recently been
More informationArtificial Neural Networks
Introduction ANN in Action Final Observations Application: Poverty Detection Artificial Neural Networks Alvaro J. Riascos Villegas University of los Andes and Quantil July 6 2018 Artificial Neural Networks
More informationMultilayer Neural Networks
Multilayer Neural Networks Introduction Goal: Classify objects by learning nonlinearity There are many problems for which linear discriminants are insufficient for minimum error In previous methods, the
More informationHierarchical Boosting and Filter Generation
January 29, 2007 Plan Combining Classifiers Boosting Neural Network Structure of AdaBoost Image processing Hierarchical Boosting Hierarchical Structure Filters Combining Classifiers Combining Classifiers
More informationCSE 352 (AI) LECTURE NOTES Professor Anita Wasilewska. NEURAL NETWORKS Learning
CSE 352 (AI) LECTURE NOTES Professor Anita Wasilewska NEURAL NETWORKS Learning Neural Networks Classifier Short Presentation INPUT: classification data, i.e. it contains an classification (class) attribute.
More informationDeep Learning book, by Ian Goodfellow, Yoshua Bengio and Aaron Courville
Deep Learning book, by Ian Goodfellow, Yoshua Bengio and Aaron Courville Chapter 6 :Deep Feedforward Networks Benoit Massé Dionyssos Kounades-Bastian Benoit Massé, Dionyssos Kounades-Bastian Deep Feedforward
More information4. Multilayer Perceptrons
4. Multilayer Perceptrons This is a supervised error-correction learning algorithm. 1 4.1 Introduction A multilayer feedforward network consists of an input layer, one or more hidden layers, and an output
More informationMachine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6
Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Neural Networks Week #6 Today Neural Networks A. Modeling B. Fitting C. Deep neural networks Today s material is (adapted)
More informationCombination of M-Estimators and Neural Network Model to Analyze Inside/Outside Bark Tree Diameters
Combination of M-Estimators and Neural Network Model to Analyze Inside/Outside Bark Tree Diameters Kyriaki Kitikidou, Elias Milios, Lazaros Iliadis, and Minas Kaymakis Democritus University of Thrace,
More informationData Mining Part 5. Prediction
Data Mining Part 5. Prediction 5.5. Spring 2010 Instructor: Dr. Masoud Yaghini Outline How the Brain Works Artificial Neural Networks Simple Computing Elements Feed-Forward Networks Perceptrons (Single-layer,
More informationError Empirical error. Generalization error. Time (number of iteration)
Submitted to Neural Networks. Dynamics of Batch Learning in Multilayer Networks { Overrealizability and Overtraining { Kenji Fukumizu The Institute of Physical and Chemical Research (RIKEN) E-mail: fuku@brain.riken.go.jp
More informationAI Programming CS F-20 Neural Networks
AI Programming CS662-2008F-20 Neural Networks David Galles Department of Computer Science University of San Francisco 20-0: Symbolic AI Most of this class has been focused on Symbolic AI Focus or symbols
More informationRegression tree methods for subgroup identification I
Regression tree methods for subgroup identification I Xu He Academy of Mathematics and Systems Science, Chinese Academy of Sciences March 25, 2014 Xu He (AMSS, CAS) March 25, 2014 1 / 34 Outline The problem
More informationNeural Networks Lecture 4: Radial Bases Function Networks
Neural Networks Lecture 4: Radial Bases Function Networks H.A Talebi Farzaneh Abdollahi Department of Electrical Engineering Amirkabir University of Technology Winter 2011. A. Talebi, Farzaneh Abdollahi
More informationConstructing Prediction Intervals for Random Forests
Senior Thesis in Mathematics Constructing Prediction Intervals for Random Forests Author: Benjamin Lu Advisor: Dr. Jo Hardin Submitted to Pomona College in Partial Fulfillment of the Degree of Bachelor
More informationReading Group on Deep Learning Session 1
Reading Group on Deep Learning Session 1 Stephane Lathuiliere & Pablo Mesejo 2 June 2016 1/31 Contents Introduction to Artificial Neural Networks to understand, and to be able to efficiently use, the popular
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationApprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning
Apprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning Nicolas Thome Prenom.Nom@cnam.fr http://cedric.cnam.fr/vertigo/cours/ml2/ Département Informatique Conservatoire
More informationRadial Basis Function Networks. Ravi Kaushik Project 1 CSC Neural Networks and Pattern Recognition
Radial Basis Function Networks Ravi Kaushik Project 1 CSC 84010 Neural Networks and Pattern Recognition History Radial Basis Function (RBF) emerged in late 1980 s as a variant of artificial neural network.
More informationDeep Feedforward Networks
Deep Feedforward Networks Liu Yang March 30, 2017 Liu Yang Short title March 30, 2017 1 / 24 Overview 1 Background A general introduction Example 2 Gradient based learning Cost functions Output Units 3
More informationCourse 395: Machine Learning - Lectures
Course 395: Machine Learning - Lectures Lecture 1-2: Concept Learning (M. Pantic) Lecture 3-4: Decision Trees & CBC Intro (M. Pantic & S. Petridis) Lecture 5-6: Evaluating Hypotheses (S. Petridis) Lecture
More informationMachine Learning. Neural Networks. (slides from Domingos, Pardo, others)
Machine Learning Neural Networks (slides from Domingos, Pardo, others) Human Brain Neurons Input-Output Transformation Input Spikes Output Spike Spike (= a brief pulse) (Excitatory Post-Synaptic Potential)
More informationDeep Feedforward Networks. Sargur N. Srihari
Deep Feedforward Networks Sargur N. srihari@cedar.buffalo.edu 1 Topics Overview 1. Example: Learning XOR 2. Gradient-Based Learning 3. Hidden Units 4. Architecture Design 5. Backpropagation and Other Differentiation
More informationEXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING
EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: June 9, 2018, 09.00 14.00 RESPONSIBLE TEACHER: Andreas Svensson NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical
More informationFeedforward Neural Nets and Backpropagation
Feedforward Neural Nets and Backpropagation Julie Nutini University of British Columbia MLRG September 28 th, 2016 1 / 23 Supervised Learning Roadmap Supervised Learning: Assume that we are given the features
More informationCombining estimates in regression and. classication. Michael LeBlanc. and. Robert Tibshirani. and. Department of Statistics. cuniversity of Toronto
Combining estimates in regression and classication Michael LeBlanc and Robert Tibshirani Department of Preventive Medicine and Biostatistics and Department of Statistics University of Toronto December
More informationMachine Learning Lecture 12
Machine Learning Lecture 12 Neural Networks 30.11.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory Probability
More informationIn the Name of God. Lectures 15&16: Radial Basis Function Networks
1 In the Name of God Lectures 15&16: Radial Basis Function Networks Some Historical Notes Learning is equivalent to finding a surface in a multidimensional space that provides a best fit to the training
More informationSPSS, University of Texas at Arlington. Topics in Machine Learning-EE 5359 Neural Networks
Topics in Machine Learning-EE 5359 Neural Networks 1 The Perceptron Output: A perceptron is a function that maps D-dimensional vectors to real numbers. For notational convenience, we add a zero-th dimension
More informationClassification using stochastic ensembles
July 31, 2014 Topics Introduction Topics Classification Application and classfication Classification and Regression Trees Stochastic ensemble methods Our application: USAID Poverty Assessment Tools Topics
More informationArtificial Neural Networks. MGS Lecture 2
Artificial Neural Networks MGS 2018 - Lecture 2 OVERVIEW Biological Neural Networks Cell Topology: Input, Output, and Hidden Layers Functional description Cost functions Training ANNs Back-Propagation
More informationComputational statistics
Computational statistics Lecture 3: Neural networks Thierry Denœux 5 March, 2016 Neural networks A class of learning methods that was developed separately in different fields statistics and artificial
More informationArtificial neural networks
Artificial neural networks Chapter 8, Section 7 Artificial Intelligence, spring 203, Peter Ljunglöf; based on AIMA Slides c Stuart Russel and Peter Norvig, 2004 Chapter 8, Section 7 Outline Brains Neural
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationWidths. Center Fluctuations. Centers. Centers. Widths
Radial Basis Functions: a Bayesian treatment David Barber Bernhard Schottky Neural Computing Research Group Department of Applied Mathematics and Computer Science Aston University, Birmingham B4 7ET, U.K.
More informationLeast Absolute Shrinkage is Equivalent to Quadratic Penalization
Least Absolute Shrinkage is Equivalent to Quadratic Penalization Yves Grandvalet Heudiasyc, UMR CNRS 6599, Université de Technologie de Compiègne, BP 20.529, 60205 Compiègne Cedex, France Yves.Grandvalet@hds.utc.fr
More informationA summary of Deep Learning without Poor Local Minima
A summary of Deep Learning without Poor Local Minima by Kenji Kawaguchi MIT oral presentation at NIPS 2016 Learning Supervised (or Predictive) learning Learn a mapping from inputs x to outputs y, given
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Decision Trees. Tobias Scheffer
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Decision Trees Tobias Scheffer Decision Trees One of many applications: credit risk Employed longer than 3 months Positive credit
More informationArtificial Neural Networks
Artificial Neural Networks Stephan Dreiseitl University of Applied Sciences Upper Austria at Hagenberg Harvard-MIT Division of Health Sciences and Technology HST.951J: Medical Decision Support Knowledge
More informationUVA CS 4501: Machine Learning
UVA CS 4501: Machine Learning Lecture 21: Decision Tree / Random Forest / Ensemble Dr. Yanjun Qi University of Virginia Department of Computer Science Where are we? è Five major sections of this course
More informationis used in the empirical trials and then discusses the results. In the nal section we draw together the main conclusions of this study and suggest fut
Estimating Conditional Volatility with Neural Networks Ian T Nabney H W Cheng y 1 Introduction It is well known that one of the obstacles to eective forecasting of exchange rates is heteroscedasticity
More informationMachine Learning. Neural Networks. (slides from Domingos, Pardo, others)
Machine Learning Neural Networks (slides from Domingos, Pardo, others) For this week, Reading Chapter 4: Neural Networks (Mitchell, 1997) See Canvas For subsequent weeks: Scaling Learning Algorithms toward
More informationIntroduction Neural Networks - Architecture Network Training Small Example - ZIP Codes Summary. Neural Networks - I. Henrik I Christensen
Neural Networks - I Henrik I Christensen Robotics & Intelligent Machines @ GT Georgia Institute of Technology, Atlanta, GA 30332-0280 hic@cc.gatech.edu Henrik I Christensen (RIM@GT) Neural Networks 1 /
More informationNeural networks. Chapter 20. Chapter 20 1
Neural networks Chapter 20 Chapter 20 1 Outline Brains Neural networks Perceptrons Multilayer networks Applications of neural networks Chapter 20 2 Brains 10 11 neurons of > 20 types, 10 14 synapses, 1ms
More information(Feed-Forward) Neural Networks Dr. Hajira Jabeen, Prof. Jens Lehmann
(Feed-Forward) Neural Networks 2016-12-06 Dr. Hajira Jabeen, Prof. Jens Lehmann Outline In the previous lectures we have learned about tensors and factorization methods. RESCAL is a bilinear model for
More informationArtificial Neural Networks The Introduction
Artificial Neural Networks The Introduction 01001110 01100101 01110101 01110010 01101111 01101110 01101111 01110110 01100001 00100000 01110011 01101011 01110101 01110000 01101001 01101110 01100001 00100000
More informationLinear Models for Classification
Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,
More informationNeural Networks with Applications to Vision and Language. Feedforward Networks. Marco Kuhlmann
Neural Networks with Applications to Vision and Language Feedforward Networks Marco Kuhlmann Feedforward networks Linear separability x 2 x 2 0 1 0 1 0 0 x 1 1 0 x 1 linearly separable not linearly separable
More informationCMSC 421: Neural Computation. Applications of Neural Networks
CMSC 42: Neural Computation definition synonyms neural networks artificial neural networks neural modeling connectionist models parallel distributed processing AI perspective Applications of Neural Networks
More informationDeep Feedforward Networks
Deep Feedforward Networks Yongjin Park 1 Goal of Feedforward Networks Deep Feedforward Networks are also called as Feedforward neural networks or Multilayer Perceptrons Their Goal: approximate some function
More informationIn: Advances in Intelligent Data Analysis (AIDA), International Computer Science Conventions. Rochester New York, 1999
In: Advances in Intelligent Data Analysis (AIDA), Computational Intelligence Methods and Applications (CIMA), International Computer Science Conventions Rochester New York, 999 Feature Selection Based
More informationArtificial Neural Networks (ANN)
Artificial Neural Networks (ANN) Edmondo Trentin April 17, 2013 ANN: Definition The definition of ANN is given in 3.1 points. Indeed, an ANN is a machine that is completely specified once we define its:
More informationData Mining and Analysis: Fundamental Concepts and Algorithms
Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA
More informationMachine Learning. Neural Networks. (slides from Domingos, Pardo, others)
Machine Learning Neural Networks (slides from Domingos, Pardo, others) For this week, Reading Chapter 4: Neural Networks (Mitchell, 1997) See Canvas For subsequent weeks: Scaling Learning Algorithms toward
More informationMachine Learning Recitation 8 Oct 21, Oznur Tastan
Machine Learning 10601 Recitation 8 Oct 21, 2009 Oznur Tastan Outline Tree representation Brief information theory Learning decision trees Bagging Random forests Decision trees Non linear classifier Easy
More informationECE662: Pattern Recognition and Decision Making Processes: HW TWO
ECE662: Pattern Recognition and Decision Making Processes: HW TWO Purdue University Department of Electrical and Computer Engineering West Lafayette, INDIANA, USA Abstract. In this report experiments are
More information