Machine Learning. Nathalie Villa-Vialaneix - Formation INRA, Niveau 3

Size: px
Start display at page:

Download "Machine Learning. Nathalie Villa-Vialaneix - Formation INRA, Niveau 3"

Transcription

1 Machine Learning Nathalie Villa-Vialaneix - nathalie.villa@univ-paris1.fr IUT STID (Carcassonne) & SAMM (Université Paris 1) Formation INRA, Niveau 3 Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 1 / 37

2 Outline 1 Introduction Background and notations Undertting / Overtting Errors Use case 2 Neural networks Overview Multilayer perceptron Learning/Tuning 3 CART Introduction Learning Prediction 4 Random forest Overview Bootstrap/Bagging Random forest Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 2 / 37

3 Introduction Outline 1 Introduction Background and notations Undertting / Overtting Errors Use case 2 Neural networks Overview Multilayer perceptron Learning/Tuning 3 CART Introduction Learning Prediction 4 Random forest Overview Bootstrap/Bagging Random forest Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 3 / 37

4 Introduction Background and notations Background Purpose: predict Y from X ; Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 4 / 37

5 Introduction Background and notations Background Purpose: predict Y from X ; What we have: n observations of (X, Y ), (x 1, y 1 ),..., (x n, y n ); Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 4 / 37

6 Introduction Background and notations Background Purpose: predict Y from X ; What we have: n observations of (X, Y ), (x 1, y 1 ),..., (x n, y n ); What we want: estimate unknown Y from new X : x n+1,..., x m. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 4 / 37

7 Introduction Background and notations Background Purpose: predict Y from X ; What we have: n observations of (X, Y ), (x 1, y 1 ),..., (x n, y n ); What we want: estimate unknown Y from new X : x n+1,..., x m. X can be: numeric variables; or factors; or a combination of numeric variables and factors. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 4 / 37

8 Introduction Background and notations Background Purpose: predict Y from X ; What we have: n observations of (X, Y ), (x 1, y 1 ),..., (x n, y n ); What we want: estimate unknown Y from new X : x n+1,..., x m. X can be: numeric variables; or factors; or a combination of numeric variables and factors. Y can be: a numeric variable (Y R) (supervised) regression régression; a factor (supervised) classication discrimination. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 4 / 37

9 Introduction Undertting / Overtting Basics From (x i, y i ) i, denition of a machine, Φ n s.t.: ŷ new = Φ n (x new ). Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 5 / 37

10 Introduction Undertting / Overtting Basics From (x i, y i ) i, denition of a machine, Φ n s.t.: ŷ new = Φ n (x new ). if Y is numeric, Φ n is called a regression function fonction de classication; if Y is a factor, Φ n is called a classier classieur; Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 5 / 37

11 Introduction Undertting / Overtting Basics From (x i, y i ) i, denition of a machine, Φ n s.t.: ŷ new = Φ n (x new ). if Y is numeric, Φ n is called a regression function fonction de classication; if Y is a factor, Φ n is called a classier classieur; Φ n is said to be trained or learned from the observations (x i, y i ) i. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 5 / 37

12 Introduction Undertting / Overtting Basics From (x i, y i ) i, denition of a machine, Φ n s.t.: ŷ new = Φ n (x new ). if Y is numeric, Φ n is called a regression function fonction de classication; if Y is a factor, Φ n is called a classier classieur; Φ n is said to be trained or learned from the observations (x i, y i ) i. Desirable properties accuracy to the observations: predictions made on known data are close to observed values; Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 5 / 37

13 Introduction Undertting / Overtting Basics From (x i, y i ) i, denition of a machine, Φ n s.t.: ŷ new = Φ n (x new ). if Y is numeric, Φ n is called a regression function fonction de classication; if Y is a factor, Φ n is called a classier classieur; Φ n is said to be trained or learned from the observations (x i, y i ) i. Desirable properties accuracy to the observations: predictions made on known data are close to observed values; generalization ability: predictions made on new data are also accurate. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 5 / 37

14 Introduction Undertting / Overtting Basics From (x i, y i ) i, denition of a machine, Φ n s.t.: ŷ new = Φ n (x new ). if Y is numeric, Φ n is called a regression function fonction de classication; if Y is a factor, Φ n is called a classier classieur; Φ n is said to be trained or learned from the observations (x i, y i ) i. Desirable properties accuracy to the observations: predictions made on known data are close to observed values; generalization ability: predictions made on new data are also accurate. Conicting objectives!! Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 5 / 37

15 Introduction Undertting / Overtting Undertting/Overtting sous/sur - apprentissage Function x y to be estimated Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 6 / 37

16 Introduction Undertting / Overtting Undertting/Overtting sous/sur - apprentissage Observations we might have Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 6 / 37

17 Introduction Undertting / Overtting Undertting/Overtting sous/sur - apprentissage Observations we do have Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 6 / 37

18 Introduction Undertting / Overtting Undertting/Overtting sous/sur - apprentissage First estimation from the observations: undertting Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 6 / 37

19 Introduction Undertting / Overtting Undertting/Overtting sous/sur - apprentissage Second estimation from the observations: accurate estimation Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 6 / 37

20 Introduction Undertting / Overtting Undertting/Overtting sous/sur - apprentissage Third estimation from the observations: overtting Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 6 / 37

21 Introduction Undertting / Overtting Undertting/Overtting sous/sur - apprentissage Summary Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 6 / 37

22 Introduction Errors Errors training error (measures the accuracy to the observations) Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 7 / 37

23 Introduction Errors Errors training error (measures the accuracy to the observations) if y is a factor: misclassication rate {ŷ i y i, i = 1,..., n} n Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 7 / 37

24 Introduction Errors Errors training error (measures the accuracy to the observations) if y is a factor: misclassication rate {ŷ i y i, i = 1,..., n} if y is numeric: mean square error (MSE) n 1 n n (ŷ i y i ) 2 i=1 Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 7 / 37

25 Introduction Errors Errors training error (measures the accuracy to the observations) if y is a factor: misclassication rate {ŷ i y i, i = 1,..., n} if y is numeric: mean square error (MSE) n 1 n n (ŷ i y i ) 2 i=1 or root mean square error (RMSE) or pseudo-r 2 : 1 MSE/Var((y i ) i ) Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 7 / 37

26 Introduction Errors Errors training error (measures the accuracy to the observations) if y is a factor: misclassication rate {ŷ i y i, i = 1,..., n} if y is numeric: mean square error (MSE) n 1 n n (ŷ i y i ) 2 i=1 or root mean square error (RMSE) or pseudo-r 2 : 1 MSE/Var((y i ) i ) test error: a way to prevent overtting (estimates the generalization error) Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 7 / 37

27 Introduction Errors Errors training error (measures the accuracy to the observations) if y is a factor: misclassication rate {ŷ i y i, i = 1,..., n} if y is numeric: mean square error (MSE) n 1 n n (ŷ i y i ) 2 i=1 or root mean square error (RMSE) or pseudo-r 2 : 1 MSE/Var((y i ) i ) test error: a way to prevent overtting (estimates the generalization error) 1 split the data into training/test sets (usually 80%/20%) 2 train Φ n from the training dataset 3 calculate the test error from the remaining data simple validation Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 7 / 37

28 Introduction Errors Example Observations Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 8 / 37

29 Introduction Errors Example Training/Test datasets Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 8 / 37

30 Introduction Errors Example Training/Test errors Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 8 / 37

31 Introduction Errors Example Summary Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 8 / 37

32 Introduction Errors Linear vs Nonparametric Linear methods: Y = β T X + ɛ (a priori on the type of link between X and Y ) Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 9 / 37

33 Introduction Errors Linear vs Nonparametric Linear methods: Y = β T X + ɛ (a priori on the type of link between X and Y ) Here: nonparametric methods: where Φ is totally unknown. Y = Φ(X ) + ɛ Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 9 / 37

34 Introduction Errors Linear vs Nonparametric Linear methods: Y = β T X + ɛ (a priori on the type of link between X and Y ) Here: nonparametric methods: Y = Φ(X ) + ɛ where Φ is totally unknown. (ML) Objective: Build Φ n from the observations such that its generalization error ELΦ n is (asymptotically) optimal. Example (regression framework) whatever (X, Y ) distribution. ELΦ n := E [ (Φ n (X ) Y ) 2] n + inf Φ ELΦ Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 9 / 37

35 Introduction Use case Use case description Data kindly provided by Laurence Liaubet described in [Liaubet et al., 2011]: microarray data: expression of 272 selected genes over 56 individuals (pigs); a phenotype of interest (muscle ph) measured over the 56 invididuals (numerical variable). le 1: genes expressions le 2: muscle ph Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 10 / 37

36 Neural networks Outline 1 Introduction Background and notations Undertting / Overtting Errors Use case 2 Neural networks Overview Multilayer perceptron Learning/Tuning 3 CART Introduction Learning Prediction 4 Random forest Overview Bootstrap/Bagging Random forest Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 11 / 37

37 Neural networks Overview Basics [Bishop, 1995] Common properties (articial) Neural networks: general name for supervised and unsupervised methods developed in analogy to the brain; Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 12 / 37

38 Neural networks Overview Basics [Bishop, 1995] Common properties (articial) Neural networks: general name for supervised and unsupervised methods developed in analogy to the brain; combination (network) of simple elements (neurons). Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 12 / 37

39 Neural networks Overview Basics [Bishop, 1995] Common properties (articial) Neural networks: general name for supervised and unsupervised methods developed in analogy to the brain; combination (network) of simple elements (neurons). Example of graphical representation: INPUTS OUTPUTS Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 12 / 37

40 Neural networks Overview Main features A neural network is dened by: 1 the network structure; 2 the neuron type. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 13 / 37

41 Neural networks Overview Main features A neural network is dened by: 1 the network structure; 2 the neuron type. Standard examples Multilayer perceptrons (MLP) Perceptron multi-couches: dedicated to supervised problems (classication and regression); Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 13 / 37

42 Neural networks Overview Main features A neural network is dened by: 1 the network structure; 2 the neuron type. Standard examples Multilayer perceptrons (MLP) Perceptron multi-couches: dedicated to supervised problems (classication and regression); Radial basis function networks (RBF): same purpose but based on local smoothing; Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 13 / 37

43 Neural networks Overview Main features A neural network is dened by: 1 the network structure; 2 the neuron type. Standard examples Multilayer perceptrons (MLP) Perceptron multi-couches: dedicated to supervised problems (classication and regression); Radial basis function networks (RBF): same purpose but based on local smoothing; Self-organizing maps (SOM): dedicated to unsupervised problems (clustering), self-organized;... Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 13 / 37

44 Neural networks Overview MLP: Advantages/Drawbacks Advantages classication OR regression (i.e., Y can be a numeric variable or a factor); non parametric method: no prior assumption needed; accurate (universal approximation). Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 14 / 37

45 Neural networks Overview MLP: Advantages/Drawbacks Advantages classication OR regression (i.e., Y can be a numeric variable or a factor); non parametric method: no prior assumption needed; accurate (universal approximation). Drawbacks hard to train (high computational cost, especially when d is large); overt easily; black box models (hard to interpret) Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 14 / 37

46 Neural networks Multilayer perceptron Analogy to the brain Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 15 / 37

47 Neural networks Multilayer perceptron Analogy to the brain Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 15 / 37

48 Neural networks Multilayer perceptron Analogy to the brain Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 15 / 37

49 Neural networks Multilayer perceptron Analogy to the brain If > activation threshold then Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 15 / 37

50 Neural networks Multilayer perceptron Analogy to the brain If < activation threshold then Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 15 / 37

51 Neural networks Multilayer perceptron (articial) Perceptron Layers MLP have one input layer (X values), one output layer (Y values) and several hidden layers (only 1 is necessary); no connections within a layer; connections between two consecutive layers (feedforward). Example: INPUTS genes expressions Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 16 / 37

52 Neural networks Multilayer perceptron (articial) Perceptron Layers MLP have one input layer (X values), one output layer (Y values) and several hidden layers (only 1 is necessary); no connections within a layer; connections between two consecutive layers (feedforward). Example: INPUTS Weighted links genes expressions Layer 1 Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 16 / 37

53 Neural networks Multilayer perceptron (articial) Perceptron Layers MLP have one input layer (X values), one output layer (Y values) and several hidden layers (only 1 is necessary); no connections within a layer; connections between two consecutive layers (feedforward). Example: INPUTS genes expressions Layer 1 Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 16 / 37

54 Neural networks Multilayer perceptron (articial) Perceptron Layers MLP have one input layer (X values), one output layer (Y values) and several hidden layers (only 1 is necessary); no connections within a layer; connections between two consecutive layers (feedforward). Example: INPUTS genes expressions Layer 1 Layer 2 Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 16 / 37

55 Neural networks Multilayer perceptron (articial) Perceptron Layers MLP have one input layer (X values), one output layer (Y values) and several hidden layers (only 1 is necessary); no connections within a layer; connections between two consecutive layers (feedforward). Example: 2 hidden layers MLP INPUTS OUTPUTS genes expressions Layer 1 Layer 2 ph Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 16 / 37

56 Neural networks Multilayer perceptron A neuron v 3 w 3 v 2 v 1 w 2 w 1 + w 0 (Bias Biais) Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 17 / 37

57 Neural networks Multilayer perceptron A neuron v 3 w 3 w 2 v 2 w 2 + v 1 w 1 w 0 (Bias Biais) w 1 Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 17 / 37

58 Neural networks Multilayer perceptron A neuron v 3 w 3 w 2 v 2 w 2 + v 1 w 1 w 0 (Bias Biais) w 1 Standard activation functions fonctions de lien / d'activation Biologically inspired: Heaviside function Seuil { 0 if t < threshold; S(t) = 1 if not. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 17 / 37

59 Neural networks Multilayer perceptron A neuron v 3 w 3 w 2 v 2 w 2 + v 1 w 1 w 0 (Bias Biais) w 1 Standard activation functions Main issue with the Heaviside function: not continuous! Logistic function S(t) = 1 1+e t Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 17 / 37

60 Neural networks Multilayer perceptron Summary If Y is numeric, linear output: p x R d, F w (x) = j=1 w (2) S j ( d k=1 w (1) x k + w (0) kj j ). w (1) neuron 1 INPUTS x 1 x 2... x d w (1) dp No analytical expression!! +w (0) p neuron p w (2) 1 w (2) p F w (x) OUTPUTS Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 18 / 37

61 Neural networks Multilayer perceptron Summary If Y is a factor, logistic output: p x R d, P(X = C x = x) F w (x) = S j=1 w (2) S j ( d k=1 (with a maximum probability rule for the nal classication) ) w (1) x k + w (0) kj j w (1) neuron 1 INPUTS x 1 x 2... x d w (1) dp +w (0) p neuron p w (2) 1 w (2) p F w (x) OUTPUTS No analytical expression!! Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 18 / 37

62 Neural networks Multilayer perceptron Universal approximation [Hornik et al., 1989] (among others) For any given Φ, smooth enough and any precision ɛ, there exists a one-hidden layer perceptron (with sigmoïd activation functions) that approximates Φ with a precision at most ɛ. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 19 / 37

63 Neural networks Learning/Tuning Learning weights p is given Chose w s.t.: (MSE minimization) w n = arg min w 1 n i=1 n (y i F w (x i )) 2. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 20 / 37

64 Neural networks Learning/Tuning Learning weights p is given Chose w s.t.: (MSE minimization) Main issues: w n = arg min w 1 n i=1 n (y i F w (x i )) 2. 1 no exact solution ( approximation algorithms, e.g., Newton's method + backpropagation principle): local minima; Erreur Poids Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 20 / 37

65 Neural networks Learning/Tuning Learning weights p is given Chose w s.t.: (MSE minimization) Main issues: w n = arg min w 1 n i=1 n (y i F w (x i )) 2. 1 no exact solution ( approximation algorithms, e.g., Newton's method + backpropagation principle): local minima; 2 overtting: the larger p is, the more exible the perceptron is and the more it can overt the data. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 20 / 37

66 Neural networks Learning/Tuning Overtting Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 21 / 37

67 Neural networks Learning/Tuning Overtting Weight decay can help improve the generalization ability: w n 1 n = arg min (y i F w (x i )) 2 +λ w 2 w n i=1 Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 21 / 37

68 Neural networks Learning/Tuning Tuning p and λ p and λ are called hyperparameters hyper-paramètres (not learned from the data but tuned). Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 22 / 37

69 Neural networks Learning/Tuning Tuning p and λ p and λ are called hyperparameters hyper-paramètres (not learned from the data but tuned). grid search using a simple validation; Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 22 / 37

70 Neural networks Learning/Tuning Tuning p and λ p and λ are called hyperparameters hyper-paramètres (not learned from the data but tuned). grid search using a simple validation; grid search using a (K -fold) cross validation (better but computationally expensive when K is large). Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 22 / 37

71 Neural networks Learning/Tuning Cross validation Algorithm 1: Set the grid search for p, G p, and λ, G λ 2: Split the data into K groups 3: for p G p and λ G λ do 4: for group = 1..K do 5: Train model p,λ,group without observations in group 6: Test error, MSE p,λ,group for observations in group 7: end for 8: Average MSE p,λ,group over group MSE p,λ 9: end for 10: Select p and λ with minimum MSE p,λ Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 23 / 37

72 Neural networks Learning/Tuning Cross validation Algorithm 1: Set the grid search for p, G p, and λ, G λ 2: Split the data into K groups 3: for p G p and λ G λ do 4: for group = 1..K do 5: Train model p,λ,group without observations in group 6: Test error, MSE p,λ,group for observations in group 7: end for 8: Average MSE p,λ,group over group MSE p,λ 9: end for 10: Select p and λ with minimum MSE p,λ n-fold cross validation is called Leave-One-Out (LOO). Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 23 / 37

73 Neural networks Learning/Tuning Cross validation Algorithm 1: Set the grid search for p, G p, and λ, G λ 2: Split the data into K groups 3: for p G p and λ G λ do 4: for group = 1..K do 5: Train model p,λ,group without observations in group 6: Test error, MSE p,λ,group for observations in group 7: end for 8: Average MSE p,λ,group over group MSE p,λ 9: end for 10: Select p and λ with minimum MSE p,λ n-fold cross validation is called Leave-One-Out (LOO). Standard choice for K : LOO or 10-fold CV. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 23 / 37

74 CART Outline 1 Introduction Background and notations Undertting / Overtting Errors Use case 2 Neural networks Overview Multilayer perceptron Learning/Tuning 3 CART Introduction Learning Prediction 4 Random forest Overview Bootstrap/Bagging Random forest Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 24 / 37

75 CART Introduction Overview CART: Classication And Regression Trees introduced by [Breiman et al., 1984]. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 25 / 37

76 CART Introduction Overview CART: Classication And Regression Trees introduced by [Breiman et al., 1984]. Advantages classication OR regression (i.e., Y can be a numeric variable or a factor); non parametric method: no prior assumption needed; can deal with a large number of input variables, either numeric variables or factors (a variable selection is included in the method); provide an intuitive interpretation. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 25 / 37

77 CART Introduction Overview CART: Classication And Regression Trees introduced by [Breiman et al., 1984]. Advantages classication OR regression (i.e., Y can be a numeric variable or a factor); non parametric method: no prior assumption needed; can deal with a large number of input variables, either numeric variables or factors (a variable selection is included in the method); provide an intuitive interpretation. Drawbacks require a large training dataset to be ecient; as a consequence, are often too simple to provide accurate predictions. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 25 / 37

78 CART Introduction Example X = (Gender, Age, Height) and Y = Weight Root Height<1.60m Height>1.60m a split a node Gender=M Gender=F Age<30 Age>30 Y 1 Y 2 Y 3 a leaf (terminal node) Gender=M Gender=F Y 4 Y 5 Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 26 / 37

79 CART Learning CART learning process Algorithm 1: Start from root 2: repeat 3: move to a new node 4: if the node is homogeneous or small enough then 5: STOP 6: else 7: split the node into two child nodes with maximal homogeneity 8: end if 9: until all nodes are processed Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 27 / 37

80 CART Learning Further details Homogeneity? if Y is a numeric variable, variance of (y i ) i for the observations assigned to the node (Gini index is also sometimes used); Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 28 / 37

81 CART Learning Further details Homogeneity? if Y is a numeric variable, variance of (y i ) i for the observations assigned to the node (Gini index is also sometimes used); if Y is a factor, node purity: % of observations assigned to the node whose Y values are not the node majority class. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 28 / 37

82 CART Learning Further details Homogeneity? if Y is a numeric variable, variance of (y i ) i for the observations assigned to the node (Gini index is also sometimes used); if Y is a factor, node purity: % of observations assigned to the node whose Y values are not the node majority class. Stopping criteria? Minimum size node (generally 1 or 5) Minimum node purity or variance Maximum tree depth Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 28 / 37

83 CART Learning Further details Homogeneity? if Y is a numeric variable, variance of (y i ) i for the observations assigned to the node (Gini index is also sometimes used); if Y is a factor, node purity: % of observations assigned to the node whose Y values are not the node majority class. Stopping criteria? Minimum size node (generally 1 or 5) Minimum node purity or variance Maximum tree depth Hyperparameters can be tuned by cross-validation using a grid search. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 28 / 37

84 CART Learning Further details Homogeneity? if Y is a numeric variable, variance of (y i ) i for the observations assigned to the node (Gini index is also sometimes used); if Y is a factor, node purity: % of observations assigned to the node whose Y values are not the node majority class. Stopping criteria? Minimum size node (generally 1 or 5) Minimum node purity or variance Maximum tree depth Hyperparameters can be tuned by cross-validation using a grid search. An alternative approach is pruning... Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 28 / 37

85 CART Learning Choosing an optimal subtree Algorithm 1: Train the maximal tree, T 2: Pruning: Find an optimal subtrees sequence (T k ) k=1,...,k 3: By cross validation, nd the errors L(T k ) + λc(t k ) for k = 1,..., K where L is the error and C is a complexity measure (number of leafs) 4: Select the subtree s.t. L(T k ) + λc(t k ) is minimum Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 29 / 37

86 CART Prediction Making new predictions A new observation, x new is assigned to a leaf (straightforward); Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 30 / 37

87 CART Prediction Making new predictions A new observation, x new is assigned to a leaf (straightforward); the corresponding predicted ŷ new is if Y is numeric, the mean value of the observations (training set) assigned to the same leaf; Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 30 / 37

88 CART Prediction Making new predictions A new observation, x new is assigned to a leaf (straightforward); the corresponding predicted ŷ new is if Y is numeric, the mean value of the observations (training set) assigned to the same leaf; if Y is a factor, the majority class of the observations (training set) assigned to the same leaf. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 30 / 37

89 Random forest Outline 1 Introduction Background and notations Undertting / Overtting Errors Use case 2 Neural networks Overview Multilayer perceptron Learning/Tuning 3 CART Introduction Learning Prediction 4 Random forest Overview Bootstrap/Bagging Random forest Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 31 / 37

90 Random forest Overview Advantages/Drawbacks Random Forest: introduced by [Breiman, 2001]. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 32 / 37

91 Random forest Overview Advantages/Drawbacks Random Forest: introduced by [Breiman, 2001]. Advantages classication OR regression (i.e., Y can be a numeric variable or a factor); non parametric method (no prior assumption needed) and accurate; can deal with a large number of input variables, either numeric variables or factors; can deal with small samples. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 32 / 37

92 Random forest Overview Advantages/Drawbacks Random Forest: introduced by [Breiman, 2001]. Advantages classication OR regression (i.e., Y can be a numeric variable or a factor); non parametric method (no prior assumption needed) and accurate; can deal with a large number of input variables, either numeric variables or factors; can deal with small samples. Drawbacks black box model; is not supported by strong mathematical results (consistency...) until now. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 32 / 37

93 Random forest Bootstrap/Bagging Basic description A fact: When the sample size is small, you might be unable to estimate properly. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 33 / 37

94 Random forest Bootstrap/Bagging Basic description A fact: When the sample size is small, you might be unable to estimate properly. This issue is commonly tackled by bootstrapping and, more specically, bagging (Boostrap Aggregating). Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 33 / 37

95 Random forest Bootstrap/Bagging Basic description A fact: When the sample size is small, you might be unable to estimate properly. This issue is commonly tackled by bootstrapping and, more specically, bagging (Boostrap Aggregating). Bagging: combination of simple (and underecient) regression (or classication) functions; Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 33 / 37

96 Random forest Bootstrap/Bagging Basic description A fact: When the sample size is small, you might be unable to estimate properly. This issue is commonly tackled by bootstrapping and, more specically, bagging (Boostrap Aggregating). Bagging: combination of simple (and underecient) regression (or classication) functions; Random forest CART bagging. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 33 / 37

97 Random forest Bootstrap/Bagging Bootstrap Bootstrap sample: random sampling (with replacement) of the training dataset. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 34 / 37

98 Random forest Bootstrap/Bagging Bootstrap Bootstrap sample: random sampling (with replacement) of the training dataset. Standard bootstrap sample size: 2/3n. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 34 / 37

99 Random forest Bootstrap/Bagging Bootstrap Bootstrap sample: random sampling (with replacement) of the training dataset. Standard bootstrap sample size: 2/3n. General (and robust) approach to solve several problems: Estimating condence intervals (of X with no prior assumption on the distribution of X ) 1 Build P bootstrap samples from (x i ) i 2 Use them to estimate X P times 3 The condence interval is based on the percentiles of the empirical distribution of X Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 34 / 37

100 Random forest Bootstrap/Bagging Bootstrap Bootstrap sample: random sampling (with replacement) of the training dataset. Standard bootstrap sample size: 2/3n. General (and robust) approach to solve several problems: Estimating condence intervals (of X with no prior assumption on the distribution of X ) Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 34 / 37

101 Random forest Bootstrap/Bagging Bootstrap Bootstrap sample: random sampling (with replacement) of the training dataset. Standard bootstrap sample size: 2/3n. General (and robust) approach to solve several problems: Estimating condence intervals (of X with no prior assumption on the distribution of X ) Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 34 / 37

102 Random forest Bootstrap/Bagging Bootstrap Bootstrap sample: random sampling (with replacement) of the training dataset. Standard bootstrap sample size: 2/3n. General (and robust) approach to solve several problems: Estimating condence intervals (of X with no prior assumption on the distribution of X ) Also useful to estimate p-values, residuals,... Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 34 / 37

103 Random forest Bootstrap/Bagging Bagging Average the estimates of the regression (or the classication) function obtained from B bootstrap samples. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 35 / 37

104 Random forest Bootstrap/Bagging Bagging Average the estimates of the regression (or the classication) function obtained from B bootstrap samples. Bagging with regression trees 1: for b = 1,..., B do 2: Construct a bootstrap sample ξ b 3: Train a regression tree from ξ b, ˆφ b 4: end for 5: Estimate the regression function by ˆΦ n (x) = 1 B B ˆφ b (x). b=1 Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 35 / 37

105 Random forest Bootstrap/Bagging Bagging Average the estimates of the regression (or the classication) function obtained from B bootstrap samples. Bagging with regression trees 1: for b = 1,..., B do 2: Construct a bootstrap sample ξ b 3: Train a regression tree from ξ b, ˆφ b 4: end for 5: Estimate the regression function by ˆΦ n (x) = 1 B B ˆφ b (x). b=1 For classication, the predicted class is the majority vote class. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 35 / 37

106 Random forest Random forest Random forests CART bagging with additional disturbances 1 each node is based on a random (and dierent) subset of q variables (an advisable choice for q is p for classication and p/3 for regression). Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 36 / 37

107 Random forest Random forest Random forests CART bagging with additional disturbances 1 each node is based on a random (and dierent) subset of q variables (an advisable choice for q is p for classication and p/3 for regression). 2 the tree is restricted to the rst l nodes with l small. Hyperparameters those of the CART algorithm; those that are specic to the random forest: q, bootstrap sample size, number of trees. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 36 / 37

108 Random forest Random forest Random forests CART bagging with additional disturbances 1 each node is based on a random (and dierent) subset of q variables (an advisable choice for q is p for classication and p/3 for regression). 2 the tree is restricted to the rst l nodes with l small. Hyperparameters those of the CART algorithm; those that are specic to the random forest: q, bootstrap sample size, number of trees. Random forest are not very sensitive to hyper-parameters setting: default values for q and bootstrap sample size (2n/3) should work in most cases. Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 36 / 37

109 Random forest Random forest Additional tools OOB (Out-Of Bags) error: error based on the observations not included in the bag Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 37 / 37

110 Random forest Random forest Additional tools OOB (Out-Of Bags) error: error based on the observations not included in the bag Stabilization of OOB error is a good indication that there is enough trees in the forest Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 37 / 37

111 Random forest Random forest Additional tools OOB (Out-Of Bags) error: error based on the observations not included in the bag Importance of a variable to help interpretation: for a given variable X j 1: randomize the values of the variable 2: make predictions from this new dataset 3: the importance is the mean decrease in accuracy (MSE or misclassication rate) Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 37 / 37

112 References Bishop, C. (1995). Neural Networks for Pattern Recognition. Oxford University Press, New York, USA. Breiman, L. (2001). Random forests. Machine Learning, 45(1):532. Breiman, L., Friedman, J., Olsen, R., and Stone, C. (1984). Classication and Regression Trees. Chapman and Hall, New York, USA. Hornik, K., Stinchcombe, M., and White, H. (1989). Multilayer feed-forward networks are universal approximators. Neural Networks, 2: Liaubet, L., Lobjois, V., Faraut, T., Tircazes, A., Benne, F., Iannuccelli, N., Pires, J., Glénisson, J., Robic, A., Le Roy, P., SanCristobal, M., and Cherel, P. (2011). Genetic variability or transcript abundance in pig peri-mortem skeletal muscle: eqtl localized genes involved in stress response, cell death, muscle disorders and metabolism. BMC Genomics, 12(548). Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 37 / 37

Machine Learning - TP

Machine Learning - TP Machine Learning - TP Nathalie Villa-Vialaneix - nathalie.villa@univ-paris1.fr http://www.nathalievilla.org IUT STID (Carcassonne) & SAMM (Université Paris 1) Formation INRA, Niveau 3 Formation INRA (Niveau

More information

SF2930 Regression Analysis

SF2930 Regression Analysis SF2930 Regression Analysis Alexandre Chotard Tree-based regression and classication 20 February 2017 1 / 30 Idag Overview Regression trees Pruning Bagging, random forests 2 / 30 Today Overview Regression

More information

PATTERN CLASSIFICATION

PATTERN CLASSIFICATION PATTERN CLASSIFICATION Second Edition Richard O. Duda Peter E. Hart David G. Stork A Wiley-lnterscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim Brisbane Singapore Toronto CONTENTS

More information

Neural Networks and the Back-propagation Algorithm

Neural Networks and the Back-propagation Algorithm Neural Networks and the Back-propagation Algorithm Francisco S. Melo In these notes, we provide a brief overview of the main concepts concerning neural networks and the back-propagation algorithm. We closely

More information

Holdout and Cross-Validation Methods Overfitting Avoidance

Holdout and Cross-Validation Methods Overfitting Avoidance Holdout and Cross-Validation Methods Overfitting Avoidance Decision Trees Reduce error pruning Cost-complexity pruning Neural Networks Early stopping Adjusting Regularizers via Cross-Validation Nearest

More information

Introduction to machine learning

Introduction to machine learning 1/59 Introduction to machine learning Victor Kitov v.v.kitov@yandex.ru 1/59 Course information Instructor - Victor Vladimirovich Kitov Tasks of the course Structure: Tools lectures, seminars assignements:

More information

Perceptron. (c) Marcin Sydow. Summary. Perceptron

Perceptron. (c) Marcin Sydow. Summary. Perceptron Topics covered by this lecture: Neuron and its properties Mathematical model of neuron: as a classier ' Learning Rule (Delta Rule) Neuron Human neural system has been a natural source of inspiration for

More information

Retrieval of Cloud Top Pressure

Retrieval of Cloud Top Pressure Master Thesis in Statistics and Data Mining Retrieval of Cloud Top Pressure Claudia Adok Division of Statistics and Machine Learning Department of Computer and Information Science Linköping University

More information

Introduction to Natural Computation. Lecture 9. Multilayer Perceptrons and Backpropagation. Peter Lewis

Introduction to Natural Computation. Lecture 9. Multilayer Perceptrons and Backpropagation. Peter Lewis Introduction to Natural Computation Lecture 9 Multilayer Perceptrons and Backpropagation Peter Lewis 1 / 25 Overview of the Lecture Why multilayer perceptrons? Some applications of multilayer perceptrons.

More information

Final Overview. Introduction to ML. Marek Petrik 4/25/2017

Final Overview. Introduction to ML. Marek Petrik 4/25/2017 Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,

More information

ARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92

ARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92 ARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92 BIOLOGICAL INSPIRATIONS Some numbers The human brain contains about 10 billion nerve cells (neurons) Each neuron is connected to the others through 10000

More information

Multilayer Neural Networks

Multilayer Neural Networks Multilayer Neural Networks Multilayer Neural Networks Discriminant function flexibility NON-Linear But with sets of linear parameters at each layer Provably general function approximators for sufficient

More information

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18 CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$

More information

Statistical Machine Learning from Data

Statistical Machine Learning from Data Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Ensembles Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique Fédérale de Lausanne

More information

Neural Networks. Bishop PRML Ch. 5. Alireza Ghane. Feed-forward Networks Network Training Error Backpropagation Applications

Neural Networks. Bishop PRML Ch. 5. Alireza Ghane. Feed-forward Networks Network Training Error Backpropagation Applications Neural Networks Bishop PRML Ch. 5 Alireza Ghane Neural Networks Alireza Ghane / Greg Mori 1 Neural Networks Neural networks arise from attempts to model human/animal brains Many models, many claims of

More information

NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function

More information

ABC random forest for parameter estimation. Jean-Michel Marin

ABC random forest for parameter estimation. Jean-Michel Marin ABC random forest for parameter estimation Jean-Michel Marin Université de Montpellier Institut Montpelliérain Alexander Grothendieck (IMAG) Institut de Biologie Computationnelle (IBC) Labex Numev! joint

More information

Analysis of Fast Input Selection: Application in Time Series Prediction

Analysis of Fast Input Selection: Application in Time Series Prediction Analysis of Fast Input Selection: Application in Time Series Prediction Jarkko Tikka, Amaury Lendasse, and Jaakko Hollmén Helsinki University of Technology, Laboratory of Computer and Information Science,

More information

A Bias Correction for the Minimum Error Rate in Cross-validation

A Bias Correction for the Minimum Error Rate in Cross-validation A Bias Correction for the Minimum Error Rate in Cross-validation Ryan J. Tibshirani Robert Tibshirani Abstract Tuning parameters in supervised learning problems are often estimated by cross-validation.

More information

Feed-forward Networks Network Training Error Backpropagation Applications. Neural Networks. Oliver Schulte - CMPT 726. Bishop PRML Ch.

Feed-forward Networks Network Training Error Backpropagation Applications. Neural Networks. Oliver Schulte - CMPT 726. Bishop PRML Ch. Neural Networks Oliver Schulte - CMPT 726 Bishop PRML Ch. 5 Neural Networks Neural networks arise from attempts to model human/animal brains Many models, many claims of biological plausibility We will

More information

Nonlinear Classification

Nonlinear Classification Nonlinear Classification INFO-4604, Applied Machine Learning University of Colorado Boulder October 5-10, 2017 Prof. Michael Paul Linear Classification Most classifiers we ve seen use linear functions

More information

Lecture 5: Logistic Regression. Neural Networks

Lecture 5: Logistic Regression. Neural Networks Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture

More information

Deconstructing Data Science

Deconstructing Data Science econstructing ata Science avid Bamman, UC Berkeley Info 290 Lecture 6: ecision trees & random forests Feb 2, 2016 Linear regression eep learning ecision trees Ordinal regression Probabilistic graphical

More information

day month year documentname/initials 1

day month year documentname/initials 1 ECE471-571 Pattern Recognition Lecture 13 Decision Tree Hairong Qi, Gonzalez Family Professor Electrical Engineering and Computer Science University of Tennessee, Knoxville http://www.eecs.utk.edu/faculty/qi

More information

Neural Networks and Deep Learning

Neural Networks and Deep Learning Neural Networks and Deep Learning Professor Ameet Talwalkar November 12, 2015 Professor Ameet Talwalkar Neural Networks and Deep Learning November 12, 2015 1 / 16 Outline 1 Review of last lecture AdaBoost

More information

BAGGING PREDICTORS AND RANDOM FOREST

BAGGING PREDICTORS AND RANDOM FOREST BAGGING PREDICTORS AND RANDOM FOREST DANA KANER M.SC. SEMINAR IN STATISTICS, MAY 2017 BAGIGNG PREDICTORS / LEO BREIMAN, 1996 RANDOM FORESTS / LEO BREIMAN, 2001 THE ELEMENTS OF STATISTICAL LEARNING (CHAPTERS

More information

Artificial Intelligence

Artificial Intelligence Artificial Intelligence Jeff Clune Assistant Professor Evolving Artificial Intelligence Laboratory Announcements Be making progress on your projects! Three Types of Learning Unsupervised Supervised Reinforcement

More information

Artificial Neural Networks

Artificial Neural Networks Artificial Neural Networks 鮑興國 Ph.D. National Taiwan University of Science and Technology Outline Perceptrons Gradient descent Multi-layer networks Backpropagation Hidden layer representations Examples

More information

Machine Learning Approaches to Crop Yield Prediction and Climate Change Impact Assessment

Machine Learning Approaches to Crop Yield Prediction and Climate Change Impact Assessment Machine Learning Approaches to Crop Yield Prediction and Climate Change Impact Assessment Andrew Crane-Droesch FCSM, March 2018 The views expressed are those of the authors and should not be attributed

More information

Artificial Neural Networks" and Nonparametric Methods" CMPSCI 383 Nov 17, 2011!

Artificial Neural Networks and Nonparametric Methods CMPSCI 383 Nov 17, 2011! Artificial Neural Networks" and Nonparametric Methods" CMPSCI 383 Nov 17, 2011! 1 Todayʼs lecture" How the brain works (!)! Artificial neural networks! Perceptrons! Multilayer feed-forward networks! Error

More information

Chapter 14 Combining Models

Chapter 14 Combining Models Chapter 14 Combining Models T-61.62 Special Course II: Pattern Recognition and Machine Learning Spring 27 Laboratory of Computer and Information Science TKK April 3th 27 Outline Independent Mixing Coefficients

More information

Links between Perceptrons, MLPs and SVMs

Links between Perceptrons, MLPs and SVMs Links between Perceptrons, MLPs and SVMs Ronan Collobert Samy Bengio IDIAP, Rue du Simplon, 19 Martigny, Switzerland Abstract We propose to study links between three important classification algorithms:

More information

Artificial Neural Networks. Edward Gatt

Artificial Neural Networks. Edward Gatt Artificial Neural Networks Edward Gatt What are Neural Networks? Models of the brain and nervous system Highly parallel Process information much more like the brain than a serial computer Learning Very

More information

Optmization Methods for Machine Learning Beyond Perceptron Feed Forward neural networks (FFN)

Optmization Methods for Machine Learning Beyond Perceptron Feed Forward neural networks (FFN) Optmization Methods for Machine Learning Beyond Perceptron Feed Forward neural networks (FFN) Laura Palagi http://www.dis.uniroma1.it/ palagi Dipartimento di Ingegneria informatica automatica e gestionale

More information

Statistics and learning: Big Data

Statistics and learning: Big Data Statistics and learning: Big Data Learning Decision Trees and an Introduction to Boosting Sébastien Gadat Toulouse School of Economics February 2017 S. Gadat (TSE) SAD 2013 1 / 30 Keywords Decision trees

More information

Fundamentals of Neural Network

Fundamentals of Neural Network Chapter 3 Fundamentals of Neural Network One of the main challenge in actual times researching, is the construction of AI (Articial Intelligence) systems. These systems could be understood as any physical

More information

Feed-forward Network Functions

Feed-forward Network Functions Feed-forward Network Functions Sargur Srihari Topics 1. Extension of linear models 2. Feed-forward Network Functions 3. Weight-space symmetries 2 Recap of Linear Models Linear Models for Regression, Classification

More information

Midterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas

Midterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric

More information

Multilayer Perceptrons and Backpropagation

Multilayer Perceptrons and Backpropagation Multilayer Perceptrons and Backpropagation Informatics 1 CG: Lecture 7 Chris Lucas School of Informatics University of Edinburgh January 31, 2017 (Slides adapted from Mirella Lapata s.) 1 / 33 Reading:

More information

Neural networks. Chapter 19, Sections 1 5 1

Neural networks. Chapter 19, Sections 1 5 1 Neural networks Chapter 19, Sections 1 5 Chapter 19, Sections 1 5 1 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural networks Chapter 19, Sections 1 5 2 Brains 10

More information

Multilayer Perceptrons (MLPs)

Multilayer Perceptrons (MLPs) CSE 5526: Introduction to Neural Networks Multilayer Perceptrons (MLPs) 1 Motivation Multilayer networks are more powerful than singlelayer nets Example: XOR problem x 2 1 AND x o x 1 x 2 +1-1 o x x 1-1

More information

Artificial Neural Networks (ANN) Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso

Artificial Neural Networks (ANN) Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso Artificial Neural Networks (ANN) Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Fall, 2018 Outline Introduction A Brief History ANN Architecture Terminology

More information

Neural Networks and Ensemble Methods for Classification

Neural Networks and Ensemble Methods for Classification Neural Networks and Ensemble Methods for Classification NEURAL NETWORKS 2 Neural Networks A neural network is a set of connected input/output units (neurons) where each connection has a weight associated

More information

BANA 7046 Data Mining I Lecture 6. Other Data Mining Algorithms 1

BANA 7046 Data Mining I Lecture 6. Other Data Mining Algorithms 1 BANA 7046 Data Mining I Lecture 6. Other Data Mining Algorithms 1 Shaobo Li University of Cincinnati 1 Partially based on Hastie, et al. (2009) ESL, and James, et al. (2013) ISLR Data Mining I Lecture

More information

1/sqrt(B) convergence 1/B convergence B

1/sqrt(B) convergence 1/B convergence B The Error Coding Method and PICTs Gareth James and Trevor Hastie Department of Statistics, Stanford University March 29, 1998 Abstract A new family of plug-in classication techniques has recently been

More information

Artificial Neural Networks

Artificial Neural Networks Introduction ANN in Action Final Observations Application: Poverty Detection Artificial Neural Networks Alvaro J. Riascos Villegas University of los Andes and Quantil July 6 2018 Artificial Neural Networks

More information

Multilayer Neural Networks

Multilayer Neural Networks Multilayer Neural Networks Introduction Goal: Classify objects by learning nonlinearity There are many problems for which linear discriminants are insufficient for minimum error In previous methods, the

More information

Hierarchical Boosting and Filter Generation

Hierarchical Boosting and Filter Generation January 29, 2007 Plan Combining Classifiers Boosting Neural Network Structure of AdaBoost Image processing Hierarchical Boosting Hierarchical Structure Filters Combining Classifiers Combining Classifiers

More information

CSE 352 (AI) LECTURE NOTES Professor Anita Wasilewska. NEURAL NETWORKS Learning

CSE 352 (AI) LECTURE NOTES Professor Anita Wasilewska. NEURAL NETWORKS Learning CSE 352 (AI) LECTURE NOTES Professor Anita Wasilewska NEURAL NETWORKS Learning Neural Networks Classifier Short Presentation INPUT: classification data, i.e. it contains an classification (class) attribute.

More information

Deep Learning book, by Ian Goodfellow, Yoshua Bengio and Aaron Courville

Deep Learning book, by Ian Goodfellow, Yoshua Bengio and Aaron Courville Deep Learning book, by Ian Goodfellow, Yoshua Bengio and Aaron Courville Chapter 6 :Deep Feedforward Networks Benoit Massé Dionyssos Kounades-Bastian Benoit Massé, Dionyssos Kounades-Bastian Deep Feedforward

More information

4. Multilayer Perceptrons

4. Multilayer Perceptrons 4. Multilayer Perceptrons This is a supervised error-correction learning algorithm. 1 4.1 Introduction A multilayer feedforward network consists of an input layer, one or more hidden layers, and an output

More information

Machine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6

Machine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6 Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Neural Networks Week #6 Today Neural Networks A. Modeling B. Fitting C. Deep neural networks Today s material is (adapted)

More information

Combination of M-Estimators and Neural Network Model to Analyze Inside/Outside Bark Tree Diameters

Combination of M-Estimators and Neural Network Model to Analyze Inside/Outside Bark Tree Diameters Combination of M-Estimators and Neural Network Model to Analyze Inside/Outside Bark Tree Diameters Kyriaki Kitikidou, Elias Milios, Lazaros Iliadis, and Minas Kaymakis Democritus University of Thrace,

More information

Data Mining Part 5. Prediction

Data Mining Part 5. Prediction Data Mining Part 5. Prediction 5.5. Spring 2010 Instructor: Dr. Masoud Yaghini Outline How the Brain Works Artificial Neural Networks Simple Computing Elements Feed-Forward Networks Perceptrons (Single-layer,

More information

Error Empirical error. Generalization error. Time (number of iteration)

Error Empirical error. Generalization error. Time (number of iteration) Submitted to Neural Networks. Dynamics of Batch Learning in Multilayer Networks { Overrealizability and Overtraining { Kenji Fukumizu The Institute of Physical and Chemical Research (RIKEN) E-mail: fuku@brain.riken.go.jp

More information

AI Programming CS F-20 Neural Networks

AI Programming CS F-20 Neural Networks AI Programming CS662-2008F-20 Neural Networks David Galles Department of Computer Science University of San Francisco 20-0: Symbolic AI Most of this class has been focused on Symbolic AI Focus or symbols

More information

Regression tree methods for subgroup identification I

Regression tree methods for subgroup identification I Regression tree methods for subgroup identification I Xu He Academy of Mathematics and Systems Science, Chinese Academy of Sciences March 25, 2014 Xu He (AMSS, CAS) March 25, 2014 1 / 34 Outline The problem

More information

Neural Networks Lecture 4: Radial Bases Function Networks

Neural Networks Lecture 4: Radial Bases Function Networks Neural Networks Lecture 4: Radial Bases Function Networks H.A Talebi Farzaneh Abdollahi Department of Electrical Engineering Amirkabir University of Technology Winter 2011. A. Talebi, Farzaneh Abdollahi

More information

Constructing Prediction Intervals for Random Forests

Constructing Prediction Intervals for Random Forests Senior Thesis in Mathematics Constructing Prediction Intervals for Random Forests Author: Benjamin Lu Advisor: Dr. Jo Hardin Submitted to Pomona College in Partial Fulfillment of the Degree of Bachelor

More information

Reading Group on Deep Learning Session 1

Reading Group on Deep Learning Session 1 Reading Group on Deep Learning Session 1 Stephane Lathuiliere & Pablo Mesejo 2 June 2016 1/31 Contents Introduction to Artificial Neural Networks to understand, and to be able to efficiently use, the popular

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression

More information

Apprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning

Apprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning Apprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning Nicolas Thome Prenom.Nom@cnam.fr http://cedric.cnam.fr/vertigo/cours/ml2/ Département Informatique Conservatoire

More information

Radial Basis Function Networks. Ravi Kaushik Project 1 CSC Neural Networks and Pattern Recognition

Radial Basis Function Networks. Ravi Kaushik Project 1 CSC Neural Networks and Pattern Recognition Radial Basis Function Networks Ravi Kaushik Project 1 CSC 84010 Neural Networks and Pattern Recognition History Radial Basis Function (RBF) emerged in late 1980 s as a variant of artificial neural network.

More information

Deep Feedforward Networks

Deep Feedforward Networks Deep Feedforward Networks Liu Yang March 30, 2017 Liu Yang Short title March 30, 2017 1 / 24 Overview 1 Background A general introduction Example 2 Gradient based learning Cost functions Output Units 3

More information

Course 395: Machine Learning - Lectures

Course 395: Machine Learning - Lectures Course 395: Machine Learning - Lectures Lecture 1-2: Concept Learning (M. Pantic) Lecture 3-4: Decision Trees & CBC Intro (M. Pantic & S. Petridis) Lecture 5-6: Evaluating Hypotheses (S. Petridis) Lecture

More information

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others)

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others) Machine Learning Neural Networks (slides from Domingos, Pardo, others) Human Brain Neurons Input-Output Transformation Input Spikes Output Spike Spike (= a brief pulse) (Excitatory Post-Synaptic Potential)

More information

Deep Feedforward Networks. Sargur N. Srihari

Deep Feedforward Networks. Sargur N. Srihari Deep Feedforward Networks Sargur N. srihari@cedar.buffalo.edu 1 Topics Overview 1. Example: Learning XOR 2. Gradient-Based Learning 3. Hidden Units 4. Architecture Design 5. Backpropagation and Other Differentiation

More information

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING

EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: June 9, 2018, 09.00 14.00 RESPONSIBLE TEACHER: Andreas Svensson NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical

More information

Feedforward Neural Nets and Backpropagation

Feedforward Neural Nets and Backpropagation Feedforward Neural Nets and Backpropagation Julie Nutini University of British Columbia MLRG September 28 th, 2016 1 / 23 Supervised Learning Roadmap Supervised Learning: Assume that we are given the features

More information

Combining estimates in regression and. classication. Michael LeBlanc. and. Robert Tibshirani. and. Department of Statistics. cuniversity of Toronto

Combining estimates in regression and. classication. Michael LeBlanc. and. Robert Tibshirani. and. Department of Statistics. cuniversity of Toronto Combining estimates in regression and classication Michael LeBlanc and Robert Tibshirani Department of Preventive Medicine and Biostatistics and Department of Statistics University of Toronto December

More information

Machine Learning Lecture 12

Machine Learning Lecture 12 Machine Learning Lecture 12 Neural Networks 30.11.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory Probability

More information

In the Name of God. Lectures 15&16: Radial Basis Function Networks

In the Name of God. Lectures 15&16: Radial Basis Function Networks 1 In the Name of God Lectures 15&16: Radial Basis Function Networks Some Historical Notes Learning is equivalent to finding a surface in a multidimensional space that provides a best fit to the training

More information

SPSS, University of Texas at Arlington. Topics in Machine Learning-EE 5359 Neural Networks

SPSS, University of Texas at Arlington. Topics in Machine Learning-EE 5359 Neural Networks Topics in Machine Learning-EE 5359 Neural Networks 1 The Perceptron Output: A perceptron is a function that maps D-dimensional vectors to real numbers. For notational convenience, we add a zero-th dimension

More information

Classification using stochastic ensembles

Classification using stochastic ensembles July 31, 2014 Topics Introduction Topics Classification Application and classfication Classification and Regression Trees Stochastic ensemble methods Our application: USAID Poverty Assessment Tools Topics

More information

Artificial Neural Networks. MGS Lecture 2

Artificial Neural Networks. MGS Lecture 2 Artificial Neural Networks MGS 2018 - Lecture 2 OVERVIEW Biological Neural Networks Cell Topology: Input, Output, and Hidden Layers Functional description Cost functions Training ANNs Back-Propagation

More information

Computational statistics

Computational statistics Computational statistics Lecture 3: Neural networks Thierry Denœux 5 March, 2016 Neural networks A class of learning methods that was developed separately in different fields statistics and artificial

More information

Artificial neural networks

Artificial neural networks Artificial neural networks Chapter 8, Section 7 Artificial Intelligence, spring 203, Peter Ljunglöf; based on AIMA Slides c Stuart Russel and Peter Norvig, 2004 Chapter 8, Section 7 Outline Brains Neural

More information

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted

More information

Widths. Center Fluctuations. Centers. Centers. Widths

Widths. Center Fluctuations. Centers. Centers. Widths Radial Basis Functions: a Bayesian treatment David Barber Bernhard Schottky Neural Computing Research Group Department of Applied Mathematics and Computer Science Aston University, Birmingham B4 7ET, U.K.

More information

Least Absolute Shrinkage is Equivalent to Quadratic Penalization

Least Absolute Shrinkage is Equivalent to Quadratic Penalization Least Absolute Shrinkage is Equivalent to Quadratic Penalization Yves Grandvalet Heudiasyc, UMR CNRS 6599, Université de Technologie de Compiègne, BP 20.529, 60205 Compiègne Cedex, France Yves.Grandvalet@hds.utc.fr

More information

A summary of Deep Learning without Poor Local Minima

A summary of Deep Learning without Poor Local Minima A summary of Deep Learning without Poor Local Minima by Kenji Kawaguchi MIT oral presentation at NIPS 2016 Learning Supervised (or Predictive) learning Learn a mapping from inputs x to outputs y, given

More information

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Decision Trees. Tobias Scheffer

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Decision Trees. Tobias Scheffer Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Decision Trees Tobias Scheffer Decision Trees One of many applications: credit risk Employed longer than 3 months Positive credit

More information

Artificial Neural Networks

Artificial Neural Networks Artificial Neural Networks Stephan Dreiseitl University of Applied Sciences Upper Austria at Hagenberg Harvard-MIT Division of Health Sciences and Technology HST.951J: Medical Decision Support Knowledge

More information

UVA CS 4501: Machine Learning

UVA CS 4501: Machine Learning UVA CS 4501: Machine Learning Lecture 21: Decision Tree / Random Forest / Ensemble Dr. Yanjun Qi University of Virginia Department of Computer Science Where are we? è Five major sections of this course

More information

is used in the empirical trials and then discusses the results. In the nal section we draw together the main conclusions of this study and suggest fut

is used in the empirical trials and then discusses the results. In the nal section we draw together the main conclusions of this study and suggest fut Estimating Conditional Volatility with Neural Networks Ian T Nabney H W Cheng y 1 Introduction It is well known that one of the obstacles to eective forecasting of exchange rates is heteroscedasticity

More information

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others)

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others) Machine Learning Neural Networks (slides from Domingos, Pardo, others) For this week, Reading Chapter 4: Neural Networks (Mitchell, 1997) See Canvas For subsequent weeks: Scaling Learning Algorithms toward

More information

Introduction Neural Networks - Architecture Network Training Small Example - ZIP Codes Summary. Neural Networks - I. Henrik I Christensen

Introduction Neural Networks - Architecture Network Training Small Example - ZIP Codes Summary. Neural Networks - I. Henrik I Christensen Neural Networks - I Henrik I Christensen Robotics & Intelligent Machines @ GT Georgia Institute of Technology, Atlanta, GA 30332-0280 hic@cc.gatech.edu Henrik I Christensen (RIM@GT) Neural Networks 1 /

More information

Neural networks. Chapter 20. Chapter 20 1

Neural networks. Chapter 20. Chapter 20 1 Neural networks Chapter 20 Chapter 20 1 Outline Brains Neural networks Perceptrons Multilayer networks Applications of neural networks Chapter 20 2 Brains 10 11 neurons of > 20 types, 10 14 synapses, 1ms

More information

(Feed-Forward) Neural Networks Dr. Hajira Jabeen, Prof. Jens Lehmann

(Feed-Forward) Neural Networks Dr. Hajira Jabeen, Prof. Jens Lehmann (Feed-Forward) Neural Networks 2016-12-06 Dr. Hajira Jabeen, Prof. Jens Lehmann Outline In the previous lectures we have learned about tensors and factorization methods. RESCAL is a bilinear model for

More information

Artificial Neural Networks The Introduction

Artificial Neural Networks The Introduction Artificial Neural Networks The Introduction 01001110 01100101 01110101 01110010 01101111 01101110 01101111 01110110 01100001 00100000 01110011 01101011 01110101 01110000 01101001 01101110 01100001 00100000

More information

Linear Models for Classification

Linear Models for Classification Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,

More information

Neural Networks with Applications to Vision and Language. Feedforward Networks. Marco Kuhlmann

Neural Networks with Applications to Vision and Language. Feedforward Networks. Marco Kuhlmann Neural Networks with Applications to Vision and Language Feedforward Networks Marco Kuhlmann Feedforward networks Linear separability x 2 x 2 0 1 0 1 0 0 x 1 1 0 x 1 linearly separable not linearly separable

More information

CMSC 421: Neural Computation. Applications of Neural Networks

CMSC 421: Neural Computation. Applications of Neural Networks CMSC 42: Neural Computation definition synonyms neural networks artificial neural networks neural modeling connectionist models parallel distributed processing AI perspective Applications of Neural Networks

More information

Deep Feedforward Networks

Deep Feedforward Networks Deep Feedforward Networks Yongjin Park 1 Goal of Feedforward Networks Deep Feedforward Networks are also called as Feedforward neural networks or Multilayer Perceptrons Their Goal: approximate some function

More information

In: Advances in Intelligent Data Analysis (AIDA), International Computer Science Conventions. Rochester New York, 1999

In: Advances in Intelligent Data Analysis (AIDA), International Computer Science Conventions. Rochester New York, 1999 In: Advances in Intelligent Data Analysis (AIDA), Computational Intelligence Methods and Applications (CIMA), International Computer Science Conventions Rochester New York, 999 Feature Selection Based

More information

Artificial Neural Networks (ANN)

Artificial Neural Networks (ANN) Artificial Neural Networks (ANN) Edmondo Trentin April 17, 2013 ANN: Definition The definition of ANN is given in 3.1 points. Indeed, an ANN is a machine that is completely specified once we define its:

More information

Data Mining and Analysis: Fundamental Concepts and Algorithms

Data Mining and Analysis: Fundamental Concepts and Algorithms Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA

More information

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others)

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others) Machine Learning Neural Networks (slides from Domingos, Pardo, others) For this week, Reading Chapter 4: Neural Networks (Mitchell, 1997) See Canvas For subsequent weeks: Scaling Learning Algorithms toward

More information

Machine Learning Recitation 8 Oct 21, Oznur Tastan

Machine Learning Recitation 8 Oct 21, Oznur Tastan Machine Learning 10601 Recitation 8 Oct 21, 2009 Oznur Tastan Outline Tree representation Brief information theory Learning decision trees Bagging Random forests Decision trees Non linear classifier Easy

More information

ECE662: Pattern Recognition and Decision Making Processes: HW TWO

ECE662: Pattern Recognition and Decision Making Processes: HW TWO ECE662: Pattern Recognition and Decision Making Processes: HW TWO Purdue University Department of Electrical and Computer Engineering West Lafayette, INDIANA, USA Abstract. In this report experiments are

More information