Machine Learning - TP
1 Machine Learning - TP
Nathalie Villa-Vialaneix - nathalie.villa@univ-paris1.fr
IUT STID (Carcassonne) & SAMM (Université Paris 1)
Formation INRA, Niveau 3
Formation INRA (Niveau 3) Machine Learning Nathalie Villa-Vialaneix 1 / 42
2 Packages
R is provided with basic functions, but more than 3,000 packages are available on the CRAN (Comprehensive R Archive Network) for additional functions (see also the Bioconductor project).
Installing new packages (has to be done only once):
with the command line:
install.packages(c("nnet", "e1071", "rpart", "car", "randomForest"))
or with the menu (Windows or Mac OS X).
Loading a package (has to be done each time R is re-started):
library(nnet)
3 Working directory
All files in proper directories/subdirectories can be downloaded at: either as individual files or as a full zip file, inra-package.zip. The companion script ML-scriptR.R is made to be run from the subdirectory ML/TP of the provided material. For the computers in the classroom, this can be set by the following command line:
setwd("/home/fp/Bureau/inra-package/ML/TP")
If you are using Windows or Mac OS X, you can also choose to do it from the menu (in Fichier / Définir le répertoire de travail). In any case, you must adapt the working directory if you want to run the script on another computer!
4 Outline
1 Introduction: Data importation and exploration
2 Neural networks
3 CART
4 Random forest
5 Introduction: Data importation and exploration
Outline
1 Introduction: Data importation and exploration
2 Neural networks
3 CART
4 Random forest
6 Introduction: Data importation and exploration
Use case description
Data kindly provided by Laurence Liaubet, described in [Liaubet et al., 2011]:
microarray data: expression of 272 selected genes over 57 individuals (pigs);
a phenotype of interest (muscle pH) measured over the 57 individuals (numerical variable).
file 1: gene expressions; file 2: muscle pH
7 Introduction: Data importation and exploration
Load data
# Loading gene expressions
d <- read.table("../../Data/sel_data.csv", sep=";", header=T, row.names=1, dec=",")
dim(d)
names(d)
summary(d)
# Loading pH
ph <- scan("../../Data/sel_ph.csv", dec=",")
length(ph)
summary(ph)
8 Introduction: Data importation and exploration
Basic analysis
# Data distribution
layout(matrix(c(1,1,2), ncol=3))
boxplot(d, main="genes expressions")
boxplot(ph, main="ph")
9 Introduction: Data importation and exploration
Correlation analysis
# Correlation analysis
heatmap(as.matrix(d))
10 Introduction: Data importation and exploration
Correlation with pH
# Correlation with pH
ph.cor <- function(x) { cor(x, ph) }
totalcor <- apply(d, 1, ph.cor)
hist(totalcor, main="Correlations between genes expression \n and ph", xlab="Correlations")
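Since the TP itself is in R, the following is only a language-agnostic illustration (Python, on hypothetical toy vectors rather than the real expression data) of the per-gene computation that apply(d, 1, ph.cor) performs: the Pearson correlation of each expression profile with the pH vector.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient, as computed by R's cor()."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy data: two hypothetical "genes" measured on five "individuals".
genes = {"g1": [1.0, 2.0, 3.0, 4.0, 5.0],   # increases with pH
         "g2": [5.0, 4.0, 3.0, 2.0, 1.0]}   # decreases with pH
ph = [5.5, 5.6, 5.7, 5.8, 5.9]

# One correlation per gene, like totalcor in the R script.
total_cor = {g: pearson(expr, ph) for g, expr in genes.items()}
```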
11 Introduction: Data importation and exploration
Classification and regression tasks
1 Regression: predict pH (numerical variable) from gene expressions. Useful to:
help verify the strength of the relation between gene expressions and pH;
help understand the nature of the relation.
2 Classification: predict whether the pH is smaller or greater than 5.7 from gene expressions (toy example).
# Classes definition
ph.classes <- rep(0, length(ph))
ph.classes[ph > 5.7] <- 1
table(ph.classes)
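The same thresholding, sketched in Python on hypothetical pH values (the R code does it with logical indexing on a vector of zeros):

```python
# Hypothetical pH measurements; class 1 when pH > 5.7, class 0 otherwise.
ph = [5.5, 5.65, 5.71, 5.9, 5.7]
ph_classes = [1 if v > 5.7 else 0 for v in ph]

# Class counts, as table(ph.classes) would report them.
counts = {c: ph_classes.count(c) for c in (0, 1)}
```

Note that a value exactly equal to 5.7 falls in class 0, because the R code uses a strict inequality.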
12 Introduction: Data importation and exploration
Train/Test split: regression framework
# Initialization and training sample selection
set.seed( )
training <- sample(1:ncol(d), round(0.8 * ncol(d)), replace=F)
# Matrices definition
d.train <- t(d[, training])
ph.train <- ph[training]
d.test <- t(d[, -training])
ph.test <- ph[-training]
# Data frames definition
r.train <- cbind(d.train, ph.train)
r.test <- cbind(d.test, ph.test)
colnames(r.train) <- c(colnames(d.train), "ph")
colnames(r.test) <- c(colnames(d.train), "ph")
r.train <- data.frame(r.train)
r.test <- data.frame(r.test)
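The split logic itself (80% of the columns drawn without replacement for training, the rest for test) can be sketched in Python on a hypothetical set of 10 individuals; in R, sample() does the drawing and negative indexing gives the complement:

```python
import random

random.seed(42)  # fixed seed so the split is reproducible

indices = list(range(10))              # toy example: 10 individuals
k = round(0.8 * len(indices))          # 80% for training -> 8
training = random.sample(indices, k)   # drawn without replacement
test = [i for i in indices if i not in training]
```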
13 Introduction: Data importation and exploration
Train/Test split: classification framework
# Vectors definition
phc.train <- ph.classes[training]
phc.test <- ph.classes[-training]
# Data frames definition
c.train <- cbind(d.train, phc.train)
c.test <- cbind(d.test, phc.test)
colnames(c.train) <- c(colnames(d.train), "phc")
colnames(c.test) <- c(colnames(d.train), "phc")
c.train <- data.frame(c.train)
c.test <- data.frame(c.test)
# Transforming phc into a factor
c.train$phc <- factor(c.train$phc)
c.test$phc <- factor(c.test$phc)
14 Introduction: Data importation and exploration
Training / Test sets (short) analysis
layout(matrix(c(1,1,2,3,3,4), ncol=3, nrow=2, byrow=T))
boxplot(d.train, main="genes expressions (train)")
boxplot(ph.train, main="ph (train)")
boxplot(d.test, main="genes expressions (test)")
boxplot(ph.test, main="ph (test)")
table(phc.train)
table(phc.test)
15 Neural networks
Outline
1 Introduction: Data importation and exploration
2 Neural networks
3 CART
4 Random forest
16 Neural networks
Loading the data (to be consistent) and the nnet library
# Loading the data
load("../../Data/train-test.Rdata")
MLPs are unusable with a large number of predictors (here: 272 predictors for 45 observations only in the training set), hence the selection of a relevant subset of variables (with LASSO):
# Loading the subset of predictors
load("../../Data/selected.Rdata")
MLPs are provided in the nnet package:
# Loading the nnet library
library(nnet)
17 Neural networks
Simple use: MLP in the regression framework
Training
# Simple use: MLP with p=3 and no decay
set.seed( )
nn1 <- nnet(d.train[, selected], ph.train, size=3, decay=0, maxit=500, linout=T)
Analysis
print(nn1)
summary(nn1)
# Training error and pseudo-R2
mean((ph.train - nn1$fitted)^2)
1 - mean((ph.train - nn1$fitted)^2) / var(ph.train)
# Predictions (test set)
pred.test <- predict(nn1, d.test[, selected])
# Test error and pseudo-R2
mean((ph.test - pred.test)^2)
1 - mean((ph.test - pred.test)^2) / var(ph.train)
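The two quantities computed on this slide and reused throughout the TP are the mean squared error and the pseudo-R2, one minus the MSE divided by the variance of the training response. A minimal Python sketch on hypothetical fitted values (the pseudo-R2 can be negative when a model does worse than predicting the training mean):

```python
def mse(obs, fitted):
    """Mean squared error, as in mean((obs - fitted)^2)."""
    return sum((o - f) ** 2 for o, f in zip(obs, fitted)) / len(obs)

def pseudo_r2(obs, fitted, train_var):
    """1 - MSE / var(ph.train); negative when worse than the mean."""
    return 1 - mse(obs, fitted) / train_var

# Hypothetical training responses and a (perfect) set of fitted values.
ph_train = [5.5, 5.6, 5.8, 5.9]
fitted = [5.5, 5.6, 5.8, 5.9]

# Sample variance with denominator n-1, as R's var() computes it.
m = sum(ph_train) / len(ph_train)
var = sum((v - m) ** 2 for v in ph_train) / (len(ph_train) - 1)
```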
18 Neural networks
Predictions vs observations
plot(ph.train, nn1$fitted, xlab="Observations", ylab="Fitted values", main="", pch=3, col="blue")
points(ph.test, pred.test, pch=19, col="darkred")
legend("bottomright", pch=c(3,19), col=c("blue", "darkred"), legend=c("Train", "Test"))
abline(0, 1, col="darkgreen")
19 Neural networks
Check results variability
set.seed( )
nn2 <- nnet(d.train[, selected], ph.train, size=3, decay=0, maxit=500, linout=T)
...
Summary
      Train   Test
nn1   30.8%   44.9%
nn2   99.8%   -34!!
20 Neural networks
Simple use: MLP in the classification framework
Training
# Simple use: MLP with p=3 and no decay
set.seed( )
nnc1 <- nnet(phc ~ ., data=c.train, size=3, decay=0, maxit=500, linout=F)
Analysis
print(nnc1)
summary(nnc1)
# Predictions
nnc1$fitted
# Recoding
pred.train <- rep(0, length(nnc1$fitted))
pred.train[nnc1$fitted > 0.5] <- 1
# Training error
table(pred.train, c.train$phc)
sum(pred.train != c.train$phc) / length(pred.train)
21 Neural networks
Test error
# Predictions and recoding
raw.pred.test <- predict(nnc1, c.test)
pred.test <- rep(0, length(raw.pred.test))
pred.test[raw.pred.test > 0.5] <- 1
# Test error
table(pred.test, c.test$phc)
sum(pred.test != c.test$phc) / length(pred.test)
Summary
[confusion tables (Predictions vs Observations, classes 0/1) for train and test: counts not recovered in the transcription]
Overall misclassification rate: 2.22% (train), 45.45% (test)
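The misclassification rate that the R code obtains from table() and the != comparison can be sketched in Python on hypothetical 0/1 predictions:

```python
def misclassification_rate(pred, obs):
    """Fraction of predictions that differ from the observed class."""
    return sum(p != o for p, o in zip(pred, obs)) / len(pred)

# Hypothetical predictions against observed classes.
pred = [0, 1, 1, 0, 1]
obs  = [0, 1, 0, 0, 0]

# Confusion counts, as table(pred, obs) would tabulate them.
confusion = {(p, o): 0 for p in (0, 1) for o in (0, 1)}
for p, o in zip(pred, obs):
    confusion[(p, o)] += 1

rate = misclassification_rate(pred, obs)
```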
22 Neural networks
Tuning MLP with the e1071 package
Search for the best parameters with 10-fold CV
library(e1071)
set.seed(1643)
t.nnc2 <- tune.nnet(phc ~ ., data=c.train[, c(selected, ncol(c.train))], size=c(2,5,10), decay=10^(-c(10,8,6,4,2)), maxit=500, linout=F, tunecontrol=tune.control(nrepeat=5, sampling="cross", cross=10))
Basic analysis of the output
# Looking for the best parameters
plot(t.nnc2)
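tune.nnet evaluates each (size, decay) pair by 10-fold cross-validation, repeated 5 times. The fold partitioning behind this can be sketched in Python (a hypothetical helper; in the TP the resampling is handled internally by tune.control):

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k (nearly) equal folds."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

# 45 hypothetical training observations split into 10 folds.
folds = k_fold_indices(45, 10)
```

Each model is then trained k times, each time holding out one fold as a validation set, and the k validation errors are averaged.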
23 Neural networks
Best parameters?
summary(t.nnc2)
t.nnc2$best.parameters
# Selecting the best MLP
nnc2 <- t.nnc2$best.model
24 Neural networks
Results analysis
# Training error
pred.train <- rep(0, length(nnc2$fitted))
pred.train[nnc2$fitted > 0.5] <- 1
table(pred.train, c.train$phc)
sum(pred.train != c.train$phc) / length(pred.train)
# Predictions and test error
raw.pred.test <- predict(nnc2, c.test)
pred.test <- rep(0, length(raw.pred.test))
pred.test[raw.pred.test > 0.5] <- 1
table(pred.test, c.test$phc)
sum(pred.test != c.test$phc) / length(pred.test)
Summary
      Train   Test
nnc1  2.22%   45.45%
nnc2  0%      22.27%
25 CART
Outline
1 Introduction: Data importation and exploration
2 Neural networks
3 CART
4 Random forest
26 CART
Regression tree training
Training with the rpart package
library(rpart)
# Regression tree
tree1 <- rpart(ph ~ ., data=r.train, control=rpart.control(minsplit=2))
Basic analysis of the output
print(tree1)
summary(tree1)
plot(tree1)
text(tree1)
27 CART
Resulting tree
tree1$where  # leaf number
tree1$frame  # nodes features
28 CART
Performance analysis
# Training predictions and error
pred.train <- tree1$frame$yval[tree1$where]
mean((pred.train - r.train$ph)^2)
1 - mean((pred.train - r.train$ph)^2) / var(ph.train)
# Test predictions and error
pred.test <- predict(tree1, r.test)
mean((pred.test - r.test$ph)^2)
1 - mean((pred.test - r.test$ph)^2) / var(ph.train)
# Fitted values vs True values
plot(ph.train, pred.train, xlab="Observations", ylab="Fitted values", main="", pch=3, col="blue")
points(ph.test, pred.test, pch=19, col="darkred")
legend("bottomright", pch=c(3,19), col=c("blue", "darkred"), legend=c("Train", "Test"))
abline(0, 1, col="darkgreen")
29 CART
Summary
Numerical performance
       Train   Test
tree1  96.6%   11.6%
nn2    99.8%   -34!!
30 CART
Classification tree training and tuning
Training with the rpart package and tuning with the e1071 package
# Random seed initialization
set.seed( )
# Tuning the minimal number of observations in a node (minsplit; default value is 20)
t.treec1 <- tune.rpart(phc ~ ., data=c.train, minsplit=c(4,6,8,10), tunecontrol=tune.control(sampling="bootstrap", nboot=20))
Basic analysis of the output
plot(t.treec1)
31 CART
Tuning results
t.treec1$best.parameters
treec1 <- t.treec1$best.model
32 CART
Basic analysis of the best tree
summary(treec1)
plot(treec1)
text(treec1)
treec1$where  # leaf number
treec1$frame  # nodes features
33 CART
Predictions and errors
Training set
# Make the prediction
pred.train <- predict(treec1, c.train)
# Find out which class is predicted
pred.train <- apply(pred.train, 1, which.max)
library(car)  # to have the "recode" function
pred.train <- recode(pred.train, "2=1;1=0")
# Calculate misclassification error
table(pred.train, phc.train)
sum(pred.train != phc.train) / length(pred.train)
Test set
pred.test <- predict(treec1, c.test)
pred.test <- apply(pred.test, 1, which.max)
pred.test <- recode(pred.test, "2=1;1=0")
table(pred.test, phc.test)
sum(pred.test != phc.test) / length(pred.test)
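The which.max plus recode step above converts a matrix of per-row class probabilities into 0/1 labels: which.max returns the 1-based column index (1 or 2), and recode maps it back to the original class labels. A Python sketch on hypothetical probabilities:

```python
# Hypothetical per-row class probabilities, columns ordered (class 0, class 1),
# like the matrix returned by predict() on a classification tree.
probs = [[0.9, 0.1], [0.2, 0.8], [0.55, 0.45]]

# which.max gives the 1-based index of the largest column in R ...
which_max = [row.index(max(row)) + 1 for row in probs]

# ... and recode(pred, "2=1;1=0") maps those indices to the 0/1 labels.
pred = [{1: 0, 2: 1}[w] for w in which_max]
```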
34 CART
Classification performance summary
Overall misclassification rates
        Train   Test
nnc1    2.22%   45.45%
nnc2    0%      22.27%
treec1  [values not recovered in the transcription]
35 Random forest
Outline
1 Introduction: Data importation and exploration
2 Neural networks
3 CART
4 Random forest
36 Random forest
Random forest training (regression framework)
Training with the randomForest package
library(randomForest)
set.seed( )
rf1 <- randomForest(d.train, ph.train, importance=T, keep.forest=T)
Basic analysis of the output
plot(rf1)
rf1$ntree
rf1$mtry
rf1$importance
37 Random forest
Predictions and errors
# Training set (oob error and training error)
mean((ph.train - rf1$predicted)^2)
rf1$mse
1 - mean((ph.train - rf1$predicted)^2) / var(ph.train)
1 - mean((ph.train - predict(rf1, d.train))^2) / var(ph.train)
# Test set
pred.test <- predict(rf1, d.test)
mean((ph.test - pred.test)^2)
1 - mean((ph.test - pred.test)^2) / var(ph.train)
Summary
      Train   Test
nn1   30.8%   44.9%
nn2   99.8%   -34
rf1   84.5%   51.2%
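The "oob" (out-of-bag) error above comes from the fact that each tree is grown on a bootstrap sample of the training set, so every observation can be predicted by the trees that did not see it. A Python sketch of which observations are out-of-bag for a single hypothetical bootstrap draw:

```python
import random

random.seed(1)  # any fixed seed; the draw itself is random

n = 45                                               # hypothetical training size
bootstrap = [random.randrange(n) for _ in range(n)]  # sample with replacement
in_bag = set(bootstrap)
oob = [i for i in range(n) if i not in in_bag]       # out-of-bag observations

# Across many draws, about 36.8% (1/e) of observations end up out-of-bag.
```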
38 Random forest
Predictions vs observations
plot(ph.train, rf1$predicted, xlab="Observations", ylab="Fitted values", main="", pch=3, col="blue")
points(ph.test, pred.test, pch=19, col="darkred")
legend("bottomright", pch=c(3,19), col=c("blue", "darkred"), legend=c("Train", "Test"))
abline(0, 1, col="darkgreen")
39 Random forest
Importance analysis
layout(matrix(c(1,2), ncol=2))
barplot(t(rf1$importance[,1]), xlab="variables", ylab="% Inc. MSE", col="darkred", las=2, names=rep(NA, nrow(rf1$importance)))
barplot(t(rf1$importance[,2]), xlab="variables", ylab="Inc. Node Purity", col="darkred", las=2, names=rep(NA, nrow(rf1$importance)))
which(rf1$importance[,1] > 0.0005)
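The %IncMSE measure plotted above rests on a permutation idea: shuffle one variable's values, re-predict, and see how much the error grows; an important variable causes a large increase. A minimal Python sketch on hypothetical toy data with a hand-written "model" (randomForest itself permutes each variable on the out-of-bag samples of each tree):

```python
def mse(y, yhat):
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)

def predict(x_row):
    """Toy 'model': predicts y from the first feature only."""
    return x_row[0]

# Hypothetical data where feature 0 perfectly determines y.
X = [[1.0, 0.3], [2.0, 0.7], [3.0, 0.1], [4.0, 0.9]]
y = [1.0, 2.0, 3.0, 4.0]

base = mse(y, [predict(r) for r in X])     # error before permuting

# Permute feature 0 (a fixed permutation for the sketch) and measure
# the increase in MSE: this is the idea behind %IncMSE.
perm = [X[i][0] for i in [2, 3, 0, 1]]
X_perm = [[perm[i], X[i][1]] for i in range(4)]
inc_mse = mse(y, [predict(r) for r in X_perm]) - base
```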
40 Random forest
Random forest training (classification framework)
Training with the randomForest package (advanced features)
set.seed( )
rfc1 <- randomForest(d.train, factor(phc.train), ntree=5000, mtry=150, sampsize=40, nodesize=2, xtest=d.test, ytest=factor(phc.test), importance=T, keep.forest=F)
Basic analysis of the output
plot(rfc1)
rfc1$importance
41 Random forest
Predictions and errors
# Training set (oob error and training error)
table(rfc1$predicted, phc.train)
sum(rfc1$predicted != phc.train) / length(phc.train)
# Test set
table(rfc1$test$predicted, phc.test)
sum(rfc1$test$predicted != phc.test) / length(phc.test)
Overall misclassification rates
        Train   Test
nnc1    2.22%   45.45%
nnc2    0%      22.27%
treec1  [values not recovered in the transcription]
rfc1    [train value not recovered]   36.36%
42 Random forest
Importance analysis
# Importance analysis
layout(matrix(c(1,2), ncol=2))
barplot(t(rfc1$importance[,3]), xlab="variables", ylab="Mean Dec. Accu.", col="darkred", las=2, names=rep(NA, nrow(rfc1$importance)))
barplot(t(rfc1$importance[,4]), xlab="variables", ylab="Mean Dec. Gini", col="darkred", las=2, names=rep(NA, nrow(rfc1$importance)))
which(rfc1$importance[,3] > 0.002)
43 Random forest
Liaubet, L., Lobjois, V., Faraut, T., Tircazes, A., Benne, F., Iannuccelli, N., Pires, J., Glénisson, J., Robic, A., Le Roy, P., SanCristobal, M., and Cherel, P. (2011). Genetic variability of transcript abundance in pig peri-mortem skeletal muscle: eQTL localized genes involved in stress response, cell death, muscle disorders and metabolism. BMC Genomics, 12(548).
Introduction to Statistics and R Mayo-Illinois Computational Genomics Workshop (2018) Ruoqing Zhu, Ph.D. Department of Statistics, UIUC rqzhu@illinois.edu June 18, 2018 Abstract This document is a supplimentary
More informationFinal Overview. Introduction to ML. Marek Petrik 4/25/2017
Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,
More informationMeasurement, Scaling, and Dimensional Analysis Summer 2017 METRIC MDS IN R
Measurement, Scaling, and Dimensional Analysis Summer 2017 Bill Jacoby METRIC MDS IN R This handout shows the contents of an R session that carries out a metric multidimensional scaling analysis of the
More informationHomework Assignment # 8
Homework Assignment # 8 36-350, Data Mining SOLUTIONS 1. The Prototype Method Really Is a Linear Classifier (a) Show that the prototype method is really a linear classifier of the form sgn b + x w, and
More informationIntroduction to R, Part I
Introduction to R, Part I Basic math, variables, and variable types Tim Frasier Copyright Tim Frasier This work is licensed under the Creative Commons Attribution 4.0 International license. Click here
More informationHypothesis Tests and Confidence Intervals Involving Fitness Landscapes fit by Aster Models By Charles J. Geyer and Ruth G. Shaw Technical Report No.
Hypothesis Tests and Confidence Intervals Involving Fitness Landscapes fit by Aster Models By Charles J. Geyer and Ruth G. Shaw Technical Report No. 674 School of Statistics University of Minnesota March
More informationSTA 411 (R. Neal) March 7, Assignment # 1
STA 411 (R. Neal) March 7, 2014 Discussion Linear Models Assignment # 1 Saša Milić 997 410 626 The average mean square errors for the linear models (for training set 1 and 2, respectively), with 8, 16,
More informationLeslie matrices and Markov chains.
Leslie matrices and Markov chains. Example. Suppose a certain species of insect can be divided into 2 classes, eggs and adults. 10% of eggs survive for 1 week to become adults, each adult yields an average
More informationBASIC TECHNOLOGY Pre K starts and shuts down computer, monitor, and printer E E D D P P P P P P P P P P
BASIC TECHNOLOGY Pre K 1 2 3 4 5 6 7 8 9 10 11 12 starts and shuts down computer, monitor, and printer P P P P P P practices responsible use and care of technology devices P P P P P P opens and quits an
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Decision Trees. Tobias Scheffer
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Decision Trees Tobias Scheffer Decision Trees One of many applications: credit risk Employed longer than 3 months Positive credit
More informationStat 401XV Final Exam Spring 2017
Stat 40XV Final Exam Spring 07 I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed ATTENTION! Incorrect numerical answers unaccompanied by supporting reasoning
More informationday month year documentname/initials 1
ECE471-571 Pattern Recognition Lecture 13 Decision Tree Hairong Qi, Gonzalez Family Professor Electrical Engineering and Computer Science University of Tennessee, Knoxville http://www.eecs.utk.edu/faculty/qi
More informationPackage noise. July 29, 2016
Type Package Package noise July 29, 2016 Title Estimation of Intrinsic and Extrinsic Noise from Single-Cell Data Version 1.0 Date 2016-07-28 Author Audrey Qiuyan Fu and Lior Pachter Maintainer Audrey Q.
More information(S1) (S2) = =8.16
Formulae for Attributes As described in the manuscript, the first step of our method to create a predictive model of material properties is to compute attributes based on the composition of materials.
More informationHierarchical Factor Analysis.
Hierarchical Factor Analysis. As published in Benchmarks RSS Matters, April 2013 http://web3.unt.edu/benchmarks/issues/2013/04/rss-matters Jon Starkweather, PhD 1 Jon Starkweather, PhD jonathan.starkweather@unt.edu
More informationSTK 2100 Oblig 1. Zhou Siyu. February 15, 2017
STK 200 Oblig Zhou Siyu February 5, 207 Question a) Make a scatter box plot for the data set. Answer:Here is the code I used to plot the scatter box in R. library ( MASS ) 2 pairs ( Boston ) Figure : Scatter
More informationLecture Slides for INTRODUCTION TO. Machine Learning. ETHEM ALPAYDIN The MIT Press,
Lecture Slides for INTRODUCTION TO Machine Learning ETHEM ALPAYDIN The MIT Press, 2004 alpaydin@boun.edu.tr http://www.cmpe.boun.edu.tr/~ethem/i2ml CHAPTER 14: Assessing and Comparing Classification Algorithms
More informationEvaluation. Andrea Passerini Machine Learning. Evaluation
Andrea Passerini passerini@disi.unitn.it Machine Learning Basic concepts requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain
More informationAssignment 4. Machine Learning, Summer term 2014, Ulrike von Luxburg To be discussed in exercise groups on May 12-14
Assignment 4 Machine Learning, Summer term 2014, Ulrike von Luxburg To be discussed in exercise groups on May 12-14 Exercise 1 (Rewriting the Fisher criterion for LDA, 2 points) criterion J(w) = w, m +
More informationTutorial of some statistical methods using R language
Tutorial of some statistical methods using R language Nobuhiko ENDO Japan Agency for Marine-Earth Science and Technology 1. Introduction Dr. Tomoshige Inoue (JAMSTEC) found that a linear relationship between
More informationGeneralized Linear Models in R
Generalized Linear Models in R NO ORDER Kenneth K. Lopiano, Garvesh Raskutti, Dan Yang last modified 28 4 2013 1 Outline 1. Background and preliminaries 2. Data manipulation and exercises 3. Data structures
More informationStatistics and learning: Big Data
Statistics and learning: Big Data Learning Decision Trees and an Introduction to Boosting Sébastien Gadat Toulouse School of Economics February 2017 S. Gadat (TSE) SAD 2013 1 / 30 Keywords Decision trees
More informationResampling techniques for statistical modeling
Resampling techniques for statistical modeling Gianluca Bontempi Département d Informatique Boulevard de Triomphe - CP 212 http://www.ulb.ac.be/di Resampling techniques p.1/33 Beyond the empirical error
More informationSmoothed Prediction of the Onset of Tree Stem Radius Increase Based on Temperature Patterns
Smoothed Prediction of the Onset of Tree Stem Radius Increase Based on Temperature Patterns Mikko Korpela 1 Harri Mäkinen 2 Mika Sulkava 1 Pekka Nöjd 2 Jaakko Hollmén 1 1 Helsinki
More informationGS Analysis of Microarray Data
GS01 0163 Analysis of Microarray Data Keith Baggerly and Kevin Coombes Department of Bioinformatics and Computational Biology UT M. D. Anderson Cancer Center kabagg@mdanderson.org kcoombes@mdanderson.org
More informationChuck Cartledge, PhD. 21 January 2018
Big Data: Data Analysis Boot Camp Non-SQL and R Chuck Cartledge, PhD 21 January 2018 1/19 Table of contents (1 of 1) 1 Intro. 2 Non-SQL DBMS Classic Non-SQL databases 3 Hands-on Airport connections as
More informationApplied Machine Learning Annalisa Marsico
Applied Machine Learning Annalisa Marsico OWL RNA Bionformatics group Max Planck Institute for Molecular Genetics Free University of Berlin 22 April, SoSe 2015 Goals Feature Selection rather than Feature
More informationData Mining und Maschinelles Lernen
Data Mining und Maschinelles Lernen Ensemble Methods Bias-Variance Trade-off Basic Idea of Ensembles Bagging Basic Algorithm Bagging with Costs Randomization Random Forests Boosting Stacking Error-Correcting
More informationMultivariate Methods in Statistical Data Analysis
Multivariate Methods in Statistical Data Analysis Web-Site: http://tmva.sourceforge.net/ See also: "TMVA - Toolkit for Multivariate Data Analysis, A. Hoecker, P. Speckmayer, J. Stelzer, J. Therhaag, E.
More informationA Brief Introduction To. GRTensor. On MAPLE Platform. A write-up for the presentation delivered on the same topic as a part of the course PHYS 601
A Brief Introduction To GRTensor On MAPLE Platform A write-up for the presentation delivered on the same topic as a part of the course PHYS 601 March 2012 BY: ARSHDEEP SINGH BHATIA arshdeepsb@gmail.com
More informationPackage NlinTS. September 10, 2018
Type Package Title Non Linear Time Series Analysis Version 1.3.5 Date 2018-09-05 Package NlinTS September 10, 2018 Maintainer Youssef Hmamouche Models for time series forecasting
More informationPackage clr. December 3, 2018
Type Package Package clr December 3, 2018 Title Curve Linear Regression via Dimension Reduction Version 0.1.0 Author Amandine Pierrot with contributions and/or help from Qiwei Yao, Haeran Cho, Yannig Goude
More informationInterpreting Regression
Interpreting Regression January 10, 2018 Contents Standard error of the estimate Homoscedasticity Using R to make interpretations about regresssion Questions In the tutorial on prediction we used the regression
More informationInterpreting Regression
Interpreting Regression January 27, 2019 Contents Standard error of the estimate Homoscedasticity Using R to make interpretations about regresssion Questions In the tutorial on prediction we used the regression
More informationThe OmicCircos usages by examples
The OmicCircos usages by examples Ying Hu and Chunhua Yan October 14, 2013 Contents 1 Introduction 2 2 Input file formats 3 2.1 segment data............................................... 3 2.2 mapping
More informationEnsembles of Classifiers.
Ensembles of Classifiers www.biostat.wisc.edu/~dpage/cs760/ 1 Goals for the lecture you should understand the following concepts ensemble bootstrap sample bagging boosting random forests error correcting
More informationComputer Assignment 8 - Discriminant Analysis. 1 Linear Discriminant Analysis
Computer Assignment 8 - Discriminant Analysis Created by James D. Wilson, UNC Chapel Hill Edited by Andrew Nobel and Kelly Bodwin, UNC Chapel Hill In this assignment, we will investigate another tool for
More informationSTAT 675 Statistical Computing
STAT 675 Statistical Computing Solutions to Homework Exercises Chapter 3 Note that some outputs may differ, depending on machine settings, generating seeds, random variate generation, etc. 3.1. Sample
More informationContents 1 Admin 2 General extensions 3 FWL theorem 4 Omitted variable bias 5 The R family Admin 1.1 What you will need Packages Data 1.
2 2 dplyr lfe readr MASS auto.csv plot() plot() ggplot2 plot() # Start the.jpeg driver jpeg("your_plot.jpeg") # Make the plot plot(x = 1:10, y = 1:10) # Turn off the driver dev.off() # Start the.pdf driver
More informationAn experimental study of the intrinsic stability of random forest variable importance measures
Wang et al. BMC Bioinformatics (2016) 17:60 DOI 10.1186/s12859-016-0900-5 RESEARCH ARTICLE Open Access An experimental study of the intrinsic stability of random forest variable importance measures Huazhen
More informationCase Study: Modelling Industrial Dryer Temperature Arun K. Tangirala 11/19/2016
Case Study: Modelling Industrial Dryer Temperature Arun K. Tangirala 11/19/2016 Background This is a case study concerning time-series modelling of the temperature of an industrial dryer. Data set contains
More informationStatistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Ensembles Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique Fédérale de Lausanne
More informationAn Introduction to the spls Package, Version 1.0
An Introduction to the spls Package, Version 1.0 Dongjun Chung 1, Hyonho Chun 1 and Sündüz Keleş 1,2 1 Department of Statistics, University of Wisconsin Madison, WI 53706. 2 Department of Biostatistics
More informationPerformance Analysis of Some Machine Learning Algorithms for Regression Under Varying Spatial Autocorrelation
Performance Analysis of Some Machine Learning Algorithms for Regression Under Varying Spatial Autocorrelation Sebastian F. Santibanez Urban4M - Humboldt University of Berlin / Department of Geography 135
More informationExercises for Applied Predictive Modeling Chapter 6 Linear Regression and Its Cousins
Exercises for Applied Predictive Modeling Chapter 6 Linear Regression and Its Cousins Max Kuhn, Kjell Johnson Version 1 January 8, 2015 The solutions in this file uses several R packages not used in the
More informationIntroduction to Machine Learning and Cross-Validation
Introduction to Machine Learning and Cross-Validation Jonathan Hersh 1 February 27, 2019 J.Hersh (Chapman ) Intro & CV February 27, 2019 1 / 29 Plan 1 Introduction 2 Preliminary Terminology 3 Bias-Variance
More informationSupplementary Information ABC random forests for Bayesian parameter inference
Supplementary Information ABC random forests for Bayesian parameter inference Louis Raynal 1, Jean-Michel Marin 1,2, Pierre Pudlo 3, Mathieu Ribatet 1, Christian P. Robert 4,5, and Arnaud Estoup 2,6 1
More information