Machine Learning - TP


Nathalie Villa-Vialaneix - nathalie.villa@univ-paris1.fr
IUT STID (Carcassonne) & SAMM (Université Paris 1)
Formation INRA, Niveau 3

Packages in R

R is provided with basic functions, but more than 3,000 packages offering additional functions are available on the CRAN (Comprehensive R Archive Network); see also the Bioconductor project.

Installing new packages (has to be done only once):
with the command line:
install.packages(c("nnet", "e1071", "rpart", "car", "randomForest"))
or with the menu (Windows or Mac OS X).

Loading a package (has to be done each time R is restarted):
library(nnet)
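As a small aside (not part of the original slides), the installation step can be left in a script and re-run safely if only the missing packages are installed; a minimal sketch:

# Sketch: install only the packages that are not already present
pkgs <- c("nnet", "e1071", "rpart", "car", "randomForest")
missing <- pkgs[!pkgs %in% rownames(installed.packages())]
if (length(missing) > 0) install.packages(missing)
# then load all of them at once
invisible(lapply(pkgs, library, character.only = TRUE))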

Working directory

All files, organized in proper directories/subdirectories, can be downloaded either as individual files or as a full zip file, inra-package.zip.

The companion script ML-scriptR.R is meant to be run from the subdirectory ML/TP of the provided material. For the computers in the classroom, this can be set by the following command line:
setwd("/home/fp/Bureau/inra-package/ML/TP")

If you are using Windows or Mac OS X, you can also choose to do it from the menu (Fichier / Définir le répertoire de travail). In any case, you must adapt the working directory if you want to run the script on another computer!

Outline
1 Introduction: Data importation and exploration
2 Neural networks
3 CART
4 Random forest

Introduction: Data importation and exploration

Use case description

Data kindly provided by Laurence Liaubet and described in [Liaubet et al., 2011]:
microarray data: expression of 272 selected genes over 57 individuals (pigs);
a phenotype of interest (muscle pH) measured over the same 57 individuals (numerical variable).

File 1: gene expressions; file 2: muscle pH.

Load data

# Loading gene expressions
d <- read.table("../../Data/sel_data.csv", sep = ";", header = T, row.names = 1, dec = ",")
dim(d)
names(d)
summary(d)

# Loading pH
ph <- scan("../../Data/sel_ph.csv", dec = ",")
length(ph)
summary(ph)

Basic analysis

# Data distribution
layout(matrix(c(1, 1, 2), ncol = 3))
boxplot(d, main = "genes expressions")
boxplot(ph, main = "ph")

Correlation analysis

# Correlation analysis
heatmap(as.matrix(d))

Correlation with pH

# Correlation with pH
ph.cor <- function(x) { cor(x, ph) }
totalcor <- apply(d, 1, ph.cor)
hist(totalcor, main = "Correlations between genes expression\n and ph", xlab = "Correlations")
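A small follow-up (an addition, not in the original script): once totalcor is computed, the genes most correlated (in absolute value) with pH can be listed directly, which is often more useful than the histogram alone:

# Sketch: the 10 genes whose expression is most correlated with pH
top <- order(abs(totalcor), decreasing = TRUE)[1:10]
data.frame(gene = names(totalcor)[top], correlation = totalcor[top])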

Classification and regression tasks

1 Regression: predict pH (a numerical variable) from gene expressions. Useful to:
help assess the strength of the relation between gene expressions and pH;
help understand the nature of the relation.

2 Classification: predict whether the pH is smaller or greater than 5.7 from gene expressions (toy example).

# Classes definition
ph.classes <- rep(0, length(ph))
ph.classes[ph > 5.7] <- 1
table(ph.classes)
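An equivalent one-liner (a stylistic aside, not from the slides): a logical comparison coerced to integer gives the same 0/1 coding in a single step:

ph.classes <- as.integer(ph > 5.7)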

Train/Test split: regression framework

# Initialization and training sample selection
set.seed()
training <- sample(1:ncol(d), round(0.8 * ncol(d)), replace = F)

# Matrices definition
d.train <- t(d[, training])
ph.train <- ph[training]
d.test <- t(d[, -training])
ph.test <- ph[-training]

# Data frames definition
r.train <- cbind(d.train, ph.train)
r.test <- cbind(d.test, ph.test)
colnames(r.train) <- c(colnames(d.train), "ph")
colnames(r.test) <- c(colnames(d.train), "ph")
r.train <- data.frame(r.train)
r.test <- data.frame(r.test)

Train/Test split: classification framework

# Vectors definition
phc.train <- ph.classes[training]
phc.test <- ph.classes[-training]

# Data frames definition
c.train <- cbind(d.train, phc.train)
c.test <- cbind(d.test, phc.test)
colnames(c.train) <- c(colnames(d.train), "phc")
colnames(c.test) <- c(colnames(d.train), "phc")
c.train <- data.frame(c.train)
c.test <- data.frame(c.test)

# Transforming phc into a factor
c.train$phc <- factor(c.train$phc)
c.test$phc <- factor(c.test$phc)

Training / Test sets (short) analysis

layout(matrix(c(1, 1, 2, 3, 3, 4), ncol = 3, nrow = 2, byrow = T))
boxplot(d.train, main = "genes expressions (train)")
boxplot(ph.train, main = "ph (train)")
boxplot(d.test, main = "genes expressions (test)")
boxplot(ph.test, main = "ph (test)")
table(phc.train)
table(phc.test)

Neural networks

Loading the data (to be consistent) and the nnet library

# Loading the data
load("../../Data/train-test.Rdata")

MLPs are unusable with a large number of predictors (here: 272 predictors for only 45 observations in the training set), hence the selection of a relevant subset of variables (with the LASSO; see the sketch below):

# Loading the subset of predictors
load("../../Data/selected.Rdata")

MLPs are provided in the nnet package:

# Loading the nnet library
library(nnet)
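The slides load a pre-computed subset; as a rough illustration (an assumption, since the code that actually produced selected.Rdata is not shown), such a subset could be obtained with the LASSO via the glmnet package, keeping the predictors with non-zero coefficients:

# Hedged sketch of a LASSO-based selection (glmnet assumed installed;
# the seed is arbitrary, not the one used to build selected.Rdata)
library(glmnet)
set.seed(42)
cv <- cv.glmnet(d.train, ph.train, alpha = 1)
coefs <- as.matrix(coef(cv, s = "lambda.min"))
selected.sketch <- which(coefs[-1, 1] != 0)  # drop the intercept row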

Simple use: MLP in the regression framework

Training:
# Simple use: MLP with p = 3 and no decay
set.seed()
nn1 <- nnet(d.train[, selected], ph.train, size = 3, decay = 0, maxit = 500, linout = T)

Analysis:
print(nn1)
summary(nn1)

# Training error and pseudo-R2
mean((ph.train - nn1$fitted)^2)
1 - mean((ph.train - nn1$fitted)^2) / var(ph.train)

# Predictions (test set)
pred.test <- predict(nn1, d.test[, selected])

# Test error and pseudo-R2
mean((ph.test - pred.test)^2)
1 - mean((ph.test - pred.test)^2) / var(ph.train)
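Since the same error computations recur throughout the session, a small helper (an addition, not from the slides) avoids retyping them; pseudo-R2 is defined here, as in the script, as 1 minus the MSE divided by the variance of the training pH:

# Helper: MSE and pseudo-R2 for a vector of predictions
perf <- function(obs, pred) {
  mse <- mean((obs - pred)^2)
  c(MSE = mse, pseudoR2 = 1 - mse / var(ph.train))
}
perf(ph.train, nn1$fitted)
perf(ph.test, pred.test)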

Predictions vs observations

plot(ph.train, nn1$fitted, xlab = "Observations", ylab = "Fitted values", main = "", pch = 3, col = "blue")
points(ph.test, pred.test, pch = 19, col = "darkred")
legend("bottomright", pch = c(3, 19), col = c("blue", "darkred"), legend = c("Train", "Test"))
abline(0, 1, col = "darkgreen")

Check results variability

set.seed()
nn2 <- nnet(d.train[, selected], ph.train, size = 3, decay = 0, maxit = 500, linout = T)
...

Summary:
      Train    Test
nn1   30.8%    44.9%
nn2   99.8%    -34!!
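To make the variability visible beyond two runs, here is a sketch (not in the original script; the seeds are arbitrary) that retrains the same MLP with several seeds and collects the test pseudo-R2 values:

# Sketch: test pseudo-R2 of the same MLP over several random seeds
r2 <- sapply(1:5, function(s) {
  set.seed(s)
  nn <- nnet(d.train[, selected], ph.train, size = 3, decay = 0,
             maxit = 500, linout = TRUE, trace = FALSE)
  p <- predict(nn, d.test[, selected])
  1 - mean((ph.test - p)^2) / var(ph.train)
})
summary(r2)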

Simple use: MLP in the classification framework

Training:
# Simple use: MLP with p = 3 and no decay
set.seed()
nnc1 <- nnet(phc ~ ., data = c.train, size = 3, decay = 0, maxit = 500, linout = F)

Analysis:
print(nnc1)
summary(nnc1)

# Predictions
nnc1$fitted

# Recoding
pred.train <- rep(0, length(nnc1$fitted))
pred.train[nnc1$fitted > 0.5] <- 1

# Training error
table(pred.train, c.train$phc)
sum(pred.train != c.train$phc) / length(pred.train)

Test error

# Predictions and recoding
raw.pred.test <- predict(nnc1, c.test)
pred.test <- rep(0, length(raw.pred.test))
pred.test[raw.pred.test > 0.5] <- 1

# Test error
table(pred.test, c.test$phc)
sum(pred.test != c.test$phc) / length(pred.test)

Summary: confusion tables (predictions vs observations) for the train and test sets; overall misclassification rates: 2.22% (train) and 45.45% (test).

Tuning MLP with the e1071 package

Search for the best parameters with 10-fold CV:
library(e1071)
set.seed(1643)
t.nnc2 <- tune.nnet(phc ~ ., data = c.train[, c(selected, ncol(c.train))],
                    size = c(2, 5, 10), decay = 10^(-c(10, 8, 6, 4, 2)),
                    maxit = 500, linout = F,
                    tunecontrol = tune.control(nrepeat = 5, sampling = "cross", cross = 10))

Basic analysis of the output:
# Looking for the best parameters
plot(t.nnc2)

Best parameters?

summary(t.nnc2)
t.nnc2$best.parameters

# Selecting the best MLP
nnc2 <- t.nnc2$best.model

Results analysis

# Training error
pred.train <- rep(0, length(nnc2$fitted))
pred.train[nnc2$fitted > 0.5] <- 1
table(pred.train, c.train$phc)
sum(pred.train != c.train$phc) / length(pred.train)

# Predictions and test error
raw.pred.test <- predict(nnc2, c.test)
pred.test <- rep(0, length(raw.pred.test))
pred.test[raw.pred.test > 0.5] <- 1
table(pred.test, c.test$phc)
sum(pred.test != c.test$phc) / length(pred.test)

Summary:
       Train    Test
nnc1   2.22%    45.45%
nnc2   0%       22.27%

CART

Regression tree training

Training with the rpart package:
library(rpart)
# Regression tree
tree1 <- rpart(ph ~ ., data = r.train, control = rpart.control(minsplit = 2))

Basic analysis of the output:
print(tree1)
summary(tree1)
plot(tree1)
text(tree1)
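An optional complement (not in the original script): rpart also supports cost-complexity pruning, the usual CART way of limiting overfitting; the cp value minimizing the cross-validated error can be read from the cp table:

# Sketch: cost-complexity pruning of tree1
printcp(tree1)  # cross-validated error for each complexity value
best.cp <- tree1$cptable[which.min(tree1$cptable[, "xerror"]), "CP"]
tree1.pruned <- prune(tree1, cp = best.cp)
plot(tree1.pruned); text(tree1.pruned)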

Resulting tree

tree1$where  # leaf number
tree1$frame  # nodes features

Performance analysis

# Training predictions and error
pred.train <- tree1$frame$yval[tree1$where]
mean((pred.train - r.train$ph)^2)
1 - mean((pred.train - r.train$ph)^2) / var(ph.train)

# Test predictions and error
pred.test <- predict(tree1, r.test)
mean((pred.test - r.test$ph)^2)
1 - mean((pred.test - r.test$ph)^2) / var(ph.train)

# Fitted values vs True values
plot(ph.train, pred.train, xlab = "Observations", ylab = "Fitted values", main = "", pch = 3, col = "blue")
points(ph.test, pred.test, pch = 19, col = "darkred")
legend("bottomright", pch = c(3, 19), col = c("blue", "darkred"), legend = c("Train", "Test"))
abline(0, 1, col = "darkgreen")

Summary

Numerical performance:
        Train    Test
tree1   96.6%    11.6%
nn2     99.8%    -34!!

Classification tree training and tuning

Training with the rpart package and tuning with the e1071 package:
# Random seed initialization
set.seed()
# Tuning the minimal number of observations in a node
# (minsplit; default value is 20)
t.treec1 <- tune.rpart(phc ~ ., data = c.train, minsplit = c(4, 6, 8, 10),
                       tunecontrol = tune.control(sampling = "bootstrap", nboot = 20))

Basic analysis of the output:
plot(t.treec1)

Tuning results

t.treec1$best.parameters
treec1 <- t.treec1$best.model

Basic analysis of the best tree

summary(treec1)
plot(treec1)
text(treec1)
treec1$where  # leaf number
treec1$frame  # nodes features

Predictions and errors

Training set:
# Make the prediction
pred.train <- predict(treec1, c.train)
# Find out which class is predicted
pred.train <- apply(pred.train, 1, which.max)
library(car)  # to have the "recode" function
pred.train <- recode(pred.train, "2=1;1=0")
# Calculate misclassification error
table(pred.train, phc.train)
sum(pred.train != phc.train) / length(pred.train)

Test set:
pred.test <- predict(treec1, c.test)
pred.test <- apply(pred.test, 1, which.max)
pred.test <- recode(pred.test, "2=1;1=0")
table(pred.test, phc.test)
sum(pred.test != phc.test) / length(pred.test)
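A simpler alternative (an aside, not from the slides): predict() on an rpart classification tree can return the predicted class directly with type = "class", which makes the which.max / recode steps unnecessary:

# Sketch: direct class predictions for the test set
pred.test2 <- predict(treec1, c.test, type = "class")
table(pred.test2, phc.test)
sum(pred.test2 != phc.test) / length(pred.test2)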

Classification performance summary

Overall misclassification rates:
         Train    Test
nnc1     2.22%    45.45%
nnc2     0%       22.27%
treec1   ...      ...

Random forest

Random forest training (regression framework)

Training with the randomForest package:
library(randomForest)
set.seed()
rf1 <- randomForest(d.train, ph.train, importance = T, keep.forest = T)

Basic analysis of the output:
plot(rf1)
rf1$ntree
rf1$mtry
rf1$importance
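Instead of keeping the default mtry, the randomForest package also ships a small search utility; a sketch of its use (not in the slides; the seed is arbitrary):

# Sketch: search for an mtry value with low out-of-bag error
set.seed(1)
tuneRF(d.train, ph.train, stepFactor = 1.5, improve = 0.01)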

Predictions and errors

# Training set (oob error and training error)
mean((ph.train - rf1$predicted)^2)
rf1$mse
1 - mean((ph.train - rf1$predicted)^2) / var(ph.train)
1 - mean((ph.train - predict(rf1, d.train))^2) / var(ph.train)

# Test set
pred.test <- predict(rf1, d.test)
mean((ph.test - pred.test)^2)
1 - mean((ph.test - pred.test)^2) / var(ph.train)

Summary:
      Train    Test
nn1   30.8%    44.9%
nn2   99.8%    -34
rf1   84.5%    51.2%

Predictions vs observations

plot(ph.train, rf1$predicted, xlab = "Observations", ylab = "Fitted values", main = "", pch = 3, col = "blue")
points(ph.test, pred.test, pch = 19, col = "darkred")
legend("bottomright", pch = c(3, 19), col = c("blue", "darkred"), legend = c("Train", "Test"))
abline(0, 1, col = "darkgreen")

Importance analysis

layout(matrix(c(1, 2), ncol = 2))
barplot(t(rf1$importance[, 1]), xlab = "variables", ylab = "% Inc. MSE", col = "darkred", las = 2, names = rep(NA, nrow(rf1$importance)))
barplot(t(rf1$importance[, 2]), xlab = "variables", ylab = "Inc. Node Purity", col = "darkred", las = 2, names = rep(NA, nrow(rf1$importance)))
which(rf1$importance[, 1] > 0.0005)
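The same information can also be displayed with the package's built-in helper, a one-line alternative to the barplots above:

varImpPlot(rf1)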

Random forest training (classification framework)

Training with the randomForest package (advanced features):
set.seed()
rfc1 <- randomForest(d.train, factor(phc.train), ntree = 5000, mtry = 150,
                     sampsize = 40, nodesize = 2,
                     xtest = d.test, ytest = factor(phc.test),
                     importance = T, keep.forest = F)

Basic analysis of the output:
plot(rfc1)
rfc1$importance

Predictions and errors

# Training set (oob error and training error)
table(rfc1$predicted, phc.train)
sum(rfc1$predicted != phc.train) / length(phc.train)

# Test set
table(rfc1$test$predicted, phc.test)
sum(rfc1$test$predicted != phc.test) / length(phc.test)

Overall misclassification rates:
         Train    Test
nnc1     2.22%    45.45%
nnc2     0%       22.27%
treec1   ...      ...
rfc1     ...      36.36%

Importance analysis

# Importance analysis
layout(matrix(c(1, 2), ncol = 2))
barplot(t(rfc1$importance[, 3]), xlab = "variables", ylab = "Mean Dec. Accu.", col = "darkred", las = 2, names = rep(NA, nrow(rfc1$importance)))
barplot(t(rfc1$importance[, 4]), xlab = "variables", ylab = "Mean Dec. Gini", col = "darkred", las = 2, names = rep(NA, nrow(rfc1$importance)))
which(rfc1$importance[, 3] > 0.002)

References

Liaubet, L., Lobjois, V., Faraut, T., Tircazes, A., Benne, F., Iannuccelli, N., Pires, J., Glénisson, J., Robic, A., Le Roy, P., SanCristobal, M., and Cherel, P. (2011). Genetic variability of transcript abundance in pig peri-mortem skeletal muscle: eQTL localized genes involved in stress response, cell death, muscle disorders and metabolism. BMC Genomics, 12:548.
