ABC random forest for parameter estimation. Jean-Michel Marin


1 ABC random forest for parameter estimation
Jean-Michel Marin
Université de Montpellier, Institut Montpelliérain Alexander Grothendieck (IMAG), Institut de Biologie Computationnelle (IBC), Labex Numev
Joint work with Pierre Pudlo, Louis Raynal, Mathieu Ribatet and Christian Robert
ABCruise, Helsinki

2 Introduction
We consider statistical models for which no explicit form of the likelihood is available, or for which a single evaluation of the likelihood is too CPU-demanding:
- numerous heterogeneous latent variables (as with the coalescent model), or
- an intractable normalizing constant of the likelihood (as with Gibbs random fields).
We focus on Approximate Bayesian Computation (ABC) methods.

3 The principle of ABC is to conduct Bayesian inference on a dataset through comparisons with numerous simulated datasets. We assume that it is possible to generate realizations from the statistical model under consideration.
ABC suffers from two major difficulties:
- to ensure the reliability of the method, the number of simulations should be large;
- calibration has always been a critical step in ABC implementation.

4 Idea: use regression or quantile Random Forests (RF) to estimate some quantities of interest: posterior expectations, variances, quantiles or covariances.
Why Random Forests?
- RF regression and quantile methods were shown to be mostly insensitive both to strong correlations between predictors (here the summary statistics) and to the presence of noisy variables.
- Using such a strategy: fewer simulations and no calibration!

5 Extend the work of Pudlo et al. (2016) to the case of parameter estimation:
Pudlo et al. (including JMM & CPR) (2016) Reliable ABC model choice via random forests, Bioinformatics
Related methods:
- adjusted local linear: Beaumont et al. (2002) Approximate Bayesian computation in population genetics, Genetics
- ridge regression: Blum et al. (2013) A Comparative Review of Dimension Reduction Methods in Approximate Bayesian Computation, Statistical Science
- adjusted neural networks: Blum and François (2010) Non-linear regression models for Approximate Bayesian Computation, Statistics and Computing

6 Outline of the talk
- Recap on Random Forests
  1. Classification And Regression Trees (CART)
  2. Bootstrap AGGregatING (bagging)
  3. Random Forests
- The ODOF methodology
  1. Posterior Expectations
  2. Quantiles
  3. Variances
  4. Covariances
- Simulation study: a Gaussian toy example
- Simulation study: a regression toy example

7 Recap on Random Forests (inspired by Adele Cutler's course, September 15-17, 2010, Ovronnaz, Switzerland)
The work of Leo Breiman (1928-2005):
- Breiman et al. (1984) Classification and Regression Trees, Wadsworth Statistics/Probability
- Breiman (1996) Bagging predictors, Machine Learning
- Breiman (2001) Random Forests, Machine Learning

8 1. Classification and Regression Trees (CART)
- Grow a binary tree.
- At each node, split the data into two daughter nodes.
- Splits are chosen using a splitting criterion.
- For regression, the predicted value at a terminal node (leaf) is the average of the response variable over all observations in that leaf.
- For classification, the predicted class is the most common class in the leaf (majority vote).

9 Splitting criteria
Regression: Residual Sum of Squares
$$\mathrm{RSS} = \sum_{\text{left}} (y_i - \bar{y}_L)^2 + \sum_{\text{right}} (y_i - \bar{y}_R)^2$$
where $\bar{y}_L$ is the mean $y$-value for the left node and $\bar{y}_R$ the mean $y$-value for the right node.
Classification: Gini criterion
$$N_L \sum_{k=1}^{K} p_{kL}(1 - p_{kL}) + N_R \sum_{k=1}^{K} p_{kR}(1 - p_{kR})$$
where $p_{kL}$ (resp. $p_{kR}$) is the proportion of class $k$ in the left (resp. right) node.
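
Both criteria are easy to evaluate directly; below is a minimal Python sketch (NumPy assumed, function names illustrative) that scores a candidate split with each criterion and scans one predictor for the best RSS split.

```python
import numpy as np

def rss_split_cost(y_left, y_right):
    """Regression criterion: residual sum of squares around each child's mean."""
    return np.sum((y_left - y_left.mean()) ** 2) + np.sum((y_right - y_right.mean()) ** 2)

def gini_split_cost(labels_left, labels_right, classes):
    """Classification criterion: N_L * Gini(left) + N_R * Gini(right)."""
    def node_gini(labels):
        p = np.array([np.mean(labels == k) for k in classes])
        return np.sum(p * (1.0 - p))
    return len(labels_left) * node_gini(labels_left) + len(labels_right) * node_gini(labels_right)

# Toy scan: try every interior threshold on a single predictor, keep the best RSS split.
rng = np.random.default_rng(0)
x = rng.uniform(size=50)
y = 2.0 * (x > 0.5) + rng.normal(scale=0.1, size=50)
thresholds = np.sort(x)[1:-1]
costs = [rss_split_cost(y[x <= t], y[x > t]) for t in thresholds]
print(f"best split threshold ~ {thresholds[int(np.argmin(costs))]:.2f}")
```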

12 Advantages
- Computationally simple and quick to fit, even for large problems.
- No formal distributional assumptions (non-parametric).
- Can handle highly non-linear interactions and classification boundaries.
- Automatic variable selection.
- Very easy to interpret if the tree is small.
Disadvantages
- Accuracy: current methods, such as support vector machines and ensemble classifiers, often have 30% lower error rates than CART.
- Instability: if we change the data a little, the tree picture can change a lot.

13 2. Bagging (Bootstrap AGGregatING)

14 Single Regression Tree [figure]

15 10 Regression Trees [figure]

16 Average of 100 Regression Trees [figure]

17 Fit classification or regression models to bootstrap samples from the data and combine them by voting (classification) or averaging (regression).
- Bagging reduces the variance of the base learner but has a limited effect on the bias.
- It is most effective with strong base learners that have very little bias but high variance (unstable), e.g. trees.
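
A minimal sketch of bagging regression trees (scikit-learn assumed; scikit-learn's BaggingRegressor wraps this loop, written out here for clarity):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(x[:, 0]) + rng.normal(scale=0.3, size=n)

B = 100
trees = []
for _ in range(B):
    idx = rng.integers(0, n, size=n)   # bootstrap sample (drawn with replacement)
    tree = DecisionTreeRegressor()     # strong, low-bias, high-variance base learner
    tree.fit(x[idx], y[idx])
    trees.append(tree)

x_new = np.array([[0.5]])
pred = np.mean([t.predict(x_new)[0] for t in trees])   # average over the B trees
print(f"bagged prediction at x=0.5: {pred:.3f} (truth sin(0.5)={np.sin(0.5):.3f})")
```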

18 3. Random Forests
- Grow a forest of many trees.
- Grow each tree on an independent bootstrap sample from the training data.
- At each node:
  1. select m variables at random out of all M possible variables (independently for each node);
  2. find the best split on the selected m variables.
- Grow the trees to maximum depth (classification).
- Vote/average the trees to get predictions for new data.

19 Improvements over CART with respect to:
- Accuracy: Random Forests are competitive with the best known machine learning methods.
- Instability: if we change the data a little, the individual trees may change, but the forest is relatively stable because it is a combination of many trees.
A case in the training data is not in the bootstrap sample for about one third of the trees (we say the case is "out-of-bag"). Vote (or average) the predictions of these trees to get the out-of-bag predictor.

20 RF handles thousands of predictors
RF regression and classification methods were shown to be mostly insensitive both to strong correlations between predictors and to the presence of noisy variables.

21 The One Dimension One Forest (ODOF) Methodology
- Parametric statistical model: $\{f(y; \theta) : y \in \mathcal{Y}, \theta \in \Theta\}$, $\mathcal{Y} \subseteq \mathbb{R}^n$, $\Theta \subseteq \mathbb{R}^p$
- Prior distribution $\pi(\theta)$
- Goal: estimating a quantity of interest $\psi(y) \in \mathbb{R}$: posterior means, variances, quantiles or covariances
- Difficulty: the evaluation of $f(\cdot\,; \theta)$ is not possible

22 Let $\eta : \mathcal{Y} \to \mathbb{R}^k$ be an appropriate summary statistic.
Produce the Reference Table (RT) that will be used as learning dataset for the different RF methods: for $t = 1, \ldots, N$
1. simulate $\theta^{(t)} \sim \pi(\theta)$;
2. simulate $\tilde{y}_t = (\tilde{y}_{1,t}, \ldots, \tilde{y}_{n,t}) \sim f(y; \theta^{(t)})$;
3. compute $\eta(\tilde{y}_t) = \{\eta_1(\tilde{y}_t), \ldots, \eta_k(\tilde{y}_t)\}$.
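
A minimal sketch of this reference-table loop for the Gaussian toy example of slides 27-28 (NumPy assumed; the `summaries` function and array names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
N, n, k_noise = 10_000, 10, 50

def summaries(y, rng):
    """eta(y): sample mean, variance and median absolute deviation, plus 50 noisy U(0,1) variables."""
    mad = np.median(np.abs(y - np.median(y)))
    return np.concatenate([[y.mean(), y.var(ddof=1), mad], rng.uniform(size=k_noise)])

theta = np.empty((N, 2))                        # reference-table parameters (theta_1, theta_2)
eta = np.empty((N, 3 + k_noise))                # reference-table summary statistics
for t in range(N):
    theta2 = 1.0 / rng.gamma(4.0, 1.0 / 3.0)    # theta_2 ~ IG(4, 3), as reciprocal of a Gamma(4, rate 3)
    theta1 = rng.normal(0.0, np.sqrt(theta2))   # theta_1 | theta_2 ~ N(0, theta_2)
    y = rng.normal(theta1, np.sqrt(theta2), n)  # y_1, ..., y_n iid N(theta_1, theta_2)
    theta[t] = (theta1, theta2)
    eta[t] = summaries(y, rng)
```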

23 1. Posterior Expectations
$\theta = (\theta_1, \ldots, \theta_d) \in \mathbb{R}^d$
Construct $d$ regression RFs, one per dimension: for dimension $j$,
- response: $\theta_j$
- predictor variables: the summary statistics $\eta(y) = \{\eta_1(y), \ldots, \eta_k(y)\}$
If $L_b(\eta(y^*))$ denotes the leaf of the $b$-th tree associated with $\eta(y^*)$ (the leaf reached after following the path of binary choices given by this tree), and $|L_b(\eta(y^*))|$ the number of response variables in that leaf, then
$$\widehat{E}(\theta_j \mid \eta(y^*)) = \frac{1}{B} \sum_{b=1}^{B} \frac{1}{|L_b(\eta(y^*))|} \sum_{t:\, \eta(\tilde{y}_t) \in L_b(\eta(y^*))} \theta_j^{(t)}$$
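
A minimal sketch of the per-dimension regression forests with scikit-learn, continuing the reference-table sketch above; `eta_obs` is an illustrative stand-in for the summaries of the observed dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Pseudo-observed data and its summaries (illustrative only).
rng_obs = np.random.default_rng(7)
eta_obs = summaries(rng_obs.normal(1.0, 1.0, size=n), rng_obs).reshape(1, -1)

# One forest per parameter dimension ("One Dimension One Forest").
forests = []
for j in range(theta.shape[1]):
    rf = RandomForestRegressor(n_estimators=500, min_samples_leaf=5, n_jobs=-1, random_state=0)
    rf.fit(eta, theta[:, j])
    forests.append(rf)

# Each tree predicts the mean of its leaf and the forest averages over trees,
# which is exactly the formula above.
post_means = [rf.predict(eta_obs)[0] for rf in forests]
print("estimated E(theta_j | eta(y*)):", np.round(post_means, 3))
```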

24 2. Quantiles
Meinshausen (2006) Quantile Regression Forests, JMLR
$$\widehat{E}(\theta_j \mid \eta(y^*)) = \sum_{t=1}^{N} w_t(\eta(y^*))\, \theta_j^{(t)}
\quad \text{with} \quad
w_t(\eta(y^*)) = \frac{1}{B} \sum_{b=1}^{B} \frac{\mathbb{1}_{L_b(\eta(y^*))}(\eta(\tilde{y}_t))}{|L_b(\eta(y^*))|}$$
Estimate the posterior cdf of $\theta_j$ with
$$\widehat{F}(u \mid \eta(y^*)) = \sum_{t=1}^{N} w_t(\eta(y^*))\, \mathbb{1}\{\theta_j^{(t)} \le u\}.$$
Posterior quantiles, and hence credible intervals, are then derived by inverting $\widehat{F}$.
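
The weights $w_t(\eta(y^*))$ can be recovered from a fitted scikit-learn forest through its leaf assignments (`apply`); a minimal sketch continuing the previous ones, for the forest of $\theta_1$:

```python
import numpy as np

rf = forests[0]                     # forest fitted on theta_1 above
leaves_train = rf.apply(eta)        # leaf index of every reference-table entry, per tree: (N, B)
leaves_obs = rf.apply(eta_obs)[0]   # leaf reached by eta(y*) in each tree: (B,)

# w_t(eta(y*)) = (1/B) sum_b 1{eta(y_t) falls in the same leaf as eta(y*)} / |leaf|
same_leaf = leaves_train == leaves_obs                        # (N, B) boolean
weights = (same_leaf / same_leaf.sum(axis=0)).mean(axis=1)    # (N,), sums to 1

# Weighted empirical cdf of theta_1, inverted for posterior quantiles.
order = np.argsort(theta[:, 0])
cdf = np.cumsum(weights[order])
q025 = theta[order, 0][np.searchsorted(cdf, 0.025)]
q975 = theta[order, 0][np.searchsorted(cdf, 0.975)]
print(f"95% credible interval for theta_1: [{q025:.3f}, {q975:.3f}]")
```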

25 3. Variances
While an approximation to $\mathrm{Var}(\theta_j \mid \eta(y^*))$ can be derived in a natural way from $\widehat{F}$, we suggest using a slightly more involved version.
In a given tree $b$, some entries of the reference table are not exploited, since the tree relies on a bootstrap subsample. These absent entries are called out-of-bag simulations and can be used to return an estimate $\hat{\theta}_j^{(t)}$ of $E\{\theta_j \mid \eta(\tilde{y}_t)\}$.
Apply the weights $w_t(\eta(y^*))$ to the out-of-bag residuals:
$$\widehat{\mathrm{Var}}(\theta_j \mid \eta(y^*)) = \sum_{t=1}^{N} w_t(\eta(y^*)) \left\{ \theta_j^{(t)} - \hat{\theta}_j^{(t)} \right\}^2$$
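
With scikit-learn the out-of-bag predictions are exposed when the forest is fitted with `oob_score=True`; a minimal sketch of the weighted squared out-of-bag residuals for $\theta_1$, continuing the previous sketches:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Refit the theta_1 forest keeping out-of-bag predictions hat(theta)_1^(t).
rf_oob = RandomForestRegressor(n_estimators=500, min_samples_leaf=5,
                               oob_score=True, n_jobs=-1, random_state=0)
rf_oob.fit(eta, theta[:, 0])
theta1_oob = rf_oob.oob_prediction_

# Weights w_t(eta(y*)) from this forest's leaves, as in the quantile sketch.
same_leaf = rf_oob.apply(eta) == rf_oob.apply(eta_obs)[0]
weights = (same_leaf / same_leaf.sum(axis=0)).mean(axis=1)

post_var = np.sum(weights * (theta[:, 0] - theta1_oob) ** 2)
print(f"estimated Var(theta_1 | eta(y*)) = {post_var:.4f}")
```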

26 4. Covariances
For $\mathrm{Cov}(\theta_j, \theta_l \mid \eta(y^*))$, we propose to construct a specific RF with
- response: the product of the out-of-bag errors for $\theta_j$ and $\theta_l$,
  $$\left\{ \theta_j^{(t)} - \hat{\theta}_j^{(t)} \right\} \left\{ \theta_l^{(t)} - \hat{\theta}_l^{(t)} \right\}$$
- predictor variables: the summary statistics $\eta(y) = \{\eta_1(y), \ldots, \eta_k(y)\}$
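
A minimal sketch of the covariance forest, continuing the previous ones: the response is the product of the out-of-bag residuals for $\theta_1$ and $\theta_2$, and the prediction at $\eta(y^*)$ estimates the posterior covariance.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def oob_residuals(j):
    """Out-of-bag residuals theta_j^(t) - hat(theta)_j^(t) for parameter dimension j."""
    rf = RandomForestRegressor(n_estimators=500, min_samples_leaf=5,
                               oob_score=True, n_jobs=-1, random_state=j)
    rf.fit(eta, theta[:, j])
    return theta[:, j] - rf.oob_prediction_

product_errors = oob_residuals(0) * oob_residuals(1)   # response of the covariance forest

rf_cov = RandomForestRegressor(n_estimators=500, min_samples_leaf=5, n_jobs=-1, random_state=0)
rf_cov.fit(eta, product_errors)
print("estimated Cov(theta_1, theta_2 | eta(y*)):", rf_cov.predict(eta_obs)[0])
```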

27 Simulation study: a Gaussian toy example
$(y_1, \ldots, y_n) \mid \theta_1, \theta_2 \overset{\text{iid}}{\sim} \mathcal{N}(\theta_1, \theta_2)$, $n = 10$
$\theta_1 \mid \theta_2 \sim \mathcal{N}(0, \theta_2)$ and $\theta_2 \sim \mathcal{IG}(4, 3)$
$$\theta_1 \mid y \sim \mathcal{T}\!\left(n+8,\; \frac{n\bar{y}}{n+1},\; \frac{s^2+6}{(n+1)(n+8)}\right), \qquad
\theta_2 \mid y \sim \mathcal{IG}\!\left(\frac{n}{2}+4,\; \frac{s^2}{2}+3\right)$$
It is then straightforward to derive theoretical values such as $\psi_1(y) = E(\theta_1 \mid y)$, $\psi_2(y) = E(\theta_2 \mid y)$, $\psi_3(y) = \mathrm{Var}(\theta_1 \mid y)$ and $\psi_4(y) = \mathrm{Var}(\theta_2 \mid y)$.

28 Reference table of N = 10,000 replicates
Independent test set of size N_pred = 100
k = 53 summary statistics: the sample mean, the sample variance, the sample median absolute deviation, and 50 independent noisy variables (uniform on [0,1])

29 Scatterplots of the theoretical values $\psi_1, \psi_2, \psi_3, \psi_4$ against their corresponding estimates [figure]

30 Scatterplots of the theoretical 2.5% and 97.5% posterior quantiles of $\theta_1$ and $\theta_2$ against their corresponding estimates [figure]

31 Comparison of normalized mean absolute errors of ODOF, adjusted local linear, adjusted ridge and adjusted neural network ABC for $\psi_1(y) = E(\theta_1 \mid y)$, $\psi_2(y) = E(\theta_2 \mid y)$, $\psi_3(y) = \mathrm{Var}(\theta_1 \mid y)$, $\psi_4(y) = \mathrm{Var}(\theta_2 \mid y)$, and the 2.5% and 97.5% posterior quantiles of $\theta_1$ and $\theta_2$ [table]

32 Boxplot comparison of $\mathrm{Var}(\theta_1 \mid y)$ and $\mathrm{Var}(\theta_2 \mid y)$: true values, ODOF, and the usual ABC methods (local linear, ridge, neural network) [figure]

33 Simulation study: a regression toy example
$(y_1, \ldots, y_n) \mid \beta_1, \beta_2, \sigma^2 \sim \mathcal{N}_n(X\beta, \sigma^2 I_n)$
$X = [x_1\ x_2]$ an $n \times 2$ design matrix, $\beta = (\beta_1, \beta_2)$ and $n = 100$
$\beta_1, \beta_2 \mid \sigma^2 \sim \mathcal{N}_2\!\left(0, n\sigma^2 (X'X)^{-1}\right)$, $\sigma^2 \sim \mathcal{IG}(4, 3)$
This conjugate model leads to closed-form posteriors:
$$\beta_1, \beta_2 \mid y \sim \mathcal{T}_2\!\left( 8 + n,\; \frac{n}{n+1}(X'X)^{-1}X'y,\; \frac{3 + y'(I_n - X(X'X)^{-1}X')y/2}{4 + n/2} \cdot \frac{n}{n+1}(X'X)^{-1} \right)$$
$$\sigma^2 \mid y \sim \mathcal{IG}\!\left( 4 + \frac{n}{2},\; 3 + \frac{y'(I_n - X(X'X)^{-1}X')y}{2} \right)$$

34 Reference table of N = 10,000 replicates
Independent test set of size N_pred = 100
k = 62 summary statistics: the maximum likelihood estimates of $\beta_1$ and $\beta_2$, the residual sum of squares, the empirical covariances and correlations between $y$ and the $x_j$, the sample mean, the sample variance, the sample median, ... and 50 independent noisy variables (uniform on [0,1])
X is chosen such that there is a significant posterior correlation between $\beta_1$ and $\beta_2$
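
A minimal sketch of the reference-table simulation for this example with the summary statistics above (NumPy assumed; the design X and all names are illustrative, not those used in the talk, and this illustrative set has 60 summaries rather than 62):

```python
import numpy as np

rng = np.random.default_rng(3)
n, N = 100, 10_000

# Illustrative correlated design: the talk chooses X so that beta_1 and beta_2
# are significantly correlated a posteriori.
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=n)
X = np.column_stack([x1, x2])
XtX_inv = np.linalg.inv(X.T @ X)

def summaries(y, rng):
    beta_hat = XtX_inv @ X.T @ y                    # maximum likelihood estimates of beta_1, beta_2
    rss = np.sum((y - X @ beta_hat) ** 2)           # residual sum of squares
    covs = [np.cov(y, X[:, j])[0, 1] for j in range(2)]
    cors = [np.corrcoef(y, X[:, j])[0, 1] for j in range(2)]
    base = [*beta_hat, rss, *covs, *cors, y.mean(), y.var(ddof=1), np.median(y)]
    return np.concatenate([base, rng.uniform(size=50)])   # plus 50 noisy U(0,1) variables

params = np.empty((N, 3))                           # (beta_1, beta_2, sigma^2)
eta = np.empty((N, 10 + 50))
for t in range(N):
    sigma2 = 1.0 / rng.gamma(4.0, 1.0 / 3.0)        # sigma^2 ~ IG(4, 3)
    beta = rng.multivariate_normal(np.zeros(2), n * sigma2 * XtX_inv)
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    params[t] = (*beta, sigma2)
    eta[t] = summaries(y, rng)
```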

35 Scatterplots of the theoretical posterior variances $\mathrm{Var}(\beta_1 \mid y)$ and $\mathrm{Var}(\beta_2 \mid y)$ against their ODOF, adjusted ridge and adjusted neural network estimates [figure]

36 Scatterplots of the theoretical posterior variance $\mathrm{Var}(\sigma^2 \mid y)$ against its ODOF, adjusted ridge and adjusted neural network estimates [figure]

37 Scatterplots of the theoretical posterior covariances $\mathrm{Cov}(\beta_1, \beta_2 \mid y)$ against their ODOF, adjusted ridge and adjusted neural network estimates [figure]

38 Comparison of normalized mean absolute errors of ODOF, adjusted ridge and adjusted neural network ABC for $E(\beta_1 \mid y)$, $E(\beta_2 \mid y)$, $E(\sigma^2 \mid y)$, $\mathrm{Var}(\beta_1 \mid y)$, $\mathrm{Var}(\beta_2 \mid y)$, $\mathrm{Var}(\sigma^2 \mid y)$, $\mathrm{Cov}(\beta_1, \beta_2 \mid y)$ and posterior quantiles of $\beta_1$ [table]

39 Boxplot comparison of $\mathrm{Var}(\beta_1 \mid y)$, $\mathrm{Var}(\beta_2 \mid y)$ and $\mathrm{Var}(\sigma^2 \mid y)$: true values, ODOF, and adjusted neural network [figure]

Reference: Raynal L, Marin J-M, Pudlo P, Ribatet M, Robert CP, Estoup A (2017) ABC random forests for Bayesian parameter inference. arXiv:1605.05537v4 [stat.ME], https://arxiv.org/pdf/1605.05537
