ABC random forest for parameter estimation. Jean-Michel Marin
1 ABC random forest for parameter estimation
Jean-Michel Marin, Université de Montpellier
Institut Montpelliérain Alexander Grothendieck (IMAG), Institut de Biologie Computationnelle (IBC), Labex Numev
Joint work with Pierre Pudlo, Louis Raynal, Mathieu Ribatet and Christian Robert
ABCruise, Helsinki
2 Introduction
We consider statistical models for which no explicit form of the likelihood is available, or for which a single evaluation of the likelihood is too CPU-demanding.
⇒ numerous heterogeneous latent variables (as with the coalescent model) or an intractable normalizing constant in the likelihood (as with Gibbs random fields)
We focus on Approximate Bayesian Computation (ABC) methods.
3 The principle of ABC is to conduct Bayesian inference on a dataset through comparisons with numerous simulated datasets. We assume that it is possible to generate realizations from the statistical model under consideration.
ABC suffers from two major difficulties:
- to ensure the reliability of the method, the number of simulations must be large;
- calibration has always been a critical step in ABC implementation.
4 Idea: use regression or quantile Random Forests (RF) to estimate some quantities of interest: posterior expectations, variances, quantiles or covariances.
Why Random Forests?
⇒ RF regression and quantile methods were shown to be mostly insensitive both to strong correlations between predictors (here the summary statistics) and to the presence of noisy variables.
⇒ With such a strategy, fewer simulations are needed and no calibration is required!
5 Extend the work of Pudlo et al. (2016) to the case of parameter estimation:
Pudlo et al. (JMM & CPR) (2016) Reliable ABC model choice via random forests, Bioinformatics
Related methods:
- adjusted local linear: Beaumont et al. (2002) Approximate Bayesian computation in population genetics, Genetics
- ridge regression: Blum et al. (2013) A Comparative Review of Dimension Reduction Methods in Approximate Bayesian Computation, Statistical Science
- adjusted neural networks: Blum and François (2010) Non-linear regression models for Approximate Bayesian Computation, Statistics and Computing
6 Outline of the talk
- Recap on Random Forests
  1. Classification and Regression Trees (CART)
  2. Bootstrap AGGregatING (bagging)
  3. Random Forests
- The ODOF methodology
  1. Posterior Expectations
  2. Quantiles
  3. Variances
  4. Covariances
- Simulation study: a Gaussian toy example
- Simulation study: a regression toy example
7 Recap on Random Forests (inspired by Adele Cutler's slides, September 15-17, 2010, Ovronnaz, Switzerland)
The work of Leo Breiman (1928-2005):
- Breiman et al. (1984) Classification and Regression Trees, Wadsworth Statistics/Probability
- Breiman (1996) Bagging predictors, Machine Learning
- Breiman (2001) Random Forests, Machine Learning
8 1. Classification and Regression Trees (CART)
Grow a binary tree: at each node, split the data into two daughter nodes, with splits chosen using a splitting criterion.
For regression, the predicted value at a terminal node (leaf) is the average of the response variable over all observations in that leaf.
For classification, the predicted class is the most common class in the leaf (majority vote).
9 Splitting criteria
Regression: residual sum of squares
$$\mathrm{RSS} = \sum_{\text{left}} (y_i - \bar{y}_L)^2 + \sum_{\text{right}} (y_i - \bar{y}_R)^2$$
where $\bar{y}_L$ is the mean $y$-value for the left node and $\bar{y}_R$ the mean $y$-value for the right node.
Classification: Gini criterion
$$N_L \sum_{k=1}^{K} p_{kL}(1 - p_{kL}) + N_R \sum_{k=1}^{K} p_{kR}(1 - p_{kR})$$
where $p_{kL}$ (resp. $p_{kR}$) is the proportion of class $k$ in the left (resp. right) node.
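To make the regression criterion concrete, here is a minimal, illustrative Python sketch (the slides themselves contain no code) that scans a single predictor for the RSS-minimizing binary split:

```python
# A toy, illustrative implementation: scan one predictor for the binary
# split that minimizes the residual sum of squares of the two daughters.
import numpy as np

def best_rss_split(x, y):
    """Return (threshold, rss) of the best RSS split on predictor x."""
    order = np.argsort(x)
    x_s, y_s = x[order], y[order]
    best_threshold, best_rss = None, np.inf
    for i in range(1, len(x_s)):
        if x_s[i] == x_s[i - 1]:
            continue                      # no valid threshold between ties
        y_left, y_right = y_s[:i], y_s[i:]
        rss = ((y_left - y_left.mean()) ** 2).sum() + \
              ((y_right - y_right.mean()) ** 2).sum()
        if rss < best_rss:
            best_threshold, best_rss = (x_s[i - 1] + x_s[i]) / 2, rss
    return best_threshold, best_rss

rng = np.random.default_rng(0)
x = rng.uniform(size=200)
y = np.where(x > 0.6, 2.0, 0.0) + rng.normal(scale=0.3, size=200)
print(best_rss_split(x, y))               # threshold should land near 0.6
```

A real CART implementation repeats this scan over all predictors at every node and then recurses into the two daughter nodes.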
12 Advantages
- Computationally simple and quick to fit, even for large problems
- No formal distributional assumptions (non-parametric)
- Can handle highly non-linear interactions and classification boundaries
- Automatic variable selection
- Very easy to interpret if the tree is small
Disadvantages
- Accuracy: current methods, such as support vector machines and ensemble classifiers, often have 30% lower error rates than CART
- Instability: if we change the data a little, the tree picture can change a lot
13 2. Bagging (Bootstrap AGGregatING)
14 Single Regression Tree (figure)
15 10 Regression Trees (figure)
16 Average of 100 Regression Trees (figure)
17 Fit classification or regression models to bootstrap samples from the data and combine them by voting (classification) or averaging (regression).
Bagging reduces the variance of the base learner but has limited effect on the bias.
It is most effective with strong base learners that have very little bias but high variance (unstable), e.g. trees.
18 3. Random Forests
Grow a forest of many trees, each on an independent bootstrap sample from the training data. At each node:
1. Select m variables at random out of all M possible variables (independently for each node)
2. Find the best split on the selected m variables
Grow the trees to maximum depth (classification). Vote/average the trees to get predictions for new data.
19 Improvements on CART:
- Accuracy: Random Forests are competitive with the best known machine learning methods.
- Instability: if we change the data a little, the individual trees may change, but the forest is relatively stable because it is a combination of many trees.
A case in the training data is not in the bootstrap sample for about one third of the trees (we say the case is "out-of-bag"). Vote (or average) the predictions of these trees to give the out-of-bag predictor.
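A hedged illustration of slides 17-19 with scikit-learn (an assumed tool; the slides name no software): B trees grown on bootstrap samples, m = max_features candidate variables per node, and the out-of-bag predictor. All data and settings are made up for the example.

```python
# Illustrative random-forest regression fit: B trees on bootstrap samples,
# m variables tried at each node, plus the out-of-bag predictor.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 10))               # M = 10 candidate predictors
y = X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=1000)

rf = RandomForestRegressor(
    n_estimators=500,     # B trees, each grown on a bootstrap sample
    max_features=3,       # m variables drawn at random at each node
    oob_score=True,       # keep track of out-of-bag predictions
    random_state=1,
).fit(X, y)

oob_resid = y - rf.oob_prediction_            # residuals of the OOB predictor
print(rf.oob_score_, np.mean(oob_resid ** 2)) # OOB R^2 and prediction error
```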
20 RF handles thousands of predictors. RF regression and classification methods were shown to be mostly insensitive both to strong correlations between predictors and to the presence of noisy variables.
21 The One Dimension One Forest (ODOF) Methodology
Parametric statistical model: $\{f(y; \theta)\colon y \in \mathcal{Y}, \theta \in \Theta\}$, $\mathcal{Y} \subseteq \mathbb{R}^n$, $\Theta \subseteq \mathbb{R}^p$
Prior distribution $\pi(\theta)$
Goal: estimate a quantity of interest $\psi(y) \in \mathbb{R}$: posterior means, variances, quantiles or covariances
Difficulty: the evaluation of $f(\cdot\,; \theta)$ is not possible
22 $\eta\colon \mathcal{Y} \to \mathbb{R}^k$ is an appropriate summary statistic.
Produce the Reference Table (RT) that will be used as the learning dataset for the different RF methods: for $t = 1, \dots, N$
1. Simulate $\theta^{(t)} \sim \pi(\theta)$
2. Simulate $\tilde{y}_t = (\tilde{y}_{1,t}, \dots, \tilde{y}_{n,t}) \sim f(y; \theta^{(t)})$
3. Compute $\eta(\tilde{y}_t) = \{\eta_1(\tilde{y}_t), \dots, \eta_k(\tilde{y}_t)\}$
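A minimal sketch of this simulation loop in Python, using the Gaussian toy model of slide 27 as a stand-in for f(y; θ); all names and sizes are illustrative.

```python
# Reference-table generation: draw theta from the prior, simulate a dataset,
# and store k summary statistics, N times over.
import numpy as np

rng = np.random.default_rng(3)
N, n = 10_000, 10                     # table size N and sample size n

def sample_prior():
    theta2 = 1.0 / rng.gamma(4.0, 1.0 / 3.0)   # theta2 ~ IG(4, 3)
    theta1 = rng.normal(0.0, np.sqrt(theta2))  # theta1 | theta2 ~ N(0, theta2)
    return theta1, theta2

def summaries(ysim):
    mad = np.median(np.abs(ysim - np.median(ysim)))
    return [ysim.mean(), ysim.var(ddof=1), mad]

thetas, etas = [], []
for _ in range(N):
    th1, th2 = sample_prior()                  # step 1: draw from the prior
    ysim = rng.normal(th1, np.sqrt(th2), n)    # step 2: simulate a dataset
    thetas.append((th1, th2))
    etas.append(summaries(ysim))               # step 3: store the summaries

thetas = np.asarray(thetas)                    # N x 2 parameter draws
etas = np.asarray(etas)                        # N x k reference table
```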
23 1. Posterior Expectations
$\theta = (\theta_1, \dots, \theta_d) \in \mathbb{R}^d$
Construct $d$ regression RFs, one per dimension. For dimension $j$:
- response: $\theta_j$
- predictor variables: the summary statistics $\eta(y) = \{\eta_1(y), \dots, \eta_k(y)\}$
Let $L_b(\eta(y^*))$ denote the leaf of the $b$-th tree associated with $\eta(y^*)$, i.e. the leaf reached after following the path of binary choices given by this tree, and let $|L_b(\eta(y^*))|$ be the number of response values in that leaf. Then
$$\widehat{\mathrm{E}}(\theta_j \mid \eta(y^*)) = \frac{1}{B} \sum_{b=1}^{B} \frac{1}{|L_b(\eta(y^*))|} \sum_{t:\, \eta(\tilde{y}_t) \in L_b(\eta(y^*))} \theta_j^{(t)}$$
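Continuing the reference-table sketch above, a hedged scikit-learn version of this estimator: one regression forest per parameter dimension, predicted at the observed summaries (a regression forest's prediction is essentially this average of per-tree leaf means). The observed dataset y_obs is a made-up stand-in.

```python
# One forest per dimension, fit on the reference table and predicted at
# eta(y*); oob_score=True is requested here because slides 25-26 need it.
from sklearn.ensemble import RandomForestRegressor

y_obs = rng.normal(1.0, 1.0, size=n)            # stand-in observed dataset
eta_obs = np.asarray(summaries(y_obs)).reshape(1, -1)

forests = []
for j in range(2):                              # d = 2 parameter dimensions
    rf_j = RandomForestRegressor(n_estimators=500, min_samples_leaf=5,
                                 oob_score=True,
                                 random_state=j).fit(etas, thetas[:, j])
    forests.append(rf_j)
    print(f"E(theta_{j+1} | eta(y*)) ~ {rf_j.predict(eta_obs)[0]:.3f}")
```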
24 2. Quantiles
Meinshausen (2006) Quantile Regression Forests, JMLR
The previous estimator can be rewritten as a weighted average,
$$\widehat{\mathrm{E}}(\theta_j \mid \eta(y^*)) = \sum_{t=1}^{N} w_t(\eta(y^*))\, \theta_j^{(t)}, \qquad w_t(\eta(y^*)) = \frac{1}{B} \sum_{b=1}^{B} \frac{\mathbb{I}_{L_b(\eta(y^*))}\{\eta(\tilde{y}_t)\}}{|L_b(\eta(y^*))|}$$
Estimate the posterior cdf of $\theta_j$ with
$$\widehat{F}(u \mid \eta(y^*)) = \sum_{t=1}^{N} w_t(\eta(y^*))\, \mathbb{I}\{\theta_j^{(t)} \le u\}$$
Posterior quantiles, and hence credible intervals, are then derived by inverting $\widehat{F}$.
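The weights can be computed from leaf co-membership with scikit-learn's apply(); a sketch continuing the example above (here every reference-table entry is dropped down each tree, as in Meinshausen's quantile regression forests):

```python
# Compute the weights w_t from leaf co-membership with eta(y*), then invert
# the weighted cdf to get posterior quantiles for theta_1.
import numpy as np

rf1 = forests[0]                               # forest for theta_1
leaves_train = rf1.apply(etas)                 # N x B matrix of leaf indices
leaves_obs = rf1.apply(eta_obs)[0]             # leaf of eta(y*) in each tree

w = np.zeros(len(etas))
for b in range(len(rf1.estimators_)):
    in_leaf = leaves_train[:, b] == leaves_obs[b]
    w[in_leaf] += 1.0 / in_leaf.sum()          # I_{L_b}(eta_t) / |L_b|
w /= len(rf1.estimators_)                      # average over the B trees

def post_quantile(alpha, theta_col, w):
    order = np.argsort(theta_col)              # invert the weighted cdf F-hat
    cdf = np.cumsum(w[order])
    return theta_col[order][min(np.searchsorted(cdf, alpha), len(w) - 1)]

print(post_quantile(0.025, thetas[:, 0], w),   # bounds of a 95% credible
      post_quantile(0.975, thetas[:, 0], w))   # interval for theta_1
```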
25 3. Variances
While an approximation of $\mathrm{Var}(\theta_j \mid \eta(y^*))$ can be derived in a natural way from $\widehat{F}$, we suggest using a slightly more involved version.
In a given tree $b$, some entries of the reference table are not exploited, since the tree relies on a bootstrap subsample. These absent entries are called out-of-bag simulations, and they can be used to return an estimate $\hat{\theta}_j^{(t)}$ of $\mathrm{E}\{\theta_j \mid \eta(\tilde{y}_t)\}$ for each entry $t$.
Apply the weights $w_t(\eta(y^*))$ to the squared out-of-bag residuals:
$$\widehat{\mathrm{Var}}(\theta_j \mid \eta(y^*)) = \sum_{t=1}^{N} w_t(\eta(y^*)) \left( \theta_j^{(t)} - \hat{\theta}_j^{(t)} \right)^2$$
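Continuing the sketch: since the forests were fit with oob_score=True, scikit-learn exposes the out-of-bag predictions, and this variance estimate is a one-liner with the weights w computed above.

```python
# Variance estimate: weights applied to the squared out-of-bag residuals.
import numpy as np

oob = forests[0].oob_prediction_    # OOB estimates of E(theta_1 | eta(y_t))
var_hat = np.sum(w * (thetas[:, 0] - oob) ** 2)
print(var_hat)                      # estimate of Var(theta_1 | eta(y*))
```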
26 4. Covariances
For $\mathrm{Cov}(\theta_j, \theta_\ell \mid \eta(y^*))$, we propose to construct a specific RF with
- response: the product of the out-of-bag errors for $\theta_j$ and $\theta_\ell$, $\{\theta_j^{(t)} - \hat{\theta}_j^{(t)}\}\{\theta_\ell^{(t)} - \hat{\theta}_\ell^{(t)}\}$
- predictor variables: the summary statistics $\eta(y) = \{\eta_1(y), \dots, \eta_k(y)\}$
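A sketch of this covariance forest, continuing the example above: the response is the product of the two parameters' out-of-bag error vectors.

```python
# Covariance: one extra forest, regressing the product of the out-of-bag
# errors for theta_1 and theta_2 on the summaries, predicted at eta(y*).
from sklearn.ensemble import RandomForestRegressor

prod_oob_err = ((thetas[:, 0] - forests[0].oob_prediction_) *
                (thetas[:, 1] - forests[1].oob_prediction_))
rf_cov = RandomForestRegressor(n_estimators=500, min_samples_leaf=5,
                               random_state=42).fit(etas, prod_oob_err)
print(rf_cov.predict(eta_obs)[0])  # estimate of Cov(theta_1, theta_2 | eta(y*))
```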
27 Simulation study: a Gaussian toy example
$(y_1, \dots, y_n) \mid \theta_1, \theta_2 \overset{\mathrm{iid}}{\sim} \mathcal{N}(\theta_1, \theta_2)$, $n = 10$
$\theta_1 \mid \theta_2 \sim \mathcal{N}(0, \theta_2)$ and $\theta_2 \sim \mathcal{IG}(4, 3)$
$\theta_1 \mid y \sim \mathcal{T}\left( n + 8,\ n\bar{y}/(n+1),\ (s^2 + 6)/\{(n+1)(n+8)\} \right)$
$\theta_2 \mid y \sim \mathcal{IG}\left( n/2 + 4,\ s^2/2 + 3 \right)$
⇒ straightforward to derive theoretical values such as $\psi_1(y) = \mathrm{E}(\theta_1 \mid y)$, $\psi_2(y) = \mathrm{E}(\theta_2 \mid y)$, $\psi_3(y) = \mathrm{Var}(\theta_1 \mid y)$ and $\psi_4(y) = \mathrm{Var}(\theta_2 \mid y)$
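For reference, a small helper computing the closed-form values of ψ1-ψ4 implied by these posteriors. One assumption: the slide leaves s² undefined, and we take it to be the sum of squared deviations, as in the standard conjugate derivation.

```python
# Closed-form targets psi_1..psi_4 for a given dataset y_obs, taken from the
# posteriors on this slide; s2 is assumed to be the sum of squared deviations.
import numpy as np

def psi_theoretical(y_obs):
    n, ybar = len(y_obs), y_obs.mean()
    s2 = np.sum((y_obs - ybar) ** 2)
    psi1 = n * ybar / (n + 1)                  # E(theta_1 | y), mean of the T
    psi3 = (s2 + 6) / ((n + 1) * (n + 6))      # Var of T: scale^2 * df/(df-2)
    a, b = n / 2 + 4, s2 / 2 + 3               # theta_2 | y ~ IG(a, b)
    psi2 = b / (a - 1)                         # E(theta_2 | y)
    psi4 = b ** 2 / ((a - 1) ** 2 * (a - 2))   # Var(theta_2 | y)
    return psi1, psi2, psi3, psi4
```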
28 Reference table of N = 10,000 replicates
Independent test set of size N_pred = 100
k = 53 summary statistics: the sample mean, the sample variance, the sample median absolute deviation, and 50 independent noisy variables (uniform on [0, 1])
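Continuing the sketch, the k = 53 layout of this experiment can be reproduced by appending the 50 pure-noise columns, which the forests should largely ignore; the forests are then refit on the augmented matrix.

```python
# Three informative summaries plus 50 pure-noise Uniform[0,1] columns.
import numpy as np

noise = rng.uniform(size=(len(etas), 50))        # 50 noise summaries
etas53 = np.hstack([etas, noise])                # N x 53 learning matrix
eta_obs53 = np.hstack([eta_obs, rng.uniform(size=(1, 50))])
```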
29 Scatterplots of the theoretical values $\psi_1, \dots, \psi_4$ against their corresponding estimates $\tilde{\psi}_1, \dots, \tilde{\psi}_4$ (figure)
30 Scatterplots of the theoretical 2.5% and 97.5% posterior quantiles for $\theta_1$ and $\theta_2$ against their corresponding estimates (figure)
31 Comparison of normalized mean absolute errors of ODOF, adjusted local linear, adjusted ridge and adjusted neural network, for $\psi_1(y) = \mathrm{E}(\theta_1 \mid y)$, $\psi_2(y) = \mathrm{E}(\theta_2 \mid y)$, $\psi_3(y) = \mathrm{Var}(\theta_1 \mid y)$, $\psi_4(y) = \mathrm{Var}(\theta_2 \mid y)$ and the posterior quantiles of $\theta_1$ and $\theta_2$ (table)
32 Boxplot comparison of $\widetilde{\mathrm{Var}}(\theta_1 \mid y)$ and $\widetilde{\mathrm{Var}}(\theta_2 \mid y)$: true values, ODOF, adjusted local linear, ridge and neural network (figure)
33 Simulation study: a regression toy example
$(y_1, \dots, y_n) \mid \beta_1, \beta_2, \sigma^2 \sim \mathcal{N}_n(X\beta, \sigma^2 I_n)$
$X = [x_1\ x_2]$ an $n \times 2$ design matrix, $\beta = (\beta_1, \beta_2)^\top$ and $n = 100$
$(\beta_1, \beta_2) \mid \sigma^2 \sim \mathcal{N}_2\left(0,\ n\sigma^2 (X^\top X)^{-1}\right)$ and $\sigma^2 \sim \mathcal{IG}(4, 3)$
⇒ this conjugate model leads to closed-form posteriors
$$\beta_1, \beta_2 \mid y \sim \mathcal{T}_2\left( 8 + n,\ \frac{n}{n+1}(X^\top X)^{-1} X^\top y,\ \frac{3 + y^\top (I_n - X(X^\top X)^{-1} X^\top) y / 2}{4 + n/2}\, \frac{n}{n+1} (X^\top X)^{-1} \right)$$
$$\sigma^2 \mid y \sim \mathcal{IG}\left( 4 + \frac{n}{2},\ 3 + y^\top (I_n - X(X^\top X)^{-1} X^\top) y / 2 \right)$$
34 Reference table of N = 10,000 replicates
Independent test set of size N_pred = 100
k = 62 summary statistics: the maximum likelihood estimates of $\beta_1$ and $\beta_2$, the residual sum of squares, the empirical covariance and correlation between $y$ and each $x_j$, the sample mean, the sample variance, the sample median, ... and 50 independent noisy variables (uniform on [0, 1])
$X$ is chosen such that there is a significant posterior correlation between $\beta_1$ and $\beta_2$
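A quick, self-contained numerical check of the posterior mean formula from slide 33, E(β | y) = n/(n+1) (XᵀX)⁻¹Xᵀy, with a made-up correlated design in the spirit of the last line above:

```python
# Posterior mean under the g-prior of slide 33 (g = n): the least-squares
# estimate shrunk by n/(n+1); the design is correlated by construction.
import numpy as np

rng = np.random.default_rng(7)
n = 100
X = rng.normal(size=(n, 2))
X[:, 1] = 0.9 * X[:, 0] + 0.1 * X[:, 1]   # correlated columns, inducing a
                                          # posterior correlation of beta_1, beta_2
beta_true = np.array([1.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.7, size=n)

post_mean = n / (n + 1) * np.linalg.solve(X.T @ X, X.T @ y)
print(post_mean)                          # shrunk least-squares estimate
```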
35 Scatterplots of the theoretical posterior variances $\mathrm{Var}(\beta_1 \mid y)$ and $\mathrm{Var}(\beta_2 \mid y)$ against their ODOF, ridge and neural-network estimates (figure)
36 Scatterplots of the theoretical posterior variance $\mathrm{Var}(\sigma^2 \mid y)$ against its ODOF, ridge and neural-network estimates (figure)
37 Scatterplots of the theoretical posterior covariance $\mathrm{Cov}(\beta_1, \beta_2 \mid y)$ against its ODOF, ridge and neural-network estimates (figure)
38 Comparison of normalized mean absolute errors of ODOF, adjusted ridge and adjusted neural network, for $\mathrm{E}(\beta_1 \mid y)$, $\mathrm{E}(\beta_2 \mid y)$, $\mathrm{E}(\sigma^2 \mid y)$, $\mathrm{Var}(\beta_1 \mid y)$, $\mathrm{Var}(\beta_2 \mid y)$, $\mathrm{Var}(\sigma^2 \mid y)$, $\mathrm{Cov}(\beta_1, \beta_2 \mid y)$ and the posterior quantiles of $\beta_1$ (table)
39 Boxplot comparison of $\mathrm{Var}(\beta_1 \mid y)$, $\mathrm{Var}(\beta_2 \mid y)$ and $\mathrm{Var}(\sigma^2 \mid y)$: true values, ODOF and neural network (figure)
Reference: Raynal L, Marin J-M, Pudlo P, Ribatet M, Robert CP, Estoup A (2017) ABC random forests for Bayesian parameter inference. arXiv:1605.05537v4 [stat.ME], https://arxiv.org/pdf/1605.05537