Fully Nonparametric Bayesian Additive Regression Trees


Ed George, Prakash Laud, Brent Logan, Robert McCulloch, Rodney Sparapani

Ed: Wharton, U Penn; Prakash, Brent, Rodney: Medical College of Wisconsin; Rob: Arizona State

1. Basic BART Ideas
2. A Simple Simulated Example
3. Out of Sample Prediction
4. The BART Model and Prior
5. BART MCMC
6. Fully Nonparametric BART
7. Simulated Examples
8. Real Data
9. More on DPM
10. BART Papers

1. Basic BART Ideas

BART stands for Bayesian Additive Regression Trees. The original BART model (Chipman, George, and McCulloch) is:

Y_i = f(x_i) + ε_i, ε_i ~ N(0, σ²), iid,

where the function f is represented as the sum of many regression trees.

BART was inspired by the Boosting literature, in particular the work of Jerry Friedman. The connection to boosting is obvious in that the model is based on a sum of trees. However, BART is a fundamentally different algorithm with some consequent pros and cons.

BART is a Bayesian MCMC procedure. We:

put a prior on the model parameters (f, σ),
run a Markov chain with state (f, σ) whose stationary distribution is the posterior of (f, σ) | D, where D = {(x_i, y_i)}_{i=1}^n,
examine the draws as a representation of the full posterior.

In particular, we can look at marginals of σ and of f(x) at any given x.

Note: at the d-th MCMC iteration we have (f_d, σ_d).

For σ we can simply look at the sequence of draws σ_d. We can't just "look at" the f_d draws, since each f_d is a function. But for any x we can look at {f_d(x)}. For example, f̂(x) could be the average of the numbers {f_d(x)}, which is our MCMC estimate of the posterior mean of the random variable f(x) | D.

2. A Simple Simulated Example

Simulate data from the model:

Y_i = x_i³ + ε_i, ε_i ~ N(0, σ²) iid

n = 100
sigma = .1
f = function(x) {x^3}
set.seed(14)
x = sort(2*runif(n)-1)
y = f(x) + sigma*rnorm(n)
xtest = seq(-.95,.95,length.out=20)

Here, xtest will be the out of sample x values at which we wish to infer f or make predictions.

plot(x,y)
points(xtest,rep(0,length(xtest)),col="red",pch=16)

[Figure: y vs. x; the red points mark the xtest values.]

library(BART)
rb = wbart(x,y,xtest)
length(xtest)
[1] 20
dim(rb$yhat.test)
[1] 1000   20

The (d, j) element of yhat.test is f_d evaluated at the j-th value of xtest: 1,000 draws of f, each evaluated at the 20 xtest values.

plot(x,y)
lines(xtest,xtest^3,col="blue")
lines(xtest,apply(rb$yhat.test,2,mean),col="red")
qm = apply(rb$yhat.test,2,quantile,probs=c(.025,.975))
lines(xtest,qm[1,],col="red",lty=2)
lines(xtest,qm[2,],col="red",lty=2)

[Figure: the data with the true f (blue), the posterior mean of f (red), and pointwise 95% intervals (dashed red).]

[Figure: y vs. x, n = 5.]

{σ_d} draws. There are 100 draws counted as burn-in plus 1,000 additional draws. In all our previous f(x) inference, we dropped the first 100 iterations.

[Figure: rb$sigma plotted against the draw index.]

You can see that it looks burned in after the first 100 draws.

3. Out of Sample Prediction

Did out of sample predictive comparisons on 42 data sets (thanks to Wei-Yin Loh!!), with p = 3 to 65 and n = 100 to 7,000.

For each data set:
+ 20 random splits into 5/6 train and 1/6 test
+ use 5-fold cross-validation on train to pick hyperparameters (except BART-default!)
+ gives 20*42 = 840 out-of-sample predictions; for each prediction, divide the rmse of each method by the smallest
+ each boxplot represents 840 relative rmses for a method; a value of 1.2 means you are 20% worse than the best (a sketch of this bookkeeping follows below)
+ BART-cv best
+ BART-default (use default prior) does amazingly well!!

[Figure: boxplots of relative rmse for Random Forests, Neural Net, Boosting, BART-cv, BART-default.]
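The following is a minimal sketch (not code from the talk) of the relative-rmse bookkeeping behind the boxplots; the rmse matrix of out-of-sample errors per (data set, split) and method is a hypothetical stand-in for the actual results.

set.seed(1)
methods <- c("RandomForests", "NeuralNet", "Boosting", "BART-cv", "BART-default")
# hypothetical stand-in: one row per (data set, split) pair, one column per method
rmse <- matrix(abs(rnorm(840 * length(methods), mean = 1, sd = .2)),
               ncol = length(methods), dimnames = list(NULL, methods))
rel <- rmse / apply(rmse, 1, min)              # divide each row's rmses by the best (smallest)
boxplot(rel, ylab = "rmse relative to best")   # 1.2 means 20% worse than the best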

4. The BART Model and Prior

Regression Trees: First, we review regression trees to set the notation for BART. Note, however, that even in the simple regression tree case, our Bayesian approach is very different from the usual CART type approach. The model will have parameters and corresponding priors.

Regression Tree: Let T denote the tree structure including the decision rules. Let M = {µ_1, µ_2, ..., µ_b} denote the set of bottom node µ's. Let g(x; θ), θ = (T, M), be a regression tree function that assigns a µ value to x.

[Tree diagram: first split on x5 < c vs. x5 ≥ c; the x5 < c branch splits on x2 < d vs. x2 ≥ d; bottom node values µ_1 = -2, µ_2 = 5, µ_3 = 7.]

A single tree model: y = g(x; θ) + ε.

A coordinate view of g(x; θ):

[Figure: the same tree alongside its partition of the (x2, x5) plane, with regions taking the values µ_1 = -2, µ_2 = 5, µ_3 = 7.]

Easy to see that g(x; θ) is just a step function.

Here is an example of a simple tree with one x fit using standard CART methodology.

Here is an example with 2 x variables.

And here is the corresponding function (our g).

What's Boosting???

For numeric y:

(i) Set f̂(x) = 0 and r_i = y_i for all i in the training set.

(ii) For b = 1, 2, ..., B, repeat:
  - Fit a tree f̂^b with d splits (d + 1 terminal nodes) to the training data (X, r).
  - Update f̂ by adding in a shrunken version of the new tree: f̂(x) ← f̂(x) + λ f̂^b(x).
  - Update the residuals: r_i ← r_i − λ f̂^b(x_i).

(iii) Output the boosted model: f̂(x) = Σ_{b=1}^B λ f̂^b(x).

(A sketch of this recipe in R follows below.)

An Introduction to Statistical Learning, James, Witten, Hastie, Tibshirani.
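Here is a minimal sketch of the recipe above in R, using rpart stumps on the simulated (x, y) from Section 2; the function name and settings (boost_fit, B, lambda, d) are illustrative, not from the talk.

library(rpart)
boost_fit <- function(x, y, B = 500, lambda = 0.1, d = 1) {
  r <- y
  trees <- vector("list", B)
  for (b in 1:B) {
    # fit a small tree (maxdepth = d) to the current residuals
    fit <- rpart(r ~ x, data = data.frame(x = x, r = r),
                 control = rpart.control(maxdepth = d, cp = 0))
    trees[[b]] <- fit
    r <- r - lambda * predict(fit)        # update the residuals
  }
  # the boosted model: sum over b of lambda * fhat^b(x)
  function(xnew) lambda * rowSums(sapply(trees, predict,
                                         newdata = data.frame(x = xnew)))
}
fhat <- boost_fit(x, y)
plot(x, y); lines(xtest, fhat(xtest), col = "green")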

"... it is rather amazing that an ensemble of trees leads to the state of the art in black-box predictors!"

Bradley Efron and Trevor Hastie, Computer Age Statistical Inference, chapter 17.

The BART Model

Y = g(x; T_1, M_1) + g(x; T_2, M_2) + ... + g(x; T_m, M_m) + σz, z ~ N(0,1)

[Cartoon: m trees, each with its own bottom-node µ's.]

m = 200, 1000, ..., big, ....

f(x) is the sum of all the corresponding µ's, one from a bottom node of each tree. Such a model combines additive and interaction effects.

All parameters but σ are unidentified!!!!

... the connection to Boosting is obvious ... But:

Rather than simply adding in fit in an iterative scheme, we will explicitly specify a prior on the model which directly impacts the performance.

We will have an MCMC which infers each tree model in the sum. In particular, the depth of each tree is inferred.

Complete the Model with a Regularization Prior

π(θ) = π((T_1, M_1), (T_2, M_2), ..., (T_m, M_m), σ)
     = π(σ) Π_{j=1}^m π(T_j) π(M_j | T_j).

Have to specify: π(σ), π(T), π(M | T).

π wants:
- each T small,
- each µ small,
- a "nice" σ (smaller than the least squares estimate).

We refer to π as a regularization prior because it restrains the overall fit. In addition, it keeps the contribution of each g(x; T_i, M_i) model component small.

Prior on T

We specify a process we can use to draw a tree from the prior. The probability that a current bottom node, at depth d, gives birth to a left and right child is

α / (1 + d)^β.

The usual BART defaults are α = base = .95, β = power = 2. This makes non-null but small trees likely.

[Figure: implied prior on the number of bottom nodes (nbottom).]

Splitting variables and cutpoints are drawn uniformly from the set of available ones.
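A quick illustration (mine, not from the slides) of how fast this birth probability decays with depth under the defaults:

alpha <- 0.95; beta <- 2            # base and power defaults
d <- 0:4                            # node depth
round(alpha / (1 + d)^beta, 3)      # 0.950 0.238 0.106 0.059 0.038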

Prior on M

Let θ denote all the parameters.

f(x | θ) = µ_1 + µ_2 + ... + µ_m,

where µ_i is the µ in the bottom node x falls to in the i-th tree.

Let µ_i ~ N(0, τ²), iid. Then f(x | θ) ~ N(0, m τ²).

In practice we often, unabashedly, use the data by first centering and then choosing τ so that f(x | θ) ∈ (y_min, y_max) with high probability. This gives τ ∝ 1/√m.
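A minimal sketch of one common way to implement this choice (assuming the k-sigma rule used in the BART papers, with k = 2 covering roughly 95% of the prior mass); the numbers here use the simulated y from Section 2.

k <- 2; m <- 200                      # k sds of coverage, number of trees
# after centering y, choose tau so f(x) lands in (ymin, ymax) with high probability:
tau <- (max(y) - min(y)) / (2 * k * sqrt(m))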

Prior on σ

σ² ~ ν λ / χ²_ν.    Default: ν = 3.

λ: Get a reasonable estimate σ̂ of σ, then choose λ to put σ̂ at a specified quantile of the σ prior. Default: quantile = .9.

Default: if p < n, σ̂ is the usual least squares estimate, else sd(y).
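A minimal sketch of the λ calculation under this prior (my reconstruction: with σ² ~ νλ/χ²_ν, putting σ̂ at the q quantile of the prior gives λ = σ̂² qchisq(1 − q, ν)/ν), using the simulated data from Section 2:

nu <- 3; q <- 0.9
sigmahat <- summary(lm(y ~ x))$sigma                # least squares estimate of sigma
lambda <- sigmahat^2 * qchisq(1 - q, df = nu) / nu
# check: P(sigma <= sigmahat) under the prior should be about q
mean(sqrt(nu * lambda / rchisq(1e5, df = nu)) <= sigmahat)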

[Figure: prior density of σ; solid blue line at σ̂.]

Conjecture: most failures of BART are due to this default.

5. BART MCMC

Y = g(x; T_1, M_1) + g(x; T_2, M_2) + ... + g(x; T_m, M_m) + σz, z ~ N(0,1)

First, it is a simple Gibbs sampler:

(T_i, M_i) | (T_1, M_1, ..., T_{i-1}, M_{i-1}, T_{i+1}, M_{i+1}, ..., T_m, M_m, σ)
σ | (T_1, M_1, ..., T_m, M_m)

To draw σ we subtract all the trees off to get the residuals ε_i = y_i − f(x_i). To draw (T_i, M_i) we subtract the contributions of the other trees from both sides to get a simple one-tree model. We integrate out M to draw T and then draw M | T.
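Below is a minimal sketch of this backfitting bookkeeping (not the real sampler): the tree update is replaced by a plain rpart fit to the partial residuals as a stand-in for the Metropolis-Hastings/Gibbs draw of (T_i, M_i), while the σ draw uses the conjugate full conditional σ² | rest ~ (νλ + Σ e_i²)/χ²_{ν+n}.

library(rpart)
m <- 50; nu <- 3; lambda <- 0.01; n <- length(y)
fits <- matrix(0, n, m)              # current fit of each of the m trees at the data
sigma <- sd(y)
for (iter in 1:100) {
  for (i in 1:m) {
    # partial residuals: subtract the contributions of the other trees
    ri <- y - rowSums(fits[, -i, drop = FALSE])
    # stand-in for the Bayesian draw of (T_i, M_i): a small tree fit to ri
    ti <- rpart(ri ~ x, data = data.frame(x = x, ri = ri),
                control = rpart.control(maxdepth = 2, cp = 0))
    fits[, i] <- predict(ti)
  }
  e <- y - rowSums(fits)             # subtract all trees off to get the residuals
  sigma <- sqrt((nu * lambda + sum(e^2)) / rchisq(1, df = nu + n))
}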

To draw T we use a Metropolis-Hastings within Gibbs step. We use various moves, but the key is a birth-death step:

birth => propose a more complex tree
death => propose a simpler tree

32 Y = g(x;t 1,M 1 ) g(x;t m,m m ) + & z plus #((T 1,M 1 ),...(T m,m m ),&) Connections to Other Modeling Ideas: Bayesian Nonparametrics: - Lots of parameters to make model flexible. - A strong prior to shrink towards a simple structure. - BART shrinks towards additive models with some interaction. Dynamic Random Basis: - g(x; T 1, M 1 ), g(x; T 2, M 2 ),..., g(x; T m, M m ) are dimensionally adaptive. Gradient Boosting: - Overall fit becomes the cumulative effort of many weak learners. 30

Why does it work???

Build up the fit by adding up tiny bits of fit...

Boosting: Freund and Schapire, Jerome Friedman.

Note: I really want to be able to pick a (data based) default prior so I can put out my R package and people can get good results without too much effort. Contrast this with Deep Neural Nets, which are hard to fit.

But you can pretty easily choose a prior for f(x) and σ!!! Contrast this with Deep Neural Nets, where it is very hard to think about the prior.

6. Fully Nonparametric BART

BART: Y_i = f(x_i) + ε_i, ε_i ~ N(0, σ²), where f is a sum of trees.

- Normal errors are embarrassing.
- The prior on σ is flawed.
- Normal errors may lead to influential observations and poorly calibrated predictive intervals.

Obvious Solution: Use DPM (Dirichlet Process Mixtures) in the classic Escobar and West manner to model the errors nonparametrically.

Tried this in the past with mixed success. The DPM stuff is tricky... not at all obvious that you can get away with flexible f and flexible errors!!!

The Goal: Goes in the R package so people can use it with automatic priors and reliably get sensible results.

The MCW crowd (Prakash is a long-time nonparametric Bayesian) have a lot of experience with DPM.

Prakash has recent work on choosing priors for DPM: Low Information Omnibus (LIO) Priors for Dirichlet Process Mixture Models (Yushu Shi, Michael Martens, Anjishnu Banerjee, and Purushottam Laud).

Cautiously optimistic that we have a scheme that is close to working.

DPMBART

Y_i = f(x_i) + µ_i + σ_i Z_i, Z_i ~ N(0, 1):

each observation gets to have its own (µ_i, σ_i).

But the DPM machinery allows us to uncover a set of pairs (µ_j, σ_j), j = 1, 2, ..., I, such that for each i, (µ_i, σ_i) = (µ_j, σ_j) for some j. In our real example, n = 1,479 and I ≈ 100.

Even though each observation can have its own (µ_i, σ_i), subsets of the observations share the same (µ, σ), so there is a relatively small number of unique values.

Markov Chain Monte Carlo (MCMC):

{(µ_i, σ_i)} | f,   then   f | {(µ_i, σ_i)}.

At each draw d we have f_d and {(µ_i^d, σ_i^d)}, i = 1, 2, ..., n, where at each draw many of the (µ, σ) pairs are repeats.

For example, f̂(x) = (1/D) Σ_{d=1}^D f_d(x).

Connection to Mixture of Normals

At each draw d we have f and {(µ_i, σ_i)}, i = 1, 2, ..., n.

Let {(µ_j, σ_j)}, j = 1, 2, ..., I, be the unique (µ, σ) pairs, and let

p_j = #[(µ_i, σ_i) = (µ_j, σ_j)] / n.

Then

ε ~ Σ_{j=1}^I p_j N(µ_j, (σ_j)²).
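A small sketch (with a made-up toy draw, not data from the talk) of how one draw's (µ_i, σ_i) values collapse to the unique pairs, the weights p_j, and the implied mixture error density:

mu  <- c(0, 0, 1, 1, 1, -2)                     # hypothetical draw, n = 6
sig <- c(1, 1, 2, 2, 2, 0.5)
u <- unique(data.frame(mu, sig))                # the I unique (mu, sigma) pairs
p <- tabulate(match(paste(mu, sig), paste(u$mu, u$sig)), nbins = nrow(u)) / length(mu)
err_dens <- function(e) colSums(p * sapply(e, dnorm, mean = u$mu, sd = u$sig))
curve(err_dens(x), -5, 5, ylab = "mixture density")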

7. Simulated Examples

Simulated data with t_20 (essentially normal) errors.

[Figure: y vs. x with the top 5% of the µ_i and the top 5% of the σ_i highlighted; dpmbart f̂ and bart f̂ overlaid.]

[Figure panels: alpha draws by draw number; draws of the number of unique (mu, sigma) pairs; E(mu_i) vs. |y − f(x)|; E(mu_i) vs. E(sigma_i).]

Inference for the error distribution:

[Figure panels: dpmbart error distribution inference with pointwise 95% intervals; dpm inference from the true errors with pointwise 95% intervals; dpm, dpmbart, and bart error distribution inference together (dpm from true errors); dpmbart and density smooths of the true errors (adjust = .5 and 1). The true t density is shown in each panel.]

Simulated data with t_3 errors.

[Figure: y vs. x with the top 5% of the µ_i and the top 5% of the σ_i highlighted; dpmbart f̂ and bart f̂ overlaid.]

[Figure panels: alpha draws by draw number; draws of the number of unique (mu, sigma) pairs; E(mu_i) vs. |y − f(x)|; E(mu_i) vs. E(sigma_i).]

Inference for the error distribution:

[Figure panels: dpmbart error distribution inference with pointwise 95% intervals; dpm inference from the true errors; dpm, dpmbart, and bart together (dpm from true errors); dpmbart and density smooths of the true errors (adjust = .5 and 1). The true t density is shown in each panel.]

Three basic examples: t20, t3, skewed.

If the error is close to normal, then dpmbart is close to bart. If the error is non-normal, dpmbart is much closer to the truth, but shrunk a bit towards bart.

In these examples, f̂ for dpmbart and bart are pretty much the same, but with lower signal/sample sizes this does not have to be the case.

8. Real Data

Using one month of a much larger data set I am working on.

y: return on a cross-section of firms.
x: things about the firm measured the previous month.

[Figure: y plotted against observation index.]

Multiple regression results:

Coefficients:        Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)                                           **
r1_                                            e-07   ***
r12_                                           e-05   ***
idiosyncraticvol
seasonality
industrymom
ln_turn                                               ***
me                                                    **
an_cbprofitability
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05

Residual standard error: on 1470 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 8 and 1470 DF, p-value: 7.626e-15

It's like looking for a needle in a haystack!!!

Compare the f̂'s: linear, bart, dpmbart.

[Figure: pairwise scatterplots of the linear, bart, and dpmbart fitted values.]

dpmbart is a little different from bart because it is not pulled around by the outliers???

Note: the errors are now the residuals from the multiple regression, since we don't have the y − f(x) we had for the simulated data.

[Figure panels: alpha draws by draw number; draws of the number of unique (mu, sigma) pairs; E(mu_i) vs. |lm error|; E(mu_i) vs. E(sigma_i).]

[Figure: dpmbart error distribution inference with pointwise 95% intervals, compared with bart.]

9. More on DPM

Prior on α:

Used the construction of Conley, Hanson, McCulloch, and Rossi: a discrete distribution for α; you get to pick (I_min, I_max), the range for the number of unique θ values.

Default was I_min = 1, I_max = .1 n.

In our examples, draws of α bumped up against the upper limit. This could be good in that we want the prior conservative.

(µ, τ):

τ: For τ = 1/σ² we used an approach similar to the BART default, but we tighten it up a bit.

σ² ~ ν λ / χ²_ν,  ν = 2 α_o,  λ = β_o / α_o.

ν: bart: ν = 3; dpmbart: ν = 10.
bart: choose λ to put σ̂ at quantile = .9; dpmbart: quantile = .95.
The bart default gets σ̂ from the multiple regression.

µ:

µ ~ √λ k_o t_ν.

Let e_i be the residuals from the multiple regression. Let k_s be the scaling for the µ marginal. Let k_o solve

max |e_i| = k_s √λ k_o.

Default: k_s = 10.

Comments:

You can't be too diffuse on the base measure.

Would prefer not to extend the hierarchy and put priors on the base hyperparameters (a common practice).

The BART default depends on the standard deviation of the regression residuals; DPMBART depends on the sd of the residuals and the overall scale of the residuals.

k_s = 10 may seem large, but you don't have to cover the residual range: as µ gets bigger, σ gets bigger, and you can't be too spread out.

We would be happy to keep the dpm prior somewhat conservative, in that we nail the normal error case but miss slightly on the non-normal cases: DO NO HARM.

10. BART Papers

Log-Linear Bayesian Additive Regression Trees for Categorical and Count Responses, Jared Murray.

Bayesian regression trees for high-dimensional prediction and variable selection, Tony Linero.

Posterior Concentration for Bayesian Regression Trees and Their Ensembles, Rockova and van der Pas.

Nonparametric survival analysis using Bayesian Additive Regression Trees (BART), Rodney Sparapani, Brent Logan, Robert McCulloch, and P. Laud.

Accelerated Bayesian Additive Regression Trees, Jingyu He, Saar Yalov, and P. R. Hahn.

Heteroscedastic BART via Multiplicative Regression Trees, M. T. Pratola, H. A. Chipman, E. I. George, and R. McCulloch.

Bayesian regression tree models for causal inference: regularization, confounding, and heterogeneous effects, P. Richard Hahn, Jared S. Murray, and Carlos M. Carvalho.

High-dimensional nonparametric monotone function estimation using BART, H. A. Chipman, E. George, R. McCulloch, and T. S. Shively.
