Supplementary materials for Scalable Bayesian model averaging through local information propagation
August 25, 2014

S1. Proofs

Proof of Theorem 1. The result follows immediately from the distributions of the decision variables and the fact that the pfs procedure stops in the first $t-1$ steps if and only if $|\gamma^{(t-1)}| < t-1$.

Proof of Theorem 2. Our proof strategy is to explicitly construct a pfs representation for any model space distribution $\pi$. To this end, we proceed by induction on the total number of potential predictors. First, we note that the conclusion holds for $p = 1$, or $\Omega = \{0, 1\}$. In this case there are but two models in the space: the null model and the model including $X_1$, written as $(0)$ and $(1)$ respectively. Let $\pi(\cdot)$ be any probability distribution on $\Omega$. It is easy to check that $\pi(\cdot)$ is the marginal distribution of the final model under the pfs procedure with $\rho(0) = \pi(0)$, $\rho(1) = 1$, and $\lambda_1(0) = 1$.

Now suppose the inductive claim holds for any model space involving up to $p - 1$ variables. We next show it must hold for the one with $p$ predictors, or $\Omega = \{0, 1\}^p$, as well. To this end, again let $\pi(\cdot)$ be any distribution on $\{0, 1\}^p$, and let $\Omega_{(-p)} = \{0, 1\}^{p-1} \times \{0\}$ be the collection of models that do not involve $X_p$. Let us define a new distribution $\pi^*(\cdot)$ on $\Omega_{(-p)}$ such that for each $\gamma \in \Omega_{(-p)}$,
$$\pi^*(\gamma) = \pi(\gamma) + \pi(\gamma^{+p}),$$
where $\gamma^{+p} \in \Omega$ is the model that adds an additional variable, $X_p$, into $\gamma$. It is easy to check that $\sum_{\gamma \in \Omega_{(-p)}} \pi^*(\gamma) = 1$.

Because $\Omega_{(-p)}$ is isomorphic to $\{0, 1\}^{p-1}$, $\pi^*(\cdot)$ can be considered a probability distribution on $\{0, 1\}^{p-1}$. Thus by the inductive hypothesis, $\pi^*$ has a pfs representation with parameter mappings $\rho^*$ and $\lambda^*$ defined on $\{0, 1\}^{p-1} \setminus \{(1, 1, \ldots, 1)\}$. Now for any $\gamma \in \{0, 1\}^p$, let $\gamma_{1:p-1} = (\gamma_1, \gamma_2, \ldots, \gamma_{p-1}) \in \{0, 1\}^{p-1}$, and let $\tilde{\rho}$ and $\tilde{\lambda}$ be mappings defined on $\Omega_{(-p)}$ such that for any $\gamma \in \Omega_{(-p)}$: if $|\gamma_{1:p-1}| < p - 1$,
$$\tilde{\rho}(\gamma) = \rho^*(\gamma_{1:p-1}), \quad \tilde{\lambda}_j(\gamma) = \lambda^*_j(\gamma_{1:p-1}) \ \text{for } j = 1, 2, \ldots, p-1, \quad \text{and} \quad \tilde{\lambda}_p(\gamma) = 0,$$
while if $|\gamma_{1:p-1}| = p - 1$, then
$$\tilde{\rho}(\gamma) = 1, \quad \tilde{\lambda}_j(\gamma) = 0 \ \text{for } j = 1, 2, \ldots, p-1, \quad \text{and} \quad \tilde{\lambda}_p(\gamma) = 1.$$

Now consider the pfs procedure with $p$ predictors with mappings $\rho$ and $\lambda$ defined such that

(i) If $\gamma \in \Omega_{(-p)}$ and $\pi(\gamma^{+p}) > 0$,
$$\rho(\gamma) = \tilde{\rho}(\gamma)\,\frac{\pi(\gamma)}{\pi^*(\gamma)}, \qquad \lambda_j(\gamma) = \begin{cases} \dfrac{1 - \tilde{\rho}(\gamma)}{1 - \rho(\gamma)}\,\tilde{\lambda}_j(\gamma) & \text{for } j = 1, 2, \ldots, p-1, \\[6pt] \dfrac{\pi(\gamma^{+p})}{\pi^*(\gamma)} \cdot \dfrac{\tilde{\rho}(\gamma)}{1 - \rho(\gamma)} & \text{for } j = p. \end{cases}$$

(ii) If $\gamma \in \Omega_{(-p)}$ and $\pi(\gamma^{+p}) = 0$, $\rho(\gamma) = \tilde{\rho}(\gamma)$ and $\lambda_j(\gamma) = \tilde{\lambda}_j(\gamma)$ for $j = 1, 2, \ldots, p$.

(iii) If $\gamma \in \Omega \setminus \Omega_{(-p)}$ and $|\gamma| < p$, $\rho(\gamma) = 1$ and $\lambda_j(\gamma) = \mathbf{1}_{\{\gamma_j = 0\}} / (p - |\gamma|)$.

Under this pfs procedure, the $p$th predictor is always the last to be added. Now let us check that the marginal distribution of the final model $\gamma^{(p)}$ is indeed $\pi$. For any $\gamma \in \Omega_{(-p)}$ such that $\pi(\gamma) > 0$, by (i), (ii), and (iii) we have
$$\sum_{\gamma^{(1)}, \ldots, \gamma^{(p)}} \prod_{t=1}^{|\gamma|} [1 - \rho(\gamma^{(t-1)})] \, \lambda_{j_t}(\gamma^{(t-1)}) \cdot \rho(\gamma) = \sum_{\gamma^{(1)}, \ldots, \gamma^{(p)}} \prod_{t=1}^{|\gamma|} [1 - \tilde{\rho}(\gamma^{(t-1)})] \, \tilde{\lambda}_{j_t}(\gamma^{(t-1)}) \cdot \tilde{\rho}(\gamma)\,\pi(\gamma)/\pi^*(\gamma) = \pi^*(\gamma) \cdot \pi(\gamma)/\pi^*(\gamma) = \pi(\gamma),$$
where $j_1, j_2, \ldots, j_{|\gamma|}$ are the values of the selection variables $J_1, J_2, \ldots, J_{|\gamma|}$ that correspond to the sequence of models $\gamma^{(1)}, \ldots, \gamma^{(|\gamma|-1)}, \gamma^{(|\gamma|)} = \gamma$.

Similarly, for any $\gamma \in \Omega \setminus \Omega_{(-p)}$ such that $\pi(\gamma) > 0$, by (i), (ii), and (iii), the marginal probability for $\gamma^{(p)}$ to be $\gamma$ is
$$\sum_{\gamma^{(1)}, \ldots, \gamma^{(p)} : \gamma^{(p)} = \gamma} \prod_{t=1}^{|\gamma|} [1 - \rho(\gamma^{(t-1)})] \, \lambda_{j_t}(\gamma^{(t-1)}) \cdot \rho(\gamma) = \sum_{\gamma^{(1)}, \ldots, \gamma^{(p)} : \gamma^{(p)} = \gamma} \prod_{t=1}^{|\gamma|-1} [1 - \tilde{\rho}(\gamma^{(t-1)})] \, \tilde{\lambda}_{j_t}(\gamma^{(t-1)}) \cdot [1 - \rho(\gamma^{(|\gamma|-1)})] \cdot \frac{\pi(\gamma)}{\pi^*(\gamma^{(|\gamma|-1)})} \cdot \frac{\tilde{\rho}(\gamma^{(|\gamma|-1)})}{1 - \rho(\gamma^{(|\gamma|-1)})} = \frac{\pi(\gamma)}{\pi^*(\gamma^{(|\gamma|-1)})} \cdot \pi^*(\gamma^{(|\gamma|-1)}) = \pi(\gamma).$$
The second equality follows because for $\gamma \in \Omega \setminus \Omega_{(-p)}$ such that $\pi(\gamma) > 0$, under (i), (ii), and (iii),
$$\sum_{\gamma^{(1)}, \ldots, \gamma^{(p)} : \gamma^{(p)} = \gamma} \prod_{t=1}^{|\gamma|-1} [1 - \tilde{\rho}(\gamma^{(t-1)})] \, \tilde{\lambda}_{j_t}(\gamma^{(t-1)}) \cdot \tilde{\rho}(\gamma^{(|\gamma|-1)}) = \pi^*(\gamma^{(|\gamma|-1)}),$$
and with probability 1, $\gamma^{(|\gamma|-1)}$ is the model with the $p$th predictor removed from $\gamma$.

Proof of Theorem 3. Let $S_1, J_1, S_2, J_2, \ldots, S_p, J_p$ be the latent decision variables of the pfs representation of $\pi$ under consideration. We let $(\Omega_d, \mathcal{F}_d)$ be the probability space on which these decision variables are jointly defined. The sequence of models $\gamma^{(1)}, \gamma^{(2)}, \ldots, \gamma^{(p)}$ are functions of the decision variables and thus also measurable with respect to $(\Omega_d, \mathcal{F}_d)$. Fixing the data $D$, the marginal likelihood under the final model, $p(D \mid \gamma^{(p)})$, is also a random variable on $(\Omega_d, \mathcal{F}_d)$. For any $\gamma \in \Omega$, we define an event $U_\gamma$ on $(\Omega_d, \mathcal{F}_d)$ that $\gamma$ is a submodel of the final model $\gamma^{(p)}$, that is, $\gamma^{(p)}$ contains all of the predictors included in $\gamma$. Mathematically, this event can be expressed as
$$U_\gamma := \{\omega \in \Omega_d : \gamma^{(t)}(\omega) = \gamma \ \text{for } t = |\gamma|\}.$$

Next, we define a mapping $\Phi : \Omega \to \mathbb{R}$ as follows. For each $\gamma \in \Omega$,
$$\Phi(\gamma) := E_{\gamma^{(p)}}\big[\, p(D \mid \gamma^{(p)}) \mid U_\gamma \,\big],$$
where the data $D$ is fixed and the expectation is taken over the final model $\gamma^{(p)}$, or equivalently the decision variables, conditional on the event $U_\gamma$. Now for any $\gamma \in \Omega$, we claim that
$$\Phi(\gamma) = \begin{cases} p(D \mid \gamma) & \text{if } |\gamma| = p, \\[4pt] \rho(\gamma)\, p(D \mid \gamma) + (1 - \rho(\gamma)) \sum_{j : \gamma_j = 0} \lambda_j(\gamma)\, \Phi(\gamma^{+j}) & \text{if } |\gamma| < p. \end{cases}$$

To see this, note that if $|\gamma| = p$, then conditional on $U_\gamma$ we have $\gamma^{(p)} = \gamma$, and so $E_{\gamma^{(p)}}[p(D \mid \gamma^{(p)}) \mid U_\gamma] = p(D \mid \gamma)$. Now if $|\gamma| = t < p$, then by the tower property,
$$E_{\gamma^{(p)}}[p(D \mid \gamma^{(p)}) \mid U_\gamma] = E_{\gamma^{(p)}}\big[\, E_{\gamma^{(p)}}[p(D \mid \gamma^{(p)}) \mid S_{t+1}, U_\gamma] \mid U_\gamma \,\big] = E_{\gamma^{(p)}}[p(D \mid \gamma^{(p)}) \mid S_{t+1} = 1, U_\gamma]\, P(S_{t+1} = 1 \mid U_\gamma) + \sum_{j : \gamma_j = 0} E_{\gamma^{(p)}}[p(D \mid \gamma^{(p)}) \mid J_{t+1} = j, S_{t+1} = 0, U_\gamma]\, P(J_{t+1} = j \mid S_{t+1} = 0, U_\gamma)\, P(S_{t+1} = 0 \mid U_\gamma).$$
Now note that $S_{t+1} = 1$ and $U_\gamma$ together imply that $\gamma^{(p)} = \gamma$, and so $E_{\gamma^{(p)}}[p(D \mid \gamma^{(p)}) \mid S_{t+1} = 1, U_\gamma] = p(D \mid \gamma)$. Also, for each $j$ such that $\gamma_j = 0$,
$$\{\omega \in \Omega_d : J_{t+1}(\omega) = j,\ S_{t+1}(\omega) = 0\} \cap U_\gamma \subset U_{\gamma^{+j}}.$$
Moreover, conditional on the event $U_{\gamma^{+j}}$, $\gamma^{(p)}$ is a function of $S_{t+2}, J_{t+2}, \ldots, S_p, J_p$ and so is independent of $S_1, J_1, \ldots, S_{t+1}, J_{t+1}$. Thus,
$$E_{\gamma^{(p)}}[p(D \mid \gamma^{(p)}) \mid J_{t+1} = j, S_{t+1} = 0, U_\gamma] = E_{\gamma^{(p)}}[p(D \mid \gamma^{(p)}) \mid J_{t+1} = j, S_{t+1} = 0, U_\gamma, U_{\gamma^{+j}}] = E_{\gamma^{(p)}}[p(D \mid \gamma^{(p)}) \mid U_{\gamma^{+j}}] = \Phi(\gamma^{+j}).$$
Finally, since $P(S_{t+1} = 1 \mid U_\gamma) = \rho(\gamma)$ and $P(J_{t+1} = j \mid S_{t+1} = 0, U_\gamma) = \lambda_j(\gamma)$, putting the pieces together we have
$$\Phi(\gamma) = \rho(\gamma)\, p(D \mid \gamma) + (1 - \rho(\gamma)) \sum_{j : \gamma_j = 0} \lambda_j(\gamma)\, \Phi(\gamma^{+j}).$$
This establishes the above claim about $\Phi$.
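The recursion above makes $\Phi$ computable by memoized backward recursion over the model lattice. The following sketch is purely illustrative: the function names, the toy $\rho$, $\lambda$, and the stand-in marginal likelihood are ours, not from the paper.

```python
from functools import lru_cache

p = 3  # number of candidate predictors (toy example)

def marginal_lik(gamma):
    """Stand-in for p(D | gamma); any positive function of the model works here."""
    return 1.0 + sum(gamma)  # hypothetical toy likelihood

def rho(gamma):
    """Prior stopping probability at model gamma (toy choice)."""
    return 1.0 if sum(gamma) == p else 0.5

def lam(gamma, j):
    """Prior selection probability of predictor j given we continue from gamma."""
    free = [k for k in range(p) if gamma[k] == 0]
    return 1.0 / len(free) if gamma[j] == 0 else 0.0

@lru_cache(maxsize=None)
def Phi(gamma):
    """Phi(gamma) = E[p(D | final model) | gamma is reached], via the recursion
    Phi(gamma) = rho(gamma) p(D|gamma) + (1 - rho(gamma)) sum_j lam_j(gamma) Phi(gamma+j)."""
    if sum(gamma) == p:
        return marginal_lik(gamma)
    cont = sum(
        lam(gamma, j) * Phi(gamma[:j] + (1,) + gamma[j + 1:])
        for j in range(p) if gamma[j] == 0
    )
    return rho(gamma) * marginal_lik(gamma) + (1 - rho(gamma)) * cont

print(Phi((0, 0, 0)))
```

Because every model with the same size has the same toy likelihood here, the recursion can be verified by hand: $\Phi$ at the null model averages $p(D\mid\gamma)$ over the stopping distribution of the walk.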
Given the mapping $\Phi$, we are now ready to establish the theorem. First, because under the pfs representation the data generative mechanism essentially forms an HMM by Theorem 1, the model space posterior has a pfs representation with the mappings $\rho(\cdot \mid D)$ and $\lambda(\cdot \mid D)$ determined by the posterior distributions of the decision variables $S_1, J_1, \ldots, S_p, J_p$. So our proof strategy now is simply to find the posterior distributions of these decision variables. For any model $\gamma \in \Omega$ with $|\gamma| = t < p$,
$$\rho(\gamma \mid D) = P(S_{t+1} = 1 \mid U_\gamma, D) = \frac{P(S_{t+1} = 1, D \mid U_\gamma)}{P(D \mid U_\gamma)} = \frac{E_{\gamma^{(p)}}[p(D \mid \gamma^{(p)}) \mid S_{t+1} = 1, U_\gamma]\, P(S_{t+1} = 1 \mid U_\gamma)}{E_{\gamma^{(p)}}[p(D \mid \gamma^{(p)}) \mid U_\gamma]} = \rho(\gamma)\, p(D \mid \gamma) / \Phi(\gamma),$$
which is equal to 1 when $\rho(\gamma) = 1$. Similarly, if $\rho(\gamma) \neq 1$, then
$$\lambda_j(\gamma \mid D) = P(J_{t+1} = j \mid U_\gamma, S_{t+1} = 0, D) = \frac{P(J_{t+1} = j, S_{t+1} = 0, D \mid U_\gamma)}{P(S_{t+1} = 0, D \mid U_\gamma)} = \frac{P(J_{t+1} = j, S_{t+1} = 0, D \mid U_\gamma)}{P(D \mid U_\gamma) - P(S_{t+1} = 1, D \mid U_\gamma)}$$
$$= \frac{E_{\gamma^{(p)}}[p(D \mid \gamma^{(p)}) \mid J_{t+1} = j, S_{t+1} = 0, U_\gamma]\, P(J_{t+1} = j \mid S_{t+1} = 0, U_\gamma)\, P(S_{t+1} = 0 \mid U_\gamma)}{E_{\gamma^{(p)}}[p(D \mid \gamma^{(p)}) \mid U_\gamma] - E_{\gamma^{(p)}}[p(D \mid \gamma^{(p)}) \mid S_{t+1} = 1, U_\gamma]\, P(S_{t+1} = 1 \mid U_\gamma)} = \frac{\Phi(\gamma^{+j})\, \lambda_j(\gamma)\, (1 - \rho(\gamma))}{\Phi(\gamma) - p(D \mid \gamma)\, \rho(\gamma)}.$$
On the other hand, for $\rho(\gamma) = 1$, given $U_\gamma$, $S_{t+1} = 0$ with probability 0 and the value of $J_t$ for $t > |\gamma|$ has no impact on the final model $\gamma^{(p)}$. So we can simply set $\lambda_j(\gamma \mid D) = \lambda_j(\gamma)$ for all $j$. The theorem now follows by letting $\phi(\gamma) = \Phi(\gamma) / p(D \mid \mathbf{0})$, where $\mathbf{0}$ denotes the null model.
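To make the object of these proofs concrete, the pfs generative procedure itself is a simple forward walk on the model space: stop at the current model with probability $\rho(\gamma)$, otherwise draw one of the excluded predictors with probabilities $\lambda_j(\gamma)$ and add it. A minimal sketch (function and argument names are ours; $\rho$ and $\lambda$ are user-supplied mappings):

```python
import random

def pfs_sample(p, rho, lam, rng=None):
    """Draw one model from the pfs representation defined by mappings rho and lam.

    gamma is a tuple of p inclusion indicators; the walk adds one predictor per
    step and stops with probability rho(gamma), yielding the final model gamma^(p)."""
    rng = rng or random.Random()
    gamma = (0,) * p
    while sum(gamma) < p:
        if rng.random() < rho(gamma):            # stopping decision S_{t+1}
            break
        free = [j for j in range(p) if gamma[j] == 0]
        weights = [lam(gamma, j) for j in free]  # selection decision J_{t+1}
        j = rng.choices(free, weights=weights)[0]
        gamma = gamma[:j] + (1,) + gamma[j + 1:]
    return gamma
```

With $\rho \equiv 1$ the walk returns the null model; with $\rho \equiv 0$ it always reaches the full model, matching the boundary cases used in the proofs.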
S2. Bayes factors under g and hyper-g priors

For many common priors on the regression coefficients, the BF term in the weight update can be computed either in closed form or well approximated numerically. Here let us consider two popular priors: the g-prior and the hyper-g prior. Given a particular model $\gamma$, Zellner's g-prior in its most popular form is the following prior on the regression coefficients and the noise variance:
$$p(\phi) \propto 1/\phi \quad \text{and} \quad \beta_\gamma \mid \phi, \gamma \sim N\big(\beta^0_\gamma,\; g (X^T X)^{-1} / \phi\big),$$
where $\beta^0_\gamma$ and $g$ are hyperparameters. Following the exposition in Liang et al. (2008), we assume without loss of generality that the predictor variables $X_1, X_2, \ldots, X_p$ have all been mean-centered at zero. Then we can place a common non-informative flat prior on the intercept $\alpha$ for all models, so $p(\alpha, \phi) \propto 1/\phi$. Under this prior setup, one can show that the BF for a model $\gamma$ versus the null model is given by
$$\mathrm{BF}_0(\gamma) = \frac{(1 + g)^{(n - 1 - |\gamma|)/2}}{\big(1 + g(1 - R^2_\gamma)\big)^{(n-1)/2}},$$
where $R^2_\gamma$ is the coefficient of determination for model $\gamma$.

To avoid undesirable features of the g-prior such as Bartlett's paradox and the information paradox (Berger and Pericchi, 2001), Liang et al. (2008) proposed the use of mixtures of g-priors. In particular, they introduced the hyper-g prior, which puts the following hyperprior on $g$:
$$\frac{g}{1 + g} \sim \mathrm{Beta}(1,\, a/2 - 1).$$
This prior also renders a closed-form representation for the model-specific marginal likelihood, and thus for the corresponding BFs. In particular, Liang et al. (2008) showed that the BF of a model $\gamma$ versus the null model is given by
$$\mathrm{BF}_0(\gamma) = \frac{a - 2}{|\gamma| + a - 2}\; {}_2F_1\big( (n-1)/2,\, 1;\, (|\gamma| + a)/2;\, R^2_\gamma \big),$$
where ${}_2F_1(\cdot, \cdot;\, \cdot;\, \cdot)$ is the Gaussian hypergeometric function. More specifically, in the notation of Liang et al. (2008),
$$ {}_2F_1(a, b;\, c;\, z) = \frac{\Gamma(c)}{\Gamma(b)\Gamma(c - b)} \int_0^1 \frac{t^{b-1}(1 - t)^{c-b-1}}{(1 - tz)^a}\, dt.$$
Therefore, with either the g-prior or the hyper-g prior, the BF in the weight update can be computed as
$$\mathrm{BF}\big(\gamma_i^{(t)}, \gamma_i^{(t-1)}\big) = \frac{\mathrm{BF}_0\big(\gamma_i^{(t)}\big)}{\mathrm{BF}_0\big(\gamma_i^{(t-1)}\big)}.$$
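Both null-based BFs above are straightforward to evaluate numerically. The sketch below implements the two displayed formulas (variable names are ours; it assumes SciPy's `hyp2f1` for the Gaussian hypergeometric function):

```python
from scipy.special import hyp2f1

def bf_g_prior(R2, n, k, g):
    """BF of model gamma vs the null model under Zellner's g-prior;
    k = |gamma| is the number of predictors included in the model."""
    return (1.0 + g) ** ((n - 1 - k) / 2.0) / (1.0 + g * (1.0 - R2)) ** ((n - 1) / 2.0)

def bf_hyper_g(R2, n, k, a=3.0):
    """BF of model gamma vs the null model under the hyper-g prior with parameter a."""
    return (a - 2.0) / (k + a - 2.0) * hyp2f1((n - 1) / 2.0, 1.0, (k + a) / 2.0, R2)

def bf_update(bf_new, bf_old):
    """BF for one pfs step: ratio of the null-based BFs of the two models."""
    return bf_new / bf_old
```

Both functions reduce to 1 at the null model ($k = 0$, $R^2_\gamma = 0$), as a quick sanity check.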
S3. Incorporating dilution under model space redundancy

In this section we show that the pfs representation affords us much flexibility in incorporating prior information, and we illustrate this through an interesting phenomenon called the dilution effect, first noted by George (1999). Dilution occurs when there is redundancy in the model space. More specifically, consider the scenario where there is strong correlation among some of the predictors, and any one of these predictors captures virtually all of the association between them and the response. In this case, models that contain different members of this class but are otherwise identical are essentially the same. As a result, if, say, a symmetric prior specification is adopted, these models will receive more prior probability than they properly should. At the same time, other models that do not include members of this class will be down-weighted in the prior. In real data, this phenomenon occurs to varying degrees depending on the underlying correlation structure among the predictors.

Next, we present a very simple specification of the model space prior under the pfs representation that can effectively address this phenomenon. We do not claim that this approach is the best way to deal with dilution, but rather use it as an example to illustrate the flexibility rendered by the pfs representation. The specification can most simply be described in two steps.

Step I. Pre-clustering the predictors based on their correlation. First, we carry out a hierarchical clustering over the predictor variables using the (absolute) correlation as the similarity metric, which divides the predictors into $K$ clusters $C_1, C_2, \ldots, C_K$. We recommend using complete linkage for this purpose, as this ensures that the variables within each cluster are all very close to each other. One needs to choose a correlation threshold $s$ for cutting the corresponding dendrogram into clusters; in the case of complete linkage, this is the minimum correlation for two variables to be in the same cluster. We recommend choosing a large $s$, such as 0.9, to place variables into the same basket only if they are very highly correlated.

Step II. Prior specification given the predictor clusters. Based on the predictor clusters, we assign prior selection probabilities for a model $\gamma$ to the variables not yet in the model in the following manner. First, we place equal total prior selection probability over each of the available clusters. Then within each cluster, we assign the selection probability evenly across the variables. For example, consider the situation where there are a total of 10 predictors $X_1$ through $X_{10}$, and following Step I they form four clusters $C_1 = \{X_1, X_2, X_3\}$, $C_2 = \{X_4, X_{10}\}$, $C_3 = \{X_5, X_7, X_9\}$ and $C_4 = \{X_6, X_8\}$. Let $\gamma$ be the model that contains variables $X_1$, $X_4$, $X_5$, $X_6$, and $X_8$, that is, $\gamma = (1, 0, 0, 1, 1, 1, 0, 1, 0, 0)$. If the pfs procedure reaches $\gamma$ and does not stop there, that is, $S(\gamma) = 0$, then five variables, $X_2, X_3, X_7, X_9, X_{10}$, from three clusters are available for further inclusion: $\{X_2, X_3\}$ from $C_1$, $\{X_{10}\}$ from $C_2$, and $\{X_7, X_9\}$ from $C_3$. In this case we choose the selection probabilities $\lambda(\gamma)$ to be
$$\lambda_1(\gamma) = \lambda_4(\gamma) = \lambda_5(\gamma) = \lambda_6(\gamma) = \lambda_8(\gamma) = 0, \quad \lambda_2(\gamma) = \lambda_3(\gamma) = \lambda_7(\gamma) = \lambda_9(\gamma) = \tfrac{1}{3} \cdot \tfrac{1}{2} = \tfrac{1}{6}, \quad \text{and} \quad \lambda_{10}(\gamma) = \tfrac{1}{3}.$$
Under such a specification, the predictors falling in the same cluster evenly share a fixed piece of the prior selection probability, which ensures that the prior weight on the other variables is not diluted.

References

Berger, J. O. and L. R. Pericchi (2001). Objective Bayesian methods for model selection: Introduction and comparison. Lecture Notes - Monograph Series 38.

George, E. I. (1999). Sampling considerations for model averaging and model search. Invited discussion of "Model averaging and model search" by M. Clyde. In J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith (Eds.), Bayesian Statistics 6. Oxford, UK: Oxford University Press.

Liang, F., R. Paulo, G. Molina, M. A. Clyde, and J. O. Berger (2008). Mixtures of g-priors for Bayesian variable selection. Journal of the American Statistical Association 103(481).
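As an aside, Steps I and II of the dilution-adjusted prior above can be sketched as follows (helper names are ours; the clustering uses SciPy's complete-linkage routine with distance $1 - |\text{correlation}|$, so cutting the dendrogram at height $1 - s$ matches the threshold described in Step I):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def correlation_clusters(X, s=0.9):
    """Step I: complete-linkage clustering with similarity |cor|; cutting at
    distance 1 - s means any two variables in a cluster have |cor| >= s."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    dist = np.clip(1.0 - corr, 0.0, None)
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="complete")
    return fcluster(Z, t=1.0 - s, criterion="distance")  # labels 1..K

def selection_probs(gamma, labels):
    """Step II: split total prior selection probability evenly across the
    clusters that still have variables outside gamma, then evenly within each."""
    p = len(gamma)
    free = [j for j in range(p) if gamma[j] == 0]
    open_clusters = sorted({labels[j] for j in free})
    lam = [0.0] * p
    for c in open_clusters:
        members = [j for j in free if labels[j] == c]
        for j in members:
            lam[j] = (1.0 / len(open_clusters)) / len(members)
    return lam

# Worked example from the text: with the four clusters C1..C4 encoded as labels,
# gamma = (1,0,0,1,1,1,0,1,0,0) yields lambda_2 = lambda_3 = lambda_7 = lambda_9 = 1/6
# and lambda_10 = 1/3.
labels = [1, 1, 1, 2, 3, 4, 3, 4, 3, 2]
print(selection_probs((1, 0, 0, 1, 1, 1, 0, 1, 0, 0), labels))
```

The selection probabilities always sum to one over the available variables, as required of a pfs selection mapping.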
More informationBayesian Hypothesis Testing in GLMs: One-Sided and Ordered Alternatives. 1(w i = h + 1)β h + ɛ i,
Bayesian Hypothesis Testing in GLMs: One-Sided and Ordered Alternatives Often interest may focus on comparing a null hypothesis of no difference between groups to an ordered restricted alternative. For
More informationFrailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P.
Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Melanie M. Wall, Bradley P. Carlin November 24, 2014 Outlines of the talk
More informationThe joint posterior distribution of the unknown parameters and hidden variables, given the
DERIVATIONS OF THE FULLY CONDITIONAL POSTERIOR DENSITIES The joint posterior distribution of the unknown parameters and hidden variables, given the data, is proportional to the product of the joint prior
More informationStat 535 C - Statistical Computing & Monte Carlo Methods. Lecture 15-7th March Arnaud Doucet
Stat 535 C - Statistical Computing & Monte Carlo Methods Lecture 15-7th March 2006 Arnaud Doucet Email: arnaud@cs.ubc.ca 1 1.1 Outline Mixture and composition of kernels. Hybrid algorithms. Examples Overview
More information(1) Introduction to Bayesian statistics
Spring, 2018 A motivating example Student 1 will write down a number and then flip a coin If the flip is heads, they will honestly tell student 2 if the number is even or odd If the flip is tails, they
More informationLecture 16 : Bayesian analysis of contingency tables. Bayesian linear regression. Jonathan Marchini (University of Oxford) BS2a MT / 15
Lecture 16 : Bayesian analysis of contingency tables. Bayesian linear regression. Jonathan Marchini (University of Oxford) BS2a MT 2013 1 / 15 Contingency table analysis North Carolina State University
More informationDefault Priors and Effcient Posterior Computation in Bayesian
Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature
More informationUsing Historical Experimental Information in the Bayesian Analysis of Reproduction Toxicological Experimental Results
Using Historical Experimental Information in the Bayesian Analysis of Reproduction Toxicological Experimental Results Jing Zhang Miami University August 12, 2014 Jing Zhang (Miami University) Using Historical
More informationST 740: Model Selection
ST 740: Model Selection Alyson Wilson Department of Statistics North Carolina State University November 25, 2013 A. Wilson (NCSU Statistics) Model Selection November 25, 2013 1 / 29 Formal Bayesian Model
More informationIntroduction to Probabilistic Machine Learning
Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning
More informationDiscussion of Predictive Density Combinations with Dynamic Learning for Large Data Sets in Economics and Finance
Discussion of Predictive Density Combinations with Dynamic Learning for Large Data Sets in Economics and Finance by Casarin, Grassi, Ravazzolo, Herman K. van Dijk Dimitris Korobilis University of Essex,
More informationLearning to Learn and Collaborative Filtering
Appearing in NIPS 2005 workshop Inductive Transfer: Canada, December, 2005. 10 Years Later, Whistler, Learning to Learn and Collaborative Filtering Kai Yu, Volker Tresp Siemens AG, 81739 Munich, Germany
More informationLongitudinal Modeling with Logistic Regression
Newsom 1 Longitudinal Modeling with Logistic Regression Longitudinal designs involve repeated measurements of the same individuals over time There are two general classes of analyses that correspond to
More informationA REVERSE TO THE JEFFREYS LINDLEY PARADOX
PROBABILITY AND MATHEMATICAL STATISTICS Vol. 38, Fasc. 1 (2018), pp. 243 247 doi:10.19195/0208-4147.38.1.13 A REVERSE TO THE JEFFREYS LINDLEY PARADOX BY WIEBE R. P E S T M A N (LEUVEN), FRANCIS T U E R
More informationBayesian Linear Models
Bayesian Linear Models Sudipto Banerjee 1 and Andrew O. Finley 2 1 Department of Forestry & Department of Geography, Michigan State University, Lansing Michigan, U.S.A. 2 Biostatistics, School of Public
More informationHierarchical Modeling for Univariate Spatial Data
Hierarchical Modeling for Univariate Spatial Data Geography 890, Hierarchical Bayesian Models for Environmental Spatial Data Analysis February 15, 2011 1 Spatial Domain 2 Geography 890 Spatial Domain This
More informationBayes methods for categorical data. April 25, 2017
Bayes methods for categorical data April 25, 2017 Motivation for joint probability models Increasing interest in high-dimensional data in broad applications Focus may be on prediction, variable selection,
More informationBayesian Adjustment for Multiplicity
Bayesian Adjustment for Multiplicity Jim Berger Duke University with James Scott University of Texas 2011 Rao Prize Conference Department of Statistics, Penn State University May 19, 2011 1 2011 Rao Prize
More informationBayesian model selection for computer model validation via mixture model estimation
Bayesian model selection for computer model validation via mixture model estimation Kaniav Kamary ATER, CNAM Joint work with É. Parent, P. Barbillon, M. Keller and N. Bousquet Outline Computer model validation
More informationBayesian linear regression
Bayesian linear regression Linear regression is the basis of most statistical modeling. The model is Y i = X T i β + ε i, where Y i is the continuous response X i = (X i1,..., X ip ) T is the corresponding
More informationSYDE 372 Introduction to Pattern Recognition. Probability Measures for Classification: Part I
SYDE 372 Introduction to Pattern Recognition Probability Measures for Classification: Part I Alexander Wong Department of Systems Design Engineering University of Waterloo Outline 1 2 3 4 Why use probability
More informationBayesian Machine Learning
Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 2: Bayesian Basics https://people.orie.cornell.edu/andrew/orie6741 Cornell University August 25, 2016 1 / 17 Canonical Machine Learning
More informationObjective Bayesian Hypothesis Testing
Objective Bayesian Hypothesis Testing José M. Bernardo Universitat de València, Spain jose.m.bernardo@uv.es Statistical Science and Philosophy of Science London School of Economics (UK), June 21st, 2010
More informationPubh 8482: Sequential Analysis
Pubh 8482: Sequential Analysis Joseph S. Koopmeiners Division of Biostatistics University of Minnesota Week 10 Class Summary Last time... We began our discussion of adaptive clinical trials Specifically,
More informationBayesian variable selection in high dimensional problems without assumptions on prior model probabilities
arxiv:1607.02993v1 [stat.me] 11 Jul 2016 Bayesian variable selection in high dimensional problems without assumptions on prior model probabilities J. O. Berger 1, G. García-Donato 2, M. A. Martínez-Beneito
More informationIntroduction to Bayesian Statistics
Bayesian Parameter Estimation Introduction to Bayesian Statistics Harvey Thornburg Center for Computer Research in Music and Acoustics (CCRMA) Department of Music, Stanford University Stanford, California
More information