5: Biostatistical Applications of Bayesian Decision Theory


1 Introduction to Bayesian Data Analysis

5: Biostatistical Applications of Bayesian Decision Theory

David Draper
Department of Applied Mathematics and Statistics
University of California, Santa Cruz, USA
www.ams.ucsc.edu/~draper

Centers for Disease Control and Prevention (Atlanta GA), June 2008

© 2008 David Draper (all rights reserved)

2 The Big Picture

One possible definition: statistics is the study of uncertainty: how to measure it, and what to do about it.

How to measure uncertainty: probability; two main probability paradigms: frequentist and Bayesian.

What to do about uncertainty: two main activities.

Inference: generalizing outward from a given set of information (sample) to a larger universe (population), and attaching well-calibrated measures of uncertainty to the generalizations (e.g., "Nonwhites in the population of people at substantial risk of HIV-1 infection are 88% more likely to get infected if they don't receive this rgp120 vaccine than if they do receive it (relative risk of infection 1.88, 95% interval estimate 1.14 to 3.13)").

Decision-making: taking or recommending an action on the basis of available data, in spite of remaining uncertainties (e.g., "Based on this trial, in which nonwhites were a secondary subgroup, it's recommended that the vaccine be studied further with nonwhites as the primary study group").

3 Use of Frequentist and Bayesian Probability in Statistics

Frequentist probability: restrict attention to phenomena that are inherently repeatable under (essentially) identical conditions; then, for an event A of interest, P_F(A) = the limiting relative frequency with which A occurs in the (hypothetical) repetitions, as the number of repetitions n → ∞.

+ Math easier; focuses attention on calibration issues (how often do I get the right answer?).
− Only applies to inherently repeatable phenomena: can't speak directly about many uncertain things of interest (e.g., P_F(this patient is HIV+) is undefined); predictive interval estimates often not so easy to create; small-sample inferential calibration not so easy to achieve.

Bayesian probability: numerical weight of evidence in favor of an uncertain proposition, obeying a series of reasonable axioms to ensure that Bayesian probabilities are coherent (internally logically consistent).

+ Applies to any uncertain situation; predictive intervals easy; Wald (1950; a frequentist!): all good decisions are Bayes rules.
− Math harder; coherence doesn't guarantee good calibration.

4 Frequentist Inference, Prediction and Decision-Making

Frequentist inference: (1) I think of my data as like a random sample from some population (challenge: often difficult with observational data to identify what this population really is). (2) I identify some numerical summary θ of the population of interest (e.g., a relative risk), and I find a reasonable estimate θ̂ of θ based on the sample (challenge: how to define reasonable?). (3) I imagine repeating the random sampling, and I use the random behavior of θ̂ across these repetitions to make probability statements involving (but not about!) θ (e.g., confidence intervals for θ [e.g., "I'm 95% confident that θ_RR is between 1.14 and 3.13"] or hypothesis tests about θ [e.g., "the P-value for testing H_0: θ_RR < 1 against H_A: θ_RR ≥ 1 is 0.012, so I reject H_0"]).

Frequentist point prediction (e.g., in regression) is easy; constructing predictive intervals to check the calibration of the prediction process on new data is less easy (one solution: the bootstrap); nobody does real-world frequentist decision-making since Wald's famous theorem.

5 Bayesian Statistical Paradigm

Three basic ingredients of the Bayesian statistical paradigm:

θ, something of interest which is unknown (or only partially known) to me (e.g., θ_RR). Often θ is a parameter vector (of finite length k, say) or a matrix, but it can literally be almost anything: e.g., a function (a cumulative distribution function (CDF) or density, a regression surface, ...), a phylogenetic tree, an image of the (true) surface of Mars, ....

y, an information source which is relevant to decreasing my uncertainty about θ. Often y is a vector of real numbers (of length n, say), but it can also literally be almost anything: e.g., a time series, a movie, the text in a book, ....

A desire to learn about θ from y in a way that is both coherent (internally consistent, i.e., free of internal logical contradictions) and well-calibrated (externally consistent, e.g., capable of making accurate predictions of future data y*).

6 All Uncertainty Quantified With Probability Distributions

It turns out (e.g., de Finetti 1990, Jaynes 2003) that I'm compelled in this situation to reason within the standard rules of probability as the basis of my inferences about θ, predictions of future data y*, and decisions in the face of uncertainty, and to quantify my uncertainty about any unknown quantities through conditional probability distributions, as follows:

    p(θ | y, B) = c · p(θ | B) · l(θ | y, B)
    p(y* | y, B) = ∫ p(y* | θ, B) p(θ | y, B) dθ          (1)
    a* = argmax_{a ∈ A} E_{(θ | y, B)}[U(a, θ)]

B stands for my background (often not fully stated) assumptions and judgments about how the world works, as these assumptions relate to learning about θ from y. B is often omitted from the basic equations (sometimes with unfortunate consequences), yielding the simpler-looking forms

    p(θ | y) = c · p(θ) · l(θ | y)
    p(y* | y) = ∫ p(y* | θ) p(θ | y) dθ          (2)
    a* = argmax_{a ∈ A} E_{(θ | y)}[U(a, θ)]
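
A minimal numerical sketch (in Python, not from the slides) of the three equations in display (1), using a discrete grid for a single unknown proportion θ and an invented two-action utility table; all data values and utilities below are made up purely to show the mechanics.

    # Sketch of display (1) on a discrete grid for theta, with a toy two-action decision.
    import numpy as np

    theta = np.linspace(0.01, 0.99, 99)          # grid for an unknown proportion theta
    prior = np.ones_like(theta)                  # flat prior p(theta | B), up to a constant
    y_successes, n_trials = 7, 20                # observed data y
    lik = theta**y_successes * (1 - theta)**(n_trials - y_successes)   # l(theta | y, B)

    post = prior * lik
    post /= post.sum()                           # c makes the posterior sum to 1

    # Posterior predictive P(y* = 1 | y, B) = sum over the grid of p(y* | theta) p(theta | y)
    p_next_success = np.sum(theta * post)

    # MEU: choose the action a* in A maximizing E_(theta | y)[U(a, theta)]
    # (the utilities below are invented purely for illustration)
    actions = {"treat": lambda th: 10 * th - 3, "wait": lambda th: 2 * th}
    expected_U = {a: np.sum(U(theta) * post) for a, U in actions.items()}
    a_star = max(expected_U, key=expected_U.get)
    print(p_next_success, expected_U, a_star)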

7 Prior and Posterior Distributions

    p(θ | y, B) = c · p(θ | B) · l(θ | y, B)
    p(y* | y, B) = ∫ p(y* | θ, B) p(θ | y, B) dθ
    a* = argmax_{a ∈ A} E_{(θ | y, B)}[U(a, θ)]

p(θ | B) is my (so-called) prior information about θ given B, in the form of a probability density function (PDF) or probability mass function (PMF) if θ lives continuously or discretely on R^k (let's just agree to call this my prior distribution), and p(θ | y, B) is my (so-called) posterior distribution about θ given y and B, which summarizes my current total information about θ and solves the inference problem.

These are actually not very good names for p(θ | B) and p(θ | y, B), because (e.g.) p(θ | B) really stands for all (relevant) information about θ (given B) external to y, whether that information was obtained before (or after) y arrives, but (a) they do emphasize the sequential nature of learning and (b) through long usage we're stuck with them.

c (here and throughout) is a generic positive normalizing constant, inserted into the top equation above to make the left-hand side integrate (or sum) to 1 (as any coherent distribution must).

8 Sampling Distributions, Likelihood Functions and Utility

    p(θ | y, B) = c · p(θ | B) · l(θ | y, B)
    p(y* | y, B) = ∫ p(y* | θ, B) p(θ | y, B) dθ
    a* = argmax_{a ∈ A} E_{(θ | y, B)}[U(a, θ)]

p(y* | θ, B) is my sampling distribution for future data values y* given θ and B (and presumably I would use the same sampling distribution p(y | θ, B) for (past) data values y, thinking before the data arrive about what values of y I might see). This assumes that I'm willing to regard my data as like random draws from a population of possible data values (an heroic assumption in some cases, e.g., with observational rather than randomized data).

l(θ | y, B) is my likelihood function for θ given y and B, which is defined to be any positive constant multiple of the sampling distribution p(y | θ, B) but re-interpreted as a function of θ for fixed y:

    l(θ | y, B) = c · p(y | θ, B).          (3)

A is my set of possible actions, U(a, θ) is the numerical value (utility) I attach to taking action a if the unknown is really θ, and the third equation says I should find the action a* that maximizes expected utility (MEU).

9 Predictive Distributions and MCMC

    p(θ | y, B) = c · p(θ | B) · l(θ | y, B)
    p(y* | y, B) = ∫ p(y* | θ, B) p(θ | y, B) dθ
    a* = argmax_{a ∈ A} E_{(θ | y, B)}[U(a, θ)]

And p(y* | y, B), my (posterior) predictive distribution for future data y* given (past) data y and B, must be a weighted average of my sampling distribution p(y* | θ, B), weighted by my current best information p(θ | y, B) about θ given y and B.

That's the paradigm, and in the past (say) 30 years it's been highly successful, in fields as far-ranging as bioinformatics, econometrics, environmetrics, and medicine, at quantifying uncertainty in a coherent and well-calibrated way and helping people find satisfying answers to hard scientific questions.

Evaluating (potentially high-dimensional) integrals (like the one in the second equation above, and many others that arise in the Bayesian approach) is a technical challenge, often addressed these days with sampling-based Markov chain Monte Carlo (MCMC) methods (e.g., Gilks, Richardson and Spiegelhalter 1996).

10 An Example of Poorly-Calibrated Frequentist Inference

Quality of hospital care is often studied with cluster samples: I take a random sample of J hospitals (indexed by j) and a random sample of N total patients (indexed by i) nested in the chosen hospitals, and I measure quality of care for the chosen patients and various hospital- and patient-level predictors.

With y_ij as the quality of care score for patient i in hospital j, a first step would often be to fit a variance-components model with random effects at both the hospital and patient levels:

    y_ij = β_0 + u_j + e_ij,   i = 1, ..., n_j,  j = 1, ..., J;   Σ_{j=1}^J n_j = N,
    (u_j | σ_u²) ~ IID N(0, σ_u²),   (e_ij | σ_e²) ~ IID N(0, σ_e²).          (4)

Browne and Draper (2006) used a simulation study to show that, with a variety of maximum-likelihood-based methods for creating confidence intervals for σ_u², the actual coverage of nominal 95% intervals ranged from 72% to 94% across realistic sample sizes and true parameter values, versus 89–94% for Bayesian methods.
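
The following Python sketch shows the structure of this kind of calibration simulation: simulate data from model (4) with σ_u² known, build an interval for σ_u² (here by a simple cluster bootstrap of the ANOVA moment estimator, not the ML or Bayesian procedures that Browne and Draper actually compared), and record how often the nominal 95% interval covers the truth. Sample sizes and true values are illustrative only.

    # Coverage (calibration) simulation sketch for sigma2_u in model (4).
    import numpy as np

    rng = np.random.default_rng(1)
    J, n, beta0, sigma2_u, sigma2_e = 30, 10, 0.0, 1.0, 1.0

    def moment_sigma2u(y):                        # y has shape (J, n)
        msb = n * y.mean(axis=1).var(ddof=1)      # between-hospital mean square
        mse = ((y - y.mean(axis=1, keepdims=True))**2).sum() / (y.shape[0] * (n - 1))
        return max((msb - mse) / n, 0.0)          # ANOVA moment estimator, truncated at 0

    def bootstrap_interval(y, B=500, alpha=0.05):
        # resample whole hospitals (clusters) with replacement, percentile interval
        ests = [moment_sigma2u(y[rng.integers(0, y.shape[0], y.shape[0])]) for _ in range(B)]
        return np.quantile(ests, [alpha / 2, 1 - alpha / 2])

    cover, n_sim = 0, 200
    for _ in range(n_sim):
        u = rng.normal(0, np.sqrt(sigma2_u), size=(J, 1))
        y = beta0 + u + rng.normal(0, np.sqrt(sigma2_e), size=(J, n))
        lo, hi = bootstrap_interval(y)
        cover += (lo <= sigma2_u <= hi)
    print("estimated actual coverage of nominal 95% interval:", cover / n_sim)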

11 Poorly-Calibrated Frequentist Inference (continued)

In a re-analysis of a Guatemalan National Survey of Maternal and Child Health, with three-level data (births within mothers within communities), working with the random-effects logistic regression model

    (y_ijk | p_ijk) ~ independent Bernoulli(p_ijk),
    logit(p_ijk) = β_0 + β_1 x_1ijk + β_2 x_2jk + β_3 x_3k + u_jk + v_k,          (5)

where y_ijk is a binary indicator of modern prenatal care or not and where u_jk ~ N(0, σ_u²) and v_k ~ N(0, σ_v²) were random effects at the mother and community levels (respectively), Browne and Draper (2006) showed that things can be even worse for likelihood-based methods, with actual coverages (at nominal 95%) as low as 0–2% for intervals for σ_u² and σ_v², whereas Bayesian methods again produce actual coverages from 89–96%.

The technical problem is that the marginal likelihood functions for random-effects variances are often heavily skewed, with maxima at or near 0 even when the true variance is positive; Bayesian methods, which integrate over the likelihood function rather than maximizing it, can have (much) better small-sample calibration performance.

12 Where I'm Headed in This Part of the Short Course

I've argued that the frequentist and Bayesian paradigms both have strengths and weaknesses, so (unlike the position taken by many people in the 20th century) my job is not to choose one paradigm and defend it against attacks from people who prefer the other paradigm, but instead to (a) understand both paradigms thoroughly and (b) find a fusion of the two that emphasizes the strengths and plays down the weaknesses.

My personal fusion has two steps: (1) to reason in a Bayesian way when formulating my inferences, predictions and decisions, because the Bayesian paradigm is the most flexible approach so far invented for quantifying all relevant sources of uncertainty, and (2) to reason in a frequentist way when evaluating the quality of my answers, by paying attention to calibration issues (e.g., creating a simulation environment similar to the problem I'm studying but in which truth is known, and seeing how often my Bayesian methods recover known truth).

13 Where I'm Headed (continued)

The 20th century was dominated by the frequentist point of view, which was good because of the emphasis on calibration but bad because this produced an over-emphasis on inference at the expense of prediction and decision-making.

In particular, problems that at first look inferential (because that was the only way the last century's dominant paradigm could handle them) may profitably be reformulated as decisions, and people sometimes use inferential tools to suggest optimal behaviors that are not as optimal as they initially seem.

Here I'll describe two case studies in biostatistics in which Bayesian decision theory gives new insight in settings that seem inferential: variable selection in generalized linear models (with application to the construction of a cost-effective scale for measuring sickness at admission to hospital), and determining the efficacy of a vaccine against HIV.

14 Measuring Sickness at Admission

Variable selection (choosing the best subset of predictors) in generalized linear models is an old problem, dating back at least to the 1960s, and many methods have been proposed to try to solve it; but virtually all of them ignore an aspect of the problem that can be important: the cost of data collection of the predictors.

Case study 1 (Fouskakis and Draper, JASA, 2008; Fouskakis, Ntzoufras and Draper (FND), submitted, 2007a, 2007b). In the field of quality of health care measurement, patient sickness at admission is often assessed by using logistic regression of mortality within 30 days of admission on a fairly large number of sickness indicators (on the order of 100) to construct a sickness scale, employing standard variable selection methods (e.g., backward selection from a model with all predictors) to find an optimal subset of indicators.

Such benefit-only methods ignore the considerable differences among the sickness indicators in cost of data collection, an issue that's crucial when admission sickness is used to drive programs (now implemented or

15 Choosing Utility Function (continued)

under consideration in several countries, including the U.S. and U.K.) that attempt to identify substandard hospitals by comparing observed and expected mortality rates (given admission sickness).

When both data-collection cost and accuracy of prediction of 30-day mortality are considered, a large variable-selection problem arises in which costly variables that do not predict well enough should be omitted from the final scale.

There are two main ways to solve this problem (you can (a) put cost and predictive accuracy on the same scale and optimize, or (b) maximize the latter subject to a bound on the former), leading to three methods: (1) a decision-theoretic cost-benefit approach based on maximizing expected utility (Fouskakis and Draper, 2008), (2) an alternative cost-benefit approach based on posterior model odds (FND, 2007a), and (3) a cost-restriction-benefit analysis that maximizes predictive accuracy subject to a bound on cost (FND, 2007b).

16 The Data

Data (Kahn et al., JAMA, 1990): p = 83 sickness indicators gathered on a representative sample of n = 2,532 elderly American patients hospitalized with pneumonia during the study period; the original RAND benefit-only scale was based on a subset of 14 predictors.

[Table: for each of the 14 RAND-scale variables, the per-patient data-collection cost in U.S.$, the correlation with 30-day mortality, and a Good? indicator. The 14 variables were: Total APACHE II score (36-point scale); Age; Systolic blood pressure score (2-point scale); Chest X-ray congestive heart failure score (3-point scale); Blood urea nitrogen; APACHE II coma score (3-point scale); Serum albumin (3-point scale); Shortness of breath (yes, no); Respiratory distress (yes, no); Septic complications (yes, no); Prior respiratory failure (yes, no); Recently hospitalized (yes, no); Ambulatory score (3-point scale); Temperature.]

17 Decision-Theoretic Cost-Benefit Approach

Approach (1) (decision-theoretic cost-benefit). Problem formulation: suppose (a) the 30-day mortality outcome y_i and data on p sickness indicators (X_i1, ..., X_ip) have been collected on n individuals sampled exchangeably from a population P of patients with a given disease, and (b) the goal is to predict the death outcomes for n new patients who will in the future be sampled exchangeably from P, (c) on the basis of some or all of the predictors X_j, when (d) the marginal costs of data collection per patient, c_1, ..., c_p, for the X_j vary considerably.

What is the best subset of the X_j to choose, if a fixed amount of money is available for this task and you're rewarded based on the quality of your predictions?

Since data on future patients are not available, we use a cross-validation approach in which (i) a random subset of n_M observations is drawn for creation of the mortality predictions (the modeling subsample) and (ii) the quality of those predictions is assessed on the remaining n_V = (n − n_M) observations (the validation subsample, which serves as a proxy for future patients).

18 Utility Elicitation

Here utility is quantified in monetary terms, so that the data-collection part of the utility function is simply the negative of the total amount of money required to gather data on the specified predictor subset (manual data abstraction from hardcopy patient charts will gradually be replaced by electronic medical records, but is still widely used in quality of care studies).

Letting I_j = 1 if X_j is included in a given model (and 0 otherwise), the data-collection utility associated with subset I = (I_1, ..., I_p) for the patients in the validation subsample is

    U_D(I) = −n_V Σ_{j=1}^p c_j I_j,          (6)

where c_j is the marginal cost per patient of data abstraction for variable j (the second column in the table above gave examples of these marginal costs).

To measure the accuracy of a model's predictions, a metric is needed that quantifies the discrepancy between the actual and predicted values, and in this problem the metric must come out in monetary terms on a scale comparable to that employed with the data-collection utility.
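
A minimal Python sketch of equation (6); the costs below are placeholders, not the RAND values.

    # Data-collection utility of equation (6): the (negative) cost of abstracting the
    # chosen predictors for every patient in the validation subsample.
    import numpy as np

    def data_collection_utility(I, costs, n_V):
        """U_D(I) = -n_V * sum_j c_j I_j, with I a 0/1 inclusion vector."""
        I, costs = np.asarray(I), np.asarray(costs)
        return -n_V * float(np.dot(costs, I))

    costs = np.array([0.50, 0.25, 1.75, 3.00])    # hypothetical marginal costs per patient, US$
    print(data_collection_utility([1, 0, 1, 1], costs, n_V=1000))   # -> -5250.0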

19 Utility Elicitation (continued)

In the setting of this problem the outcomes Y_i are binary death indicators and the predicted values p̂_i, based on statistical modeling, take the form of estimated death probabilities.

We use an approach to the comparison of actual and predicted values that involves dichotomizing the p̂_i with respect to a cutoff, to mimic the decision-making reality that actions taken on the basis of observed-versus-expected quality assessment will have an all-or-nothing character at the hospital level (for example, regulators must decide either to subject or not to subject a given hospital to a more detailed, more expensive quality audit based on process criteria).

In the first step of our approach, given a particular predictor subset I, we fit a logistic regression model to the modeling subsample M and apply this model to the validation subsample V to create predicted death probabilities p̂_i^I.

In more detail, letting Y_i = 1 if patient i dies and 0 otherwise, and taking X_i1, ..., X_ik to be the k sickness predictors for this patient under model I, the usual sampling model which underlies logistic regression in this case is

20 Utility Elicitation (continued)

    (Y_i | p_i^I) ~ independent Bernoulli(p_i^I),   log[ p_i^I / (1 − p_i^I) ] = β_0 + β_1 X_i1 + ... + β_k X_ik.          (7)

We use maximum likelihood to fit this model (as a computationally efficient approximation to Bayesian fitting with relatively diffuse priors), obtaining a vector β̂ of estimated logistic regression coefficients, from which the predicted death probabilities for the patients in subsample V are as usual given by

    p̂_i^I = [ 1 + exp( −Σ_{j=0}^k β̂_j X_ij ) ]^{−1},          (8)

where X_i0 = 1 (p̂_i^I may be thought of as the sickness score for patient i under model I).

In the second step of our approach we classify patient i in the validation subsample as predicted dead or alive according to whether p̂_i^I exceeds or falls short of a cutoff p*, which is chosen by searching on a discrete grid from 0.01 to 0.99 by steps of 0.01 to maximize the predictive accuracy of model I.
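
A sketch of equations (7)-(8) and the cutoff search, on simulated data, using scikit-learn's logistic regression with a very large C as a stand-in for plain maximum likelihood; raw classification accuracy is used here as a simple proxy for the predictive criterion, which in the full approach is the monetary predictive utility described next.

    # Fit on the modeling subsample M, predict on the validation subsample V, pick p*.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n, k = 2532, 5
    X = rng.normal(size=(n, k))
    true_beta = np.array([-2.0, 0.8, 0.5, 0.0, -0.4, 0.3])          # intercept + k slopes
    p = 1 / (1 + np.exp(-(true_beta[0] + X @ true_beta[1:])))
    y = rng.binomial(1, p)

    M, V = np.arange(n) < 2000, np.arange(n) >= 2000                # modeling / validation split
    fit = LogisticRegression(C=1e6, max_iter=1000).fit(X[M], y[M])  # large C ~ unpenalized ML
    p_hat = fit.predict_proba(X[V])[:, 1]                           # equation (8)

    grid = np.arange(0.01, 1.00, 0.01)                              # 0.01, 0.02, ..., 0.99
    acc = [((p_hat >= c).astype(int) == y[V]).mean() for c in grid]
    p_star = grid[int(np.argmax(acc))]
    print("chosen cutoff p* =", round(p_star, 2))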

21 Utility Elicitation (continued)

We then cross-tabulate actual versus predicted death status in a 2 × 2 contingency table, rewarding and penalizing model I according to the numbers of patients in the validation sample which fall into the cells of the right-hand part of the following table.

                    Rewards and Penalties        Counts
                    Predicted                    Predicted
                    Died      Lived              Died     Lived
    Actual  Died    C_11      C_12               n_11     n_12
            Lived   C_21      C_22               n_21     n_22

The left-hand part of this table records the rewards and penalties in US$. The predictive utility of model I is then

    U_P(I) = Σ_{l=1}^2 Σ_{m=1}^2 C_lm n_lm.          (9)

To elicit the utility values C_lm we reason as follows.
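
A short Python sketch of equation (9): build the 2 × 2 actual-versus-predicted table and score it with the C_lm. The numerical C_lm used below are the elicited values quoted a few slides later, with the penalties entered as negative numbers; the tiny outcome vectors are invented.

    # Predictive utility U_P(I) = sum_lm C_lm * n_lm, rows/columns ordered (died, lived).
    import numpy as np

    def predictive_utility(y_actual, y_pred, C):
        n = np.zeros((2, 2))
        for a, p in zip(y_actual, y_pred):        # a, p in {1 = died, 0 = lived}
            n[1 - a, 1 - p] += 1                  # row 0 / col 0 correspond to "died"
        return float(np.sum(C * n))

    C = np.array([[34.8, -139.2],                 # actual died:  predicted died, predicted lived
                  [-69.6, 8.7]])                  # actual lived: predicted died, predicted lived
    y_act = np.array([1, 1, 0, 0, 0, 1])
    y_prd = np.array([1, 0, 0, 1, 0, 1])
    print(predictive_utility(y_act, y_prd, C))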

22 Utility Elicitation (continued)

(1) Clearly C_11 (the reward for correctly predicting death at 30 days) and C_22 (the reward for correctly predicting living at 30 days) should be positive, and C_12 (the penalty for a false prediction of living) and C_21 (the penalty for a false prediction of death) should be negative.

(2) Since it's easier to correctly predict that a person lives than dies with these data (the overall pneumonia 30-day death rate in the RAND sample was 16%, so a prediction that every patient lives would be right about 84% of the time), it's natural to specify that C_11 > C_22.

(3) Since it's arguably worse to label a bad hospital as good than the other way around, one should take |C_12| > |C_21|, and furthermore it's natural that the magnitudes of the penalties should exceed those of the rewards.

(4) We completed the utility specification by eliciting information from health experts in the U.S. and U.K., first to anchor C_21 to the cost of subjecting a good hospital to an unnecessary process audit and then to obtain ratios relating the other C_lm to C_21.

23 Utility Elicitation (continued)

Since the utility structure we use is based on the idea that hospitals have to be treated in an all-or-nothing way in acting on the basis of their apparent quality, the approach taken was (i) to quantify the monetary loss L of incorrectly subjecting a good hospital to a detailed but unnecessary process audit and then (ii) to translate this from the hospital to the patient level.

A rough correspondence may be made between the left-hand part of the contingency table above, at the patient level, and a hospital-level table with rows representing truth (bad in row 1, good in row 2) and columns representing the decision taken (process audit in column 1, no process audit in column 2). Unnecessary process audits then correspond to cell (2, 1) in these tables (hospitals where a process audit is not needed will typically have an excess of patients who are predicted to die but actually live).

Discussions with health experts in the U.S. and U.K. suggested that detailed process audits cost on the order of L = $5,000 per hospital (in late-1980s U.S. dollars), and RAND data indicated the mean number of pneumonia patients per hospital per year in the U.S. at the time of the RAND quality of care study.

24 Utility Elicitation (continued)

Dividing L = $5,000 by that mean annual patient count fixed C_21 at approximately $69.6 in magnitude. Our health experts judged that C_12 should be the largest in absolute value of the C_lm, and averaging across the expert opinions, expressed as orders of magnitude base 2, the elicitation results were C_12/C_21 = 2, C_11/C_21 = 1/2, and C_22/C_21 = 1/8, finally yielding (C_11, C_12, C_21, C_22) = $(34.8, −139.2, −69.6, 8.7).

The results in Fouskakis and Draper (2008) use these values; Draper and Fouskakis (2000) present a sensitivity analysis on the choice of the C_lm which demonstrates broad stability of the findings when the utility values mentioned above are perturbed in reasonable ways.

With the C_lm in hand, the overall expected utility function to be maximized over I is then simply

    E[U(I)] = E[U_D(I) + U_P(I)],          (10)

where this expectation is over all possible cross-validation splits of the data.

25 Results

The number of possible cross-validation splits is far too large to evaluate the expectation in (10) directly; in practice we therefore use Monte Carlo methods to evaluate it, averaging over N random modeling and validation splits.

Results. We explored this approach in two settings: a Small World created by focusing only on the p = 14 variables in the original RAND scale (2^14 = 16,384 is a small enough number of possible models to do brute-force enumeration of the estimated expected utility of all models), and the Big World defined by all p = 83 available predictors (2^83 ≈ 10^25 is far too large for brute-force enumeration; we compared a variety of stochastic optimization methods, including simulated annealing, genetic algorithms, and tabu search, on their ability to find good variable subsets).
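
A sketch of the Monte Carlo evaluation of E[U(I)] in (10) for one candidate subset I, averaging U_D(I) + U_P(I) over random modeling/validation splits; the data, costs and the fixed 0.5 cutoff are simplified placeholders, and scikit-learn again stands in for the ML fit.

    # Monte Carlo estimate of E[U(I)] over random cross-validation splits.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(2)
    n, p = 1000, 6
    X = rng.normal(size=(n, p))
    y = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] - 1.5))))
    costs = np.array([0.5, 0.25, 1.75, 3.0, 0.4, 2.0])       # hypothetical costs, US$
    C = np.array([[34.8, -139.2], [-69.6, 8.7]])             # elicited rewards/penalties
    I = np.array([1, 1, 0, 0, 1, 0], dtype=bool)             # candidate predictor subset

    def split_utility(I, split_seed):
        r = np.random.default_rng(split_seed)
        V = r.permutation(n) < n // 2                        # random half as validation
        fit = LogisticRegression(C=1e6, max_iter=1000).fit(X[~V][:, I], y[~V])
        pred = (fit.predict_proba(X[V][:, I])[:, 1] >= 0.5).astype(int)
        tab = np.array([[np.sum((y[V] == a) & (pred == b)) for b in (1, 0)] for a in (1, 0)])
        U_P = np.sum(C * tab)                                # predictive utility, eq. (9)
        U_D = -V.sum() * costs[I].sum()                      # data-collection utility, eq. (6)
        return U_D + U_P

    N = 50
    print("estimated E[U(I)]:", np.mean([split_utility(I, s) for s in range(N)]))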

26 Results: Small World

[Plot: estimated expected utility versus number of variables in the model.]

The 20 best models included the same three variables 18 or more times out of 20, and never included six other variables; the five best models were minor variations on each other, and included 4–6 variables (last column in the table on page 16).

27 Approach (2)

The best models save almost $8 per patient over the full 14-variable model; this would amount to significant savings if the observed-versus-expected assessment method were applied widely.

Approach (2) (alternative cost-benefit). Maximizing expected utility, as in Approach (1) above, is a natural Bayesian way forward in this problem, but (a) the elicitation process was complicated and (b) the utility structure we examine is only one of a number of plausible alternatives, with utility framed from only one point of view; the broader question for a decision-theoretic approach is whose utility should drive the problem formulation.

It's well known (e.g., Arrow, 1963; Weerahandi and Zidek, 1981) that Bayesian decision theory can be problematic when used normatively for group decision-making, because of conflicts in preferences among members of the group; in the context of the problem addressed here, it can be difficult to identify a utility structure acceptable to all stakeholders (including patients, doctors, hospitals, citizen watchdog groups, and state and federal regulatory agencies) in the quality-of-care-assessment process.

28 Approach (2) (continued)

As an alternative, in Approach (2) we propose a prior distribution that accounts for the cost of each variable and results in a set of posterior model probabilities which correspond to a generalized cost-adjusted version of the Bayesian information criterion (BIC). This provides a principled approach to performing a cost-benefit trade-off that avoids ambiguities in the identification of an appropriate utility structure.

Details. Bayesian parametric model comparison and variable selection are based on specifying a model m, its likelihood f(y | θ_m, m), the prior distribution of the model parameters f(θ_m | m) and the corresponding prior model weight (or probability) f(m), where θ_m is the parameter vector under model m and y is the data vector.

Parametric inference is based on the posterior distribution f(θ_m | y, m), and quantifying model uncertainty by estimating the posterior model probability f(m | y) is also an important issue.

29 Parametric Model Comparison

Hence, when we consider a set of competing models M = {m_1, m_2, ..., m_|M|}, we focus on the posterior probability of model m ∈ M, defined as

    f(m | y) = f(y | m) f(m) / Σ_{m_l ∈ M} f(y | m_l) f(m_l)
             = [ Σ_{m_l ∈ M} PO_{m_l, m} ]^{−1}
             = [ Σ_{m_l ∈ M} B_{m_l, m} · f(m_l) / f(m) ]^{−1},          (11)

where PO_{m_i, m_j} = f(m_i | y) / f(m_j | y) is the posterior model odds and B_{m_i, m_j} is the Bayes factor for comparing models m_i and m_j.

When we limit ourselves to the comparison of only two models we typically focus on PO_{m_i, m_j} and B_{m_i, m_j}, which have the desirable property of insensitivity to the selection of the model space M. By definition the Bayes factor is the ratio of the posterior model odds to the prior model odds; thus large values of B_{m_i, m_j} (usually greater than 12, say) indicate strong posterior support of model m_i against model m_j.
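
A small numerical sketch of equation (11): converting Bayes factors (against a reference model) and prior model probabilities into posterior model probabilities. The Bayes factors and priors below are invented for illustration.

    # Posterior model probabilities from Bayes factors relative to model m1.
    import numpy as np

    models = ["m1", "m2", "m3"]
    B_vs_m1 = np.array([1.0, 4.0, 0.5])           # B_{m_l, m_1} for each m_l in M
    prior = np.array([1 / 3, 1 / 3, 1 / 3])       # prior model weights f(m_l)

    # f(m_l | y) is proportional to B_{m_l, m_1} * f(m_l); normalizing gives (11)
    post = B_vs_m1 * prior
    post /= post.sum()
    print(dict(zip(models, np.round(post, 3))))   # m2 gets 4x the posterior weight of m1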

30 Variable Selection in Logistic Regression

The posterior model probabilities and integrated likelihoods f(y | m_i) in (11) are rarely analytically tractable; we use a combination of Laplace approximations and Markov chain Monte Carlo (MCMC) methodology to approximate posterior odds and Bayes factors.

In the sickness-at-admission problem at issue here, we use a simple logistic regression model with response Y_i = 1 if patient i dies and 0 otherwise. We further denote by X_ij the sickness predictor variable j for patient i and by γ_j an indicator, often used in Bayesian variable selection problems, taking the value 1 if variable j is included in the model and 0 otherwise; thus in this case M = {0, 1}^p, where p is the total number of variables. In order to map the set of binary model indicators γ onto a model m we can use a representation of the form m(γ) = Σ_{i=1}^p 2^{i−1} γ_i.

Hence the model formulation can be summarized as

    (Y_i | γ) ~ independent Bernoulli[p_i(γ)],
    η_i(γ) = log[ p_i(γ) / (1 − p_i(γ)) ] = Σ_{j=0}^p β_j γ_j X_ij,          (12)

31 Prior on Model Parameters

or, in matrix form,

    η(γ) = X diag(γ) β = X_γ β_γ,

defining X_i0 = 1 for all i = 1, ..., n and γ_0 = 1 with prior probability one, since here the intercept is always included in all models.

Here p_i(γ) is the death probability (which may be thought of as the sickness score) for patient i under model γ, η(γ) = [η_1(γ), ..., η_n(γ)]^T, γ = (γ_0, γ_1, ..., γ_p)^T, β = (β_0, β_1, ..., β_p)^T, and X = (X_ij, i = 1, ..., n; j = 0, 1, ..., p); the vector β_γ stands for the subvector of β which is included in the model specified by γ, i.e., β_γ = (β_i : γ_i = 1, i = 0, 1, ..., p), and is equivalent to the θ_m vector defined above; similarly X_γ is the submatrix of X with columns corresponding to variables included in the model specified by γ.

Prior on model parameters. We proceed in two steps: (1) first we build a prior on β that is a modified version of the unit information prior for this problem (to avoid Lindley's paradox); then (2) we adjust this prior for differences in the marginal costs of the variables.

32 Sensitivity to Prior Variance

Step (1). One important problem in Bayesian model evaluation using posterior model probabilities is their sensitivity to the prior variance of the model parameters: large variance of the β_γ (used to represent prior ignorance) will increase the posterior probabilities of the simpler models considered in the model space M (Lindley's paradox).

We address this issue by using ideas proposed by Ntzoufras et al. (2003): we use a prior distribution of the form

    f(β_γ | γ) = N(μ_γ, Σ_γ)          (13)

with prior covariance matrix given by Σ_γ = n [I(β_γ)]^{−1}, where n is the total sample size and I(β_γ) is the information matrix I(β_γ) = X_γ^T W_γ X_γ; here W_γ is a diagonal matrix which in the Bernoulli case takes the form W_γ = diag{ p_i(γ) [1 − p_i(γ)] }.

33 Unit Information Prior

This is the unit information prior of Kass and Wasserman (1996), which corresponds to adding one data point to the data. Here we use this prior as a base, but we specify p_i(γ) in the information matrix according to our prior information; in this manner we avoid (even minimal) reuse of the data in the prior.

When little prior information is available, a reasonable prior mean for β_γ is μ_γ = 0. This corresponds to a prior mean on the log-odds scale of zero, from which a sensible prior estimate for all model probabilities is p_i(γ) = 1/2; with this choice (13) becomes

    f(β_γ | γ) = N( 0, 4n (X_γ^T X_γ)^{−1} ).          (14)

This prior distribution can also be motivated by combining the idea of imaginary data with the power prior approach of Chen et al. (2000); it turns out that (14) introduces additional information to the posterior equivalent to adding one data point to the likelihood, and therefore we support a priori the simplest model with a weight of one data point.
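
A small Python sketch of constructing the prior (14) for a given inclusion vector γ, with a simulated design matrix standing in for the sickness predictors.

    # Build the N(0, 4n (X_gamma^T X_gamma)^{-1}) prior for the included coefficients.
    import numpy as np

    rng = np.random.default_rng(3)
    n, p = 500, 4
    X_full = np.column_stack([np.ones(n), rng.normal(size=(n, p))])   # X_{i0} = 1 (intercept)
    gamma = np.array([1, 1, 0, 1, 0], dtype=bool)                     # gamma_0 = 1 always

    X_g = X_full[:, gamma]                                            # columns included by gamma
    prior_mean = np.zeros(X_g.shape[1])                               # mu_gamma = 0
    prior_cov = 4 * n * np.linalg.inv(X_g.T @ X_g)                    # unit-information scale
    print(prior_cov.shape, np.round(np.diag(prior_cov), 2))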

34 Laplace Approximation

Step (2). To introduce costs we again proceed in two sub-steps: (2a) first we specify a Laplace approximation (and the BIC approximation that corresponds to it) for the posterior model odds in our problem, using the prior in Step (1), and (2b) then we see how to adjust the approximations in Step (2a) to account for cost differences among the variables.

Step (2a). We denote by PO_kl the posterior odds of model γ^(k) versus model γ^(l); then we have

    −2 log PO_kl = −2 [ log f(γ^(k) | y) − log f(γ^(l) | y) ].          (15)

Following the approach of Raftery (1996), we can approximate the posterior distribution of a model γ using the following Laplace approximation:

    −2 log f(γ | y) = −2 log f(y | β̃_γ, γ) − 2 log f(β̃_γ | γ) − d_γ log(2π) − log |Ψ_γ| − 2 log f(γ) + O(n^{−1}),          (16)

35 Details

where β̃_γ is the posterior mode of f(β_γ | y, γ), d_γ = Σ_{j=0}^p γ_j is the dimension of the model γ, and Ψ_γ is minus the inverse of the Hessian matrix of h(β_γ) = log f(y | β_γ, γ) + log f(β_γ | γ), evaluated at the posterior mode β̃_γ.

Under the model formulation given by equation (12) and the prior distribution (14) we have that

    Ψ_γ^{−1} = − ∂² log f(y | β_γ, γ) / ∂β_γ ∂β_γ^T |_{β_γ = β̃_γ} − ∂² log f(β_γ | γ) / ∂β_γ ∂β_γ^T |_{β_γ = β̃_γ}
             = X_γ^T diag{ exp(X_{γ,i} β̃_γ) / [1 + exp(X_{γ,i} β̃_γ)]² + 1/(4n) } X_γ,          (17)

where X_{γ,i} is row i of the matrix X_γ, for i = 1, ..., n.

By substituting the prior (14) into expression (16) we get

    −2 log f(γ | y) = −2 log f(y | β̃_γ, γ) + φ(γ) − 2 log f(γ) + O(n^{−1}),          (18)

36 Penalized Log Likelihood Ratio

where

    φ(γ) = (1/(4n)) β̃_γ^T X_γ^T X_γ β̃_γ + d_γ log(4n) + log( |Ψ_γ^{−1}| / |X_γ^T X_γ| ).          (19)

From the above expression it's clear that the logarithm of a posterior model probability can be regarded as a penalized log-likelihood evaluated at the posterior mode of the model, in which the term φ(γ) − 2 log f(γ) can be interpreted as the penalty imposed upon the log-likelihood.

In pairwise model comparisons, we can directly use the posterior model odds (15), which can now be written as

    −2 log PO_kl = −2 log{ f(y | β̃_{γ^(k)}, γ^(k)) / f(y | β̃_{γ^(l)}, γ^(l)) } + φ(γ^(k)) − φ(γ^(l)) − 2 log[ f(γ^(k)) / f(γ^(l)) ] + O(n^{−1}).          (20)

Therefore the comparison of the two models is based on a penalized log-likelihood ratio, where the penalty is now given by

    ψ(γ^(k), γ^(l)) = φ(γ^(k)) − φ(γ^(l)) − 2 log[ f(γ^(k)) / f(γ^(l)) ].

37 Decomposing the Penalty Term

Each penalty term is divided into two parts: φ(γ) and −2 log f(γ). The first term, φ(γ), has its source in the marginal likelihood f(y | γ) of model γ and can be thought of as a measure of discrepancy between the data and the prior information for the model parameters; the second part comes from the prior model probabilities f(γ).

Indifference on the space of all models, usually expressed by the uniform distribution (i.e., f(γ) ∝ 1), eliminates the second term from the model comparison procedure, since the penalty term in (20) will then be based only on the difference of the first penalty terms, φ(γ^(k)) − φ(γ^(l)). For this reason the penalty term φ(γ) is the imposed penalty which appears in the penalized log-likelihood expression of the Bayes factor BF_kl with a uniform prior on model space.

A simpler but less accurate approximation of log PO_kl can be obtained following the arguments of Schwarz (1978):

38 BIC Approximation

    −2 log PO_kl = −2 log[ f(y | β̂_{γ^(k)}, γ^(k)) / f(y | β̂_{γ^(l)}, γ^(l)) ] + ( d_{γ^(k)} − d_{γ^(l)} ) log n − 2 log[ f(γ^(k)) / f(γ^(l)) ] + O(1)
                 = BIC_kl − 2 log[ f(γ^(k)) / f(γ^(l)) ] + O(1),          (21)

where BIC_kl is the Bayesian Information Criterion for choosing between models γ^(k) and γ^(l) and β̂_γ is the vector of maximum likelihood estimates of β_γ.

Since BIC_kl is an O(1) approximation, it might diverge from the exact value of the logarithm of the Bayes factor even for large samples; even so, it has often been shown to provide a reasonable measure of evidence (for finite n), and its straightforward calculation has encouraged its widespread use in practice.

Step (2b). From the above argument and equations (18) and (20), it's clear that an additional penalty can be directly imposed on the posterior model probabilities and odds via the prior model probabilities f(γ).
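
A sketch of the BIC comparison in (21) for two nested logistic models under a uniform prior on model space (so that −2 log PO_kl ≈ BIC_kl). Data are simulated, and scikit-learn's logistic regression with a very large C stands in for maximum likelihood.

    # BIC_kl = -2 * (maximized log-likelihood difference) + (d_k - d_l) * log n.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(4)
    n = 800
    X = rng.normal(size=(n, 3))
    y = rng.binomial(1, 1 / (1 + np.exp(-(0.9 * X[:, 0] - 1.0))))

    def max_loglik(cols):
        """Maximized Bernoulli log-likelihood for the logistic model using columns `cols`."""
        fit = LogisticRegression(C=1e6, max_iter=1000).fit(X[:, cols], y)
        p = fit.predict_proba(X[:, cols])[:, 1]
        return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)), len(cols) + 1   # +1 for intercept

    ll_k, d_k = max_loglik([0, 1, 2])     # model gamma^(k): all three predictors
    ll_l, d_l = max_loglik([0])           # model gamma^(l): predictor 1 only
    BIC_kl = -2 * (ll_k - ll_l) + (d_k - d_l) * np.log(n)
    print("BIC_kl =", round(BIC_kl, 2), "(negative values favor the larger model)")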

39 Cost Adjustment

Therefore we may use prior model probabilities to induce prior preferences for specific variables depending on their costs. For this reason we propose to use prior model probabilities of the form

    f(γ_j) ∝ exp[ −(γ_j / 2) ((c_j − c_0) / c_0) log n ]   for j = 1, ..., p,          (22)

where c_j is the marginal cost per observation for variable X_j and (as will be seen below) the desire for our approach to yield a cost-adjusted generalization of BIC compels the definition c_0 = min{c_j, j = 1, ..., p}.

We further assume that the constant term is included in all models, by specifying f(γ_0 = 1) = 1, resulting in

    −2 log f(γ) = Σ_{j=1}^p γ_j (c_j / c_0) log n − d_γ log n + 2 Σ_{j=1}^p log[ 1 + n^{−(c_j − c_0)/(2 c_0)} ].          (23)

If all variables have the same cost, or we're indifferent concerning cost, then we can set c_j = c_0 for j = 1, ..., p, which reduces to the uniform prior on model space (f(γ) ∝ 1) and posterior odds equal to the usual Bayes factor.

40 Cost Adjustment (continued)

When comparing two models γ^(k) and γ^(l), the additional penalty imposed on the log-likelihood ratio due to the cost-adjusted prior model probabilities is given by

    −2 log[ f(γ^(k)) / f(γ^(l)) ] = Σ_{j=1}^p ( γ_j^(k) − γ_j^(l) ) (c_j / c_0) log n − ( d_{γ^(k)} − d_{γ^(l)} ) log n
                                  = [ (C_{γ^(k)} − C_{γ^(l)}) / c_0 − ( d_{γ^(k)} − d_{γ^(l)} ) ] log n,          (24)

where C_γ = Σ_{j=1}^p γ_j c_j is the total cost of model γ; thus two models of the same dimension and cost will have the same prior weight.

In the simpler case where we compare two nested models that differ only in the status of variable j, the prior model ratio simplifies to

    −2 log[ f(γ_j = 1, γ_\j) / f(γ_j = 0, γ_\j) ] = ( c_j / c_0 − 1 ) log n,          (25)

where γ_\j is the vector γ excluding element γ_j.

41 Cost-Adjusted Laplace Approximation

The above expression can be viewed as a prior penalty for including variable j in the model, while the term ( c_j / c_0 − 1 ) can be interpreted as the proportional additional penalty imposed upon (−2 log BF) if the variable X_j is included in the model, due to its increased cost.

Using the prior model odds (24) in the approximate posterior model odds (20) we obtain

    −2 log PO_kl = −2 log[ f(y | β̃_{γ^(k)}, γ^(k)) / f(y | β̃_{γ^(l)}, γ^(l)) ] + ψ(γ^(k), γ^(l)) + O(n^{−1}),          (26)

where the penalty term is given by

    ψ(γ^(k), γ^(l)) = (1/(4n)) ( β̃_{γ^(k)}^T X_{γ^(k)}^T X_{γ^(k)} β̃_{γ^(k)} − β̃_{γ^(l)}^T X_{γ^(l)}^T X_{γ^(l)} β̃_{γ^(l)} )
                      + ( d_{γ^(k)} − d_{γ^(l)} ) log 4
                      + log[ ( |Ψ_{γ^(k)}^{−1}| / |X_{γ^(k)}^T X_{γ^(k)}| ) / ( |Ψ_{γ^(l)}^{−1}| / |X_{γ^(l)}^T X_{γ^(l)}| ) ]
                      + [ ( C_{γ^(k)} − C_{γ^(l)} ) / c_0 ] log n.          (27)

42 Cost-Adjusted BIC

Finally we consider the BIC-based approximation (21) to the logarithm of the posterior model odds with the prior model odds (24), yielding

    −2 log PO_kl = −2 log[ f(y | β̂_{γ^(k)}, γ^(k)) / f(y | β̂_{γ^(l)}, γ^(l)) ] + [ ( C_{γ^(k)} − C_{γ^(l)} ) / c_0 ] log n + O(1).          (28)

The penalty term d_γ log n of model γ used in (21) has been replaced in the above expression by the cost-dependent penalty c_0^{−1} C_γ log n; ignoring costs is equivalent to taking c_j = c_0 for all j, yielding c_0^{−1} C_γ = d_γ, the original BIC expression.

Therefore we may interpret the quantity log n as the imposed penalty for each variable included in the model γ when no costs are considered (or when costs are equal). Moreover, this baseline penalty term is inflated proportionally to the cost ratio c_j / c_0 for each variable X_j; for example, if the cost of a variable X_j is twice the minimum cost (c_j = 2 c_0), then the imposed penalty is equivalent to adding two variables with the minimum cost.
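
A small Python sketch of the cost-adjusted penalty in (28), showing how the usual d_γ log n charge grows when expensive variables are included; the costs are placeholders, not the case-study values.

    # Cost-adjusted BIC penalty: (C_gamma / c_0) * log n instead of d_gamma * log n.
    import numpy as np

    def cost_penalty(gamma, costs, n):
        """(C_gamma / c_0) * log n for the 0/1 inclusion vector gamma (intercept excluded)."""
        gamma, costs = np.asarray(gamma, bool), np.asarray(costs, float)
        c0 = costs.min()                               # c_0 = minimum cost over all p variables
        return costs[gamma].sum() / c0 * np.log(n)

    costs = np.array([0.5, 0.25, 1.75, 3.0])           # hypothetical marginal costs per patient
    n = 2532
    g_cheap = [1, 1, 0, 0]                             # two cheap variables
    g_rich = [1, 1, 1, 1]                              # all four variables
    print("penalty, cheap model:", round(cost_penalty(g_cheap, costs, n), 1))
    print("penalty, full model: ", round(cost_penalty(g_rich, costs, n), 1))
    # a model must improve -2 log-likelihood by at least the penalty difference to be preferred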

43 MCMC Implementation

For this reason, (28) can be considered as a cost-adjusted generalization of BIC when prior model probabilities of type (22) are adopted.

MCMC implementation. As noted earlier, in our quality of care study with p = 83 predictors there are on the order of 10^25 possible models. In such situations, sampling algorithms will not be able to estimate posterior model probabilities with high accuracy in a reasonable amount of CPU time, due to the large model space. For this reason, we implemented the following two-step method:

(1) First we use a model search tool to identify variables with high marginal posterior inclusion probabilities f(γ_j | y), and we create a reduced model space consisting only of those variables whose marginal probabilities are above a threshold value. According to Barbieri and Berger (2004), this method of selecting variables based on their marginal probabilities may lead to the identification of models with better predictive abilities than approaches based on maximizing posterior model probabilities.
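
A small Python sketch of the bookkeeping in step (1): estimate marginal inclusion probabilities from a matrix of visited models and keep the variables above the 0.3 threshold used in the case study; the "MCMC draws" below are fabricated Bernoulli noise, purely to show the mechanics.

    # Marginal posterior inclusion probabilities f(gamma_j = 1 | y) from sampled models.
    import numpy as np

    rng = np.random.default_rng(5)
    visited = rng.binomial(1, [0.9, 0.05, 0.6, 0.2, 0.45], size=(5000, 5))   # fake model draws
    incl_prob = visited.mean(axis=0)                 # proportion of draws including each variable
    keep = np.where(incl_prob > 0.3)[0]              # variables defining the reduced model space
    print("marginal inclusion probabilities:", np.round(incl_prob, 2))
    print("variables kept for the reduced model space:", keep)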

44 MCMC Implementation (continued)

Although Barbieri and Berger proposed 0.5 as a threshold value for f(γ_j = 1 | y), we used the lower value of 0.3, since our aim was only to identify and eliminate variables not contributing to models with high posterior probabilities.

(2) Then we use a model search tool in the reduced model space to estimate posterior model probabilities (and the corresponding odds).

To ensure stability of our findings we explored the use of two model search tools in step (1): a reversible-jump MCMC algorithm (RJMCMC), as implemented for variable selection in generalized linear models by Dellaportas et al. (2002) and Ntzoufras et al. (2003), and the MCMC model composition (MC³) algorithm (Madigan and York, 1995).

More specifically, we implemented reversible-jump moves within Gibbs for the model indicators γ_j, by proposing the new model to differ from the current one in each step by a single term j with probability one.

45 MCMC Implementation (continued)

The algorithm can be summarized as follows:

(1) For j = 1, ..., p, use RJMCMC to compare the current model γ with the proposed one γ′, with components γ′_j = 1 − γ_j and γ′_k = γ_k for k ≠ j with probability one; the updating sequence of the γ_j is randomly determined in each step.

(2) For j = 0, ..., p, if γ_j = 1 then generate the model parameter β_j from the corresponding posterior distribution f(β_j | β_\j, γ, y); otherwise set β_j = 0.

In our context the MC³ algorithm may be summarized by the following steps:

(1) For j = 1, ..., p, propose a move from the current model γ to a new one γ′, with components γ′_j = 1 − γ_j and γ′_k = γ_k for k ≠ j with probability one; the updating sequence of the γ_j is randomly determined in each step.

(2) Accept the proposed model γ′ with probability

    α = min[ 1, f(γ′ | y) / f(γ | y) ] = min( 1, PO_{γ′, γ} ).
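
A Python sketch of an MC³ scan of this form, with the acceptance probability driven by a model score meant to stand in for the cost-adjusted BIC value (−2 log f(y | β̂_γ, γ) + (C_γ/c_0) log n, up to a constant); a toy deterministic score is used so the sketch runs on its own.

    # MC^3 sweep over the inclusion indicators gamma, flipping one term at a time.
    import numpy as np

    def mc3_scan(gamma, score, rng, n_sweeps=100):
        """score(gamma) plays the role of -2 log f(gamma | y) up to a constant (smaller is better)."""
        gamma = np.asarray(gamma, dtype=int).copy()
        current = score(gamma)
        for _ in range(n_sweeps):
            for j in rng.permutation(len(gamma)):      # random updating sequence each sweep
                prop = gamma.copy()
                prop[j] = 1 - prop[j]                  # proposed model differs in a single term
                new = score(prop)
                # alpha = min(1, PO_{gamma', gamma}), with -2 log PO approximated by new - current
                if rng.random() < min(1.0, np.exp(-0.5 * (new - current))):
                    gamma, current = prop, new
        return gamma, current

    # toy score: pretend variables 0 and 2 are worth their cost, the others are not
    rng = np.random.default_rng(6)
    toy_score = lambda g: 10.0 * (g[1] + g[3]) - 8.0 * (g[0] + g[2])
    print(mc3_scan([0, 0, 0, 0], toy_score, rng))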

46 MCMC Implementation (continued)

Since the posterior model odds PO_{γ′,γ} used in MC³ are not analytically available here, we also explored two methods for calculating them, approximating the acceptance probabilities with the cost-adjusted Laplace approximation (equation 26) and with cost-adjusted BIC (equation 28); in addition we explored one further form of sensitivity analysis: initializing the MCMC runs at the null model (with no predictors) and at the full model (with all predictors). All of this was done both for the benefit-only analysis (specified by setting all variable costs equal) and for the cost-benefit approach.

In moving from the full to the reduced model space to implement step (1) of our two-step method, for both the benefit-only and cost-benefit analyses we found a striking level of agreement, across (a) the two model search tools, (b) the two methods for approximating the acceptance probabilities in MC³, and (c) the two choices for initializing the MCMC runs, in the subset of variables defining the reduced model space; this made it unnecessary to perform a similar sensitivity analysis in step (2).

47 Results

Results are therefore presented below only for RJMCMC (starting from the full model). Convergence of the RJMCMC algorithm was checked using ergodic mean plots of the marginal inclusion probabilities for the full model space and of the posterior model probabilities for the reduced space.

In what follows we refer to the cost-benefit results as RJMCMC, but we could equally well have used the term MC³ with cost-adjusted BIC (or just cost-adjusted BIC for short), because the results from the two methods were in such close agreement.

Results. The table below presents the marginal posterior probabilities of the variables that exceeded the threshold value of 0.30, in each of the benefit-only and cost-benefit analyses, together with their data-collection costs (in minutes of abstraction time rather than US$), in the Big World of all 83 predictors. In both the benefit-only and cost-benefit situations our methods reduced the initial list of p = 83 available candidates down to 13 predictors.

48 Results (continued)

[Table: for each variable (index and name), its data-collection cost and its marginal posterior inclusion probability under the benefit-only and cost-benefit analyses. The variables listed were: SBP Score, Age, Blood Urea Nitrogen, Apache II Coma Score, Shortness of Breath Day 1?, Septic Complications?, Initial Temperature, Heart Rate Day 1, Chest Pain Day 1?, Cardiomegaly Score, Hematologic History Score, Apache Respiratory Rate Score, Admission SBP, Respiratory Rate Day 1, Confusion Day 1?, Apache pH Score, Morbid + Comorbid Score, and Musculoskeletal Score.]

Note that the most expensive variables with high marginal posterior probabilities in the benefit-only analysis were absent from the set of promising variables in the cost-benefit analysis (e.g., Apache II Coma Score).

49 Results (continued)

Common variables in both analyses: X_1 + X_2 + X_3 + X_5 + X_12 + X_70.

[Table: the top models (k = 1, ..., 5) in the benefit-only analysis and the top models (k = 1, ..., 9) in the cost-benefit analysis, each described by the variables common to the best models within that analysis plus a small set of additional variables, together with each model's cost, posterior probability and posterior odds PO_1k relative to the best model.]


AMS 206: Bayesian Statistics. 2: Exchangeability and Conjugate Modeling AMS 206: Bayesian Statistics 2: Exchangeability and Conjugate Modeling David Draper Department of Applied Mathematics and Statistics University of California, Santa Cruz draper@ams.ucsc.edu www.ams.ucsc.edu/

More information

Decision theory. 1 We may also consider randomized decision rules, where δ maps observed data D to a probability distribution over

Decision theory. 1 We may also consider randomized decision rules, where δ maps observed data D to a probability distribution over Point estimation Suppose we are interested in the value of a parameter θ, for example the unknown bias of a coin. We have already seen how one may use the Bayesian method to reason about θ; namely, we

More information

MS-C1620 Statistical inference

MS-C1620 Statistical inference MS-C1620 Statistical inference 10 Linear regression III Joni Virta Department of Mathematics and Systems Analysis School of Science Aalto University Academic year 2018 2019 Period III - IV 1 / 32 Contents

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Generalized Linear Models for Non-Normal Data

Generalized Linear Models for Non-Normal Data Generalized Linear Models for Non-Normal Data Today s Class: 3 parts of a generalized model Models for binary outcomes Complications for generalized multivariate or multilevel models SPLH 861: Lecture

More information

Testing Simple Hypotheses R.L. Wolpert Institute of Statistics and Decision Sciences Duke University, Box Durham, NC 27708, USA

Testing Simple Hypotheses R.L. Wolpert Institute of Statistics and Decision Sciences Duke University, Box Durham, NC 27708, USA Testing Simple Hypotheses R.L. Wolpert Institute of Statistics and Decision Sciences Duke University, Box 90251 Durham, NC 27708, USA Summary: Pre-experimental Frequentist error probabilities do not summarize

More information

Algorithmisches Lernen/Machine Learning

Algorithmisches Lernen/Machine Learning Algorithmisches Lernen/Machine Learning Part 1: Stefan Wermter Introduction Connectionist Learning (e.g. Neural Networks) Decision-Trees, Genetic Algorithms Part 2: Norman Hendrich Support-Vector Machines

More information

(1) Introduction to Bayesian statistics

(1) Introduction to Bayesian statistics Spring, 2018 A motivating example Student 1 will write down a number and then flip a coin If the flip is heads, they will honestly tell student 2 if the number is even or odd If the flip is tails, they

More information

The connection of dropout and Bayesian statistics

The connection of dropout and Bayesian statistics The connection of dropout and Bayesian statistics Interpretation of dropout as approximate Bayesian modelling of NN http://mlg.eng.cam.ac.uk/yarin/thesis/thesis.pdf Dropout Geoffrey Hinton Google, University

More information

Bayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida

Bayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida Bayesian Statistical Methods Jeff Gill Department of Political Science, University of Florida 234 Anderson Hall, PO Box 117325, Gainesville, FL 32611-7325 Voice: 352-392-0262x272, Fax: 352-392-8127, Email:

More information

Or How to select variables Using Bayesian LASSO

Or How to select variables Using Bayesian LASSO Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO On Bayesian Variable Selection

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models Introduction to generalized models Models for binary outcomes Interpreting parameter

More information

New Bayesian methods for model comparison

New Bayesian methods for model comparison Back to the future New Bayesian methods for model comparison Murray Aitkin murray.aitkin@unimelb.edu.au Department of Mathematics and Statistics The University of Melbourne Australia Bayesian Model Comparison

More information

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014 Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2014 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

More information

Bayes: All uncertainty is described using probability.

Bayes: All uncertainty is described using probability. Bayes: All uncertainty is described using probability. Let w be the data and θ be any unknown quantities. Likelihood. The probability model π(w θ) has θ fixed and w varying. The likelihood L(θ; w) is π(w

More information

Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training

Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 17, 2013 1 Principle of maximum likelihood Consider a family of probability distributions

More information

Predicting AGI: What can we say when we know so little?

Predicting AGI: What can we say when we know so little? Predicting AGI: What can we say when we know so little? Fallenstein, Benja Mennen, Alex December 2, 2013 (Working Paper) 1 Time to taxi Our situation now looks fairly similar to our situation 20 years

More information

Hypothesis Testing. Econ 690. Purdue University. Justin L. Tobias (Purdue) Testing 1 / 33

Hypothesis Testing. Econ 690. Purdue University. Justin L. Tobias (Purdue) Testing 1 / 33 Hypothesis Testing Econ 690 Purdue University Justin L. Tobias (Purdue) Testing 1 / 33 Outline 1 Basic Testing Framework 2 Testing with HPD intervals 3 Example 4 Savage Dickey Density Ratio 5 Bartlett

More information

Lecture 5. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 1

Lecture 5. G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 1 Lecture 5 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics,

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

CONTENTS OF DAY 2. II. Why Random Sampling is Important 10 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE

CONTENTS OF DAY 2. II. Why Random Sampling is Important 10 A myth, an urban legend, and the real reason NOTES FOR SUMMER STATISTICS INSTITUTE COURSE 1 2 CONTENTS OF DAY 2 I. More Precise Definition of Simple Random Sample 3 Connection with independent random variables 4 Problems with small populations 9 II. Why Random Sampling is Important 10 A myth,

More information

Probability, Entropy, and Inference / More About Inference

Probability, Entropy, and Inference / More About Inference Probability, Entropy, and Inference / More About Inference Mário S. Alvim (msalvim@dcc.ufmg.br) Information Theory DCC-UFMG (2018/02) Mário S. Alvim (msalvim@dcc.ufmg.br) Probability, Entropy, and Inference

More information

Illustrating the Implicit BIC Prior. Richard Startz * revised June Abstract

Illustrating the Implicit BIC Prior. Richard Startz * revised June Abstract Illustrating the Implicit BIC Prior Richard Startz * revised June 2013 Abstract I show how to find the uniform prior implicit in using the Bayesian Information Criterion to consider a hypothesis about

More information

Lecture 01: Introduction

Lecture 01: Introduction Lecture 01: Introduction Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture 01: Introduction

More information

Two-sample Categorical data: Testing

Two-sample Categorical data: Testing Two-sample Categorical data: Testing Patrick Breheny April 1 Patrick Breheny Introduction to Biostatistics (171:161) 1/28 Separate vs. paired samples Despite the fact that paired samples usually offer

More information

Two-sample Categorical data: Testing

Two-sample Categorical data: Testing Two-sample Categorical data: Testing Patrick Breheny October 29 Patrick Breheny Biostatistical Methods I (BIOS 5710) 1/22 Lister s experiment Introduction In the 1860s, Joseph Lister conducted a landmark

More information

CS 540: Machine Learning Lecture 2: Review of Probability & Statistics

CS 540: Machine Learning Lecture 2: Review of Probability & Statistics CS 540: Machine Learning Lecture 2: Review of Probability & Statistics AD January 2008 AD () January 2008 1 / 35 Outline Probability theory (PRML, Section 1.2) Statistics (PRML, Sections 2.1-2.4) AD ()

More information

Uni- and Bivariate Power

Uni- and Bivariate Power Uni- and Bivariate Power Copyright 2002, 2014, J. Toby Mordkoff Note that the relationship between risk and power is unidirectional. Power depends on risk, but risk is completely independent of power.

More information

A Bayesian Approach to Phylogenetics

A Bayesian Approach to Phylogenetics A Bayesian Approach to Phylogenetics Niklas Wahlberg Based largely on slides by Paul Lewis (www.eeb.uconn.edu) An Introduction to Bayesian Phylogenetics Bayesian inference in general Markov chain Monte

More information

1 Hypothesis Testing and Model Selection

1 Hypothesis Testing and Model Selection A Short Course on Bayesian Inference (based on An Introduction to Bayesian Analysis: Theory and Methods by Ghosh, Delampady and Samanta) Module 6: From Chapter 6 of GDS 1 Hypothesis Testing and Model Selection

More information

Seminar über Statistik FS2008: Model Selection

Seminar über Statistik FS2008: Model Selection Seminar über Statistik FS2008: Model Selection Alessia Fenaroli, Ghazale Jazayeri Monday, April 2, 2008 Introduction Model Choice deals with the comparison of models and the selection of a model. It can

More information

Post-Selection Inference

Post-Selection Inference Classical Inference start end start Post-Selection Inference selected end model data inference data selection model data inference Post-Selection Inference Todd Kuffner Washington University in St. Louis

More information

Stat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet.

Stat 535 C - Statistical Computing & Monte Carlo Methods. Arnaud Doucet. Stat 535 C - Statistical Computing & Monte Carlo Methods Arnaud Doucet Email: arnaud@cs.ubc.ca 1 CS students: don t forget to re-register in CS-535D. Even if you just audit this course, please do register.

More information

Logistic Regression: Regression with a Binary Dependent Variable

Logistic Regression: Regression with a Binary Dependent Variable Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression

More information

Contents. Decision Making under Uncertainty 1. Meanings of uncertainty. Classical interpretation

Contents. Decision Making under Uncertainty 1. Meanings of uncertainty. Classical interpretation Contents Decision Making under Uncertainty 1 elearning resources Prof. Ahti Salo Helsinki University of Technology http://www.dm.hut.fi Meanings of uncertainty Interpretations of probability Biases in

More information

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation

Biost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation Biost 58 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture 5: Review Purpose of Statistics Statistics is about science (Science in the broadest

More information

Conditional probabilities and graphical models

Conditional probabilities and graphical models Conditional probabilities and graphical models Thomas Mailund Bioinformatics Research Centre (BiRC), Aarhus University Probability theory allows us to describe uncertainty in the processes we model within

More information

Should all Machine Learning be Bayesian? Should all Bayesian models be non-parametric?

Should all Machine Learning be Bayesian? Should all Bayesian models be non-parametric? Should all Machine Learning be Bayesian? Should all Bayesian models be non-parametric? Zoubin Ghahramani Department of Engineering University of Cambridge, UK zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/

More information

Group Sequential Designs: Theory, Computation and Optimisation

Group Sequential Designs: Theory, Computation and Optimisation Group Sequential Designs: Theory, Computation and Optimisation Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj 8th International Conference

More information

Principles of Statistical Inference

Principles of Statistical Inference Principles of Statistical Inference Nancy Reid and David Cox August 30, 2013 Introduction Statistics needs a healthy interplay between theory and applications theory meaning Foundations, rather than theoretical

More information

Bayesian Statistics. State University of New York at Buffalo. From the SelectedWorks of Joseph Lucke. Joseph F. Lucke

Bayesian Statistics. State University of New York at Buffalo. From the SelectedWorks of Joseph Lucke. Joseph F. Lucke State University of New York at Buffalo From the SelectedWorks of Joseph Lucke 2009 Bayesian Statistics Joseph F. Lucke Available at: https://works.bepress.com/joseph_lucke/6/ Bayesian Statistics Joseph

More information

STAT 740: Testing & Model Selection

STAT 740: Testing & Model Selection STAT 740: Testing & Model Selection Timothy Hanson Department of Statistics, University of South Carolina Stat 740: Statistical Computing 1 / 34 Testing & model choice, likelihood-based A common way to

More information

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam ECLT 5810 Linear Regression and Logistic Regression for Classification Prof. Wai Lam Linear Regression Models Least Squares Input vectors is an attribute / feature / predictor (independent variable) The

More information

Principles of Statistical Inference

Principles of Statistical Inference Principles of Statistical Inference Nancy Reid and David Cox August 30, 2013 Introduction Statistics needs a healthy interplay between theory and applications theory meaning Foundations, rather than theoretical

More information

Bayesian inference. Fredrik Ronquist and Peter Beerli. October 3, 2007

Bayesian inference. Fredrik Ronquist and Peter Beerli. October 3, 2007 Bayesian inference Fredrik Ronquist and Peter Beerli October 3, 2007 1 Introduction The last few decades has seen a growing interest in Bayesian inference, an alternative approach to statistical inference.

More information

Bayesian Learning (II)

Bayesian Learning (II) Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning (II) Niels Landwehr Overview Probabilities, expected values, variance Basic concepts of Bayesian learning MAP

More information

Causal Inference with Big Data Sets

Causal Inference with Big Data Sets Causal Inference with Big Data Sets Marcelo Coca Perraillon University of Colorado AMC November 2016 1 / 1 Outlone Outline Big data Causal inference in economics and statistics Regression discontinuity

More information

Introduction to Gaussian Processes

Introduction to Gaussian Processes Introduction to Gaussian Processes Iain Murray murray@cs.toronto.edu CSC255, Introduction to Machine Learning, Fall 28 Dept. Computer Science, University of Toronto The problem Learn scalar function of

More information

Probability and Statistics

Probability and Statistics Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be CHAPTER 4: IT IS ALL ABOUT DATA 4a - 1 CHAPTER 4: IT

More information

CPSC 340: Machine Learning and Data Mining. MLE and MAP Fall 2017

CPSC 340: Machine Learning and Data Mining. MLE and MAP Fall 2017 CPSC 340: Machine Learning and Data Mining MLE and MAP Fall 2017 Assignment 3: Admin 1 late day to hand in tonight, 2 late days for Wednesday. Assignment 4: Due Friday of next week. Last Time: Multi-Class

More information

Related Concepts: Lecture 9 SEM, Statistical Modeling, AI, and Data Mining. I. Terminology of SEM

Related Concepts: Lecture 9 SEM, Statistical Modeling, AI, and Data Mining. I. Terminology of SEM Lecture 9 SEM, Statistical Modeling, AI, and Data Mining I. Terminology of SEM Related Concepts: Causal Modeling Path Analysis Structural Equation Modeling Latent variables (Factors measurable, but thru

More information

Lecture Notes 1: Decisions and Data. In these notes, I describe some basic ideas in decision theory. theory is constructed from

Lecture Notes 1: Decisions and Data. In these notes, I describe some basic ideas in decision theory. theory is constructed from Topics in Data Analysis Steven N. Durlauf University of Wisconsin Lecture Notes : Decisions and Data In these notes, I describe some basic ideas in decision theory. theory is constructed from The Data:

More information

HST.582J / 6.555J / J Biomedical Signal and Image Processing Spring 2007

HST.582J / 6.555J / J Biomedical Signal and Image Processing Spring 2007 MIT OpenCourseWare http://ocw.mit.edu HST.582J / 6.555J / 16.456J Biomedical Signal and Image Processing Spring 2007 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

Equivalence of random-effects and conditional likelihoods for matched case-control studies

Equivalence of random-effects and conditional likelihoods for matched case-control studies Equivalence of random-effects and conditional likelihoods for matched case-control studies Ken Rice MRC Biostatistics Unit, Cambridge, UK January 8 th 4 Motivation Study of genetic c-erbb- exposure and

More information

Whaddya know? Bayesian and Frequentist approaches to inverse problems

Whaddya know? Bayesian and Frequentist approaches to inverse problems Whaddya know? Bayesian and Frequentist approaches to inverse problems Philip B. Stark Department of Statistics University of California, Berkeley Inverse Problems: Practical Applications and Advanced Analysis

More information

arxiv: v1 [stat.ap] 27 Mar 2015

arxiv: v1 [stat.ap] 27 Mar 2015 Submitted to the Annals of Applied Statistics A NOTE ON THE SPECIFIC SOURCE IDENTIFICATION PROBLEM IN FORENSIC SCIENCE IN THE PRESENCE OF UNCERTAINTY ABOUT THE BACKGROUND POPULATION By Danica M. Ommen,

More information

An Introduction to Path Analysis

An Introduction to Path Analysis An Introduction to Path Analysis PRE 905: Multivariate Analysis Lecture 10: April 15, 2014 PRE 905: Lecture 10 Path Analysis Today s Lecture Path analysis starting with multivariate regression then arriving

More information

Lecture 1: Bayesian Framework Basics

Lecture 1: Bayesian Framework Basics Lecture 1: Bayesian Framework Basics Melih Kandemir melih.kandemir@iwr.uni-heidelberg.de April 21, 2014 What is this course about? Building Bayesian machine learning models Performing the inference of

More information

Case Studies in Bayesian Data Science

Case Studies in Bayesian Data Science Case Studies in Bayesian Data Science 4: The Bootstrap as an Approximate BNP Method David Draper Department of Applied Mathematics and Statistics University of California, Santa Cruz draper@ucsc.edu Short

More information

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 8. Chapter 8. Classification: Basic Concepts

Data Mining: Concepts and Techniques. (3 rd ed.) Chapter 8. Chapter 8. Classification: Basic Concepts Data Mining: Concepts and Techniques (3 rd ed.) Chapter 8 1 Chapter 8. Classification: Basic Concepts Classification: Basic Concepts Decision Tree Induction Bayes Classification Methods Rule-Based Classification

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Empirical Bayes, Hierarchical Bayes Mark Schmidt University of British Columbia Winter 2017 Admin Assignment 5: Due April 10. Project description on Piazza. Final details coming

More information

Frequentist Statistics and Hypothesis Testing Spring

Frequentist Statistics and Hypothesis Testing Spring Frequentist Statistics and Hypothesis Testing 18.05 Spring 2018 http://xkcd.com/539/ Agenda Introduction to the frequentist way of life. What is a statistic? NHST ingredients; rejection regions Simple

More information

Standard Errors & Confidence Intervals. N(0, I( β) 1 ), I( β) = [ 2 l(β, φ; y) β i β β= β j

Standard Errors & Confidence Intervals. N(0, I( β) 1 ), I( β) = [ 2 l(β, φ; y) β i β β= β j Standard Errors & Confidence Intervals β β asy N(0, I( β) 1 ), where I( β) = [ 2 l(β, φ; y) ] β i β β= β j We can obtain asymptotic 100(1 α)% confidence intervals for β j using: β j ± Z 1 α/2 se( β j )

More information

18.05 Practice Final Exam

18.05 Practice Final Exam No calculators. 18.05 Practice Final Exam Number of problems 16 concept questions, 16 problems. Simplifying expressions Unless asked to explicitly, you don t need to simplify complicated expressions. For

More information

CSC321 Lecture 18: Learning Probabilistic Models

CSC321 Lecture 18: Learning Probabilistic Models CSC321 Lecture 18: Learning Probabilistic Models Roger Grosse Roger Grosse CSC321 Lecture 18: Learning Probabilistic Models 1 / 25 Overview So far in this course: mainly supervised learning Language modeling

More information

David Giles Bayesian Econometrics

David Giles Bayesian Econometrics David Giles Bayesian Econometrics 1. General Background 2. Constructing Prior Distributions 3. Properties of Bayes Estimators and Tests 4. Bayesian Analysis of the Multiple Regression Model 5. Bayesian

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods By Oleg Makhnin 1 Introduction a b c M = d e f g h i 0 f(x)dx 1.1 Motivation 1.1.1 Just here Supresses numbering 1.1.2 After this 1.2 Literature 2 Method 2.1 New math As

More information

Class 26: review for final exam 18.05, Spring 2014

Class 26: review for final exam 18.05, Spring 2014 Probability Class 26: review for final eam 8.05, Spring 204 Counting Sets Inclusion-eclusion principle Rule of product (multiplication rule) Permutation and combinations Basics Outcome, sample space, event

More information

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review

More information

Just Enough Likelihood

Just Enough Likelihood Just Enough Likelihood Alan R. Rogers September 2, 2013 1. Introduction Statisticians have developed several methods for comparing hypotheses and for estimating parameters from data. Of these, the method

More information

Why Try Bayesian Methods? (Lecture 5)

Why Try Bayesian Methods? (Lecture 5) Why Try Bayesian Methods? (Lecture 5) Tom Loredo Dept. of Astronomy, Cornell University http://www.astro.cornell.edu/staff/loredo/bayes/ p.1/28 Today s Lecture Problems you avoid Ambiguity in what is random

More information