Bayes Testing and More


STA 732. Surya Tokdar

Bayes testing

The basic goal of testing is to provide a summary of evidence toward or against a hypothesis of the kind $H_0 : \theta \in \Theta_0$, for some scientifically important subset $\Theta_0$ of the parameter space $\Theta$. For a data model $X \sim f(x \mid \theta)$, $\theta \in \Theta$, a Bayesian would start by specifying a prior pdf $\pi(\theta)$ for $\theta$. The prior then combines with the data $X = x$ to produce a posterior pdf $\pi(\theta \mid x)$ for $\theta$. At this stage, we can simply summarize the evidence toward $H_0$ by

$$P(H_0 \mid x) = \Pr(\theta \in \Theta_0 \mid X = x) = \int_{\Theta_0} \pi(\theta \mid x)\, d\theta$$

and the evidence against $H_0$ is simply $1 - P(H_0 \mid x)$. This probability represents our updated belief about the statement $H_0$. If a reject/accept $H_0$ type decision is indeed warranted, then we could make it by subjecting $\Pr(\theta \in \Theta_0 \mid X = x)$ to a cut-off of our choice. That is, we reject $H_0$ if $\Pr(\theta \in \Theta_0 \mid X = x) < k$ for some (positive) cut-off $k$. How do we choose this cut-off?

Loss function

To guide the choice of a cut-off, we need to think carefully about the consequences of our decisions. We now have to pretend that $\theta$ is going to be observed (in future) and our decision is going to be checked against the observed value. If the decision matches the observed value, we incur no penalty; otherwise we are penalized a positive amount. Let $d_0$ denote the decision "$\theta \in \Theta_0$" and $d_1$ the decision "$\theta \notin \Theta_0$". Then we incur a penalty if we go for $d_0$ and the observed $\theta$ turns out to be in $\Theta \setminus \Theta_0$, or if we go for $d_1$ and $\theta$ turns out to be in $\Theta_0$. These two penalties can potentially differ in the amount we lose. This is expressed in the following loss table:

             θ ∈ Θ_0     θ ∈ Θ \ Θ_0
    d_0         0            w_0
    d_1        w_1            0

If we denote by $\mathrm{loss}(d, \theta)$ the loss incurred when we go for a decision $d \in \{d_0, d_1\}$ and the parameter value is later observed to be $\theta$, then

$$\mathrm{loss}(d_0, \theta) = 0, \ \theta \in \Theta_0, \qquad \mathrm{loss}(d_0, \theta) = w_0, \ \theta \in \Theta \setminus \Theta_0,$$
$$\mathrm{loss}(d_1, \theta) = w_1, \ \theta \in \Theta_0, \qquad \mathrm{loss}(d_1, \theta) = 0, \ \theta \in \Theta \setminus \Theta_0.$$

Therefore the posterior expected loss of a decision $d$,

$$r(d) = E[\mathrm{loss}(d, \theta) \mid X = x] = \int \mathrm{loss}(d, \theta)\, \pi(\theta \mid x)\, d\theta,$$

can be simplified to

$$r(d_0) = w_0 \int_{\Theta \setminus \Theta_0} \pi(\theta \mid x)\, d\theta = w_0 \Pr(\theta \in \Theta \setminus \Theta_0 \mid X = x),$$
$$r(d_1) = w_1 \int_{\Theta_0} \pi(\theta \mid x)\, d\theta = w_1 \Pr(\theta \in \Theta_0 \mid X = x).$$

If we go for the decision that minimizes our posterior expected loss, then we are committed to reject $H_0$ if (and only if)

$$r(d_1) < r(d_0) \iff \frac{\Pr(\theta \in \Theta_0 \mid X = x)}{\Pr(\theta \in \Theta \setminus \Theta_0 \mid X = x)} < \frac{w_0}{w_1} \iff \Pr(\theta \in \Theta_0 \mid X = x) < \frac{w_0}{w_0 + w_1},$$

where the last equivalence follows from the fact that $\Pr(\theta \in \Theta \setminus \Theta_0 \mid X = x) = 1 - \Pr(\theta \in \Theta_0 \mid X = x)$. Tying back to the preceding section, we see that the cut-off $k = w_0 / (w_0 + w_1)$ is determined by the relative gravity of the two possible mistakes we can make.

Notice that the above approach differs starkly from the error-control foundation of classical testing procedures. In the Bayesian setting, once the post-data belief about $\theta$ is expressed by the posterior $\pi(\theta \mid x)$, the actual decisions are based entirely on the expected costs associated with the two decisions, where expectations are evaluated via $\pi(\theta \mid x)$. Unlike the classical setting, no frequentist guarantee is sought here.

Issues with testing point nulls

Consider the statistical analysis done by Laplace on female birthrate. He modeled $X$ = number of female births among $n$ births as $X \sim \mathrm{Bin}(n, p)$ with $p \sim \mathrm{Unif}(0, 1) = \mathrm{Be}(1, 1)$. The observed data were $n$ = […] and $X$ = […], which lead to the posterior pdf Be(249146, […]). For testing $H_0 : p \ge 0.5$ against $H_1 : p < 0.5$, Laplace would report $\Pr(p \ge 0.5 \mid X = x)$ = […].

One can argue that what Laplace really wanted to test was $H_0 : p = 0.5$ against $H_1 : p \ne 0.5$. This presents a unique challenge. Because $p$ is modeled with a pdf over $[0, 1]$, the posterior is also a pdf over $[0, 1]$ and hence $\Pr(p = 0.5 \mid X = x) = \Pr(p = 0.5) = 0$. Note that this zero does not reflect that the posterior concentrates away from $p = 0.5$. It is simply an artifact of our prior on $p$, which treats $p$ as a continuous random variable, so the probability of any single value is zero. There are a couple of different ways to go about this.
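The cut-off rule $k = w_0/(w_0 + w_1)$ and the one-sided test of $H_0 : p \ge 0.5$ can be sketched together in R. This is only an illustration under stated assumptions: the function name `bayes.decision`, the counts `x = 40`, `n = 100`, and the loss weights `w0 = 1`, `w1 = 4` are hypothetical and are not Laplace's data.

```r
## Posterior-expected-loss decision for H0: p >= p.null,
## assuming X ~ Bin(n, p) and a Be(a, b) prior on p.
## All numbers in the call below are hypothetical illustrations.
bayes.decision <- function(x, n, w0, w1, a = 1, b = 1, p.null = 0.5) {
  ## the posterior is Be(a + x, b + n - x); P(H0 | x) = Pr(p >= p.null | x)
  post.h0 <- pbeta(p.null, a + x, b + n - x, lower.tail = FALSE)
  r.d0 <- w0 * (1 - post.h0)   # expected loss of deciding "theta in Theta0"
  r.d1 <- w1 * post.h0         # expected loss of deciding "theta not in Theta0"
  k <- w0 / (w0 + w1)          # equivalent cut-off on P(H0 | x)
  list(post.h0 = post.h0, r.d0 = r.d0, r.d1 = r.d1,
       cutoff = k, reject = r.d1 < r.d0)
}

out <- bayes.decision(x = 40, n = 100, w0 = 1, w1 = 4)
print(out)
```

Comparing `r.d1 < r.d0` and comparing `post.h0` against `cutoff` should always give the same decision, which is exactly the equivalence derived above. Note also that for any continuous posterior, `Pr(p = 0.5 | x)` cannot be read off this way: it is zero regardless of the data.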

Bayesian tail area probability

The goal of testing a point null $H_0 : \theta = \theta_0$ can be interpreted as judging the plausibility of a special value $\theta_0$ (e.g., for the female birth rate, $p = 0.5$ is special because it captures equal odds). This can be done effectively by communicating how central $\theta_0$ is to the posterior pdf $\pi(\theta \mid x)$. We could look at all $100(1 - \alpha)\%$ equal-tail posterior credible intervals for $\theta$ [given by the $\alpha/2$ and $(1 - \alpha/2)$th posterior quantiles of $\theta$] and find the largest value of $\alpha$ for which the interval still includes $\theta_0$. This limiting $\alpha$ value is simply

$$2 \min\{\Pr(\theta > \theta_0 \mid X = x),\ \Pr(\theta < \theta_0 \mid X = x)\}.$$

If this summary is close to zero, it reflects that $\theta_0$ is far out in the tails of the $\pi(\theta \mid x)$ pdf. I refer to the above number as a Bayesian tail area probability that quantifies evidence in support of $H_0$ [with an obvious analogy to p-values in classical testing].

Ignorance range

Some statisticians contest the basic premise of a point null, arguing that it is an extreme abstraction of a range of interesting values. That is, with $H_0 : \theta = \theta_0$ we perhaps want to capture $H_0 : |\theta - \theta_0| < d$ for some small positive number $d$. Thus one could instead report $P(|\theta - \theta_0| < d \mid X = x)$ for all (interesting) $d > 0$. The best way to report this would be to plot $P(|\theta - \theta_0| < d \mid X = x)$ as a function of $d > 0$.

Formal testing

There is in fact one other way to approach the point null testing problem. It requires using a prior distribution that recognizes that $\theta_0$ is a special value and assigns it a positive probability. For female birthrate, this can be achieved if we describe $p$ as follows:

$$\Pr(p = 0.5) = p_0, \qquad p \mid [p \ne 0.5] \sim \pi_1(p).$$

The above indeed defines a random variable $p$ which takes values in $[0, 1]$, but it is described by a mixture of a point mass at 0.5 and a pdf over $[0, 1]$. In fact one can write the prior pdf of $p$ as

$$\pi(p) = p_0\, \delta_{0.5}(p) + (1 - p_0)\, \pi_1(p)$$

where $\delta_a(x)$ denotes the Kronecker delta function ($\delta_a(x) = 1$ if $x = a$, and is zero otherwise).

This leads to the following calculation of the posterior pdf:

$$\pi(p \mid x) = \mathrm{const} \times p^x (1 - p)^{n - x}\, \pi(p) = p_0(x)\, \delta_{0.5}(p) + (1 - p_0(x))\, \pi_1(p \mid x)$$

where $\pi_1(p \mid x) = \mathrm{const} \times p^x (1 - p)^{n - x}\, \pi_1(p)$ and

$$p_0(x) = \frac{p_0}{p_0 + (1 - p_0)\, \dfrac{\int_0^1 p^x (1 - p)^{n - x}\, \pi_1(p)\, dp}{(0.5)^n}}.$$
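The two summaries above, the Bayesian tail area probability and the point-null posterior probability $p_0(x)$, can be sketched in R for the binomial model. This is a minimal sketch assuming $\pi_1(p) = \mathrm{Unif}(0, 1)$, in which case $\int_0^1 \binom{n}{x} p^x (1-p)^{n-x}\, dp = 1/(n+1)$; the function names `tail.area` and `point.null.prob` are made up for illustration.

```r
## Tail-area probability 2 * min(Pr(p < p.null | x), Pr(p > p.null | x))
## under a Be(a, b) prior on p, so that the posterior is Be(a + x, b + n - x).
tail.area <- function(x, n, p.null = 0.5, a = 1, b = 1) {
  lower <- pbeta(p.null, a + x, b + n - x)   # Pr(p < p.null | x)
  2 * min(lower, 1 - lower)
}

## Point-null posterior probability p0(x), assuming pi_1(p) = Unif(0, 1).
point.null.prob <- function(x, n, p.null = 0.5, p0 = 0.5) {
  log.m0 <- dbinom(x, n, p.null, log = TRUE)  # log f(x | p = p.null)
  log.m1 <- -log(n + 1)                       # log marginal under Unif(0, 1)
  ## p0(x) = p0 m0 / (p0 m0 + (1 - p0) m1), computed on the log scale
  1 / (1 + exp(log(1 - p0) + log.m1 - log(p0) - log.m0))
}

ta <- tail.area(49581, 98451)
pp <- point.null.prob(49581, 98451)
print(c(tail.area = ta, p0.x = pp))
```

The binomial coefficient is included in both marginals, so it cancels in the ratio and the result agrees with the formula for $p_0(x)$ above. The counts in the usage lines are those of the Lindley's paradox example discussed in this handout, so the two outputs should land near the values 0.024 and 0.95 reported there.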

Notice that $\Pr(p = 0.5 \mid X = x)$ is precisely $p_0(x)$. Therefore we could report $p_0(x)$ as a summary of evidence in support of $H_0$, as it gives exactly $P(H_0 \mid x)$. However, such a formal framework for hypothesis testing is not universally accepted. A major concern is the use of a drastically different prior on $\theta$ from the one that would have been used if only a credible interval were to be reported. The difference in the choice of prior can have a pronounced effect on the posterior inference. The difference is often stark when apparently low-information priors are used in both cases. See the next example [known as Lindley's paradox].

Example. Imagine a city where 49,581 boys and 48,870 girls have been born over a certain period of time. The number of male births $X$ is modeled as $X \sim \mathrm{Bin}(n, p)$, with $n = 98451$ and $p \in [0, 1]$. For the non-informative choice $\pi(p) = \mathrm{Unif}(0, 1)$ we get $P(p \le 0.5 \mid X = 49581, n = 98451) = 0.012$, and so a Bayesian tail area probability is $2 \times 0.012 = 0.024$, indicating moderately strong evidence against $H_0$. For a low-information point-null prior with $p_0 = 0.5$ and $\pi_1(p) = \mathrm{Unif}(0, 1)$, we get $p_0(x) = 0.95$, indicating rather strong evidence toward $H_0$.

Several points are to be noted here. Under the formal Bayes approach, the continuous part of the posterior distribution is still Be(49582, 48871), which puts only 0.012 probability on $p$ being 0.5 or smaller. So the continuous part supports $p$ being fairly different from 0.5, whereas the discrete part assigns a 95% posterior probability to $p = 0.5$. This is to be interpreted as: it is fairly likely that $p = 0.5$, but if it is not, then it is likely to be substantially different from 0.5. A useful graphical summary of this shows a vertical bar for the posterior probability of $H_0$ together with the curve $\pi_1(p \mid x)$, scaled appropriately.

One has to critically judge the role of the point null hypothesis toward the scientific goal of the study.
In the birthrate example above, if we instead tested $H_0 : p = 0.51$ with $\pi(p) = 0.5\, \delta_{0.51}(p) + 0.5\, \mathrm{Unif}(0, 1)$, we would come to a very similar conclusion: it is fairly likely that $p = 0.51$, but if it is not, then it is likely to be substantially different from 0.51. The two conclusions are conflicting. If testing for $p = 0.51$ is not deemed dramatically different from testing for $p = 0.5$, then neither of the two point null hypotheses makes sense. Both are fake nulls which can lead to misleading and conflicting answers.

We could write

$$\frac{p_0(x)}{1 - p_0(x)} = \frac{p_0}{1 - p_0} \times \frac{f(x \mid p = 0.5)}{\int f(x \mid p)\, \pi_1(p)\, dp}.$$

The second ratio on the right, called the Bayes factor (more below), gives the ratio between the conditional data pdfs under the null [$p = 0.5$] and the alternative [$p \sim \pi_1(p)$] models. The fact that under the alternative model $p$ is very likely to be different from 0.5 does not mean that the alternative model is in better agreement with the observed data than the null model. The point is, posterior summaries under an assumed model carry little information on how well the model fits the data.

A related point is that Bayes tail area probabilities may lose relevance if a point null is deemed important, and may produce very different answers from a formal Bayes analysis.

Another related point is that while flat priors offer a lot of fidelity to any observed data, with the posterior being determined mostly by the likelihood function, they also lend little support to any observed data when compared against a more precise model. One needs extra caution when comparing a flat prior model with a precise prior model.

Model comparison and Bayes factor

In the point-null approach, we actually considered two different models:

$$M_0 : X \sim f(x \mid \theta_0)$$
$$M_1 : X \sim f(x \mid \theta),\ \theta \sim \pi_1(\theta)$$

along with prior model probabilities $P(M_0) = p_0$ and $P(M_1) = 1 - p_0$. The quantity $p_0(x)$ is precisely $p_0(x) = P(M_0 \mid x)$. This setting generalizes to a more complex framework with potentially many models:

$$M_1 : X \sim f_1(x \mid \theta_1),\ \theta_1 \sim \pi_1(\theta_1),\ \theta_1 \in \Theta_1$$
$$M_2 : X \sim f_2(x \mid \theta_2),\ \theta_2 \sim \pi_2(\theta_2),\ \theta_2 \in \Theta_2$$
$$\vdots$$
$$M_k : X \sim f_k(x \mid \theta_k),\ \theta_k \sim \pi_k(\theta_k),\ \theta_k \in \Theta_k$$

where each model can have its own distinct family of pdfs/pmfs with different parameters living on different spaces. The specification is completed by attaching prior model probabilities $P(M_1) = p_1, \ldots, P(M_k) = p_k$ with $p_i \ge 0$ and $\sum_i p_i = 1$. Bayes rule gives that the posterior probability of model $M_j$ is

$$P(M_j \mid X = x) = p_j(x) = \frac{p_j \int_{\Theta_j} f_j(x \mid \theta_j)\, \pi_j(\theta_j)\, d\theta_j}{\sum_{i=1}^k p_i \int_{\Theta_i} f_i(x \mid \theta_i)\, \pi_i(\theta_i)\, d\theta_i}$$

and the conditional posterior distribution of $\theta_j$ under model $M_j$ is

$$\pi_j(\theta_j \mid x) = \frac{f_j(x \mid \theta_j)\, \pi_j(\theta_j)}{\int_{\Theta_j} f_j(x \mid \theta_j)\, \pi_j(\theta_j)\, d\theta_j}.$$

Bayes factor

The posterior odds of model $M_i$ to model $M_j$ is

$$\frac{p_i(x)}{p_j(x)} = \frac{p_i \int_{\Theta_i} f_i(x \mid \theta_i)\, \pi_i(\theta_i)\, d\theta_i}{p_j \int_{\Theta_j} f_j(x \mid \theta_j)\, \pi_j(\theta_j)\, d\theta_j} = \frac{p_i}{p_j}\, BF_{ij}(x)$$

where $BF_{ij}(x)$, called the Bayes factor of $M_i$ to $M_j$, is the ratio of the marginal likelihoods of the two models. Many people prefer reporting the Bayes factor to the posterior odds, as the former does not depend on the prior odds. Any reader can multiply the reported Bayes factor by her prior odds to obtain her posterior odds.

Marginal likelihood calculations

If $X \sim f(x \mid \theta)$, $\theta \sim \pi(\theta)$ is a conjugate model, then the marginal likelihood $f(x) = \int_\Theta f(x \mid \theta)\, \pi(\theta)\, d\theta$ can be calculated in closed form [this is really the normalizing constant in $\pi(\theta \mid x) = f(x \mid \theta)\, \pi(\theta) / f(x)$]. For example, if $X \sim \mathrm{Bin}(n, p)$ and $p \sim \mathrm{Be}(a, b)$, then

$$f(x) = \binom{n}{x} \int_0^1 p^x (1 - p)^{n - x}\, \frac{p^{a - 1} (1 - p)^{b - 1}}{B(a, b)}\, dp = \binom{n}{x} \frac{B(a + x, b + n - x)}{B(a, b)}.$$

An alternative way to calculate the marginal likelihood is this nifty trick:

$$f(x) = \frac{f(x \mid \theta^*)\, \pi(\theta^*)}{\pi(\theta^* \mid x)}$$

at every $\theta^*$ where the posterior pdf is positive. In the binomial model above, I could use the following code to get $f(x)$ in log-scale (always preferred due to numerical stability):

    p.star <- (x + a) / (n + a + b)
    log.f.x <- (dbinom(x, n, p.star, log = TRUE)
                + dbeta(p.star, a, b, log = TRUE)
                - dbeta(p.star, a + x, b + n - x, log = TRUE))

For a non-conjugate model, calculation of the marginal likelihood is a fairly challenging task, usually more challenging than sampling $\theta$ from the posterior $\pi(\theta \mid x)$. Common numerical techniques include quadrature (when $\dim(\theta)$ is small), or stochastic calculation based on importance sampling Monte Carlo, frequently coupled with sequential sampling strategies [see Tokdar and Kass (2010)]. The idea of importance sampling is to find an importance density $q(\theta)$ on $\Theta$ and, with samples $\theta^{(k)} \overset{\text{IID}}{\sim} q(\theta)$, $k = 1, \ldots, M$, approximate $f(x)$ by

$$\hat f(x) = \frac{1}{M} \sum_k \frac{f(x \mid \theta^{(k)})\, \pi(\theta^{(k)})}{q(\theta^{(k)})}.$$

This works because, by the SLLN, $\hat f(x) \to \int f(x \mid \theta)\, \pi(\theta)\, d\theta$ provided $q(\theta) > 0$ at every $\theta$ where $f(x \mid \theta)\, \pi(\theta) > 0$. However, the variance of this Monte Carlo estimate can be extremely large if $q(\theta)$ looks very different from $\pi(\theta \mid x)$, and you will then need a very large $M$ to get a reliable answer. A simple technique that works for standard regular models is as follows.

1. Run an optimizer on the log-posterior to find the posterior mode $\hat\theta$ and the Hessian $H$ (the curvature of $-\log \pi(\theta \mid x) = \mathrm{const} - l_x(\theta) - \log \pi(\theta)$ at $\theta = \hat\theta$).
2. By the Bernstein-von Mises theorem, $\pi(\theta \mid x) \approx N(\hat\theta, H^{-1})$. This itself could be a good choice of $q$, except that normal distributions have tails that decay quickly. If the posterior pdf has slightly heavier tails, then again the importance estimate will have a high variance.
3. Instead it is recommended to take $q(\theta) = t_\nu(\hat\theta, aH^{-1})$, the multivariate $t$ pdf with a modest df $\nu$ (3 is a good small choice which guarantees two finite moments) and a scaling $a > 1$ appropriate to cover the range of the posterior pdf ($a = 5$ should suffice in most cases).

See the code at the end of the handout.

Improper prior

In the birthrate example above, with the point-null model, we used a $\mathrm{Unif}(0, 1)$ prior on $p$ given $p \ne 0.5$. What happens if we used a uniform prior on $\log\{p / (1 - p)\}$? Recall that this corresponds to the improper Be(0, 0) prior on $p$ with pdf $\pi_1(p) = c / \{p(1 - p)\}$, with $c$ arbitrary. For our data with $x = 49581 > 0$ and $n - x = 48870 > 0$ the resulting posterior is a proper Be(49581, 48870) pdf. But

$$p_0(x) = \frac{p_0}{p_0 + (1 - p_0)\, \dfrac{c\, B(49581, 48870)}{(0.5)^n}}$$

which depends on the choice of $c$. This is a common problem with using improper priors for comparing models, though some solutions now exist in the literature (see Berger and Pericchi 1996 for reporting the intrinsic Bayes factor while testing with improper priors).

Multiple Testing

Although multiple testing may refer to many different statistical inference problems, we restrict ourselves to situations where a moderate to large number of related hypotheses are to be tested together. Two common situations are large scale significance testing (e.g., in microarray studies) and variable selection in linear regression models. Most large scale significance testing can be conceptualized as follows: we have data $X_1, \ldots, X_m$ on $m$ objects (say genes) which are modeled as $X_i \overset{\text{IND}}{\sim} N(\mu_i, \sigma^2)$, and it is desired to test which of the means $\mu_i$ are non-zero. In Gaussian linear regression of the form $Y_i = \alpha + z_i^T \beta + \epsilon_i$, $\epsilon_i \overset{\text{IID}}{\sim} N(0, \sigma^2)$, it may be desired to determine which of the coordinates of $\beta$ are non-zero.

In this set of notes, I will only discuss the large scale significance testing problem. Similar concerns and concepts apply to regression (HW 4). Two excellent papers on these issues are Scott and Berger (2006) and Scott and Berger (2010).

For large scale significance testing, from either the classical or the Bayesian perspective, a foundational point has been to treat the $m$ separate cases not in isolation (as you'd do for IID cases) but in unison within a framework of exchangeability. The idea is to learn from all cases even though separate decisions are to be taken on each. This concept underlies all modern classical multiple testing approaches based on the false discovery rate and its variants, e.g., the method of Benjamini and Hochberg (1995).

Scott and Berger (2006) recommend the following Bayesian approach. Conditionally on $\sigma^2$, assign the following product prior on $(\mu_1, \ldots, \mu_m)$, determined by a common null propensity parameter $p$ and a non-zero mean spread $V$:

$$\mu_i \mid (\sigma^2, p, V) \overset{\text{IND}}{\sim} p\, \delta_0(\mu_i) + (1 - p)\, N(\mu_i \mid 0, V), \quad i = 1, \ldots, m,$$

and assign these new parameters the following prior:

$$(p, V) \mid \sigma^2 \sim a p^{a - 1} \cdot \frac{1}{\sigma^2} \left(1 + V / \sigma^2\right)^{-2}.$$

In some cases $\sigma$ may be assumed known; otherwise assign the default prior $\pi(\sigma^2) = 1/\sigma^2$. For the conditional prior on $p$, a default choice could be $a = 1$, leading to the uniform pdf. In many situations only a small proportion of cases are expected to be non-zero, and so one could choose a large value of $a$ to reflect such prior belief. The posterior probability of a zero mean is

$$p_i := P(\mu_i = 0 \mid x) = 1 - \int \left[ 1 + \frac{p}{1 - p} \sqrt{1 + V / \sigma^2}\, \exp\left\{ -\frac{x_i^2 V}{2 \sigma^2 (\sigma^2 + V)} \right\} \right]^{-1} \pi(p, V, \sigma^2 \mid x)\, dp\, dV\, d\sigma^2$$

and the integral is easily evaluated by an efficient importance sampling Monte Carlo.

Berger provides the following toy example to illustrate why such a prior choice makes sense for the large scale significance testing problem. Assume $\sigma = 1$ is known. Consider the following ten signal observations: 8.48, 5.43, 4.81, 2.64, 2.40, 3.32, 4.07, 4.81, 5.81, […]. Next, generate $n$ = 10, 50, 500, and 5000 $N(0, 1)$ noise observations. Mix them together and try to identify the signals. Here are results from such an experiment:

[Table: posterior null probabilities $p_i$ of the ten signal observations for each number of noise observations $n$, together with the number of noise observations with $p_i < 0.6$. The values did not survive in this transcription.]

Clearly, the joint analysis provides a multiplicity adjustment: the same signals are deemed weaker when lots of noise observations are added. In contrast, if one treated the cases independently, with the following prior for the $i$-th case:

$$\mu_i \sim \pi_0\, \delta_0(\mu_i) + (1 - \pi_0)\, N(\mu_i \mid 0, V_i), \qquad V_i \mid \sigma^2 \sim 1 / \{\sigma^2 (1 + V_i / \sigma^2)\},$$

then the number of noise observations with $p_i < 0.6$ would have grown linearly in $n$.

One final point about this. I used the model of Scott and Berger (2006) to illustrate the issue of multiplicity and the need for a joint (hierarchical) model. There are other important issues that one needs to care about in large scale significance testing. Brad Efron has a series of interesting works on this (with his two-groups model). I did some work on this, incorporating non-parametric Bayes within Efron's framework.

P-value calibration

In formal Bayesian testing, we are able to quantify in $P(H_0 \mid x)$ our certainty about $H_0$ (modulo the prior and the data). In classical testing, one often uses the p-value to reflect strength of evidence against $H_0$. How do these two measures compare? An excellent read on this is Berger (Stat Sci, 2003), who summarizes a long series of work by Berger, Wolpert, Sellke, Delampady, Bayarri and others.

Berger (2003) argues that in almost every common situation, a p-value carries less evidence against $H_0$ than its numerical value suggests. A p-value of 0.05 usually reflects a […] chance for $H_0$ and at worst at least a 25% chance of $H_0$ (under equal prior odds).

Recall that for a family of classical tests based on a single test statistic $T(x)$ and all possible thresholds $c$, the p-value is

$$p(x) = \max_{\theta \in \Theta_0} P_{[X \mid \theta]}(T(X) > T(x))$$

where the maximum is usually attained at some fixed point $\theta_0 \in \Theta_0$. So under the null, $p(X) \sim \mathrm{Unif}(0, 1)$, and under the alternative one would expect $p(X)$ to be smaller. Thus one could formulate a testing problem on $p(X)$ itself: $H_0 : p(X) \sim \mathrm{Unif}(0, 1)$ and $H_1 : p(X) \sim f$, where $f$ is a pdf on $[0, 1]$ concentrated around 0. Sellke, Bayarri and Berger (Am Stat 2001) consider various reasonable non-parametric choices of $f$ and show that they yield a lower bound on the Bayes factor,

$$B_{01}(p) \ge -e\, p \log p,$$

and so a lower bound on the posterior null probability,

$$P(H_0 \mid x) \ge \left(1 + [-e\, p \log p]^{-1}\right)^{-1}.$$

The main reason behind this discrepancy between the numeric value $p(x) = p$ and the lower bound on $P(H_0 \mid x)$ is as follows. If we had only observed $p(x) < p$, then the lower bound would indeed be $P(H_0 \mid x) \ge p$. But we do get to see a precise value $p$ for $p(x)$, and the information $p(x) = p$ is very different (and less harsh on $H_0$) than the information $p(x) < p$. The lower bound calibrates the strength of evidence against $H_0$ based on $p(x) = p$.
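The calibration bounds above are easy to tabulate. The following R lines are a minimal sketch: the function name `pval.calibrate` is made up for illustration, and the `prior.odds` argument (prior odds of $H_0$ to $H_1$, with 1 giving the equal-odds case used in the text) is an added assumption. The check `p < 1/e` reflects the range on which the bound $-e\, p \log p$ is usually stated.

```r
## Lower bounds on the Bayes factor B01 and on P(H0 | x) from a p-value,
## using B01(p) >= -e * p * log(p), taken here for 0 < p < 1/e.
pval.calibrate <- function(p, prior.odds = 1) {
  stopifnot(all(p > 0), all(p < exp(-1)))
  bf.bound <- -exp(1) * p * log(p)                      # bound on B01(p)
  post.bound <- 1 / (1 + 1 / (prior.odds * bf.bound))   # bound on P(H0 | x)
  data.frame(p = p, bf.bound = bf.bound, post.bound = post.bound)
}

cal <- pval.calibrate(c(0.05, 0.01, 0.001))
print(cal)
```

Under equal prior odds, the `post.bound` column gives the smallest posterior null probability compatible with each p-value, which can be compared directly with the numeric p-value in the first column.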

    ## a function to calculate negative loglik + negative log.prior
    ## (to be supplied by the user)
    neg.lp <- function(theta, ...) {}

    ## this returns log(sum(exp(lx))), but is numerically more stable
    logsum <- function(lx) max(lx) + log(sum(exp(lx - max(lx))))

    ## the importance sampling log f(x) calculator
    imp <- function(neg.lp, theta.start, nu = 3, a = 5, n.samp = 1e4) {
      d <- length(theta.start)
      op1 <- optim(theta.start, neg.lp)
      op <- nlm(neg.lp, op1$par, hessian = TRUE)
      theta.hat <- op$estimate
      H <- op$hessian / a      # precision matrix of the t proposal, (a H^{-1})^{-1}
      R <- chol(H)             # upper triangular, H = t(R) %*% R
      u.samp <- rgamma(n.samp, nu / 2, 1 / 2)   # chi-square(nu) mixing draws
      u.mat <- outer(rep(1, d), u.samp)
      z.samp <- matrix(rnorm(d * n.samp), nrow = d)
      ## theta = theta.hat + R^{-1} z sqrt(nu / u) is a draw from t_nu(theta.hat, a H^{-1})
      theta.samp <- theta.hat + backsolve(R, z.samp) * sqrt(nu / u.mat)
      ## log q(theta); the quadratic form scaled by nu equals colSums(z^2) / u
      log.q <- (lgamma((nu + d) / 2) - lgamma(nu / 2) - (d / 2) * log(nu * pi)
                + sum(log(diag(R)))
                - (nu + d) / 2 * log1p(colSums(z.samp^2) / u.samp))
      log.wt <- -apply(theta.samp, 2, neg.lp) - log.q
      return(logsum(log.wt) - log(n.samp))
    }
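One way to gain confidence in the importance sampling recipe is to run it on a conjugate model where the closed-form marginal likelihood is available. The sketch below is self-contained and one-dimensional rather than a call to the handout's `imp` function. It assumes hypothetical data `x = 7`, `n = 20` and a Be(2, 3) prior, and works with $\theta = \mathrm{logit}(p)$ so that the $t$ proposal lives on the whole real line; the names `log.joint` and `a.scale` are illustrative.

```r
set.seed(1)
x <- 7; n <- 20; a <- 2; b <- 3   # hypothetical data and prior parameters

## log of f(x | theta) * pi(theta), with theta = logit(p)
log.joint <- function(theta) {
  p <- plogis(theta)
  dbinom(x, n, p, log = TRUE) + dbeta(p, a, b, log = TRUE) +
    log(p) + log1p(-p)            # Jacobian dp / dtheta = p (1 - p)
}

## posterior mode and curvature of the negative log posterior
op <- optim(0, function(th) -log.joint(th), method = "BFGS", hessian = TRUE)
theta.hat <- op$par
H <- op$hessian[1, 1]

## scaled t proposal: theta = theta.hat + s * T, T ~ t_nu, s^2 = a.scale / H
nu <- 3; a.scale <- 5; M <- 1e5
s <- sqrt(a.scale / H)
t.samp <- rt(M, df = nu)
theta.samp <- theta.hat + s * t.samp
log.q <- dt(t.samp, df = nu, log = TRUE) - log(s)

## log of the average importance weight, computed stably
log.wt <- log.joint(theta.samp) - log.q
log.f.hat <- max(log.wt) + log(sum(exp(log.wt - max(log.wt)))) - log(M)

## closed-form answer from the conjugate beta-binomial formula
log.f.exact <- lchoose(n, x) + lbeta(a + x, b + n - x) - lbeta(a, b)
print(c(estimate = log.f.hat, exact = log.f.exact))
```

The two printed numbers should agree closely, since the heavy-tailed proposal keeps the importance weights bounded in this example. A large gap in a real application would suggest that the proposal misses part of the posterior mass.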


More information

Seminar über Statistik FS2008: Model Selection

Seminar über Statistik FS2008: Model Selection Seminar über Statistik FS2008: Model Selection Alessia Fenaroli, Ghazale Jazayeri Monday, April 2, 2008 Introduction Model Choice deals with the comparison of models and the selection of a model. It can

More information

PMR Learning as Inference

PMR Learning as Inference Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning

More information

Introduction to Bayesian Methods

Introduction to Bayesian Methods Introduction to Bayesian Methods Jessi Cisewski Department of Statistics Yale University Sagan Summer Workshop 2016 Our goal: introduction to Bayesian methods Likelihoods Priors: conjugate priors, non-informative

More information

Some Curiosities Arising in Objective Bayesian Analysis

Some Curiosities Arising in Objective Bayesian Analysis . Some Curiosities Arising in Objective Bayesian Analysis Jim Berger Duke University Statistical and Applied Mathematical Institute Yale University May 15, 2009 1 Three vignettes related to John s work

More information

7. Estimation and hypothesis testing. Objective. Recommended reading

7. Estimation and hypothesis testing. Objective. Recommended reading 7. Estimation and hypothesis testing Objective In this chapter, we show how the election of estimators can be represented as a decision problem. Secondly, we consider the problem of hypothesis testing

More information

A Bayesian perspective on GMM and IV

A Bayesian perspective on GMM and IV A Bayesian perspective on GMM and IV Christopher A. Sims Princeton University sims@princeton.edu November 26, 2013 What is a Bayesian perspective? A Bayesian perspective on scientific reporting views all

More information

Time Series and Dynamic Models

Time Series and Dynamic Models Time Series and Dynamic Models Section 1 Intro to Bayesian Inference Carlos M. Carvalho The University of Texas at Austin 1 Outline 1 1. Foundations of Bayesian Statistics 2. Bayesian Estimation 3. The

More information

Sequential Monitoring of Clinical Trials Session 4 - Bayesian Evaluation of Group Sequential Designs

Sequential Monitoring of Clinical Trials Session 4 - Bayesian Evaluation of Group Sequential Designs Sequential Monitoring of Clinical Trials Session 4 - Bayesian Evaluation of Group Sequential Designs Presented August 8-10, 2012 Daniel L. Gillen Department of Statistics University of California, Irvine

More information

Hierarchical Models & Bayesian Model Selection

Hierarchical Models & Bayesian Model Selection Hierarchical Models & Bayesian Model Selection Geoffrey Roeder Departments of Computer Science and Statistics University of British Columbia Jan. 20, 2016 Contact information Please report any typos or

More information

Model comparison. Christopher A. Sims Princeton University October 18, 2016

Model comparison. Christopher A. Sims Princeton University October 18, 2016 ECO 513 Fall 2008 Model comparison Christopher A. Sims Princeton University sims@princeton.edu October 18, 2016 c 2016 by Christopher A. Sims. This document may be reproduced for educational and research

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)

More information

Bayesian RL Seminar. Chris Mansley September 9, 2008

Bayesian RL Seminar. Chris Mansley September 9, 2008 Bayesian RL Seminar Chris Mansley September 9, 2008 Bayes Basic Probability One of the basic principles of probability theory, the chain rule, will allow us to derive most of the background material in

More information

STA 732: Inference. Notes 10. Parameter Estimation from a Decision Theoretic Angle. Other resources

STA 732: Inference. Notes 10. Parameter Estimation from a Decision Theoretic Angle. Other resources STA 732: Inference Notes 10. Parameter Estimation from a Decision Theoretic Angle Other resources 1 Statistical rules, loss and risk We saw that a major focus of classical statistics is comparing various

More information

BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA

BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA Intro: Course Outline and Brief Intro to Marina Vannucci Rice University, USA PASI-CIMAT 04/28-30/2010 Marina Vannucci

More information

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) =

σ(a) = a N (x; 0, 1 2 ) dx. σ(a) = Φ(a) = Until now we have always worked with likelihoods and prior distributions that were conjugate to each other, allowing the computation of the posterior distribution to be done in closed form. Unfortunately,

More information

DS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling

DS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling DS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling Due: Tuesday, May 10, 2016, at 6pm (Submit via NYU Classes) Instructions: Your answers to the questions below, including

More information

Testing Restrictions and Comparing Models

Testing Restrictions and Comparing Models Econ. 513, Time Series Econometrics Fall 00 Chris Sims Testing Restrictions and Comparing Models 1. THE PROBLEM We consider here the problem of comparing two parametric models for the data X, defined by

More information

Introduction: MLE, MAP, Bayesian reasoning (28/8/13)

Introduction: MLE, MAP, Bayesian reasoning (28/8/13) STA561: Probabilistic machine learning Introduction: MLE, MAP, Bayesian reasoning (28/8/13) Lecturer: Barbara Engelhardt Scribes: K. Ulrich, J. Subramanian, N. Raval, J. O Hollaren 1 Classifiers In this

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee and Andrew O. Finley 2 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A. 2 Department of Forestry & Department

More information

On the Bayesianity of Pereira-Stern tests

On the Bayesianity of Pereira-Stern tests Sociedad de Estadística e Investigación Operativa Test (2001) Vol. 10, No. 2, pp. 000 000 On the Bayesianity of Pereira-Stern tests M. Regina Madruga Departamento de Estatística, Universidade Federal do

More information

Statistical Data Analysis Stat 3: p-values, parameter estimation

Statistical Data Analysis Stat 3: p-values, parameter estimation Statistical Data Analysis Stat 3: p-values, parameter estimation London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515 Glen Cowan Physics Department Royal Holloway,

More information

Foundations of Statistical Inference

Foundations of Statistical Inference Foundations of Statistical Inference Julien Berestycki Department of Statistics University of Oxford MT 2016 Julien Berestycki (University of Oxford) SB2a MT 2016 1 / 32 Lecture 14 : Variational Bayes

More information

General Bayesian Inference I

General Bayesian Inference I General Bayesian Inference I Outline: Basic concepts, One-parameter models, Noninformative priors. Reading: Chapters 10 and 11 in Kay-I. (Occasional) Simplified Notation. When there is no potential for

More information

STAT 830 Bayesian Estimation

STAT 830 Bayesian Estimation STAT 830 Bayesian Estimation Richard Lockhart Simon Fraser University STAT 830 Fall 2011 Richard Lockhart (Simon Fraser University) STAT 830 Bayesian Estimation STAT 830 Fall 2011 1 / 23 Purposes of These

More information

Conditional probabilities and graphical models

Conditional probabilities and graphical models Conditional probabilities and graphical models Thomas Mailund Bioinformatics Research Centre (BiRC), Aarhus University Probability theory allows us to describe uncertainty in the processes we model within

More information

Hypothesis Testing. Part I. James J. Heckman University of Chicago. Econ 312 This draft, April 20, 2006

Hypothesis Testing. Part I. James J. Heckman University of Chicago. Econ 312 This draft, April 20, 2006 Hypothesis Testing Part I James J. Heckman University of Chicago Econ 312 This draft, April 20, 2006 1 1 A Brief Review of Hypothesis Testing and Its Uses values and pure significance tests (R.A. Fisher)

More information

A Sequential Bayesian Approach with Applications to Circadian Rhythm Microarray Gene Expression Data

A Sequential Bayesian Approach with Applications to Circadian Rhythm Microarray Gene Expression Data A Sequential Bayesian Approach with Applications to Circadian Rhythm Microarray Gene Expression Data Faming Liang, Chuanhai Liu, and Naisyin Wang Texas A&M University Multiple Hypothesis Testing Introduction

More information

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008 Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

More information

Stat 451 Lecture Notes Numerical Integration

Stat 451 Lecture Notes Numerical Integration Stat 451 Lecture Notes 03 12 Numerical Integration Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 5 in Givens & Hoeting, and Chapters 4 & 18 of Lange 2 Updated: February 11, 2016 1 / 29

More information

MODEL COMPARISON CHRISTOPHER A. SIMS PRINCETON UNIVERSITY

MODEL COMPARISON CHRISTOPHER A. SIMS PRINCETON UNIVERSITY ECO 513 Fall 2008 MODEL COMPARISON CHRISTOPHER A. SIMS PRINCETON UNIVERSITY SIMS@PRINCETON.EDU 1. MODEL COMPARISON AS ESTIMATING A DISCRETE PARAMETER Data Y, models 1 and 2, parameter vectors θ 1, θ 2.

More information

Introduction to Bayesian Statistics 1

Introduction to Bayesian Statistics 1 Introduction to Bayesian Statistics 1 STA 442/2101 Fall 2018 1 This slide show is an open-source document. See last slide for copyright information. 1 / 42 Thomas Bayes (1701-1761) Image from the Wikipedia

More information

Divergence Based priors for the problem of hypothesis testing

Divergence Based priors for the problem of hypothesis testing Divergence Based priors for the problem of hypothesis testing gonzalo garcía-donato and susie Bayarri May 22, 2009 gonzalo garcía-donato and susie Bayarri () DB priors May 22, 2009 1 / 46 Jeffreys and

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods Tomas McKelvey and Lennart Svensson Signal Processing Group Department of Signals and Systems Chalmers University of Technology, Sweden November 26, 2012 Today s learning

More information

Parameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn

Parameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn Parameter estimation and forecasting Cristiano Porciani AIfA, Uni-Bonn Questions? C. Porciani Estimation & forecasting 2 Temperature fluctuations Variance at multipole l (angle ~180o/l) C. Porciani Estimation

More information

Bayesian Analysis of RR Lyrae Distances and Kinematics

Bayesian Analysis of RR Lyrae Distances and Kinematics Bayesian Analysis of RR Lyrae Distances and Kinematics William H. Jefferys, Thomas R. Jefferys and Thomas G. Barnes University of Texas at Austin, USA Thanks to: Jim Berger, Peter Müller, Charles Friedman

More information

arxiv: v1 [stat.ap] 27 Mar 2015

arxiv: v1 [stat.ap] 27 Mar 2015 Submitted to the Annals of Applied Statistics A NOTE ON THE SPECIFIC SOURCE IDENTIFICATION PROBLEM IN FORENSIC SCIENCE IN THE PRESENCE OF UNCERTAINTY ABOUT THE BACKGROUND POPULATION By Danica M. Ommen,

More information

9 Bayesian inference. 9.1 Subjective probability

9 Bayesian inference. 9.1 Subjective probability 9 Bayesian inference 1702-1761 9.1 Subjective probability This is probability regarded as degree of belief. A subjective probability of an event A is assessed as p if you are prepared to stake pm to win

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

Lecture 2: Statistical Decision Theory (Part I)

Lecture 2: Statistical Decision Theory (Part I) Lecture 2: Statistical Decision Theory (Part I) Hao Helen Zhang Hao Helen Zhang Lecture 2: Statistical Decision Theory (Part I) 1 / 35 Outline of This Note Part I: Statistics Decision Theory (from Statistical

More information

Chapter 4 HOMEWORK ASSIGNMENTS. 4.1 Homework #1

Chapter 4 HOMEWORK ASSIGNMENTS. 4.1 Homework #1 Chapter 4 HOMEWORK ASSIGNMENTS These homeworks may be modified as the semester progresses. It is your responsibility to keep up to date with the correctly assigned homeworks. There may be some errors in

More information

Bios 6649: Clinical Trials - Statistical Design and Monitoring

Bios 6649: Clinical Trials - Statistical Design and Monitoring Bios 6649: Clinical Trials - Statistical Design and Monitoring Spring Semester 2015 John M. Kittelson Department of Biostatistics & Informatics Colorado School of Public Health University of Colorado Denver

More information

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2 Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, 2010 Jeffreys priors Lecturer: Michael I. Jordan Scribe: Timothy Hunter 1 Priors for the multivariate Gaussian Consider a multivariate

More information

Module 22: Bayesian Methods Lecture 9 A: Default prior selection

Module 22: Bayesian Methods Lecture 9 A: Default prior selection Module 22: Bayesian Methods Lecture 9 A: Default prior selection Peter Hoff Departments of Statistics and Biostatistics University of Washington Outline Jeffreys prior Unit information priors Empirical

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Introduction to Probabilistic Machine Learning

Introduction to Probabilistic Machine Learning Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS Parametric Distributions Basic building blocks: Need to determine given Representation: or? Recall Curve Fitting Binary Variables

More information

STAT 263/363: Experimental Design Winter 2016/17. Lecture 1 January 9. Why perform Design of Experiments (DOE)? There are at least two reasons:

STAT 263/363: Experimental Design Winter 2016/17. Lecture 1 January 9. Why perform Design of Experiments (DOE)? There are at least two reasons: STAT 263/363: Experimental Design Winter 206/7 Lecture January 9 Lecturer: Minyong Lee Scribe: Zachary del Rosario. Design of Experiments Why perform Design of Experiments (DOE)? There are at least two

More information

Bayesian Statistics. Debdeep Pati Florida State University. February 11, 2016

Bayesian Statistics. Debdeep Pati Florida State University. February 11, 2016 Bayesian Statistics Debdeep Pati Florida State University February 11, 2016 Historical Background Historical Background Historical Background Brief History of Bayesian Statistics 1764-1838: called probability

More information

Department of Statistics

Department of Statistics Research Report Department of Statistics Research Report Department of Statistics No. 208:4 A Classroom Approach to the Construction of Bayesian Credible Intervals of a Poisson Mean No. 208:4 Per Gösta

More information

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Review. DS GA 1002 Statistical and Mathematical Models.   Carlos Fernandez-Granda Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with

More information

Bayesian tests of hypotheses

Bayesian tests of hypotheses Bayesian tests of hypotheses Christian P. Robert Université Paris-Dauphine, Paris & University of Warwick, Coventry Joint work with K. Kamary, K. Mengersen & J. Rousseau Outline Bayesian testing of hypotheses

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

Unified Frequentist and Bayesian Testing of a Precise Hypothesis

Unified Frequentist and Bayesian Testing of a Precise Hypothesis Statistical Science 1997, Vol. 12, No. 3, 133 160 Unified Frequentist and Bayesian Testing of a Precise Hypothesis J. O. Berger, B. Boukai and Y. Wang Abstract. In this paper, we show that the conditional

More information

Stat 451 Lecture Notes Simulating Random Variables

Stat 451 Lecture Notes Simulating Random Variables Stat 451 Lecture Notes 05 12 Simulating Random Variables Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 6 in Givens & Hoeting, Chapter 22 in Lange, and Chapter 2 in Robert & Casella 2 Updated:

More information

Bayesian Learning (II)

Bayesian Learning (II) Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning (II) Niels Landwehr Overview Probabilities, expected values, variance Basic concepts of Bayesian learning MAP

More information

Basic of Probability Theory for Ph.D. students in Education, Social Sciences and Business (Shing On LEUNG and Hui Ping WU) (May 2015)

Basic of Probability Theory for Ph.D. students in Education, Social Sciences and Business (Shing On LEUNG and Hui Ping WU) (May 2015) Basic of Probability Theory for Ph.D. students in Education, Social Sciences and Business (Shing On LEUNG and Hui Ping WU) (May 2015) This is a series of 3 talks respectively on: A. Probability Theory

More information

DS-GA 1002 Lecture notes 11 Fall Bayesian statistics

DS-GA 1002 Lecture notes 11 Fall Bayesian statistics DS-GA 100 Lecture notes 11 Fall 016 Bayesian statistics In the frequentist paradigm we model the data as realizations from a distribution that depends on deterministic parameters. In contrast, in Bayesian

More information

Bayesian Inference. Chapter 1. Introduction and basic concepts

Bayesian Inference. Chapter 1. Introduction and basic concepts Bayesian Inference Chapter 1. Introduction and basic concepts M. Concepción Ausín Department of Statistics Universidad Carlos III de Madrid Master in Business Administration and Quantitative Methods Master

More information

Introduction to Machine Learning. Lecture 2

Introduction to Machine Learning. Lecture 2 Introduction to Machine Learning Lecturer: Eran Halperin Lecture 2 Fall Semester Scribe: Yishay Mansour Some of the material was not presented in class (and is marked with a side line) and is given for

More information

Two examples of the use of fuzzy set theory in statistics. Glen Meeden University of Minnesota.

Two examples of the use of fuzzy set theory in statistics. Glen Meeden University of Minnesota. Two examples of the use of fuzzy set theory in statistics Glen Meeden University of Minnesota http://www.stat.umn.edu/~glen/talks 1 Fuzzy set theory Fuzzy set theory was introduced by Zadeh in (1965) as

More information

(4) One-parameter models - Beta/binomial. ST440/550: Applied Bayesian Statistics

(4) One-parameter models - Beta/binomial. ST440/550: Applied Bayesian Statistics Estimating a proportion using the beta/binomial model A fundamental task in statistics is to estimate a proportion using a series of trials: What is the success probability of a new cancer treatment? What

More information

ML Testing (Likelihood Ratio Testing) for non-gaussian models

ML Testing (Likelihood Ratio Testing) for non-gaussian models ML Testing (Likelihood Ratio Testing) for non-gaussian models Surya Tokdar ML test in a slightly different form Model X f (x θ), θ Θ. Hypothesist H 0 : θ Θ 0 Good set: B c (x) = {θ : l x (θ) max θ Θ l

More information