New Bayesian methods for model comparison
1 Back to the future: New Bayesian methods for model comparison
Murray Aitkin, Department of Mathematics and Statistics, The University of Melbourne, Australia
Bayesian Model Comparison p. 1/??
2 Thanks!
To everyone, especially Irit, who had the 70th idea, and Irit and Brian, who organized the celebrations, with much help... (:-)
3 The statistics future
Is the statistics future Bayesian? MCMC dominates complex data analysis (latent structure, missing data) for good reasons: ML becomes impossibly complicated; standard errors from the information matrix become unreliable; and the likelihood ratio test for model comparisons becomes unworkable or inapplicable.
4 MCMC
But MCMC also has problems: convergence; parametrization; specification of prior (hyper-)parameters; and computation of integrated likelihoods for model comparison.
5 Contributing to the Bayesian future: back to basics...
The Bayesian future will come sooner when these problems are resolved. My contribution to this will appear in July 2010: Statistical Inference: an Integrated Bayesian/Likelihood Approach, Chapman and Hall/CRC. Aim: to develop and evaluate general Bayesian model comparisons for arbitrary models through posterior likelihood ratios/posterior deviance differences. Work supported by the UK Social Science Research Council, the Australian Research Council, and the US National Center for Education Statistics (2004-5).
6 The basics of problem resolution
Model comparisons should allow flat/noninformative priors, as in parameter inference. This is achieved by making model comparisons through posterior likelihoods/deviances, not integrated likelihoods, which allows a unified use of flat/noninformative priors for reference Bayes analysis, and gives model diagnostics alternative to posterior predictive p-values.
7 Summary of book contents
Broad range of applications of posterior likelihoods/deviances: new tests of independence in sparse contingency tables; new tests for goodness of fit compared with the saturated model; a new test for the number of components in a finite mixture distribution; a new alternative to posterior predictive model checks; new simple Bayesian procedures analogous to the t-test and other standard frequentist procedures; new Bayesian "nonparametric" test procedures as alternatives to the Wilcoxon and other rank tests; and an extension of the Bayesian bootstrap to clustered and stratified survey designs, giving full Bayes/likelihood analysis without approximating models.
8 The Poisson-geometric choice (Cox 1961, 1962)
The data from Cox (1962) are n = 30 event counts $y_i$ from either a Poisson or a geometric distribution, tabulated as frequencies f at the values y = 0, 1, 2, 3 and y > 3. How do we compare the models? We could compare each model's (frequentist) deviance with the saturated multinomial deviance.
9 Likelihoods
The Poisson and geometric likelihoods and deviances (parametrised in terms of the means $\theta_1$ and $\theta_2$) are
$L_P(\theta_1) = \prod_i e^{-\theta_1}\theta_1^{y_i}/y_i! = e^{-n\theta_1}\theta_1^{T}/F$
$D_P(\theta_1) = -2\log L_P(\theta_1) = 2[n\theta_1 - T\log\theta_1 + \log F]$
$L_G(\theta_2) = \prod_i \left(\frac{\theta_2}{1+\theta_2}\right)^{y_i}\frac{1}{1+\theta_2} = \theta_2^{T}(1+\theta_2)^{-(T+n)}$
$D_G(\theta_2) = -2\log L_G(\theta_2) = 2[(T+n)\log(1+\theta_2) - T\log\theta_2]$
where $T = \sum_i y_i = 26$ and $F = \prod_i y_i! = 384$.
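These deviances can be evaluated directly from the sufficient statistics T = 26, n = 30, F = 384; a minimal sketch, noting that the MLE of the mean is T/n under both models (the numerical values are computed here, not taken from the slides):

```python
import math

T, n, F = 26, 30, 384          # sufficient statistics for the Cox data
theta_hat = T / n              # MLE of the mean under both models

def dev_poisson(theta):
    # D_P(theta) = 2[n*theta - T*log(theta) + log F]
    return 2 * (n * theta - T * math.log(theta) + math.log(F))

def dev_geometric(theta):
    # D_G(theta) = 2[(T+n)*log(1+theta) - T*log(theta)]
    return 2 * ((T + n) * math.log(1 + theta) - T * math.log(theta))

dP, dG = dev_poisson(theta_hat), dev_geometric(theta_hat)
print(round(dP, 2), round(dG, 2), round(dG - dP, 2))   # → 71.34 77.35 6.0
```

The frequentist deviance difference of about 6 in favour of the Poisson reappears later as the median of the posterior deviance difference distribution.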
10 Likelihoods
[Figure: the two likelihood functions plotted against the mean (likelihood values of order 1e-16).]
11 Deviances
[Figure: the two deviance functions plotted against the mean.]
12 Saturated model
The multinomial model has $\Pr[Y = y_j] = p_j$, with likelihood and deviance
$L_M(\{p_j\}) = \prod_j p_j^{n_j}$, $D_M(\{p_j\}) = -2\sum_j n_j \log p_j$.
13 LRTs
The frequentist deviances are $D_P(\hat\theta_1) = 71.34$, $D_G(\hat\theta_2) = 77.35$, and $D_M(\{\hat p_j\})$ for the saturated multinomial. The differences $D_P(\hat\theta_1) - D_M(\{\hat p_j\})$ and $D_G(\hat\theta_2) - D_M(\{\hat p_j\})$ are each on 2 df, and $D_G(\hat\theta_2) - D_P(\hat\theta_1) = 6.00$. We reject the geometric and do not reject the Poisson, using the asymptotic $\chi^2_2$ distribution for the LRT; but are these tests valid? And how are the Poisson and geometric to be compared directly? If θ is known, e.g. to be 0.8, we have a direct comparison:
14 Likelihood ratio
At θ = 0.8, $D_G(0.8) - D_P(0.8) = 5.931$, so $L_P(0.8)/L_G(0.8) = e^{5.931/2} = 19.39$; with equal model priors, $\Pr[\text{Poisson} \mid \text{data}, \text{mean} = 0.8] = 19.39/20.39 = 0.951$. We have very strong evidence in favour of the Poisson. But we are not given the mean: we must pay a price in less precision for less information. How then do we express the evidence for Poisson over geometric? We want a Bayesian analysis which does not require informative priors on the means, or use integrated likelihoods.
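The known-mean comparison follows mechanically from the deviance formulas of slide 9; a short sketch (values computed here):

```python
import math

T, n, F = 26, 30, 384   # sufficient statistics for the Cox data

def dev_poisson(theta):
    return 2 * (n * theta - T * math.log(theta) + math.log(F))

def dev_geometric(theta):
    return 2 * ((T + n) * math.log(1 + theta) - T * math.log(theta))

dd = dev_geometric(0.8) - dev_poisson(0.8)   # deviance difference at the known mean
lr = math.exp(dd / 2)                        # likelihood ratio L_P / L_G
prob = lr / (1 + lr)                         # posterior probability of the Poisson, equal priors
print(round(dd, 3), round(lr, 2), round(prob, 3))
```

The same exponentiation rule, LR = exp(deviance difference / 2), is what the posterior-deviance draws use later.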
15 Solution: keep the uncertainty! Posterior likelihoods
We give the general approach, originally due to Dempster (1974, 1997), and extended by Aitkin (1997) and Aitkin, Boys and Chadwick (2005). The Poisson and geometric likelihoods are uncertain, because of our uncertainty about $\theta_1$ and $\theta_2$ in these models. This uncertainty is expressed through the posterior distributions of $\theta_1$ and $\theta_2$, given the data and priors. The Poisson likelihood $L_P(\theta_1)$ is a function of $\theta_1$, so we map the posterior distribution of $\theta_1$ into that of $L_P(\theta_1)$; likewise the geometric likelihood $L_G(\theta_2)$ is a function of $\theta_2$, so we map the posterior distribution of $\theta_2$ into that of $L_G(\theta_2)$. This is very simply done by simulation, making random draws from the posteriors:
16 Posterior likelihoods
Make M random draws $\theta_1^{[m]}$ from the posterior distribution of $\theta_1$ under the Poisson model, and substitute these draws into the Poisson likelihood, to give M random draws $L_P^{[m]} = L_P(\theta_1^{[m]})$ from the posterior distribution of the Poisson likelihood. Make M independent random draws $\theta_2^{[m]}$ from the posterior distribution of $\theta_2$ under the geometric model and prior, and substitute these into the geometric likelihood, to give M random draws $L_G^{[m]} = L_G(\theta_2^{[m]})$ from the posterior distribution of the geometric likelihood. Then compute the M values of the likelihood ratio of Poisson to geometric by pairing the sets of likelihood draws: $LR_{PG}^{[m]} = L_P^{[m]} / L_G^{[m]}$.
17 Posterior deviances
We generally work with posterior deviances rather than posterior likelihoods, for reasons we show shortly: they are much better behaved. We compute the two sets of posterior deviance draws $D_P^{[m]} = -2\log L_P^{[m]}$ and $D_G^{[m]} = -2\log L_G^{[m]}$; compute the M values of the deviance difference by pairing the independent Poisson and geometric deviance draws, $DD_{PG}^{[m]} = D_G^{[m]} - D_P^{[m]}$ (positive values favour the Poisson); and compute the M values of the likelihood ratio of Poisson to geometric by exponentiating the deviance difference draws, $LR_{PG}^{[m]} = e^{DD_{PG}^{[m]}/2}$.
18 Posterior deviances
Compute the M values of the posterior probability of the Poisson, given equal prior probabilities (the indifference case): $\Pr^{[m]}[\text{Poisson} \mid \text{data}] = L_P^{[m]} / (L_P^{[m]} + L_G^{[m]})$. The M values for each function define its posterior distribution: we order them to give a picture of the cdf for that function.
19 What prior?
Since we are working with the posterior of θ, the prior is less important: we are not integrating over the prior. In particular, we can work with flat or diffuse priors without any problem. For the Cox example, we use flat priors on the means $\theta_1$ and $\theta_2$, so the posterior distributions are the normalised likelihoods: the posterior distribution of $\theta_1$ is Gamma(T+1, n), and that of $\theta_2$ is a beta distribution of the second kind with parameters (T+1, n-1), i.e. $\theta_2/(1+\theta_2) \sim \mathrm{Beta}(T+1, n-1)$. These give a reference analysis; this could be extended to informative priors if we wanted to use them. We show the posterior deviance distributions for each model, and the posterior distribution of the deviance difference.
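The whole simulation for this example fits in a few lines; a sketch assuming the posteriors above (Gamma(T+1) with rate n for $\theta_1$, and $\theta_2/(1+\theta_2) \sim$ Beta(T+1, n-1)), with the deviance difference taken as $D_G - D_P$ so that positive values favour the Poisson:

```python
import numpy as np

rng = np.random.default_rng(1)
T, n, F, M = 26, 30, 384, 10_000

# Flat-prior posteriors (assumed parametrisation): theta_1 ~ Gamma(T+1, rate n);
# theta_2/(1+theta_2) ~ Beta(T+1, n-1)
theta1 = rng.gamma(T + 1, 1.0 / n, size=M)
p = rng.beta(T + 1, n - 1, size=M)
theta2 = p / (1 - p)

dP = 2 * (n * theta1 - T * np.log(theta1) + np.log(F))
dG = 2 * ((T + n) * np.log(1 + theta2) - T * np.log(theta2))

dd = dG - dP             # deviance difference draws; positive favours the Poisson
lr = np.exp(dd / 2)      # paired likelihood-ratio draws
prob = lr / (1 + lr)     # posterior probability of the Poisson, equal model priors
print(np.median(dd), np.mean(dd < 0), np.median(prob))
```

With M = 10,000 the median deviance difference lands near 6 and roughly 1% of the draws favour the geometric, in line with the summaries on the later slides.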
20 Deviance distributions
[Figure: cdfs of the Poisson and geometric posterior deviance distributions.]
21 Deviance difference distribution
[Figure: cdf of the posterior deviance difference distribution.]
22 Posterior Poisson probability distribution
[Figure: cdf of the posterior probability of the Poisson model.]
23 Model preference
Of the 10,000 deviance differences, 96 are negative (geometric deviance smaller than Poisson deviance), a proportion of 0.0096 (simulation SE 0.001). The empirical posterior probability that the Poisson model fits better than the geometric (in likelihood) is therefore 0.990 (SE 0.001). The median deviance difference in favour of the Poisson is 6.01, almost the same as the frequentist deviance difference, and the central 95% credible interval for the true deviance difference is [1.58, 10.64]. The median likelihood ratio (Poisson/geometric) is 20.0, and the 95% credible interval for the likelihood ratio is [2.20, 204.4]. The median posterior probability of the Poisson model, given equal prior probabilities, is 0.952 (very close to the value 0.951 given that the mean is 0.8), and the 95% credible interval for it is [0.647, 0.995].
24 Conclusion
The evidence in favour of the Poisson is quite strong, though not as strong as the ratio of maximized likelihoods suggests, because of the diffuseness of the posterior deviance difference distribution from the small sample.
25 Some simple asymptotics
For regular models $f(y\mid\theta)$ with flat priors, giving an MLE $\hat\theta$ internal to the parameter space, the second-order Taylor expansion of the deviance $-2\log L(\theta) = -2l(\theta)$ about $\hat\theta$ gives
$-2l(\theta) \doteq -2l(\hat\theta) - 2(\theta-\hat\theta)' l'(\hat\theta) - (\theta-\hat\theta)' l''(\hat\theta)(\theta-\hat\theta) = -2l(\hat\theta) + (\theta-\hat\theta)' I(\hat\theta)(\theta-\hat\theta)$,
since $l'(\hat\theta) = 0$, so that
$L(\theta) \doteq c\,\exp[-(\theta-\hat\theta)' I(\hat\theta)(\theta-\hat\theta)/2]$,
$\pi(\theta \mid y) \doteq c'\,\exp[-(\theta-\hat\theta)' I(\hat\theta)(\theta-\hat\theta)/2]$.
26 Asymptotic distributions
So asymptotically, given the data y, we have the posterior distributions
$\theta \sim N(\hat\theta, I(\hat\theta)^{-1})$,
$(\theta-\hat\theta)' I(\hat\theta)(\theta-\hat\theta) \sim \chi^2_p$,
$D(\theta) \sim D(\hat\theta) + \chi^2_p$,
$L(\theta) \sim L(\hat\theta)\exp(-\chi^2_p/2)$.
D(θ) and L(θ) are (approximately) pivotal functions: they have the same distributions (for a flat prior on θ) for frequentists and Bayesians. The likelihood L(θ) has a scaled $\exp(-\chi^2_p/2)$ distribution, and the deviance $D(\theta) = -2\log L(\theta)$ has a shifted $\chi^2_p$ distribution, shifted by the frequentist deviance $D(\hat\theta)$, where p is the dimension of θ.
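The shifted-$\chi^2_p$ result is easy to check by simulation; a sketch for the Poisson model in the Cox example (flat prior, p = 1), where the shifted deviance draws should behave approximately like $\chi^2_1$, whose median is about 0.455:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n, F, M = 26, 30, 384, 100_000

theta1 = rng.gamma(T + 1, 1.0 / n, size=M)   # flat-prior posterior of the Poisson mean
dev = 2 * (n * theta1 - T * np.log(theta1) + np.log(F))

theta_hat = T / n                             # MLE minimizes the deviance
dev_hat = 2 * (n * theta_hat - T * np.log(theta_hat) + np.log(F))

shifted = dev - dev_hat                       # approximately chi^2_1 for large n
print(np.median(shifted))
```

Since $\hat\theta$ minimizes the deviance, the shifted draws are non-negative by construction; only their distribution, not their sign, is asymptotic.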
27 Cox example
We extend the previous figure of the two deviance distributions with the corresponding asymptotic distributions: the asymptotic Poisson deviance distribution $D_P(\theta_1) \sim D_P(\hat\theta_1) + \chi^2_1 = 71.34 + \chi^2_1$, and the asymptotic geometric deviance distribution $D_G(\theta_2) \sim D_G(\hat\theta_2) + \chi^2_1 = 77.35 + \chi^2_1$. The empirical distributions are shown as solid curves, the asymptotic distributions as dashed curves. The agreement is very close for the Poisson, and slightly worse for the geometric, whose likelihood is more skewed.
28 Empirical and asymptotic deviance distributions
[Figure: empirical (solid) and asymptotic (dashed) cdfs of the two deviance distributions.]
29 Model validation: the saturated model
So the evidence points strongly to the Poisson, if the Poisson and geometric are the only candidates. But what about other models? From a ML point of view, the "saturated" multinomial would always fit better! We can easily extend the model comparison to three models, including the multinomial. The multinomial likelihood and deviance, for counts $n_j$ at observed values $y_j$ with probabilities $p_j$, are
$L_M(\{p_j\}) = \prod_j p_j^{n_j}$, $D_M(\{p_j\}) = -2\sum_j n_j \log p_j$.
30 Dirichlet prior and posterior
We use the conjugate Dirichlet prior
$\pi(\{p_j\}) = \frac{\Gamma(\sum_j a_j)}{\prod_j \Gamma(a_j)} \prod_j p_j^{a_j - 1}$,
giving the Dirichlet posterior
$\pi(\{p_j\} \mid \{n_j\}) = \frac{\Gamma[\sum_j (a_j + n_j)]}{\prod_j \Gamma(a_j + n_j)} \prod_j p_j^{a_j + n_j - 1}$.
For a non-informative analysis we take $a_j = 0\ \forall j$, giving the posterior
$\pi(\{p_j\} \mid \{n_j\}) = \frac{\Gamma(\sum_j n_j)}{\prod_j \Gamma(n_j)} \prod_j p_j^{n_j - 1}$.
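Draws from this Dirichlet($n_j$) posterior, and the resulting multinomial deviance draws, take only a few lines; a sketch with illustrative counts (the vector below is my assumption, chosen to be consistent with n = 30, not the original Cox table):

```python
import numpy as np

rng = np.random.default_rng(0)
counts = np.array([14, 10, 3, 2, 1])   # illustrative frequencies, NOT the original data
M = 10_000

# With a_j = 0 the posterior is Dirichlet(n_j): draw p and form deviance draws
p = rng.dirichlet(counts, size=M)                       # shape (M, 5)
dM = -2 * np.sum(counts * np.log(p), axis=1)            # multinomial deviance draws

# Frequentist (minimum) deviance at p_hat_j = n_j / n for comparison
dM_hat = -2 * np.sum(counts * np.log(counts / counts.sum()))
print(round(dM_hat, 2), round(np.median(dM) - dM_hat, 2))
```

The median shift above the frequentist deviance should be roughly the $\chi^2_4$ median (about 3.4), though the single count of 1 makes the draws noticeably skewed, as the next slides discuss.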
31 Deviance draws
We make M draws $p_j^{[m]}$ from the posterior, substitute them in the multinomial deviance to give M multinomial deviance draws $D_M^{[m]} = D_M(\{p_j^{[m]}\})$, order these, and plot their empirical and asymptotic cdfs with those for the Poisson and geometric models.
32 Poisson, geometric and multinomial deviances
[Figure: cdfs of the three posterior deviance distributions.]
33 Model comparisons
Three major points. First, the agreement between empirical and asymptotic cdfs is not as close for the multinomial as for the parametric models: the heavier parametrization requires a larger sample size for asymptotic behaviour, and the sample count of 1 in the last category gives a highly skewed posterior in this parameter. Second, of the geometric-multinomial deviance differences, 605 are negative, an empirical proportion of 0.0605: strong evidence against the geometric. Third, of the Poisson-multinomial deviance differences, 6154 are negative, an empirical proportion of 0.6154: we cannot choose clearly between the Poisson and the multinomial; there is no convincing preference for one over the other.
34 Goodness of fit
From a goodness-of-fit point of view, this tells us that we can use the Poisson as an adequate representation of the data: the "always true" multinomial is not convincingly better than the Poisson. (This is as we would hope, since the data were generated from a Poisson.) Proponents of the Bayes factor may dislike this approach: many want a guarantee that as $n \to \infty$ the true distribution is identified with probability 1. The deviance distribution approach gives a different conclusion: that the multinomial (which is always true) and the competing Poisson are equally plausible, or not very differently plausible. This is enough for us to conclude that the Poisson is an adequate representation.
35 Galaxy example
The galaxy recession velocity study: mixtures of normals. The data are the recession velocities of 82 galaxies from 6 well-separated sections of the Corona Borealis region. Do these velocities "clump" into groups or clusters, or does the velocity density increase initially and then gradually tail off? This has implications for theories of the evolution of the universe. We investigate by fitting mixtures of normal distributions to the velocity data; the number of mixture components necessary to represent the data is the parameter of particular interest.
36 Recession velocities (/1000) of 82 galaxies
[Figure: the 82 galaxy recession velocities, divided by 1000.]
37 Mixture of normals model
The general model for a K-component normal mixture has different means $\mu_k$ and variances $\sigma_k^2$ in each component:
$f(y) = \sum_{k=1}^{K} \pi_k f(y \mid \mu_k, \sigma_k)$,
where
$f(y \mid \mu_k, \sigma_k) = \frac{1}{\sqrt{2\pi}\,\sigma_k} \exp\left\{-\frac{1}{2\sigma_k^2}(y-\mu_k)^2\right\}$
and the $\pi_k$ are positive with $\sum_{k=1}^{K} \pi_k = 1$.
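As a minimal sketch, the mixture density can be evaluated directly from this formula and checked to integrate to 1; the three component parameters below are illustrative, not fitted galaxy values:

```python
import numpy as np

def mixture_pdf(y, weights, means, sds):
    """Density of a K-component normal mixture: f(y) = sum_k pi_k N(y; mu_k, sd_k^2)."""
    y = np.asarray(y)[..., None]                       # broadcast y against components
    comp = np.exp(-0.5 * ((y - means) / sds) ** 2) / (np.sqrt(2 * np.pi) * sds)
    return comp @ weights

# Illustrative 3-component parameters (assumed, not the galaxy estimates)
w = np.array([0.1, 0.7, 0.2])
mu = np.array([9.7, 21.4, 33.0])
sd = np.array([0.5, 2.2, 1.0])

grid = np.linspace(0.0, 45.0, 4001)
dens = mixture_pdf(grid, w, mu, sd)
total = dens.sum() * (grid[1] - grid[0])               # Riemann approximation of the integral
print(round(total, 3))                                 # → 1.0
```

The same density function, maximized over the parameters for each K, underlies the 1-7 component comparison on the next slide.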
38 1-7 components
[Figure: cdfs of the posterior deviance distributions for normal mixtures with 1 to 7 components.]
39 Changepoint in the volumes of Nile floods
[Figure: Nile flood volume plotted against year.]
40 Where is the changepoint? t-statistics
We model the volume in year i as normal $N(\mu_1, \sigma^2)$ for $i \le i_c$, and normal $N(\mu_2, \sigma^2)$ for $i > i_c$. For each i between 2 and 99 we could compute the two-sample t-statistic, to try to identify the changepoint.
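A sketch of this scan over candidate changepoints, run on synthetic data with a known mean shift (the Nile series itself is not reproduced here, so the series and shift location below are assumptions):

```python
import numpy as np

def t_statistics(y):
    """Pooled-variance two-sample t-statistic for every split y[:i] vs y[i:]."""
    n = len(y)
    stats = np.full(n, np.nan)
    for i in range(2, n - 1):                  # need at least 2 observations each side
        a, b = y[:i], y[i:]
        s2 = ((a.var(ddof=1) * (len(a) - 1) + b.var(ddof=1) * (len(b) - 1))
              / (len(a) + len(b) - 2))         # pooled variance estimate
        stats[i] = (a.mean() - b.mean()) / np.sqrt(s2 * (1 / len(a) + 1 / len(b)))
    return stats

# Synthetic 100-point series with a drop in mean after index 28 (illustrative)
rng = np.random.default_rng(2)
y = np.concatenate([rng.normal(11.0, 1.2, 28), rng.normal(8.5, 1.2, 72)])
t = t_statistics(y)
print(int(np.nanargmax(t)))                    # index of the largest t-statistic
```

With a shift of about two standard deviations, the maximum t-statistic lands at or very near the true split.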
41 t-statistics
[Figure: two-sample t-statistic plotted against year.]
42 Inconclusive?
The maximum t-statistic picks out a single year. But how confident are we of the maximum t determining the changepoint? If we could rely on the maximized likelihoods to define the evidence for each i, we could compute the posterior probabilities of the changepoint at each time i:
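For the normal model with common $\sigma^2$, the maximized likelihood of the split at i is proportional to $\mathrm{RSS}_i^{-n/2}$, so these changepoint probabilities are a normalized transform of the residual sums of squares; a sketch on synthetic data (not the Nile series, which is not reproduced here):

```python
import numpy as np

def changepoint_probs(y):
    """Probabilities proportional to the maximized likelihood Lhat_i of each split.

    With separate means before/after i and common variance,
    Lhat_i is proportional to RSS_i^(-n/2).
    """
    n = len(y)
    loglik = np.full(n, -np.inf)
    for i in range(2, n - 1):
        a, b = y[:i], y[i:]
        rss = ((a - a.mean()) ** 2).sum() + ((b - b.mean()) ** 2).sum()
        loglik[i] = -0.5 * n * np.log(rss / n)
    w = np.exp(loglik - loglik.max())          # -inf entries map safely to 0
    return w / w.sum()

# Synthetic 100-point series with a drop in mean after index 28 (illustrative)
rng = np.random.default_rng(3)
y = np.concatenate([rng.normal(11.0, 1.2, 28), rng.normal(8.5, 1.2, 72)])
probs = changepoint_probs(y)
print(int(np.argmax(probs)))
```

The spread of `probs` across neighbouring years, rather than the location of its maximum alone, is what the posterior deviance distributions on the next slides quantify properly.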
43 Posterior changepoint probabilities
[Figure: posterior probability of a changepoint plotted against year.]
44 Posterior deviance distributions
[Figure: cdfs of the posterior deviance distributions for the candidate changepoints.]
45 Reduce complexity
Only the 7 best changepoints matter: there is no overlap between the best and the 8th best, so we can ignore the others.
46 7 best posterior deviance distributions
[Figure: cdfs of the posterior deviance distributions for the 7 best changepoints.]
47 Thank you all for coming!
Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning
More informationf(x θ)dx with respect to θ. Assuming certain smoothness conditions concern differentiating under the integral the integral sign, we first obtain
0.1. INTRODUCTION 1 0.1 Introduction R. A. Fisher, a pioneer in the development of mathematical statistics, introduced a measure of the amount of information contained in an observaton from f(x θ). Fisher
More informationCOS513 LECTURE 8 STATISTICAL CONCEPTS
COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions
More informationBayesian inference. Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark. April 10, 2017
Bayesian inference Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark April 10, 2017 1 / 22 Outline for today A genetic example Bayes theorem Examples Priors Posterior summaries
More informationST495: Survival Analysis: Hypothesis testing and confidence intervals
ST495: Survival Analysis: Hypothesis testing and confidence intervals Eric B. Laber Department of Statistics, North Carolina State University April 3, 2014 I remember that one fateful day when Coach took
More informationSubject CS1 Actuarial Statistics 1 Core Principles
Institute of Actuaries of India Subject CS1 Actuarial Statistics 1 Core Principles For 2019 Examinations Aim The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and
More informationThe outline for Unit 3
The outline for Unit 3 Unit 1. Introduction: The regression model. Unit 2. Estimation principles. Unit 3: Hypothesis testing principles. 3.1 Wald test. 3.2 Lagrange Multiplier. 3.3 Likelihood Ratio Test.
More informationDensity Estimation. Seungjin Choi
Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/
More informationParametric Techniques Lecture 3
Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to
More informationProbabilistic Graphical Models
Parameter Estimation December 14, 2015 Overview 1 Motivation 2 3 4 What did we have so far? 1 Representations: how do we model the problem? (directed/undirected). 2 Inference: given a model and partially
More informationNon-Parametric Bayes
Non-Parametric Bayes Mark Schmidt UBC Machine Learning Reading Group January 2016 Current Hot Topics in Machine Learning Bayesian learning includes: Gaussian processes. Approximate inference. Bayesian
More informationApproximate Likelihoods
Approximate Likelihoods Nancy Reid July 28, 2015 Why likelihood? makes probability modelling central l(θ; y) = log f (y; θ) emphasizes the inverse problem of reasoning y θ converts a prior probability
More informationFrequentist-Bayesian Model Comparisons: A Simple Example
Frequentist-Bayesian Model Comparisons: A Simple Example Consider data that consist of a signal y with additive noise: Data vector (N elements): D = y + n The additive noise n has zero mean and diagonal
More informationBayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units
Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Sahar Z Zangeneh Robert W. Keener Roderick J.A. Little Abstract In Probability proportional
More informationStatistical Inference: Estimation and Confidence Intervals Hypothesis Testing
Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing 1 In most statistics problems, we assume that the data have been generated from some unknown probability distribution. We desire
More informationLecture 21: October 19
36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 21: October 19 21.1 Likelihood Ratio Test (LRT) To test composite versus composite hypotheses the general method is to use
More informationNancy Reid SS 6002A Office Hours by appointment
Nancy Reid SS 6002A reid@utstat.utoronto.ca Office Hours by appointment Problems assigned weekly, due the following week http://www.utstat.toronto.edu/reid/4508s16.html Various types of likelihood 1. likelihood,
More informationHPD Intervals / Regions
HPD Intervals / Regions The HPD region will be an interval when the posterior is unimodal. If the posterior is multimodal, the HPD region might be a discontiguous set. Picture: The set {θ : θ (1.5, 3.9)
More informationLearning Bayesian network : Given structure and completely observed data
Learning Bayesian network : Given structure and completely observed data Probabilistic Graphical Models Sharif University of Technology Spring 2017 Soleymani Learning problem Target: true distribution
More informationIntroduction to Bayesian Inference
Introduction to Bayesian Inference Tom Loredo Dept. of Astronomy, Cornell University http://www.astro.cornell.edu/staff/loredo/bayes/ June 10, 2006 Outline 1 The Big Picture 2 Foundations Axioms, Theorems
More informationDefault priors and model parametrization
1 / 16 Default priors and model parametrization Nancy Reid O-Bayes09, June 6, 2009 Don Fraser, Elisabeta Marras, Grace Yun-Yi 2 / 16 Well-calibrated priors model f (y; θ), F(y; θ); log-likelihood l(θ)
More informationHeriot-Watt University
Heriot-Watt University Heriot-Watt University Research Gateway Prediction of settlement delay in critical illness insurance claims by using the generalized beta of the second kind distribution Dodd, Erengul;
More informationSTAT 135 Lab 5 Bootstrapping and Hypothesis Testing
STAT 135 Lab 5 Bootstrapping and Hypothesis Testing Rebecca Barter March 2, 2015 The Bootstrap Bootstrap Suppose that we are interested in estimating a parameter θ from some population with members x 1,...,
More informationIntroduction to Bayesian Statistics with WinBUGS Part 4 Priors and Hierarchical Models
Introduction to Bayesian Statistics with WinBUGS Part 4 Priors and Hierarchical Models Matthew S. Johnson New York ASA Chapter Workshop CUNY Graduate Center New York, NY hspace1in December 17, 2009 December
More informationMinimum Message Length Analysis of the Behrens Fisher Problem
Analysis of the Behrens Fisher Problem Enes Makalic and Daniel F Schmidt Centre for MEGA Epidemiology The University of Melbourne Solomonoff 85th Memorial Conference, 2011 Outline Introduction 1 Introduction
More informationIntroduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf
1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Lior Wolf 2014-15 We know that X ~ B(n,p), but we do not know p. We get a random sample from X, a
More informationBayesian Statistical Methods. Jeff Gill. Department of Political Science, University of Florida
Bayesian Statistical Methods Jeff Gill Department of Political Science, University of Florida 234 Anderson Hall, PO Box 117325, Gainesville, FL 32611-7325 Voice: 352-392-0262x272, Fax: 352-392-8127, Email:
More informationLast week. posterior marginal density. exact conditional density. LTCC Likelihood Theory Week 3 November 19, /36
Last week Nuisance parameters f (y; ψ, λ), l(ψ, λ) posterior marginal density π m (ψ) =. c (2π) q el P(ψ) l P ( ˆψ) j P ( ˆψ) 1/2 π(ψ, ˆλ ψ ) j λλ ( ˆψ, ˆλ) 1/2 π( ˆψ, ˆλ) j λλ (ψ, ˆλ ψ ) 1/2 l p (ψ) =
More informationsimple if it completely specifies the density of x
3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely
More informationPart III. A Decision-Theoretic Approach and Bayesian testing
Part III A Decision-Theoretic Approach and Bayesian testing 1 Chapter 10 Bayesian Inference as a Decision Problem The decision-theoretic framework starts with the following situation. We would like to
More informationPost-exam 2 practice questions 18.05, Spring 2014
Post-exam 2 practice questions 18.05, Spring 2014 Note: This is a set of practice problems for the material that came after exam 2. In preparing for the final you should use the previous review materials,
More informationChapter 4 HOMEWORK ASSIGNMENTS. 4.1 Homework #1
Chapter 4 HOMEWORK ASSIGNMENTS These homeworks may be modified as the semester progresses. It is your responsibility to keep up to date with the correctly assigned homeworks. There may be some errors in
More informationParameter estimation! and! forecasting! Cristiano Porciani! AIfA, Uni-Bonn!
Parameter estimation! and! forecasting! Cristiano Porciani! AIfA, Uni-Bonn! Questions?! C. Porciani! Estimation & forecasting! 2! Cosmological parameters! A branch of modern cosmological research focuses
More informationHypothesis testing: theory and methods
Statistical Methods Warsaw School of Economics November 3, 2017 Statistical hypothesis is the name of any conjecture about unknown parameters of a population distribution. The hypothesis should be verifiable
More informationStatistical Tools and Techniques for Solar Astronomers
Statistical Tools and Techniques for Solar Astronomers Alexander W Blocker Nathan Stein SolarStat 2012 Outline Outline 1 Introduction & Objectives 2 Statistical issues with astronomical data 3 Example:
More informationOther Noninformative Priors
Other Noninformative Priors Other methods for noninformative priors include Bernardo s reference prior, which seeks a prior that will maximize the discrepancy between the prior and the posterior and minimize
More informationStatistical Theory MT 2007 Problems 4: Solution sketches
Statistical Theory MT 007 Problems 4: Solution sketches 1. Consider a 1-parameter exponential family model with density f(x θ) = f(x)g(θ)exp{cφ(θ)h(x)}, x X. Suppose that the prior distribution has the
More informationIntroduction into Bayesian statistics
Introduction into Bayesian statistics Maxim Kochurov EF MSU November 15, 2016 Maxim Kochurov Introduction into Bayesian statistics EF MSU 1 / 7 Content 1 Framework Notations 2 Difference Bayesians vs Frequentists
More information