Deciding, Estimating, Computing, Checking

Deciding, Estimating, Computing, Checking
How are Bayesian posteriors used, computed and validated?

Fundamentalist Bayes: the posterior is ALL the knowledge you have about the state.
Use in decision making: take the action that maximizes your expected utility. This requires knowing the cost of deciding that the state is A when it is actually B (engaging a target as Bomber when it is Civilian, as Civilian when it is Bomber, or waiting for more data).
Estimation: the cost of deciding that the state is λ' when it is actually λ.

Maximum expected utility decision. Estimating the state.
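To make the decision rule concrete, here is a minimal MATLAB-style sketch of picking the minimum-expected-loss (maximum expected utility) action from a posterior over two states; the posterior values and the loss matrix are hypothetical illustrations, not numbers from the slides.

% Posterior over the two states {Bomber, Civilian} given the data
post = [0.3 0.7];
% loss(a,s): cost of taking action a when the true state is s
% actions (rows): engage, hold fire, wait for more data; states (columns): Bomber, Civilian
loss = [   0  100;      % engage:    right if Bomber, disastrous if Civilian
        1000    0;      % hold fire: disastrous if Bomber, right if Civilian
          10   10];     % wait:      small cost either way
expLoss = loss * post';           % expected loss of each action under the posterior
[~, best] = min(expLoss);         % index of the minimum-expected-loss action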

Loss functions (HW2)
Mean (squared-error loss): easy to compute, necessary for estimating probabilities, sensitive to outliers.
Median (absolute-error loss): robust and scale-invariant, but only applicable in 1D.
Mode (0-1 loss, Maximum A Posteriori): necessary for a discrete unordered state space, very non-robust otherwise.
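As a small illustration of how the three estimators pair with the three loss functions, this hedged MATLAB sketch computes them from a vector of stand-in posterior samples; the sample generator and the histogram-based mode estimate are assumptions made for illustration.

s = randn(10000,1)*2 + 5;    % stand-in for samples from a posterior over a scalar state
postMean   = mean(s);        % minimizes expected squared-error loss
postMedian = median(s);      % minimizes expected absolute-error loss
[cnt, ctr] = hist(s, 50);    % crude histogram-based mode (MAP) estimate
[~, i] = max(cnt);
postMode = ctr(i);           % minimizes 0-1 loss (up to the discretization)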

Computing Posteriors
Finite state space: easy.
Discretized state space: easy: post = prior.*likelihood; post = post/sum(post)
Analytical, with a prior conjugate w.r.t. the likelihood: easy.
High-dimensional state space (e.g. a 3D image): difficult; use MCMC.

Conjugate families
Normal prior N(mu, s2) and normal likelihood N(mu', s2'). Then the posterior is normal N(mup, s2p), where
(x - mu)^2/s2 + (x - mu')^2/s2' = (x - mup)^2/s2p + c,
i.e. 1/s2 + 1/s2' = 1/s2p and mu/s2 + mu'/s2' = mup/s2p.
The case of unknown variances is more difficult.
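A minimal numerical sketch of the grid recipe and of the normal-normal update above; the grid, the prior, and the observation values are hypothetical.

theta = linspace(0, 1, 101);                    % discretized state space
prior = ones(size(theta)) / numel(theta);       % flat prior on the grid
lik   = theta.^7 .* (1-theta).^3;               % e.g. Bernoulli likelihood, 7 successes in 10 trials
post  = prior .* lik;  post = post / sum(post); % the slide's grid recipe

mu  = 0;   s2  = 4;                  % normal prior N(mu, s2)
mu1 = 1.7; s21 = 1;                  % normal likelihood N(mu', s2'), written mu1, s21 here
s2p = 1 / (1/s2 + 1/s21);            % 1/s2 + 1/s2' = 1/s2p
mup = s2p * (mu/s2 + mu1/s21);       % mu/s2 + mu'/s2' = mup/s2p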

Conjugate families
Beta is conjugate w.r.t. Bernoulli trials.
Dirichlet is conjugate w.r.t. the discrete (categorical) distribution.
Wishart is conjugate w.r.t. the multivariate normal (as a prior on the precision matrix).
There is a fairly complete table in the Wikipedia article on conjugate distributions.
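For the first entry in the table, a minimal Beta-Bernoulli sketch (the prior and the counts are hypothetical): a Beta(a, b) prior and k successes in n Bernoulli trials give a Beta(a + k, b + n - k) posterior.

a = 1; b = 1;                    % uniform prior Beta(1,1)
n = 20; k = 14;                  % observed: 14 successes in 20 trials
ap = a + k;  bp = b + n - k;     % posterior is Beta(ap, bp)
postMeanP = ap / (ap + bp);      % posterior mean of the success probability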

Markov Chain Monte Carlo

MCMC and mixing. (Figure: chains for the target π run with a small, a well-chosen, and a large proposal width. A too-small proposal moves through the target very slowly, a too-large one is rejected most of the time; both mix poorly.)
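The following random-walk Metropolis sketch illustrates the mixing behaviour described above; the standard-normal target and the proposal widths are assumptions chosen for illustration, not taken from the slides.

logTarget = @(x) -0.5*x.^2;            % log of the target density pi, up to a constant
propStd = 0.5;                         % try 0.01 (too small), 0.5 (good), 50 (too large)
N = 5000;  x = zeros(N,1);  acc = 0;
for t = 2:N
    xprop = x(t-1) + propStd*randn;    % symmetric random-walk proposal
    if log(rand) < logTarget(xprop) - logTarget(x(t-1))
        x(t) = xprop;  acc = acc + 1;  % accept the proposal
    else
        x(t) = x(t-1);                 % reject and stay put
    end
end
acceptanceRate = acc / (N-1);          % rates near 0 or near 1 both indicate poor mixing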

Testing and Cournot's Principle
Standard Bayesian analysis does not reject a model: it selects the best of those considered.
Cournot's principle: an event with small probability will not happen.
Assume a model M for an experiment and a low-probability event R in the result data. Perform the experiment. If R happened, something was wrong, and the assumed model M is the obvious suspect. Thus the assumption that M was right is rejected.

Test statistic
Define the model to test, the null hypothesis H.
Define a real-valued function t(d) on the data space and find the distribution of t(D) induced by H.
Define a rejection region R such that P(t(D) ∈ R) is low (1% or 5%). R is typically the tails of the distribution, t(d) < l or t(d) > u, where [l, u] is a high-probability interval.
If t(d) falls in the rejection region, the null hypothesis H has been rejected at significance level P(t(D) ∈ R) (1% or 5%).
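A minimal sketch of this recipe for one concrete (hypothetical) case: H says the data are i.i.d. N(0,1), the test statistic is the sample mean, and the rejection region is the 5% tails of its null distribution.

n = 25;  d = randn(n,1) + 0.3;    % hypothetical data, true mean 0.3
t = mean(d);                      % test statistic t(d)
u = 1.96 / sqrt(n);               % under H, t(D) ~ N(0, 1/n); [-u, u] has probability 0.95
rejected = abs(t) > u;            % reject H at the 5% significance level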

Kolmogorov-Smirnov test
Is a sample from a given distribution? The test statistic d is the maximum deviation of the empirical cumulative distribution function from the theoretical one. If d*sqrt(n) > 2.5, the sample is (probably) not from the target distribution.
Example session (KS here is not a MATLAB built-in):
>> rn = randn(10,1);
>> jj = [1:10];
>> jj = jj/10;
>> KS(sort(rn), rnn)
ans =
    1.4142
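Since the KS helper used in the session above is not shown, here is a self-contained sketch of the one-sample KS statistic against the standard normal; it implements the definition directly and is not a reconstruction of the course helper.

n  = 10;  rn = sort(randn(n,1));       % sorted sample
F  = 0.5*(1 + erf(rn/sqrt(2)));        % standard normal CDF at the sample points
dPlus  = max((1:n)'/n - F);            % empirical CDF exceeding the theoretical one
dMinus = max(F - (0:n-1)'/n);          % and falling below it
d  = max(dPlus, dMinus);               % KS statistic
dScaled = d * sqrt(n);                 % compare with the ~2.5 rule of thumb above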

Combining Bayesian and frequentist inference: sample the parameter from its posterior, then generate a testing (replicate) data set from it (Gelman et al., 2003).

Graphical posterior predictive model checking takes first place in the authoritative book. In the first example, the left column is a 0-1 coding of six subjects' responses (rows) to stimuli (columns) in a logistic regression; the right six columns are replications generated using the posterior and the likelihood. There is clear microstructure in the left column that is not present in the right ones, so the fitting appears to have been done with an inappropriate (invalid) model.
In the second example, the cumulative counts of real coal-mining disasters (lower red curve) are compared with 100 scenarios of the same number of simulated disasters occurring completely at random: the real data cannot reasonably have been produced by a constant-intensity process.
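A hedged sketch of the coal-mining style check: compare a discrepancy statistic for the real record with the same statistic for replicates generated under a constant-intensity (homogeneous Poisson) assumption. The record length, the disaster count, and the stand-in "real" data below are placeholders, not the actual series.

T = 112;                                   % length of the record in years (placeholder)
realTimes = sort(rand(191,1).^2 * T);      % placeholder for the real disaster times
n = numel(realTimes);
discrep = @(t) max(abs((1:n)'/n - t/T));   % deviation of cumulative counts from constant-rate growth
dReal = discrep(realTimes);
dRep  = zeros(100,1);
for r = 1:100
    dRep(r) = discrep(sort(rand(n,1)*T));  % constant-intensity replicate
end
ppp = mean(dRep >= dReal);                 % tiny value: real data incompatible with constant intensity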

The useful concept of the p-value: multiple testing
The probability of rejecting a true null hypothesis at the 99% level is 1%. Thus, if you repeat the test 100 times, each time with new data, you will reject at least once with probability 1 - 0.99^100 ≈ 0.63.
Bonferroni correction (FWE control): in order to reach a family-wise significance level of 1% in an experiment involving 1000 tests, each individual test should be checked at significance level 1/1000 % (i.e. 0.01/1000 = 10^-5).
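The two numbers above can be checked directly:

pAtLeastOne = 1 - 0.99^100;    % probability of at least one false rejection, about 0.63
alphaBonf   = 0.01 / 1000;     % per-test level for 1% family-wise error over 1000 tests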

Fiducial Inference
R. A. Fisher (1890-1962). In his paper "Inverse Probability" he rejected Bayesian analysis on the grounds of its dependence on priors and scaling, and launched an alternative concept, 'fiducial analysis'. Although the concept was not much developed after Fisher's time, the standard definition of confidence intervals has a similar flavor. The fiducial argument was apparently the starting point for Dempster in developing evidence theory.
Fiducial inference is fairly undeveloped, and also controversial. It is similar in idea to Neyman's confidence interval, which is used a lot despite philosophical problems and a lack of general understanding. The objective is to find a region in which a distribution's parameters lie, with confidence c. The region is given by an algorithm: if the stated probabilistic assumptions hold, the region contains the parameters with probability c. However, this statement holds before the data have been seen, and the estimator is not a sufficient statistic. Somewhat scruffy.

Hedged prediction scheme (Vovk/Gammerman)
Given a sequence z1 = (x1, y1), z2 = (x2, y2), ..., zn = (xn, yn) and a new x(n+1), predict y(n+1).
The xi are typically (high-dimensional) feature vectors; the yi are discrete (classification) or real (regression).
Either predict y(n+1) ∈ Y with (say) 95% confidence, or predict y(n+1) precisely and state the confidence (classification only).
The prediction is the y(n+1) that gives the sequence maximum randomness, using a computable approximation to Kolmogorov randomness. The scheme can be based on the SVM method; a minimal sketch with a simpler nonconformity score follows below.
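A minimal sketch of a conformal (hedged) prediction p-value for one candidate label, using a simple 1-nearest-neighbour nonconformity score as a stand-in for the SVM-based scores mentioned above; the data and the candidate label are hypothetical.

X = [randn(20,2); randn(20,2)+3];  y = [zeros(20,1); ones(20,1)];   % training sequence
xnew = [1.5 1.5];  ycand = 1;                  % new object x(n+1) and a candidate label
Xa = [X; xnew];  ya = [y; ycand];              % augmented sequence z1 ... z(n+1)
n1 = size(Xa,1);  nonconf = zeros(n1,1);
for i = 1:n1
    d = sqrt(sum((Xa - repmat(Xa(i,:), n1, 1)).^2, 2));
    d(i) = inf;                                % exclude the example itself
    dSame = min(d(ya == ya(i)));               % nearest example with the same label
    dDiff = min(d(ya ~= ya(i)));               % nearest example with a different label
    nonconf(i) = dSame / dDiff;                % large value: example looks strange for its label
end
pval = mean(nonconf >= nonconf(n1));           % p-value for the candidate label; small = reject it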