Deciding, Estimating, Computing, Checking

Deciding, Estimating, Computing, Checking
How are Bayesian posteriors used, computed and validated?

Fundamentalist Bayes: the posterior is ALL the knowledge you have about the state.
Use in decision making: take the action that maximizes your expected utility. This requires knowing the cost of deciding that the state is A when it is actually B (engaging a target as Bomber when it is Civilian, as Civilian when it is Bomber, or waiting for more data).
Estimation: the cost of deciding that the state is λ' when it is actually λ.

Maximum expected utility decision. Estimating the state.
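To make the decision rule concrete, here is a minimal MATLAB-style sketch of picking the minimum-expected-loss (maximum expected utility) action from a posterior over two states; the posterior values and the loss matrix are hypothetical illustrations, not numbers from the slides.

% Posterior over the two states {Bomber, Civilian} given the data
post = [0.3 0.7];
% loss(a,s): cost of taking action a when the true state is s
% actions (rows): engage, hold fire, wait for more data; states (columns): Bomber, Civilian
loss = [   0  100;      % engage:    right if Bomber, disastrous if Civilian
        1000    0;      % hold fire: disastrous if Bomber, right if Civilian
          10   10];     % wait:      small cost either way
expLoss = loss * post';           % expected loss of each action under the posterior
[~, best] = min(expLoss);         % index of the minimum-expected-loss action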

Loss functions (HW2)
Mean (squared-error loss): easy to compute, necessary for estimating probabilities, sensitive to outliers.
Median (absolute-error loss): robust and scale-invariant, but only applicable in 1D.
Mode (0-1 loss, Maximum A Posteriori): necessary for a discrete unordered state space, very non-robust otherwise.
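As a small illustration of how the three estimators pair with the three loss functions, this hedged MATLAB sketch computes them from a vector of stand-in posterior samples; the sample generator and the histogram-based mode estimate are assumptions made for illustration.

s = randn(10000,1)*2 + 5;    % stand-in for samples from a posterior over a scalar state
postMean   = mean(s);        % minimizes expected squared-error loss
postMedian = median(s);      % minimizes expected absolute-error loss
[cnt, ctr] = hist(s, 50);    % crude histogram-based mode (MAP) estimate
[~, i] = max(cnt);
postMode = ctr(i);           % minimizes 0-1 loss (up to the discretization)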

Computing Posteriors
Finite state space: easy.
Discretized state space: easy: post = prior.*likelihood; post = post/sum(post)
Analytical, with a prior conjugate w.r.t. the likelihood: easy.
High-dimensional state space (e.g. a 3D image): difficult; use MCMC.

Conjugate families
Normal prior N(mu, s2) and normal likelihood N(mu', s2'). Then the posterior is normal N(mup, s2p), where
(x - mu)^2/s2 + (x - mu')^2/s2' = (x - mup)^2/s2p + c,
i.e. 1/s2 + 1/s2' = 1/s2p and mu/s2 + mu'/s2' = mup/s2p.
The case of unknown variances is more difficult.
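A minimal numerical sketch of the grid recipe and of the normal-normal update above; the grid, the prior, and the observation values are hypothetical.

theta = linspace(0, 1, 101);                    % discretized state space
prior = ones(size(theta)) / numel(theta);       % flat prior on the grid
lik   = theta.^7 .* (1-theta).^3;               % e.g. Bernoulli likelihood, 7 successes in 10 trials
post  = prior .* lik;  post = post / sum(post); % the slide's grid recipe

mu  = 0;   s2  = 4;                  % normal prior N(mu, s2)
mu1 = 1.7; s21 = 1;                  % normal likelihood N(mu', s2'), written mu1, s21 here
s2p = 1 / (1/s2 + 1/s21);            % 1/s2 + 1/s2' = 1/s2p
mup = s2p * (mu/s2 + mu1/s21);       % mu/s2 + mu'/s2' = mup/s2p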

Conjugate families
Beta is conjugate w.r.t. Bernoulli trials.
Dirichlet is conjugate w.r.t. the discrete (categorical) distribution.
Wishart is conjugate w.r.t. the multivariate normal (as a prior on the precision matrix).
There is a fairly complete table in the Wikipedia article on conjugate distributions.
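For the first entry in the table, a minimal Beta-Bernoulli sketch (the prior and the counts are hypothetical): a Beta(a, b) prior and k successes in n Bernoulli trials give a Beta(a + k, b + n - k) posterior.

a = 1; b = 1;                    % uniform prior Beta(1,1)
n = 20; k = 14;                  % observed: 14 successes in 20 trials
ap = a + k;  bp = b + n - k;     % posterior is Beta(ap, bp)
postMeanP = ap / (ap + bp);      % posterior mean of the success probability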

Markov Chain Monte Carlo

MCMC and mixing. (Figure: chains for the target π run with a small, a well-chosen, and a large proposal width. A too-small proposal moves through the target very slowly, a too-large one is rejected most of the time; both mix poorly.)
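The following random-walk Metropolis sketch illustrates the mixing behaviour described above; the standard-normal target and the proposal widths are assumptions chosen for illustration, not taken from the slides.

logTarget = @(x) -0.5*x.^2;            % log of the target density pi, up to a constant
propStd = 0.5;                         % try 0.01 (too small), 0.5 (good), 50 (too large)
N = 5000;  x = zeros(N,1);  acc = 0;
for t = 2:N
    xprop = x(t-1) + propStd*randn;    % symmetric random-walk proposal
    if log(rand) < logTarget(xprop) - logTarget(x(t-1))
        x(t) = xprop;  acc = acc + 1;  % accept the proposal
    else
        x(t) = x(t-1);                 % reject and stay put
    end
end
acceptanceRate = acc / (N-1);          % rates near 0 or near 1 both indicate poor mixing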

Testing and Cournot's Principle
Standard Bayesian analysis does not reject a model: it selects the best of those considered.
Cournot's principle: an event with small probability will not happen.
Assume a model M for an experiment and a low-probability event R in the result data. Perform the experiment. If R happened, something was wrong, and the assumed model M is the obvious suspect. Thus the assumption that M was right is rejected.

Test statistic
Define the model to test, the null hypothesis H.
Define a real-valued function t(d) on the data space and find the distribution of t(D) induced by H.
Define a rejection region R such that P(t(D) ∈ R) is low (1% or 5%). R is typically the tails of the distribution, t(d) < l or t(d) > u, where [l, u] is a high-probability interval.
If t(d) falls in the rejection region, the null hypothesis H has been rejected at significance level P(t(D) ∈ R) (1% or 5%).
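A minimal sketch of this recipe for one concrete (hypothetical) case: H says the data are i.i.d. N(0,1), the test statistic is the sample mean, and the rejection region is the 5% tails of its null distribution.

n = 25;  d = randn(n,1) + 0.3;    % hypothetical data, true mean 0.3
t = mean(d);                      % test statistic t(d)
u = 1.96 / sqrt(n);               % under H, t(D) ~ N(0, 1/n); [-u, u] has probability 0.95
rejected = abs(t) > u;            % reject H at the 5% significance level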

Kolmogorov-Smirnov test
Is a sample from a given distribution? The test statistic d is the maximum deviation of the empirical cumulative distribution function from the theoretical one. If d*sqrt(n) > 2.5, the sample is (probably) not from the target distribution.
Example session (KS here is not a MATLAB built-in):
>> rn = randn(10,1);
>> jj = [1:10];
>> jj = jj/10;
>> KS(sort(rn), rnn)
ans =
    1.4142
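Since the KS helper used in the session above is not shown, here is a self-contained sketch of the one-sample KS statistic against the standard normal; it implements the definition directly and is not a reconstruction of the course helper.

n  = 10;  rn = sort(randn(n,1));       % sorted sample
F  = 0.5*(1 + erf(rn/sqrt(2)));        % standard normal CDF at the sample points
dPlus  = max((1:n)'/n - F);            % empirical CDF exceeding the theoretical one
dMinus = max(F - (0:n-1)'/n);          % and falling below it
d  = max(dPlus, dMinus);               % KS statistic
dScaled = d * sqrt(n);                 % compare with the ~2.5 rule of thumb above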

Combining Bayesian and frequentist inference: sample the parameter from its posterior, then generate a testing (replicate) data set from it (Gelman et al., 2003).

Graphical posterior predictive model checking takes first place in the authoritative book. In the first example, the left column is a 0-1 coding of six subjects' responses (rows) to stimuli (columns) in a logistic regression; the right six columns are replications generated using the posterior and the likelihood. There is clear microstructure in the left column that is not present in the right ones, so the fitting appears to have been done with an inappropriate (invalid) model.
In the second example, the cumulative counts of real coal-mining disasters (lower red curve) are compared with 100 scenarios of the same number of simulated disasters occurring completely at random: the real data cannot reasonably have been produced by a constant-intensity process.
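A hedged sketch of the coal-mining style check: compare a discrepancy statistic for the real record with the same statistic for replicates generated under a constant-intensity (homogeneous Poisson) assumption. The record length, the disaster count, and the stand-in "real" data below are placeholders, not the actual series.

T = 112;                                   % length of the record in years (placeholder)
realTimes = sort(rand(191,1).^2 * T);      % placeholder for the real disaster times
n = numel(realTimes);
discrep = @(t) max(abs((1:n)'/n - t/T));   % deviation of cumulative counts from constant-rate growth
dReal = discrep(realTimes);
dRep  = zeros(100,1);
for r = 1:100
    dRep(r) = discrep(sort(rand(n,1)*T));  % constant-intensity replicate
end
ppp = mean(dRep >= dReal);                 % tiny value: real data incompatible with constant intensity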

The useful concept of the p-value: multiple testing
The probability of rejecting a true null hypothesis at the 99% level is 1%. Thus, if you repeat the test 100 times, each time with new data, you will reject at least once with probability 1 - 0.99^100 ≈ 0.63.
Bonferroni correction (FWE control): in order to reach a family-wise significance level of 1% in an experiment involving 1000 tests, each individual test should be checked at significance level 1/1000 % (i.e. 0.01/1000 = 10^-5).
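The two numbers above can be checked directly:

pAtLeastOne = 1 - 0.99^100;    % probability of at least one false rejection, about 0.63
alphaBonf   = 0.01 / 1000;     % per-test level for 1% family-wise error over 1000 tests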

Fiducial Inference
R. A. Fisher (1890-1962). In his paper "Inverse Probability" he rejected Bayesian analysis on the grounds of its dependence on priors and scaling, and launched an alternative concept, 'fiducial analysis'. Although the concept was not much developed after Fisher's time, the standard definition of confidence intervals has a similar flavor. The fiducial argument was apparently the starting point for Dempster in developing evidence theory.
Fiducial inference is fairly undeveloped, and also controversial. It is similar in idea to Neyman's confidence interval, which is used a lot despite philosophical problems and a lack of general understanding. The objective is to find a region in which a distribution's parameters lie, with confidence c. The region is given by an algorithm: if the stated probabilistic assumptions hold, the region contains the parameters with probability c. However, this statement holds before the data have been seen, and the estimator is not a sufficient statistic. Somewhat scruffy.

Hedged prediction scheme (Vovk/Gammerman)
Given a sequence z1 = (x1, y1), z2 = (x2, y2), ..., zn = (xn, yn) and a new x(n+1), predict y(n+1).
The xi are typically (high-dimensional) feature vectors; the yi are discrete (classification) or real (regression).
Either predict y(n+1) ∈ Y with (say) 95% confidence, or predict y(n+1) precisely and state the confidence (classification only).
The prediction is the y(n+1) that gives the sequence maximum randomness, using a computable approximation to Kolmogorov randomness. The scheme can be based on the SVM method; a minimal sketch with a simpler nonconformity score follows below.
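A minimal sketch of a conformal (hedged) prediction p-value for one candidate label, using a simple 1-nearest-neighbour nonconformity score as a stand-in for the SVM-based scores mentioned above; the data and the candidate label are hypothetical.

X = [randn(20,2); randn(20,2)+3];  y = [zeros(20,1); ones(20,1)];   % training sequence
xnew = [1.5 1.5];  ycand = 1;                  % new object x(n+1) and a candidate label
Xa = [X; xnew];  ya = [y; ycand];              % augmented sequence z1 ... z(n+1)
n1 = size(Xa,1);  nonconf = zeros(n1,1);
for i = 1:n1
    d = sqrt(sum((Xa - repmat(Xa(i,:), n1, 1)).^2, 2));
    d(i) = inf;                                % exclude the example itself
    dSame = min(d(ya == ya(i)));               % nearest example with the same label
    dDiff = min(d(ya ~= ya(i)));               % nearest example with a different label
    nonconf(i) = dSame / dDiff;                % large value: example looks strange for its label
end
pval = mean(nonconf >= nonconf(n1));           % p-value for the candidate label; small = reject it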