Comparing Priors in Bayesian Logistic Regression for Sensorial Classification of Rice

INTRODUCTION

Visual characteristics of rice grain are important in determining the quality and price of cooked rice. Stickiness is a visual characteristic of texture that is typically measured by an expensive and time-consuming sensorial analysis of cooked rice. Here, PROC MCMC is used to automatically classify rice stickiness (sticky or loose) with a Bayesian binomial logistic regression model applied to linear combinations of five viscosity measurements of cooked rice. The priors used in the Bayesian modelling were based on four different suggestions from the literature: (1) a noninformative prior; (2) a default prior (Gelman et al., 2008); (3) a prior based on odds ratios (Sullivan and Greenland, 2012); and (4) power priors (Ibrahim and Chen, 2000). The quality of each fit is then evaluated by comparing the agreement between the sensorial and predicted (model) classifications of stickiness.

MATERIAL AND METHODS

Working data set (years 2013 and 2014): the data has 144 samples from rice harvests in 2013 (n = 72) and 2014 (n = 72). PEGAJB is the sensorial classification of stickiness provided by researchers from EMBRAPA (0 = sticky; 1 = loose). Variables C1 and C2 are the scores on the first two principal components of five viscosity measures: apparent amylose content, gelatinization temperature, peak viscosity, breakdown and final viscosity.

BAYESIAN LOGISTIC REGRESSION

Stickiness is modelled as a response $Y_i$ with a binomial distribution, $y_i \mid p_i \sim \mathrm{Binomial}(n_i, p_i)$, where $p_i$ is the success probability (sticky), linked to the regression covariates $C_1$, $C_2$ through the logit transformation

$$\mathrm{logit}(p_i) = \log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 C_1 + \beta_2 C_2.$$

Once a prior distribution $p(\beta_i)$ is defined for a parameter $\beta_i$, the posterior distribution is found as

$$p(\beta_i \mid Y) = \frac{L(\beta_i \mid Y)\, p(\beta_i)}{\int L(\beta_i \mid Y)\, p(\beta_i)\, d\beta_i},$$

with $L(\beta_i \mid Y)$ the likelihood of $\beta_i$. The priors on $\beta_0$, $\beta_1$ and $\beta_2$ were defined according to four different prior distributions suggested in the literature:

NONINFORMATIVE PRIOR: with a noninformative prior the posterior distribution is essentially proportional to the likelihood function, so the prior has little (or no) impact on the posterior distribution of the parameters. A common choice is a uniform distribution.

INFORMATIVE PRIOR A: Sullivan and Greenland (2012) suggest building a normal prior for each $\beta$ from its corresponding odds ratio (Kleinbaum and Klein, 2010) obtained from the original data; a sketch of this conversion is given after the SAS implementation below.

INFORMATIVE PRIOR B: Gelman et al. (2008) propose independent Student-t prior distributions for the parameters of classical logistic regression models. The priors are applied after scaling all non-binary variables to have mean 0 and standard deviation 0.5; in this study $C_1$ and $C_2$ were used with mean 0 and the corresponding variance. A conservative choice is an independent Cauchy prior with center 0 and scale 10 for the constant term ($\beta_0$) and Cauchy priors with center 0 and scale 2.5 for each of the coefficients ($\beta_1$, $\beta_2$).

POWER PRIORS: Ibrahim et al. (2015) suggest constructing the prior from historical data. The power prior is formulated as

$$\pi(\beta \mid D_0, a_0) \propto L(\beta \mid D_0)^{a_0}\, \pi_0(\beta \mid c_0),$$

with $c_0$ the hyperparameter of an initial prior $\pi_0$, $L(\beta \mid D_0)$ the likelihood of the historical data $D_0$, and $a_0$ a scalar parameter ($0 \le a_0 \le 1$) that weights the historical data relative to the likelihood of the current study. When $a_0 = 1$, the historical data receives the same weight as the current data; when $a_0 = 0$, the prior does not depend on the historical data at all. In this study, data from 2013 was used as the historical data.
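This weighting is exactly how the power prior is implemented in PROC MCMC later on: for a Bernoulli logistic likelihood, raising $L(\beta \mid D_0)$ to the power $a_0$ amounts to multiplying the log-likelihood contribution of each historical observation by $a_0$,

$$a_0 \log L(\beta \mid D_0) = a_0 \sum_{i \in D_0} \big[\, y_i \log p_i + (1 - y_i) \log(1 - p_i) \,\big],$$

which is why the power-prior code below multiplies LLIKE by A for the 2013 rows only.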
CLASSIFICATION ERROR

The apparent error rate (APR) was obtained as (Johnson and Wichern, 2007)

$$\mathrm{APR} = \frac{n_d}{n},$$

with $n_d$ the number of discordances between the researchers' classification and the model prediction, and $n$ the total number of samples. This rate was used as the main indicator of goodness of fit for each model.
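As a concrete illustration, here is a minimal sketch of the APR computation in a SAS data step. The data set PRED and the variable PREDCLASS are hypothetical names standing for a data set that already holds the sensorial class PEGAJB and the model's predicted class for each sample:

   /* Sketch: count discordances between sensorial and predicted class,
      then report APR = n_d / n on the last observation. */
   DATA _NULL_;
      SET PRED END=LAST;             /* hypothetical scored data set         */
      ND + (PREDCLASS NE PEGAJB);    /* running count of discordances (n_d)  */
      IF LAST THEN DO;
         APR = ND / _N_;             /* _N_ = total number of samples (n)    */
         PUT 'Apparent error rate: ' APR PERCENT8.2;
      END;
   RUN;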

MATERIAL AND METHODS CONTINUED

SAS IMPLEMENTATION

NONINFORMATIVE PRIOR:

   ODS GRAPHICS ON;
   PROC MCMC DATA=RICECP14 NBI=4000 NMC=10000 SEED=1966
             NTHIN=15 STATISTICS=SUMMARY;
      PARMS BETA0 0 BETA1 0 BETA2 0;
      PRIOR BETA0 ~ UNIFORM(-15,15);
      PRIOR BETA1 ~ UNIFORM(-15,15);
      PRIOR BETA2 ~ UNIFORM(-15,15);
      P = LOGISTIC(BETA0 + BETA1*PRIN1 + BETA2*PRIN2);
      MODEL PEGAJB ~ BINARY(P);
      ODS OUTPUT PostSummaries=PARMSM1;
   RUN;
   ODS GRAPHICS OFF;

Figure 1: Diagnostics for β0.

With ODS GRAPHICS on, PROC MCMC produces three main graphics that aid convergence diagnostics: (1) trace plots allow checking whether the mean of the Markov chain has stabilized and appears constant over the plot, and whether the chain has good mixing and is dense; they show whether the chains appear to have reached their stationary distributions; (2) autocorrelation plots indicate the degree of autocorrelation in each posterior sample, with high autocorrelation indicating slow mixing; (3) kernel density plots estimate the marginal posterior distribution of each parameter. Figure 1 shows the diagnostic plots for parameter β0 using the priors β0 ~ U(-15, 15), β1 ~ U(-15, 15), β2 ~ U(-15, 15).

Five statement options were used. NBI specifies the number of burn-in iterations. NMC specifies the number of MCMC iterations after burn-in. SEED sets the random seed of the simulation, which is useful for reproducible code. NTHIN controls the thinning rate of the simulation, keeping every nth sample and discarding the rest. STATISTICS=SUMMARY requests the posterior summary statistics: n, mean, standard deviation and percentiles for each parameter. The PARMS statement declares the model parameters and optional starting values. The PRIOR statements specify the prior distribution of each parameter. The programming statement applies the logistic transformation and assigns the result to P. The MODEL statement specifies that the response variable PEGAJB follows a Bernoulli distribution with success probability P. Finally, the ODS OUTPUT statement creates a data set PARMSM1 with the summary statistics of the posterior distributions.

INFORMATIVE PRIOR A:

   PROC LOGISTIC DATA=RICECP14 DESCENDING;
      MODEL PEGAJB = C1 C2;
      ODS OUTPUT OddsRatios=ODSRATIOM2(KEEP=LowerCL UpperCL);
   RUN;

The Bayesian model uses the odds ratios of the parameters with significant estimates. The program then follows the same structure as the noninformative-prior program, but with normal priors specified as PRIOR BETA ~ NORMAL(M,V); a sketch of how M and V can be derived from the odds-ratio limits is given at the end of this section.

INFORMATIVE PRIOR B: here the only difference from the noninformative model is the prior distribution. Independent Cauchy priors were assigned: center 0 and scale 10 for the constant term (β0), and center 0 and scale 2.5 for each of the coefficients (β1, β2). The prior specifications are given by PRIOR BETA ~ CAUCHY(M,V), with M the center and V the scale of the Cauchy.

POWER PRIORS:

   PROC MCMC DATA=SENSRICECP NBI=4000 NMC=10000 SEED=1966
             NTHIN=15 STATISTICS=SUMMARY;
      BY A;
      PARMS (BETA0 BETA1 BETA2) 0;
      PRIOR BETA0 ~ CAUCHY(0,10);
      PRIOR BETA1 ~ CAUCHY(0,2.5);
      PRIOR BETA2 ~ CAUCHY(0,2.5);
      P = LOGISTIC(BETA0 + BETA1*C1 + BETA2*C2);
      LLIKE = LOGPDF('BERNOULLI',PEGAJB,P);
      IF (YEAR = 2013) THEN LLIKE = A * LLIKE;
      MODEL GENERAL(LLIKE);
      ODS OUTPUT PostSummaries=PARMSM4;
   RUN;
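Returning to INFORMATIVE PRIOR A: here is a minimal sketch of the odds-ratio-to-prior conversion, assuming the usual construction in Sullivan and Greenland (2012), in which the prior mean on the log-odds scale is the midpoint of the log-transformed 95% limits and the prior variance follows from the interval width. The output data set name PRIORA is hypothetical; ODSRATIOM2, LowerCL and UpperCL come from the PROC LOGISTIC step above:

   /* Sketch: turn 95% odds-ratio limits into the mean M and variance V
      of a normal prior on the log-odds scale. */
   DATA PRIORA;
      SET ODSRATIOM2;
      M = (LOG(LowerCL) + LOG(UpperCL)) / 2;              /* midpoint of log limits */
      V = ((LOG(UpperCL) - LOG(LowerCL)) / (2*1.96))**2;  /* (half-width / 1.96)**2 */
   RUN;

The resulting M and V can then be plugged into the PRIOR statement for each coefficient (note that PROC MCMC names the variance argument of its normal distribution, as in NORMAL(M, VAR=V)).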

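The power-prior run above fits one chain per weight through BY A, which requires the input data set to contain one copy of the combined 2013 and 2014 samples for each value of A. A minimal sketch of building such a stacked data set, assuming a combined source data set named RICEALL (hypothetical) and an illustrative grid of weights:

   /* Sketch: replicate the combined data once per power-prior weight A,
      then sort so PROC MCMC can process the weights with BY A. */
   DATA SENSRICECP;
      SET RICEALL;                     /* assumed to hold YEAR, PEGAJB, C1, C2 */
      DO A = 0, 0.25, 0.5, 0.75, 1;    /* illustrative grid of weights a_0     */
         OUTPUT;
      END;
   RUN;

   PROC SORT DATA=SENSRICECP;
      BY A;
   RUN;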
SAS IMPLEMENTATION CONTINUED

The SAS programming statements define the probability used in the log-likelihood function. The LOGPDF function computes the logarithm of the Bernoulli probability mass with parameter P evaluated at PEGAJB. A stacked data set with one copy of the data for each weight of the historical data (2013) is then processed, as sketched above. The GENERAL function in the MODEL statement specifies the log-likelihood function to be used in the model. Note that a separate program could be written for each value of A, but this version runs faster and is shorter than separate programs.

RESULTS

The convergence diagnostic plots were very similar for all parameters and models. The plots indicate evidence of convergence of the chains, as observed in Figure 1 for β0 with the noninformative prior. The APR was the same in all models: only 5.56% apparent error. Of the 72 predicted samples, only 3 were classified as loose when they were sticky and 1 was classified as sticky when it was loose. Display 1 shows the results observed in all four models.

                             Predicted
   Sensorial    STICKY          LOOSE           Total
   STICKY       33 (45.83%)      3 ( 4.17%)     36 ( 50.00%)
   LOOSE         1 ( 1.39%)     35 (48.61%)     36 ( 50.00%)
   Total        34 (47.22%)     38 (52.78%)     72 (100.00%)

Display 1: Sensorial stickiness by predicted stickiness (frequency and percent of total).

The main difference among the four implemented priors was in the estimates and standard deviations of the posterior densities of the parameters. The informative prior of Sullivan and Greenland (Model 2) resulted in a model with only two parameters (β0, β1) and a lower standard error for each parameter (Figure 2). The result observed with the power prior at a0 = 1, where the historical data receives full weight, was the closest to that of Model 2 (Figure 3). If one is interested in parameter estimates and their interpretation, we recommend these models. This study focused only on the quality of stickiness prediction, and for that purpose all models had the same performance.

Figure 2: Posterior means and 95% credibility intervals for the parameters of the models with noninformative prior (1), informative prior A (2) and informative prior B (3).

Figure 3: Posterior means and 95% credibility intervals for the parameters of the power-prior models (a0 = A).

CONCLUSIONS

Sensorial prediction/classification of rice stickiness is very efficient when using Bayesian logistic regression with PROC MCMC. The different priors suggested in the literature produced the same quality of fit, differing only in the number of parameters used and the size of the credibility intervals. If one is interested in exploring the properties of the parameters, we suggest the informative prior of Sullivan and Greenland (2012) or, when historical data is available, the power prior based completely on it. PROC MCMC is easy to use, fast to compute and very well documented.

MAIN REFERENCES

Sullivan, S.G. and Greenland, S. (2012). Bayesian regression in SAS software. International Journal of Epidemiology, 1-10.

Ibrahim, J.G., Chen, M.H., Gwon, Y. and Chen, F. (2015). The power prior: theory and applications. Statistics in Medicine, 34(28), 3724-3749.

Gelman, A., Jakulin, A., Pittau, M.G. and Su, Y.S. (2008). A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, 2(4), 1360-1383.

Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A. and Rubin, D.B. (2014). Bayesian Data Analysis. CRC Press.