Frequentist Accuracy of Bayesian Estimates


1 Frequentist Accuracy of Bayesian Estimates
Bradley Efron, Stanford University (RSS Journal Webinar)

2 Bayesian Inference

Parameter: $\mu \in \Omega$. Observed data: $x$. Prior: $\pi(\mu)$.
Probability family: $\{ f_\mu(x),\ \mu \in \Omega \}$. Parameter of interest: $\theta = t(\mu)$.

$$E\{\theta \mid x\} = \int_\Omega t(\mu)\, f_\mu(x)\, \pi(\mu)\, d\mu \Big/ \int_\Omega f_\mu(x)\, \pi(\mu)\, d\mu$$

What if we don't know the prior?

Bradley Efron (Stanford University) Frequentist Accuracy of Bayesian Estimates 2 / 30

3 Jeffreysonian Bayes Inference: Uninformative Priors

Jeffreys prior: $\pi(\mu) = |I(\mu)|^{1/2}$, where $I(\mu) = \mathrm{cov}\{\nabla_\mu \log f_\mu(x)\}$ (the Fisher information matrix).

Can still use Bayes' theorem, but how accurate are the estimates?

Today: frequentist variability of $E\{t(\mu) \mid x\}$.

4 General Accuracy Formula

$\mu$ and $x \in \mathbb{R}^p$, with $x \sim (\mu, V_\mu)$ (mean and covariance).

$$\alpha_x(\mu) = \nabla_x \log f_\mu(x) = \Big(\ldots, \frac{\partial \log f_\mu(x)}{\partial x_i}, \ldots\Big)^T$$

Lemma. $\hat E = E\{t(\mu) \mid x\}$ has gradient $\nabla_x \hat E = \mathrm{cov}\{t(\mu), \alpha_x(\mu) \mid x\}$.

Theorem. The delta-method standard deviation of $\hat E$ is
$$\mathrm{sd}(\hat E) = \big[\mathrm{cov}\{t(\mu), \alpha_x(\mu) \mid x\}^T\, V_x\, \mathrm{cov}\{t(\mu), \alpha_x(\mu) \mid x\}\big]^{1/2}.$$

5 Implementation

Posterior sample $\mu_1, \mu_2, \ldots, \mu_B$ (MCMC); set $\hat\theta_i = t(\mu_i) = t_i$ and $\alpha_i = \alpha_x(\mu_i)$.

$$\widehat{\mathrm{cov}} = \sum_{i=1}^{B} (\alpha_i - \bar\alpha)(t_i - \bar t)\big/ B, \qquad \widehat{\mathrm{sd}}(\hat E) = \big[\widehat{\mathrm{cov}}^T\, V_x\, \widehat{\mathrm{cov}}\big]^{1/2}$$
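The recipe above is a few lines of code. Here is a minimal pure-Python sketch (function and variable names are my own, not from the talk): given posterior draws $t_i = t(\mu_i)$, score vectors $\alpha_i = \alpha_x(\mu_i)$, and a plug-in covariance $V_x$, it returns $[\widehat{\mathrm{cov}}^T V_x \widehat{\mathrm{cov}}]^{1/2}$.

```python
import math

def gac_sd(t_draws, alpha_draws, V_x):
    """Delta-method sd of E_hat = E{t(mu) | x}, from B posterior draws.

    t_draws: list of scalars t_i = t(mu_i)
    alpha_draws: list of length-p vectors alpha_i = alpha_x(mu_i)
    V_x: p x p covariance matrix of x (list of lists)
    """
    B = len(t_draws)
    p = len(alpha_draws[0])
    t_bar = sum(t_draws) / B
    a_bar = [sum(a[j] for a in alpha_draws) / B for j in range(p)]
    # cov_hat_j = sum_i (alpha_ij - abar_j)(t_i - tbar) / B
    cov = [sum((alpha_draws[i][j] - a_bar[j]) * (t_draws[i] - t_bar)
               for i in range(B)) / B for j in range(p)]
    # quadratic form cov_hat^T V_x cov_hat
    q = sum(cov[j] * V_x[j][k] * cov[k] for j in range(p) for k in range(p))
    return math.sqrt(q)
```

The cost is $O(Bp + p^2)$ on top of the MCMC run itself; no extra simulation is needed, which is the point of the formula.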

6 Diabetes Data (Efron et al. 2004, LARS)

$n = 442$ subjects, $p = 10$ predictors: age, sex, bmi, glu, ...
Response: $y$ = disease progression at one year.

Model: $y = X\alpha + e$, with $e \sim N_n(0, I)$; dimensions $y$: $n \times 1$, $X$: $n \times p$, $\alpha$: $p \times 1$, $e$: $n \times 1$.

7 [Figure 1: Lasso coefficient paths for the diabetes data, minimizing $\mathrm{RSS}/2 + \gamma \|\beta\|_1$; $C_p$ is minimized at step 7, with $\gamma = 0.37$; paths shown for ltg, bmi, ldl, map, tch, hdl, glu, age, sex.]

8 Bayesian Lasso (Park and Casella 2008)

Model: $y \sim N_n(X\alpha, I)$; here $\mu = \alpha$ and $x = y$.
Prior: $\pi(\alpha) = e^{-\gamma \|\alpha\|_1}$ with $\gamma = 0.37$; the posterior mode is then the lasso estimate $\hat\alpha_\gamma$.

Subject 125: $\theta_{125} = x_{125}^T \alpha$. How accurate are Bayes posterior inferences for $\theta_{125}$?

9 Bayesian Analysis

MCMC: posterior sample $\{\alpha_i,\ i = 1, 2, \ldots, 10{,}000\}$ gives $\{\theta_{125,i} = x_{125}^T \alpha_i,\ i = 1, 2, \ldots, 10{,}000\}$.

The general accuracy formula gives a frequentist sd for $\hat E$.

For other subjects: frequentist sd < Bayes sd.

10 [Figure 2: Histogram of the MCMC values of $\theta_{125}$, the Subject 125 estimate; posterior mean 0.248, posterior stdev 0.0721. The frequentist sd of $E\{\theta \mid \text{data}\}$ is 0.0708, giving a coefficient of variation of 29%.]

11 Posterior CDF of $\theta_{125}$

Apply the general accuracy formula to $s_i = I\{t_i < c\}$, so that $E\{s \mid \text{data}\} = \Pr\{\theta_{125} < c \mid \text{data}\}$.

At $c = 0.3$, $\hat E$ comes with both a Bayes sd and a GAC (frequentist) sd estimate.
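A hedged sketch of this indicator trick (names are mine): the same covariance formula, applied to $s_i = I\{t_i < c\}$, gives a frequentist sd for each point of the posterior cdf.

```python
import math

def cdf_gac(t_draws, alpha_draws, V_x, c):
    """Pr{theta < c | data} plus its frequentist sd, via s_i = 1{t_i < c}."""
    B = len(t_draws)
    p = len(alpha_draws[0])
    s = [1.0 if t < c else 0.0 for t in t_draws]
    E_hat = sum(s) / B                      # posterior cdf at c
    a_bar = [sum(a[j] for a in alpha_draws) / B for j in range(p)]
    # same cov_hat as before, with s_i in place of t_i
    cov = [sum((alpha_draws[i][j] - a_bar[j]) * (s[i] - E_hat)
               for i in range(B)) / B for j in range(p)]
    q = sum(cov[j] * V_x[j][k] * cov[k] for j in range(p) for k in range(p))
    return E_hat, math.sqrt(q)
```

Sweeping $c$ over a grid traces the whole cdf with pointwise error bars, as in the next figure.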

12 [Figure 3: MCMC posterior cdf of $\theta_{125}$, diabetes data, prior $\propto \exp\{-0.37 \|\alpha\|_1\}$; vertical bars show $\pm$ one frequentist standard deviation of $\Pr\{\theta_{125} < c \mid \text{data}\}$. The upper 95% credible limit is marked.]

13 Exponential Families

$$f_\alpha(\hat\beta) = e^{\alpha^T \hat\beta - \psi(\alpha)} f_0(\hat\beta), \qquad x = \hat\beta \ \text{and}\ \mu = \alpha$$

In the general accuracy formula, $\alpha_x(\mu) = \alpha$ (up to a term not involving $\alpha$, which drops out of the covariance). For $\hat E = E\{t(\alpha) \mid \hat\beta\}$,

$$\widehat{\mathrm{sd}} = \big[\mathrm{cov}\{t, \alpha \mid \hat\beta\}^T\, V_{\hat\alpha}\, \mathrm{cov}\{t, \alpha \mid \hat\beta\}\big]^{1/2}, \quad \text{with } V_{\hat\alpha} = \mathrm{cov}_{\alpha = \hat\alpha}\{\hat\beta\} = \ddot\psi(\hat\alpha).$$

14 Bayesian Estimation Using the Parametric Bootstrap (Efron 2012)

Parametric bootstrap: draw $\hat\beta_1, \hat\beta_2, \ldots, \hat\beta_B$ from $f_{\hat\alpha}(\cdot)$.

Want to calculate $\hat E = E\{t(\alpha) \mid \hat\beta\}$ for prior $\pi(\alpha)$.

Importance sampling:
$$\hat E \doteq \sum_{i=1}^{B} t_i\, \pi_i\, R_i \Big/ \sum_{i=1}^{B} \pi_i\, R_i$$
with $t_i = t(\hat\alpha_i)$, $\pi_i = \pi(\hat\alpha_i)$, and $R_i$ a conversion factor, easy in exponential families:
$$R_i = f_{\hat\alpha_i}(\hat\beta) \big/ f_{\hat\alpha}(\hat\beta_i).$$

15 Bootstrap for the GAC

$p_i = \pi_i R_i \big/ \sum_{j=1}^{B} \pi_j R_j$ is the weight on the $i$th bootstrap replication.

$$\hat E = \sum_{i=1}^{B} p_i t_i, \qquad \widehat{\mathrm{cov}} = \sum_{i=1}^{B} p_i\, \hat\alpha_i\, (t_i - \hat E) \ \text{ estimates } \mathrm{cov}(\alpha, t \mid \hat\beta),$$
$$\widehat{\mathrm{sd}} = \big[\widehat{\mathrm{cov}}^T\, V_{\hat\alpha}\, \widehat{\mathrm{cov}}\big]^{1/2}.$$
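A small self-contained sketch of this reweighting (all names are mine; assumed inputs are the $t_i$, the $\hat\alpha_i$, log prior values $\log \pi_i$, log conversion factors $\log R_i$, and $V_{\hat\alpha}$):

```python
import math

def gac_bootstrap(t_vals, alpha_hats, log_prior, log_R, V_alpha):
    """Weighted posterior mean and delta-method sd from B parametric
    bootstrap replications, importance weights p_i proportional to pi_i * R_i."""
    B = len(t_vals)
    p = len(alpha_hats[0])
    # normalize the weights in log space for numerical stability
    lw = [lp + lr for lp, lr in zip(log_prior, log_R)]
    m = max(lw)
    w = [math.exp(l - m) for l in lw]
    W = sum(w)
    wts = [wi / W for wi in w]
    E_hat = sum(wts[i] * t_vals[i] for i in range(B))
    # cov_hat_j = sum_i p_i * alphahat_ij * (t_i - E_hat)
    cov = [sum(wts[i] * alpha_hats[i][j] * (t_vals[i] - E_hat)
               for i in range(B)) for j in range(p)]
    q = sum(cov[j] * V_alpha[j][k] * cov[k]
            for j in range(p) for k in range(p))
    return E_hat, math.sqrt(q)
```

The log-space normalization matters in practice: raw $\pi_i R_i$ products can underflow when $B$ and $p$ are large.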

16 Prostate Cancer Study (Singh et al. 2002)

Microarray study: 102 men, 52 with prostate cancer and 50 healthy controls; 6033 genes.
$z_i$ = test statistic for $H_{0i}$: no difference, with $z_i \sim N(0, 1)$ under $H_{0i}$.

Goal: identify genes involved in prostate cancer.

17 [Figure 4: Histogram of the 6033 prostate-study z-values with the matching $N(0,1)$ density.]

18 Poisson GLM

Histogram: 49 bins with midpoints $c_j$; $y_j = \#\{z_i \text{ in bin } j\}$.

Poisson GLM: $y_j \stackrel{\text{ind}}{\sim} \mathrm{Poi}(\mu_j)$, with $\log(\mu) = \mathrm{poly}(c, \text{degree} = 8)$.

[MLE in R: glm(y ~ poly(c, 8), family = poisson)]
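For readers without R, here is a toy pure-Python stand-in for that glm fit: a bare Newton-Raphson iteration for the log-linear Poisson model (my own sketch, not the R function; at the slide's scale one would simply call glm or an equivalent library routine):

```python
import math

def solve(A, b):
    """Solve the linear system A d = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    d = [0.0] * n
    for r in range(n - 1, -1, -1):
        d[r] = (M[r][n] - sum(M[r][k] * d[k] for k in range(r + 1, n))) / M[r][r]
    return d

def poisson_glm(X, y, iters=50):
    """Newton-Raphson MLE for log(mu_j) = x_j^T beta, y_j ~ Poi(mu_j)."""
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        mu = [math.exp(sum(X[j][k] * beta[k] for k in range(p))) for j in range(n)]
        # score vector and Fisher information for the Poisson log-likelihood
        score = [sum((y[j] - mu[j]) * X[j][k] for j in range(n)) for k in range(p)]
        info = [[sum(mu[j] * X[j][k] * X[j][l] for j in range(n))
                 for l in range(p)] for k in range(p)]
        step = solve(info, score)
        beta = [beta[k] + step[k] for k in range(p)]
    return beta
```

For an intercept-only fit, `poisson_glm([[1.0]] * n, y)` converges to $\log \bar y$, the closed-form Poisson MLE, which is a quick sanity check on the iteration.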

19 Bayesian Estimation for the Poisson Model

Model: $y \sim \mathrm{Poi}(\mu)$, $\mu = e^{X\alpha}$, with $X = \mathrm{poly}(c, 8)$.
Prior: Jeffreys prior for $\alpha$.

cdf: the map $\alpha \to \mu \to F_\alpha(z) = \sum_{c_i \le z} \mu_i \Big/ \sum_i \mu_i$.

Fdr parameter:
$$t(\alpha) = \frac{1 - \Phi(3)}{1 - F(3)} = \mathrm{Fdr}(3) \qquad [\text{MLE: } \mathrm{Fdr}_{\hat\alpha}(3) = 0.183]$$

20 [Figure 5: Posterior density for $\mathrm{Fdr}(3)$ in the prostate study, based on 4000 parametric bootstraps from the Poisson poly(8) model; posterior (mean, sd) = (0.183, 0.025). The frequentist sd for 0.183 was 0.026, giving CV = 14%. Internal coefficient of variation = 0.0023.]

21 Model Selection Calculations

Full model: $y \sim \mathrm{Poi}(\mu)$, $\mu = e^{X\alpha}$, with $\alpha \in \mathbb{R}^9$ and $\mu \in \mathbb{R}^{49}$.
Submodel $M_m$: $\{\mu:\ \text{only the first } m + 1 \text{ coordinates of } \alpha \ne 0\}$.

Bayesian model selection puts prior probabilities on and within each $M_m$.

Poor man's model estimates: partition $\mathbb{R}^{49}$ into preference regions $R_m = \{\mu \text{ closest to } M_m\}$ and calculate the posterior probabilities $\Pr\{R_m \mid y\}$.

22 Posterior Model Probabilities

Distance: minimum AIC distance from $\mu$ to a point in $M_m$; the preferred model is the one at smallest distance.

[Table of posterior probabilities $\Pr\{R_m \mid y\}$ and their sd's lost in transcription.]

23 Empirical Bayes Accuracy

$z_k$ = observed value for gene $k$; $\theta_k$ = true effect size; $z_k \sim N(\theta_k, 1)$.
Unknown prior $g(\cdot)$ gives $\theta_1, \theta_2, \ldots, \theta_N$ $[N = 6033]$.

From the observations $z_1, z_2, \ldots, z_N$ we wish to estimate
$$E_z = E\{t(\theta) \mid z\} = \int t(\theta)\, f_\theta(z)\, g(\theta)\, d\theta \Big/ \int f_\theta(z)\, g(\theta)\, d\theta$$
[if $t(\theta) = \delta_0(\theta)$ then $E_z = \Pr\{\theta = 0 \mid z\}$].

24 Parametric Families of Priors

$p$-parameter exponential family: $\log g_\alpha(\theta) = q(\theta)^T \alpha + \text{constant}$, with $q(\theta)^T$ $1 \times p$ and the unknown parameter $\alpha$ $p \times 1$.

Marginal: $f_\alpha(z) = \int f_\theta(z)\, g_\alpha(\theta)\, d\theta$.

e.g., $q(\theta) = (\delta_0(\theta), \theta, \theta^2, \theta^3, \theta^4, \theta^5)^T$ ["spike and slab"]

MLE: $\alpha \to g_\alpha \to f_\alpha(z_1, z_2, \ldots, z_N) \to$ marginal MLE $\hat\alpha$, giving
$$\hat E_z = \int t(\theta)\, f_\theta(z)\, g_{\hat\alpha}(\theta)\, d\theta \Big/ \int f_\theta(z)\, g_{\hat\alpha}(\theta)\, d\theta \ \pm\ ??$$

25 Delta-Method Standard Deviation of $\hat E_z$

$$\mathrm{sd}(\hat E_z) = \Big\{ \Big(\frac{dE_z}{d\alpha}\Big)^T I_\alpha^{-1} \Big(\frac{dE_z}{d\alpha}\Big) \Big\}^{1/2}$$

Fisher information matrix: with $\bar q_\alpha = \int q(\theta)\, g_\alpha(\theta)\, d\theta$ and $h_\alpha(z) = \int f_\theta(z)\, g_\alpha(\theta)\, [q(\theta) - \bar q]\, d\theta$,
$$I_\alpha = N \int \frac{h_\alpha(z)\, h_\alpha(z)^T}{f_\alpha(z)}\, dz.$$

$$\frac{dE_z}{d\alpha} = E_z \int w(\theta)\, g_\alpha(\theta)\, [q(\theta) - \bar q]\, d\theta, \quad \text{where } w(\theta) = \frac{t(\theta) f_\theta(z)}{\int t f g_\alpha} - \frac{f_\theta(z)}{\int f g_\alpha}.$$

26 [Figure 6: Estimated $\Pr\{|\theta| > 2 \mid z\}$ as a function of $z$, prostate study; bars show $\pm$ one frequentist standard deviation, using the g-model $\{\delta_0, \mathrm{ns}(5)\}$. At $z = 3$, $\Pr\{|\theta| > 2 \mid z = 3\} = 0.43$.]

27 Estimated False Discovery Rate

Local false discovery rate: $\mathrm{fdr}(z) = \Pr\{\theta = 0 \mid z\} = E\{t(\theta) \mid z\}$ where $t(\theta) = \delta_0(\theta)$.

Next: applied to the prostate study.
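A hypothetical closed-form illustration: if $g$ were a known spike-and-slab mixture $g = \pi_0 \delta_0 + (1 - \pi_0) N(0, \sigma^2)$ (the values $\pi_0 = 0.9$ and $\sigma = 2$ below are invented for illustration, not from the talk), then the non-null marginal of $z$ is $N(0, 1 + \sigma^2)$ and fdr(z) has a closed form:

```python
import math

def local_fdr(z, pi0=0.9, sigma=2.0):
    """fdr(z) = Pr{theta = 0 | z} under g = pi0*delta_0 + (1-pi0)*N(0, sigma^2).

    Since z | theta ~ N(theta, 1), the non-null marginal of z is N(0, 1 + sigma^2).
    """
    def normal_pdf(x, s):
        return math.exp(-x * x / (2.0 * s * s)) / (s * math.sqrt(2.0 * math.pi))

    f0 = normal_pdf(z, 1.0)                           # null density of z
    f1 = normal_pdf(z, math.sqrt(1.0 + sigma * sigma))  # non-null marginal of z
    return pi0 * f0 / (pi0 * f0 + (1.0 - pi0) * f1)
```

fdr(z) falls from near 1 at $z = 0$ toward 0 in the tails; the g-modeling approach of the previous slides replaces this known $g$ with the estimated $g_{\hat\alpha}$.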

28 [Figure 7: Estimated $\Pr\{\theta = 0 \mid z\}$ as a function of $z$, prostate study; bars show $\pm$ one frequentist standard deviation, using the g-model $\{\delta_0, \mathrm{ns}(5)\}$. At $z = 3$, $\Pr\{\theta = 0 \mid z = 3\} = 0.32$, with coefficient of variation 0.21.]

29 For $z = 3$:

$\Pr\{|\theta| > 2 \mid z = 3\} = 0.43 \pm 0.07$
$\mathrm{fdr}(3) = 0.32 \pm 0.07$

locfdr (exponential-family modeling of the $z$'s, not the $\theta$'s) gave $\mathrm{fdr}(3) = 0.35 \pm 0.03$, but it does not work for $\Pr\{|\theta| > 2 \mid z\}$, etc.

30 References

Park, T. and Casella, G. (2008). The Bayesian lasso. JASA.
Singh et al. (2002). Gene expression correlates of clinical prostate cancer behavior. Cancer Cell.
Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist.
Efron, B. (2012). Bayesian inference and the parametric bootstrap. Ann. Appl. Statist.
Efron, B. (2010). Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction. IMS Monographs, Cambridge University Press.


More information

Introductory Bayesian Analysis

Introductory Bayesian Analysis Introductory Bayesian Analysis Jaya M. Satagopan Memorial Sloan-Kettering Cancer Center Weill Cornell Medical College (Affiliate) satagopj@mskcc.org March 14, 2013 Bayesian Analysis Fit probability models

More information

INTRODUCTION TO BAYESIAN ANALYSIS

INTRODUCTION TO BAYESIAN ANALYSIS INTRODUCTION TO BAYESIAN ANALYSIS Arto Luoma University of Tampere, Finland Autumn 2014 Introduction to Bayesian analysis, autumn 2013 University of Tampere 1 / 130 Who was Thomas Bayes? Thomas Bayes (1701-1761)

More information

Lecture 2: Priors and Conjugacy

Lecture 2: Priors and Conjugacy Lecture 2: Priors and Conjugacy Melih Kandemir melih.kandemir@iwr.uni-heidelberg.de May 6, 2014 Some nice courses Fred A. Hamprecht (Heidelberg U.) https://www.youtube.com/watch?v=j66rrnzzkow Michael I.

More information

Package FDRreg. August 29, 2016

Package FDRreg. August 29, 2016 Package FDRreg August 29, 2016 Type Package Title False discovery rate regression Version 0.1 Date 2014-02-24 Author James G. Scott, with contributions from Rob Kass and Jesse Windle Maintainer James G.

More information

A union of Bayesian, frequentist and fiducial inferences by confidence distribution and artificial data sampling

A union of Bayesian, frequentist and fiducial inferences by confidence distribution and artificial data sampling A union of Bayesian, frequentist and fiducial inferences by confidence distribution and artificial data sampling Min-ge Xie Department of Statistics, Rutgers University Workshop on Higher-Order Asymptotics

More information

Bayesian Sparse Linear Regression with Unknown Symmetric Error

Bayesian Sparse Linear Regression with Unknown Symmetric Error Bayesian Sparse Linear Regression with Unknown Symmetric Error Minwoo Chae 1 Joint work with Lizhen Lin 2 David B. Dunson 3 1 Department of Mathematics, The University of Texas at Austin 2 Department of

More information

Bayesian inference. Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark. April 10, 2017

Bayesian inference. Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark. April 10, 2017 Bayesian inference Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark April 10, 2017 1 / 22 Outline for today A genetic example Bayes theorem Examples Priors Posterior summaries

More information

Tweedie s Formula and Selection Bias

Tweedie s Formula and Selection Bias Tweedie s Formula and Selection Bias Bradley Efron Stanford University Abstract We suppose that the statistician observes some large number of estimates z i, each with its own unobserved expectation parameter

More information

Introduction to Bayesian Methods

Introduction to Bayesian Methods Introduction to Bayesian Methods Jessi Cisewski Department of Statistics Yale University Sagan Summer Workshop 2016 Our goal: introduction to Bayesian methods Likelihoods Priors: conjugate priors, non-informative

More information

Chapter 4: Generalized Linear Models-I

Chapter 4: Generalized Linear Models-I : Generalized Linear Models-I Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

On the Comparison of Fisher Information of the Weibull and GE Distributions

On the Comparison of Fisher Information of the Weibull and GE Distributions On the Comparison of Fisher Information of the Weibull and GE Distributions Rameshwar D. Gupta Debasis Kundu Abstract In this paper we consider the Fisher information matrices of the generalized exponential

More information

Generalized Linear Models and Its Asymptotic Properties

Generalized Linear Models and Its Asymptotic Properties for High Dimensional Generalized Linear Models and Its Asymptotic Properties April 21, 2012 for High Dimensional Generalized L Abstract Literature Review In this talk, we present a new prior setting for

More information

STA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random

STA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random STA 216: GENERALIZED LINEAR MODELS Lecture 1. Review and Introduction Much of statistics is based on the assumption that random variables are continuous & normally distributed. Normal linear regression

More information

An Introduction to GAMs based on penalized regression splines. Simon Wood Mathematical Sciences, University of Bath, U.K.

An Introduction to GAMs based on penalized regression splines. Simon Wood Mathematical Sciences, University of Bath, U.K. An Introduction to GAMs based on penalied regression splines Simon Wood Mathematical Sciences, University of Bath, U.K. Generalied Additive Models (GAM) A GAM has a form something like: g{e(y i )} = η

More information

LASSO. Step. My modified algorithm getlasso() produces the same sequence of coefficients as the R function lars() when run on the diabetes data set:

LASSO. Step. My modified algorithm getlasso() produces the same sequence of coefficients as the R function lars() when run on the diabetes data set: lars/lasso 1 Homework 9 stepped you through the lasso modification of the LARS algorithm, based on the papers by Efron, Hastie, Johnstone, and Tibshirani (24) (= the LARS paper) and by Rosset and Zhu (27).

More information

Lecture 1: Introduction

Lecture 1: Introduction Principles of Statistics Part II - Michaelmas 208 Lecturer: Quentin Berthet Lecture : Introduction This course is concerned with presenting some of the mathematical principles of statistical theory. One

More information

Noninformative Priors for the Ratio of the Scale Parameters in the Inverted Exponential Distributions

Noninformative Priors for the Ratio of the Scale Parameters in the Inverted Exponential Distributions Communications for Statistical Applications and Methods 03, Vol. 0, No. 5, 387 394 DOI: http://dx.doi.org/0.535/csam.03.0.5.387 Noninformative Priors for the Ratio of the Scale Parameters in the Inverted

More information

Confidence Distribution

Confidence Distribution Confidence Distribution Xie and Singh (2013): Confidence distribution, the frequentist distribution estimator of a parameter: A Review Céline Cunen, 15/09/2014 Outline of Article Introduction The concept

More information

Integrated Objective Bayesian Estimation and Hypothesis Testing

Integrated Objective Bayesian Estimation and Hypothesis Testing Integrated Objective Bayesian Estimation and Hypothesis Testing José M. Bernardo Universitat de València, Spain jose.m.bernardo@uv.es 9th Valencia International Meeting on Bayesian Statistics Benidorm

More information

Probabilistic Inference for Multiple Testing

Probabilistic Inference for Multiple Testing This is the title page! This is the title page! Probabilistic Inference for Multiple Testing Chuanhai Liu and Jun Xie Department of Statistics, Purdue University, West Lafayette, IN 47907. E-mail: chuanhai,

More information

inferences on stress-strength reliability from lindley distributions

inferences on stress-strength reliability from lindley distributions inferences on stress-strength reliability from lindley distributions D.K. Al-Mutairi, M.E. Ghitany & Debasis Kundu Abstract This paper deals with the estimation of the stress-strength parameter R = P (Y

More information

David Giles Bayesian Econometrics

David Giles Bayesian Econometrics 9. Model Selection - Theory David Giles Bayesian Econometrics One nice feature of the Bayesian analysis is that we can apply it to drawing inferences about entire models, not just parameters. Can't do

More information

Linear Models A linear model is defined by the expression

Linear Models A linear model is defined by the expression Linear Models A linear model is defined by the expression x = F β + ɛ. where x = (x 1, x 2,..., x n ) is vector of size n usually known as the response vector. β = (β 1, β 2,..., β p ) is the transpose

More information

Neutral Bayesian reference models for incidence rates of (rare) clinical events

Neutral Bayesian reference models for incidence rates of (rare) clinical events Neutral Bayesian reference models for incidence rates of (rare) clinical events Jouni Kerman Statistical Methodology, Novartis Pharma AG, Basel BAYES2012, May 10, Aachen Outline Motivation why reference

More information

Selective Sequential Model Selection

Selective Sequential Model Selection Selective Sequential Model Selection William Fithian, Jonathan Taylor, Robert Tibshirani, and Ryan J. Tibshirani August 8, 2017 Abstract Many model selection algorithms produce a path of fits specifying

More information

Bayesian Adjustment for Multiplicity

Bayesian Adjustment for Multiplicity Bayesian Adjustment for Multiplicity Jim Berger Duke University with James Scott University of Texas 2011 Rao Prize Conference Department of Statistics, Penn State University May 19, 2011 1 2011 Rao Prize

More information

Distribution-free ROC Analysis Using Binary Regression Techniques

Distribution-free ROC Analysis Using Binary Regression Techniques Distribution-free Analysis Using Binary Techniques Todd A. Alonzo and Margaret S. Pepe As interpreted by: Andrew J. Spieker University of Washington Dept. of Biostatistics Introductory Talk No, not that!

More information

INTRODUCTION TO BAYESIAN STATISTICS

INTRODUCTION TO BAYESIAN STATISTICS INTRODUCTION TO BAYESIAN STATISTICS Sarat C. Dass Department of Statistics & Probability Department of Computer Science & Engineering Michigan State University TOPICS The Bayesian Framework Different Types

More information

Copula Regression RAHUL A. PARSA DRAKE UNIVERSITY & STUART A. KLUGMAN SOCIETY OF ACTUARIES CASUALTY ACTUARIAL SOCIETY MAY 18,2011

Copula Regression RAHUL A. PARSA DRAKE UNIVERSITY & STUART A. KLUGMAN SOCIETY OF ACTUARIES CASUALTY ACTUARIAL SOCIETY MAY 18,2011 Copula Regression RAHUL A. PARSA DRAKE UNIVERSITY & STUART A. KLUGMAN SOCIETY OF ACTUARIES CASUALTY ACTUARIAL SOCIETY MAY 18,2011 Outline Ordinary Least Squares (OLS) Regression Generalized Linear Models

More information

1 Hypothesis Testing and Model Selection

1 Hypothesis Testing and Model Selection A Short Course on Bayesian Inference (based on An Introduction to Bayesian Analysis: Theory and Methods by Ghosh, Delampady and Samanta) Module 6: From Chapter 6 of GDS 1 Hypothesis Testing and Model Selection

More information

Approximating high-dimensional posteriors with nuisance parameters via integrated rotated Gaussian approximation (IRGA)

Approximating high-dimensional posteriors with nuisance parameters via integrated rotated Gaussian approximation (IRGA) Approximating high-dimensional posteriors with nuisance parameters via integrated rotated Gaussian approximation (IRGA) Willem van den Boom Department of Statistics and Applied Probability National University

More information