Introductory Bayesian Analysis


Jaya M. Satagopan
Memorial Sloan-Kettering Cancer Center
Weill Cornell Medical College (Affiliate)
March 14, 2013

Bayesian Analysis
Fit probability models to observed data.
Unknown parameters: summarize using a probability distribution, the posterior distribution. For example, P(mutation increases risk by 10% | data).
Prior information: from external data, or elicited from available data.

This lecture
Bayes theorem: prior from an external source
Loss function, expected loss
Bayesian analysis with data-adaptive prior: minimize squared error loss
Bayesian penalized estimation: priors to minimize other loss functions
Software packages: WinBUGS, SAS

Part 1. Bayes Theorem

Bayes Theorem
Random variables: Y and θ
Prior (marginal) distributions: P(Y), P(θ)
Conditional distributions: P(Y | θ) and P(θ | Y)
Know P(θ | Y), P(Y), and P(θ). Need P(Y | θ) [posterior distribution]:
$$P(Y\mid\theta) = \frac{P(\theta\mid Y)\,P(Y)}{P(\theta)} = \frac{P(\theta\mid Y)\,P(Y)}{\int P(\theta\mid Y)\,P(Y)\,dY}$$

Example
Say 5% of the population has a certain disease. When a person is sick, a particular test is used to determine whether (s)he has this disease. The test gives a positive result 2% of the time when the person does not have the disease, and 95% of the time when the person does have the disease. One person gets a positive test. What is the probability that this person has the disease?

Example continued
Y = 1 (disease) or 0 (no disease); θ = 1 (positive test) or 0 (negative test)
KNOWN: P(Y = 1) = 0.05, P(Y = 0) = 1 − P(Y = 1) = 0.95, P(θ = 1 | Y = 0) = 0.02, P(θ = 1 | Y = 1) = 0.95
NEED: P(Y = 1 | θ = 1)
$$P(Y=1\mid\theta=1) = \frac{P(\theta=1\mid Y=1)\,P(Y=1)}{P(\theta=1)} = \frac{P(\theta=1\mid Y=1)\,P(Y=1)}{P(\theta=1\mid Y=1)\,P(Y=1) + P(\theta=1\mid Y=0)\,P(Y=0)} = \frac{0.95 \times 0.05}{0.95 \times 0.05 + 0.02 \times 0.95} \approx 0.714$$
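The arithmetic above can be checked numerically. A minimal Python sketch; `posterior_disease` is a hypothetical helper name, and its inputs are the prevalence, the true-positive rate and the false-positive rate from the example.

```python
def posterior_disease(prior, sensitivity, false_positive_rate):
    """Return P(Y = 1 | theta = 1) for a binary disease status and test result.

    prior:               P(Y = 1), the disease prevalence
    sensitivity:         P(theta = 1 | Y = 1)
    false_positive_rate: P(theta = 1 | Y = 0)
    """
    numerator = sensitivity * prior
    # Denominator P(theta = 1): total probability of a positive test
    evidence = numerator + false_positive_rate * (1.0 - prior)
    return numerator / evidence


p = posterior_disease(prior=0.05, sensitivity=0.95, false_positive_rate=0.02)
print(round(p, 3))  # 0.714
```

Although the test is accurate, a positive result implies only about a 71% chance of disease, because only 5% of the population has the disease.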

Example: Breast Cancer Risk
Case-control sampling: cases (Y = 1) have breast cancer; controls (Y = 0) do not have breast cancer.
Record BRCA1/2 mutation: present (θ = 1) or absent (θ = 0).
Observe P(θ = 1 | Y = 1) and P(θ = 1 | Y = 0): the mutation frequency in cases and in controls.
Need: P(Y = 1 | θ = 1), the disease risk among mutation carriers.
Satagopan et al (2001) CEBP, 10:

Breast cancer risk (continued)
Use Bayes theorem:
$$P(Y=1\mid\theta=1) = \frac{P(\theta=1\mid Y=1)\,P(Y=1)}{P(\theta=1\mid Y=1)\,P(Y=1) + P(\theta=1\mid Y=0)\,P(Y=0)}$$
P(θ = 1 | Y = 1) = mutation frequency in cases
P(θ = 1 | Y = 0) = mutation frequency in controls
P(Y = 1) = 1 − P(Y = 0) = prior information, obtained from an external source (SEER registry)

Breast cancer risk (continued)
Data for an age group:

BRCA mutation   Case   Control
Present           25        23
Absent           179      1090
Total            204      1113

P(θ = 1 | Y = 1) = 25/204
P(θ = 1 | Y = 0) = 23/1113
P(Y = 1) = disease risk in the age group (SEER registry)
P(Y = 1 | θ = 1) = 7.6%
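The same calculation with the case-control counts from the table can be sketched as follows. The SEER age-specific risk P(Y = 1) is not reproduced in these slides, so it is left as an input; the helper name `carrier_risk` is hypothetical, and the loop only illustrates how the result depends on the prior.

```python
def carrier_risk(carriers_cases, n_cases, carriers_controls, n_controls, prior):
    """Return P(Y = 1 | theta = 1), the disease risk among mutation carriers.

    The mutation frequencies in cases and controls estimate P(theta = 1 | Y = 1)
    and P(theta = 1 | Y = 0); `prior` is the external value of P(Y = 1).
    """
    freq_cases = carriers_cases / n_cases           # P(theta = 1 | Y = 1)
    freq_controls = carriers_controls / n_controls  # P(theta = 1 | Y = 0)
    numerator = freq_cases * prior
    return numerator / (numerator + freq_controls * (1.0 - prior))


# Counts from the table above; the priors are placeholders, not SEER figures.
for prior in (0.005, 0.01, 0.02):
    print(prior, round(carrier_risk(25, 204, 23, 1113, prior), 3))
```

Because the mutation is about six times more frequent in cases than in controls, the carrier risk is several times the prior, but its absolute size is driven by the registry value of P(Y = 1).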

Part 2. Loss function, Bayes estimate

Loss Function and Expected Loss
Parameter θ; decision (estimate) d(Y) based on data Y.
Loss incurred = L(d(Y), θ) ≥ 0
Squared error loss: L(d(Y), θ) = [d(Y) − θ]²
Absolute deviation: L(d(Y), θ) = |d(Y) − θ|
Expected loss = risk:
$$R(d,\theta) = E\{L(d(Y),\theta)\} = \int L(d(y),\theta)\, f(y\mid\theta)\, dy$$
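For a fixed θ the risk is an expectation over the sampling distribution f(y | θ), so it can be approximated by simulation. A minimal sketch, assuming Y_1, …, Y_n i.i.d. N(θ, 1) and the estimator d(Y) = sample mean (both illustrative choices, not from the slides); under squared error loss the exact risk is then 1/n.

```python
import random


def mc_risk(estimator, theta, n, loss, n_sim=20000, seed=1):
    """Monte Carlo approximation of R(d, theta) = E{L(d(Y), theta)}.

    Assumes Y_1, ..., Y_n are i.i.d. N(theta, 1) (an illustrative model).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sim):
        y = [rng.gauss(theta, 1.0) for _ in range(n)]
        total += loss(estimator(y), theta)
    return total / n_sim


def sample_mean(y):
    return sum(y) / len(y)


def squared_error(d, theta):
    return (d - theta) ** 2


def absolute_deviation(d, theta):
    return abs(d - theta)


print(mc_risk(sample_mean, theta=2.0, n=10, loss=squared_error))       # near 1/n = 0.1
print(mc_risk(sample_mean, theta=2.0, n=10, loss=absolute_deviation))  # near 0.25
```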

Bayes Estimation
There is no single d that has small R(d, θ) for all θ: there is no uniformly best d.
Bayes approach: find the d that minimizes the average risk W(d), also known as the Bayes risk, where G(θ) is the prior distribution of θ:
$$W(d) = \int\!\!\int L(d(y),\theta)\, f(y\mid\theta)\, dy\, dG(\theta)$$
Bayes estimate d_B of d: W(d_B) ≤ W(d) for every d.
For squared error loss, d_B is the posterior mean of θ: d_B(Y) = E(θ | Y).
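Minimizing the posterior expected loss separately for each observed y also minimizes the Bayes risk W(d), so the last claim can be illustrated on a discrete posterior: the grid minimizer of expected squared error is the posterior mean, while the minimizer of absolute deviation loss is a posterior median (a standard companion result). The support points and weights below are arbitrary, and the helper names are hypothetical.

```python
def expected_loss(d, support, weights, loss):
    """Posterior expected loss: sum over k of w_k * L(d, theta_k)."""
    return sum(w * loss(d, t) for t, w in zip(support, weights))


def bayes_decision(support, weights, loss, candidates):
    """Candidate d with the smallest posterior expected loss."""
    return min(candidates, key=lambda d: expected_loss(d, support, weights, loss))


# Illustrative discrete posterior for theta (weights sum to 1).
support = [0.0, 1.0, 2.0, 3.0, 4.0]
weights = [0.4, 0.3, 0.1, 0.1, 0.1]
candidates = [i / 100 for i in range(401)]

posterior_mean = sum(t * w for t, w in zip(support, weights))
d_sq = bayes_decision(support, weights, lambda d, t: (d - t) ** 2, candidates)
d_abs = bayes_decision(support, weights, lambda d, t: abs(d - t), candidates)
print(d_sq, posterior_mean)  # both 1.2 (up to rounding)
print(d_abs)                 # 1.0, the posterior median
```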

Part 3. Bayesian analysis with data-adaptive prior parameters: G×E example

Bayesian analysis of G×E interactions
Case-control study: Y = 1 (case), Y = 0 (control)
Binary risk factors (say): genetic factor G = 0, 1; environmental exposure E = 0, 1
Mukherjee and Chatterjee (2008). Biometrics, 64:
Is there a significant interaction between G and E?
Estimate the interaction odds ratio and its standard error.
Test: is this odds ratio = 1, i.e., is log(odds ratio) = 0?

Interaction odds ratio (OR_GE)

Y = 0 (control data)
          E = 1   E = 0
G = 1     N011    N010
G = 0     N001    N000

Y = 1 (case data)
          E = 1   E = 0
G = 1     N111    N110
G = 0     N101    N100

OR_0 = odds ratio of E associated with G among controls
OR_1 = odds ratio of E associated with G among cases
$$OR_0 = \frac{N_{011} N_{000}}{N_{001} N_{010}}, \qquad OR_1 = \frac{N_{111} N_{100}}{N_{101} N_{110}}, \qquad OR_{GE} = \frac{OR_1}{OR_0}$$
$$\hat\beta_{GE} = \log \widehat{OR}_{GE} = \log \widehat{OR}_1 - \log \widehat{OR}_0 = \hat\beta_{case} - \hat\beta_{control}$$
$$\operatorname{Var}(\hat\beta_{GE}) = \operatorname{Var}(\hat\beta_{case}) + \operatorname{Var}(\hat\beta_{control})$$
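The estimates on this slide can be computed directly from the eight cell counts. A minimal sketch: the variance of each log odds ratio uses Woolf's approximation (sum of reciprocal cell counts), which the slide does not specify, and the counts in the usage line are made up for illustration.

```python
import math


def log_or(n_g1e1, n_g1e0, n_g0e1, n_g0e0):
    """Log odds ratio of E associated with G in one 2x2 table.

    The variance uses Woolf's approximation (sum of reciprocal cell counts).
    """
    beta = math.log(n_g1e1 * n_g0e0 / (n_g0e1 * n_g1e0))
    var = 1 / n_g1e1 + 1 / n_g1e0 + 1 / n_g0e1 + 1 / n_g0e0
    return beta, var


def interaction_log_or(cases, controls):
    """beta_GE = beta_case - beta_control; the variances add (independent samples).

    Each argument lists counts in the order (G, E) = (1, 1), (1, 0), (0, 1), (0, 0),
    i.e. (N_y11, N_y10, N_y01, N_y00).
    """
    b_case, v_case = log_or(*cases)
    b_control, v_control = log_or(*controls)
    return b_case - b_control, v_case + v_control


# Hypothetical counts: OR_1 = 4 among cases, OR_0 = 1 among controls.
beta_ge, var_ge = interaction_log_or(cases=(20, 10, 30, 60), controls=(10, 20, 30, 60))
print(round(beta_ge, 3), round(var_ge, 3))  # 1.386 0.4
```

If G and E were known to be independent in controls, the case-only estimate `log_or(*cases)` would be used instead, with variance 0.2 rather than 0.4 for these counts.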

Gene-environment independence in controls
If G and E are independent in controls, then
$$OR_0 = \frac{N_{011} N_{000}}{N_{001} N_{010}} = 1, \qquad OR_{GE} = OR_1, \qquad \operatorname{Var}(\hat\beta_{GE}) = \operatorname{Var}(\hat\beta_{case}) < \operatorname{Var}(\hat\beta_{case}) + \operatorname{Var}(\hat\beta_{control})$$
Independence of G and E in controls is unknown. So test β_control = 0.
If the hypothesis is rejected, estimate the interaction as β_GE = β_case − β_control; otherwise, estimate it as β_GE = β_case.
Then test whether β_GE = 0 for interaction.
Not a good idea!!

Weighted estimate
Estimate based on a preliminary test T for β_control = 0:
$$\hat\beta_{GE,PT} = I(|T| < c)\,\hat\beta_{case} + I(|T| > c)\,\hat\beta_{GE}$$
This is a weighted average of the case-only and case-control estimates, where the weights are indicator functions.
We can do better without requiring a preliminary test!!
$$\hat\beta_{GE,w} = w\,\hat\beta_{case} + (1-w)\,\hat\beta_{GE}$$
Choose w to minimize squared error loss. Bayes risk:
$$E_{data}\left\{ E_{\beta_{GE}\mid data}\left(\hat\beta_{GE,w} - \beta_{GE}\right)^2 \right\}$$
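The two estimators on this slide differ only in how the weight is chosen. A minimal sketch: T is taken to be the Wald statistic β̂_control / SE(β̂_control) and c = 1.96, both assumptions for illustration; the function names are hypothetical.

```python
import math


def preliminary_test_estimate(beta_case, beta_ge, beta_control, var_control, c=1.96):
    """I(|T| < c) * beta_case + I(|T| > c) * beta_ge.

    T is taken to be the Wald statistic beta_control / SE(beta_control)
    (an assumption); beta_ge is the case-control estimate.
    """
    t_stat = beta_control / math.sqrt(var_control)
    return beta_case if abs(t_stat) < c else beta_ge


def weighted_estimate(beta_case, beta_ge, w):
    """w * beta_case + (1 - w) * beta_ge for a weight 0 <= w <= 1."""
    return w * beta_case + (1.0 - w) * beta_ge
```

The preliminary-test estimator is the special case w = I(|T| < c), so it jumps between the two estimates as |T| crosses c; a Bayes choice of w changes smoothly with the data.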

Bayes estimate
w is a function of Var(β̂_GE) and t² = Var(β̂_GE − β̂_case).
Shrinkage estimation. Alternative explanation: β_GE = β_case + e, where e is the error due to assuming G and E independence in controls.
An estimate of e is
$$\hat e = \hat\beta_{GE} - \hat\beta_{case}, \qquad \hat e \sim N(e, t^2)$$
Prior for e: N(0, σ²). The Bayes estimate of e is
$$E(e\mid\hat e) = \hat e\,\frac{\sigma^2}{\sigma^2 + t^2}$$
M & C (2008) suggest estimating σ² by ê² and t² by Var(β̂_GE).
Empirical Bayes estimate:
$$\hat\beta_{GE,B} = \hat\beta_{case} + \hat E(e\mid\hat e)$$
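A minimal sketch of the empirical Bayes shrinkage estimate. It assumes the plug-in choices σ̂² = ê² and t̂² = Var(β̂_GE), which is our reading of the Mukherjee and Chatterjee (2008) estimator; treat them as assumptions. The function name is hypothetical and the numbers in the usage line are made up.

```python
def eb_shrinkage(beta_case, beta_ge, var_ge):
    """Empirical Bayes estimate beta_case + E(e | e_hat).

    e_hat = beta_ge - beta_case estimates the error e of the case-only estimate.
    With prior e ~ N(0, sigma2) and e_hat ~ N(e, t2),
    E(e | e_hat) = e_hat * sigma2 / (sigma2 + t2).
    Plug-in choices (assumed here): sigma2 = e_hat**2 and t2 = var_ge.
    """
    e_hat = beta_ge - beta_case
    sigma2 = e_hat ** 2
    t2 = var_ge
    if sigma2 + t2 == 0.0:  # degenerate case: nothing to shrink
        return beta_case
    return beta_case + e_hat * sigma2 / (sigma2 + t2)


# Hypothetical inputs: case-only 0.5, case-control 1.0 with variance 0.25.
print(eb_shrinkage(0.5, 1.0, 0.25))  # 0.75, halfway between the two
```

When ê is large relative to the standard error of β̂_GE (evidence against G-E independence in controls), the estimate moves toward the case-control estimate; when ê is small, it stays near the more efficient case-only estimate.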

Advanced Colorectal Adenoma Example
610 cases and 605 controls
G = NAT2 acetylation (yes, no); E = smoking (never, past, current)
Note: lack of G and E independence in controls, so the case-control estimate is needed.
EB estimate and credible interval: is 0 in the interval?

Summary
Uncertainty about an underlying assumption gives two possible estimates.
Bayes estimate: a weighted average of the two (shrinkage estimation).
Data-adaptive estimation of prior parameters.
Minimize squared error loss.

Part 4. Bayesian penalized estimation: priors to minimize various loss functions

Part 4a. Bayesian Ridge Regression: minimize squared error loss, normal prior

GWAS data (Chen and Witte 2007, AJHG, 81: )
57 unrelated individuals of European ancestry (CEU), HapMap project
Outcome = expression of the CHI3L2 gene (Cheung et al 2005, Nature, 437: )
Risk factors = 39,186 SNPs from chromosome 1, Illumina 550K array from HapMap
SNP rs deemed causal for CHI3L2 expression
Goal: how well are the neighboring SNPs ranked?

Application to GWAS
Y = continuous (or binary) outcome, of length N (subjects)
X_m = m-th SNP, m = 1, 2, …, M (= 500K, say)
For each SNP, model: Y = µ_m + X_m β_m + error, where β_m is the effect of SNP m
Get the MLE, standard error and p-value
Find the significant SNPs: the SNPs having the 500 smallest p-values
Chen and Witte AJHG, 81:
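The single-SNP analysis amounts to M separate simple linear regressions. A minimal standard-library sketch: genotypes are assumed coded 0/1/2, p-values use a normal approximation to the t distribution (reasonable for moderate N), and `snp_scan` and the toy data are hypothetical.

```python
import math


def snp_scan(y, genotypes):
    """Fit Y = mu_m + X_m * beta_m + error separately for each SNP m.

    y: N outcomes; genotypes: M lists of N genotype codes (e.g. 0/1/2).
    Returns (m, beta_hat, std_err, p_value) tuples sorted by p-value.
    P-values use a normal approximation to the t distribution.
    """
    n = len(y)
    y_bar = sum(y) / n
    results = []
    for m, x in enumerate(genotypes):
        x_bar = sum(x) / n
        sxx = sum((xi - x_bar) ** 2 for xi in x)
        if sxx == 0:  # monomorphic SNP: effect not estimable
            continue
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        beta = sxy / sxx
        mu = y_bar - beta * x_bar
        rss = sum((yi - mu - beta * xi) ** 2 for xi, yi in zip(x, y))
        se = math.sqrt(rss / (n - 2) / sxx)
        p = math.erfc(abs(beta / se) / math.sqrt(2))  # two-sided p-value
        results.append((m, beta, se, p))
    return sorted(results, key=lambda r: r[3])


# Toy data: SNP 1 affects y, SNP 0 does not.
snp0 = [1, 0, 2, 2, 0, 1, 0, 2, 1, 1, 2, 0]
snp1 = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]
noise = [0.1, -0.2, 0.05, -0.1, 0.15, -0.05, 0.2, -0.15, 0.0, 0.1, -0.1, 0.05]
y = [2 * x + e for x, e in zip(snp1, noise)]
ranked = snp_scan(y, [snp0, snp1])
top_500 = ranked[:500]  # the 500 smallest p-values (here just 2 SNPs)
```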

Hierarchical modeling
Incorporate external information about SNPs: bioinformatics data (Z matrix, user-specified), e.g. conservation, various functional categories.
β = Zπ + U, where β has length G, Z is G × K, π is K × 1, and U ~ N(0, t²T) with T specified.
Improved estimation via the second-stage model: the prior for β is N(Zπ, t²T).
Need (β − Zπ)ᵀ T⁻¹ (β − Zπ) / t² to be small: penalization.

Posterior inference via MCMC
Markov chain Monte Carlo approach to get the βs.
Specify priors for β, π, σ²: π ~ N(0, ·) and 1/σ² ~ Gamma(·, ·), with specified hyperparameters.
Specify a prior for t², or fix t².
Generate samples from the full conditional distributions: β | Y, π, σ², t²; π | Y, β, σ², t²; σ² | Y, β, π, t²; etc.

Iteration   β parameters
1           β_1  β_2  …  β_G
2           β_1  β_2  …  β_G
…           β_1  β_2  …  β_G

Posterior summaries: Avg(β_1), Stdev(β_1); Avg(β_2), Stdev(β_2); …; Avg(β_G), Stdev(β_G)
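A minimal Gibbs sampler for a simplified version of this hierarchical model, to show how β and π are drawn from their full conditionals and then summarized by posterior averages and standard deviations. The simplifications are ours: the first stage is summarized by per-SNP estimates β̂_g with known variances V_g, Z is a column of ones (so π is a single mean), T = I, t² is fixed, and π has a flat prior. The function name is hypothetical.

```python
import random


def gibbs_hierarchical(beta_hat, v, t2, n_iter=20000, burn_in=2000, seed=7):
    """Gibbs sampler for beta_hat_g ~ N(beta_g, v_g), beta_g ~ N(pi, t2), flat prior on pi.

    Simplified two-stage model (Z = column of ones, T = I, t2 fixed, v_g known).
    Returns (posterior mean, posterior sd) for each beta_g.
    """
    rng = random.Random(seed)
    g = len(beta_hat)
    beta = list(beta_hat)  # starting values
    pi = sum(beta_hat) / g
    draws = [[] for _ in range(g)]
    for it in range(n_iter):
        # beta_g | pi, data: precision-weighted average of beta_hat_g and pi
        for k in range(g):
            prec = 1.0 / v[k] + 1.0 / t2
            mean = (beta_hat[k] / v[k] + pi / t2) / prec
            beta[k] = rng.gauss(mean, (1.0 / prec) ** 0.5)
        # pi | beta: normal centred at the average of the current betas
        pi = rng.gauss(sum(beta) / g, (t2 / g) ** 0.5)
        if it >= burn_in:  # keep post-burn-in draws only
            for k in range(g):
                draws[k].append(beta[k])
    summaries = []
    for d in draws:
        m = sum(d) / len(d)
        sd = (sum((x - m) ** 2 for x in d) / (len(d) - 1)) ** 0.5
        summaries.append((m, sd))
    return summaries


beta_hat = [0.5, 1.5, -0.2, 2.0, 0.8]
v = [0.2, 0.5, 0.3, 1.0, 0.4]
post = gibbs_hierarchical(beta_hat, v, t2=0.5)
```

In this simplified model the posterior means also have a closed form, (1 − W_g) β̂_g + W_g π̂ with W_g = V_g / (V_g + t²), which makes the sampler easy to check.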

Chen and Witte GWAS Example
[Figure: p-values of the top 500 SNPs]

So, what is going on?
Y = µ_m + X_m β_m + error
MLEs of the βs: β̂ = (β̂_1, β̂_2, …, β̂_G)ᵀ, with variance V̂.
Second stage: β = Zπ + U, U ~ N(0, t²T).
MLE of π:
$$\hat\pi = (Z^\top S^{-1} Z)^{-1} Z^\top S^{-1}\hat\beta, \qquad S = \hat V + t^2 T$$
Bayes estimate of the βs:
$$\tilde\beta = (I - W)\,\hat\beta + W Z\hat\pi, \qquad W = \hat V S^{-1}$$
Large t²: W → 0 and β̃ → β̂. Small t²: W → I and β̃ → Zπ̂.
Shrinkage estimation
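The closed-form shrinkage estimate can be sketched directly. Assumptions for this sketch: V̂ is diagonal (independent per-SNP estimates), T = I, and Z has a single column z (one external covariate), so S and W are diagonal and π̂ is a scalar weighted least-squares estimate. The function name and numbers are illustrative.

```python
def shrinkage_estimate(beta_hat, v, z, t2):
    """beta_tilde = (I - W) beta_hat + W Z pi_hat, W = V S^{-1}, S = V + t2 I.

    beta_hat: per-SNP MLEs; v: their variances (diagonal of V);
    z: one external covariate per SNP (a single-column Z); t2: prior variance.
    """
    s = [vg + t2 for vg in v]  # diagonal of S
    # pi_hat = (Z' S^{-1} Z)^{-1} Z' S^{-1} beta_hat, a scalar here
    pi_hat = (sum(zg * bg / sg for zg, bg, sg in zip(z, beta_hat, s))
              / sum(zg * zg / sg for zg, sg in zip(z, s)))
    w = [vg / sg for vg, sg in zip(v, s)]  # diagonal of W
    beta_tilde = [(1 - wg) * bg + wg * zg * pi_hat
                  for wg, bg, zg in zip(w, beta_hat, z)]
    return beta_tilde, pi_hat


beta_hat = [0.5, 1.5, -0.2, 2.0]
v = [0.2, 0.5, 0.3, 1.0]
z = [1.0, 2.0, 0.0, 3.0]
for t2 in (1e6, 1.0, 1e-6):
    print(t2, shrinkage_estimate(beta_hat, v, z, t2)[0])
```

Between the two extremes, SNPs whose estimates are imprecise (large V_g) are pulled more strongly toward Zπ̂.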

Some Remarks
Sensitivity to the choice of prior parameters.
Instead of a p-value, use P(β_m > 0), m = 1, …, G.
The Bayes estimate β̃ should ideally not be too sensitive to the choice of Z: the estimated value of π will depend upon Z, but ideally the Bayes estimate should not.
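Given MCMC output, P(β_m > 0 | data) is the fraction of posterior draws above zero; under a normal approximation to the posterior it is Φ(mean / sd). A minimal sketch; both helper names are hypothetical.

```python
import math


def prob_positive_from_draws(draws):
    """Monte Carlo estimate of P(beta > 0 | data): share of posterior draws above 0."""
    return sum(1 for b in draws if b > 0) / len(draws)


def prob_positive_normal(post_mean, post_sd):
    """P(beta > 0) under a N(post_mean, post_sd**2) approximation: Phi(mean / sd)."""
    return 0.5 * math.erfc(-post_mean / (post_sd * math.sqrt(2)))


print(prob_positive_from_draws([-1.0, 0.5, 2.0, 3.0]))  # 0.75
print(round(prob_positive_normal(1.96, 1.0), 3))        # 0.975
```

SNPs can then be ranked by this probability rather than by p-value.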

Part 4b. Bayesian LASSO: minimize absolute deviation, Laplace prior

Diabetes data (Efron et al 2004, The Annals of Statistics, 32: )

Application to the diabetes study
Y = continuous (or other type of) outcome (N × 1); X = N × p matrix of risk factors; β = p × 1 vector of effects (parameters of interest).
Find the significant risk factors: Y = Xβ + error.
Park and Casella (2008). J Am Stat Assoc, 103:
Many (large p), potentially correlated risk factors, etc.
Estimate β with an L1 penalty on β − β_0 for some β_0 (LASSO):
$$\hat\beta = \arg\min_\beta \; \|Y - X\beta\|^2 + \lambda \sum_{j=1}^{p} |\beta_j - \beta_{0j}|$$
β_0 = 0, or β_0 = Zπ with Z given and π to be estimated.

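Bayesian LASSO
The penalty ‖β − β_0‖₁ corresponds to exp{−‖β − β_0‖₁}, which takes the form of a Laplace distribution.
Y = Xβ + error, error ~ N(0, σ²I)
Laplace prior for β_j with mean β_0j:
$$f(\beta_j) = \frac{\lambda}{2\sigma}\exp\left\{-\frac{\lambda\,|\beta_j - \beta_{0j}|}{\sigma}\right\} = \int_0^\infty \frac{1}{\sqrt{2\pi\sigma^2 t_j^2}}\exp\left\{-\frac{(\beta_j - \beta_{0j})^2}{2\sigma^2 t_j^2}\right\}\frac{\lambda^2}{2}\exp\left\{-\frac{\lambda^2 t_j^2}{2}\right\} dt_j^2$$
A mixture of a normal prior for β_j and an exponential prior for its variance.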
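The mixture identity can be checked by simulation: draw t_j² from an exponential distribution with rate λ²/2, then β_j from N(β_0j, σ² t_j²). The draws should follow the Laplace density above, whose mean absolute deviation from β_0j is σ/λ. The values of λ, σ and β_0j below are arbitrary, and the function name is hypothetical.

```python
import random


def sample_laplace_mixture(lam, sigma, beta0, n, seed=3):
    """Draw n values of beta_j via the normal / exponential scale mixture."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        t2 = rng.expovariate(lam ** 2 / 2)               # t_j^2 ~ Exponential(rate = lam^2 / 2)
        out.append(rng.gauss(beta0, sigma * t2 ** 0.5))  # beta_j ~ N(beta0, sigma^2 t_j^2)
    return out


lam, sigma, beta0 = 2.0, 1.5, 0.3
draws = sample_laplace_mixture(lam, sigma, beta0, n=100000)
mad = sum(abs(b - beta0) for b in draws) / len(draws)
print(mad, sigma / lam)  # the two numbers should be close (about 0.75)
```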

Bayesian LASSO setup
$$Y \mid \beta, \sigma^2 \sim N(X\beta, \sigma^2 I)$$
$$\beta_j \mid \sigma^2, t_j^2 \sim N(0, \sigma^2 t_j^2), \quad j = 1, \ldots, p \text{ independent}$$
$$t_j^2 \sim \text{exponential with rate } \lambda^2/2, \quad j = 1, \ldots, p \text{ independent}$$
$$\sigma^2 \sim \text{Inverse Gamma}(a_1, a_2)$$
The t_j² are latent variables that facilitate the MCMC steps.
a_1 and a_2 are specified (check for sensitivity).
λ²: estimated empirically from the data, or given a prior, generally Gamma(c_1, c_2).

Parameter Estimation
Get the full conditionals and apply MCMC.
Bayes estimate of β: posterior median.
Original LASSO: quadratic programming methods.
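A minimal Gibbs sketch for the Bayesian LASSO with a single covariate, to make the full conditionals concrete. Simplifications (ours): p = 1, no intercept, σ² treated as known instead of given the inverse gamma prior, and λ fixed. The full conditional of 1/t² is inverse Gaussian (as in Park and Casella 2008) and is sampled here with the Michael, Schucany and Haas transformation method; the Bayes estimate is the posterior median of the β draws. Function names and data are illustrative.

```python
import math
import random
import statistics


def rand_inverse_gaussian(rng, mu, shape):
    """One draw from InverseGaussian(mean=mu, shape) (Michael-Schucany-Haas method)."""
    y = rng.gauss(0.0, 1.0) ** 2
    x = mu + mu * mu * y / (2 * shape) - (mu / (2 * shape)) * math.sqrt(
        4 * mu * shape * y + (mu * y) ** 2)
    return x if rng.random() <= mu / (mu + x) else mu * mu / x


def bayes_lasso_1d(x, y, lam, sigma2, n_iter=20000, burn_in=2000, seed=11):
    """Posterior median of beta for y = x * beta + N(0, sigma2) error.

    Prior: beta | t2 ~ N(0, sigma2 * t2), t2 ~ Exponential(rate = lam**2 / 2),
    i.e. a Laplace prior. sigma2 and lam are held fixed (simplification).
    """
    rng = random.Random(seed)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    inv_t2 = 1.0  # starting value for 1 / t2
    draws = []
    for it in range(n_iter):
        # beta | t2, y ~ N(sxy / a, sigma2 / a) with a = sxx + 1 / t2
        a = sxx + inv_t2
        beta = rng.gauss(sxy / a, math.sqrt(sigma2 / a))
        # 1 / t2 | beta ~ InverseGaussian(mean = lam * sigma / |beta|, shape = lam**2)
        mu = lam * math.sqrt(sigma2) / max(abs(beta), 1e-12)
        inv_t2 = rand_inverse_gaussian(rng, mu, lam ** 2)
        if it >= burn_in:
            draws.append(beta)
    return statistics.median(draws)


x = [1.0, 2.0, -1.0, -2.0, 1.0, 2.0, -1.0, -2.0]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.1, -0.3, 0.2]
y = [0.5 * xi + e for xi, e in zip(x, noise)]
ols = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
weak = bayes_lasso_1d(x, y, lam=0.1, sigma2=1.0)    # close to least squares
strong = bayes_lasso_1d(x, y, lam=5.0, sigma2=1.0)  # pulled toward 0
print(round(ols, 3), round(weak, 3), round(strong, 3))
```

A small λ leaves the estimate near least squares, while a large λ shrinks it toward 0, which is the LASSO behaviour expressed through the prior.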

Part 4c. Other Bayesian Penalization Methods: brief survey

Bridge Regression
Estimate β by penalizing
$$\sum_{j=1}^{p} |\beta_j - Z_j\pi|^\gamma$$
γ is pre-specified: γ = 1 is (Bayesian) LASSO; γ = 2 is (Bayesian) ridge.
Fu 1998, JCGS, 7:

Bayesian Elastic net
Estimate β by penalizing
$$\lambda \sum_{j=1}^{p} |\beta_j - Z_j\pi| + (1-\lambda)\sum_{j=1}^{p} (\beta_j - Z_j\pi)^2$$
A compromise between the LASSO and ridge penalties.
Normal prior constrained within certain bounds.
Hans (2011). J Am Stat Assoc, 106:
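The bridge and elastic net penalties are simple functions of the deviations β_j − Z_jπ. A minimal sketch; the centering values Z_jπ are passed in directly as `center`, and the function names are hypothetical.

```python
def bridge_penalty(beta, center, gamma):
    """sum_j |beta_j - center_j|**gamma; gamma = 1 gives LASSO, gamma = 2 ridge."""
    return sum(abs(b - c) ** gamma for b, c in zip(beta, center))


def elastic_net_penalty(beta, center, lam):
    """lam * (L1 penalty) + (1 - lam) * (squared L2 penalty), with 0 <= lam <= 1."""
    return lam * bridge_penalty(beta, center, 1) + (1 - lam) * bridge_penalty(beta, center, 2)


beta = [1.0, -2.0, 0.5]
center = [0.0, 0.0, 0.0]  # e.g. Z_j * pi = 0 for every j
print(bridge_penalty(beta, center, 1), bridge_penalty(beta, center, 2),
      elastic_net_penalty(beta, center, 0.5))  # 3.5 5.25 4.375
```

Setting lam = 1 or lam = 0 recovers the LASSO or ridge penalty, and intermediate values interpolate between them.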

Software Packages
WinBUGS: specify the model for the outcome; specify priors; outputs estimated values of β and other parameters; uses MCMC methods; diagnostic plots. contents.shtml
SAS: Proc MCMC. HTML/default/viewer.htm#mcmc_toc.htm

References: Textbooks
JS Maritz, T Lwin (1989). Empirical Bayes Methods. Chapman and Hall.
JM Bernardo, AFM Smith (1993). Bayesian Theory. Wiley.
BP Carlin, TA Louis (1996). Bayes and Empirical Bayes Methods for Data Analysis. Chapman and Hall.
A Gelman, JB Carlin, HS Stern, DB Rubin (1996). Bayesian Data Analysis. Chapman and Hall.
WR Gilks, S Richardson, DJ Spiegelhalter (1996). Markov Chain Monte Carlo in Practice. Chapman and Hall.
T Hastie, R Tibshirani, J Friedman (2001). The Elements of Statistical Learning. Springer.

References: Some papers
R Tibshirani (1996). Regression shrinkage and selection via the Lasso. JRSS Series B, 58:
J Fu (1998). Penalized regression: The Bridge versus the Lasso. JCGS, 7:
MA Newton, Y Lee (2000). Inferring the location and effect of tumor suppressor genes by instability-selection modeling of allelic-loss data. Biometrics, 56:
JM Satagopan, K Offit, W Foulkes, ME Robson, S Wacholder, CM Eng, SE Karp, CB Begg (2001). The lifetime risks of breast cancer in Ashkenazi Jewish carriers of BRCA1 and BRCA2 mutations. Cancer Epidemiology, Biomarkers and Prevention, 10:
CM Kendziorski, MA Newton, H Lan, MN Gould (2003). On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Statistics in Medicine, 22:
D Conti, V Cortessis, J Molitor, DC Thomas (2003). Bayesian modeling of complex metabolic pathways. Human Heredity, 56:
B Efron, T Hastie, I Johnstone, R Tibshirani (2004). Least angle regression. The Annals of Statistics, 32:
B Mukherjee, N Chatterjee (2008). Exploiting gene-environment independence for analysis of case-control studies: An empirical Bayes-type shrinkage estimator to trade-off between bias and efficiency. Biometrics, 64:
GK Chen, JS Witte (2007). Enriching the analysis of genomewide association studies with hierarchical modeling. AJHG, 81:
T Park, G Casella (2008). The Bayesian Lasso. JASA, 103:
M Park, T Hastie (2008). Penalized logistic regression for detecting gene interactions. Biostatistics, 9:
C Hans (2011). Elastic net regression modeling with the orthant normal prior. JASA, 106:
Many more: Bioinformatics, Genetic Epidemiology, JASA, JRSS Series B and C, PLoS One
