Simultaneous Inference for Multiple Testing and Clustering via Dirichlet Process Mixture Models
David B. Dahl
Department of Statistics, Texas A&M University
with Marina Vannucci, Michael Newton, and Qianxing Mo
3rd Lehmann Symposium, Rice University, May 16, 2007
Collaborators: Michael Newton, Quincy Mo, Marina Vannucci

References:
- D. B. Dahl, M. A. Newton (200?), Multiple Hypothesis Testing by Clustering Treatment Effects of Correlated Objects, Journal of the American Statistical Association, accepted.
- D. B. Dahl (2006), Model-Based Clustering for Expression Data via a Dirichlet Process Mixture Model, in Bayesian Inference for Gene Expression and Proteomics, Kim-Anh Do, Peter Müller, Marina Vannucci (Eds.), Cambridge University Press.
- D. B. Dahl, Q. Mo, M. Vannucci (200?), Simultaneous Inference for Multiple Testing and Clustering via a Dirichlet Process Mixture Model, Statistical Modelling: An International Journal, accepted.
Outline
1. Motivation
2. Z-scores Demonstration
3. BEMMA for Differential Expression (Dahl, Newton 200?)
4. BEMMA for Clustering (Dahl 200?)
5. SIMTAC for DE and Clustering (Dahl, Mo, Vannucci 200?)
Two Statistical Tasks

Multiple hypothesis testing:
- Goal: detect a shift in the marginal distribution of gene expression.
- Statistical dependence among genes is a nuisance parameter.

Clustering:
- Goal: group genes that are highly correlated.
- Correlation may reflect underlying biological factors of interest.

We propose a hybrid methodology.

Main Idea: simultaneously infer the clustering and test for differential expression.

Other work: Storey (2007), Yuan & Kendziorski (2006), Tibshirani & Wasserman (2006)
[Figures: expression of Gene 1, Gene 2, and both together, plotted against treatment.]
Elementary Setup

- Parameters θ_1, ..., θ_n for n observations.
- Hypotheses: H_0i: θ_i = 0 versus H_ai: θ_i > 0.
- Test statistics Z_1, ..., Z_n are independent, with Z_i ~ N(θ_i, 1).
Method I

Standard Z-test: reject H_0i if Z_i > z, where z is chosen to achieve the desired size. The test has power

    1 - Φ(z - θ_i),

where Φ(x) is the standard normal distribution function evaluated at x.
Method II

Assumes a known clustering: c_ij = I{θ_i = θ_j}. Test statistic:

    S_i = Z_i + Σ_{j ≠ i} c_ij Z_j.

The test has power

    1 - Φ(z - √n(i) θ_i),   where n(i) = Σ_{j=1}^n c_ij.

Method II is never less powerful than Method I.
Method III

Clustering indicators c_ij are estimated:

    ĉ_ij = c_ij        with probability 1 - γ,
           1 - c_ij    with probability γ,

where γ is the error rate of the clustering. Take Method II, but replace c_ij with ĉ_ij to form Ŝ_i. Under an assumption about the distribution of θ_1, ..., θ_n, the test has power

    1 - Φ(z - k θ_i),

where k is a constant involving γ, n(i), etc.
[Figure: Power(Method III) / Power(Method I) as a function of the clustering error probability γ, plotted for cluster sizes including n(i) = 30.]
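The power gain from pooling over a cluster can be checked numerically. A minimal sketch, assuming the setup above (one-sided size-α Z-tests, a known cluster of size n(i) = 30 sharing θ_i = 1); the particular values are illustrative, not from the slides:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

theta, n_i, alpha = 1.0, 30, 0.05
z = norm.ppf(1 - alpha)  # critical value for a one-sided size-alpha test

# Method I: per-gene Z-test, power = 1 - Phi(z - theta)
power1 = 1 - norm.cdf(z - theta)

# Method II: sum over the known cluster; S_i / sqrt(n(i)) ~ N(sqrt(n(i)) * theta, 1)
power2 = 1 - norm.cdf(z - np.sqrt(n_i) * theta)

# Monte Carlo check of the Method II power formula
reps = 20000
Z = rng.normal(theta, 1.0, size=(reps, n_i))
S = Z.sum(axis=1) / np.sqrt(n_i)  # standardized cluster-sum statistic
power2_mc = np.mean(S > z)

print(power1, power2, power2_mc)
```

With θ_i = 1 and n(i) = 30, Method II is dramatically more powerful than the per-gene test, as the formula 1 - Φ(z - √n(i) θ_i) predicts.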
BEMMA for Differential Expression (Dahl, Newton 200?)

Bayesian Effects Model for Microarrays (BEMMA):
- Conjugate Dirichlet process mixture (DPM) model.
- Identifies differentially expressed genes by borrowing strength from genes likely to have the same parameters.
- Averages over clustering uncertainty.

Sampling model:

    y_gtr | μ_g, τ_gt, λ_g ~ N(y_gtr | μ_g + τ_gt, λ_g),

where r is the replicate, t is the treatment, and g is the gene.

Genes g and g′ come from the same cluster if and only if:

    (τ_g1, ..., τ_gT, λ_g) = (τ_g′1, ..., τ_g′T, λ_g′)
Nuisance Parameters

Gene-specific means μ_1, ..., μ_G are not related to differential expression or clustering.

[Figure: gene expression versus treatment, illustrating the gene-specific mean.]

Let d_g be a vector whose elements are y_gtr - ȳ_g1 for t ≥ 2.
Model

Sampling distribution:

    d_g | τ_g, λ_g ~ N_N(d_g | X τ_g, λ_g M),

where τ_g = (τ_g2, ..., τ_gT), M = (I + R^{-1} J)^{-1}, and X is a design matrix picking off the appropriate element of τ_g.

Clustering based on (τ, λ) via a Dirichlet process prior:

    (τ_g, λ_g) | F ~ F,    F ~ DP(α F_0),

where F_0 is conjugate to the likelihood.
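The DP(αF_0) prior induces a distribution over clusterings of the genes; marginally this is the Chinese restaurant process. A minimal sketch of drawing one clustering from that prior (illustrative code, not the authors' implementation):

```python
import numpy as np

def crp_sample(n, alpha, rng):
    """Draw one clustering of n items from the Chinese restaurant process,
    the partition distribution implied by a DP(alpha * F0) prior."""
    labels = [0]          # first item starts the first cluster
    counts = [1]          # current cluster sizes
    for _ in range(1, n):
        # join cluster k with prob proportional to its size,
        # or open a new cluster with prob proportional to alpha
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)   # new cluster
        else:
            counts[k] += 1     # existing cluster k
        labels.append(int(k))
    return labels

rng = np.random.default_rng(1)
print(crp_sample(10, alpha=1.0, rng=rng))
```

Larger α favors more clusters; α → 0 puts all genes in one cluster.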
Model Fitting

- The τ's and λ's may be integrated away, leaving only the clustering of the G genes.
- Sample from the posterior clustering distribution using MCMC:
  - Gibbs sampler of MacEachern (1994) and Neal (1992)
  - Merge-split of Jain & Neal (2004)
  - Merge-split of Dahl (2003)
- After MCMC, it is easy to sample the τ's and λ's given the clustering.
Inference on Differential Expression

Define a univariate parameter q_g that encodes the hypothesis of interest. For example, the global F-test in the one-way ANOVA setting is analogous to:

    q_g = Σ_{t=2}^T τ_gt².

Estimate q_g under squared-error loss by computing its expectation with respect to p(q_g | d_1, ..., d_G). Rank genes by evidence for differential expression using the estimates q̂_1, ..., q̂_G.
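Given posterior draws of the treatment effects, the squared-error-loss estimate of q_g is just the Monte Carlo average of Σ_t τ_gt² over the draws. A hypothetical sketch (simulated draws stand in for real MCMC output; dimensions and the differentially expressed gene are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
B, G, T = 1000, 5, 3                   # posterior samples, genes, effect dimensions
tau = rng.normal(0.0, 1.0, size=(B, G, T))  # pretend these are MCMC draws of tau_gt
tau[:, 0, :] += 2.0                    # make gene 0 look differentially expressed

# q_hat[g] = E[ sum_t tau_gt^2 | data ], estimated by averaging over draws
q_hat = np.mean(np.sum(tau ** 2, axis=2), axis=0)

ranking = np.argsort(-q_hat)           # genes ordered by evidence for DE
print(ranking)
```

Averaging q_g over the clustering draws is what lets a gene borrow strength from the genes it is co-clustered with.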
Simulation Study

Some other methods for differential expression:
- EBarrays (Kendziorski, Newton, et al., 2003)
- LIMMA (Smyth, 2004)

Comparison based on the proportion of false discoveries.

Simulated datasets:
- Time-course experiment: three time points, two treatment conditions.
- 300 of 1,200 genes are differentially expressed.
- Interest lies in genes that are differentially expressed at one or more time points.
- Four levels of clustering:
  - Heavy clustering: 12 clusters of 100 genes each.
  - Moderate clustering: 60 clusters of 20 genes each.
  - Weak clustering: 240 clusters of 5 genes each.
  - No clustering.
[Figures: false discovery rate versus number of discoveries under heavy, moderate, weak, and no clustering.]
Paraquat Experiment

- Old and young mice treated with a paraquat injection.
- Sacrificed at baseline or 1, 3, 5, or 7 hours after injection.
- Three replicates per treatment.
- 10,043 probe sets on Affymetrix MG-U74A arrays.
- Background correction and normalization using RMA (Irizarry et al., 2003).
- Biologists are interested in genes whose expression between old and young is similar at baseline and very different at one hour.
[Figure: estimated treatment effects for probe set 92885_at, showing deviation from reference for young and old mice at baseline and 1, 3, 5, and 7 hours.]
Inference on Clustering (Dahl 200?)

The MCMC sampling algorithm produces B clusterings π^(1), ..., π^(B) from the posterior clustering distribution.

Point estimation methods:
- Maximum a posteriori (MAP) clustering.
- Medvedovic & Sivaganesan (2002): hierarchical clustering using pairwise probabilities.
- Dahl (2006): stochastic search to minimize the posterior expected loss of Binder (1978).
- Lau & Green (200?): heuristic to minimize the posterior expected loss of Binder (1978).

Least-squares clustering selects the observed clustering closest to the matrix of pairwise probabilities in terms of squared distances:

    π_LS = argmin_{π ∈ {π^(1), ..., π^(B)}} Σ_{i=1}^G Σ_{j=1}^G (δ_ij(π) - p̂_ij)²    (1)
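Equation (1) is straightforward to implement given the MCMC output. A minimal sketch, assuming the B posterior clusterings are stored as a B x G matrix of cluster labels (toy labels below are invented for illustration):

```python
import numpy as np

def least_squares_clustering(draws):
    """Among posterior clustering draws (a B x G matrix of cluster labels),
    return the draw minimizing the squared distance to the matrix of
    pairwise co-clustering probabilities, as in equation (1)."""
    draws = np.asarray(draws)
    # delta[b, i, j] = 1 if items i and j share a cluster in draw b
    delta = (draws[:, :, None] == draws[:, None, :]).astype(float)
    p_hat = delta.mean(axis=0)                    # pairwise probabilities
    losses = ((delta - p_hat) ** 2).sum(axis=(1, 2))
    return draws[np.argmin(losses)]

draws = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 1, 1, 1]]
print(least_squares_clustering(draws))  # selects the majority draw [0, 0, 1, 1]
```

Because the minimizer is restricted to the observed draws, no search over the (enormous) space of all partitions is needed.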
Adjusted Rand Index for BEMMA and Other Methods

Adjusted Rand index with 95% C.I., by degree of clustering. Larger values indicate better agreement between the estimated and true clustering.

Heavy clustering:
    MCLUST                           (0.380, 0.447)
    BEMMA (least-squares)            (0.373, 0.431)
    BEMMA (MAP)                      (0.362, 0.419)
    HCLUST (effects, average)        (0.247, 0.308)
    HCLUST (effects, complete)       (0.242, 0.279)
    HCLUST (correlation, complete)   (0.144, 0.180)
    HCLUST (correlation, average)    (0.141, 0.172)

Moderate clustering:
    BEMMA (least-squares)            (0.146, 0.163)
    MCLUST                           (0.136, 0.152)
    BEMMA (MAP)                      (0.119, 0.135)
    HCLUST (effects, complete)       (0.111, 0.123)
    HCLUST (effects, average)        (0.095, 0.107)
    HCLUST (correlation, average)    (0.075, 0.083)
    HCLUST (correlation, complete)   (0.068, 0.078)

Weak clustering:
    MCLUST                           (0.048, 0.052)
    HCLUST (effects, complete)       (0.043, 0.048)
    BEMMA (least-squares)            (0.040, 0.043)
    HCLUST (effects, average)        (0.035, 0.038)
    BEMMA (MAP)                      (0.030, 0.033)
    HCLUST (correlation, average)    (0.027, 0.030)
    HCLUST (correlation, complete)   (0.025, 0.029)
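The adjusted Rand index used in the tables above compares an estimated clustering to the truth, correcting for chance agreement. A self-contained sketch of the standard Hubert-Arabie formula (not the authors' code):

```python
import numpy as np
from math import comb

def adjusted_rand_index(a, b):
    """Adjusted Rand index between two clusterings given as label sequences."""
    a, b = np.asarray(a), np.asarray(b)
    ua, ub = np.unique(a), np.unique(b)
    # contingency table: n[i, j] = number of items in cluster i of a and j of b
    n = np.array([[np.sum((a == i) & (b == j)) for j in ub] for i in ua])
    sum_ij = sum(comb(int(x), 2) for x in n.ravel())
    sum_a = sum(comb(int(x), 2) for x in n.sum(axis=1))
    sum_b = sum(comb(int(x), 2) for x in n.sum(axis=0))
    total = comb(len(a), 2)
    expected = sum_a * sum_b / total          # expected index under chance
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)

# identical partitions (up to relabeling) score 1.0
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

The index is 1 for identical partitions, near 0 for independent ones, and can be negative when agreement is worse than chance.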
[Figure: posterior distribution of the number of clusters.]
[Figures: effects intensity plots, with genes grouped by inferred clusters across baseline and 1, 3, 5, and 7 hours for young and old mice: the cluster containing probe set 92885_at (250 genes), all clusters, and the smallest clusters.]
SIMTAC (Dahl, Mo, Vannucci 200?)

Simultaneous Inference for Multiple Testing and Clustering (SIMTAC):
- Extension of BEMMA.
- Separates clustering of regression coefficients from the accommodation of heteroscedasticity.
- Wider class of experimental designs.
- No need to specify an arbitrary reference treatment.
- Nonconjugate Dirichlet process mixture (DPM) model.

Sampling distribution:

    d_g | μ_g, β_g, λ_g ~ N_K(d_g | μ_g j + X β_g, λ_g M)

Prior distributions:

    μ_g ~ N(μ_g | m_μ, p_μ)
    β_g | G_β ~ G_β,    G_β ~ DP(α_β Ḡ_β)
    λ_g | G_λ ~ G_λ,    G_λ ~ DP(α_λ Ḡ_λ)
Simulation Study

[Table: Clusters in a Synthetic Dataset. For the 216 clusters in each synthetic dataset, the table shows the relationships among the regression coefficients encoding equivalent and differential expression at the three time points (constraints of the form β_g,3 = 0, β_g,4 = β_g,1, β_g,5 = β_g,2, and their negations), together with the number of clusters and the size of each cluster in every configuration.]
[Figure: proportion of false discoveries versus number of discoveries for ANOVA, BEMMA, and SIMTAC.]

[Figure: difference in the proportion of false discoveries (ANOVA minus SIMTAC, and BEMMA minus SIMTAC) versus number of discoveries.]
Summary

- Dependence can be exploited to improve power in multiple testing.
- Dirichlet process mixture (DPM) models provide powerful machinery for simultaneous inference on clustering and multiple hypothesis testing.

BEMMA:
- Under weak clustering, BEMMA performs as well as its peers; under heavier clustering, it performs substantially better.
- BEMMA has been successfully applied to a replicated microarray study with 10,000+ probe sets and 10 treatment conditions.

SIMTAC:
- Improved implementation of the idea.
- Simulation results are encouraging; now being applied to local data.

Least-squares clustering:
- A convenient and conceptually appealing procedure for point estimation of a clustering.
Genetic Association Studies in the Presence of Population Structure and Admixture Purushottam W. Laud and Nicholas M. Pajewski Division of Biostatistics Department of Population Health Medical College
More informationPart 8: GLMs and Hierarchical LMs and GLMs
Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course
More informationSTAT 425: Introduction to Bayesian Analysis
STAT 425: Introduction to Bayesian Analysis Marina Vannucci Rice University, USA Fall 2017 Marina Vannucci (Rice University, USA) Bayesian Analysis (Part 1) Fall 2017 1 / 10 Lecture 7: Prior Types Subjective
More informationParameter Estimation. William H. Jefferys University of Texas at Austin Parameter Estimation 7/26/05 1
Parameter Estimation William H. Jefferys University of Texas at Austin bill@bayesrules.net Parameter Estimation 7/26/05 1 Elements of Inference Inference problems contain two indispensable elements: Data
More informationAdvanced Statistical Methods: Beyond Linear Regression
Advanced Statistical Methods: Beyond Linear Regression John R. Stevens Utah State University Notes 3. Statistical Methods II Mathematics Educators Worshop 28 March 2009 1 http://www.stat.usu.edu/~jrstevens/pcmi
More informationFalse discovery rate and related concepts in multiple comparisons problems, with applications to microarray data
False discovery rate and related concepts in multiple comparisons problems, with applications to microarray data Ståle Nygård Trial Lecture Dec 19, 2008 1 / 35 Lecture outline Motivation for not using
More informationIntroduction to Machine Learning. Lecture 2
Introduction to Machine Learning Lecturer: Eran Halperin Lecture 2 Fall Semester Scribe: Yishay Mansour Some of the material was not presented in class (and is marked with a side line) and is given for
More informationIntroduction to Probabilistic Machine Learning
Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning
More informationSummary and discussion of: Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing
Summary and discussion of: Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing Statistics Journal Club, 36-825 Beau Dabbs and Philipp Burckhardt 9-19-2014 1 Paper
More informationA New Bayesian Variable Selection Method: The Bayesian Lasso with Pseudo Variables
A New Bayesian Variable Selection Method: The Bayesian Lasso with Pseudo Variables Qi Tang (Joint work with Kam-Wah Tsui and Sijian Wang) Department of Statistics University of Wisconsin-Madison Feb. 8,
More informationSimilarity Measures and Clustering In Genetics
Similarity Measures and Clustering In Genetics Daniel Lawson Heilbronn Institute for Mathematical Research School of mathematics University of Bristol www.paintmychromosomes.com Talk outline Introduction
More informationSTAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01
STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01 Nasser Sadeghkhani a.sadeghkhani@queensu.ca There are two main schools to statistical inference: 1-frequentist
More informationDETECTING DIFFERENTIALLY EXPRESSED GENES WHILE CONTROLLING THE FALSE DISCOVERY RATE FOR MICROARRAY DATA
University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Dissertations and Theses in Statistics Statistics, Department of 2009 DETECTING DIFFERENTIALLY EXPRESSED GENES WHILE CONTROLLING
More informationClustering using Mixture Models
Clustering using Mixture Models The full posterior of the Gaussian Mixture Model is p(x, Z, µ,, ) =p(x Z, µ, )p(z )p( )p(µ, ) data likelihood (Gaussian) correspondence prob. (Multinomial) mixture prior
More informationApproximating high-dimensional posteriors with nuisance parameters via integrated rotated Gaussian approximation (IRGA)
Approximating high-dimensional posteriors with nuisance parameters via integrated rotated Gaussian approximation (IRGA) Willem van den Boom Department of Statistics and Applied Probability National University
More informationBayes methods for categorical data. April 25, 2017
Bayes methods for categorical data April 25, 2017 Motivation for joint probability models Increasing interest in high-dimensional data in broad applications Focus may be on prediction, variable selection,
More informationMarginal Specifications and a Gaussian Copula Estimation
Marginal Specifications and a Gaussian Copula Estimation Kazim Azam Abstract Multivariate analysis involving random variables of different type like count, continuous or mixture of both is frequently required
More informationIntroduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf
1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf 2013-14 We know that X ~ B(n,p), but we do not know p. We get a random sample
More informationExam: high-dimensional data analysis January 20, 2014
Exam: high-dimensional data analysis January 20, 204 Instructions: - Write clearly. Scribbles will not be deciphered. - Answer each main question not the subquestions on a separate piece of paper. - Finish
More informationBayesian Regression Linear and Logistic Regression
When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we
More informationVCMC: Variational Consensus Monte Carlo
VCMC: Variational Consensus Monte Carlo Maxim Rabinovich, Elaine Angelino, Michael I. Jordan Berkeley Vision and Learning Center September 22, 2015 probabilistic models! sky fog bridge water grass object
More informationConstruction of Dependent Dirichlet Processes based on Poisson Processes
1 / 31 Construction of Dependent Dirichlet Processes based on Poisson Processes Dahua Lin Eric Grimson John Fisher CSAIL MIT NIPS 2010 Outstanding Student Paper Award Presented by Shouyuan Chen Outline
More informationPart 7: Hierarchical Modeling
Part 7: Hierarchical Modeling!1 Nested data It is common for data to be nested: i.e., observations on subjects are organized by a hierarchy Such data are often called hierarchical or multilevel For example,
More informationBayesian Inference of Multiple Gaussian Graphical Models
Bayesian Inference of Multiple Gaussian Graphical Models Christine Peterson,, Francesco Stingo, and Marina Vannucci February 18, 2014 Abstract In this paper, we propose a Bayesian approach to inference
More informationMCMC algorithms for fitting Bayesian models
MCMC algorithms for fitting Bayesian models p. 1/1 MCMC algorithms for fitting Bayesian models Sudipto Banerjee sudiptob@biostat.umn.edu University of Minnesota MCMC algorithms for fitting Bayesian models
More informationBayesian non-parametric model to longitudinally predict churn
Bayesian non-parametric model to longitudinally predict churn Bruno Scarpa Università di Padova Conference of European Statistics Stakeholders Methodologists, Producers and Users of European Statistics
More informationIntroduction to Bayesian Statistics with WinBUGS Part 4 Priors and Hierarchical Models
Introduction to Bayesian Statistics with WinBUGS Part 4 Priors and Hierarchical Models Matthew S. Johnson New York ASA Chapter Workshop CUNY Graduate Center New York, NY hspace1in December 17, 2009 December
More informationThe Calibrated Bayes Factor for Model Comparison
The Calibrated Bayes Factor for Model Comparison Steve MacEachern The Ohio State University Joint work with Xinyi Xu, Pingbo Lu and Ruoxi Xu Supported by the NSF and NSA Bayesian Nonparametrics Workshop
More informationExperimental Design and Data Analysis for Biologists
Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1
More informationProbe-Level Analysis of Affymetrix GeneChip Microarray Data
Probe-Level Analysis of Affymetrix GeneChip Microarray Data Ben Bolstad http://www.stat.berkeley.edu/~bolstad Biostatistics, University of California, Berkeley University of Minnesota Mar 30, 2004 Outline
More informationMonte Carlo Studies. The response in a Monte Carlo study is a random variable.
Monte Carlo Studies The response in a Monte Carlo study is a random variable. The response in a Monte Carlo study has a variance that comes from the variance of the stochastic elements in the data-generating
More informationA Unified Approach for Simultaneous Gene Clustering and Differential Expression Identification
A Unified Approach for Simultaneous Gene Clustering and Differential Expression Identification Ming Yuan and Christina Kendziorski (March 17, 2005) Abstract Although both clustering and identification
More informationHow To Use CORREP to Estimate Multivariate Correlation and Statistical Inference Procedures
How To Use CORREP to Estimate Multivariate Correlation and Statistical Inference Procedures Dongxiao Zhu June 13, 2018 1 Introduction OMICS data are increasingly available to biomedical researchers, and
More informationHybrid Copula Bayesian Networks
Kiran Karra kiran.karra@vt.edu Hume Center Electrical and Computer Engineering Virginia Polytechnic Institute and State University September 7, 2016 Outline Introduction Prior Work Introduction to Copulas
More informationExtending the Robust Means Modeling Framework. Alyssa Counsell, Phil Chalmers, Matt Sigal, Rob Cribbie
Extending the Robust Means Modeling Framework Alyssa Counsell, Phil Chalmers, Matt Sigal, Rob Cribbie One-way Independent Subjects Design Model: Y ij = µ + τ j + ε ij, j = 1,, J Y ij = score of the ith
More informationSummary of Extending the Rank Likelihood for Semiparametric Copula Estimation, by Peter Hoff
Summary of Extending the Rank Likelihood for Semiparametric Copula Estimation, by Peter Hoff David Gerard Department of Statistics University of Washington gerard2@uw.edu May 2, 2013 David Gerard (UW)
More informationExpression arrays, normalization, and error models
1 Epression arrays, normalization, and error models There are a number of different array technologies available for measuring mrna transcript levels in cell populations, from spotted cdna arrays to in
More informationData Preprocessing. Data Preprocessing
Data Preprocessing 1 Data Preprocessing Normalization: the process of removing sampleto-sample variations in the measurements not due to differential gene expression. Bringing measurements from the different
More informationComputer Vision Group Prof. Daniel Cremers. 14. Clustering
Group Prof. Daniel Cremers 14. Clustering Motivation Supervised learning is good for interaction with humans, but labels from a supervisor are hard to obtain Clustering is unsupervised learning, i.e. it
More informationNuisance parameters and their treatment
BS2 Statistical Inference, Lecture 2, Hilary Term 2008 April 2, 2008 Ancillarity Inference principles Completeness A statistic A = a(x ) is said to be ancillary if (i) The distribution of A does not depend
More informationHierarchical Nearest-Neighbor Gaussian Process Models for Large Geo-statistical Datasets
Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geo-statistical Datasets Abhirup Datta 1 Sudipto Banerjee 1 Andrew O. Finley 2 Alan E. Gelfand 3 1 University of Minnesota, Minneapolis,
More informationLearning Sequence Motif Models Using Expectation Maximization (EM) and Gibbs Sampling
Learning Sequence Motif Models Using Expectation Maximization (EM) and Gibbs Sampling BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 009 Mark Craven craven@biostat.wisc.edu Sequence Motifs what is a sequence
More informationBayesian Statistics. Debdeep Pati Florida State University. April 3, 2017
Bayesian Statistics Debdeep Pati Florida State University April 3, 2017 Finite mixture model The finite mixture of normals can be equivalently expressed as y i N(µ Si ; τ 1 S i ), S i k π h δ h h=1 δ h
More informationOutline. Binomial, Multinomial, Normal, Beta, Dirichlet. Posterior mean, MAP, credible interval, posterior distribution
Outline A short review on Bayesian analysis. Binomial, Multinomial, Normal, Beta, Dirichlet Posterior mean, MAP, credible interval, posterior distribution Gibbs sampling Revisit the Gaussian mixture model
More informationMCMC: Markov Chain Monte Carlo
I529: Machine Learning in Bioinformatics (Spring 2013) MCMC: Markov Chain Monte Carlo Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington Spring 2013 Contents Review of Markov
More informationNonparametric Bayes tensor factorizations for big data
Nonparametric Bayes tensor factorizations for big data David Dunson Department of Statistical Science, Duke University Funded from NIH R01-ES017240, R01-ES017436 & DARPA N66001-09-C-2082 Motivation Conditional
More informationNearest Neighbor Gaussian Processes for Large Spatial Data
Nearest Neighbor Gaussian Processes for Large Spatial Data Abhi Datta 1, Sudipto Banerjee 2 and Andrew O. Finley 3 July 31, 2017 1 Department of Biostatistics, Bloomberg School of Public Health, Johns
More informationUnsupervised machine learning
Chapter 9 Unsupervised machine learning Unsupervised machine learning (a.k.a. cluster analysis) is a set of methods to assign objects into clusters under a predefined distance measure when class labels
More information10-810: Advanced Algorithms and Models for Computational Biology. Optimal leaf ordering and classification
10-810: Advanced Algorithms and Models for Computational Biology Optimal leaf ordering and classification Hierarchical clustering As we mentioned, its one of the most popular methods for clustering gene
More informationBIOLOGY 111. CHAPTER 1: An Introduction to the Science of Life
BIOLOGY 111 CHAPTER 1: An Introduction to the Science of Life An Introduction to the Science of Life: Chapter Learning Outcomes 1.1) Describe the properties of life common to all living things. (Module
More informationIdentifying Mixtures of Mixtures Using Bayesian Estimation
Identifying Mixtures of Mixtures Using Bayesian Estimation arxiv:1502.06449v3 [stat.me] 20 Jun 2016 Gertraud Malsiner-Walli Department of Applied Statistics, Johannes Kepler University Linz and Sylvia
More information