Variable Selection in Structured High-dimensional Covariate Spaces
Fan Li (Department of Health Care Policy, Harvard University)
Nancy Zhang (Department of Statistics, Stanford University)
May
Problem Formulation

Consider the standard linear regression problem:

$$Y = X\beta + \varepsilon, \qquad (1)$$

where $Y$ is the $n \times 1$ response, $X$ is the $n \times p$ covariate matrix, and $\varepsilon$ is the $n \times 1$ vector of independent errors. Many problems in genomics fall into this scenario:

1. Known structure exists in the covariate space.
2. $p$ is large (possibly $> n$), and we want a sparse estimate of $\beta$.
Example 1: Chromosome Copy Number Data

$X_{i,j}$: noisy measure of copy number at location $j$ in person $i$.
$Y_i$: quantitative trait for person $i$.
Scale: $i$ in the tens or hundreds, $j$ in the thousands.

[Figure: an example row $X_{i,:}$.] At each location, we observe a noisy measurement of the underlying copy number.
Example 1: Chromosome Copy Number

Pollack et al. (2002) breast cancer data set. [Figure.]
Expression of Key Oncogenes

ERBB2 has elevated expression in 30% of breast cancers and is correlated with more aggressive cancer. What controls the expression of ERBB2?
Example 2: Biological Motif Analysis

$X_{i,w}$: count of occurrences of word $w$ upstream of gene $i$.
$Y_i$: expression of gene $i$.
Scale: $i$ in the thousands, $w$ in the thousands.

Examples of $w$: ACGCGTT, ACGCGTG, TCGCGTA, TCGCGGA. The motifs for a single transcription factor tend to be clustered. More on this later...
Review of Structured Variable Selection

Lasso ($L_1$ penalty) type:
1. Fused lasso (Tibshirani et al., 2005): 1-d smoothing.
2. Group lasso (Yuan and Lin, 2006): variables added and dropped in groups.

Bayesian variable selection (BVS) framework (next slide):
1. Gibbs sampler (George and McCulloch, 1993).
2. Many improvements since then...

We apply BVS in structured settings to genomic analysis.
Review: Latent Variable Model

Define latent variables $\gamma_i \in \{0,1\}$, $i = 1, \ldots, p$. Conditioned on $\gamma_i$:

$$\beta_i \mid \gamma_i \sim (1-\gamma_i)\, N(0, \tau_i^2) + \gamma_i\, N(0, \sigma^2 c_i^2 \tau_i^2). \qquad (2)$$

Special case (point mass at zero):

$$\beta_i \mid \gamma_i \sim (1-\gamma_i)\, I_0 + \gamma_i\, N(0, \nu_i^2). \qquad (3)$$

Conjugate prior for the variance:

$$\sigma^2 \mid \gamma \sim \mathrm{IG}(\nu_\gamma/2,\ \nu_\gamma \lambda_\gamma/2).$$

Likelihood for the observed data:

$$Y \mid \beta, X \sim N(X\beta, \sigma^2 I).$$
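To make the prior concrete, here is a minimal Python sketch of drawing coefficients from the point-mass special case (3); the inclusion probability and slab scale are illustrative values, not ones from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 200
pi1 = 0.05          # illustrative marginal inclusion probability (assumption)
nu_slab = 1.0       # illustrative slab standard deviation nu_i (assumption)

gamma = rng.binomial(1, pi1, size=p)            # latent inclusion indicators
beta = np.where(gamma == 1,
                rng.normal(0.0, nu_slab, p),    # slab N(0, nu_i^2) when gamma_i = 1
                0.0)                            # point mass at 0 when gamma_i = 0
```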
First, a Simple Markov Prior

Transition matrix:

$$P = \begin{pmatrix} p & 1-p \\ 1-q & q \end{pmatrix}.$$

Assume that $\gamma_1 \sim \pi$, where $\pi = \left( \frac{1-q}{2-p-q},\ \frac{1-p}{2-p-q} \right)$ is the stationary distribution of $P$.

It is more interpretable to re-parameterize the prior as:
- $r = \pi_1/\pi_0 = \frac{1-p}{1-q}$: prior probability ratio of inclusion of a variable;
- $w = \frac{q}{1-p}$: fold change in the probability of inclusion of the next variable.
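A small numeric sketch of this reparameterization (the values of $p$ and $q$ are illustrative only):

```python
import numpy as np

# Two-state Markov prior on (gamma_i): state 0 = excluded, state 1 = included.
p, q = 0.98, 0.60   # illustrative self-transition probabilities P(0->0)=p, P(1->1)=q

P = np.array([[p, 1 - p],
              [1 - q, q]])
pi = np.array([1 - q, 1 - p]) / (2 - p - q)   # stationary distribution of P
assert np.allclose(pi @ P, pi)

r = pi[1] / pi[0]    # prior odds of inclusion, (1-p)/(1-q)
w = q / (1 - p)      # fold change in inclusion prob. of the next variable
print(pi, r, w)
```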
Two Strategies for the Gibbs Sampler

Auxiliary chain: $f(\gamma \mid Y)$ is sampled from the auxiliary Markov chain

$$\beta^0, \sigma^0, \gamma^0,\ \beta^1, \sigma^1, \gamma^1,\ \ldots$$

Direct chain: $f(\gamma \mid Y)$ is sampled directly from

$$\gamma^0, \gamma^1, \gamma^2, \ldots$$
Auxiliary Chain

Posterior distribution of $\beta$:

$$\beta^j \sim N_p\!\left(A_{\gamma^{j-1}} X'Y,\ A_{\gamma^{j-1}}\right), \qquad (4)$$

where $A_{\gamma^{j-1}} = \left[X'X + D_{\gamma^{j-1}}^{-1} R^{-1} D_{\gamma^{j-1}}^{-1}\right]^{-1}$.

The hierarchical structure of the model, $\gamma_i \to \beta_i \to Y$, implies

$$f(\gamma^j \mid Y, \beta^{j-1}, \sigma^{j-1}) = f(\gamma^j \mid \beta^{j-1}).$$

The posterior $f(\gamma^j \mid \beta^{j-1})$ is an inhomogeneous Markov chain.

Posterior distribution of $\sigma$:

$$\sigma^j \sim \mathrm{IG}\!\left(\frac{n + \nu_{\gamma^{j-1}}}{2},\ \frac{\|Y - X\beta^j\|^2 + \nu_{\gamma^{j-1}} \lambda_{\gamma^{j-1}}}{2}\right).$$
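A minimal sketch of these two conditional draws, assuming $D_\gamma$, $R$, $\nu$, and $\lambda$ are supplied; this follows the displayed forms rather than any released code:

```python
import numpy as np

def draw_beta_sigma(X, Y, D_gamma, R, nu, lam, rng):
    """One auxiliary-chain update: draw beta as in (4), then sigma^2 | beta.
    A sketch; D_gamma, R, nu, lam are assumed given by the prior."""
    n, p = X.shape
    Dinv = np.diag(1.0 / np.diag(D_gamma))
    A = np.linalg.inv(X.T @ X + Dinv @ np.linalg.inv(R) @ Dinv)
    beta = rng.multivariate_normal(A @ X.T @ Y, A)     # beta^j ~ N_p(A X'Y, A)
    resid = Y - X @ beta
    shape = (n + nu) / 2.0
    scale = (resid @ resid + nu * lam) / 2.0
    sigma2 = scale / rng.gamma(shape)                  # inverse-gamma draw
    return beta, sigma2
```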
Auxiliary Chain: Computational Notes

Two computationally intensive tasks:
1. Inverting a $p \times p$ matrix to obtain $A_\gamma$.
2. Computing the square root of $A_\gamma$.

Traditionally done by Cholesky-type decomposition: $O(p^3)$ computation per sweep.

Key observation: very few $\gamma_i$ change between two consecutive sweeps. Key ideas:
1. Low-rank update of the matrix inverse (e.g., the Sherman-Morrison-Woodbury formula).
2. Low-rank update of the Cholesky decomposition (e.g., algorithms C1-C4 in Gill et al., 1974).

Combining the two gives a fast update algorithm with $O(lp^2)$ computation per sweep, where $l$ is the average number of $\gamma_i$ that change per sweep.
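For intuition, a NumPy sketch of the rank-one Sherman-Morrison update that underlies the low-rank strategy (illustrative only; the actual algorithm combines such updates with Cholesky-factor updates):

```python
import numpy as np

def sherman_morrison_update(Ainv, u, v):
    """Return (A + u v')^{-1} given Ainv = A^{-1}: an O(p^2) rank-one update,
    versus O(p^3) for re-inverting from scratch."""
    Au = Ainv @ u
    vA = v @ Ainv
    return Ainv - np.outer(Au, vA) / (1.0 + v @ Au)

# Check against direct inversion on a small random example.
rng = np.random.default_rng(1)
p = 5
A = np.eye(p) + 0.1 * rng.standard_normal((p, p))
u, v = rng.standard_normal(p), rng.standard_normal(p)
direct = np.linalg.inv(A + np.outer(u, v))
updated = sherman_morrison_update(np.linalg.inv(A), u, v)
print(np.abs(direct - updated).max())   # should be ~0
```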
Direct Chain

Built on the special Gaussian mixture prior (3). $\gamma_i \mid \gamma_{(-i)}$ is sampled based on:

$$P(\gamma_i = 1 \mid \gamma_{(-i)}, Y) = \frac{P(\gamma_i = 1 \mid \gamma_{(-i)})}{P(\gamma_i = 1 \mid \gamma_{(-i)}) + \mathrm{BF} \cdot P(\gamma_i = 0 \mid \gamma_{(-i)})},$$

where BF is the Bayes factor $\mathrm{BF} = \frac{P(Y \mid \gamma_i = 0, \gamma_{(-i)})}{P(Y \mid \gamma_i = 1, \gamma_{(-i)})}$. Collapsing over $\sigma$ and $\beta$,

$$\mathrm{BF} = \nu_i\, \frac{\Gamma\!\left(\frac{n - n_i + 1 + \nu}{2}\right)}{\Gamma\!\left(\frac{n - n_i + \nu}{2}\right)} \frac{|A_{(-i)}|^{1/2}}{|A_i|^{1/2}} \frac{\left(Y'Y - Y'X_{I_i} A_i^{-1} X_{I_i}' Y + \nu\lambda\right)^{\frac{n - n_i + \nu}{2}}}{\left(Y'Y - Y'X_{I_{(-i)}} A_{(-i)}^{-1} X_{I_{(-i)}}' Y + \nu\lambda\right)^{\frac{n - n_i + 1 + \nu}{2}}},$$

with $A_i = X_{I_i}' X_{I_i} + D_{I_i}^{-2}$ in the conjugate setup.
Direct Chain: Computational Notes

Two main computational tasks:
1. Inverting the $n_i \times n_i$ (resp. $n_{(-i)} \times n_{(-i)}$) matrix $A_i$ (resp. $A_{(-i)}$).
2. Computing the determinants of $A_i$ and $A_{(-i)}$ (equivalent to a decomposition).

Both are required for each $\gamma_i$ in every sweep: $O(\bar{n}^3 p)$ computation per sweep by standard Cholesky, where $\bar{n}$ is the average model size.

Further speed-ups:
1. $A_i^{-1}$ is updated from $A_{(-i)}^{-1}$ using block matrix inversion: $O(n_i^2)$ computation.
2. The Cholesky factor $A_i^{1/2}$ is updated from $A_{(-i)}^{1/2}$ via a low-rank update: $O(n_i^2)$ computation.
3. One of $A_i$ and $A_{(-i)}$ is always the same as $A_{i-1}$ or $A_{(-(i-1))}$ from the previous update.

Overall, our fast update algorithm is $O(\bar{n}^2 p)$ per sweep. For sparse models, this becomes doable for large $p$.
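A sketch of the block-matrix-inversion update for adding one variable to the model, written for the conjugate $A_i = X_{I_i}'X_{I_i} + D_{I_i}^{-2}$ form; the variable names are ours, not the authors':

```python
import numpy as np

def add_variable_inverse(Ainv, X_in, x_new, d2_new):
    """Given Ainv = (X_in' X_in + D^{-2})^{-1} for the current model, return the
    inverse after adding one column x_new with prior variance term d2_new.
    Uses the block-inverse formula: O(n_i^2) instead of O(n_i^3)."""
    b = X_in.T @ x_new                    # cross products with current columns
    c = x_new @ x_new + 1.0 / d2_new      # new diagonal entry of A
    Ab = Ainv @ b
    s = c - b @ Ab                        # Schur complement (scalar)
    top_left = Ainv + np.outer(Ab, Ab) / s
    return np.block([[top_left, -Ab[:, None] / s],
                     [-Ab[None, :] / s, np.array([[1.0 / s]])]])
```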
Simulation Study: Design

$p = 200$ predictors, 2 blocks of $\gamma = 1$. $X$ i.i.d. $N(0,1)$, $Y = X\beta_\gamma + N(0, \sigma_\varepsilon^2)$. The sampler was run at various levels of the smoothing parameter $w$. We are interested in low signal-to-noise situations: $\sigma_\varepsilon^2 = 1$, $\beta = 0.8$.
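The design translates directly into a data-generation sketch; the sample size and block positions below are illustrative, since the slide does not state them:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, beta_val, sigma_eps = 100, 200, 0.8, 1.0   # n is an assumption; p, beta, sigma from the slide

gamma = np.zeros(p, dtype=int)
gamma[40:50] = 1       # two blocks of included variables (positions illustrative)
gamma[120:130] = 1

X = rng.standard_normal((n, p))                  # X i.i.d. N(0, 1)
beta = beta_val * gamma                          # beta_gamma: signal only where gamma = 1
Y = X @ beta + sigma_eps * rng.standard_normal(n)
```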
Simulation Results

Structure in $\gamma$ is used to obtain better estimates. [Figure.]
Finding Regulators of the ERBB2 Gene

$p = 6000$, $n = 41$; the sparsity $r$ was adjusted to fix the average model size. Multiple random restarts, 100,000 sweeps per round of Monte Carlo. For this data set, there was not much difference between $w = 1$ and $w = 10$; however, a few low signals did jump out consistently for $w = 10$.
Finding Regulators of the ERBB2 Gene

Gene            Notes
PTPRN2          Tumor suppressor gene, target of methylation in human cancers.
NR1H3, STAT1    Transcription factors that regulate cell growth; a sizeable body of data implicates these factors in the oncogenesis of breast cancer.
MINK
MLN64           Co-regulated with ERBB2 (Alpy et al., 2003, Oncogene).
ERBB2           Locus for ERBB2.
Biological Motif Detection
Transcription Regulation

Transcription factors:
1. regulate gene expression by helping (or inhibiting) transcription initiation;
2. bind to DNA in a sequence-specific manner;
3. play an important part in the much larger picture of expression regulation.

Ultimate goal: learn the grammar of transcription regulation.
Data Description

For each gene $g$: promoter sequence $S_g$ and expression $Y_g$.
Regression Model

Bussemaker, Li, and Siggia (2001):

$$Y_g = \beta_0 + \sum_{m=1}^{M} \beta_m X_g(m) + \text{error},$$

where $X_g(m)$ is the count of word $m$ in $S_g$ and $Y_g$ is the log expression of gene $g$.

[Figures: fitted time courses for MCB (ACGCGT), SCB (TTTCGCG), and an arbitrary word (TGATATC), each at 21 minutes.]
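A toy sketch of this word-count regression; the sequences, words, and expression values below are made up for illustration:

```python
import numpy as np

def word_counts(promoters, words):
    """Count occurrences (with overlaps) of each word in each promoter sequence."""
    X = np.zeros((len(promoters), len(words)))
    for g, seq in enumerate(promoters):
        for m, w in enumerate(words):
            X[g, m] = sum(seq[k:k + len(w)] == w
                          for k in range(len(seq) - len(w) + 1))
    return X

# Toy data (illustrative): promoter sequences, log expression, candidate words.
promoters = ["ACGCGTACGCGTTTTT", "TTTTTTGATATCAAAA",
             "ACGCGTTTTCGCGAAA", "AAAATTTTCCCCGGGG"]
expression = np.array([1.2, -0.3, 0.8, 0.1])
words = ["ACGCGT", "TGATATC", "TTTCGCG"]

X = word_counts(promoters, words)
X1 = np.column_stack([np.ones(len(promoters)), X])   # add intercept beta_0
beta_hat, *_ = np.linalg.lstsq(X1, expression, rcond=None)
```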
Modeling Motif Degeneracy

1. Inexact matches are allowed.
2. Not all positions are created equal.
Information Content of Positions

Pattern observed for most transcription factors. [Figure: information content by position.] Dimeric binding: two such peaks separated by a short distance.

This information has been noted by several studies:
1. Eisen, 2005, Genome Biology.
2. Kechris et al.; Keles et al., 2002.
Hypercube Model

We consider all words of length $L = 6, 7$ to lie in a graph. There is an edge between words $w_1, w_2$ if $d_{\mathrm{Hamming}}(w_1, w_2) = 1$. The weight on the edge depends on the position of the differing letter. The full graph is hard to draw; here is a 2-D simplification: [figure].
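A small sketch of building this weighted Hamming graph; the position-weight function is an illustrative choice echoing the $B_{i,j}$ weights used later:

```python
from itertools import product

def hypercube_edges(L, position_weight):
    """Build weighted edges between all DNA words of length L at Hamming
    distance 1; the weight depends on the position of the differing letter."""
    alphabet = "ACGT"
    edges = {}
    for word in map("".join, product(alphabet, repeat=L)):
        for pos in range(L):
            for letter in alphabet:
                if letter > word[pos]:   # count each unordered pair once
                    neighbor = word[:pos] + letter + word[pos + 1:]
                    edges[(word, neighbor)] = position_weight(pos)
    return edges

# Illustrative weights: end positions (1 and 6) matter less than the middle.
edges = hypercube_edges(6, lambda pos: 1 if pos in (0, 5) else 2)
```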
A More General Model: Ising Prior for γ

$$P(\gamma) = \frac{e^{\alpha'\gamma + \gamma' B \gamma}}{\psi(\alpha, B)},$$

where $\alpha = (\alpha_1, \ldots, \alpha_p)$ and $B = (b_{i,j})_{p \times p}$ are hyperparameters, and $\psi(\alpha, B)$ is the normalizing constant:

$$\psi(\alpha, B) = \sum_{\gamma \in \{0,1\}^p} e^{\alpha'\gamma + \gamma' B \gamma}.$$

Ising Prior: Posterior Computation

For each $i$, the conditional distribution

$$P(\gamma_i \mid \gamma_{(-i)}) = \frac{e^{\gamma_i \left(\alpha_i + \sum_{j \in I_{(-i)}} b_{ij} \gamma_j\right)}}{1 + e^{\alpha_i + \sum_{j \in I_{(-i)}} b_{ij} \gamma_j}}$$

can be computed efficiently for sparse $B$. Apply to structured model selection:

$$P(\gamma_i = 1 \mid \gamma_{(-i)}, Y) = \frac{P(\gamma_i = 1 \mid \gamma_{(-i)})}{P(\gamma_i = 1 \mid \gamma_{(-i)}) + \mathrm{BF} \cdot P(\gamma_i = 0 \mid \gamma_{(-i)})}.$$
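Putting the two displays together, one direct-chain update of $\gamma_i$ looks roughly like the following sketch; `B_sparse` and `bayes_factor` are hypothetical helpers standing in for the sparse hyperparameter matrix and the collapsed Bayes factor:

```python
import numpy as np

def gibbs_step_gamma(i, gamma, alpha, B_sparse, bayes_factor, rng):
    """One direct-chain update of gamma_i under the Ising prior: combine the
    conditional prior odds with BF = P(Y|gamma_i=0,.)/P(Y|gamma_i=1,.).
    B_sparse maps i -> list of (j, b_ij) pairs; bayes_factor(i, gamma) is a
    hypothetical callback supplied by the regression model."""
    field = alpha[i] + sum(b * gamma[j] for j, b in B_sparse.get(i, []))
    prior_odds = np.exp(field)               # P(gamma_i=1|rest) / P(gamma_i=0|rest)
    bf = bayes_factor(i, gamma)
    prob1 = prior_odds / (prior_odds + bf)   # posterior inclusion probability
    gamma[i] = int(rng.random() < prob1)
    return gamma
```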
Example: Regulatory Motifs for the Yeast Cell Cycle

[Figures: periodic expression time series; PCA of the cell cycle experiment.]
Motif Analysis Results I

Motif length 6. Edge weights:

$$B_{i,j} = \begin{cases} 1, & \text{differing position} \in \{1, 6\}; \\ 2, & \text{differing position} \in \{2, 3, 4, 5\}. \end{cases}$$

[Figure: distances between the top 100 motifs found by our model.]
Motif Analysis Results II

Signals that were found with smoothing (posterior probability > 0.05) but lost without it:

Words                                       Description
ACGCGT, TCGCGT, TCGCGA, GCGCGT, CCGCGT      MCB binding site in CLN1
TGCTGG, GGCTGG                              SWI5 binding site
ACGGGT                                      MCM1 binding site in CLN3
TCGCGG, TCGGGT                              REB1 binding sites

16 new motifs total.
Notes on Hyperparameter Selection

For the motif model, we have so far selected hyperparameters somewhat arbitrarily, for good computational properties.

For the 1-d lattice:
- $(r, w)$ code for sparsity and smoothness.
- Given $w$, $r$ can be chosen analytically to set the desired model size. Since the algorithm is $O(p\bar{n}^2)$, this is practically very important.

For general graphs:
- Model size is no longer as easy to control; phase transition behavior.
- Asymmetry in the graph: $\alpha$ should not be constant.
- Interpretation of the hyperparameters is not as straightforward.
- Favor certain structures a priori, not certain variables.
Conclusions and Extensions

- General Ising model for variable selection in structured covariate spaces.
- 1-d lattice: reduces to a simple Markov model applicable to problems in genomic profiling studies.
- L-d hypercube: a natural model for motif detection.
- Computationally feasible for p > 1000.

Extensions:
- Hyperparameter selection for biological motif discovery.
- Convergence speed-up.
- Nonlinear regression models.

Thank you!