Variable Selection in Structured High-dimensional Covariate Spaces


Variable Selection in Structured High-dimensional Covariate Spaces
Fan Li (Department of Health Care Policy, Harvard University) and Nancy Zhang (Department of Statistics, Stanford University)
May 14, 2007

Problem Formulation

Consider the standard linear regression problem:
Y = Xβ + ε,   (1)
where Y is the n × 1 response, X is the n × p covariate matrix, and ε is the n × 1 vector of independent errors.

Many problems in genomics fall into this scenario:
1. Known structure exists in the covariate space.
2. p is large (possibly > n), and we want a sparse estimate of β.

Example 1: Chromosome Copy Number Data

X_{i,j}: noisy measure of copy number at location j in person i.
Y_i: quantitative trait for person i.
Scale: i in the tens or hundreds, j in the thousands.

Example of X_{i,·}: at each location, we observe a noisy measurement of the underlying copy number.

Example 1: Chromosome Copy Number Pollack et al. (2002) breast cancer data set:

Expression of key oncogenes: ERBB2 has elevated expression in 30% of breast cancers. It is correlated with more aggressive cancer. What controls the expression of ERBB2?

Example 2: Biological Motif Analysis

X_{i,w}: count of occurrences of word w upstream of gene i.
Y_i: expression of gene i.
Scale: i in the thousands, w in the thousands.

Examples of w: ACGCGTT, ACGCGTG, TCGCGTA, TCGCGGA. The motifs for a single transcription factor tend to be clustered. More on this later...

Review of Structured Variable Selection

Lasso (L1 penalty) type:
1. Fused Lasso (Tibshirani et al., 2005): 1-d smoothing.
2. Grouped Lasso (Yuan and Lin, 2006): variables added and dropped in groups.

Bayesian variable selection (BVS) framework (next slide):
1. Gibbs sampler, George and McCulloch (1993).
2. Many improvements since then...
We apply BVS in structured settings to genomic analysis.

Review: Latent Variable Model

Define latent variables: γ_i ∈ {0, 1}, i = 1, ..., p.

Conditioned on γ_i:
β_i | γ_i ~ (1 − γ_i) N(0, τ_i²) + γ_i N(0, σ² c_i² τ_i²).   (2)

Special case (point mass at zero):
β_i | γ_i ~ (1 − γ_i) I_0 + γ_i N(0, ν_i²).   (3)

Conjugate prior for the variance:
σ² | γ ~ IG(ν_γ/2, ν_γ λ_γ/2).

Likelihood for the observed data:
Y | β, X ~ N(Xβ, σ² I).
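As a concrete illustration of the point-mass spike-and-slab prior (3) and the Gaussian likelihood, here is a minimal simulation sketch in Python; the hyperparameter values and dimensions are illustrative assumptions, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 100, 200          # sample size and number of covariates (illustrative)
pi1 = 0.05               # prior inclusion probability (illustrative)
nu = 1.0                 # slab standard deviation nu_i in prior (3)
sigma = 1.0              # error standard deviation

# Latent inclusion indicators gamma_i in {0, 1}
gamma = rng.binomial(1, pi1, size=p)

# Spike-and-slab prior (3): beta_i = 0 if gamma_i = 0, else N(0, nu^2)
beta = np.where(gamma == 1, rng.normal(0.0, nu, size=p), 0.0)

# Likelihood: Y | beta, X ~ N(X beta, sigma^2 I)
X = rng.normal(size=(n, p))
Y = X @ beta + rng.normal(0.0, sigma, size=n)

print("true model size:", gamma.sum())
```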

First, a simple Markov prior

Transition matrix P = [[p, 1 − p], [1 − q, q]].

Assume that γ_1 ~ π, where π = ((1 − q)/(2 − p − q), (1 − p)/(2 − p − q)) is the stationary distribution of P.

It is more interpretable to re-parameterize the prior as:
r = π_1/π_0 = (1 − p)/(1 − q): prior probability ratio of inclusion of a variable;
w = q/(1 − p): fold change in the probability of inclusion of the next variable when the current one is included.
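A quick numerical check of this reparameterization, following the transition-matrix convention above (a sketch; the specific p, q values are arbitrary):

```python
import numpy as np

p_, q_ = 0.99, 0.6   # illustrative stay-at-0 and stay-at-1 probabilities

P = np.array([[p_, 1 - p_],
              [1 - q_, q_]])

# Stationary distribution pi = ((1-q)/(2-p-q), (1-p)/(2-p-q))
pi = np.array([1 - q_, 1 - p_]) / (2 - p_ - q_)
assert np.allclose(pi @ P, pi)           # pi is invariant under P

r = pi[1] / pi[0]                        # prior odds of inclusion: (1-p)/(1-q)
w = q_ / (1 - p_)                        # fold change in inclusion prob. given the neighbor is included
print(f"pi = {pi}, r = {r:.3f}, w = {w:.1f}")
```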

Two strategies for the Gibbs sampler

Auxiliary chain: f(γ | Y) is sampled via the auxiliary Markov chain β^0, σ^0, γ^0, β^1, σ^1, γ^1, ...
Direct chain: f(γ | Y) is sampled directly from the chain γ^0, γ^1, γ^2, ...

Auxiliary Chain

Posterior distribution of β:
β^j ~ N_p(A_{γ^{j−1}} X'Y, A_{γ^{j−1}}),   (4)
where A_{γ^{j−1}} = [X'X + D_{γ^{j−1}}^{−1} R^{−1} D_{γ^{j−1}}^{−1}]^{−1}.

The hierarchical structure of the model, γ → β → Y, implies
f(γ^j | Y, β^{j−1}, σ^{j−1}) = f(γ^j | β^{j−1}).
The posterior f(γ^j | β^{j−1}) is an inhomogeneous Markov chain.

Posterior distribution of σ:
(σ^j)² ~ IG( (n + ν_{γ^{j−1}})/2, (‖Y − Xβ^j‖² + ν_{γ^{j−1}} λ_{γ^{j−1}})/2 ).

Auxiliary Chain: Computational Notes

Two computationally intensive tasks:
1. inverting a p × p matrix to obtain A_γ;
2. computing a square root of A_γ.
Traditionally done by a Cholesky-type decomposition: O(p³) computation per sweep.

Key observation: very few entries of γ change between two consecutive sweeps.
Key ideas:
1. low-rank update of the matrix inverse (e.g., the Sherman-Morrison-Woodbury formula);
2. low-rank update of the Cholesky decomposition (e.g., algorithms C1-C4 in Gill et al. (1974)).
Combining the two gives a fast update algorithm of O(l p²) computation per sweep, where l is the average number of changed γ entries per sweep.
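To make the low-rank inverse update concrete, here is a minimal Sherman-Morrison sketch for the simplest case: R = I, diagonal D, and a single γ_i flipping, so only the (i, i) entry of the prior precision changes. This setup is an illustrative assumption, not the authors' exact implementation.

```python
import numpy as np

def sherman_morrison_flip(A, i, delta):
    """Update A = M^{-1} after a rank-one change M -> M + delta * e_i e_i'.

    With R = I and diagonal D, flipping gamma_i only changes the (i, i) entry
    of the precision matrix M = X'X + D_gamma^{-2}, so A_gamma can be updated
    in O(p^2) instead of recomputed in O(p^3).
    """
    a_i = A[:, i]                                  # i-th column of the current inverse
    return A - (delta / (1.0 + delta * A[i, i])) * np.outer(a_i, a_i)

# Toy check of the update against a direct inverse (sizes and values are illustrative)
rng = np.random.default_rng(1)
p = 50
X = rng.normal(size=(80, p))
M = X.T @ X + np.eye(p)            # precision matrix for the current gamma
A = np.linalg.inv(M)

i, delta = 7, 4.0                  # flipping gamma_7 changes its prior precision by delta
A_fast = sherman_morrison_flip(A, i, delta)
M[i, i] += delta
assert np.allclose(A_fast, np.linalg.inv(M))
```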

Direct Chain

Built on the special Gaussian mixture (point-mass) prior (3). γ_i | γ_(−i) is sampled based on

P(γ_i = 1 | γ_(−i), Y) = P(γ_i = 1 | γ_(−i)) / [ P(γ_i = 1 | γ_(−i)) + BF · P(γ_i = 0 | γ_(−i)) ],

where BF is the Bayes factor BF = P(Y | γ_i = 0, γ_(−i)) / P(Y | γ_i = 1, γ_(−i)).

Collapsing over σ and β, BF has a closed form: a ratio of Gamma terms and of the determinants |A_(−i)|^{1/2} / |A_i|^{1/2}, times a ratio of the residual terms (Y'Y − Y'X_{I_i} A_i^{−1} X_{I_i}'Y + νλ) and (Y'Y − Y'X_{I_(−i)} A_(−i)^{−1} X_{I_(−i)}'Y + νλ) raised to powers involving n and ν. Here I_i (resp. I_(−i)) denotes the set of included covariates with γ_i = 1 (resp. γ_i = 0), and A_i = X_{I_i}'X_{I_i} + D_{I_i}^{−2} for the conjugate setup.

Direct Chain: Computational Notes

Two main computational tasks:
1. inverting the n_i × n_i (resp. n_(−i) × n_(−i)) matrix A_i (resp. A_(−i));
2. computing the determinants of A_i and A_(−i) (equivalent to a decomposition).
Both are required for each γ_i in every sweep: O(n̄³ p) computation per sweep by standard Cholesky, where n̄ is the average model size.

Further speed-ups:
1. A_i^{−1} is updated from A_(−i)^{−1} using block matrix inversion: O(n_i²) computation.
2. A_i^{1/2} is updated from A_(−i)^{1/2} via a low-rank update of the Cholesky factor: O(n_i²) computation.
3. One of A_i and A_(−i) is always the same as A_{i−1} or A_(−(i−1)).
Overall, our fast update algorithm is O(n̄² p). For sparse models, this becomes feasible for large p.
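A sketch of the block-matrix-inversion step used to grow A_i^{-1} from A_(−i)^{-1} when covariate i enters the model; the function name, sizes, and the toy prior precision d_i^{-2} are illustrative assumptions.

```python
import numpy as np

def grow_inverse(Ainv, b, c):
    """Given Ainv = A^{-1}, return the inverse of [[A, b], [b', c]] in O(k^2).

    Block matrix inversion for updating A_i^{-1} from A_{(-i)}^{-1} when
    covariate i is added: b = X_{I_{(-i)}}' x_i, c = x_i' x_i + d_i^{-2}.
    """
    u = Ainv @ b                       # A^{-1} b
    s = c - b @ u                      # Schur complement (a scalar)
    top_left = Ainv + np.outer(u, u) / s
    top_right = -u[:, None] / s
    return np.block([[top_left, top_right],
                     [top_right.T, np.array([[1.0 / s]])]])

# Toy check against a direct inverse (sizes and values are illustrative)
rng = np.random.default_rng(2)
k = 10
X = rng.normal(size=(40, k + 1))
d_inv2 = 1.0                                        # illustrative prior precision d_i^{-2}
A = X[:, :k].T @ X[:, :k] + np.eye(k)               # plays the role of A_{(-i)}
b = X[:, :k].T @ X[:, k]
c = X[:, k] @ X[:, k] + d_inv2
M = np.block([[A, b[:, None]], [b[None, :], np.array([[c]])]])
assert np.allclose(grow_inverse(np.linalg.inv(A), b, c), np.linalg.inv(M))
```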

Simulation Study: design

p = 200 predictors, 2 blocks of γ = 1. X i.i.d. N(0, 1), Y = Xβ_γ + N(0, σ_ε²).
25,000 iterations, at various levels of the smoothing parameter w.
Interested in low signal-to-noise situations: σ_ε² = 1, β = 0.8.
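A minimal data-generation sketch for this design; the sample size n and the block positions are illustrative assumptions not stated on the slide.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulation design described above; n and block positions are illustrative assumptions
n, p, beta_val, sigma_eps = 100, 200, 0.8, 1.0
gamma = np.zeros(p, dtype=int)
gamma[40:50] = 1                 # first block of included predictors (illustrative)
gamma[140:150] = 1               # second block (illustrative)

beta = beta_val * gamma          # nonzero coefficients only where gamma = 1
X = rng.normal(size=(n, p))      # X i.i.d. N(0, 1)
Y = X @ beta + rng.normal(0.0, sigma_eps, size=n)
```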

Simulation Results: Structure in γ is used to obtain better estimates

Finding regulators of ERBB2 gene

p = 6000, n = 41; sparsity r adjusted so that the average model size is 20-30.
10 random restarts, 100,000 sweeps per round of Monte Carlo.
For this data set, not much difference between w = 1 and w = 10. However, a few low signals did jump out consistently for w = 10.

Finding regulators of ERBB2 gene

Gene            Notes
PTPRN2          Tumor suppressor gene, target of methylation in human cancers.
NR1H3, STAT1    Transcription factors that regulate cell growth; a sizeable body of data implicates these factors in oncogenesis of breast cancer.
MLN64           Co-regulated with ERBB2 (Alpy et al., 2003, Oncogene).
MINK, ERBB2     Locus for ERBB2.

Biological Motif Detection

Transcription Regulation

Transcription factors...
1. regulate gene expression by helping (or inhibiting) transcription initiation;
2. bind to DNA in a sequence-specific manner;
3. play an important part in the much larger picture of expression regulation.

Ultimate goal: learn the grammar of transcription regulation.

Data description

For each gene g:
Promoter sequence S_g
Expression Y_g

Regression Model

Bussemaker, Li, and Siggia (2001):
Y_g = β_0 + Σ_{m=1}^{M} β_m X_g(m) + error,
where X_g(m) is the count of word m in S_g, and Y_g is the log expression of gene g.

Example fits at the 21-minute time point: MCB (ACGCGT), SCB (TTTCGCG), and an arbitrary word (TGATATC).
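A minimal sketch of the word-count design matrix and a least-squares fit of this regression; the toy sequences and expression values are made up for illustration, and the actual analysis uses the Bayesian selection machinery described above.

```python
import numpy as np
from itertools import product

def word_counts(promoters, L=6):
    """Count occurrences of every DNA word of length L in each promoter sequence."""
    words = ["".join(w) for w in product("ACGT", repeat=L)]
    index = {w: j for j, w in enumerate(words)}
    X = np.zeros((len(promoters), len(words)))
    for i, seq in enumerate(promoters):
        for start in range(len(seq) - L + 1):
            X[i, index[seq[start:start + L]]] += 1
    return X, words

# Toy usage: regress (made-up) log expression on word counts by least squares
promoters = ["ACGCGTACGTTGACGCGT", "TTGACCAGTACGATCGAT"]      # illustrative sequences
Y = np.array([1.2, -0.3])                                     # illustrative log expression
X, words = word_counts(promoters, L=6)
beta, *_ = np.linalg.lstsq(np.column_stack([np.ones(len(Y)), X]), Y, rcond=None)
```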

Modeling motif degeneracy

1. Inexact matches are allowed.
2. Not all positions are created equal.

Information Content of Positions

Pattern observed for most transcription factors. Dimeric binding: two such peaks (of information content) separated by a short distance.

This information has been noted by several studies:
1. Eisen, 2005, Genome Biology.
2. Kechris et al., 2004.
3. Keles et al., 2002.

Hypercube model

We consider all words of length L = 6, 7 to lie in a graph.
There is an edge between words w_1, w_2 if d_Hamming(w_1, w_2) = 1.
The weight on the edge depends on the position of the differing letter.
Hard to draw; here's a 2-D simplification:
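A small sketch of how such a Hamming-distance-1 graph with position-dependent edge weights could be built; the default weights follow the B_{i,j} choice shown later in the motif analysis and are otherwise an assumption.

```python
from itertools import product

def hamming_graph(L=6, pos_weight=None):
    """Edges between DNA words differing at exactly one position, weighted by that position.

    pos_weight maps a 0-based position to an edge weight; the default (1 at the
    two ends, 2 in the middle) mirrors the B_{i,j} choice used in the motif
    analysis and is illustrative.
    """
    if pos_weight is None:
        pos_weight = {j: (1 if j in (0, L - 1) else 2) for j in range(L)}
    words = ["".join(w) for w in product("ACGT", repeat=L)]
    edges = {}
    for w in words:
        for j in range(L):
            for base in "ACGT":
                if base != w[j]:
                    v = w[:j] + base + w[j + 1:]
                    if w < v:                       # store each undirected edge once
                        edges[(w, v)] = pos_weight[j]
    return words, edges

words, edges = hamming_graph(L=6)
```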

A more general model: Ising prior for γ

P(γ) = exp(α'γ + γ'Bγ) / ψ(α, B),
where α = (α_1, ..., α_p) and B = (b_{i,j})_{p×p} are hyperparameters, and ψ(α, B) is the normalizing constant
ψ(α, B) = Σ_{γ ∈ {0,1}^p} exp(α'γ + γ'Bγ).

Ising Prior: Posterior Computation

For each i, the conditional distribution
P(γ_i | γ_(−i)) = exp(γ_i (α_i + Σ_{j ∈ I_(−i)} b_{ij} γ_j)) / (1 + exp(α_i + Σ_{j ∈ I_(−i)} b_{ij} γ_j))
can be computed efficiently for sparse B.

Apply to structured model selection:
P(γ_i = 1 | γ_(−i), Y) = P(γ_i = 1 | γ_(−i)) / [ P(γ_i = 1 | γ_(−i)) + BF · P(γ_i = 0 | γ_(−i)) ].
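A sketch of the resulting Gibbs step for one γ_i, combining the Ising conditional prior with a Bayes factor; the function, the lattice coupling, and all numeric values are illustrative assumptions, and the log Bayes factor would come from the collapsed linear-model computation described above.

```python
import numpy as np

def ising_inclusion_prob(i, gamma, alpha, B, log_bf):
    """P(gamma_i = 1 | gamma_{-i}, Y) from the Ising conditional prior and the Bayes factor.

    B is a (sparse) symmetric coupling matrix, alpha the field, and log_bf is
    log P(Y | gamma_i = 0, rest) - log P(Y | gamma_i = 1, rest).
    """
    eta = alpha[i] + B[i] @ gamma - B[i, i] * gamma[i]   # alpha_i + sum_{j != i} b_ij gamma_j
    prior1 = 1.0 / (1.0 + np.exp(-eta))                  # P(gamma_i = 1 | gamma_{-i})
    prior0 = 1.0 - prior1
    return prior1 / (prior1 + np.exp(log_bf) * prior0)

# Toy usage with a 1-d lattice coupling (values are illustrative)
p = 10
alpha = np.full(p, -2.0)
B = np.zeros((p, p))
for j in range(p - 1):
    B[j, j + 1] = B[j + 1, j] = 1.0
gamma = np.zeros(p, dtype=int)
prob = ising_inclusion_prob(3, gamma, alpha, B, log_bf=0.5)
```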

Example: Regulatory Motifs for Yeast Cell Cycle

Periodic time series

PCA of cell cycle experiment

Motif Analysis Results I

Motif length 6. Edge weights:
B_{i,j} = 1 for edges differing at positions 1 or 6;
B_{i,j} = 2 for edges differing at positions 2, 3, 4, or 5.

Distance between the top 100 motifs found by our model:

Motif Analysis Results II

Signals that were found (posterior probability > 0.05) with smoothing but lost without it:

Words                                       Description
ACGCGT, TCGCGT, TCGCGA, GCGCGT, CCGCGT      MCB binding site in CLN1
TGCTGG, GGCTGG                              SWI5 binding site
ACGGGT                                      MCM1 binding site in CLN3
TCGCGG, TCGGGT                              REB1 binding sites

16 new motifs total.

Notes on Hyperparameter Selection

For the motif model, we have so far selected hyperparameters somewhat arbitrarily, for good computational properties.

For the 1-d lattice: (r, w) code for sparsity and smoothness. Given w, r can be chosen analytically to set the desired model size. Since the algorithm is O(p n̄²), this is practically very important.

For general graphs:
model size is no longer as easy to control;
phase-transition behavior;
asymmetry in the graph: α should not be constant;
interpretation of the hyperparameters is not as straightforward.

Favor a priori certain structures, not certain variables.
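One way this analytic choice of r could look, under the stationary 1-d Markov prior described earlier where each γ_i is marginally included with probability π_1 = r/(1 + r); this is a sketch based on that relation, not necessarily the authors' exact recipe.

```python
def choose_r(p, target_size):
    """Choose the prior odds r so that the expected model size p * r / (1 + r) hits the target.

    Under the stationary 1-d Markov prior, gamma_i is marginally included with
    probability pi_1 = r / (1 + r), so E[model size] = p * pi_1, regardless of w.
    """
    return target_size / (p - target_size)

r = choose_r(p=6000, target_size=25)   # e.g., aim for roughly 25 included variables out of 6000
```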

Conclusions and Extensions

General Ising model for variable selection in structured covariate spaces.
1-d lattice: reduces to a simple Markov model, applicable to problems in genomic profiling studies.
L-d hypercube: a natural model for motif detection.
Computationally feasible for p > 1000.

Extensions:
Hyperparameter selection for biological motif discovery.
Convergence speed-up.
Nonlinear regression models.

Thank you!