An Imputation-Consistency Algorithm for Biomedical Complex Data Analysis


1 An Imputation-Consistency Algorithm for Biomedical Complex Data Analysis Faming Liang Purdue University January 11, 2018

2 Outline
- Introduction to biomedical complex data
- An IC algorithm for high-dimensional missing data problems: the missing data problem; the IC algorithm; theoretical development for the IC algorithm; numerical examples
- Extension to the Blockwise Consistency Algorithm
- Discussion

3 Biomedical Complex Data
Motivation: During the past two decades, the dramatic improvement in data collection and acquisition technologies has enabled scientists to collect vast amounts of health-related data in biomedical studies. Here are some examples:
- Multi-omics data: SNPs, copy number variants, mutation, methylation, RNA-seq
- Biomedical image data: cancer pathological images, brain images
- Mobile health data: wearable and/or ambient sensors
- Electronic health records
If analyzed properly, these data can help us improve contemporary healthcare services, from diagnosis to prevention to personalized treatment, and can also provide insights into reducing healthcare costs.

4 Biomedical Complex Data
Biomedical complex data are often characterized by some mixture of:
- missing data
- heterogeneity
- high dimensionality
- small sample size
- high variety
- high volume
- high velocity
How to analyze these data has posed many challenges to existing statistical methods!

5 Missing Data Problem
Missing data appear ubiquitously in both low- and high-dimensional problems. For low-dimensional data, the EM algorithm can be used. For high-dimensional data, some problem-specific algorithms have been developed, but a general algorithm is still lacking.
Example: in some microarray data, missing values can appear in over 90% of genes (Ouyang et al., 2004).

6 EM Algorithm (Dempster et al., 1977)
E-step: Calculate the expected value of the log-likelihood function with respect to the predictive distribution of the missing data given the current estimate $\theta^{(t)}$, i.e.,
$$Q(\theta \mid \theta^{(t)}) = \int \log f(X_{\mathrm{obs}}, x_{\mathrm{mis}} \mid \theta)\, h(x_{\mathrm{mis}} \mid \theta^{(t)}, X_{\mathrm{obs}})\, dx_{\mathrm{mis}}.$$
M-step: Find a value of $\theta$ that maximizes the quantity $Q(\theta \mid \theta^{(t)})$, i.e., set $\theta^{(t+1)} = \arg\max_{\theta} Q(\theta \mid \theta^{(t)})$.
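As a toy illustration of these two steps (not from the talk), the sketch below runs EM for the mean and covariance of a bivariate normal whose second coordinate is missing completely at random for some rows.

```python
import numpy as np

def em_bivariate_normal(X, n_iter=50):
    """EM for the mean/covariance of a bivariate normal when the second
    coordinate is missing (NaN) for some rows."""
    X = X.copy()
    miss = np.isnan(X[:, 1])
    mu = np.nanmean(X, axis=0)                    # crude starting values
    Sigma = np.cov(X[~miss].T)
    for _ in range(n_iter):
        # E-step: E[x2 | x1] and the conditional variance under the current (mu, Sigma)
        cond_var = Sigma[1, 1] - Sigma[0, 1] ** 2 / Sigma[0, 0]
        X[miss, 1] = mu[1] + Sigma[0, 1] / Sigma[0, 0] * (X[miss, 0] - mu[0])
        # M-step: maximize the expected complete-data log-likelihood
        mu = X.mean(axis=0)
        Sigma = (X - mu).T @ (X - mu) / len(X)
        Sigma[1, 1] += miss.mean() * cond_var     # conditional variance of the imputed entries
    return mu, Sigma

rng = np.random.default_rng(0)
Z = rng.multivariate_normal([1.0, -1.0], [[1.0, 0.6], [0.6, 2.0]], size=500)
Z[rng.random(500) < 0.3, 1] = np.nan              # 30% of the second coordinate missing
print(em_bivariate_normal(Z))
```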

7 Variants of EM Algorithm
- Stochastic EM algorithm (Celeux and Diebolt, 1985): the E-step is replaced by an imputation step.
- Monte Carlo EM algorithm (Wei and Tanner, 1990): the E-step is replaced by Monte Carlo integration.
- ECM algorithm (Meng and Rubin, 1993): the M-step is replaced by a number of computationally simpler conditional maximization steps.
- ECME (Liu and Rubin, 1994; He and Liu, 2012), PX-EM (Liu et al., 1998).

8 High-Dimensional Missing Data Problems
The existing algorithms are usually problem-specific:
- Bayesian principal component analysis (BPCA) (Oba et al., 2003)
- matrix completion (Cai et al., 2010): large incomplete matrices
- MissGLasso (Stadler and Buhlmann, 2012): Gaussian graphical models
- MissPALasso (Stadler et al., 2014)

9 Imputation-Consistency (IC) Algorithm
I-step: Draw $\tilde{X}_{\mathrm{mis}}$ from the predictive distribution $h(\tilde{x}_{\mathrm{mis}} \mid X_{\mathrm{obs}}, \theta_n^{(t)})$ given $X_{\mathrm{obs}}$ and the current estimate $\theta_n^{(t)}$.
C-step: Based on the pseudo-complete data $\tilde{X} = (X_{\mathrm{obs}}, \tilde{X}_{\mathrm{mis}})$, find an updated estimate $\theta_n^{(t+1)}$ which forms a consistent estimate of
$$\theta_*^{(t)} = \arg\max_{\theta} E_{\theta_n^{(t)}} \log f_{\theta}(\tilde{x}), \qquad (1)$$
where $E_{\theta_n^{(t)}} \log f_{\theta}(\tilde{x}) = \int\!\!\int \log\big(f(x_{\mathrm{obs}}, \tilde{x}_{\mathrm{mis}} \mid \theta)\big)\, f(x_{\mathrm{obs}} \mid \theta^*)\, h(\tilde{x}_{\mathrm{mis}} \mid x_{\mathrm{obs}}, \theta_n^{(t)})\, dx_{\mathrm{obs}}\, d\tilde{x}_{\mathrm{mis}}$, $\theta^*$ denotes the true value of the parameters, and $f(x_{\mathrm{obs}} \mid \theta^*)$ denotes the marginal density function of $x_{\mathrm{obs}}$.

10 Imputation-Consistency (IC) Algorithm
To find a consistent estimate of $\theta_*^{(t)}$, we suggest a regularization approach: estimate $\theta_*^{(t)}$ by maximizing a penalized likelihood function,
$$\theta_n^{(t+1)} = \arg\max_{\theta} \big[ \log f(X_{\mathrm{obs}}, \tilde{X}_{\mathrm{mis}} \mid \theta) - \lambda P(\theta) \big], \qquad (2)$$
where $P(\theta)$ denotes the penalty function of $\theta$, $\lambda$ is an appropriately tuned regularization parameter, and $\tilde{X}_{\mathrm{mis}}$ denotes the imputed data based on the current estimate $\theta_n^{(t)}$. Here the regularization should be understood in a general sense; it also includes Bayesian and variable screening methods.
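Putting the I-step and the regularized C-step together gives a short generic loop. The sketch below is schematic and not from the talk: `impute` and `fit_penalized` are user-supplied stand-ins for the predictive draw $h(\tilde{x}_{\mathrm{mis}} \mid X_{\mathrm{obs}}, \theta_n^{(t)})$ and the penalized-likelihood estimator in (2).

```python
import numpy as np

def ic_algorithm(x_obs, miss_mask, impute, fit_penalized, theta0, n_iter=30):
    """Schematic Imputation-Consistency loop.
    impute(x_obs, miss_mask, theta): draws the missing entries from
        h(x_mis | x_obs, theta) and returns a pseudo-complete data set (I-step).
    fit_penalized(x_complete): returns a regularized, consistent estimate of
        theta from the pseudo-complete data (C-step)."""
    theta, path = theta0, []
    for _ in range(n_iter):
        x_complete = impute(x_obs, miss_mask, theta)   # I-step
        theta = fit_penalized(x_complete)              # C-step
        path.append(theta)
    # The whole path is returned: averaging over the late iterations smooths
    # the randomness introduced by the imputation step.
    return path
```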

11 Convergence of the IC Algorithm
The rationale underlying the algorithm can be intuitively explained as follows: the consistency step finds the minimizer of the Kullback-Leibler divergence from $f(x_{\mathrm{obs}}, \tilde{x}_{\mathrm{mis}} \mid \theta)$ to the joint density $f(x_{\mathrm{obs}} \mid \theta^*)\, h(\tilde{x}_{\mathrm{mis}} \mid x_{\mathrm{obs}}, \theta_n^{(t)})$. Hence, each consistency step provides a momentum to drive the current estimate $\theta_n^{(t)}$ toward $\theta^*$, and convergence will eventually happen as $n \to \infty$. For the empirical version (i.e., with a finite value of $n$), $\theta_n^{(t)}$ will jump around $\theta^*$ after convergence due to the randomness in imputation.

12 Convergence of the IC Algorithm
Let $\tilde{x} = (x_{\mathrm{obs}}, \tilde{x}_{\mathrm{mis}})$ and define
$$G_n(\theta \mid \theta_n^{(t)}) = E_{\theta_n^{(t)}} \log f_\theta(\tilde{x}) = \int \log\big(f_\theta(\tilde{x})\big)\, f(x_{\mathrm{obs}} \mid \theta^*)\, h(\tilde{x}_{\mathrm{mis}} \mid x_{\mathrm{obs}}, \theta_n^{(t)})\, d\tilde{x},$$
$$\hat{G}_n(\theta \mid \tilde{x}, \theta_n^{(t)}) = \frac{1}{n} \sum_{i=1}^{n} \log f(x_i^{\mathrm{obs}}, \tilde{x}_i^{\mathrm{mis}} \mid \theta),$$
$$\tilde{G}_n(\theta \mid \theta_n^{(t)}) = \frac{1}{n} \sum_{i=1}^{n} \int \log f(x_i^{\mathrm{obs}}, \tilde{x}_{\mathrm{mis}} \mid \theta)\, h(\tilde{x}_{\mathrm{mis}} \mid x_i^{\mathrm{obs}}, \theta_n^{(t)})\, d\tilde{x}_{\mathrm{mis}}.$$
Let $\theta_n^{(t+1)} = \arg\max_{\theta \in \Theta_n} \big\{ \hat{G}_n(\theta \mid \tilde{x}, \theta_n^{(t)}) - \lambda_n P(\theta) \big\}$, and $\theta_*^{(t)} = \arg\max_{\theta \in \Theta_n} G_n(\theta \mid \theta_n^{(t)})$. Our goal is to show $\theta_n^{(t+1)} \to_p \theta_*^{(t)}$ as $n \to \infty$.

13 Convergence of the IC Algorithm
$$\hat{G}_n(\theta \mid \tilde{x}, \theta_n^{(t)}) - G_n(\theta \mid \theta_n^{(t)}) = \big\{\hat{G}_n(\theta \mid \tilde{x}, \theta_n^{(t)}) - \tilde{G}_n(\theta \mid \theta_n^{(t)})\big\} + \big\{\tilde{G}_n(\theta \mid \theta_n^{(t)}) - G_n(\theta \mid \theta_n^{(t)})\big\}.$$
Lemma 1 [ULLN] Assume conditions A1-A3 and A6 hold. Then
$$\sup_{\theta_n^{(t)} \in \Theta_n}\, \sup_{\theta \in \Theta_n} \big| \tilde{G}_n(\theta \mid \theta_n^{(t)}) - G_n(\theta \mid \theta_n^{(t)}) \big| \to_p 0.$$
Theorem 1 Assume conditions A1-A8 hold. For any $T$ such that $\log T = o_p(n)$, let $\Theta_n^T$ be an arbitrary subset of $\Theta_n$ with $T$ elements (replicates allowed). Then,
(i) $\sup_{\theta_n^{(t)} \in \Theta_n^T}\, \sup_{\theta \in \Theta_n} \big| \hat{G}_n(\theta \mid \tilde{x}, \theta_n^{(t)}) - G_n(\theta \mid \theta_n^{(t)}) \big| \to_p 0$;
(ii) $\sup_{\theta_n^{(t)} \in \Theta_n^T} \big\| \theta_n^{(t+1)} - \theta_*^{(t)} \big\| \to_p 0$.

14 Convergence of the IC Algorithm: Conditions
(A1) $\log f_\theta(\tilde{x})$ is a continuous function of $\theta$ for each $\tilde{x} \in \tilde{\mathcal{X}}$ and a measurable function of $\tilde{x}$ for each $\theta$.
(A2) $\Theta_n$ is compact.
(A3) There exists a function $m_n(\tilde{x})$ such that $\sup_{\theta \in \Theta_n,\, \tilde{x} \in \tilde{\mathcal{X}}} |\log f_\theta(\tilde{x})| \le m_n(\tilde{x})$.
(A4) $P(\theta)/n \to 0$ as $n \to \infty$, where $P(\theta)$ is the penalty function or the log-prior density function.
(A5) $G_n(\theta \mid \theta_n^{(t)})$ has a unique maximum at $\theta_*^{(t)}$ for all $\theta_n^{(t)} \in \Theta_n$.

15 Convergence of the IC Algorithm: Conditions
(A6) [Conditions for the Glivenko-Cantelli theorem]
(a) Assume that there exists $m_n^*(x_{\mathrm{obs}})$ such that $0 \le m_n(x_{\mathrm{obs}}, \theta_n^{(t)}) \le m_n^*(x_{\mathrm{obs}})$ for all $\theta_n^{(t)}$, $E[m_n^*(x_{\mathrm{obs}})] < \infty$, and $\sup_{n \in \mathbb{Z}^+} E[m_n^*(x_{\mathrm{obs}})\, 1(m_n^*(x_{\mathrm{obs}}) \ge \zeta)] \to 0$ as $\zeta \to \infty$. In addition, $\sup_{n \ge 1}\, \sup_{\tilde{x} \in \tilde{\mathcal{X}},\, \theta \in \Theta_n} \int m_n(\tilde{x})\, 1(m_n(\tilde{x}) > \zeta)\, h(\tilde{x}_{\mathrm{mis}} \mid x, \theta)\, d\tilde{x}_{\mathrm{mis}} \to 0$ as $\zeta \to \infty$.
(b) Define $\mathcal{F}_n = \big\{ \int \log f(x_i^{\mathrm{obs}}, \tilde{x}_{\mathrm{mis}} \mid \theta)\, h(\tilde{x}_{\mathrm{mis}} \mid x_{\mathrm{obs}}, \theta_n^{(t)})\, d\tilde{x}_{\mathrm{mis}} : \theta, \theta_n^{(t)} \in \Theta_n \big\}$ and $\mathcal{G}_{n,M} = \{ q\, 1(m_n^*(x_{\mathrm{obs}}) \le M) : q \in \mathcal{F}_n \}$. Suppose that for any fixed $M$ and $\epsilon$, $\log N(\epsilon, \mathcal{G}_{n,M}, L_1(P_n)) = o_p(n)$, where $P_n$ is the empirical measure of $x_{\mathrm{obs}}$, $L_1(P_n)$ denotes the $L_1$ space of the empirical measure, and $N(\epsilon, \mathcal{G}_{n,M}, L_1(P_n))$ denotes the minimum number of balls $\{g : \|g - q\| \le \epsilon\}$ of radius $\epsilon$ needed to cover the set $\mathcal{G}_{n,M}$.

16 Convergence of the IC Algorithm
Define $B_r(\theta) = \{\theta' : \|\theta' - \theta\|_2 < r\}$,
$$r_n(\eta \mid \theta_n^{(t)}) = \inf\Big\{ r : G_n(\theta_*^{(t)} \mid \theta_n^{(t)}) - \sup_{\theta \in \Theta_n \setminus B_r(\theta_*^{(t)})} G_n(\theta \mid \theta_n^{(t)}) > \eta \Big\},$$
and $r_n(\eta) = \sup_{\theta_n^{(t)} \in \Theta_n} r_n(\eta \mid \theta_n^{(t)})$.
(A7) $r(\eta) = \sup_{n \ge 1} r_n(\eta) \to 0$ as $\eta \to 0$.
(A8) [Bounds on tails of the imputed data] For any $\theta_n^{(t)} \in \Theta_n$ and $x_{\mathrm{obs}} \in \mathcal{X}_{\mathrm{obs}}$, the random variable $\tilde{x}_{\mathrm{mis}} \sim h(\cdot \mid x_{\mathrm{obs}}, \theta_n^{(t)})$ satisfies:
(a) $\log(f(x_{\mathrm{obs}}, \tilde{x}_{\mathrm{mis}} \mid \theta)) \in [-M, M]$ for some generic constant $M > 0$;
(b) $\mathrm{var}_{\theta_n^{(t)}}\big(\log(f(x_{\mathrm{obs}}, \tilde{x}_{\mathrm{mis}} \mid \theta))\big) \le \sigma^2$ for some generic constant $\sigma^2 > 0$.

17 Convergence of the IC Algorithm
The IC algorithm generates two interleaved Markov chains:
$$\theta_n^{(t)} \xrightarrow{\text{sampling}} \tilde{X}_{t+1}^{\mathrm{mis}} \xrightarrow{\text{optimization}} \theta_n^{(t+1)} \xrightarrow{\text{sampling}} \tilde{X}_{t+2}^{\mathrm{mis}} \xrightarrow{\text{optimization}} \cdots$$
It can be shown that the Markov chain $\{\theta_n^{(t)}\}$ is almost surely ergodic for sufficiently large $n$, i.e.,
Theorem 2. If A1-A8 hold, then $\{\theta_n^{(t)}\}$ is almost surely ergodic for sufficiently large $n$.

18 Convergence of the IC Algorithm
Define the mapping $M(\theta) = \arg\max_{\vartheta} E_{\theta} \log f_{\vartheta}(\tilde{x})$.
(A9) (Contraction) The mapping $M(\theta)$ is differentiable. Let $\lambda_n(\theta)$ be the largest singular value of $\partial M(\theta)/\partial \theta$. There exists a number $\lambda < 1$ such that $\lambda_n(\theta) \le \lambda$ for all $\theta \in \Theta_n$, for sufficiently large $n$ and almost every $x_{\mathrm{obs}}$-sequence.
Theorem 3. If A1-A9 hold, then for sufficiently large $n$, sufficiently large $t$, and almost every $x_{\mathrm{obs}}$-sequence, we have $\|\theta_n^{(t)} - \theta^*\| = o_p(1)$. Furthermore, the sample average of the Markov chain also forms a consistent estimate of $\theta^*$, i.e., $\big\| \frac{1}{T} \sum_{t=1}^{T} \theta_n^{(t)} - \theta^* \big\| = o_p(1)$ as $n \to \infty$ and $T \to \infty$.

19 Imputation-Conditional Consistency (ICC) Algorithm
I-step. Draw $Z$ from the conditional distribution $h(z \mid Y, \theta_n^{(t,1)}, \ldots, \theta_n^{(t,k)})$ given $Y$ and the current estimate $(\theta_n^{(t,1)}, \ldots, \theta_n^{(t,k)})$.
CC-step. Based on the pseudo-complete data $\tilde{X} = (Y, Z)$, do the following steps:
(1) Conditional on $(\theta_n^{(t,2)}, \ldots, \theta_n^{(t,k)})$, find $\theta_n^{(t+1,1)}$ which forms a consistent estimate of
$$\theta_*^{(t,1)} = \arg\max_{\theta^{(1)}} E_{\theta_n^{(t,1)}, \ldots, \theta_n^{(t,k)}} \log f(\tilde{x} \mid \theta^{(1)}, \theta_n^{(t,2)}, \ldots, \theta_n^{(t,k)}),$$
where the expectation is with respect to the joint density function of $\tilde{x} = (y, z)$ and the subscript of $E$ gives the current estimate of $\theta$.
...
(k) Conditional on $(\theta_n^{(t+1,1)}, \ldots, \theta_n^{(t+1,k-1)})$, find $\theta_n^{(t+1,k)}$ which forms a consistent estimate of
$$\theta_*^{(t,k)} = \arg\max_{\theta^{(k)}} E_{\theta_n^{(t+1,1)}, \ldots, \theta_n^{(t+1,k-1)}, \theta_n^{(t,k)}} \log f(\tilde{x} \mid \theta_n^{(t+1,1)}, \ldots, \theta_n^{(t+1,k-1)}, \theta^{(k)}),$$
where the expectation is with respect to the joint density function of $\tilde{x} = (y, z)$ and the subscript of $E$ gives the current estimate of $\theta$.

20 Convergence of the ICC Algorithm
(A9') Let $M_i$ denote the mapping of the $i$th part of the CC-step, i.e.,
$$\theta_*^{(t,i)} = M_i(\theta_n^{(t+1,1)}, \ldots, \theta_n^{(t+1,i-1)}, \theta_n^{(t,i)}, \ldots, \theta_n^{(t,k)}).$$
Let $M = M_k \circ M_{k-1} \circ \cdots \circ M_1$ denote the joint mapping of $M_1, \ldots, M_k$. Let $\lambda_n(\theta)$ denote the largest singular value of $\partial M(\theta)/\partial \theta$. There exists a number $\lambda < 1$ such that $\lambda_n(\theta) \le \lambda$ for all $\theta \in \Theta_n$, all sufficiently large $n$, and almost every $x_{\mathrm{obs}}$-sequence.

21 Convergence of the ICC Algorithm
Theorem 4. If A1-A8 and A9' hold, then for sufficiently large $n$, sufficiently large $t$, and almost every $x_{\mathrm{obs}}$-sequence, $\|\theta_n^{(t)} - \theta^*\| = o_p(1)$. Furthermore, the sample average of the Markov chain also forms a consistent estimate of $\theta^*$, i.e., $\big\| \frac{1}{T} \sum_{t=1}^{T} \theta_n^{(t)} - \theta^* \big\| = o_p(1)$ as $n \to \infty$ and $T \to \infty$.

22 Gaussian Graphical Models
Algorithms for complete data:
- Graphical Lasso (Yuan and Lin, 2007; Friedman et al., 2008)
- nodewise regression (Meinshausen and Buhlmann, 2006)
- ψ-learning algorithm (Liang et al., 2015)

23 ψ-learning algorithm
1. Correlation screening, which determines the conditioning set for each pair of variables for calculating the partial correlation coefficient.
2. Calculation of ψ-partial correlation coefficients based on the reduced conditioning sets. The ψ-partial correlation coefficient is equivalent to the partial correlation coefficient for learning the GGM structure in the sense that ψ_ij = 0 if and only if ρ_ij = 0.
3. ψ-partial correlation screening, which determines the structure of the network.
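A rough sketch of the idea, not the published ψ-learning algorithm: for each pair (i, j) the partial correlation is computed given only a small conditioning set obtained by correlation screening (here simply the k variables most correlated with i or j), rather than all the remaining p − 2 variables.

```python
import numpy as np

def psi_scores(X, k=5):
    """Illustrative psi-partial-correlation scores: partial correlation of each
    pair (i, j) given a reduced conditioning set found by correlation screening.
    The screening rule used here (top-k by |r_im| + |r_jm|) is a simplification."""
    p = X.shape[1]
    R = np.corrcoef(X, rowvar=False)
    psi = np.zeros((p, p))
    for i in range(p):
        for j in range(i + 1, p):
            # correlation screening: small conditioning set S for the pair (i, j)
            order = np.argsort(-(np.abs(R[i]) + np.abs(R[j])))
            S = [m for m in order if m not in (i, j)][:k]
            idx = [i, j] + S
            Omega = np.linalg.inv(np.corrcoef(X[:, idx], rowvar=False))
            # partial correlation of (i, j) given S from the local precision matrix
            psi[i, j] = psi[j, i] = -Omega[0, 1] / np.sqrt(Omega[0, 0] * Omega[1, 1])
    return psi
```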

24 Equivalent Measure
Figure: Illustrative plot for the calculation of ψ-partial correlation coefficients, where the solid and dotted edges indicate direct and indirect associations, respectively.
It reduces a high-dimensional problem (calculation of the ρ_ij's) to a low-dimensional problem (calculation of the ψ_ij's).

25 IC Algorithm for Gaussian Graphical Models
(Initialization) Replace each missing entry by the median of the corresponding variable, and then iterate between the C- and I-steps.
(C-step) Apply the ψ-learning algorithm to learn the structure of the Gaussian graphical network.
(I-step) Impute the missing values based on the network structure learned in the C-step.
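A minimal sketch of this loop, with the graphical lasso (sklearn's GraphicalLassoCV) standing in for the ψ-learning algorithm in the C-step; the I-step draws each missing entry from its conditional normal distribution given the observed entries of the same sample under the current estimate.

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV

def ic_ggm(X, n_iter=20, seed=0):
    """IC loop for a Gaussian graphical model with missing (NaN) entries.
    C-step: graphical lasso as a stand-in for psi-learning.
    I-step: draw x_mis | x_obs from the fitted multivariate normal."""
    rng = np.random.default_rng(seed)
    X = X.copy()
    miss = np.isnan(X)
    X[miss] = np.take(np.nanmedian(X, axis=0), np.where(miss)[1])   # median initialization
    for _ in range(n_iter):
        model = GraphicalLassoCV().fit(X)                            # C-step (stand-in)
        mu, Omega = X.mean(axis=0), model.precision_
        for i in np.where(miss.any(axis=1))[0]:                      # I-step
            m, o = miss[i], ~miss[i]
            cov_m = np.linalg.inv(Omega[np.ix_(m, m)])               # conditional covariance
            mean_m = mu[m] - cov_m @ Omega[np.ix_(m, o)] @ (X[i, o] - mu[o])
            X[i, m] = rng.multivariate_normal(mean_m, cov_m)
    return model.precision_, X
```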

26 Simulated Example
The simulated example is an autoregressive process of order two with the concentration matrix given by
$$C_{i,j} = \begin{cases} 0.5, & \text{if } |j - i| = 1,\; i = 2, \ldots, (p-1), \\ 0.25, & \text{if } |j - i| = 2,\; i = 3, \ldots, (p-2), \\ 1, & \text{if } i = j,\; i = 1, \ldots, p, \\ 0, & \text{otherwise.} \end{cases} \qquad (3)$$
n = 200; p = 100, 200, 300, 400.
Missing rate 10%: randomly delete 10% of the observations.
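The concentration matrix in (3) and the missing-data mechanism are easy to reproduce; the sketch below generates one such dataset (assuming entries are deleted completely at random, as suggested by the slide).

```python
import numpy as np

def ar2_precision(p):
    """Concentration matrix (3): 1 on the diagonal, 0.5 on the first
    off-diagonals, 0.25 on the second off-diagonals."""
    C = np.eye(p)
    idx = np.arange(p)
    C[idx[:-1], idx[1:]] = C[idx[1:], idx[:-1]] = 0.5
    C[idx[:-2], idx[2:]] = C[idx[2:], idx[:-2]] = 0.25
    return C

def simulate(n=200, p=100, miss_rate=0.10, seed=1):
    """Draw n samples from N(0, C^{-1}) and delete 10% of the entries at random."""
    rng = np.random.default_rng(seed)
    Sigma = np.linalg.inv(ar2_precision(p))
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    X[rng.random((n, p)) < miss_rate] = np.nan
    return X
```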

27 Simulated Example
[Two panels: (a) p = 100 and (b) p = 400; precision versus recall curves for misglasso, Median, BPCA, IC-Last, IC-Ave, and True.]
Figure: Precision-recall curves for the GGM with missing data: IC-Ave is obtained from the ψ-scores averaged over the last 20 iterations, IC-Last is obtained from the ψ-score generated in the last iteration, True is obtained from the ψ-score calculated using the complete data, Median is obtained from the ψ-score calculated with the missing entries replaced by the median expression value of the corresponding gene, BPCA is obtained from the ψ-score calculated with the missing entries replaced by the BPCA estimate, and misglasso refers to the misglasso algorithm.

28 Yeast Cell Expression Data
Gasch et al. (2000) explored the genomic expression patterns of the yeast Saccharomyces cerevisiae responding to diverse environmental changes. The dataset contains 173 samples and 6152 genes, with a missing rate of 3.01%. We work on the top 1000 genes with the largest variation across samples.

29 Yeast Cell Expression Data
Figure: (a) Integrated network obtained by the IC algorithm for the yeast data. (b) Log-log plot of the degree distribution of the integrated network (log degree versus log probability).

30 High-Dimensional Variable Selection
$$Y = (1, X)\beta + \epsilon, \qquad \epsilon \sim N(0, \sigma^2 I_n),$$
where some elements of $X$ are missing at random, and each row of $X$ follows a multivariate normal distribution $N(0, \Sigma)$. The parameters of the model include $\theta_1 = (\beta, \sigma^2)$ and $\theta_2 = \Sigma^{-1}$. The ICC algorithm is applicable.

31 High-Dimensional Variable Selection
Regularization methods for complete data:
- Lasso (Tibshirani, 1996): L1 penalty.
- Elastic net (Zou and Hastie, 2005): a linear combination of L1 and L2 penalties.
- SCAD (Fan and Li, 2001), MCP (Zhang, 2009): concave penalties.
- Extended BIC (Chen and Chen, 2008): L0 penalty.
- rLasso (Song and Liang, 2015): reciprocal L1 penalty.

32 ICC Algorithm for High-Dimensional Variable Selection
(Initialization) Replace each missing entry of X by the median of the corresponding column, and then iterate between the CC- and I-steps.
(CC-step) (i) Apply the MCP algorithm to estimate the regression coefficients; (ii) estimate σ² conditional on the estimate of β; and (iii) apply the ψ-learning algorithm to learn the structure of the Gaussian graphical network.
(I-step) Impute the missing values according to the conditional distributions based on the regression model and the network structure learned in the CC-step.
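A simplified sketch of one implementation of this loop, with LassoCV standing in for MCP in step (i) and GraphicalLassoCV standing in for ψ-learning in step (iii). For brevity, the I-step below draws each missing covariate from its conditional normal given the observed covariates only, whereas the algorithm on the slide also conditions on the response through the regression model.

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV
from sklearn.linear_model import LassoCV

def icc_regression(y, X, n_iter=20, seed=0):
    """Sketch of the ICC loop for sparse regression with missing (NaN) covariates."""
    rng = np.random.default_rng(seed)
    X = X.copy()
    miss = np.isnan(X)
    X[miss] = np.take(np.nanmedian(X, axis=0), np.where(miss)[1])   # median initialization
    for _ in range(n_iter):
        reg = LassoCV(cv=5).fit(X, y)                 # CC-step (i): beta (Lasso as MCP stand-in)
        sigma2 = np.mean((y - reg.predict(X)) ** 2)   # CC-step (ii): sigma^2 given beta
        gg = GraphicalLassoCV().fit(X)                # CC-step (iii): precision matrix of X
        mu, Omega = X.mean(axis=0), gg.precision_
        for i in np.where(miss.any(axis=1))[0]:       # I-step (simplified: ignores y)
            m, o = miss[i], ~miss[i]
            cov_m = np.linalg.inv(Omega[np.ix_(m, m)])
            mean_m = mu[m] - cov_m @ Omega[np.ix_(m, o)] @ (X[i, o] - mu[o])
            X[i, m] = rng.multivariate_normal(mean_m, cov_m)
    return reg.coef_, sigma2, gg.precision_
```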

33 Simulated Example
The datasets were simulated with n = 100 and p = 200 and 500. The covariates X were generated in two settings: (i) the covariates are mutually independent, where x_i ~ N(0, 2I_n) for i = 1, ..., n; and (ii) the covariates are generated according to the concentration matrix (3). For both settings, we set (β_0, β_1, ..., β_5) = (1, 2, 1.5, 2.5, 5), β_6 = ... = β_p = 0, and the random error term ϵ ~ N(0, 2I_n). Under each setting of (n, p), we simulated 10 datasets. For each dataset, we considered two missing rates, randomly deleting 5% and 10% of the observations of X as missing values.

34 Simulated Example
Table: Comparison of the ICC algorithm with the Median and BPCA methods for high-dimensional variable selection with independent covariates. True denotes the results obtained by the MCP method from the complete data.

  p    MR    metric   BPCA           Median         ICC            True
 200   5%    err²β    0.257(0.267)   0.262(0.261)   0.042(0.041)   0.046(0.048)
             fsr      0.119(0.143)   0.082(0.092)   0(0)           0(0)
             nsr      0(0)           0(0)           0(0)           0(0)
       10%   err²β    0.903(0.396)   0.856(0.421)   0.065(0.087)   0.046(0.048)
             fsr      0.310(0.159)   0.308(0.178)   0(0)           0(0)
             nsr      0(0)           0(0)           0(0)           0(0)
 500   5%    err²β    0.339(0.214)   0.350(0.206)   0.029(0.034)   0.027(0.023)
             fsr      0.249(0.225)   0.266(0.237)   0(0)           0(0)
             nsr      0(0)           0(0)           0(0)           0(0)
       10%   err²β    1.532(1.071)   1.354(0.895)   0.044(0.022)   0.027(0.023)
             fsr      0.470(0.265)   0.420(0.255)   0(0)           0(0)
             nsr      0.033(0.070)   0.017(0.053)   0(0)           0(0)

35 Simulated Example
Table: Comparison of the ICC algorithm with the Median and BPCA methods for high-dimensional variable selection with dependent covariates. True denotes the results obtained by the MCP method from the complete data.

  p    MR    metric   BPCA           Median         ICC            True
 200   5%    err²β    0.580(0.413)   0.548(0.140)   0.118(0.097)   0.071(0.050)
             fsr      0.262(0.204)   0.263(0.200)   0(0)           0(0)
             nsr      0.017(0.052)   0.017(0.052)   0(0)           0(0)
       10%   err²β    1.604(0.666)   1.575(0.974)   0.424(0.461)   0.071(0.050)
             fsr      0.247(0.229)   0.273(0.238)   0(0)           0(0)
             nsr      0.100(0.086)   0.083(0.088)   0.033(0.070)   0(0)
 500   5%    err²β    0.669(0.366)   0.717(0.358)   0.172(0.195)   0.096(0.083)
             fsr      0.262(0.202)   0.289(0.236)   0(0)           0(0)
             nsr      0.017(0.053)   0.017(0.053)   0(0)           0(0)
       10%   err²β    2.752(2.306)   2.896(2.601)   0.578(0.587)   0.096(0.083)
             fsr      0.297(0.230)   0.327(0.224)   0(0)           0(0)
             nsr      0.133(0.070)   0.133(0.070)   0.050(0.081)   0(0)

36 A Real Data Example
The eye dataset contains 120 samples and 200 variables. We set the missing rate at 5% and ran 10 times with different missing entries. Each run consists of 30 iterations, and the results from the last 10 iterations are averaged.
Table: Estimation errors of β̂ (with respect to β_c) produced by ICC, Median and BPCA for the Bardet-Biedl syndrome example, where err²β is calculated by averaging ||β̂ − β_c||² over the 10 incomplete datasets, and s.d. represents the standard deviation of err²β.

 Method   BPCA   Median   ICC
 err²β
 s.d.

37 A Real Data Example: Model Selection
Complete Data: v.153, v.180, v.185, v.87, v.200
ICC: v.153, v.185, v.180, v.87, v.200
Median: v.153, v.185, v.62, v.200, v.54
BPCA: v.153, v.87, v.185, v.62, v.200

38 Mixture High-Dimensional Regression
This model mimics the personalized medicine problem and addresses the variety (or heterogeneity) issue of big data:
$$y_i = \begin{cases} \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon, & 1 \le i \le 200, \\ \beta_0 + \beta_1 x_1 + \beta_{102} x_{102} + \beta_{103} x_{103} + \epsilon, & 201 \le i \le 400, \\ \beta_0 + \beta_1 x_1 + \beta_{202} x_{202} + \beta_{203} x_{203} + \epsilon, & 401 \le i \le 600, \\ \beta_0 + \beta_1 x_1 + \beta_{302} x_{302} + \beta_{303} x_{303} + \epsilon, & 601 \le i \le 800, \\ \beta_0 + \beta_1 x_1 + \beta_{402} x_{402} + \beta_{403} x_{403} + \epsilon, & 801 \le i \le 1000. \end{cases}$$
Results comparison (for p = 500 and n_1 = n_2 = ... = n_5 = 200):
- SIS-MCP: 38 variables are selected
- SIS-SCAD: 40 variables are selected
- SIS-Lasso: 47 variables are selected
- IC: 5 clusters are identified and exactly the true model is selected.
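For concreteness, the snippet below generates data of the form shown above; the coefficient values and the noise scale are illustrative choices, not those used in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per, p = 200, 500
X = rng.standard_normal((5 * n_per, p))
# 0-indexed supports: x_1 is shared across clusters, the other two covariates are cluster-specific
supports = [[0, 1, 2], [0, 101, 102], [0, 201, 202], [0, 301, 302], [0, 401, 402]]
beta = np.array([2.0, -1.5, 3.0])          # illustrative cluster-specific coefficients
y = np.empty(5 * n_per)
for c, idx in enumerate(supports):
    rows = slice(c * n_per, (c + 1) * n_per)
    y[rows] = 1.0 + X[rows][:, idx] @ beta + rng.standard_normal(n_per)
```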

39 Mixture High-Dimensional Regression
Figure: Cluster dendrogram (hclust, "average" linkage) of the clusters identified by the ICC algorithm for the mixture high-dimensional regression example.

40 Cancer Cell Line Encyclopedia (CCLE) Data
The CCLE dataset contains compound screening data performed on large panels of molecularly characterized cancer cell lines. The gene expression (genome-wide), copy number profiling, and mutation data have been summarized into gene-level features. The CCLE panel is composed of 41,814 genomic features and 24 compounds on 504 cell lines (411 cell lines contain all measurement types).
We tested the proposed method on the compound Lapatinib, an orally active drug for breast cancer and other solid tumors. For this compound, the dataset contains 491 cell lines. The AUC (area under the response curve) was used as the response variable, and the gene expression data were used as predictors. For the purpose of illustration, we used only the top 500 genes, where the genes were ranked according to their marginal Henze-Zirkler scores with respect to the response variable (Xue and Liang, 2016).

41 CCLE Data: Lapatinib
Figure: (a)-(c) Scatter plots of the fitted response versus the original observations for M = 1, 2, and 3, respectively; (d) cluster dendrogram (hclust, "average" linkage) for M = 3.

42 CCLE Data: Lapatinib
- M = 1: corr(y, Ŷ) = 0.49, average-BIC value = (Xue and Liang, 2017); the gene ERBB2 was selected, which is the known predictive gene of Lapatinib (Penzcalto et al., 2013).
- M = 2: corr(y, Ŷ) = 0.85, average-BIC value =
- M = 3: corr(y, Ŷ) = 0.93, average-BIC value = 316.37; cluster 1 (214 cell lines) selected 39 genes, cluster 2 (166 cell lines) selected 32 genes, and cluster 3 (111 cell lines) selected only the gene ERBB2.
The algorithm is efficient: a total of 6.5 minutes (CPU time) on a 3.5 GHz computer for all three models. For both M = 2 and M = 3, each was run for 200 iterations.

43 Biomarker Discovery
Biomarker identification from high-throughput omics data has been one of the major focuses in cancer research. Yet despite intense effort over the past two decades, the number of biomarkers approved by the FDA each year for clinical use remains in the single digits. An important factor contributing to this failure is the lack of appropriate statistical methods for analyzing such heterogeneous, high-dimensional, small-sample-sized data. ICC provides a promising tool for biomarker discovery under heterogeneity.

44 Extension: Blockwise Consistency (BwC) Algorithm
1. There exists a constant $K \ge k$ such that every index $s \in \{1, 2, \ldots, k\}$ is chosen at least once between the $r$th iteration and the $(r + K - 1)$th iteration, for all $r$.
2. For the chosen index $s$, find an estimator $\hat{\theta}_t^{(s)}$ of $\theta^{(s)}$ which asymptotically maximizes the objective function
$$W(\theta_t^{(s)}) = E_\theta \log f(X \mid \hat{\theta}_{t-1}^{(1)}, \ldots, \hat{\theta}_{t-1}^{(s-1)}, \theta_t^{(s)}, \hat{\theta}_{t-1}^{(s+1)}, \ldots, \hat{\theta}_{t-1}^{(k)}) \qquad (4)$$
based on the samples $X_1, \ldots, X_n$, where $t$ indexes iterations, and the $\hat{\theta}_{t-1}^{(j)}$'s (for $j \ne s$) denote the current estimates and are treated as constants at iteration $t$. Let $\hat{\theta}_t^{(s)}$ denote the estimator of $\theta^{(s)}$ found at iteration $t$, and set
$$\hat{\theta}_t^{(j)} = \begin{cases} \hat{\theta}_t^{(s)}, & j = s, \\ \hat{\theta}_{t-1}^{(j)}, & j \ne s. \end{cases}$$

45 Extension: Blockwise Consistency (BwC) Algorithm
Let $\tilde{\theta}_t^{(s)} = \arg\max W(\theta_t^{(s)})$, and let $\hat{\theta}_t^{(s)}$ denote a regularized estimator of $\tilde{\theta}_t^{(s)}$. $\{\tilde{\theta}_t\}$ forms a path of a coordinate descent (or iterated conditional modes) algorithm. Under appropriate conditions, it can be shown that $\hat{\theta}_t$ converges to $\tilde{\theta}_t$ uniformly with probability going to 1. Consequently, the two sequences will converge to the same limit.

46 An Illustrative Example
We consider a variable selection example with n = 100 and p = 5000:
$$y_i = \theta_0 + \sum_{j=1}^{p} x_{ij} \theta_j + \epsilon_i, \quad i = 1, 2, \ldots, n, \qquad (5)$$
where the $\epsilon_i$'s are iid normal random errors with mean 0 and variance 1. The true values of the $\theta_j$'s are $\theta_j = 1$ for $j = 1, 2, \ldots, 10$ and 0 otherwise. The predictors $x_1, \ldots, x_p$ are given by
$$x_1 = z_1 + e, \quad x_2 = z_2 + e, \quad \ldots, \quad x_p = z_p + e, \qquad (6)$$
where $e, z_1, \ldots, z_p$ are iid normal random vectors drawn from $N(0, I_n)$. Ten datasets are independently generated.

47 An Illustrative Example
The true variables are repositioned so that they occupy the indices {1, 2, 1001, 1002, 2001, 2002, 3001, 3002, 4001, 4002}.
1. Split the predictors into 5 blocks: {x_1, ..., x_1000}, {x_1001, ..., x_2000}, {x_2001, ..., x_3000}, {x_3001, ..., x_4000}, and {x_4001, ..., x_5000}.
2. Conduct variable selection using MCP for each block independently, and combine the selected predictors to get an initial estimate of θ.
3. Conduct blockwise conditional variable selection using MCP for 25 sweeps. Here a sweep refers to a cycle of updates over all blocks.
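A sketch of this procedure under the data-generating model (5)-(6), with sklearn's LassoCV standing in for MCP; each block is updated against the partial residuals of the other blocks, which is the "blockwise conditional" selection of step 3.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p, n_blocks = 100, 5000, 5
true_idx = np.array([0, 1, 1000, 1001, 2000, 2001, 3000, 3001, 4000, 4001])  # 0-indexed

# Data generation following (5)-(6): the shared vector e induces strong correlation.
e = rng.standard_normal((n, 1))
X = rng.standard_normal((n, p)) + e
theta = np.zeros(p); theta[true_idx] = 1.0
y = X @ theta + rng.standard_normal(n)

# Blockwise conditional selection, LassoCV as a stand-in for MCP.
blocks = np.array_split(np.arange(p), n_blocks)
theta_hat = np.zeros(p)
for sweep in range(5):
    for b in blocks:
        r = y - X @ theta_hat + X[:, b] @ theta_hat[b]   # partial residual for block b
        theta_hat[b] = LassoCV(cv=5).fit(X[:, b], r).coef_
print(np.flatnonzero(theta_hat))                          # selected variables
```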

48 BwC Results I
Figure: Convergence path of BwC for one simulated dataset with n = 100 and p = 5000, plotted against the number of sweeps, where fsr, nsr, and sqrt(sse) denote the false selection rate, negative selection rate, and parameter estimation error ||θ̂ − θ||, respectively.

49 BwC Results II
Table: Comparison of BwC, Lasso, SCAD and MCP for the simulated example with p = 5000.

               Lasso        SCAD         MCP          BwC
 ŝ_avg         21(0.0)      19.9(1.10)   20.3(0.70)   12.8(0.36)
 ŝ_s,avg       3.7(0.79)    5.4(0.93)    5.2(0.95)    10(0)
 ||θ̂−θ||_avg   3.31(0.24)   2.96(0.41)   3.01(0.39)   (0.05)
 fsr
 nsr

50 BwC for eQTL Analysis
The eQTL (expression quantitative trait loci) analysis can be formulated as a multivariate regression analysis,
$$Y = XB + E,$$
where Y represents the expression levels of q genes, X represents p single nucleotide variants (SNVs), and E denotes the random error matrix. The goal is to identify both the cis-eQTLs and trans-eQTLs: the former refers to SNVs that regulate the expression of their own genes, and the latter to SNVs that regulate the expression of genes they do not belong to.
Let $\epsilon_{(i)}$ denote the ith row of E. In general, it is assumed that $\epsilon_{(i)}^T$ follows a multivariate normal distribution $N(0, \Sigma)$, while $\epsilon_{(1)}^T, \ldots, \epsilon_{(n)}^T$ are mutually independent. We are interested in jointly estimating the regression coefficient matrix B and the precision matrix $\Omega = \Sigma^{-1}$, in particular when q and/or p are greater than n.
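A two-block sketch in the spirit of this formulation: B and Ω are updated in turn, with LassoCV standing in for the penalized B-update (the per-column penalty weighting by Ω_jj is dropped for brevity) and GraphicalLassoCV standing in for the penalized precision-matrix update.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.covariance import GraphicalLassoCV

def bwc_multivariate_regression(Y, X, n_sweeps=5):
    """Two-block BwC sketch for Y = X B + E: block 1 is the coefficient matrix B,
    block 2 is Omega = Sigma^{-1}. Each column of B is refitted against a working
    response that accounts for the residual correlation through Omega."""
    q = Y.shape[1]
    B = np.zeros((X.shape[1], q))
    Omega = np.eye(q)
    for _ in range(n_sweeps):
        R = Y - X @ B                                    # current residual matrix
        for j in range(q):                               # block 1: update B column by column
            # working response for column j under the Omega-weighted Gaussian loss
            y_tilde = Y[:, j] + (R @ Omega[:, j] - R[:, j] * Omega[j, j]) / Omega[j, j]
            B[:, j] = LassoCV(cv=5).fit(X, y_tilde).coef_
            R[:, j] = Y[:, j] - X @ B[:, j]
        Omega = GraphicalLassoCV().fit(Y - X @ B).precision_   # block 2: precision matrix
    return B, Omega
```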

51 BwC for eQTL Analysis
Table: Results comparison for precision matrix estimation (with n = 200, q = 100 and p = 3000, 5000 and 10000): the results are averages over 10 datasets, where ŝ_avg denotes the number of connections selected by the method, and ŝ_s,avg denotes the number of true connections selected by the method. The true number of connections is 197.

    p     method   ŝ_avg          ŝ_s,avg       fsr            nsr
  3000    BwC      (2.28)         (1.75)        0.05 (0.008)   0.18 (0.009)
          AMRCE    70.7 (5.92)    1.1 (0.33)    0.99 (0.004)   0.99 (0.003)
  5000    BwC      (3.51)         (2.64)        0.05 (0.006)   0.18 (0.013)
          AMRCE    (10.14)        0.65 (0.20)   0.99 (0.003)   0.99 (0.002)
 10000    BwC      (1.57)         (1.54)        0.06 (0.006)   0.21 (0.008)
          AMRCE    (13.58)        1.55 (0.52)   0.98 (0.006)   0.98 (0.005)

- BwC: treat B and Ω as two blocks; B can be blocked further.
- AMRCE (approximate multivariate regression with covariance estimation) by Rothman et al. (2010).
- A Bayesian method (Bhadra and Mallick, 2013).

52 BwC for eQTL Analysis
Figure: Histograms of the non-zero elements of B̂ obtained by BwC (left panels) and AMRCE (right panels) for three datasets with p = 3000 (upper panels), p = 5000 (middle panels), and p = 10000 (lower panels), respectively.

53 Discussion: I
- Consistency is a useful concept, which can lead to many efficient (approximation) algorithms for high-dimensional and/or big data computing.
- The variance of the ICC samples reflects the information loss due to the missing data.

54 Discussion: II
- The BwC method decomposes a high-dimensional parameter estimation problem into a series of lower-dimensional parameter estimation problems, which often have much simpler structures than the original high-dimensional problem and thus can be easily solved. (Example: hierarchical models)
- The BwC algorithm provides a potential solution to the problem of parameter estimation for the complex models that are often encountered in big data analysis.
- Under the framework provided by BwC, a variety of methods, such as Bayesian and frequentist methods, can be jointly used to achieve a consistent estimator for the original high-dimensional complex model. This is very important for big data problems, for which a complex model is often needed!

55 Acknowledgments NSF grant DMS NIH R01GM NIH R01GM126089
