The Bayesian Group-Lasso for Analyzing Contingency Tables


Sudhir Raman, Department of Computer Science, University of Basel, Bernoullistr. 16, CH-4056 Basel, Switzerland
Thomas J. Fuchs, Institute of Computational Science, ETH Zurich, Universitaetstrasse 6, CH-8092 Zurich, Switzerland & Competence Center for Systems Physiology and Metabolic Diseases, Schafmattstr. 18, CH-8093 Zurich, Switzerland
Peter J. Wild (PETER.WILD@CELL.BIOL.ETHZ.CH), Institute of Pathology, University Hospital Zurich, Schmelzbergstrasse 12, CH-8091 Zurich, Switzerland
Edgar Dahl (EDAHL@UKAACHEN.DE), Institute of Pathology, University Hospital, Pauwelsstrasse 30, 52074 Aachen, Germany
Volker Roth (VOLKER.ROTH@UNIBAS.CH), Department of Computer Science, University of Basel, Bernoullistr. 16, CH-4056 Basel, Switzerland

Appearing in Proceedings of the 26th International Conference on Machine Learning, Montreal, Canada, 2009. Copyright 2009 by the author(s)/owner(s).

Abstract

Group-Lasso estimators, useful in many applications, suffer from a lack of meaningful variance estimates for the regression coefficients. To overcome this problem, we propose a full Bayesian treatment of the Group-Lasso that extends the standard Bayesian Lasso by means of a hierarchical expansion. The method is then applied to Poisson models for contingency tables using a highly efficient MCMC algorithm. Simulated experiments validate the performance of the method on artificial datasets with known ground truth. When applied to a breast cancer dataset, the method demonstrates its capability of identifying differences in the interaction patterns of marker proteins between different patient groups.

1. Introduction and Related Work

The identification of important explanatory factors for a process is a key task in many practical learning problems. In the context of standard linear regression, the Lasso (Tibshirani, 1996) has become a popular method for this purpose in recent years. The Lasso optimizes a regression functional under an l1-constraint on the coefficient vector: given an n x 1 vector of responses y = (y_1, ..., y_n)^t, y_i ∈ R, and observation vectors x_i ∈ R^p arranged as rows of an n x p data matrix X, the Lasso solves

  minimize ‖y − Xβ‖²  subject to  ‖β‖₁ ≤ κ.   (1)

In many applications, however, explanatory factors do not necessarily have a one-to-one correspondence with single features (main effects) but rather require a more complex representation, for instance by considering not only single features but higher-order interactions between some or all features. As a further extension of this idea, the factors (main effects plus higher-order interactions) may need to be represented as groups of variables. A popular example of this kind is the representation of categorical variables (i.e. factors in the usual statistical terminology and/or their interactions) as groups of dummy variables. To address such situations, the Group-Lasso (Yuan & Lin, 2006) is an ideal choice, since it serves as a natural extension of the Lasso by finding solutions that are sparse at the level of groups of variables which, for instance, might represent categorical variables and/or interaction terms. Despite the fact that Group-Lasso estimators have proven useful in many applications (Kim et al., 2006; Meier et al., 2008; Roth & Fischer, 2008), their main problem concerns the definition of meaningful variance estimates for the regression coefficients. This problem arises because the Hessian is not defined at the optimal solution. In this paper, we suggest a full Bayesian treatment of the Group-Lasso to overcome this problem.
The proposed model uses a hierarchical expansion and directly extends corresponding Bayesian versions of the standard Lasso (Figueiredo & Jain, 2001; Park & Casella, 2008) to handle grouped predictors.
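To make the grouped penalty concrete before the formal derivation, the following minimal Python sketch (our own illustration; the toy coefficient vector and grouping are hypothetical) contrasts the l1 penalty of eq. (1) with the sum of group-wise l2 norms used by the Group-Lasso, showing how an entire group can drop out of the model at once.

```python
import numpy as np

def lasso_penalty(beta):
    """l1 penalty of eq. (1): sum of absolute coefficients."""
    return np.sum(np.abs(beta))

def group_lasso_penalty(beta, groups):
    """Group-Lasso penalty: sum of l2 norms of the coefficient subvectors.

    `groups` is a list of index arrays, one per group of dummy variables
    (e.g. the columns encoding one categorical factor or interaction term).
    """
    return sum(np.linalg.norm(beta[idx]) for idx in groups)

# Toy example: three groups of sizes 2, 3, 2 (a hypothetical grouping).
beta = np.array([0.5, -0.2, 0.0, 0.0, 0.0, 1.0, 0.3])
groups = [np.arange(0, 2), np.arange(2, 5), np.arange(5, 7)]

print(lasso_penalty(beta))                 # treats all 7 coefficients individually
print(group_lasso_penalty(beta, groups))   # the middle group contributes exactly 0
```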

While the Bayesian Group-Lasso is applicable to many different likelihood models, the focus of this paper is on Poisson models for contingency tables. In applications of this kind, the dummy covariates x represent factor interactions up to a certain order, and the sparsity induced by the Group-Lasso corresponds to selecting edges in the hypergraph defining the interaction structure among these factors. Implementation approaches include the method described in (Figueiredo & Jain, 2001), which basically reduces to MAP point estimation without essential Bayesian features such as posterior variances, whereas (Park & Casella, 2008) describes a full Bayesian approach based on MCMC sampling. A third class of methods falls in between these extremes by utilizing approximation schemes like expectation propagation (Seeger, 2008). By exploiting a special orthogonality property of design matrices in the Poisson model for contingency tables, we are able to derive a highly efficient MCMC sampling algorithm that makes it possible to follow the full Bayesian route without resorting to approximation schemes, even in large real-world examples. The availability of posterior variances and correlations overcomes several problems of a related penalized likelihood (or MAP) approach introduced in (Dahinden et al., 2007). Moreover, our algorithm is applicable to contingency tables stemming from arbitrary factors rather than being restricted to binary variables. We show that our method is capable of identifying relevant higher-order interaction patterns both in toy examples and in large-scale medical applications.

2. The Bayesian Group-Lasso

For the following derivation, it is convenient to partition the data matrix X and the coefficients β into G subgroups:

  X = (X_1, ..., X_G),  β^t = (β_1^t, ..., β_G^t).   (2)

The size of the g-th subgroup will be denoted by p_g. The Group-Lasso minimizes the negative log-likelihood viewed as a function of β, l = l(β), under a constraint on the sum of the l2-norms of the subvectors β_g:

  minimize l(β)  s.t.  Σ_{g=1}^G ‖β_g‖₂ ≤ κ.   (3)

From a probabilistic perspective, the Group-Lasso with Gaussian likelihood can be understood as a standard linear regression model with Gaussian noise and a product of multivariate Laplacian priors over the regression coefficients. The observations y = (y_1, ..., y_n)^t are assumed to be generated according to

  y_i = x_i^t β + ε_i,  with ε_i ~ N(0, σ²),   (4)

which implies a likelihood of the form

  p(y | X, β, σ²) ∝ exp{−‖y − Xβ‖² / (2σ²)}
                  ∝ (σ²)^{−p/2} exp{−(1/(2σ²)) (β − β̂)^t X^t X (β − β̂)} · (σ²)^{−ν/2} exp{−SSE / (2σ²)},   (5)

where β̂ is the least-squares solution, SSE = (y − Xβ̂)^t (y − Xβ̂) is the sum of squared errors, and ν = n − p. The last equation results from completing the squares, which is standard in Bayesian regression, see for instance (Gelman et al., 1995). Assuming a multivariate (spherical) p_g-dimensional Multi-Laplacian prior over each group of regression coefficients,

  M-Laplace(β_g | 0, c^{−1}) ∝ c^{p_g/2} exp(−√c ‖β_g‖),   (6)

the classical Group-Lasso in eq. (3) is recovered as the MAP solution in log-space, with σ²√c playing the role of a fixed Lagrange parameter. For a full Bayesian treatment, on the other hand, we would like to place (hyper)priors over c and σ² and integrate out these parameters. In practice, these integrations are not possible analytically, and we propose to use a Gibbs sampling strategy for stochastic integration.
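As a quick illustration of the MAP correspondence just described (a sketch under our own naming conventions, not the paper's implementation), the negative log-posterior under the Gaussian likelihood of eq. (4) and the Multi-Laplace prior of eq. (6) differs from the Lagrangian form of the Group-Lasso objective in eq. (3) only by a constant factor, so both have the same minimizer:

```python
import numpy as np

def neg_log_posterior(beta, y, X, groups, sigma2, c):
    """-log p(beta | y) up to additive constants, under eqs. (4) and (6)."""
    resid = y - X @ beta
    nll = resid @ resid / (2.0 * sigma2)                              # Gaussian likelihood
    nlp = np.sqrt(c) * sum(np.linalg.norm(beta[g]) for g in groups)   # Multi-Laplace prior
    return nll + nlp

def group_lasso_objective(beta, y, X, groups, lam):
    """Penalized (Lagrangian) form of eq. (3) with multiplier lam."""
    resid = y - X @ beta
    return 0.5 * resid @ resid + lam * sum(np.linalg.norm(beta[g]) for g in groups)

# With lam = sigma2 * sqrt(c), group_lasso_objective equals sigma2 * neg_log_posterior,
# so both functions share the same minimizer: the MAP estimate is the classical Group-Lasso.
```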
Hierarchical expansion. To find a representation in which all conditionals have a standard form, we use the following hierarchical expansion, which extends the corresponding derivation for the standard Lasso in (Figueiredo & Jain, 2001) to grouped predictors: the prior can be expressed as a two-level hierarchical model involving independent zero-mean Gaussian priors over β_g and Gamma priors over λ_g. Defining a_g = p_g ρ and b_g = ‖β_g‖² / σ² (for each group g), the Multi-Laplacian prior on β_g can be expressed as a hierarchical Normal-Gamma model:

  p(β_g | ρ) = ∫₀^∞ N(β_g | 0, σ² λ_g I) · Gamma(λ_g | (p_g+1)/2, a_g/2) dλ_g
             ∝ (a_g/σ²)^{p_g/2} exp(−√(a_g/σ²) ‖β_g‖)
             ∝ M-Laplace(β_g | 0, (a_g/σ²)^{−1}),   (7)

where the integral is evaluated in closed form by recognizing that the integrand is proportional to a generalized inverse Gaussian density in λ_g. In the above derivation we have used the definition of the generalized inverse Gaussian distribution

  GIG(λ_g | 1/2, a_g, b_g) = (a_g/b_g)^{1/4} / (2 K_{1/2}(√(a_g b_g))) · λ_g^{−1/2} exp[−½ (b_g/λ_g + λ_g a_g)],   (8)

with K_{1/2}(x) = √(π/(2x)) exp(−x) denoting a special case of the spherical Bessel functions. For the standard linear model we can thus expand the Group-Lasso in terms of a Gaussian likelihood, Gaussian priors over the regression coefficients, and Gamma priors over the corresponding variances. The model is completed by introducing a prior on σ² (we use the standard conjugate joint prior p(β, σ²) = p(β | σ²) p(σ²) = N(β | µ, σ²Σ) · Inv-χ²(σ² | ν₀, s₀²)) and a conjugate Gamma(ρ | r, s) prior on ρ. Multiplying the priors with the likelihood and rearranging the relevant terms yields the full conditional posteriors, which are needed in the Gibbs sampler for carrying out the stochastic integrations. Concerning β and σ², the resulting conditionals have the standard form in Bayesian regression: p(σ² | ·) = Inv-χ², and

  p(β | ·) = N(β | µ, σ²Σ)  with  Σ = (X^t X + Λ^{−1})^{−1}  and  µ = Σ X^t X β̂,   (9)

where Λ is a diagonal matrix consisting of the λ_g as diagonal elements, with each λ_g repeated p_g times. The conditional posterior of ρ is again Gamma due to conjugacy,

  p(ρ | ·) ∝ ρ^{r−1+Σ_g (p_g+1)/2} exp[−ρ (1/s + ½ Σ_{g=1}^G λ_g p_g)],   (10)

where r and s are the shape and scale hyperparameters of the Gamma prior on ρ. Finally, the conditional of λ_g is generalized inverse Gaussian:

  p(λ_g | ·) = GIG(λ_g | 1/2, a_g, b_g).   (11)

3. Poisson Models for Contingency Tables

While in principle the Bayesian Group-Lasso introduced in the last section is applicable to many different likelihoods, in this paper we focus on Poisson models for analyzing count data in contingency tables. The reason for this choice is threefold: (i) count data for certain compositions of discrete properties occur frequently in practical applications, particularly in a bio-medical context; (ii) feature selection for categorical variables is directly related to methods inferring sparsity on the level of groups of predictors; (iii) the use of certain encoding schemes for categorical variables allows us to derive a highly efficient sampling algorithm that makes full Bayesian inference practical in large-scale applications. Denote by z = (z_1, ..., z_n) the observed counts in a contingency table with n cells. The standard approach to modeling count data is Poisson regression, which involves a log-linear model with independent terms:

  for i = 1, ..., n:  z_i | µ_i ~ Poisson(µ_i) = µ_i^{z_i} e^{−µ_i} / z_i!,   (12)

with the Poisson mean µ_i = e^{η_i} and η_i = x_i^t β. Using a random link

  η_i = x_i^t β + ε_i,  ε_i ~ N(0, σ²),   (13)

makes it possible to allow for deviations from the log-linear model. For instance, overdispersed Poisson models (which are common in practice) can be modeled with random link functions. In the Bayesian context such random links correspond to a conditional Gaussian prior on η:

  p(η_i | β, σ²) = N(η_i | x_i^t β, σ²).   (14)

Note that inputs are available only as counts z. Assuming that the total number of counts is fixed, the observed vector of individual counts z = (z_1, ..., z_n) can be considered a realization drawn from a multinomial distribution. This is the approach taken in (Dahinden et al., 2007) in a penalized likelihood framework. If, on the other hand, we assume a sampling model in which the total number of counts itself is random and the time period of observing the counts is fixed, we arrive at the Poisson model (12). This sampling model is plausible for many practical situations, for example a clinical study of fixed length where the counts correspond to the number of patients with certain properties visiting the hospital. The main technical advantage of the Poisson model lies in the factorization over the cells, given the means µ_i (see eq. (12)).
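Before turning to the dummy encoding of the cells, the following sketch shows one Gibbs sweep over the conditionals in eqs. (9)-(11). It is our own illustration under simplifying assumptions: the design is taken to be orthonormal (X^t X = I), σ² is held fixed rather than sampled from its Inv-χ² conditional, the latent link variables η of eq. (13) are omitted, and the GIG draw uses scipy's geninvgauss with the usual rescaling.

```python
import numpy as np
from scipy.stats import geninvgauss

def gibbs_sweep(beta, lam, rho, y, X, groups, sigma2, r, s):
    """One sweep over the conditionals of eqs. (9)-(11), assuming X^t X = I.

    `groups` is a list of column-index arrays, one per group; sigma2 is kept
    fixed here to keep the sketch short. All variable names are ours.
    """
    p_g = np.array([len(g) for g in groups])

    # --- beta | rest: N(mu, sigma2 * Sigma) with Sigma = (X^t X + Lambda^{-1})^{-1}.
    # With an orthonormal design this matrix is diagonal, so no linear solve is needed.
    lam_expanded = np.concatenate([np.full(len(g), l) for g, l in zip(groups, lam)])
    Sigma_diag = 1.0 / (1.0 + 1.0 / lam_expanded)
    mu = Sigma_diag * (X.T @ y)
    beta = mu + np.sqrt(sigma2 * Sigma_diag) * np.random.randn(X.shape[1])

    # --- lambda_g | rest: GIG(1/2, a_g, b_g), a_g = p_g * rho, b_g = ||beta_g||^2 / sigma2.
    for k, g in enumerate(groups):
        a_g = p_g[k] * rho
        b_g = max(beta[g] @ beta[g] / sigma2, 1e-12)   # guard against a zero-norm group
        lam[k] = geninvgauss.rvs(0.5, np.sqrt(a_g * b_g)) * np.sqrt(b_g / a_g)

    # --- rho | rest: Gamma by conjugacy, cf. eq. (10).
    shape = r + 0.5 * np.sum(p_g + 1)
    rate = 1.0 / s + 0.5 * np.sum(p_g * lam)
    rho = np.random.gamma(shape, 1.0 / rate)

    return beta, lam, rho
```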
To understand the nature of the dummy covariates x, the following notation is helpful: assume our contingency table is based on observations of d categorical random variables or factors, C_1, ..., C_d, where each C_j has K_j levels. The dummy covariates x represent factor interactions, starting with the interactions of order zero (the main effects) up to all interactions of a certain maximum order Q. Interactions are denoted by the colon operator (:). There is one group of dummy variables used for encoding each interaction term. This corresponds to assuming that the means of the η_i (which are in turn the log Poisson means) can be expressed (in vector form) as η = Xβ, with the design matrix X composed of individual submatrices:

  X = [1, X_{C_1}, ..., X_{C_d}, X_{C_1:C_2}, ..., X_{C_{d−1}:C_d}, ..., X_{C_1:⋯:C_{Q+1}}, ..., X_{C_{d−Q}:⋯:C_d}],   (15)

where the first blocks contain the main effects, the following blocks the first-order interactions, and the last blocks the highest-order interactions. To avoid over-parametrization, identifiability constraints are imposed on the individual submatrices in the form of contrast codes, which encode a factor with K levels into a matrix with K−1 columns. Interaction terms are basically column-wise product expansions of these individual (main-effect) matrices. In many practical applications we are given ordered factors, i.e. ordinal variables for which a natural ordering is involved. Examples of this kind are, for instance, intensity levels.

For such ordinal categorical variables, the use of polynomial contrast codes is a natural choice. These encodings employ orthogonal polynomials and have the practical advantage that the (typically huge) design matrix is orthogonal, i.e. X^t X = I. The use of random links not only allows for deviations from the parametric model, but also greatly simplifies Gibbs sampling: when conditioning on η_i, sampling of β and σ² follows the standard procedure in Bayesian regression. The orthogonality property of the design matrix, X^t X = I, makes it possible to sample from very high-dimensional models in an efficient and numerically stable way, since only diagonal matrices have to be inverted, see eq. (9). Updating η_i, on the other hand, is more difficult because the corresponding conditional is not of recognized form:

  p(η_i | β, σ², x_i, z_i) ∝ exp[η_i z_i − exp(η_i) − (1/(2σ²)) (x_i^t β − η_i)²].   (16)

However, the above conditional posterior is log-concave, which makes it possible to use black-box sampling methods like adaptive rejection sampling. Alternatively, we propose to use a Laplace approximation similar to that in (Green & Park, 2004), which in practice turns out to give results that are almost indistinguishable from adaptive rejection sampling while speeding up the computations considerably. Figure 1 summarizes the hierarchical structure of the Poisson Group-Lasso model.

Figure 1. Dependency structure of the hierarchical Poisson Group-Lasso model. On top of this hierarchy are the hyperparameters (r, s) for the Gamma prior on ρ and (ν₀, s₀²) for the Inv-χ² prior on σ². At the other end there are the observed vector of counts z and the dummy covariates x. The core of the diagram represents the hierarchical Normal-Gamma prior on the regression coefficients β and the Normal random link between η and the covariates and coefficients.

Hyperparameter selection. A side effect of using the Laplace approximation for η_i is a good intuition about reasonable priors on σ². Such a prior should be roughly centered around the reciprocal value of the average of all counts, see (Green & Park, 2004) for details. In our implementation we use this rule of thumb by setting s₀² in the Inv-χ²(n₀, s₀²) prior on σ² to 1/(1 + median(z)). Note that the influence of the Inv-χ² prior can be interpreted as adding n₀ virtual samples with variance s₀². The only other hyperparameters in the model are the shape r and scale s in the Gamma(r, s) prior on ρ. Testing for suitable values can be done straightforwardly by first sampling ρ from the hyperprior and in turn sampling λ_g from Gamma((p_g+1)/2, p_g ρ/2). The fraction of large λ_g values encodes our prior belief about the sparsity of the model: since λ_g is a variance parameter for the g-th group of regression coefficients, the fraction of large λ_g values essentially determines the expected sparsity. As a default setting in our implementation we use the criterion that roughly 1% of the λ_g values should exceed 5 · median(λ_g).

4. Simulated Example

In order to illustrate the performance of the Bayesian Group-Lasso, experiments were carried out on simulated data. The data was constructed by assuming 8 categorical variables with 3 levels each (main effects) and all higher-order interaction terms up to 2nd order (a total of 92 groups representing the interaction terms, and 6561 combinations of levels). The orthogonal design matrix X was generated with polynomial contrast codes as described in the previous section.
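Polynomial contrast codes of this kind can be built, for instance, by orthonormalizing a Vandermonde matrix of the ordered factor levels and dropping the constant column; the sketch below (our own construction, not the paper's implementation) does this for 3-level factors and forms an interaction block as a column-wise product, yielding mutually orthogonal design columns.

```python
import numpy as np

def poly_contrasts(K):
    """Polynomial contrast code for an ordered factor with K levels.

    Returns a K x (K-1) matrix whose columns are orthonormal polynomials
    (linear, quadratic, ...) evaluated at the levels, with the constant
    column removed -- the usual identifiability constraint.
    """
    levels = np.arange(1, K + 1, dtype=float)
    V = np.vander(levels, K, increasing=True)   # columns: 1, x, x^2, ...
    Q, _ = np.linalg.qr(V)                      # Gram-Schmidt orthonormalization
    return Q[:, 1:]                             # drop the constant column

def interaction_columns(code_a, cell_a, code_b, cell_b):
    """Dummy columns of an interaction term A:B, one row per table cell.

    Column-wise products of the two main-effect contrast codes.
    """
    A = code_a[cell_a]
    B = code_b[cell_b]
    return np.einsum('ni,nj->nij', A, B).reshape(len(cell_a), -1)

# Example: two ordered factors with 3 levels each -> a 9-cell contingency table.
K = 3
cells = np.array([(i, j) for i in range(K) for j in range(K)])
C = poly_contrasts(K)
X = np.hstack([np.ones((K * K, 1)),                                   # intercept
               C[cells[:, 0]],                                        # main effect 1
               C[cells[:, 1]],                                        # main effect 2
               interaction_columns(C, cells[:, 0], C, cells[:, 1])])  # interaction 1:2
print(np.allclose(X.T @ X, np.diag(np.diag(X.T @ X))))                # columns are orthogonal
```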
Then three factors were chosen for generating the counts, namely one main effect (variable 1), a first-order interaction (1:5) and a second-order interaction (6:7:8). For these factors, positive values of β were taken, with all other β values fixed to zero. The counts were then generated using eq. (12) and eq. (13) with σ² = 0.1. Hyperparameters were specified using our default procedure described above. The example traceplot in the lower panel of Figure 2 indicates that the Markov chain converges almost immediately, an observation which is corroborated by a run-length control diagnosis according to (Raftery & Lewis, 1992), indicating that the necessary burn-in period is probably 100 samples. Gibbs sampling was then run, and the posterior distributions of the coefficients β_i were analyzed based on every 5th sample. The upper panel of Figure 2 shows the resulting estimation of significant interaction terms, with significance measured by either the fraction of positive or negative samples, depending on the sign of the mean value of β_i. The size of the circles encodes this significance measure for the main effects; the linewidth of the blue edges (1st-order interactions) and reddish triangles (2nd-order interactions) represents the importance of the higher-order terms, ranging upward from 0.7 (the thinnest lines plotted); a level of 0.5 represents complete randomness.
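The significance measure just described, i.e. the fraction of posterior samples that share the sign of the posterior mean, can be computed directly from the thinned MCMC output; the helper below is a minimal sketch with hypothetical array shapes.

```python
import numpy as np

def sign_significance(samples):
    """Fraction of posterior samples on the dominant side of zero.

    `samples` has shape (n_draws, n_terms); for each term the value is the
    fraction of positive or negative draws, whichever matches the sign of
    the posterior mean. A value of 0.5 corresponds to complete randomness.
    """
    pos_frac = (samples > 0).mean(axis=0)
    means = samples.mean(axis=0)
    return np.where(means >= 0, pos_frac, 1.0 - pos_frac)

# Example with fake draws for three terms: clearly positive, clearly negative,
# and centred at zero.
rng = np.random.default_rng(0)
draws = np.column_stack([rng.normal(1.0, 0.5, 4000),
                         rng.normal(-0.8, 0.5, 4000),
                         rng.normal(0.0, 0.5, 4000)])
print(sign_significance(draws))   # roughly [0.98, 0.95, 0.5]
```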

Figure 2. Simulated data: 8 categorical variables with 3 levels each, and three truly nonzero interaction terms: 1, 1:5, 6:7:8. Upper panel: interactions identified from the posterior distributions. The size of the circles indicates the estimated significance of the main effects: 97.5% of the posterior samples for variable 1 have a positive sign. Correspondingly, the linewidth of the interactions (blue lines: 1st-order, reddish triangles: 2nd-order) indicates their significance. Lower panel: example traceplot and moving average for the 2nd-order interaction 6:7:8.

All three interaction terms could be clearly identified. There are 3 additional spurious interaction terms whose significance level, however, is very low (0.7).

5. Application to Breast Cancer Studies

Breast cancer and immunohistochemistry. In Western societies breast cancer is one of the leading causes of tumor-induced death in women. Despite improvements in the identification of prognostic and predictive parameters, novel biomarkers are needed to improve patient risk stratification and to optimize patient outcome. Furthermore, the identification of molecules that are differentially regulated during tumorigenesis may lead to the development of personalized therapeutic approaches. Recently, independent research groups were able to identify five distinct gene expression patterns which (i) are highly predictive of the patients' prognosis and (ii) may reflect the biological behavior better than established parameters. According to this model, a basal as well as two distinct luminal-like expression patterns, in addition to a Her2 (ERBB2) overexpressing and a normal breast-like group, could be distinguished (Perou et al., 2000). Even though results from mRNA expression profiling are very convincing, there are still some limitations to its clinical application due to the high costs. Several studies have shown that biologically distinct classes of breast cancer as defined by mRNA expression analysis can also be identified with cost-efficient immunohistochemistry (Abd El-Rehim et al., 2005; Diallo-Danebrock et al., 2007). Definitions of the basal phenotype by the former and other groups using different cytokeratin antibodies have proven robust and allow the identification of this tumor type on a routine basis.

Tissue microarrays. In the present study, intensity levels of the following immunohistochemical markers in tissue samples were measured utilizing the tissue microarray (TMA) technology: the estrogen receptor (ER), karyopherin-alpha-2 (KPNA2), an anti-cytokeratin marker, the fibrous structural protein Collagen-6, the membrane-associated tetraspanin protein Claudin-7, the inter-α-trypsin inhibitor, and the human epidermal growth factor receptor Her2. The TMA technology promises to significantly accelerate studies seeking associations between molecular changes and clinical endpoints (Kononen & Bubendorf et al., 1998).

Figure 3. Kaplan-Meier curves of overall survival for the low-risk (upper red curve) and the high-risk group (lower blue curve) of breast cancer patients. Error bars define standard 95% confidence intervals.

Experimental design. After histopathological grading of tumors according to (Elston & Ellis, 1991), patients were divided into a low-risk (grade 1-2) and a high-risk (grade 3) group. The overall goal of this experiment was to identify differences in the interaction patterns of marker proteins between these groups.
Kaplan-Meier survival analysis in Figure 3 shows that the chosen split is meaningful in the sense that the survival probability differs significantly between the two groups. This observation was corroborated by the analysis of mean protein expression levels in the two groups (Figure 4). As expected, for the low-risk class of patients (grade 1-2) there was marked estrogen receptor (ER) expression, whereas KPNA2 was negative.

Figure 5. Identified interaction patterns for the low-risk group (left) and the high-risk group (right). The size of the circles indicates the estimated significance of the main effects. For instance, the largest circle means that more than 95% of the posterior samples of the corresponding main effect are negative. Correspondingly, the linewidth of the interactions (blue lines: 1st-order, reddish triangles: 2nd-order) indicates their significance, see Figure 6 for an example.

Figure 4. Distribution of protein expression levels (3 bins corresponding to low, intermediate, high) in the low-risk group (left) and the high-risk group (right).

Data analysis and interpretation. Having identified a meaningful subdivision into patient groups, we focused on identifying interaction patterns in each of the groups. The observed expression of each of the proteins was represented as a factor with 3 levels (low, intermediate, high). The resulting contingency tables were analyzed separately for each group with our Bayesian Poisson Group-Lasso model. Interaction terms up to the second order (i.e. interactions between triples of factors) were analyzed. One million Gibbs samples were drawn, the burn-in phase contained the first 200,000 samples, and every 5th of the remaining samples was used for computing the posterior densities. Due to our efficient algorithm, it took less than one hour to compute all samples on a standard computer. Traceplots indicated that convergence of the Markov chain was not really an issue in this experiment (see Figure 6 for an example), an observation which is corroborated by a run-length control diagnosis according to (Raftery & Lewis, 1992). In the low-risk group the following interaction terms appeared to be highly significant: the two main effects of KPNA2 and ER expression, the first-order interaction KPNA2:ER and the second-order interaction KPNA2:ER:Her2, see Figure 5. Interpreting higher-order interaction terms can be a complex problem. A closer analysis of the contrast codes and the signs of the regression coefficients showed, however, that all these interaction terms explain the observed counts by either marginal or joint negative immunoreactivity for KPNA2, ER and Her2, respectively. The high-risk group showed a distinctly different interaction pattern, which is dominated by the main effect of ER expression, the interaction ER:KPNA2 and one weak second-order interaction. Again, looking into the contrast codes and the signs, we concluded that the main effect of ER explains the counts by an over-represented intermediate bin and both under-represented low and high bins. The interaction term ER:KPNA2 explains counts mainly by a joint intermediate expression. These interaction patterns are in line with known gene expression patterns of breast cancer. Non-high-grade breast cancers (grade 1-2) were mainly hormone-receptor positive, and negative for the high-grade markers KPNA2 and Her2. In a study by Dahl et al. (Dahl et al., 2006), high rates of KPNA2 expression were significantly associated with positive TP53 and Her2 immunoreactivity and a high proliferation index. Besides, KPNA2 seemed to be characteristic of the basal-like subtype of breast cancers, possibly representing a different clinical entity of breast tumors, which is associated with short survival times and a high frequency of TP53 mutations.

Control experiments. In order to compare our results with other analysis methods we conducted two control experiments. For the low-risk group, Figure 7 shows the solution path computed by the non-Bayesian analogue of our method, the standard Group-Lasso with Poisson likelihood.
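Such a solution path can be traced, for example, with a generic proximal-gradient loop that applies group-wise soft-thresholding to the penalized Poisson log-likelihood; the sketch below is our own illustration of the idea (with a fixed step size chosen purely for illustration) and is not the algorithm of (Roth & Fischer, 2008) that was actually used for Figure 7.

```python
import numpy as np

def prox_group(beta, groups, t):
    """Group-wise soft-thresholding: proximal map of t * sum_g ||beta_g||_2."""
    out = beta.copy()
    for g in groups:
        nrm = np.linalg.norm(beta[g])
        out[g] = 0.0 if nrm <= t else (1.0 - t / nrm) * beta[g]
    return out

def poisson_group_lasso_path(z, X, groups, lambdas, step=1e-3, iters=2000):
    """Group norms of group-penalized Poisson regression along a penalty grid.

    Columns not listed in `groups` (e.g. the intercept) are left unpenalized.
    """
    beta = np.zeros(X.shape[1])
    path = []
    for lam in lambdas:                                # warm-start each penalty value
        for _ in range(iters):
            grad = X.T @ (np.exp(X @ beta) - z)        # gradient of -log-likelihood
            beta = prox_group(beta - step * grad, groups, step * lam)
        path.append([np.linalg.norm(beta[g]) for g in groups])
    return np.array(path)
```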
We used the algorithm described in (Roth & Fischer, 2008). The solution path shows the evolution of the individual group norms when relaxing the constraint κ, see eq. (3). The plot indicates that the main effects ER and KPNA2 and the interactions KPNA2:ER and KPNA2:ER:Her2 have a dominating role, which is in perfect agreement with our results. At the same time, the more diffuse picture for large constraint values κ > 100, together with the difficulty of defining meaningful variance estimates, nicely demonstrates the inherent interpretation problems of classical Group-Lasso solutions.

Figure 6. Example traceplot for the very strong 1st-order interaction KPNA2:ER in the low-risk group, with moving average (upper panel). Corresponding posterior density (lower panel). More than 90% of the samples exceed zero (red area).

Figure 7. Comparison with the non-Bayesian analogue for the low-risk group: evolution of group norms ("solution path") obtained by relaxing the constraint in the standard Group-Lasso with Poisson likelihood.

The test for uniqueness/completeness of solutions proposed in (Roth & Fischer, 2008) reveals another problem: for any reasonable numerical tolerance parameter in the optimization process, the solutions found by the Group-Lasso are probably not uniquely identifiable. For constraint values κ > 90 there is an increasing number of inactive (i.e. zero-norm) groups that might become active in alternative solutions which are ε-close (in terms of likelihood) to the found optimal solution. This problem might be viewed as another strong argument for following the Bayesian paradigm of averaging over Group-Lasso solutions, instead of focusing on a single (penalized) maximum likelihood solution. For a second control experiment we used the same data to estimate a Bayesian network. Concerning the identification of interactions, the main technical differences to our Group-Lasso model are the restriction to a graph (instead of a hypergraph), and the use of directed edges, which in some cases can be used for inferring causal relations.

Figure 8. Two examples of the top-scoring Bayes nets with almost indistinguishable scores, showing typical variations in topology. For quantifying the score differences we made a perturbation experiment: when randomly leaving out 10% of the observations, the standard deviation of the individual maximum scores is σ ≈ 2. More than 50 different networks in the unperturbed problem lie within the highest one-σ region.

We used the deal package (Bøttcher & Dethlefsen, 2003; 2009), which finds the topology on the basis of the network score, basically the log of the joint probability of the graph and the data. In the resulting analysis, we observe many similarities between the Markov blankets from the Bayes nets and from our model. On the other hand, neither the network topology nor the direction of edges seems to be very stable. Among the top-scoring models many variants of the network have almost indistinguishable scores. Most of these fluctuations concern the dependencies between the three variables KPNA2, ER and Her2, see Figure 8 for an example. The clear identification of a second-order interaction between these three variables in our model (the bold red triangle in the left panel of Figure 5) might be interpreted as a strong advantage of explicitly modeling higher-order interactions in a hypergraph.

6. Conclusion

This paper has presented a full Bayesian treatment of the Group-Lasso which is applicable to many real-world learning problems.
Despite the generality of the proposed model, the main focus was on Poisson likelihood models for analyzing count data in contingency tables, with feature selection, because (i) they occur frequently in a bio-medical context, (ii) they are endowed with an inherent group structure representing the interaction terms of the categorical variables, and (iii) they allow for efficient Gibbs sampling, making full Bayesian inference practical in real-world examples. On the theoretical side we have derived a hierarchical Normal-Gamma expansion of the Multi-Laplace prior on the regression coefficients. This expansion is one of the key components for rewriting the Group-Lasso model in a form that is suitable for efficient Gibbs sampling. The second key component is the orthogonality property of the design matrix representing dummy encodings for categorical variables, which ensures that sampling becomes numerically stable and highly efficient.

When applied to a real-world breast cancer study, the proposed method identifies differences in the interaction patterns of marker proteins between patient groups. To our knowledge, this is the first study which systematically analyzes the influence of higher-order interactions based on immunohistochemical data for breast cancer. Interpretation of the results by pathologists further validates our approach, since the findings are in line with known gene expression patterns of breast cancer and hence support the usage of cost-effective immunohistochemical markers. The comparison of the method with standard approaches (standard Group-Lasso, Bayesian networks) validates the results and also highlights the advantages of the proposed method over existing approaches. The related software is going to be published as an R package.

Acknowledgments

The work was partly financed with a grant of the Swiss SystemsX.ch Initiative to the project LiverX of the Competence Center for Systems Physiology and Metabolic Diseases. The LiverX project was evaluated by the Swiss National Science Foundation.

References

Abd El-Rehim, D., Ball, G., & Pinder, S. E. et al. (2005). High-throughput protein expression analysis using tissue microarray technology of a large well-characterised series identifies biologically distinct classes of breast cancer confirming recent cDNA expression analyses. Int J Cancer, 116.

Bøttcher, S. G., & Dethlefsen, C. (2003). deal: A package for learning Bayesian networks. Journal of Statistical Software, 8.

Bøttcher, S. G., & Dethlefsen, C. (2009). deal: Learning Bayesian networks with mixed variables. R package.

Dahinden, C., Parmigiani, G., Emerick, M., & Bühlmann, P. (2007). Penalized likelihood for sparse contingency tables with an application to full-length cDNA libraries. BMC Bioinformatics, 8, 476.

Dahl, E., Kristiansen, G., & Gottlob, K. et al. (2006). Molecular profiling of laser-microdissected matched tumor and normal breast tissue identifies karyopherin alpha-2 as a potential novel prognostic marker in breast cancer. Clin Cancer Res, 12.

Diallo-Danebrock, R., Ting, E., & Gluz, O. et al. (2007). Protein expression profiling in high-risk breast cancer patients treated with high-dose or conventional dose-dense chemotherapy. Clin Cancer Res, 13.

Elston, C., & Ellis, I. (1991). Pathological prognostic factors in breast cancer. I. The value of histological grade in breast cancer: experience from a large study with long-term follow-up. Histopathology, 19.

Figueiredo, M., & Jain, A. (2001). Bayesian learning of sparse classifiers. Proc. IEEE Comp. Soc. Conf. Computer Vision and Pattern Recognition.

Gelman, A., Carlin, J., Stern, H., & Rubin, D. (1995). Bayesian Data Analysis. Chapman & Hall.

Green, P., & Park, T. (2004). Bayesian methods for contingency tables using Gibbs sampling. Statistical Papers, 45.

Kim, Y., Kim, J., & Kim, Y. (2006). Blockwise sparse regression. Statistica Sinica, 16.

Kononen, J., & Bubendorf, L. et al. (1998). Tissue microarrays for high-throughput molecular profiling of tumor specimens. Nature Medicine, 4(7).

Meier, L., van de Geer, S., & Bühlmann, P. (2008). The Group Lasso for logistic regression. J. Roy. Stat. Soc. B, 70.

Park, T., & Casella, G. (2008). The Bayesian Lasso. Journal of the American Statistical Association, 103.

Perou, C., Sorlie, T., & Eisen, M. B. et al. (2000). Molecular portraits of human breast tumours. Nature, 406.

Raftery, A., & Lewis, S. (1992). One long run with diagnostics: Implementation strategies for Markov chain Monte Carlo. Statistical Science, 7.

Roth, V., & Fischer, B. (2008). The Group-Lasso for generalized linear models: uniqueness of solutions and efficient algorithms. ICML '08: Proceedings of the 25th International Conference on Machine Learning. New York, NY, USA: ACM.

Seeger, M. (2008). Bayesian inference and optimal design in the sparse linear model. Journal of Machine Learning Research, 9.

Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. Roy. Stat. Soc. B, 58.

Yuan, M., & Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. Roy. Stat. Soc. B, 68.


More information

Learning Gaussian Process Models from Uncertain Data

Learning Gaussian Process Models from Uncertain Data Learning Gaussian Process Models from Uncertain Data Patrick Dallaire, Camille Besse, and Brahim Chaib-draa DAMAS Laboratory, Computer Science & Software Engineering Department, Laval University, Canada

More information

Describing Contingency tables

Describing Contingency tables Today s topics: Describing Contingency tables 1. Probability structure for contingency tables (distributions, sensitivity/specificity, sampling schemes). 2. Comparing two proportions (relative risk, odds

More information

Machine Learning. Bayesian Regression & Classification. Marc Toussaint U Stuttgart

Machine Learning. Bayesian Regression & Classification. Marc Toussaint U Stuttgart Machine Learning Bayesian Regression & Classification learning as inference, Bayesian Kernel Ridge regression & Gaussian Processes, Bayesian Kernel Logistic Regression & GP classification, Bayesian Neural

More information

A Bayesian Treatment of Linear Gaussian Regression

A Bayesian Treatment of Linear Gaussian Regression A Bayesian Treatment of Linear Gaussian Regression Frank Wood December 3, 2009 Bayesian Approach to Classical Linear Regression In classical linear regression we have the following model y β, σ 2, X N(Xβ,

More information

Lecture 10: Logistic Regression

Lecture 10: Logistic Regression BIOINF 585: Machine Learning for Systems Biology & Clinical Informatics Lecture 10: Logistic Regression Jie Wang Department of Computational Medicine & Bioinformatics University of Michigan 1 Outline An

More information

Graph Wavelets to Analyze Genomic Data with Biological Networks

Graph Wavelets to Analyze Genomic Data with Biological Networks Graph Wavelets to Analyze Genomic Data with Biological Networks Yunlong Jiao and Jean-Philippe Vert "Emerging Topics in Biological Networks and Systems Biology" symposium, Swedish Collegium for Advanced

More information

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3 University of California, Irvine 2017-2018 1 Statistics (STATS) Courses STATS 5. Seminar in Data Science. 1 Unit. An introduction to the field of Data Science; intended for entering freshman and transfers.

More information

Bagging During Markov Chain Monte Carlo for Smoother Predictions

Bagging During Markov Chain Monte Carlo for Smoother Predictions Bagging During Markov Chain Monte Carlo for Smoother Predictions Herbert K. H. Lee University of California, Santa Cruz Abstract: Making good predictions from noisy data is a challenging problem. Methods

More information

Linear Models for Regression CS534

Linear Models for Regression CS534 Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Lecture 11 CRFs, Exponential Family CS/CNS/EE 155 Andreas Krause Announcements Homework 2 due today Project milestones due next Monday (Nov 9) About half the work should

More information

Bayesian Image Segmentation Using MRF s Combined with Hierarchical Prior Models

Bayesian Image Segmentation Using MRF s Combined with Hierarchical Prior Models Bayesian Image Segmentation Using MRF s Combined with Hierarchical Prior Models Kohta Aoki 1 and Hiroshi Nagahashi 2 1 Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology

More information

Probabilistic Graphical Networks: Definitions and Basic Results

Probabilistic Graphical Networks: Definitions and Basic Results This document gives a cursory overview of Probabilistic Graphical Networks. The material has been gleaned from different sources. I make no claim to original authorship of this material. Bayesian Graphical

More information

Estimating Sparse High Dimensional Linear Models using Global-Local Shrinkage

Estimating Sparse High Dimensional Linear Models using Global-Local Shrinkage Estimating Sparse High Dimensional Linear Models using Global-Local Shrinkage Daniel F. Schmidt Centre for Biostatistics and Epidemiology The University of Melbourne Monash University May 11, 2017 Outline

More information

Introduction. Chapter 1

Introduction. Chapter 1 Chapter 1 Introduction In this book we will be concerned with supervised learning, which is the problem of learning input-output mappings from empirical data (the training dataset). Depending on the characteristics

More information

Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty

Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Journal of Data Science 9(2011), 549-564 Selection of Smoothing Parameter for One-Step Sparse Estimates with L q Penalty Masaru Kanba and Kanta Naito Shimane University Abstract: This paper discusses the

More information

Least Squares Regression

Least Squares Regression CIS 50: Machine Learning Spring 08: Lecture 4 Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may not cover all the

More information

Bayesian Grouped Horseshoe Regression with Application to Additive Models

Bayesian Grouped Horseshoe Regression with Application to Additive Models Bayesian Grouped Horseshoe Regression with Application to Additive Models Zemei Xu 1,2, Daniel F. Schmidt 1, Enes Makalic 1, Guoqi Qian 2, John L. Hopper 1 1 Centre for Epidemiology and Biostatistics,

More information

Generalized Linear Models

Generalized Linear Models York SPIDA John Fox Notes Generalized Linear Models Copyright 2010 by John Fox Generalized Linear Models 1 1. Topics I The structure of generalized linear models I Poisson and other generalized linear

More information

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix Labor-Supply Shifts and Economic Fluctuations Technical Appendix Yongsung Chang Department of Economics University of Pennsylvania Frank Schorfheide Department of Economics University of Pennsylvania January

More information

Mathematical Formulation of Our Example

Mathematical Formulation of Our Example Mathematical Formulation of Our Example We define two binary random variables: open and, where is light on or light off. Our question is: What is? Computer Vision 1 Combining Evidence Suppose our robot

More information

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014 Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2014 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information