Multilevel Functional Clustering Analysis


Nicoleta Serban (Corresponding Author)
H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology
nserban@isye.gatech.edu

Huijing Jiang
Business Analytics & Mathematical Sciences, IBM T.J. Watson Research Center
huijiang@us.ibm.com

Abstract: In this paper, we investigate clustering methods for multilevel functional data, which consist of repeated random functions observed for a large number of units (e.g. subjects) at multiple subunits (e.g. proteins); that is, multiple random functions are observed for each unit. To describe the within- and between-unit variability induced by the hierarchical structure of the data, we take a multilevel functional principal components analysis (MFPCA) approach. We develop and compare a hard clustering method based on the scores derived from the MFPCA and a soft clustering method built on an MFPCA decomposition. In a simulation study, we assess the estimation accuracy of the cluster membership and cluster patterns under a series of settings: small vs. moderate numbers of time points, various noise levels, and varying numbers of repeated measurements or subunits per unit. We demonstrate the applicability of the clustering analysis to a real data set consisting of expression profiles from genes of immune cells. Common and unique response patterns are identified by clustering the expression profiles using our multilevel clustering analysis.

Keywords: cluster analysis, hard clustering, functional ANOVA, microarray analysis, multilevel functional data, multilevel principal component analysis, model-based clustering, soft clustering.

1 Introduction

Due to an increasing number of applications involving the analysis of a large number of observed random functions, exploratory tools such as unsupervised or supervised clustering play an important role in uncovering prevalent patterns among the observed random functions. Specific applications include gene expression profiling from microarray studies (Hastie et al., 2000; Bar-Joseph et al., 2002; Wakefield et al., 2002; Serban and Wasserman, 2005; Booth et al., 2008), clustering subjects by their spinal bone mineral density (James and Sugar, 2003), and summarizing the market value trends of manufacturing companies (Serban, 2009).

Functional clustering methods divide into hard and soft (model-based) methods. Hard clustering partitions the set of random functions into non-overlapping subsets according to a similarity measure (e.g. correlation). In soft clustering, the underlying assumption is that the observed random functions are realizations from a mixture process where the mixture weights are the cluster probabilities. The cluster membership is not fixed, as in hard clustering, but random, following a multinomial distribution; therefore, the soft clustering model is equivalent to a mixture of densities model. Examples of hard clustering methods are given by Hastie et al. (2000), Bar-Joseph et al. (2002), Serban and Wasserman (2005), and Chiou and Li (2007). Examples of soft clustering methods are given by James and Sugar (2003), Fraley and Raftery (2002), Wakefield et al. (2002), and Booth, Casella and Hobert (2008).

In the existing literature, clustering algorithms are developed for and applied to data consisting of one random function or vector for each unit to be clustered: one-level data. In this paper, we introduce clustering methods for multilevel functional data; that is, we cluster $X_i(t)$, $i = 1, \dots, I$, where $X_i(t)$ is a multidimensional random function. For simplicity, we focus on two-level data, $X_{ij}(t)$ for $j = 1, \dots, J$ and $i = 1, \dots, I$, where $j$ indexes subunits; that is, for each unit (e.g. subject, product or gene), we observe random functions for $J$ subunits. The underlying model is the functional ANOVA model

$X_{ij}(t) = \alpha(t) + \beta_j(t) + Y_i(t) + W_{ij}(t) + \varepsilon_{ij}(t)$   (1)

where $\alpha(t)$ and $\beta_j(t)$ for $j = 1, \dots, J$ are fixed functional means specifying the global trend and the subunit-specific functional trends, respectively. For simplicity, we assume $\alpha(t) = 0$ and $\beta_j(t) = 0$; when they are non-zero, we can estimate them using standard nonparametric methods. Under this framework, we pose two clustering problems:

Clustering by similarity of unit-specific means (at level 1): two units $i_1$ and $i_2$ are in the same cluster if their unit-specific means $Y_{i_1}(t)$ and $Y_{i_2}(t)$ are similar in shape.

Clustering by similarity of within-unit deviations (at level 2): two units $i_1$ and $i_2$ are in the same cluster if their corresponding deviations from the unit-specific means, $\{W_{i_1 j}\}_{j=1,\dots,J}$ and $\{W_{i_2 j}\}_{j=1,\dots,J}$, are dynamically similar, that is, they move together over time.

The first clustering problem identifies groups of units which behave similarly on average across the $J$ subunits, and it can be viewed as an extension of existing functional clustering approaches. Following this extension, this clustering problem could simply be carried out by estimating the unit-specific means $Y_i(t)$ using nonparametric methods and clustering the smoothed means using functional clustering algorithms. In this paper, we call this method the level-1 naive clustering approach. A second modeling alternative is to decompose the functional ANOVA model following the multilevel functional principal component analysis (MFPCA) introduced by Di et al. (2009) and Di and Crainiceanu (2010) and to cluster the level-1 estimated scores using common clustering methods such as k-means, k-medians, hierarchical clustering and others (Hastie et al., 2009). We call this method the level-1 hard clustering approach. The third approach is a soft clustering model using an MFPCA decomposition. We call this method the level-1 soft clustering approach.

The second clustering problem is more unique in its definition. Assuming $Y_i(t) = 0$, each unit features repeated random functions $W_{ij}(t)$, $j = 1, \dots, J$, which are dissimilar within the unit. For example, one could observe protein expression profiles (subunits) for a number of subjects (units) in response to an experimental drug.

The focus may be on clustering subjects responding differently to the drug treatment, whether they are drug-resistant or not, where the response is recorded only for a small number of established proteins. If the proteins respond differently to the experimental drug, $W_{ij}(t)$, $j = 1, \dots, J$, will be dissimilar within each subject, and therefore clustering by similarity in $Y_i(t)$ does not provide the clustering of interest. On the other hand, we expect some grouping of the subjects by similarity of their protein expression profiles, which is a multidimensional measure of whether they are resistant to the drug, for example. Clustering at level 2 can therefore be used to identify a grouping of units or subjects where the similarity is not a measure between two univariate random functions, as in all existing clustering methods, but between two multivariate random functions, $\{W_{i_1 j}\}_{j=1,\dots,J}$ and $\{W_{i_2 j}\}_{j=1,\dots,J}$. Level-2 clustering thus assumes that the random functions within each unit are dissimilar up to an overall mean $Y_i(t)$, and it commonly applies when the subunits are non-homogeneous (different proteins in this example, or different bacteria in the case study of this paper).

Level-2 clustering could be reduced to estimating the correlation between two samples of random functions and clustering based on the correlation estimates. For example, one may apply the dynamical correlation analysis introduced by Dubin and Müller (2005) to $\{W_{i_1 j}\}_{j=1,\dots,J}$ and $\{W_{i_2 j}\}_{j=1,\dots,J}$ to obtain a correlation value $\rho_{i_1,i_2}$ for each pair of units $(i_1, i_2)$ and further apply a distance-based clustering to the correlation matrix $\{\rho_{i_1,i_2}\}_{i_1=1,\dots,I;\; i_2=1,\dots,I}$. However, this approach assumes a large $J$ and a large number of time points, an assumption that does not hold in many applications. Instead, we can apply the MFPCA approach to the multilevel data and cluster the level-2 estimated scores. We call this the level-2 hard clustering approach. An alternative approach is a soft clustering model using an MFPCA decomposition. We call this method the level-2 soft clustering approach.

In this paper, we discuss advantages and disadvantages of these clustering approaches and validate their performance within a simulation study.

We point out here that one underlying advantage of the soft clustering approach is that it provides a natural framework for inference on the number of clusters, imputed cluster memberships and cluster means, and it allows incorporating information about the dependence between functions at various levels. However, a drawback is that it is computationally intensive because the estimation of the clustering model components is based on an Expectation-Maximization algorithm.

The rest of the paper is organized as follows. In Section 2, we review the functional ANOVA model and its decomposition using the MFPCA approach. We continue in Section 3 with the description of a series of hard clustering approaches and in Section 4 with the presentation of the soft clustering method. An important aspect of unsupervised clustering is that the number of clusters is unknown; under the soft clustering model, we discuss a selection method for the number of clusters in Section 5. We assess the performance of the clustering approaches discussed in this paper within a simulation study in Section 6 and within a case study in Section 7. Some technical details are deferred to the Appendix.

2 Multilevel Functional Model

Let $\{X_{ij}(t), j = 1, \dots, J\}$ be a group of random functions observed over a continuous variable $t \in T$ ($T$ is the functional domain) for the $i$th experimental unit, with $i = 1, \dots, I$ ($I$ is the number of units). Generally, the number of units $I$ is large ($I \gg 100$) whereas the number of subunits per unit, $J$, is small ($J$ between 2 and 5). Under the functional ANOVA model in (1) with unknown functional effects, we employ the nonparametric decomposition

$X_{ij}(t) = \sum_{s=1}^{N_1} \xi_{i,s}\, \phi^{(1)}_s(t) + \sum_{r=1}^{N_2} \zeta_{ij,r}\, \phi^{(2)}_r(t) + \varepsilon_{ij}(t)$   (2)

where $\{\xi_{i,s}\}_{s=1,\dots,N_1}$ and $\{\zeta_{ij,r}\}_{r=1,\dots,N_2,\, j=1,\dots,J}$ are the level-1 and level-2 unconditional scores for the $i$th unit. In this paper, we use the term unconditional in contrast to the term conditional, which refers to conditioning on the cluster membership variable in the clustering model. We assume the following:

A.1 $E(\xi_{i,s}) = 0$, $\mathrm{Var}(\xi_{i,s}) = \tau^{(1)}_s$ for any unit $i$, and $E(\xi_{i,s_1} \xi_{i,s_2}) = 0$ for $s_1 \neq s_2$.

A.2 $\{\phi^{(1)}_s(t), s = 1, 2, \dots\}$ is an orthogonal basis in $L^2(T)$.

A.3 $E(\zeta_{ij,r}) = 0$, $\mathrm{Var}(\zeta_{ij,r}) = \tau^{(2)}_{j,r}$, and $E(\zeta_{ij,r_1} \zeta_{ij,r_2}) = 0$ for any unit $i$ and any subunit $j$, for $r_1 \neq r_2$.

A.4 $\{\phi^{(2)}_r(t), r = 1, 2, \dots\}$ is an orthogonal basis in $L^2(T)$.

A.5 $\{\xi_{i,s}, s = 1, 2, \dots\}$ are uncorrelated with $\{\zeta_{ij,r}, r = 1, 2, \dots\}$.

There are various approaches to estimating the functional ANOVA model. Recent methods are by Bugli and Lambert (2006), who assume that the bases of functions in A.2 and A.4 are fixed and estimate the scores using penalized splines; Di et al. (2009) and Di and Crainiceanu (2010), who base their estimation procedure on functional principal component analysis; and Kaufman and Sain (2010), who pursue a fully Bayesian approach. An advantage of employing the MFPCA approach is its computational efficiency: the bases of functions are functional principal components, which reduce the functional space to a lower-dimensional space than when the bases of functions are fixed. Moreover, it applies to both densely observed and sparse data. To this end, our clustering model is based on the MFPCA decomposition.

Remark: Assumption A.3 of our clustering model is less restrictive than in the MFPCA of Di et al. (2009) and Di and Crainiceanu (2010). Specifically, in the existing works, MFPCA assumes that $\mathrm{Var}(\zeta_{ij,r}) = \tau^{(2)}_r$; that is, the variances are the same for all subunits. However, as we will discuss in Section 4, the soft clustering model is subject to the more general assumption A.3 when the cluster means vary with the subunit index $j$.
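As an illustration of the decomposition in (2), the following sketch simulates two-level functional data with independent level-1 and level-2 scores satisfying assumptions A.1-A.5. The eigenfunctions and variance choices are assumptions made for illustration (they mirror the simulation settings of Section 6, with the last polynomial reconstructed as a shifted Legendre polynomial); this is not the authors' code.

```python
# A minimal sketch (not the authors' implementation) of simulating data from
# decomposition (2): X_ij(t) = sum_s xi_{i,s} phi1_s(t) + sum_r zeta_{ij,r} phi2_r(t) + noise.
import numpy as np

rng = np.random.default_rng(0)
I, J, m = 100, 4, 10                 # units, subunits per unit, time points per curve
N1, N2 = 4, 4                        # number of level-1 and level-2 components
t = np.linspace(0.0, 1.0, m)

# Orthonormal bases (assumed here): Fourier functions at level 1 and
# shifted-Legendre-type polynomials at level 2, as in Section 6.
phi1 = np.vstack([np.sqrt(2) * np.sin(2 * np.pi * t),
                  np.sqrt(2) * np.cos(2 * np.pi * t),
                  np.sqrt(2) * np.sin(4 * np.pi * t),
                  np.sqrt(2) * np.cos(4 * np.pi * t)])                    # (N1, m)
phi2 = np.vstack([np.ones(m),
                  np.sqrt(3) * (2 * t - 1),
                  np.sqrt(5) * (6 * t**2 - 6 * t + 1),
                  np.sqrt(7) * (20 * t**3 - 30 * t**2 + 12 * t - 1)])     # (N2, m)

tau1 = 0.9 ** np.arange(N1)                     # level-1 score variances (A.1), assumed decaying
tau2 = 2.0 ** (-2.0 * np.arange(1, N2 + 1))     # level-2 score variances (A.3), assumed decaying
sigma = 2.0                                     # noise standard deviation (value used in Section 6)

xi = rng.normal(0.0, np.sqrt(tau1), size=(I, N1))        # unconditional level-1 scores
zeta = rng.normal(0.0, np.sqrt(tau2), size=(I, J, N2))   # unconditional level-2 scores
eps = rng.normal(0.0, sigma, size=(I, J, m))             # measurement noise

X = (xi @ phi1)[:, None, :] + np.einsum("ijr,rm->ijm", zeta, phi2) + eps  # (I, J, m)
```

In practice the scores and eigenfunctions would be estimated from the data by MFPCA rather than specified; the simulated array X simply stands in for observed curves in the sketches that follow.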

3 Hard Clustering

3.1 Level-1 Clustering

In this section, we describe two approaches to clustering by similarity of unit-specific means; both are hard clustering methods. Generally, in hard clustering, the underlying assumption is that the set of units to be clustered, $\mathcal{I} = \{1, 2, \dots, I\}$, is divided into a partition of $K$ subsets $\{C_1, \dots, C_K\}$ with $C_{k_1} \cap C_{k_2} = \emptyset$ for any $k_1 \neq k_2$. Two units are in the same cluster if they are similar according to a similarity measure. When the objective is to cluster random functions by shape regardless of scale, the similarity measure is often the correlation between two functions.

One common approach to clustering functional data is to first project the random functions into a finite-dimensional space using nonparametric decompositions, and then cluster based on similarity of the transform coefficients. James and Sugar (2003) dubbed this approach filtering. Clustering functions by shape using the correlation measure in the functional domain is equivalent to clustering the transform coefficients using the Euclidean distance in the transform domain (Serban and Wasserman, 2005).

For multilevel functional data, a naive clustering approach is to first decompose the random functions using an orthogonal basis of functions $\{\psi_1(t), \psi_2(t), \dots\}$:

$X_{ij}(t) = \sum_{p=1}^{\infty} \theta_{p,ij}\, \psi_p(t) = \Psi(t)\, \theta_{ij}$

where $\theta_{ij} = (\theta_{1,ij}, \theta_{2,ij}, \dots)$ is the vector of coefficients of the random functions observed for unit $i$ in the transform domain. The selection of the basis of functions depends on the smoothness of the underlying regression functions and the irregularity of the design points at which the random functions are observed. Since we observe the random functions at a finite number of time points, we need to truncate the summation in the decomposition above; that is, we estimate up to $P_i < \infty$ coefficients, where $P_i$ controls the smoothness of the estimated unit-specific mean $Y_i(t)$, and therefore its selection will impact the accuracy of the estimated cluster memberships. Bugli and Lambert (2006) proposed using a large $P_i = P$ to reduce the modeling bias but penalizing the influence of the coefficients (penalized smoothing splines). Further, we cluster the estimated mean coefficients $\hat\theta_i = \frac{1}{J} \sum_{j=1}^{J} \hat\theta_{ij}$ using common clustering approaches for multivariate data.

For densely observed random functions, this approach will perform reasonably well since the coefficients $\theta_{ij}$ are accurately estimated: $\hat\theta_{ij}$ are asymptotically unbiased and consistent. On the other hand, under a sparse design (i.e. when each random function is observed at a small number of design points), the coefficients $\theta_{ij}$ are inaccurately estimated, which in turn results in inaccurate cluster membership estimation. To overcome this difficulty, one approach is to employ an estimation method which allows borrowing strength across subunits to improve the accuracy of the estimated coefficients for individual units. Consequently, our proposed algorithm for clustering at level 1 is:

1. Apply MFPCA to impute the scores at level 1, $\hat\xi_{i,s}$; and

2. Apply a multivariate clustering algorithm to the estimated scores $\hat\xi_{i,s}$, where the similarity measure is the Euclidean distance, $d(i_1, i_2) = \|\hat\xi_{i_1} - \hat\xi_{i_2}\|_2$ for $i_1, i_2 \in \mathcal{I}$.

This algorithm is equivalent to clustering the unit-specific means $Y_i(t)$ by shape regardless of scale, or, more precisely, clustering by correlation in the functional space. By borrowing strength across subunits, the cluster membership is more accurately estimated than with the naive approach, as supported by our simulation study (see Section 6).

3.2 Level-2 Clustering

Clustering by similarity of within-unit deviations requires defining a similarity measure between the groups of random functions $\{W_{i_1 j}\}_{j=1,\dots,J}$ and $\{W_{i_2 j}\}_{j=1,\dots,J}$. For large $J$ and a densely sampled time domain, one such measure is the dynamical correlation for multivariate longitudinal data of Dubin and Müller (2005). However, it is rarely the case that a large number of subunits $J$ per unit is available over a large number of time points. Because of this limitation, we propose the following hard clustering approach:

1. Apply MFPCA to impute the scores at level 2, $\hat\zeta_{ij,r}$; and

2. Apply a multivariate clustering algorithm to the estimated coefficients $\hat\zeta_{ij,r}$, where the similarity measure is the averaged $L_2$ norm $d(i_1, i_2) = \sum_{j=1}^{J} \|\hat\zeta_{i_1 j} - \hat\zeta_{i_2 j}\|_2$.
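A compact sketch of the two score-based hard clustering algorithms above, assuming the MFPCA score estimates are already available (here, arrays shaped like the simulated xi and zeta from the Section 2 sketch stand in for the estimates). The use of scikit-learn's k-means is an illustrative choice, not prescribed by the paper.

```python
# Hard clustering on MFPCA scores (a sketch; any multivariate clustering algorithm applies).
import numpy as np
from sklearn.cluster import KMeans

def level1_hard_clustering(xi_hat, K, seed=0):
    """Cluster units by their estimated level-1 scores, shape (I, N1), with Euclidean k-means."""
    return KMeans(n_clusters=K, n_init=20, random_state=seed).fit_predict(xi_hat)

def level2_hard_clustering(zeta_hat, K, seed=0):
    """Cluster units by their estimated level-2 scores, shape (I, J, N2).
    Stacking the J score vectors per unit gives a squared Euclidean distance
    sum_j ||zeta_{i1 j} - zeta_{i2 j}||^2, a close relative of the averaged L2
    norm used as the similarity measure in Section 3.2."""
    I, J, N2 = zeta_hat.shape
    return KMeans(n_clusters=K, n_init=20,
                  random_state=seed).fit_predict(zeta_hat.reshape(I, J * N2))

# Example usage with the simulated scores from the Section 2 sketch:
# labels_l1 = level1_hard_clustering(xi, K=2)
# labels_l2 = level2_hard_clustering(zeta, K=2)
```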

4 Soft Clustering

In this section, we introduce a soft clustering approach which allows borrowing strength across random functions within the same cluster and within the same unit (MFPCA). In soft clustering, the underlying assumption is that the complete data are bivariate variables $(X_i, Z_i)$ for $i = 1, \dots, I$, where $X_i$ are unit-specific realizations from a multivariate distribution and the cluster membership $Z_i$ is a latent variable (Fraley and Raftery, 2002). A common estimation method for soft clustering is the Expectation-Maximization algorithm: at the Expectation step, we impute or predict the cluster memberships $Z = (Z_1, \dots, Z_I)$ along with estimating the cluster weights $\pi_1, \dots, \pi_K$, and at the Maximization step, we estimate the parameters specifying the conditional distribution of $X_i \mid Z_i$, $i = 1, \dots, I$. Therefore, we need to specify the conditional distribution of $X_i \mid Z_i$, $i = 1, \dots, I$, and the distribution of the latent variable $Z_i$, which in turn specify the distribution of the complete data. The cluster membership of the $i$th unit, $Z_i$, follows a multinomial distribution with proportion parameters $\pi_1, \dots, \pi_K$, where $K$ is the number of clusters. $X_i \mid Z_i = k$, $i = 1, \dots, I$, are commonly assumed conditionally independent, following a distribution with cluster mean $\mu_k(t)$ and covariance function $\Sigma_k(t, t')$.

Using a similar framework for clustering multilevel data, the complete data are $(X_{ij}, Z^{(1)}_i, Z^{(2)}_i)$ for $i = 1, \dots, I$ and $j = 1, \dots, J$, where $Z^{(1)}_i$ and $Z^{(2)}_i$ are latent variables specifying the cluster membership at level 1 and at level 2, respectively. We assume:

The cluster membership $Z^{(1)}_i$ of the $i$th unit has a multinomial distribution with proportion parameters $\pi^{(1)}_1, \dots, \pi^{(1)}_{C_1}$, where $C_1$ is the number of clusters at level 1.

The cluster membership $Z^{(2)}_i$ of the $i$th unit has a multinomial distribution with proportion parameters $\pi^{(2)}_1, \dots, \pi^{(2)}_{C_2}$, where $C_2$ is the number of clusters at level 2.

Level-1 Clustering. For clustering at level 1, we assume $C_2 = 1$ but $C_1 > 1$. Therefore, the joint data are $(X_{ij}, Z^{(1)}_i)$.

However, to model the distribution of the joint data, we need to specify the conditional distribution of $X_i \mid Z^{(1)}_i$. Following the model in (1), the conditional distribution is

$X_{ij}(t) \mid (Z^{(1)}_i = k) = \sum_{s=1}^{N_1} \nu_{i,s,k}\, \phi^{(1)}_s(t) + \sum_{r=1}^{N_2} \zeta_{ij,r}\, \phi^{(2)}_r(t) + \varepsilon_{ij}(t)$   (3)

with

$\nu_{i,k} = (\nu_{i,1,k}, \dots, \nu_{i,N_1,k}) \sim N(\mu_k, \Lambda^{(1)}_k)$, $\quad \zeta_{ij} = (\zeta_{ij,1}, \dots, \zeta_{ij,N_2}) \sim N(0, \Lambda^{(2)}_j)$,

where $\mu_k = (\mu_{1,k}, \dots, \mu_{N_1,k})$ and $\Lambda^{(1)}_k$ is an $N_1 \times N_1$ diagonal matrix with diagonal elements $\lambda^{(1)}_k = (\lambda^{(1)}_{1,k}, \dots, \lambda^{(1)}_{N_1,k})$. Under this conditional model, the conditional scores $\nu_{i,s,k} = (\xi_{i,s} \mid Z^{(1)}_i = k)$ for $k = 1, \dots, C_1$ are assumed independent with conditional mean $\mu_{s,k}$ and conditional variance $\lambda^{(1)}_{s,k}$. For this model, $\xi_{i,s}$ for $i = 1, \dots, I$ and $s = 1, \dots, N_1$ are the unconditional scores at level 1, with a distribution following assumption A.1. The scores at level 2 are unconditional of the clustering latent variable $Z^{(1)}$, and therefore their distribution follows assumption A.3. From the conditional and unconditional models, we derive

$0 = E(\xi_{i,s}) = E\big(E(\xi_{i,s} \mid Z^{(1)}_i)\big) = \sum_{k=1}^{C_1} \pi^{(1)}_k E(\nu_{i,s,k}) = \sum_{k=1}^{C_1} \pi^{(1)}_k \mu_{s,k}$   (4)

$\tau^{(1)}_s = V(\xi_{i,s}) = \sum_{k=1}^{C_1} \pi^{(1)}_k \big(\lambda^{(1)}_{s,k} + \mu^2_{s,k}\big) - \Big(\sum_{k=1}^{C_1} \pi^{(1)}_k \mu_{s,k}\Big)^2 = \sum_{k=1}^{C_1} \pi^{(1)}_k \big(\lambda^{(1)}_{s,k} + \mu^2_{s,k}\big)$.   (5)

It follows that the clustering model at level 1 (Model 1) is

$X_{ij}(t) = \sum_{s=1}^{N_1} \xi_{i,s}\, \phi^{(1)}_s(t) + \sum_{r=1}^{N_2} \zeta_{ij,r}\, \phi^{(2)}_r(t) + \varepsilon_{ij}(t)$
$\xi_{i,s} \mid (Z^{(1)}_i = k) \sim N(\mu_{s,k}, \lambda^{(1)}_{s,k})$
$Z^{(1)}_i \sim \mathrm{Multinomial}(1; \pi^{(1)}_1, \dots, \pi^{(1)}_{C_1})$
$\zeta_{ij,r} \sim N(0, \lambda^{(2)}_{j,r})$, independent of $\xi_{i,s,k}$ and $Z^{(1)}_i$   (6)

subject to the constraint $\sum_{k=1}^{C_1} \pi^{(1)}_k \mu_{s,k} = 0$ implied by (4). We note that the relationship between conditional and unconditional variances in equation (5) does not impose a constraint. Under this clustering setup, the $k$th cluster mean is

$E(X_{ij}(t) \mid Z^{(1)}_i = k) = E(Y_i(t) \mid Z^{(1)}_i = k) = \sum_{s=1}^{N_1} \mu_{s,k}\, \phi^{(1)}_s(t)$.   (7)

Level-2 Clustering. For clustering at level 2, we assume $C_1 = 1$ but $C_2 > 1$. Therefore, the joint data are $(X_i, Z^{(2)}_i)$ and the conditional distribution of $X_i \mid Z^{(2)}_i$ is

$X_{ij}(t) \mid (Z^{(2)}_i = k) = \sum_{s=1}^{N_1} \xi_{i,s}\, \phi^{(1)}_s(t) + \sum_{r=1}^{N_2} \delta_{ij,r,k}\, \phi^{(2)}_r(t) + \varepsilon_{ij}(t)$   (8)

with

$\xi_i = (\xi_{i,1}, \dots, \xi_{i,N_1}) \sim N(0, \Lambda^{(1)})$, $\quad \delta_{ij,k} = (\delta_{ij,1,k}, \dots, \delta_{ij,N_2,k}) \sim N(\eta_{jk}, \Lambda^{(2)}_{j,k})$,

where $\eta_{jk} = (\eta_{j,1,k}, \dots, \eta_{j,N_2,k})$ and $\Lambda^{(2)}_{jk}$ is an $N_2 \times N_2$ diagonal matrix with diagonal elements $\lambda^{(2)}_{jk} = (\lambda^{(2)}_{j,1,k}, \dots, \lambda^{(2)}_{j,N_2,k})$. Under this conditional model, the conditional scores at level 2, $\delta_{ij,r,k} = (\zeta_{ij,r} \mid Z^{(2)}_i = k)$, are assumed independent with conditional mean $\eta_{j,r,k}$ and conditional variance $\lambda^{(2)}_{j,r,k}$ for $k = 1, \dots, C_2$. For this model, the $\zeta_{ij,r}$ are the unconditional scores in the unconditional model (2), assumed independent with mean zero ($E(\zeta_{ij,r}) = 0$) and constant variance across units ($V(\zeta_{ij,r}) = \tau^{(2)}_{j,r}$), as provided in assumption A.3. From the conditional and unconditional models, we derive

$0 = E(\zeta_{ij,r}) = E\big(E(\zeta_{ij,r} \mid Z^{(2)}_i)\big) = \sum_{k=1}^{C_2} \pi^{(2)}_k E(\delta_{ij,r,k}) = \sum_{k=1}^{C_2} \pi^{(2)}_k \eta_{j,r,k}$   (9)

$\tau^{(2)}_{j,r} = V(\zeta_{ij,r}) = \sum_{k=1}^{C_2} \pi^{(2)}_k \big(\lambda^{(2)}_{j,r,k} + \eta^2_{j,r,k}\big) - \Big(\sum_{k=1}^{C_2} \pi^{(2)}_k \eta_{j,r,k}\Big)^2 = \sum_{k=1}^{C_2} \pi^{(2)}_k \big(\lambda^{(2)}_{j,r,k} + \eta^2_{j,r,k}\big)$   (10)

Similar to the clustering model at level 1, the clustering model at level 2 (Model 2) is

$X_{ij}(t) = \sum_{s=1}^{N_1} \xi_{i,s}\, \phi^{(1)}_s(t) + \sum_{r=1}^{N_2} \zeta_{ij,r}\, \phi^{(2)}_r(t) + \varepsilon_{ij}(t)$
$\zeta_{ij,r} \mid (Z^{(2)}_i = k) \sim N(\eta_{j,r,k}, \lambda^{(2)}_{j,r,k})$
$Z^{(2)}_i \sim \mathrm{Multinomial}(1; \pi^{(2)}_1, \dots, \pi^{(2)}_{C_2})$
$\xi_{i,s} \sim N(0, \tau^{(1)}_s)$, independent of $\zeta_{ij,r,k}$ and $Z^{(2)}_i$   (11)

subject to the constraint $\sum_{k=1}^{C_2} \pi^{(2)}_k \eta_{j,r,k} = 0$ implied by (9). On the other hand, the relationship between unconditional and conditional variances in equation (10) requires that the unconditional variances differ across subunits when $\eta_{jk}$ varies with $j$, leading to assumption A.3 in Section 2. However, MFPCA as introduced by Di et al. (2009) does not allow the eigenvalues at level 2 to vary across subunits. Because of this, the estimated level-2 scores will provide lower clustering accuracy when the number of repeated subunits, $J$, is large; this observation is supported by our simulation study. Under this clustering setup, the $k$th cluster trend for the $j$th condition is

$E(X_{ij}(t) \mid Z^{(2)}_i = k) = E(W_{ij}(t) \mid Z^{(2)}_i = k) = \sum_{r=1}^{N_2} \eta_{j,r,k}\, \phi^{(2)}_r(t)$.   (12)

The formulation and estimation of the joint level-1 and level-2 clustering model with $C_1 > 1$ and $C_2 > 1$ is provided in the Supplemental Material. The estimation method is an iterative likelihood-based algorithm.
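The estimation details for Models 1 and 2 are given in the Supplemental Material. As a rough illustration of the EM idea behind the level-1 soft clustering, the sketch below fits a mixture with cluster-specific means and diagonal covariances directly to already-estimated level-1 scores. This is a simplification (the paper's algorithm works with the full data and the MFPCA decomposition), and all function and variable names are illustrative, not the authors' implementation.

```python
# Simplified EM for a diagonal-Gaussian mixture on level-1 scores (illustration only).
import numpy as np
from scipy.stats import norm

def em_soft_clustering(scores, K, n_iter=200, tol=1e-6, seed=0):
    """scores: (I, N1) array of level-1 score estimates; returns weights, means,
    variances, soft memberships (responsibilities) and the final log-likelihood."""
    rng = np.random.default_rng(seed)
    I, N1 = scores.shape
    resp = rng.dirichlet(np.ones(K), size=I)                 # random initial responsibilities
    loglik_old = -np.inf
    for _ in range(n_iter):
        # M-step: cluster weights pi_k, means mu_{s,k} and variances lambda_{s,k}.
        Nk = resp.sum(axis=0)
        pi = Nk / I
        mu = (resp.T @ scores) / Nk[:, None]
        var = np.maximum((resp.T @ scores**2) / Nk[:, None] - mu**2, 1e-6)
        # E-step: responsibilities from the component log-densities.
        logdens = np.empty((I, K))
        for k in range(K):
            logdens[:, k] = np.log(pi[k]) + norm.logpdf(scores, mu[k], np.sqrt(var[k])).sum(axis=1)
        shift = logdens.max(axis=1, keepdims=True)
        loglik = (shift[:, 0] + np.log(np.exp(logdens - shift).sum(axis=1))).sum()
        resp = np.exp(logdens - shift)
        resp /= resp.sum(axis=1, keepdims=True)
        if abs(loglik - loglik_old) < tol:
            break
        loglik_old = loglik
    return pi, mu, var, resp, loglik

# Soft memberships: resp[i, k] approximates P(Z_i = k | data); argmax gives a hard assignment.
```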

5 Model Selection

The clustering models described in the previous section depend on a series of parameters which are assumed fixed: $C_1$, $C_2$, $N_1$ and $N_2$. We identify two model selection problems: (1) selecting the number of eigenfunctions which explain a large percentage of the variability between units (selecting $N_1$) and within units (selecting $N_2$); and (2) selecting the number of clusters at level 1 (selecting $C_1$) and/or the number of clusters at level 2 (selecting $C_2$).

We can select $N_1$ and $N_2$ using the unconditional MFPCA model. Di et al. (2009) and Di and Crainiceanu (2010) discuss various alternative methods for selecting the number of basis functions, and we follow their direction. For identifying the number of clusters, we focus on likelihood-based approaches. Common model selection criteria, such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), have been employed for estimating the number of clusters (Fraley and Raftery, 2002). Both criteria select the number of clusters by minimizing $-2 \log L(\hat\Psi) + 2P(C_1, C_2)$, where $\log L(\hat\Psi)$ is the log-likelihood of the observed data, which measures the lack of fit. In our multilevel clustering model,

$\log L(\Psi) = \sum_{i=1}^{I} \sum_{k=1}^{C_1} \sum_{k'=1}^{C_2} \pi^{(1)}_k \pi^{(2)}_{k'} \log f\big(x_i; \mu_k, \eta_{k'}, \Lambda^{(1)}_k, \Lambda^{(2)}_{k'}, \sigma^2\big)$.

The second term, $2P(C_1, C_2)$, is the penalty term that measures the complexity of the model. For AIC, $2P(C_1, C_2) = 2d$, and for BIC, $2P(C_1, C_2) = (\log IJm)\, d$, where $d = 2C_1K_1 + 2C_2K_2 - K_1 - K_2 + C_1 + C_2 - 1$ is the number of parameters. The number of parameters in each setting is:

Level-1 ($C_2 = 1$), unequal variances: $d = 2N_1 C_1 + 2JN_2 + \dots$

Level-1 ($C_2 = 1$), equal variances: $d = N_1(C_1 + 1) + 2JN_2 + \dots$

Level-2 ($C_1 = 1$), unequal variances: $d = 2N_1 + 2JN_2 C_2 + \dots$

Level-2 ($C_1 = 1$), equal variances: $d = 2N_1 + JN_2(C_2 + 1) + \dots$

Many authors (for example, Koehler and Murphree, 1988) have observed that models selected using AIC tend to overfit, as AIC prefers larger models. In the soft clustering context, this translates into overestimation of the number of clusters (Soromenho, 1933; Celeux and Soromenho, 1996). Alternatively, the likelihood correction using BIC selects more parsimonious models. Consequently, the BIC selection criterion has often been used in soft clustering (Fraley and Raftery, 1998). Leroux (1992) has shown that, asymptotically, BIC does not underestimate the true number of components.
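A hedged sketch of the AIC/BIC comparison applied to the simplified score-level mixture from the previous sketch. The parameter count d used below (cluster means, variances and mixture weights of the score mixture) is an illustrative approximation and does not reproduce the exact counts listed above.

```python
# Selecting the number of clusters by AIC/BIC for the simplified score-level mixture.
import numpy as np

def select_number_of_clusters(scores, candidates=(1, 2, 3, 4, 5)):
    I, N1 = scores.shape
    results = {}
    for K in candidates:
        *_, loglik = em_soft_clustering(scores, K)            # from the previous sketch
        d = K * N1 + K * N1 + (K - 1)                         # means + variances + weights
        results[K] = {"AIC": -2.0 * loglik + 2.0 * d,
                      "BIC": -2.0 * loglik + np.log(I) * d}
    best = min(results, key=lambda K: results[K]["BIC"])      # BIC-selected number of clusters
    return best, results
```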

6 Simulation Studies

The primary objective of this simulation study is to assess the estimation accuracy of the cluster membership and cluster means under various comparative settings:

1. Varying sparsity in the sampling design;
2. Varying number of subunits J;
3. Varying noise level; and
4. Naive vs. hard vs. soft clustering.

6.1 Level-1 Clustering

We generate samples of functions from the joint model $(X_i, Z^{(1)}_i)$ described in Section 4. Specifically, we generate $Z^{(1)}_i$, the cluster membership, from a multinomial distribution with cluster weights $\pi^{(1)}_1, \dots, \pi^{(1)}_{C_1}$ fixed across all simulations. For simplicity, we choose $C_1 = 2$ with $\pi^{(1)}_1 = 1/3$ and $\pi^{(1)}_2 = 2/3$. The generated data consist of $I = 100$ units. We vary the maximum number of observations or time points per random function, $m = 4, 6, 10, 15$, and the number of subunits per unit, $J = 3, 4, 5$. The conditional variances at level 1 are generated according to two different settings:

Equal conditional variances across clusters: $\lambda_{s,k} = 0.9^{s-1}$ for $k = 1, \dots, C_1$; and

Varying conditional variances across clusters: $\lambda_{s,k} = 2^{2(k-s)-1}$.

The unconditional variances at level 2 are $\tau_{j,r} = \frac{j+1}{2}\, 2^{-2r}$. The conditional means at level 1 are $\mu_1 = (3, 2, 1, 0)$ and $\mu_2 = (-1.5, -1, -0.5, 0)$, selected such that $\sum_{k=1}^{C_1} \pi^{(1)}_k \mu_{s,k} = 0$. The eigenfunctions are

$\Phi^{(1)}(t) = \big(\sqrt{2}\sin(2\pi t), \sqrt{2}\cos(2\pi t), \sqrt{2}\sin(4\pi t), \sqrt{2}\cos(4\pi t)\big)$
$\Phi^{(2)}(t) = \big(1, \sqrt{3}(2t-1), \sqrt{5}(6t^2-6t+1), \sqrt{7}(20t^3-30t^2+12t-1)\big)$.

The number of eigenfunctions at level 1 is $N_1 = 4$ and at level 2 is $N_2 = 4$. The noise level for the simulation in this paper is $\sigma = 2$. We investigate the estimation accuracy of the cluster membership and cluster means for other values of $\sigma$ in the Supplemental Material.
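A short sketch of the level-1 cluster structure just described, drawing memberships and conditional scores under the equal-conditional-variance setting (one of the two settings above). The notation follows Section 4, and the generated scores can be fed to the hard and soft clustering sketches of Sections 3 and 4; this is an illustration, not the authors' simulation code.

```python
# Generating cluster memberships and conditional level-1 scores for the Section 6.1 design
# (equal conditional variances; illustration only).
import numpy as np

rng = np.random.default_rng(1)
I, N1, C1 = 100, 4, 2
pi1 = np.array([1.0 / 3.0, 2.0 / 3.0])                        # cluster weights
mu = np.array([[3.0, 2.0, 1.0, 0.0],
               [-1.5, -1.0, -0.5, 0.0]])                      # conditional means; sum_k pi_k mu_{s,k} = 0
lam = np.tile(0.9 ** np.arange(N1), (C1, 1))                  # equal conditional variances 0.9**(s-1)

Z = rng.choice(C1, size=I, p=pi1)                             # true level-1 memberships
xi = rng.normal(mu[Z], np.sqrt(lam[Z]))                       # conditional level-1 scores, shape (I, N1)
```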

In our simulation example, because we have the true cluster membership, we can assess the accuracy of the clustering prediction for the methods introduced in this paper using a clustering/classification error. We measure the clustering error using the Rand index (Rand, 1971), here computed as the fraction of all misclustered pairs of functions. Let $C = \{f_1, \dots, f_S\}$ denote the set of true functions, $\hat C = \{\hat f_1, \dots, \hat f_S\}$ denote the set of estimated functions, and $T$ and $\hat T$ denote the true and estimated clustering maps, respectively. The Rand index is

$R(C, \hat C) = \binom{S}{2}^{-1} \sum_{r<s} I\big(T(f_r, f_s) \neq \hat T(f_r, f_s)\big)$.

Therefore, the Rand index is low when there are only a few misclustered functions. In order to evaluate the accuracy of the estimated cluster means, we report the relative root mean square error calculated as

$\mathrm{RMSE} = \sqrt{\frac{1}{C_1} \sum_{k=1}^{C_1} \frac{\int_T \big(\mu_k(t) - \hat\mu_k(t)\big)^2\, dt}{\int_T \mu_k^2(t)\, dt}}$.

We report the estimation accuracy of the cluster membership and the cluster means for the naive clustering approach, where the basis of functions is the radial spline basis, for the hard clustering approach discussed in Section 3, and for the soft clustering approach discussed in Section 4. We do not report accuracy results for the naive clustering algorithm when m = 4 because of computational instability. The values reported for the Rand index and the root mean square errors are averages over 100 simulations. We also investigated the use of the Gap statistic (Tibshirani et al., 2001) for identifying the number of clusters for hard clustering, and the use of the AIC and BIC model selection criteria for identifying the number of clusters for soft clustering (see the Supplemental Material for additional figures).
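Minimal helpers (assumptions about the exact implementation, with estimated clusters assumed already matched to the true ones) for the two accuracy measures just defined: the pairwise misclustering rate and the relative root mean square error of the cluster means evaluated on a time grid.

```python
# Accuracy measures used in the simulation study (sketch).
import numpy as np
from itertools import combinations

def pairwise_misclustering(true_labels, est_labels):
    """Fraction of pairs clustered together in one partition but apart in the other."""
    true_labels, est_labels = np.asarray(true_labels), np.asarray(est_labels)
    n, mismatches = len(true_labels), 0
    for r, s in combinations(range(n), 2):
        mismatches += (true_labels[r] == true_labels[s]) != (est_labels[r] == est_labels[s])
    return mismatches / (n * (n - 1) / 2)

def relative_rmse(true_means, est_means):
    """true_means, est_means: (K, m) arrays of cluster means on a common time grid."""
    ratio = ((true_means - est_means) ** 2).sum(axis=1) / (true_means ** 2).sum(axis=1)
    return np.sqrt(ratio.mean())
```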

Based on the results reported in Figure 1 and the additional simulation results in the Supplemental Material, we summarize the estimation accuracy results as follows:

- The naive clustering fails under the very sparse sampling design, m = 4, and it is much less accurate than the alternative methods in most settings.
- There is a significant improvement in the estimation accuracy of both the cluster membership and the cluster patterns when comparing the hard to the naive clustering.
- For equal conditional variances, the hard and soft clustering methods perform similarly, whereas for varying conditional variances, a more realistic setting, the soft clustering approach performs significantly better uniformly over all settings.
- As J increases and under equal conditional variances, the clustering estimation accuracy improves for all three methods; however, under the unequal conditional variances setting, the clustering estimation accuracy improves consistently over all settings only for the soft clustering approach.
- As m increases, the clustering estimation accuracy does not improve significantly for the soft clustering approach, but it improves for the naive clustering method.
- As the noise level increases, the gap in accuracy between soft clustering and the other methods increases, with much better performance for the soft clustering method at high noise levels.
- The Gap statistic for hard clustering accurately identifies the correct number of clusters, $C_1 = 2$, under the assumption of equal conditional variances but not under the assumption of non-equal conditional variances.
- BIC outperforms AIC in correctly identifying the number of clusters under the assumption of non-equal conditional variances. BIC identifies the correct number of clusters in most of the 100 simulations. As J increases and as the maximum number of time points, m, increases, the accuracy of AIC diminishes.

6.2 Level-2 Clustering

To assess the clustering performance of our soft clustering method at level 2, we simulate $C_2 = 2$ clusters with $\pi^{(2)}_1 = 1/3$ and $\pi^{(2)}_2 = 2/3$. The true eigenfunctions are the same as in the previous section, and the unconditional variances at level 1 are $\lambda_s = 0.9^{s-1}$. The conditional means at level 2, $\eta_{j,k}$, are selected such that $\sum_{k=1}^{C_2} \pi^{(2)}_k \eta_{j,k} = 0$.

Since in our simulations we compare the estimation accuracy for $J = 3, 4, 5$, the conditional means for cluster 1 are $\eta_{1,1} = \eta_{2,1} = \eta_{3,1} = (4, 3, 2, 1)$ and $\eta_{4,1} = \eta_{5,1} = (-4, -3, -2, -1)$, and the means for cluster 2 are $\eta_{1,2} = \eta_{2,2} = \eta_{3,2} = (-2, -1.5, -1, -0.5)$ and $\eta_{4,2} = \eta_{5,2} = (2, 1.5, 1, 0.5)$. The conditional variances at level 2 are $\lambda_{j,r,k} = a_{kj}\, 2^{-2(r-1)}$, where $a_{kj}$ is a scaling constant randomly generated from Unif(0.5, 1.5) (varying across clusters and across replicates within each unit).

Figure 2 provides the accuracy of the cluster membership measured by the Rand index and the accuracy of the cluster patterns measured by the mean square error for the simulation setting above. We do not show the results for equal level-2 conditional variances, as this is not a realistic assumption because of the constraint given by (10). We summarize the estimation accuracy results as follows:

- The estimation accuracy of the cluster membership and cluster means improves significantly for the soft clustering approach as compared to the hard clustering method. One possible reason for this significant improvement is that the hard clustering approach assumes equal unconditional variances of the scores, whereas the soft clustering model does not (assumption A.3). Moreover, the soft clustering approach updates the clustering provided by the hard clustering by maximizing a goodness-of-fit function, the likelihood.
- An increase in m does not improve the accuracy of the cluster membership estimated using the hard or soft clustering approach.
- The accuracy of the cluster membership estimated using the soft clustering model also increases as J increases.
- Similarly to the level-1 clustering, as the noise level σ increases, there is only a slight decrease in accuracy, more pronounced for the cluster mean estimates and for the naive clustering method.
- The Gap statistic for selecting the number of clusters for the hard clustering approach performs poorly within all settings.

- Similarly to level-1 clustering, BIC outperforms AIC in selecting the number of clusters for the soft clustering method, although the gap between the two methods is smaller for level-2 clustering than for level-1 clustering. Generally, as m increases, the accuracy of both methods diminishes.

7 Case Study

Innate immunity is an antimicrobial host defense in most multicellular organisms. This immune system involves a series of cells (macrophages, dendritic cells and others), which in turn activate a pathway of genes. Existing microarray studies of cells infected with various pathogens identified hundreds of differentially expressed genes which could potentially be responsible in the expression pathway. These studies can be divided along several lines, e.g., cell types, bacteria types (Gram-negative and Gram-positive pathogens) and host species (human and mouse). Specific bacteria types are known to trigger very different immune responses (Nau et al., 2002). Immune response microarray experiments consisting of 29 datasets were retrieved and organized from various supporting websites (Lu et al., 2010). In this case study, we selected only six experiments conducted on human macrophage cells infected by different bacteria types. The data consist of $I$ experimental units or genes, each observed for $J = 6$ bacteria types. The gene expression profiles are observed at m = 8 time points, specifically $t_1 = 0$, $t_2 = 30$, $t_3 = 60$, $t_4 = 120$, $t_5 = 240$, $t_6 = 360$, $t_7 = 720$, $t_8 = 1440$ (in minutes). The data are therefore observed at two levels: the unit-specific level, where the genes correspond to units in our model description, and the subunit level, consisting of expression profiles for the J = 6 subunits or bacteria. We therefore apply the multilevel clustering methods to identify underlying common responses to different bacteria (level-1 clustering) and to summarize the variability within responses to different bacteria (level-2 clustering). We note here that we first estimate α(t) and β_j(t) as a means of normalization and remove the estimated means from the data.

Then we apply the MFPCA method to obtain the functional principal components and the scores for the within- and between-unit covariances. The number of selected components is $N_1 = 4$ and $N_2 = 2$.

An important aspect in the analysis of gene expression profiles is that only a few genes are significantly expressed, whereas the rest have approximately constant expression profiles. Most genes are so-called housekeeping genes which are not expressed (more or less independently of the stimulus), and therefore they have constant trends. Serban and Wasserman (2005) point out the challenge of clustering a large number of curves or random functions when most of them are approximately constant. They suggest employing a preliminary filtering step that removes a large percentage of the constant curves, followed by clustering of only those curves which were not removed from the complete set. In this analysis, we therefore apply the level-1 clustering algorithms to the 278 differentially expressed genes identified by Lu et al. (2010) and not to the complete set of genes (i.e. I = 278). Similarly, we apply the level-2 clustering algorithms to 292 genes that show significant within-variation in the response to different bacteria, as identified by Lu et al. (2010). They correspond to genes that display higher variability across bacteria responses. For details on the list of differentially expressed genes we refer to lyongu/pub/immune/immune.html. We use different sets of genes for level-1 clustering and level-2 clustering because the groupings have different meanings. Clustering genes at level 1 means clustering by their average expression across all bacteria, and therefore genes have to be differentially expressed on average across all bacteria. Clustering genes at level 2 means clustering by their bacteria-specific behavior, and therefore genes are clustered based on their unique responses to different bacteria.

We investigated the selection of the number of clusters using the Gap statistic (Tibshirani et al., 2001) for the hard clustering approach and the AIC selection criterion for the soft clustering method. We also visually assessed the cluster means when deciding on the number of clusters. In this paper, we discuss the results for $C_1 = 3$ and $C_2 = 3$. We provide additional results and discussions in the Supplemental Material.

Figures 3 and 4 display the 5% and 95% quantiles of the observed curves for the genes within each cluster, along with the estimated cluster means. For hard clustering, the cluster means are estimated by averaging the estimated $Y_i(t)$ within each cluster. For soft clustering, the cluster means are estimated using the estimation method for the level-1 clustering model discussed in the Supplemental Material. Therefore, when using the soft clustering method, not only the cluster membership is updated but also the cluster means.

In the clustering analysis at level 1, we identify the common responses of human macrophage genes to different bacteria. Similarly to Lu et al. (2010), we cluster the expression responses into three categories: a constant/unchanged pattern corresponding to inactivated genes, an up pattern corresponding to induced genes, and a down pattern corresponding to suppressed genes. Figure 4 displays the clustering patterns obtained using our soft clustering method. It suggests that, out of the 278 genes identified as differentially expressed by Lu et al., only 62 genes have non-constant expression profiles. Moreover, the pattern of cluster 1 indicates that 17 genes are suppressed at early time points and then slowly stabilize, whereas the 45 genes in cluster 3 are first induced by the bacteria and then stabilize. Interestingly, the average response time is around 280 minutes after treatment; that is, the responsible genes in human macrophage cells respond within 4-5 hours. Comparing the two clustering methods (Figures 3 and 4), we find that the clustering derived from simply applying hard clustering to the MFPCA scores assigns most of the curves to cluster 1, whereas clusters 2 and 3 are very similar in trend (in fact, cluster 3 consists of only 2 genes). Therefore, hard clustering does not pick up the three expression patterns described above and assigns most of the genes to one cluster.

In the clustering analysis at level 2, we summarize the within-bacteria variability in the response of human macrophage genes to different bacteria. Figures 5 and 6 display the level-2 cluster means for $C_2 = 3$. Each of the three subplots of Figures 5 and 6 contains six cluster patterns, which correspond to the summarized immune responses to the J = 6 bacteria.

The results of the soft clustering method shown in Figure 6 imply that the 10 genes in cluster 1 are induced by three bacteria (two Gram-negative and one Gram-positive) and suppressed by two bacteria (one Gram-negative and one Gram-positive), while the 32 genes in cluster 3 have opposite responses to these six bacteria. The remaining 254 genes show similar activating patterns for all six bacteria types. While the soft clustering method identifies genes with varying within-unit expression means (clusters 1 and 3), level-2 hard clustering does not capture much variation in the immune responses to different bacteria. For instance, the 45 genes in cluster 2 have similar upward response patterns to the six bacteria, as shown in Figure 5(b). Comparing the soft and hard clusterings for various numbers of clusters, we find that the hard clustering algorithm is sensitive to outlying patterns in the sense that it tends to estimate clusters consisting of one unit. For example, the clustering in Figure 5 is for K = 4, where the fourth cluster consists of one unit or gene (not shown as a separate cluster). Similarly, when we set K = 5, two of the clusters at level 2 consist of one gene only.

Last, we compare the level-1 and level-2 clusterings estimated using the hard and soft clustering algorithms using the Rand index, R(H, S), where H is the cluster membership from the hard clustering method and S is the cluster membership from the soft clustering method. We find that the mismatch in cluster membership between the two clusterings measured by R(H, S) increases as the number of clusters increases, and it is in the range of 31%-40% for level-1 clustering and 25%-41% for level-2 clustering, which suggests that the soft clustering approach updates the cluster membership significantly.
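The agreement measure R(H, S) can be computed with the pairwise misclustering helper from the simulation-study sketch; the inputs below are placeholders for the hard and soft memberships estimated on the gene expression data, not quantities defined by the paper.

```python
# Sketch: quantifying how much the soft clustering updates the hard clustering.
# hard_labels: memberships from the score-based hard clustering (Section 3);
# resp: soft-clustering responsibilities (Section 4); both assumed already computed.
import numpy as np

def hard_vs_soft_mismatch(hard_labels, resp):
    soft_labels = np.asarray(resp).argmax(axis=1)          # modal soft memberships
    return pairwise_misclustering(hard_labels, soft_labels)
```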

8 Discussion

In this paper, we introduce a means for clustering multilevel functional data; the clustering algorithm identifies groups of functions which are similar in their overall behavior across repeated measurements and/or similar in their within-unit trends. The underlying clustering (hard or soft) begins with the specification of a model using functional principal component analysis and either clusters the resulting estimated scores using common hard clustering methods or updates the estimated scores assuming a clustering model. The estimation procedure for the latter approach is iterative and therefore more computationally expensive; however, in contrast to a purely algorithmic approach, it allows inference on the model parameters such as the number of clusters, imputed cluster memberships and cluster means.

From our simulation studies, we find that clustering by similarity of unit-specific means at level 1 using either of the two approaches provides similar results as long as there is not a significant difference in within-cluster variability across clusters. Therefore, the extra computational cost incurred by updating the scores using the soft clustering approach is offset by an improvement in the estimation accuracy when the variability between functions assigned to the same cluster (i.e. the conditional variances) varies greatly from one cluster to another. Because it may often be difficult to evaluate how the within-cluster variability varies across clusters, since the clustering is unknown, we suggest proceeding with soft clustering if the number of units $I$ to be clustered is not large, as the additional computational cost is not great. On the other hand, for large $I$ we recommend either a more computationally efficient implementation of the soft clustering method or simply the application of the hard clustering approach, with the understanding of its shortcomings, including lower estimation accuracy.

Clustering by similarity of within-unit deviations at level 2 is more difficult, as it pools information across multiple functions simultaneously. The hard clustering approach using the estimated scores from MFPCA provides inaccurate clustering. On the other hand, after updating the scores and the cluster membership using the soft clustering approach, the accuracy of the estimated cluster membership and cluster means improves significantly over the hard clustering algorithm. We therefore recommend using the soft clustering approach over the hard clustering method under any setting, small or large noise level, lower or higher sparsity in the sampling design.

Last, our case study clearly shows that soft clustering outperforms hard clustering.

In contrast to soft clustering, hard clustering at level 1 is sensitive to outlying patterns in the sense that it tends to estimate clusters consisting of one unit, and it does not identify the primary gene expression trends. In addition, hard clustering at level 2 does not capture the patterns in the between-unit variability in the clusters it provides.

Acknowledgement

The authors are thankful to Ciprian Crainiceanu for providing useful insights about the research in this paper and to Chong-Zhi Di for providing the software for sparse MFPCA. The authors thank the referees and associate editor for helpful comments.

References

[1] Z. Bar-Joseph, G. Gerber, D.K. Gifford, T.S. Jaakkola (2002), A new approach to analyzing gene expression time series data, Proceedings of the 6th Annual International Conference on RECOMB.

[2] J.G. Booth, G. Casella, J.P. Hobert (2008), Clustering Using Objective Functions and Stochastic Search, Journal of the Royal Statistical Society, B, 70(1).

[3] C. Bugli and P. Lambert (2006), Functional ANOVA with random functional effects: an application to event-related potentials modelling for electroencephalograms analysis, Statistics in Medicine, 25.

[4] H. Cardot (2007), Conditional functional principal components analysis, Scandinavian Journal of Statistics, 34.

[5] J.M. Chiou and P.L. Li (2007), Functional clustering and identifying substructures of longitudinal data, Journal of the Royal Statistical Society, Series B, 69.

[6] C.Z. Di, C.M. Crainiceanu, B.S. Caffo and N.M. Punjabi (2009), Multilevel Functional Principal Component Analysis, Annals of Applied Statistics, 3(1).

[7] C.Z. Di and C.M. Crainiceanu (2010), Multilevel Sparse Functional Principal Component Analysis, Johns Hopkins University, Dept. of Biostatistics, Working Papers.

[8] C. Fraley and A.E. Raftery (2002), Model-Based Clustering, Discriminant Analysis, and Density Estimation, Journal of the American Statistical Association, 97.

[9] T. Hastie, R. Tibshirani, M. Eisen, A. Alizadeh, R. Levy, L. Staudt, W. Chan, D. Botstein, P. Brown (2000), Gene shaving as a method for identifying distinct sets of genes with similar expression patterns, Genome Biology, 1(2).

[10] T. Hastie, R. Tibshirani and J. Friedman (2009), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer.

[11] G.M. James and C.A. Sugar (2003), Clustering for sparsely sampled functional data, Journal of the American Statistical Association, 98.

[12] G.M. James, T. Hastie and C. Sugar (2000), Principal Component Models for Sparse Functional Data, Biometrika, 87.

[13] C. Kaufman and S.R. Sain (2010), Bayesian Functional ANOVA Modeling Using Gaussian Process Prior Distributions, Bayesian Analysis, 5(1).

[14] Y. Lu, R. Rosenfeld, G.J. Nau and Z. Bar-Joseph (2010), Cross Species Expression Analysis of Innate Immune Response, Journal of Computational Biology, 17(3).

[15] G.J. Nau, J.F.L. Richmond, A. Schlesinger, E.G. Jennings, E.S. Lander and R.A. Young (2002), Human macrophage activation programs induced by bacterial pathogens, Proceedings of the National Academy of Sciences USA, 99.

[16] J.O. Ramsay and B.W. Silverman (2002), Applied Functional Data Analysis, Springer, New York.

[17] W.M. Rand (1971), Objective Criteria for the Evaluation of Clustering Methods, Journal of the American Statistical Association, 66.

[18] N. Serban (2008), Clustering in the Presence of Heteroscedastic Errors, Journal of Nonparametric Statistics, 20(7).

[19] N. Serban (2009), Clustering Confidence Sets, Journal of Statistical Planning and Inference, 139.

[20] N. Serban and L. Wasserman (2005), CATS: Cluster Analysis by Transformation and Smoothing, Journal of the American Statistical Association, 100.

[21] C. Sugar and G. James (2003), Finding the Number of Clusters in a Data Set: An Information Theoretic Approach, Journal of the American Statistical Association, 98.

[22] R. Tibshirani, G. Walther and T. Hastie (2001), Estimating the number of clusters in a dataset via the gap statistic, Journal of the Royal Statistical Society, B, 63.

[23] F. Vaida and S. Blanchard (2005), Conditional Akaike information for mixed-effects models, Biometrika, 92(2).

[24] F. Yao, H.G. Müller and J.L. Wang (2005), Functional data analysis for sparse longitudinal data, Journal of the American Statistical Association, 100.

Figure 1: Level-1 clustering: comparing naive, hard and soft clustering for J = 3, 4, 5 and for equal vs. non-equal level-2 eigenvalues. The estimation accuracy is evaluated for the cluster membership (Rand index) and for the cluster means (MSE). Panels: (a) Rand index, equal variance; (b) Rand index, non-equal variance; (c) MSE, equal variance; (d) MSE, non-equal variance. The horizontal axis is the maximum number of time points.


More information

Grouping of correlated feature vectors using treelets

Grouping of correlated feature vectors using treelets Grouping of correlated feature vectors using treelets Jing Xiang Department of Machine Learning Carnegie Mellon University Pittsburgh, PA 15213 jingx@cs.cmu.edu Abstract In many applications, features

More information

Linear Model Selection and Regularization

Linear Model Selection and Regularization Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In

More information

Likelihood-Based Methods

Likelihood-Based Methods Likelihood-Based Methods Handbook of Spatial Statistics, Chapter 4 Susheela Singh September 22, 2016 OVERVIEW INTRODUCTION MAXIMUM LIKELIHOOD ESTIMATION (ML) RESTRICTED MAXIMUM LIKELIHOOD ESTIMATION (REML)

More information

Sparse inverse covariance estimation with the lasso

Sparse inverse covariance estimation with the lasso Sparse inverse covariance estimation with the lasso Jerome Friedman Trevor Hastie and Robert Tibshirani November 8, 2007 Abstract We consider the problem of estimating sparse graphs by a lasso penalty

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 6: Model complexity scores (v3) Ramesh Johari ramesh.johari@stanford.edu Fall 2015 1 / 34 Estimating prediction error 2 / 34 Estimating prediction error We saw how we can estimate

More information

FUNCTIONAL DATA ANALYSIS FOR VOLATILITY PROCESS

FUNCTIONAL DATA ANALYSIS FOR VOLATILITY PROCESS FUNCTIONAL DATA ANALYSIS FOR VOLATILITY PROCESS Rituparna Sen Monday, July 31 10:45am-12:30pm Classroom 228 St-C5 Financial Models Joint work with Hans-Georg Müller and Ulrich Stadtmüller 1. INTRODUCTION

More information

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations

More information

Graphical Model Selection

Graphical Model Selection May 6, 2013 Trevor Hastie, Stanford Statistics 1 Graphical Model Selection Trevor Hastie Stanford University joint work with Jerome Friedman, Rob Tibshirani, Rahul Mazumder and Jason Lee May 6, 2013 Trevor

More information

REGRESSING LONGITUDINAL RESPONSE TRAJECTORIES ON A COVARIATE

REGRESSING LONGITUDINAL RESPONSE TRAJECTORIES ON A COVARIATE REGRESSING LONGITUDINAL RESPONSE TRAJECTORIES ON A COVARIATE Hans-Georg Müller 1 and Fang Yao 2 1 Department of Statistics, UC Davis, One Shields Ave., Davis, CA 95616 E-mail: mueller@wald.ucdavis.edu

More information

Chapter 17: Undirected Graphical Models

Chapter 17: Undirected Graphical Models Chapter 17: Undirected Graphical Models The Elements of Statistical Learning Biaobin Jiang Department of Biological Sciences Purdue University bjiang@purdue.edu October 30, 2014 Biaobin Jiang (Purdue)

More information

FUNCTIONAL DATA ANALYSIS

FUNCTIONAL DATA ANALYSIS FUNCTIONAL DATA ANALYSIS Hans-Georg Müller Department of Statistics University of California, Davis One Shields Ave., Davis, CA 95616, USA. e-mail: mueller@wald.ucdavis.edu KEY WORDS: Autocovariance Operator,

More information

CMSC858P Supervised Learning Methods

CMSC858P Supervised Learning Methods CMSC858P Supervised Learning Methods Hector Corrada Bravo March, 2010 Introduction Today we discuss the classification setting in detail. Our setting is that we observe for each subject i a set of p predictors

More information

New Global Optimization Algorithms for Model-Based Clustering

New Global Optimization Algorithms for Model-Based Clustering New Global Optimization Algorithms for Model-Based Clustering Jeffrey W. Heath Department of Mathematics University of Maryland, College Park, MD 7, jheath@math.umd.edu Michael C. Fu Robert H. Smith School

More information

MS-C1620 Statistical inference

MS-C1620 Statistical inference MS-C1620 Statistical inference 10 Linear regression III Joni Virta Department of Mathematics and Systems Analysis School of Science Aalto University Academic year 2018 2019 Period III - IV 1 / 32 Contents

More information

Estimating subgroup specific treatment effects via concave fusion

Estimating subgroup specific treatment effects via concave fusion Estimating subgroup specific treatment effects via concave fusion Jian Huang University of Iowa April 6, 2016 Outline 1 Motivation and the problem 2 The proposed model and approach Concave pairwise fusion

More information

A New Bayesian Variable Selection Method: The Bayesian Lasso with Pseudo Variables

A New Bayesian Variable Selection Method: The Bayesian Lasso with Pseudo Variables A New Bayesian Variable Selection Method: The Bayesian Lasso with Pseudo Variables Qi Tang (Joint work with Kam-Wah Tsui and Sijian Wang) Department of Statistics University of Wisconsin-Madison Feb. 8,

More information

Classification via kernel regression based on univariate product density estimators

Classification via kernel regression based on univariate product density estimators Classification via kernel regression based on univariate product density estimators Bezza Hafidi 1, Abdelkarim Merbouha 2, and Abdallah Mkhadri 1 1 Department of Mathematics, Cadi Ayyad University, BP

More information

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices

Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Article Lasso Maximum Likelihood Estimation of Parametric Models with Singular Information Matrices Fei Jin 1,2 and Lung-fei Lee 3, * 1 School of Economics, Shanghai University of Finance and Economics,

More information

Model selection criteria in Classification contexts. Gilles Celeux INRIA Futurs (orsay)

Model selection criteria in Classification contexts. Gilles Celeux INRIA Futurs (orsay) Model selection criteria in Classification contexts Gilles Celeux INRIA Futurs (orsay) Cluster analysis Exploratory data analysis tools which aim is to find clusters in a large set of data (many observations

More information

Bayesian Grouped Horseshoe Regression with Application to Additive Models

Bayesian Grouped Horseshoe Regression with Application to Additive Models Bayesian Grouped Horseshoe Regression with Application to Additive Models Zemei Xu, Daniel F. Schmidt, Enes Makalic, Guoqi Qian, and John L. Hopper Centre for Epidemiology and Biostatistics, Melbourne

More information

Functional Latent Feature Models. With Single-Index Interaction

Functional Latent Feature Models. With Single-Index Interaction Generalized With Single-Index Interaction Department of Statistics Center for Statistical Bioinformatics Institute for Applied Mathematics and Computational Science Texas A&M University Naisyin Wang and

More information

On prediction and density estimation Peter McCullagh University of Chicago December 2004

On prediction and density estimation Peter McCullagh University of Chicago December 2004 On prediction and density estimation Peter McCullagh University of Chicago December 2004 Summary Having observed the initial segment of a random sequence, subsequent values may be predicted by calculating

More information

A Bayesian Criterion for Clustering Stability

A Bayesian Criterion for Clustering Stability A Bayesian Criterion for Clustering Stability B. Clarke 1 1 Dept of Medicine, CCS, DEPH University of Miami Joint with H. Koepke, Stat. Dept., U Washington 26 June 2012 ISBA Kyoto Outline 1 Assessing Stability

More information

STAT 535 Lecture 5 November, 2018 Brief overview of Model Selection and Regularization c Marina Meilă

STAT 535 Lecture 5 November, 2018 Brief overview of Model Selection and Regularization c Marina Meilă STAT 535 Lecture 5 November, 2018 Brief overview of Model Selection and Regularization c Marina Meilă mmp@stat.washington.edu Reading: Murphy: BIC, AIC 8.4.2 (pp 255), SRM 6.5 (pp 204) Hastie, Tibshirani

More information

Univariate shrinkage in the Cox model for high dimensional data

Univariate shrinkage in the Cox model for high dimensional data Univariate shrinkage in the Cox model for high dimensional data Robert Tibshirani January 6, 2009 Abstract We propose a method for prediction in Cox s proportional model, when the number of features (regressors)

More information

An Adaptive LASSO-Penalized BIC

An Adaptive LASSO-Penalized BIC An Adaptive LASSO-Penalized BIC Sakyajit Bhattacharya and Paul D. McNicholas arxiv:1406.1332v1 [stat.me] 5 Jun 2014 Dept. of Mathematics and Statistics, University of uelph, Canada. Abstract Mixture models

More information

High-dimensional regression modeling

High-dimensional regression modeling High-dimensional regression modeling David Causeur Department of Statistics and Computer Science Agrocampus Ouest IRMAR CNRS UMR 6625 http://www.agrocampus-ouest.fr/math/causeur/ Course objectives Making

More information

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume

More information

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract

WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION. Abstract Journal of Data Science,17(1). P. 145-160,2019 DOI:10.6339/JDS.201901_17(1).0007 WEIGHTED QUANTILE REGRESSION THEORY AND ITS APPLICATION Wei Xiong *, Maozai Tian 2 1 School of Statistics, University of

More information

Determining the number of components in mixture models for hierarchical data

Determining the number of components in mixture models for hierarchical data Determining the number of components in mixture models for hierarchical data Olga Lukočienė 1 and Jeroen K. Vermunt 2 1 Department of Methodology and Statistics, Tilburg University, P.O. Box 90153, 5000

More information

Chap 1. Overview of Statistical Learning (HTF, , 2.9) Yongdai Kim Seoul National University

Chap 1. Overview of Statistical Learning (HTF, , 2.9) Yongdai Kim Seoul National University Chap 1. Overview of Statistical Learning (HTF, 2.1-2.6, 2.9) Yongdai Kim Seoul National University 0. Learning vs Statistical learning Learning procedure Construct a claim by observing data or using logics

More information

Some properties of Likelihood Ratio Tests in Linear Mixed Models

Some properties of Likelihood Ratio Tests in Linear Mixed Models Some properties of Likelihood Ratio Tests in Linear Mixed Models Ciprian M. Crainiceanu David Ruppert Timothy J. Vogelsang September 19, 2003 Abstract We calculate the finite sample probability mass-at-zero

More information

L11: Pattern recognition principles

L11: Pattern recognition principles L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction

More information

Leverage Sparse Information in Predictive Modeling

Leverage Sparse Information in Predictive Modeling Leverage Sparse Information in Predictive Modeling Liang Xie Countrywide Home Loans, Countrywide Bank, FSB August 29, 2008 Abstract This paper examines an innovative method to leverage information from

More information

MSA220/MVE440 Statistical Learning for Big Data

MSA220/MVE440 Statistical Learning for Big Data MSA220/MVE440 Statistical Learning for Big Data Lecture 9-10 - High-dimensional regression Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Recap from

More information

Tutorial on Approximate Bayesian Computation

Tutorial on Approximate Bayesian Computation Tutorial on Approximate Bayesian Computation Michael Gutmann https://sites.google.com/site/michaelgutmann University of Helsinki Aalto University Helsinki Institute for Information Technology 16 May 2016

More information

Kneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach"

Kneib, Fahrmeir: Supplement to Structured additive regression for categorical space-time data: A mixed model approach Kneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach" Sonderforschungsbereich 386, Paper 43 (25) Online unter: http://epub.ub.uni-muenchen.de/

More information

Statistical Data Mining and Machine Learning Hilary Term 2016

Statistical Data Mining and Machine Learning Hilary Term 2016 Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes

More information

25 : Graphical induced structured input/output models

25 : Graphical induced structured input/output models 10-708: Probabilistic Graphical Models 10-708, Spring 2016 25 : Graphical induced structured input/output models Lecturer: Eric P. Xing Scribes: Raied Aljadaany, Shi Zong, Chenchen Zhu Disclaimer: A large

More information

An Unbiased C p Criterion for Multivariate Ridge Regression

An Unbiased C p Criterion for Multivariate Ridge Regression An Unbiased C p Criterion for Multivariate Ridge Regression (Last Modified: March 7, 2008) Hirokazu Yanagihara 1 and Kenichi Satoh 2 1 Department of Mathematics, Graduate School of Science, Hiroshima University

More information

Mixture models for analysing transcriptome and ChIP-chip data

Mixture models for analysing transcriptome and ChIP-chip data Mixture models for analysing transcriptome and ChIP-chip data Marie-Laure Martin-Magniette French National Institute for agricultural research (INRA) Unit of Applied Mathematics and Informatics at AgroParisTech,

More information

Diversity Regularization of Latent Variable Models: Theory, Algorithm and Applications

Diversity Regularization of Latent Variable Models: Theory, Algorithm and Applications Diversity Regularization of Latent Variable Models: Theory, Algorithm and Applications Pengtao Xie, Machine Learning Department, Carnegie Mellon University 1. Background Latent Variable Models (LVMs) are

More information

Some Curiosities Arising in Objective Bayesian Analysis

Some Curiosities Arising in Objective Bayesian Analysis . Some Curiosities Arising in Objective Bayesian Analysis Jim Berger Duke University Statistical and Applied Mathematical Institute Yale University May 15, 2009 1 Three vignettes related to John s work

More information

Integrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University

Integrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University Integrated Likelihood Estimation in Semiparametric Regression Models Thomas A. Severini Department of Statistics Northwestern University Joint work with Heping He, University of York Introduction Let Y

More information

F & B Approaches to a simple model

F & B Approaches to a simple model A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 215 http://www.astro.cornell.edu/~cordes/a6523 Lecture 11 Applications: Model comparison Challenges in large-scale surveys

More information

Lecture 6: Methods for high-dimensional problems

Lecture 6: Methods for high-dimensional problems Lecture 6: Methods for high-dimensional problems Hector Corrada Bravo and Rafael A. Irizarry March, 2010 In this Section we will discuss methods where data lies on high-dimensional spaces. In particular,

More information

Adaptive Piecewise Polynomial Estimation via Trend Filtering

Adaptive Piecewise Polynomial Estimation via Trend Filtering Adaptive Piecewise Polynomial Estimation via Trend Filtering Liubo Li, ShanShan Tu The Ohio State University li.2201@osu.edu, tu.162@osu.edu October 1, 2015 Liubo Li, ShanShan Tu (OSU) Trend Filtering

More information

Introduction to Graphical Models

Introduction to Graphical Models Introduction to Graphical Models The 15 th Winter School of Statistical Physics POSCO International Center & POSTECH, Pohang 2018. 1. 9 (Tue.) Yung-Kyun Noh GENERALIZATION FOR PREDICTION 2 Probabilistic

More information

Bayesian non-parametric model to longitudinally predict churn

Bayesian non-parametric model to longitudinally predict churn Bayesian non-parametric model to longitudinally predict churn Bruno Scarpa Università di Padova Conference of European Statistics Stakeholders Methodologists, Producers and Users of European Statistics

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com

More information

Multilevel Cross-dependent Binary Longitudinal Data

Multilevel Cross-dependent Binary Longitudinal Data Multilevel Cross-dependent Binary Longitudinal Data Nicoleta Serban 1 H. Milton Stewart School of Industrial Systems and Engineering Georgia Institute of Technology nserban@isye.gatech.edu Ana-Maria Staicu

More information

Inversion Base Height. Daggot Pressure Gradient Visibility (miles)

Inversion Base Height. Daggot Pressure Gradient Visibility (miles) Stanford University June 2, 1998 Bayesian Backtting: 1 Bayesian Backtting Trevor Hastie Stanford University Rob Tibshirani University of Toronto Email: trevor@stat.stanford.edu Ftp: stat.stanford.edu:

More information

Linear Regression. September 27, Chapter 3. Chapter 3 September 27, / 77

Linear Regression. September 27, Chapter 3. Chapter 3 September 27, / 77 Linear Regression Chapter 3 September 27, 2016 Chapter 3 September 27, 2016 1 / 77 1 3.1. Simple linear regression 2 3.2 Multiple linear regression 3 3.3. The least squares estimation 4 3.4. The statistical

More information

Machine Learning. B. Unsupervised Learning B.2 Dimensionality Reduction. Lars Schmidt-Thieme, Nicolas Schilling

Machine Learning. B. Unsupervised Learning B.2 Dimensionality Reduction. Lars Schmidt-Thieme, Nicolas Schilling Machine Learning B. Unsupervised Learning B.2 Dimensionality Reduction Lars Schmidt-Thieme, Nicolas Schilling Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Feature Extraction Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi, Payam Siyari Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Agenda Dimensionality Reduction

More information

Linear models and their mathematical foundations: Simple linear regression

Linear models and their mathematical foundations: Simple linear regression Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction

More information

Lecture 5: November 19, Minimizing the maximum intracluster distance

Lecture 5: November 19, Minimizing the maximum intracluster distance Analysis of DNA Chips and Gene Networks Spring Semester, 2009 Lecture 5: November 19, 2009 Lecturer: Ron Shamir Scribe: Renana Meller 5.1 Minimizing the maximum intracluster distance 5.1.1 Introduction

More information

Machine Learning (BSMC-GA 4439) Wenke Liu

Machine Learning (BSMC-GA 4439) Wenke Liu Machine Learning (BSMC-GA 4439) Wenke Liu 02-01-2018 Biomedical data are usually high-dimensional Number of samples (n) is relatively small whereas number of features (p) can be large Sometimes p>>n Problems

More information

Sparseness and Functional Data Analysis

Sparseness and Functional Data Analysis Sparseness and Functional Data Analysis Gareth James Marshall School of Business University of Southern California, Los Angeles, California gareth@usc.edu Abstract In this chapter we examine two different

More information

Model Based Clustering of Count Processes Data

Model Based Clustering of Count Processes Data Model Based Clustering of Count Processes Data Tin Lok James Ng, Brendan Murphy Insight Centre for Data Analytics School of Mathematics and Statistics May 15, 2017 Tin Lok James Ng, Brendan Murphy (Insight)

More information

Approaches for Multiple Disease Mapping: MCAR and SANOVA

Approaches for Multiple Disease Mapping: MCAR and SANOVA Approaches for Multiple Disease Mapping: MCAR and SANOVA Dipankar Bandyopadhyay Division of Biostatistics, University of Minnesota SPH April 22, 2015 1 Adapted from Sudipto Banerjee s notes SANOVA vs MCAR

More information

Dimension Reduction Methods

Dimension Reduction Methods Dimension Reduction Methods And Bayesian Machine Learning Marek Petrik 2/28 Previously in Machine Learning How to choose the right features if we have (too) many options Methods: 1. Subset selection 2.

More information

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables

A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider the problem of

More information

Experimental Design and Data Analysis for Biologists

Experimental Design and Data Analysis for Biologists Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1

More information

Degenerate Expectation-Maximization Algorithm for Local Dimension Reduction

Degenerate Expectation-Maximization Algorithm for Local Dimension Reduction Degenerate Expectation-Maximization Algorithm for Local Dimension Reduction Xiaodong Lin 1 and Yu Zhu 2 1 Statistical and Applied Mathematical Science Institute, RTP, NC, 27709 USA University of Cincinnati,

More information

Gaussian Mixture Models with Component Means Constrained in Pre-selected Subspaces

Gaussian Mixture Models with Component Means Constrained in Pre-selected Subspaces Gaussian Mixture Models with Component Means Constrained in Pre-selected Subspaces Mu Qiao and Jia Li Abstract We investigate a Gaussian mixture model (GMM) with component means constrained in a pre-selected

More information

Linear Models for Regression. Sargur Srihari

Linear Models for Regression. Sargur Srihari Linear Models for Regression Sargur srihari@cedar.buffalo.edu 1 Topics in Linear Regression What is regression? Polynomial Curve Fitting with Scalar input Linear Basis Function Models Maximum Likelihood

More information

Probabilistic Fisher Discriminant Analysis

Probabilistic Fisher Discriminant Analysis Probabilistic Fisher Discriminant Analysis Charles Bouveyron 1 and Camille Brunet 2 1- University Paris 1 Panthéon-Sorbonne Laboratoire SAMM, EA 4543 90 rue de Tolbiac 75013 PARIS - FRANCE 2- University

More information

Bootstrap Simulation Procedure Applied to the Selection of the Multiple Linear Regressions

Bootstrap Simulation Procedure Applied to the Selection of the Multiple Linear Regressions JKAU: Sci., Vol. 21 No. 2, pp: 197-212 (2009 A.D. / 1430 A.H.); DOI: 10.4197 / Sci. 21-2.2 Bootstrap Simulation Procedure Applied to the Selection of the Multiple Linear Regressions Ali Hussein Al-Marshadi

More information

Choosing a model in a Classification purpose. Guillaume Bouchard, Gilles Celeux

Choosing a model in a Classification purpose. Guillaume Bouchard, Gilles Celeux Choosing a model in a Classification purpose Guillaume Bouchard, Gilles Celeux Abstract: We advocate the usefulness of taking into account the modelling purpose when selecting a model. Two situations are

More information

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures FE661 - Statistical Methods for Financial Engineering 9. Model Selection Jitkomut Songsiri statistical models overview of model selection information criteria goodness-of-fit measures 9-1 Statistical models

More information