arxiv: v2 [stat.ml] 19 Oct 2016


Sparse Quadratic Discriminant Analysis and Community Bayes

Ya Le
Department of Statistics, Stanford University

Trevor Hastie
Department of Statistics, Stanford University

Abstract

We develop a class of rules spanning the range between quadratic discriminant analysis and naive Bayes, through a path of sparse graphical models. A group lasso penalty is used to introduce shrinkage and encourage a similar pattern of sparsity across precision matrices. It gives sparse estimates of interactions and produces interpretable models. Inspired by the connected-components structure of the estimated precision matrices, we propose the community Bayes model, which partitions features into several conditionally independent communities and splits the classification problem into separate smaller ones. The community Bayes idea is quite general and can be applied to non-Gaussian data and likelihood-based classifiers.

1 Introduction

In the generic classification problem, the outcome of interest $G$ falls into $K$ unordered classes, which for convenience we denote by $\{1, 2, \ldots, K\}$. Our goal is to build a rule for predicting the class label of an item based on $p$ measurements of features $X \in \mathbb{R}^p$. The training set consists of the class labels and features for $n$ items. This is an important practical problem with applications in many fields.

Quadratic discriminant analysis (QDA) can be derived as the maximum likelihood method for normal populations with different means and different covariance matrices. It is a favored tool when there are some strong interactions and linear boundaries cannot separate the classes. However, QDA is poorly posed when the sample size $n_k$ is not considerably larger than $p$ for any class $k$, and clearly ill-posed if $n_k < p$. This motivates the use of a regularization method [17, 24]. Although there already exist a number of proposals to regularize linear discriminant analysis (LDA) [1, 4, 6, 8, 9, 11, 23, 26], few methods have been developed to regularize QDA. Friedman [7] suggests applying a ridge penalty to within-class covariance matrices, and Price et al. [19] propose to add a ridge penalty and a ridge fusion penalty to the log-likelihood.

For both methods, no elements of the resulting within-class precision matrices will be zero, leading to dense interaction terms in the discriminant functions and difficulties in interpretation. Assuming sparse conditions on both unknown means and covariance matrices, Li and Shao [13] propose to construct mean and covariance estimators by thresholding. Sun and Zhao [21] assume block-diagonal structure for covariance matrices and estimate each block by $\ell_1$ regularization. The blocks are assumed to be of the same size, and are formed based on two-sample t statistics. However, both methods apply only to two-class classification.

Another approach is to assume conditional independence of the features (naive Bayes). Naive Bayes classifiers often outperform far more sophisticated alternatives when the dimension $p$ of the feature space is high, but the conditional independence assumption may be too rigid in the presence of strong interactions.

In this paper we develop a class of rules spanning the range between QDA and naive Bayes using a group lasso penalty [27]. A group lasso penalty is applied to the $(i, j)$th element across all $K$ within-class precision matrices, which forces the zeros in the $K$ estimated precision matrices to occur in the same places. This shared pattern of sparsity results in a sparse estimate of interaction terms, making the classifier easy to interpret. We refer to this classification method as Sparse Quadratic Discriminant Analysis (SQDA).

The connected components in the resulting precision matrices correspond exactly to those obtained from simple thresholding rules on the within-class sample covariance matrices [5]. This suggests partitioning the features into several independent communities before estimating the parameters [3, 10, 18]. Therefore, we propose the community Bayes model, a simple idea to identify and make use of conditionally independent communities of features. Furthermore, it is a general approach applicable to non-Gaussian data and any likelihood-based classifier.
Specifically, the community Bayes model works by partitioning features into several communities, solving separate classification problems for each community, and combining the community-wise results into one final prediction. We show that this approach can improve the accuracy and interpretability of the corresponding classifier.

The paper is organized as follows. In Section 2 we discuss sparse quadratic discriminant analysis and the sample covariance thresholding rules. In Section 3 we discuss the community Bayes idea. Simulated and real data examples appear in Section 4, concluding remarks in Section 5, and the proofs are gathered in the Appendix.

2 Sparse Quadratic Discriminant Analysis

Suppose we have training data $(x_i, g_i) \in \mathbb{R}^p \times \mathcal{K}$, $i = 1, 2, \ldots, n$, with $\mathcal{K} = \{1, \ldots, K\}$, where the $x_i$ are observations with measurements on a set of $p$ features and the $g_i$ are class labels. We assume that the $n_k$ observations within the $k$th class are identically distributed as $N(\mu_k, \Sigma_k)$, and the $n = \sum_k n_k$ observations are independent. We let $\pi_k$ denote the prior for class $k$. Then the quadratic discriminant function is

$$\delta_k(x) = \frac{1}{2} \log\det \Sigma_k^{-1} - \frac{1}{2}(x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k) + \log \pi_k. \tag{1}$$

Let $\Theta^{(k)} = \Sigma_k^{-1}$ denote the true precision matrix for class $k$, and let $\theta_{ij} = (\theta_{ij}^{(1)}, \ldots, \theta_{ij}^{(K)})$ denote the vector of the $(i, j)$th elements across all $K$ precision matrices. We propose to estimate $\mu$, $\pi$, $\Theta$ by maximizing the penalized log-likelihood

$$\sum_{k=1}^{K} \Big[ \frac{n_k}{2} \log\det \Theta^{(k)} - \frac{1}{2} \sum_{g_i = k} (x_i - \mu_k)^T \Theta^{(k)} (x_i - \mu_k) + n_k \log \pi_k \Big] - \lambda \sum_{i \neq j} \|\theta_{ij}\|_2. \tag{2}$$

The last term is a group lasso penalty, applied to the $(i, j)$th element across all $K$ precision matrices, which forces a similar pattern of sparsity across all the precision matrices. Hence, we refer to this classification method as Sparse Quadratic Discriminant Analysis (SQDA). Let $\Theta = (\Theta^{(1)}, \ldots, \Theta^{(K)})$. Solving (2) gives the estimates

$$\hat\mu_k = \frac{1}{n_k} \sum_{g_i = k} x_i, \tag{3}$$

$$\hat\pi_k = \frac{n_k}{n}, \tag{4}$$

$$\hat\Theta = \operatorname*{argmax}_{\Theta^{(k)} \succ 0} \; \sum_{k=1}^{K} n_k \big( \log\det \Theta^{(k)} - \mathrm{tr}(S^{(k)} \Theta^{(k)}) \big) - \lambda \sum_{i \neq j} \|\theta_{ij}\|_2, \tag{5}$$

where $S^{(k)} = \frac{1}{n_k} \sum_{g_i = k} (x_i - \hat\mu_k)(x_i - \hat\mu_k)^T$ is the sample covariance matrix for class $k$. The objective in (5) is the objective in (2), evaluated at $\hat\mu_k$ and $\hat\pi_k$ and multiplied by 2, with the resulting constant factor absorbed into $\lambda$. This model has several important features:

1. $\lambda$ is a nonnegative tuning parameter controlling the simultaneous shrinkage of all precision matrices toward diagonal matrices. The value $\lambda = 0$ gives rise to QDA, whereas $\lambda = \infty$ yields the naive Bayes classifier. Values between these limits represent degrees of regularization less severe than naive Bayes. Since even small amounts of regularization can often largely eliminate quite drastic instability, finite values of $\lambda$ have the potential for superior performance when some features have strong partial correlations.

2. The group lasso penalty forces $\hat\Theta^{(1)}, \ldots, \hat\Theta^{(K)}$ to share the locations of the nonzero elements, leading to a sparse estimate of interactions. Therefore, our method produces more interpretable results.

3. The optimization problem (5) can be separated into independent subproblems of the same form by simple screening rules on $S^{(1)}, \ldots, S^{(K)}$. This leads to a potentially massive reduction in computational complexity.
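To make the penalty in (2) and (5) concrete, the following sketch evaluates $\sum_{i \neq j} \|\theta_{ij}\|_2$ for a list of $K$ precision matrices and reports the off-diagonal positions that are zero in every class. It is only an illustration: the function names and the toy matrices are assumptions introduced here, and it does not solve the optimization problem itself.

```python
import math

def group_lasso_penalty(thetas):
    """Sum over i != j of the Euclidean norm of (theta_ij^(1), ..., theta_ij^(K))."""
    p = len(thetas[0])
    total = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                total += math.sqrt(sum(theta[i][j] ** 2 for theta in thetas))
    return total

def shared_zero_pattern(thetas, tol=1e-12):
    """Off-diagonal pairs (i, j), i < j, whose entry is zero in every class."""
    p = len(thetas[0])
    return [(i, j) for i in range(p) for j in range(i + 1, p)
            if all(abs(theta[i][j]) <= tol for theta in thetas)]

# Toy example with K = 2 classes and p = 3 features (hypothetical values).
theta_1 = [[1.0, 0.3, 0.0], [0.3, 1.0, 0.0], [0.0, 0.0, 1.0]]
theta_2 = [[1.0, 0.4, 0.0], [0.4, 1.0, 0.0], [0.0, 0.0, 1.0]]

penalty = group_lasso_penalty([theta_1, theta_2])
zeros = shared_zero_pattern([theta_1, theta_2])
```

In this toy case only the pair of features 0 and 1 interacts in either class, so the third feature contributes no interaction term to any discriminant function.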

The convex optimization problem (5) can be quickly solved by the R package JGL [5]. Although their purpose is not classification, Danaher et al. [5] propose to jointly estimate multiple related Gaussian graphical models by maximizing the group graphical lasso criterion

$$\max_{\Theta^{(k)} \succ 0} \; \sum_{k=1}^{K} n_k \big( \log\det \Theta^{(k)} - \mathrm{tr}(S^{(k)} \Theta^{(k)}) \big) - \lambda_1 \sum_{i \neq j} \|\theta_{ij}\|_1 - \lambda_2 \sum_{i \neq j} \|\theta_{ij}\|_2. \tag{6}$$

They use the additional lasso penalty to further encourage sparsity within $(\theta_{ij}^{(1)}, \ldots, \theta_{ij}^{(K)})$. However, for the purpose of classification, our goal is to identify interaction terms; that is, we are interested in whether $(\theta_{ij}^{(1)}, \ldots, \theta_{ij}^{(K)})$ is a zero vector rather than whether each element $\theta_{ij}^{(k)}$ is zero. Therefore, we use only the group lasso penalty to estimate the precision matrices.

We illustrate our method with a handwritten digit recognition example. Here we focus on the sub-task of distinguishing handwritten 3s and 8s. We use the same data as Le Cun et al. [12], who normalized binary images for size and orientation, resulting in 8-bit, 16 × 16 grayscale images. There are 658 threes and 542 eights in our training set, and 166 test samples for each. Figure 1 shows a random selection of 3s and 8s.

Figure 1: Examples of digitized handwritten 3s and 8s. Each image is an 8-bit, 16 × 16 grayscale version of the original binary image.

Because of their spatial arrangement the features are highly correlated, and some kind of smoothing or filtering always helps. We filtered the data by replacing each non-overlapping 2 × 2 pixel block with its average. This reduces the dimension of the feature space from 256 to 64.

We compare our method with QDA, naive Bayes and a variant of RDA which we refer to as diagonal regularized discriminant analysis (DRDA). The estimated covariance matrix of DRDA for class $k$ is

$$\hat\Sigma^{(k)}(\lambda) = (1 - \lambda) S^{(k)} + \lambda\, \mathrm{diag}(S^{(k)}), \quad \text{with } \lambda \in [0, 1]. \tag{7}$$

The tuning parameters are selected by 5-fold cross validation.
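As a small illustration of the DRDA estimate in (7), the sketch below shrinks a sample covariance matrix toward its diagonal. The function name and the 2 × 2 example matrix are hypothetical; matrices are plain nested lists to keep the example self-contained.

```python
def drda_covariance(S, lam):
    """Return (1 - lam) * S + lam * diag(S) for a square matrix S given as nested lists."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    p = len(S)
    # The diagonal is unchanged; every off-diagonal entry is scaled by (1 - lam).
    return [[S[i][j] if i == j else (1.0 - lam) * S[i][j] for j in range(p)]
            for i in range(p)]

S = [[2.0, 0.8], [0.8, 1.0]]          # hypothetical sample covariance
qda_like = drda_covariance(S, 0.0)    # lam = 0 keeps S unchanged
halfway = drda_covariance(S, 0.5)     # off-diagonal entries are halved
naive_bayes_like = drda_covariance(S, 1.0)  # lam = 1 keeps only the diagonal
```

Note that for any `lam` below 1 a nonzero off-diagonal entry stays nonzero, so the shrunken matrix is only rescaled, never sparsified.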
Table 1 shows the misclassification errors of each method and the corresponding standardized tuning parameters $s = P(\hat\Theta(\lambda)) / P(\hat\Theta(0))$, where $P(\Theta) = \sum_{i \neq j} \|(\theta_{ij}^{(1)}, \ldots, \theta_{ij}^{(K)})\|_2$. The results show that SQDA outperforms DRDA, QDA and naive Bayes. In Figure 2, we see that the test error of SQDA decreases dramatically as the model deviates from naive Bayes and achieves its minimum at a small value of $s$, while the test error of DRDA keeps decreasing as the model ranges from naive Bayes to QDA. This is expected since at small values of $s$, SQDA includes only important interaction terms and shrinks other noisy terms to 0, whereas DRDA has all interactions in the model as soon as it deviates from naive Bayes.

To better display the estimated precision matrices, we standardize the estimates to have unit diagonal (the standardized precision matrix is equal to the partial correlation matrix up to the sign of the off-diagonal entries). Figure 3 displays the standardized sample precision matrices, and the standardized SQDA and DRDA estimates. From Figure 3, we can see that SQDA includes only interactions within a diagonal band.

Table 1: Digit classification results of 3s and 8s. Tuning parameters are selected by 5-fold cross validation. Columns: SQDA, DRDA, QDA, Naive Bayes. Rows: test error, training error, CV error, standardized tuning parameter.

Figure 2: The 5-fold cross-validation errors (blue) and the test errors (red) of SQDA and DRDA on 3s and 8s.

Figure 3: Heat maps of the sample precision matrices and the estimated precision matrices of SQDA and DRDA. Estimates are standardized to have unit diagonal. The first row corresponds to the precision matrix of 3s and the second row to that of 8s.

2.1 Exact covariance thresholding into connected components

Mazumder & Hastie [16] and Witten et al. [25] establish a connection between the graphical lasso and connected components. Specifically, the connected components in the estimated precision matrix correspond exactly to those obtained from thresholding the entries of the sample covariance matrix at $\lambda$. For the solution to (5), we have similar results. By simple thresholding rules on the sample covariance matrices $S^{(1)}, \ldots, S^{(K)}$, the optimization problem (5) can be separated into several optimization sub-problems of the same form, which leads to huge speed improvements. A more general result for the solution to (6) can be found in [5].

Suppose $\hat\Theta = (\hat\Theta^{(1)}, \ldots, \hat\Theta^{(K)})$ is the solution to (5) with regularization parameter $\lambda$. Define

$$\hat E_{ij}^{(\lambda)} = \begin{cases} 1 & \text{if } (\hat\theta_{ij}^{(1)}, \ldots, \hat\theta_{ij}^{(K)}) \neq 0,\ i \neq j; \\ 0 & \text{otherwise.} \end{cases} \tag{8}$$

This defines a symmetric graph $\hat G^{(\lambda)} = (V, \hat E^{(\lambda)})$, namely the estimated concentration graph defined on the nodes $V = \{1, \ldots, p\}$ with edges $\hat E^{(\lambda)}$. Suppose it admits a decomposition into $\kappa(\lambda)$ connected components

$$\hat G^{(\lambda)} = \bigcup_{\ell=1}^{\kappa(\lambda)} \hat G_\ell^{(\lambda)}, \tag{9}$$

where $\hat G_\ell^{(\lambda)} = (\hat V_\ell^{(\lambda)}, \hat E_\ell^{(\lambda)})$ are the components of the graph $\hat G^{(\lambda)}$.

Define $\tilde S_{ij} = \|(n_1 S_{ij}^{(1)}, \ldots, n_K S_{ij}^{(K)})\|_2$. We can also perform thresholding on the entries of $\tilde S$ and obtain a graph edge skeleton defined by

$$E_{ij}^{(\lambda)} = \begin{cases} 1 & \text{if } \tilde S_{ij} > \lambda,\ i \neq j; \\ 0 & \text{otherwise.} \end{cases} \tag{10}$$

The symmetric matrix $E^{(\lambda)}$ defines a symmetric graph $G^{(\lambda)} = (V, E^{(\lambda)})$, which is referred to as the thresholded sample covariance graph. $G^{(\lambda)}$ also admits a decomposition into connected components

$$G^{(\lambda)} = \bigcup_{\ell=1}^{k(\lambda)} G_\ell^{(\lambda)}, \tag{11}$$

where $G_\ell^{(\lambda)} = (V_\ell^{(\lambda)}, E_\ell^{(\lambda)})$ are the components of the graph $G^{(\lambda)}$.

Theorem 1. For any $\lambda > 0$, the components of the estimated concentration graph $\hat G^{(\lambda)}$ induce exactly the same vertex-partition as that of the thresholded sample covariance graph $G^{(\lambda)}$. Formally, $\kappa(\lambda) = k(\lambda)$ and there exists a permutation $\pi$ on $\{1, \ldots, k(\lambda)\}$ such that

$$\hat V_i^{(\lambda)} = V_{\pi(i)}^{(\lambda)}, \quad i = 1, \ldots, k(\lambda). \tag{12}$$

Furthermore, given $\lambda > \lambda' > 0$, the vertex-partition induced by the components of $\hat G^{(\lambda)}$ is nested within that induced by the components of $\hat G^{(\lambda')}$. That is, $\kappa(\lambda) \ge \kappa(\lambda')$ and the vertex-partition $\{\hat V_\ell^{(\lambda)}\}_{1 \le \ell \le \kappa(\lambda)}$ forms a finer resolution of $\{\hat V_\ell^{(\lambda')}\}_{1 \le \ell \le \kappa(\lambda')}$.

Proof. The proof of this theorem appears in the Appendix.

Theorem 1 allows us to quickly check the connected components of the estimated concentration graph by simple screening rules on $\tilde S$. Notice that for each class $k$, the edge set $\hat E^{(k,\lambda)}$ defined by $\hat E_{ij}^{(k,\lambda)} = 1\{\hat\theta_{ij}^{(k)} \neq 0\}$ is nested in $\hat E^{(\lambda)}$. Therefore, the features can be reordered in such a way that each $\hat\Theta^{(k)}$ is block diagonal,

$$\hat\Theta^{(k)} = \mathrm{diag}\big(\hat\Theta_1^{(k)}, \hat\Theta_2^{(k)}, \ldots, \hat\Theta_{k(\lambda)}^{(k)}\big), \tag{13}$$

where the different components represent blocks of indices given by $V_\ell^{(\lambda)}$, $\ell = 1, \ldots, k(\lambda)$. One can then simply solve the optimization problem (5) on the features within each block separately. This makes problem (5) feasible for certain values of $\lambda$ even when it is impossible to operate on the $p \times p$ variables $\Theta^{(1)}, \ldots, \Theta^{(K)}$ on a single machine.
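The screening rule behind Theorem 1 can be sketched in a few lines: compute $\tilde S_{ij}$, join features $i$ and $j$ whenever it exceeds $\lambda$, and read off the connected components. The sketch below uses a union-find structure; the function name, the toy covariance matrices and the class sizes are assumptions made for illustration.

```python
import math

def screening_components(S_list, n_list, lam):
    """Vertex partition of the thresholded sample covariance graph.

    Features i and j are joined when ||(n_1 S_ij^(1), ..., n_K S_ij^(K))||_2 > lam.
    """
    p = len(S_list[0])
    parent = list(range(p))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i in range(p):
        for j in range(i + 1, p):
            s_tilde = math.sqrt(sum((n * S[i][j]) ** 2 for n, S in zip(n_list, S_list)))
            if s_tilde > lam:
                parent[find(i)] = find(j)

    groups = {}
    for v in range(p):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())

# Hypothetical sample covariance matrices for K = 2 classes and p = 4 features.
S1 = [[1.0, 0.5, 0.01, 0.0], [0.5, 1.0, 0.0, 0.02],
      [0.01, 0.0, 1.0, 0.3], [0.0, 0.02, 0.3, 1.0]]
S2 = [[1.0, 0.4, 0.0, 0.01], [0.4, 1.0, 0.02, 0.0],
      [0.0, 0.02, 1.0, 0.6], [0.01, 0.0, 0.6, 1.0]]
n = [10, 20]

blocks_small = screening_components([S1, S2], n, lam=0.05)
blocks_mid = screening_components([S1, S2], n, lam=1.0)
blocks_large = screening_components([S1, S2], n, lam=10.0)
```

As $\lambda$ grows, the partition in this toy example only gets finer, which matches the nesting property stated in Theorem 1. Each returned block can then be passed to a solver for (5) on its own.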

3 Community Bayes

The estimated precision matrices of SQDA are often block diagonal under a suitable ordering of the features, which implies that the features can be partitioned into several communities that are mutually independent within each class. In this section, we generalize this idea to non-Gaussian data and other classification models, and refer to it as the community Bayes model. A related work in the regression setting is [10]. The authors propose to split the lasso problem into smaller ones by estimating the connected components of the sample covariance matrix. Their approach involves only one covariance matrix and works specifically for the lasso.

3.1 Main idea

We let $X \in \mathbb{R}^p$ denote the feature vector and $G \in \{1, \ldots, K\}$ denote the class variable. Suppose the feature set $V = \{1, \ldots, p\}$ admits a partition $V = \bigcup_{\ell=1}^{L} V_\ell$ such that $X_{V_1}, \ldots, X_{V_L}$ are mutually independent conditional on $G = k$ for $k = 1, \ldots, K$, where $X_{V_\ell}$ is the subset of $X$ containing the features in community $V_\ell$. Then the posterior probability has the form

$$\log p(G = k \mid X) = \log p(X \mid G = k) + \log p(G = k) - \log p(X) \tag{14}$$

$$= \sum_{\ell=1}^{L} \log p(X_{V_\ell} \mid G = k) + \log p(G = k) - \log p(X) \tag{15}$$

$$= \sum_{\ell=1}^{L} \log p(G = k \mid X_{V_\ell}) + (1 - L) \log p(G = k) + C(X), \tag{16}$$

where $C(X) = \sum_{\ell=1}^{L} \log p(X_{V_\ell}) - \log p(X)$ depends only on $X$ and serves as a normalization term. Equation (16) implies an interesting result: we can fit the classification model on each community separately, and combine the resulting posteriors into the posterior for the original problem by simply adjusting the intercept and normalizing. This result has three important consequences:

1. The global problem completely separates into $L$ smaller tractable sub-problems of the same form, making it possible to solve an otherwise infeasible large-scale problem. Moreover, the modular structure lends itself naturally to parallel computation. That is, one can solve these sub-problems independently on separate machines.

2. This idea is quite general and can be applied to any likelihood-based classifier, including discriminant analysis, multinomial logistic regression, generalized additive models, classification trees, etc.

3. When using a classification model with interaction terms, (16) does not involve interactions across different communities. Therefore, it has fewer degrees of freedom and thus smaller variance than the global problem.
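Equation (16) translates directly into a combination rule for community-wise posteriors. The sketch below is one possible way to apply it, assuming each community classifier returns strictly positive class probabilities; the function name and the numbers are hypothetical.

```python
import math

def combine_community_posteriors(community_posteriors, prior):
    """Combine per-community class posteriors into one posterior via (16).

    community_posteriors: list of L lists, each holding p(G = k | X_{V_l}) for all classes k.
    prior: list holding p(G = k) for all classes k.
    """
    L = len(community_posteriors)
    K = len(prior)
    scores = []
    for k in range(K):
        score = sum(math.log(post[k]) for post in community_posteriors)
        score += (1 - L) * math.log(prior[k])  # intercept adjustment in (16)
        scores.append(score)
    # Normalizing removes the class-independent term C(X); subtract the max for stability.
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Two communities, two equally likely classes (hypothetical posteriors).
posterior = combine_community_posteriors([[0.8, 0.2], [0.6, 0.4]], [0.5, 0.5])
```

Here the combined posterior for class 1 is proportional to 0.8 × 0.6 / 0.5 and for class 2 to 0.2 × 0.4 / 0.5, so the two communities reinforce each other. With a single community the function returns that community's posterior unchanged.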

To find conditionally independent communities, we use Spearman's rho, a robust nonparametric rank-based statistic, to directly estimate the unknown correlation matrices, and then apply average, single or complete linkage clustering to the estimated correlation matrices to obtain the estimated communities. In the next section, we will show that this procedure consistently identifies conditionally independent communities when the data are from a nonparanormal family. The community Bayes algorithm is summarized below.

Algorithm 1 The community Bayes algorithm
1: Compute $\hat R^{(k)}$, the estimated correlation matrix based on Spearman's rho, for each class $k$.
2: Perform average, single or complete linkage clustering with similarity matrix $\tilde R$, where $\tilde R_{ij} = \|(n_1 \hat R_{ij}^{(1)}, \ldots, n_K \hat R_{ij}^{(K)})\|_2$, and cut the dendrogram at level $\tau$ to produce the vertex-partition $V = \bigcup_{\ell=1}^{L} \hat V_\ell$.
3: For each community $\ell = 1, \ldots, L$, estimate $\log p(G = k \mid X_{\hat V_\ell})$ using a classification method of choice.
4: Pick $\tau$ and any other tuning parameters by cross-validation.

3.2 Community estimation

In this section we address the key part of the community Bayes model: how to find conditionally independent communities. We propose to derive the communities by applying average, single or complete linkage clustering to the correlation matrices, which are estimated based on nonparametric rank-based statistics. In the case where the data are from a nonparanormal family, we prove that, given knowledge of $L$, this procedure consistently identifies conditionally independent communities.

We assume that $X = (X_1, \ldots, X_p)^T \mid G = k$ follows a nonparanormal distribution $NPN(\mu^{(k)}, \Sigma^{(k)}, f^{(k)})$ [15] and that $\Sigma^{(k)}$ is nonsingular. That is, there exists a set of univariate strictly increasing transformations $f^{(k)} = \{f_j^{(k)}\}_{j=1}^{p}$ such that

$$Z^{(k)} := f^{(k)}(X) \mid G = k \sim N(\mu^{(k)}, \Sigma^{(k)}), \tag{17}$$

where $f^{(k)}(X) = (f_1^{(k)}(X_1), \ldots, f_p^{(k)}(X_p))^T$. Notice that the transformation functions $f^{(k)}$ can be different for different classes. To make the model identifiable, $f^{(k)}$ preserves the population means and standard deviations: $E(X_j \mid G = k) = E(f_j^{(k)}(X_j) \mid G = k) = \mu_j^{(k)}$ and $\mathrm{Var}(X_j \mid G = k) = \mathrm{Var}(f_j^{(k)}(X_j) \mid G = k) = (\sigma_j^{(k)})^2$. Liu et al. [15] prove that the nonparanormal distribution is a Gaussian copula when the transformation functions are monotone and differentiable.

Let $R^{(k)}$ denote the correlation matrix of the Gaussian distribution $Z^{(k)}$, and define

$$E_{ij} = \begin{cases} 1 & \text{if } (R_{ij}^{(1)}, \ldots, R_{ij}^{(K)}) \neq 0,\ i \neq j; \\ 0 & \text{otherwise.} \end{cases} \tag{18}$$
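Step 1 of Algorithm 1 relies on a rank-based correlation estimate. As a rough illustration, the sketch below computes Spearman's rho between two features and applies the transform $2\sin(\pi\hat\rho/6)$ used by the nonparanormal SKEPTIC estimator [14]. Because ranks are unchanged by strictly increasing transformations, the estimate does not depend on the unknown $f^{(k)}$. The function names and data are hypothetical, and the code handles a single pair of features rather than a full matrix.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2.0 + 1.0
        for t in range(start, end + 1):
            ranks[order[t]] = avg
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

def skeptic_correlation(x, y):
    """Rank-based correlation estimate 2 * sin(pi / 6 * rho)."""
    return 2.0 * math.sin(math.pi / 6.0 * spearman_rho(x, y))

x = [0.1, -1.2, 0.7, 2.0, -0.3]   # hypothetical feature values within one class
y = [0.3, -0.8, 0.2, 1.5, -0.1]
rho = spearman_rho(x, y)
# Strictly increasing transformations of each feature leave the ranks unchanged.
rho_transformed = spearman_rho([math.exp(v) for v in x], [v ** 3 for v in y])
r_hat = skeptic_correlation(x, y)
```

Applying `skeptic_correlation` to every pair of features within a class would fill in the off-diagonal entries of one estimated correlation matrix.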

Suppose the graph $G = (V, E)$ admits a decomposition into $L$ connected components $G = \bigcup_{\ell=1}^{L} G_\ell$. Then the vertex-partition $V = \bigcup_{\ell=1}^{L} V_\ell$ induced by this decomposition gives us exactly the conditionally independent communities we need. This is because

$$G_\ell \text{ and } G_{\ell'} \text{ are disconnected} \iff Z_{V_\ell}^{(k)} \perp Z_{V_{\ell'}}^{(k)},\ \forall k \iff X_{V_\ell} \perp X_{V_{\ell'}} \mid G = k,\ \forall k. \tag{19}$$

Let $(x_i, g_i) \in \mathbb{R}^p \times \mathcal{K}$, $i = 1, \ldots, n$, be the training data, where $x_i = (x_{i1}, \ldots, x_{ip})^T$. We estimate the correlation matrices using the nonparanormal SKEPTIC [14], which exploits Spearman's rho and Kendall's tau to directly estimate the unknown correlation matrices. Since the estimated correlation matrices based on Spearman's rho and Kendall's tau have similar theoretical performance, we adopt only the ones based on Spearman's rho here. Specifically, let $\hat\rho_{ij}^{(k)}$ be Spearman's rho between features $i$ and $j$ based on the $n_k$ samples in class $k$, i.e., $\{x_i \mid g_i = k,\ i = 1, \ldots, n\}$. Then the estimated correlation matrix for class $k$ is

$$\hat R_{ij}^{(k)} = \begin{cases} 2 \sin\big(\frac{\pi}{6} \hat\rho_{ij}^{(k)}\big) & i \neq j; \\ 1 & i = j. \end{cases} \tag{20}$$

Notice that the graph defined by $R^{(k)}$ has the same vertex-partition as that defined by $(R^{(k)})^{-1}$. Inspired by the exact thresholding result of SQDA, we define $\tilde R_{ij} = \|(n_1 \hat R_{ij}^{(1)}, \ldots, n_K \hat R_{ij}^{(K)})\|_2$ and perform exact thresholding on the entries of $\tilde R$ at a certain level $\tau$, where $\tau$ is estimated by cross-validation. The resulting vertex-partition yields an estimate of the conditionally independent communities.

Furthermore, there is an interesting connection to hierarchical clustering. Specifically, the vertex-partition induced by thresholding the matrix $\tilde R$ corresponds to the subtrees obtained when we apply single linkage agglomerative clustering to $\tilde R$ and then cut the dendrogram at level $\tau$ [22]. Single linkage clustering tends to produce trailing clusters in which individual features are merged one at a time. However, Theorem 2 shows that, given knowledge of the true number of communities $L$, application of single, average or complete linkage agglomerative clustering (SLC, ALC or CLC) to $\tilde R$ consistently estimates the vertex-partition of $G$.

Theorem 2. Assume that $G$ has $L$ connected components and $\min_k n_k \ge \frac{21}{\log p} + 2$. Define $R_{ij}^{0} = \|(n_1 R_{ij}^{(1)}, \ldots, n_K R_{ij}^{(K)})\|_2$ and let

$$\min_{i, j \in V_\ell;\ \ell = 1, \ldots, L} R_{ij}^{0} \ge 16 \pi K \sqrt{n_k \log p}, \quad \text{for } k = 1, \ldots, K. \tag{21}$$

Then the estimated vertex-partition $V = \bigcup_{\ell=1}^{L} \hat V_\ell$ resulting from performing SLC, ALC or CLC with similarity matrix $\tilde R$ satisfies $P(\exists \ell : \hat V_\ell \neq V_\ell) \le K / p^2$.

Proof. The proof of this theorem appears in the Appendix.

A sufficient condition for (21) is

$$\min_{i, j \in V_\ell;\ \ell = 1, \ldots, L} \|(R_{ij}^{(1)}, \ldots, R_{ij}^{(K)})\|_2 \ge 16 \pi K \sqrt{\frac{\log p}{n_{\max}}}\, \frac{n_{\max}}{n_{\min}}, \tag{22}$$

where $n_{\min} = \min_k n_k$ and $n_{\max} = \max_k n_k$. Therefore, Theorem 2 establishes the consistency of identification of conditionally independent communities by performing hierarchical clustering using SLC, ALC or CLC, provided that $n_{\max} = \Omega(\log p)$ and $n_{\max} / n_{\min} = O(1)$ as $n_k, p \to \infty$, and provided that no within-community element of $R^{(k)}$ is too small in absolute value for all $k$.

4 Examples

4.1 Sparse quadratic discriminant analysis

In this section we study the performance of SQDA in several simulated examples and a real data example. The results show that SQDA achieves a lower misclassification error and better interpretability compared to naive Bayes, QDA and a variant of RDA. Since RDA as proposed by Friedman [7] has two tuning parameters, resulting in an unfair comparison, we use a version of RDA that shrinks $\hat\Sigma^{(k)}$ toward its diagonal:

$$\hat\Sigma^{(k)}(\lambda) = (1 - \lambda) S^{(k)} + \lambda\, \mathrm{diag}(S^{(k)}), \quad \text{with } \lambda \in [0, 1]. \tag{23}$$

Note that $\lambda = 0$ corresponds to QDA and $\lambda = 1$ corresponds to the naive Bayes classifier. For any $\lambda < 1$, the estimated covariance matrices are dense. We refer to this version of RDA as diagonal regularized discriminant analysis (DRDA) in the rest of this paper.

We report the test misclassification errors of these four classifiers and the corresponding standardized tuning parameters. Let $P(\Theta) = \sum_{i \neq j} \|(\theta_{ij}^{(1)}, \ldots, \theta_{ij}^{(K)})\|_2$. The standardized tuning parameter is defined as $s = P(\hat\Theta(\lambda)) / P(\hat\Theta(0))$, with $s = 0$ corresponding to the naive Bayes classifier and $s = 1$ corresponding to QDA.

4.1.1 Simulated examples

The data generated in each experiment consist of a training set, a validation set to tune the parameters, and a test set to evaluate the performance of our chosen model. Following the notation of [28], we denote by ·/·/· the number of observations in the training, validation and test sets respectively. For every data set, the tuning parameter minimizing the validation misclassification error is chosen to compute the test misclassification error. Each experiment was replicated 50 times.
In all cases the population class conditional distributions were normal, the number of classes was $K = 2$, and the prior probability of each class was taken to be equal. The mean for class $k$ was taken to have Euclidean norm $\mathrm{tr}(\Sigma^{(k)})/p$ and the two means were orthogonal to each other. We consider three different models for the precision matrices, and in all cases the $K$ precision matrices are from the same model:

Model 1. Full model: $\theta_{ij}^{(k)} = 1$ if $i = j$ and $\theta_{ij}^{(k)} = \rho_k$ otherwise.

Model 2. Decreasing model: $\theta_{ij}^{(k)} = \rho_k^{|i - j|}$.

Model 3. Block diagonal model with block size $q$: $\theta_{ij}^{(k)} = 1$ if $i = j$, $\theta_{ij}^{(k)} = \rho_k$ if $i \neq j$ and $i, j \le q$, and $\theta_{ij}^{(k)} = 0$ otherwise, where $0 < q < p$.

Tables 2 and 3, summarizing the results for each situation, present the average test misclassification errors over the 50 replications. The quantities in parentheses are the standard deviations of the respective quantities over the 50 replications.

Example 1: Dense interaction terms

We study the performance of SQDA on Model 1 and Model 2 with $p = 8$, $\rho_1 = 0$ and $\rho_2 = 0.8$. We generated 50/50/10000 observations for each class. Table 2 summarizes the results. In this example, both situations have full interaction terms and should favor DRDA. The results show that SQDA and DRDA have similar performance, and both give lower misclassification errors and smaller standard deviations than QDA or naive Bayes. The model-selection procedure behaves quite reasonably, choosing large values of the standardized tuning parameter $s$ for both SQDA and DRDA.

Table 2: Misclassification errors and selected tuning parameters for simulated example 1. The values are averages over 50 replications, with the standard errors in parentheses. The first four columns are misclassification errors; the last two are average standardized tuning parameters.

| Model | SQDA | DRDA | QDA | Naive Bayes | s (SQDA) | s (DRDA) |
| Full | 0.129(0.021) | 0.129(0.020) | 0.238(0.028) | 0.179(0.026) | 0.828(0.258) | 0.835(0.302) |
| Decreasing | 0.103(0.011) | 0.104(0.013) | 0.170(0.031) | 0.153(0.032) | 0.817(0.237) | 0.781(0.314) |

Example 2: Sparse interaction terms

We study the performance of SQDA on Model 3 with $\rho_1 = 0$, $\rho_2 = 0.8$ and $q = 4$. We performed experiments for $(p, n) = (8, 50), (20, 200), (40, 800), (100, 1500)$, and for each $p$ we generated $n/n/10000$ observations for each class. Table 3 summarizes the results. In this example, all situations have sparse interaction terms and should favor SQDA. Moreover, the sparsity level increases as $p$ increases. As conjectured, SQDA strongly dominates, with lower misclassification errors at all dimensionalities.
The standardized tuning parameter values for DRDA are uniformly larger than those for SQDA. This is expected because, in order to capture the same amount of interaction, DRDA needs to include more noise than SQDA.

Table 3: Misclassification errors and selected tuning parameters for simulated example 2. The values are averages over 50 replications, with the standard errors in parentheses. The first four columns are misclassification errors; the last two are average standardized tuning parameters.

| Setting | SQDA | DRDA | QDA | Naive Bayes | s (SQDA) | s (DRDA) |
| p = 8, n = 50 | …(0.010) | 0.088(0.013) | 0.092(0.011) | 0.107(0.014) | 0.529(0.309) | 0.688(0.314) |
| p = 20, n = 200 | …(0.007) | 0.109(0.009) | 0.114(0.006) | 0.121(0.004) | 0.190(0.046) | 0.293(0.345) |
| p = 40, n = 800 | …(0.006) | 0.108(0.007) | 0.127(0.002) | 0.114(0.004) | 0.109(0.015) | 0.194(0.230) |
| p = 100, n = 1500 | …(0.007) | 0.188(0.009) | 0.230(0.025) | 0.215(0.035) | 0.024(0.004) | 0.073(0.059) |

4.1.2 Real data example: vowel recognition data

This data set consists of training and test data with 10 predictors and 11 classes. We obtained the data from the benchmark collection maintained by Scott Fahlman at Carnegie Mellon University. The data was contributed by Anthony Robinson [20]. The classes correspond to 11 vowel sounds, each contained in 11 different words. Here are the words, preceded by the symbols that represent them:

| Vowel | Word | Vowel | Word | Vowel | Word |
| i | heed | a: | hard | U | hood |
| I | hid | Y | hud | u: | who'd |
| E | head | O | hod | 3: | heard |
| A | had | C: | hoard | | |

Each word was uttered once by each of the 15 speakers. Four male and four female speakers were used to train the models, and the other four male and three female speakers were used for testing the performance.

This paragraph is technical and describes how the analog speech signals were transformed into a 10-dimensional feature vector. The speech signals were low-pass filtered at 4.7 kHz and then digitized to 12 bits with a 10 kHz sampling rate. Twelfth-order linear predictive analysis was carried out on six 512-sample Hamming windowed segments from the steady part of the vowel. The reflection coefficients were used to calculate 10 log-area parameters, giving a 10-dimensional input space. Each speaker thus yielded six frames of speech from 11 vowels. This gave 528 frames from the eight speakers used to train the models and 462 frames from the seven speakers used to test the models.

We implement our method on a subset of the data consisting of the four vowels "Y", "O", "U" and "u:". These four vowels are selected because their sample precision matrices are approximately sparse, and SQDA usually performs well on such data sets. Figure 4 displays the standardized sample precision matrices, and the standardized SQDA and DRDA estimates. In addition to DRDA, QDA and naive Bayes, we also compare our method with three other classifiers, namely the support vector machine (SVM), k-nearest neighbors (kNN) and random forest (RF), in terms of misclassification rate. The SVM, kNN and random forest were implemented by the R packages "e1071", "class" and "randomForest", respectively, with default settings. The results of these classification procedures are shown in Table 4.

Figure 4: Heat maps of the sample precision matrices and the estimated precision matrices of SQDA and DRDA on the vowel data. Estimates are standardized to have unit diagonal.

Table 4: Vowel speech classification results. Tuning parameters are selected by 5-fold cross validation. Columns: SQDA, DRDA, QDA, Naive Bayes, SVM, kNN, Random Forest. Rows: test error, CV error.

SQDA performs significantly better than the other six classifiers. Compared with the runner-up, DRDA, whose test error is 19.7%, SQDA has a relative gain of 12.7%.

4.2 Community Bayes

In this section we study the performance of the community Bayes model using logistic regression as an example. We refer to the classifier that combines the community Bayes algorithm with logistic regression as community logistic regression. We compare the performance of logistic regression (LR) and community logistic regression (CLR) in several simulated examples and a real data example, and the results show that CLR has better accuracy and smaller variance in predictions.

4.2.1 Simulated examples

The data generated in each experiment consist of a training set, a validation set to tune the parameters, and a test set to evaluate the performance of our chosen model. Each experiment was replicated 50 times. In all cases the distributions of $Z^{(k)}$ were normal, the number of classes was $K = 3$, and the prior probability of each class was taken to be equal. The mean for class $k$ was taken to have Euclidean norm $\mathrm{tr}(\Sigma^{(k)})/(p/2)$ and the three means were orthogonal to each other. We consider three models for the covariance matrices, and in all cases the $K$ covariance matrices are the same:

Model 1. Full model: $\Sigma_{ij} = 1$ if $i = j$ and $\Sigma_{ij} = \rho$ otherwise.

Model 2. Decreasing model: $\Sigma_{ij} = \rho^{|i - j|}$.

Model 3. Block diagonal model with $q$ blocks: $\Sigma = \mathrm{diag}(\Sigma_1, \ldots, \Sigma_q)$. Each $\Sigma_i$ is of size $\lfloor p/q \rfloor \times \lfloor p/q \rfloor$ or $(\lfloor p/q \rfloor + 1) \times (\lfloor p/q \rfloor + 1)$ and is from Model 1.

For simplicity, the transformation functions for all dimensions were the same: $f_1^{(k)} = \cdots = f_p^{(k)} = f^{(k)}$. Define $g^{(k)} = (f^{(k)})^{-1}$. In addition to the identity transformation, two different transformations $g^{(k)}$ were employed as in [15]: the symmetric power transformation and the Gaussian CDF transformation. See [15] for the definitions.
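The three covariance models above can be written down directly. The sketch below builds them as nested lists; the function names are hypothetical, and the choice to give the extra feature to the first blocks when $q$ does not divide $p$ is an assumption, since the text does not say which blocks are the larger ones.

```python
def full_model(p, rho):
    """Model 1: unit diagonal, rho elsewhere."""
    return [[1.0 if i == j else rho for j in range(p)] for i in range(p)]

def decreasing_model(p, rho):
    """Model 2: entry (i, j) equals rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(p)] for i in range(p)]

def block_diagonal_model(p, rho, q):
    """Model 3: q diagonal blocks, each a Model 1 matrix; block sizes differ by at most one."""
    base, extra = divmod(p, q)
    # Assumption: the first `extra` blocks receive one additional feature.
    sizes = [base + 1 if b < extra else base for b in range(q)]
    sigma = [[0.0] * p for _ in range(p)]
    start = 0
    for size in sizes:
        for i in range(start, start + size):
            for j in range(start, start + size):
                sigma[i][j] = 1.0 if i == j else rho
        start += size
    return sigma

sigma_full = full_model(16, 0.5)
sigma_dec = decreasing_model(16, 0.5)
sigma_block = block_diagonal_model(16, 0.5, 4)   # four blocks of size 4
```

With $p = 16$ and $q \in \{2, 4, 8\}$ the blocks all have equal size, so the assumption about uneven blocks only matters for other choices of $p$ and $q$.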
We compare the performance of LR and CLR on the above modes with p = 16, ρ = 0.5 and q = 2, 4, 8. We generated 20/20/10000 observations for 15

16 each cass and use average inkage custering to estimate the communities. Two exampes were studied: Exampe 1: The transformations g (k) for a three casses are the identity transformation. The popuation cass conditiona distributions are norma with different means and common covariance matrix. The ogits are inear in this exampe. Exampe 2: The transformations g (k) for cass 1,2 and 3 are the identity transformation, the symmetric power transformation with α = 3, and the Gaussian CDF transformation with µ g0 = 0 and σ g0 = 1 respectivey. α, µ g0 and σ g0 are defined in [15]. The power and CDF transformations map a univariate norma distribution into a highy skewed and a bi-moda distribution respectivey. The ogits are noninear in this exampe. Tabe 5 summarizes the test miscassification errors of these two methods and the estimated numbers of communities. CLR performs we in a cases with ower test errors and smaer standard errors than LR, incuding the ones where the covariance matrices don t have a bock structure. The community Bayes agorithm introduces a more significant improvement when the cass conditiona distributions are not norma. Tabe 5: Miscassification errors and estimated numbers of communities for the two simuated exampes. The vaues are averages over 50 repications, with the standard errors in parentheses. 
Example 1        CLR test error   LR test error   Number of communities
Full             (0.041)          0.274(0.040)    5.220(4.292)
Decreasing       0.221(0.032)     0.286(0.036)    6.040(4.262)
Block, q = 2     0.227(0.033)     0.271(0.039)    4.700(3.882)
Block, q = 4     0.203(0.035)     0.278(0.041)    5.320(3.395)
Block, q =       0.262(0.030)     0.325(0.036)    7.780(3.935)

Example 2        CLR test error   LR test error   Number of communities
Full             0.221(0.049)     0.255(0.061)    7.480(4.253)
Decreasing       0.227(0.040)     0.282(0.062)    7.320(3.977)
Block, q = 2     0.208(0.041)     0.263(0.062)    6.940(4.533)
Block, q = 4     0.203(0.036)     0.266(0.063)    6.220(3.627)
Block, q =       0.239(0.031)     0.309(0.057)    9.160(3.782)

Real data example: email spam

The data for this example consist of information from 4601 email messages, in a study to screen email for spam. The true outcome, email or spam, is available, along with the relative frequencies of 57 of the most commonly occurring words and punctuation marks in the email message. Since most of the spam predictors have a very long-tailed distribution, we log-transformed each variable (actually log(x + 0.1)) before fitting the models. We compare CLR with LR and four other commonly used classifiers, namely SVM, kNN, Random Forest and Boosting. As in Section 4.1.2, SVM, kNN, Random Forest and Boosting were implemented with the R packages "e1071", "class", "randomForest" and "ada", respectively, with default settings. The number of communities for CLR was restricted to the range 1 to 20, and the optimal number of communities was selected by 5-fold cross-validation. We randomly chose 1000 samples from this data set and split them into equally sized training and test sets. We repeated the random sampling and splitting 20 times. The mean misclassification error of each method is listed in Table 6, with standard errors in parentheses.

Table 6: Email spam classification results. The values are test misclassification errors averaged over 20 replications, with standard errors in parentheses.

CLR            LR             SVM            kNN            Random Forest  Boosting
0.068(0.019)   0.087(0.026)   0.070(0.011)   0.106(0.017)   0.071(0.010)   0.075(0.012)

The mean test error rate for LR is 8.7%. By comparison, CLR has a mean test error rate of 6.8%, a relative improvement of 21.8% ((8.7 - 6.8)/8.7). Figure 5 shows the test error and the cross-validation error of CLR over the range of the number of communities for one replication. CLR corresponds to LR when the number of communities is 1. The estimated number of communities in this replication is 6, and these communities are listed below. Most features in community 1 are negatively correlated with spam, while most features in community 2 are positively correlated.

Community 1: hp, hpl, george, lab, labs, telnet, technology, direct, original, pm, cs, re, edu, conference, 650, 857, 415, 85, 1999, ch;, ch(, ch[, CAPAVE, CAPMAX, CAPTOT.
Community 2: over, remove, internet, order, free, money, credit, business, email, mail, receive, will, people, report, address, addresses, make, all, our, you, your, 000, ch!, ch$.
Community 3: font, ch#.
Community 4: data, project.
Community 5: parts, meeting, table.
Community 6: 3d.
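To make the role of the communities concrete, the following is a minimal Python sketch of how per-community class posteriors could be combined. It assumes that the communities are conditionally independent given the class, in which case Bayes' rule gives $P(G = k \mid x) \propto \pi_k^{1-C} \prod_{c=1}^{C} P(G = k \mid x_c)$ for $C$ communities and class priors $\pi_k$. The function name `combine_community_posteriors` and the toy numbers are illustrative assumptions and are not taken from the paper; the exact rule used by CLR may differ in detail.

```python
import math


def combine_community_posteriors(priors, community_posteriors):
    """Combine per-community class posteriors into one posterior (hypothetical helper).

    Assumption: communities are conditionally independent given the class, so
    P(G=k | x) is proportional to priors[k] ** (1 - C) * prod_c P(G=k | x_c),
    where C = len(community_posteriors). All probabilities must be > 0.
    """
    n_communities = len(community_posteriors)
    log_scores = []
    for k, prior in enumerate(priors):
        # Work in log space to avoid underflow when many communities are multiplied.
        score = (1 - n_communities) * math.log(prior)
        for posterior in community_posteriors:
            score += math.log(posterior[k])
        log_scores.append(score)
    # Normalise with the usual max-shift for numerical stability.
    shift = max(log_scores)
    weights = [math.exp(s - shift) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


# Toy two-class illustration with made-up numbers.
priors = [0.6, 0.4]
one_community = combine_community_posteriors(priors, [[0.3, 0.7]])
two_communities = combine_community_posteriors(priors, [[0.3, 0.7], [0.5, 0.5]])
print(one_community, two_communities)
```

With a single community the combined posterior is just that community's posterior, which is consistent with the remark that CLR reduces to LR when the number of communities is 1.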
Figure 5: The 5-fold cross-validation errors (blue) and the test errors (red) of CLR on the spam data.

5 Conclusions

In this paper we have proposed sparse quadratic discriminant analysis, a classifier ranging between QDA and naive Bayes through a path of sparse graphical models. By allowing interaction terms into the model, this classifier relaxes the strict and often unreasonable assumption of naive Bayes. Moreover, the resulting estimates of interactions are sparse and easier to interpret compared to other existing classifiers that regularize QDA.

Motivated by the connection between the estimated precision matrices and single linkage clustering with the sample covariance matrices, we present the community Bayes model, a simple procedure applicable to non-Gaussian data and any likelihood-based classifiers. By exploiting the block-diagonal structure of the estimated correlation matrices, we reduce the problem into several sub-problems that can be solved independently. Simulated and real data examples show that the community Bayes model can improve accuracy and reduce variance in predictions.

A Proofs

A.1 Proof of Theorem 1

Proof. The standard KKT conditions [2] of the optimization problem (5) give:

$$n_k\left( -\big(\hat\Theta^{(k)}\big)^{-1} + S^{(k)} \right) + \lambda \Gamma^{(k)} = 0, \quad \text{for } k = 1, \dots, K \tag{24}$$

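where $\Gamma^{(k)}$ is a $p \times p$ matrix with diagonal elements being zero:

$$\Gamma^{(k)} = \begin{pmatrix} 0 & \gamma_{12}^{(k)} & \cdots & \gamma_{1p}^{(k)} \\ \gamma_{21}^{(k)} & 0 & \cdots & \gamma_{2p}^{(k)} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{p1}^{(k)} & \gamma_{p2}^{(k)} & \cdots & 0 \end{pmatrix}.$$

Denote $\gamma_{ij} = (\gamma_{ij}^{(1)}, \dots, \gamma_{ij}^{(K)})$ and $\hat\theta_{ij} = (\hat\theta_{ij}^{(1)}, \dots, \hat\theta_{ij}^{(K)})$. The subgradient $\gamma_{ij} \in \partial \|\hat\theta_{ij}\|_2$ for $i \neq j$, and

$$\partial \|\theta_{ij}\|_2 = \begin{cases} B_2(1) & \text{if } \theta_{ij} = 0; \\ \left\{ \dfrac{\theta_{ij}}{\|\theta_{ij}\|_2} \right\} & \text{otherwise.} \end{cases}$$

Let $\hat W^{(k)} = (\hat\Theta^{(k)})^{-1}$. Then

$$\left( n_1\big(\hat W_{ij}^{(1)} - S_{ij}^{(1)}\big), \dots, n_K\big(\hat W_{ij}^{(K)} - S_{ij}^{(K)}\big) \right) = \lambda \frac{\hat\theta_{ij}}{\|\hat\theta_{ij}\|_2}, \quad \text{if } i \neq j \text{ and } \hat\theta_{ij} \neq 0; \tag{25}$$

$$\left\| \left( n_1\big(\hat W_{ij}^{(1)} - S_{ij}^{(1)}\big), \dots, n_K\big(\hat W_{ij}^{(K)} - S_{ij}^{(K)}\big) \right) \right\|_2 \le \lambda, \quad \text{if } i \neq j \text{ and } \hat\theta_{ij} = 0; \tag{26}$$

$$\left( \hat W_{ii}^{(1)}, \dots, \hat W_{ii}^{(K)} \right) = \left( S_{ii}^{(1)}, \dots, S_{ii}^{(K)} \right). \tag{27}$$

There exists an ordering of the vertices $\{1, \dots, p\}$ such that the edge-matrix of the thresholded covariance graph is block-diagonal. For notational convenience, we will assume that the matrix is already in this order:

$$E^{(\lambda)} = \begin{pmatrix} E_1^{(\lambda)} & 0 & \cdots & 0 \\ 0 & E_2^{(\lambda)} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & E_{k(\lambda)}^{(\lambda)} \end{pmatrix}, \tag{28}$$

where $E_\ell^{(\lambda)}$ represents the block of indices given by $V_\ell^{(\lambda)}$, $\ell = 1, \dots, k(\lambda)$. We will construct $\hat W^{(1)}, \dots, \hat W^{(K)}$ having the same structure as $E^{(\lambda)}$ which is a solution to the optimization problem (5). Note that if $\hat W^{(k)}$ is block diagonal then so is its inverse. Let $\hat W^{(k)}$ and its inverse $\hat\Theta^{(k)}$ be given by:

$$\hat W^{(k)} = \begin{pmatrix} \hat W_1^{(k)} & 0 & \cdots & 0 \\ 0 & \hat W_2^{(k)} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \hat W_{k(\lambda)}^{(k)} \end{pmatrix}, \quad \hat\Theta^{(k)} = \begin{pmatrix} \hat\Theta_1^{(k)} & 0 & \cdots & 0 \\ 0 & \hat\Theta_2^{(k)} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \hat\Theta_{k(\lambda)}^{(k)} \end{pmatrix}, \quad \text{for all } k. \tag{29}$$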
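The fact used just before (29), that the inverse of a block-diagonal matrix is block diagonal with the inverted blocks on its diagonal, can be checked numerically. The following Python sketch does so in exact rational arithmetic on a small made-up example; the helper names `invert` and `block_diag` and the matrices `W1`, `W2` are hypothetical and serve only as an illustration.

```python
from fractions import Fraction


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination in exact arithmetic.

    Hypothetical helper; raises StopIteration if the matrix is singular.
    """
    n = len(matrix)
    # Augment the matrix with the identity: [A | I].
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap a row with a non-zero pivot into position.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        # Eliminate the column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    # The right half now holds the inverse.
    return [row[n:] for row in aug]


def block_diag(blocks):
    """Assemble square blocks into one block-diagonal matrix."""
    size = sum(len(b) for b in blocks)
    out = [[Fraction(0)] * size for _ in range(size)]
    offset = 0
    for block in blocks:
        for i, row in enumerate(block):
            for j, value in enumerate(row):
                out[offset + i][offset + j] = Fraction(value)
        offset += len(block)
    return out


# Made-up blocks standing in for W_1 and W_2 of one class.
W1 = [[2, 1], [1, 2]]
W2 = [[4]]
W = block_diag([W1, W2])
Theta = invert(W)
# Inverting the whole matrix agrees with inverting block by block.
print(Theta == block_diag([invert(W1), invert(W2)]))
```

This is why the construction can work block by block: each $\hat\Theta_\ell^{(k)}$ depends only on the corresponding $\hat W_\ell^{(k)}$.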

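Define $\hat W_\ell^{(k)}$, or equivalently $\hat\Theta_\ell^{(k)} = (\hat W_\ell^{(k)})^{-1}$, via the following sub-problems

$$\hat\Theta_\ell = \operatorname*{argmin}_{\Theta_\ell^{(k)} \succ 0} \; -\sum_{k=1}^{K} n_k \left( \log\det \Theta_\ell^{(k)} - \operatorname{tr}\big(S_\ell^{(k)} \Theta_\ell^{(k)}\big) \right) + \lambda \sum_{\substack{i,j \in V_\ell^{(\lambda)} \\ i \neq j}} \|\theta_{ij}\|_2 \tag{30}$$

for $\ell = 1, \dots, k(\lambda)$, where $\hat\Theta_\ell = (\hat\Theta_\ell^{(1)}, \dots, \hat\Theta_\ell^{(K)})$, and $S_\ell^{(k)}$ is a sub-block of $S^{(k)}$ with row/column indices from $V_\ell^{(\lambda)} \times V_\ell^{(\lambda)}$. Thus for every $\ell$, $(\hat\Theta_\ell^{(1)}, \dots, \hat\Theta_\ell^{(K)})$ satisfies the KKT conditions corresponding to the $\ell$th block of the $p \times p$ dimensional problem. By construction of the thresholded sample covariance graph, if $i \in V_\ell^{(\lambda)}$ and $j \in V_{\ell'}^{(\lambda)}$ with $\ell \neq \ell'$, then $\|(n_1 S_{ij}^{(1)}, \dots, n_K S_{ij}^{(K)})\|_2 \le \lambda$. Hence, the choice $\hat\Theta_{ij}^{(k)} = \hat W_{ij}^{(k)} = 0$, $k = 1, \dots, K$, satisfies the KKT conditions (26) for all the off-diagonal entries in the block-matrix (29). Hence $(\hat\Theta^{(1)}, \dots, \hat\Theta^{(K)})$ solves the optimization problem (5).

This shows that the vertex-partition $\{\hat V_\ell^{(\lambda)}\}_{1 \le \ell \le \kappa(\lambda)}$ obtained from the estimated precision graph is a finer resolution of that obtained from the thresholded covariance graph, i.e., for every $\ell \in \{1, \dots, k(\lambda)\}$, there is an $\ell' \in \{1, \dots, \kappa(\lambda)\}$ such that $\hat V_{\ell'}^{(\lambda)} \subset V_\ell^{(\lambda)}$. In particular $k(\lambda) \le \kappa(\lambda)$.

Conversely, if $(\hat\Theta^{(1)}, \dots, \hat\Theta^{(K)})$ admits the decomposition as in the statement of the theorem, then it follows from (26) that: for $i \in \hat V_\ell^{(\lambda)}$ and $j \in \hat V_{\ell'}^{(\lambda)}$ with $\ell \neq \ell'$, we have $\|(n_1 S_{ij}^{(1)}, \dots, n_K S_{ij}^{(K)})\|_2 \le \lambda$ since $\hat W_{ij}^{(k)} = 0$ for $k = 1, \dots, K$. Thus the vertex-partition induced by the connected components of the thresholded covariance graph is a finer resolution of that induced by the connected components of the estimated precision graph. In particular this implies that $k(\lambda) \ge \kappa(\lambda)$. Combining the above two we conclude $k(\lambda) = \kappa(\lambda)$ and also the equality (12). Since the labeling of the connected components is not unique, there is a permutation $\pi$ in the theorem.

Note that the connected components of the thresholded sample covariance graph $G^{(\lambda)}$ are nested within the connected components of $G^{(\lambda')}$. In particular, the vertex-partition of the thresholded sample covariance graph at $\lambda$ is also nested within that of the thresholded sample covariance graph at $\lambda'$.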
We have already proved that the vertex-partitions induced by the connected components of the estimated precision graph and the thresholded sample covariance graph are equal. Using this result, we conclude that the vertex-partition induced by the components of the estimated precision graph at $\lambda$, given by $\{\hat V_\ell^{(\lambda)}\}_{1 \le \ell \le \kappa(\lambda)}$, is contained inside the vertex-partition induced by the components of the estimated precision graph at $\lambda'$, given by $\{\hat V_\ell^{(\lambda')}\}_{1 \le \ell \le \kappa(\lambda')}$.
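The argument above suggests that the communities can be read off the thresholded sample covariance graph without solving (5). The Python sketch below illustrates this under a stated assumption: vertices $i$ and $j$ are joined whenever $\|(n_1 S_{ij}^{(1)}, \dots, n_K S_{ij}^{(K)})\|_2 > \lambda$, which is the condition under which (26) fails with $\hat W_{ij}^{(k)} = 0$. The exact definition in Theorem 1 is stated earlier in the paper and is not reproduced here. The function names and the toy covariance matrices are hypothetical.

```python
import math


def connected_components(p, edges):
    """Vertex partition of an undirected graph on vertices 0..p-1 (depth-first search)."""
    neighbours = {v: set() for v in range(p)}
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)
    seen, parts = set(), []
    for start in range(p):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            v = stack.pop()
            component.append(v)
            for w in neighbours[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        parts.append(sorted(component))
    return parts


def thresholded_partition(covs, sizes, lam):
    """Components of the graph with an edge (i, j) when
    ||(n_1 S_ij^(1), ..., n_K S_ij^(K))||_2 > lam (assumed thresholding rule)."""
    p = len(covs[0])
    edges = []
    for i in range(p):
        for j in range(i + 1, p):
            norm = math.sqrt(sum((n * S[i][j]) ** 2 for n, S in zip(sizes, covs)))
            if norm > lam:
                edges.append((i, j))
    return connected_components(p, edges)


def is_nested(fine, coarse):
    """True if every block of `fine` lies inside some block of `coarse`."""
    return all(any(set(f) <= set(c) for c in coarse) for f in fine)


# Made-up sample covariance matrices for K = 2 classes and p = 4 features.
covs = [
    [[1.0, 0.5, 0.0, 0.0], [0.5, 1.0, 0.1, 0.0], [0.0, 0.1, 1.0, 0.4], [0.0, 0.0, 0.4, 1.0]],
    [[1.0, 0.3, 0.0, 0.0], [0.3, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.2], [0.0, 0.0, 0.2, 1.0]],
]
sizes = [10, 20]
for lam in (0.5, 2.0, 6.0):
    print(lam, thresholded_partition(covs, sizes, lam))
```

In this toy example the partition at a larger threshold refines the partition at a smaller one, which is the nesting described at the end of the proof.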

A.2 Proof of Theorem 2

Proof. Let $a = \min_{i,j \in V_\ell;\ \ell = 1, \dots, L} R^0_{ij}$. First note that $\left\{ \max_{ij} |R_{ij} - R^0_{ij}| < \frac{a}{2} \right\} \subset \left\{ \hat V_\ell = V_\ell,\ \forall \ell \right\}$ [22]. Then

$$\begin{aligned} P\big(\exists \ell,\ \hat V_\ell \neq V_\ell\big) &\le P\Big( \max_{ij} |R_{ij} - R^0_{ij}| \ge \frac{a}{2} \Big) \\ &\le P\Big( \max_{ij} \sum_{k=1}^{K} n_k \big|\hat R^{(k)}_{ij} - R^{(k)}_{ij}\big| \ge \frac{a}{2} \Big) \\ &\le \sum_{k=1}^{K} P\Big( \max_{ij} n_k \big|\hat R^{(k)}_{ij} - R^{(k)}_{ij}\big| \ge \frac{a}{2K} \Big) \\ &= \sum_{k=1}^{K} P\Big( \max_{ij} \big|\hat R^{(k)}_{ij} - R^{(k)}_{ij}\big| \ge \frac{a}{2 n_k K} \Big). \end{aligned}$$

The second inequality is because $\left| \sqrt{a_1^2 + \dots + a_m^2} - \sqrt{b_1^2 + \dots + b_m^2} \right| \le |a_1 - b_1| + \dots + |a_m - b_m|$. Since $a \ge 16\pi K \sqrt{n_k \log p}$, we have $P\big( \max_{ij} |\hat R^{(k)}_{ij} - R^{(k)}_{ij}| \ge \frac{a}{2 n_k K} \big) \le \frac{1}{p^2}$ by Theorem 4.1 of [14]. Therefore $P\big(\exists \ell,\ \hat V_\ell \neq V_\ell\big) \le \frac{K}{p^2}$.

References

[1] Peter J Bickel and Elizaveta Levina. Some theory for Fisher's linear discriminant function, naive Bayes, and some alternatives when there are many more variables than observations. Bernoulli.
[2] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge University Press.
[3] Peter Bühlmann, Philipp Rütimann, Sara van de Geer, and Cun-Hui Zhang. Correlated variables in regression: clustering and sparse estimation. Journal of Statistical Planning and Inference, 143(11).
[4] Line Clemmensen, Trevor Hastie, Daniela Witten, and Bjarne Ersbøll. Sparse discriminant analysis. Technometrics, 53(4).
[5] Patrick Danaher, Pei Wang, and Daniela M Witten. The joint graphical lasso for inverse covariance estimation across multiple classes. Journal of the Royal Statistical Society: Series B (Statistical Methodology).
[6] Sandrine Dudoit, Jane Fridlyand, and Terence P Speed. Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, 97(457):77-87.

[7] Jerome H Friedman. Regularized discriminant analysis. Journal of the American Statistical Association, 84(405).
[8] Yaqian Guo, Trevor Hastie, and Robert Tibshirani. Regularized linear discriminant analysis and its application in microarrays. Biostatistics, 8(1):86-100.
[9] Trevor Hastie, Andreas Buja, and Robert Tibshirani. Penalized discriminant analysis. The Annals of Statistics.
[10] Nadine Hussami and Robert Tibshirani. A component lasso. arXiv preprint.
[11] WJ Krzanowski, Philip Jonathan, WV McCarthy, and MR Thomas. Discriminant analysis with singular covariance matrices: methods and applications to spectroscopic data. Applied Statistics.
[12] B Boser Le Cun, John S Denker, D Henderson, Richard E Howard, W Hubbard, and Lawrence D Jackel. Handwritten digit recognition with a backpropagation network. In Advances in Neural Information Processing Systems. Citeseer.
[13] Quefeng Li and Jun Shao. Sparse quadratic discriminant analysis for high dimensional data. Statist. Sinica.
[14] Han Liu, Fang Han, Ming Yuan, John Lafferty, and Larry Wasserman. The nonparanormal skeptic. arXiv preprint.
[15] Han Liu, John Lafferty, and Larry Wasserman. The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. The Journal of Machine Learning Research, 10.
[16] Rahul Mazumder and Trevor Hastie. Exact covariance thresholding into connected components for large-scale graphical lasso. The Journal of Machine Learning Research, 13.
[17] Finbarr O'Sullivan. A statistical perspective on ill-posed inverse problems. Statistical Science.
[18] Mee Young Park, Trevor Hastie, and Robert Tibshirani. Averaged gene expressions for regression. Biostatistics, 8(2).
[19] Bradley S Price, Charles J Geyer, and Adam J Rothman. Ridge fusion in statistical learning. arXiv preprint.
[20] Anthony John Robinson. Dynamic error propagation networks. PhD thesis, University of Cambridge.
[21] Jiehuan Sun and Hongyu Zhao. The application of sparse estimation of covariance matrix to quadratic discriminant analysis. BMC Bioinformatics, 16(1):48.

[22] Kean Ming Tan, Daniela Witten, and Ali Shojaie. The cluster graphical lasso for improved estimation of Gaussian graphical models. arXiv preprint.
[23] Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, and Gilbert Chu. Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proceedings of the National Academy of Sciences, 99(10).
[24] DM Titterington. Common structure of smoothing techniques in statistics. International Statistical Review/Revue Internationale de Statistique.
[25] Daniela M Witten, Jerome H Friedman, and Noah Simon. New insights and faster computations for the graphical lasso. Journal of Computational and Graphical Statistics, 20(4).
[26] Ping Xu, Guy N Brock, and Rudolph S Parrish. Modified linear discriminant analysis approaches for classification of high-dimensional microarray data. Computational Statistics & Data Analysis, 53(5).
[27] Ming Yuan and Yi Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1):49-67.
[28] Hui Zou and Trevor Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2).


Problem set 6 The Perron Frobenius theorem. Probem set 6 The Perron Frobenius theorem. Math 22a4 Oct 2 204, Due Oct.28 In a future probem set I want to discuss some criteria which aow us to concude that that the ground state of a sef-adjoint operator

More information

PARSIMONIOUS VARIATIONAL-BAYES MIXTURE AGGREGATION WITH A POISSON PRIOR. Pierrick Bruneau, Marc Gelgon and Fabien Picarougne

PARSIMONIOUS VARIATIONAL-BAYES MIXTURE AGGREGATION WITH A POISSON PRIOR. Pierrick Bruneau, Marc Gelgon and Fabien Picarougne 17th European Signa Processing Conference (EUSIPCO 2009) Gasgow, Scotand, August 24-28, 2009 PARSIMONIOUS VARIATIONAL-BAYES MIXTURE AGGREGATION WITH A POISSON PRIOR Pierric Bruneau, Marc Gegon and Fabien

More information
