arxiv: v2 [stat.ml] 19 Oct 2016
Sparse Quadratic Discriminant Analysis and Community Bayes

Ya Le
Department of Statistics, Stanford University

Trevor Hastie
Department of Statistics, Stanford University

Abstract

We develop a class of rules spanning the range between quadratic discriminant analysis and naive Bayes, through a path of sparse graphical models. A group lasso penalty is used to introduce shrinkage and encourage a similar pattern of sparsity across precision matrices. It gives sparse estimates of interactions and produces interpretable models. Inspired by the connected-components structure of the estimated precision matrices, we propose the community Bayes model, which partitions features into several conditionally independent communities and splits the classification problem into separate smaller ones. The community Bayes idea is quite general and can be applied to non-Gaussian data and likelihood-based classifiers.

1 Introduction

In the generic classification problem, the outcome of interest $G$ falls into $K$ unordered classes, which for convenience we denote by $\{1, 2, \dots, K\}$. Our goal is to build a rule for predicting the class label of an item based on $p$ measurements of features $X \in \mathbb{R}^p$. The training set consists of the class labels and features for $n$ items. This is an important practical problem with applications in many fields.

Quadratic discriminant analysis (QDA) can be derived as the maximum likelihood method for normal populations with different means and different covariance matrices. It is a favored tool when there are some strong interactions and linear boundaries cannot separate the classes. However, QDA is poorly posed when the sample size $n_k$ is not considerably larger than $p$ for any class $k$, and clearly ill-posed if $n_k < p$. This motivates the use of a regularization method [17, 24]. Although there already exist a number of proposals to regularize linear discriminant analysis (LDA) [1, 4, 6, 8, 9, 11, 23, 26], few methods have been developed to regularize QDA.
Friedman [7] suggests applying a ridge penalty to the within-class covariance matrices, and Price et al. [19] propose adding a ridge penalty and a ridge fusion penalty to the log-likelihood. For both methods, no elements of the resulting within-class precision matrices will be zero, leading to dense interaction terms in the discriminant functions and difficulties in interpretation. Assuming sparsity conditions on both the unknown means and the covariance matrices, Li and Shao [13] propose constructing mean and covariance estimators by thresholding. Sun and Zhao [21] assume a block-diagonal structure for the covariance matrices and estimate each block by $\ell_1$ regularization. The blocks are assumed to be of the same size and are formed based on two-sample t statistics. However, both methods apply only to two-class classification.

Another approach is to assume conditional independence of the features (naive Bayes). Naive Bayes classifiers often outperform far more sophisticated alternatives when the dimension $p$ of the feature space is high, but the conditional independence assumption may be too rigid in the presence of strong interactions.

In this paper we develop a class of rules spanning the range between QDA and naive Bayes using a group lasso penalty [27]. A group lasso penalty is applied to the $(i, j)$th element across all $K$ within-class precision matrices, which forces the zeros in the $K$ estimated precision matrices to occur in the same places. This shared pattern of sparsity results in a sparse estimate of interaction terms, making the classifier easy to interpret. We refer to this classification method as Sparse Quadratic Discriminant Analysis (SQDA).

The connected components in the resulting precision matrices correspond exactly to those obtained from simple thresholding rules on the within-class sample covariance matrices [5]. This suggests partitioning the features into several independent communities before estimating the parameters [3, 10, 18]. Therefore, we propose the community Bayes model, a simple idea to identify and make use of conditionally independent communities of features. Furthermore, it is a general approach applicable to non-Gaussian data and any likelihood-based classifier.
Specifically, the community Bayes model works by partitioning features into several communities, solving separate classification problems for each community, and combining the community-wise results into one final prediction. We show that this approach can improve the accuracy and interpretability of the corresponding classifier.

The paper is organized as follows. In Section 2 we discuss sparse quadratic discriminant analysis and the sample covariance thresholding rules. In Section 3 we discuss the community Bayes idea. Simulated and real data examples appear in Section 4, concluding remarks in Section 5, and the proofs are gathered in the Appendix.

2 Sparse Quadratic Discriminant Analysis

Suppose we have training data $(x_i, g_i) \in \mathbb{R}^p \times \{1, \dots, K\}$, $i = 1, 2, \dots, n$, where the $x_i$ are observations with measurements on a set of $p$ features and the $g_i$ are class labels. We assume that the $n_k$ observations within the $k$th class are identically distributed as $N(\mu_k, \Sigma_k)$, and that the $n = \sum_k n_k$ observations are independent. We let $\pi_k$ denote the prior for class $k$. Then the quadratic discriminant function is

\[ \delta_k(x) = \frac{1}{2} \log\det \Sigma_k^{-1} - \frac{1}{2} (x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k) + \log \pi_k. \tag{1} \]

Let $\Theta^{(k)} = \Sigma_k^{-1}$ denote the true precision matrix for class $k$, and let $\theta_{ij} = (\theta_{ij}^{(1)}, \dots, \theta_{ij}^{(K)})$ denote the vector of the $(i, j)$th element across all $K$ precision matrices. We propose to estimate $\mu$, $\pi$, $\Theta$ by maximizing the penalized log-likelihood

\[ \sum_{k=1}^{K} \Big[ \frac{n_k}{2} \log\det \Theta^{(k)} - \frac{1}{2} \sum_{g_i = k} (x_i - \mu_k)^T \Theta^{(k)} (x_i - \mu_k) + n_k \log \pi_k \Big] - \frac{\lambda}{2} \sum_{i \neq j} \|\theta_{ij}\|_2. \tag{2} \]

The last term is a group lasso penalty, applied to the $(i, j)$th element across all $K$ precision matrices, which forces a similar pattern of sparsity across all the precision matrices. Hence, we refer to this classification method as Sparse Quadratic Discriminant Analysis (SQDA).

Let $\Theta = (\Theta^{(1)}, \dots, \Theta^{(K)})$. Solving (2) gives the estimates

\[ \hat\mu_k = \frac{1}{n_k} \sum_{g_i = k} x_i, \tag{3} \]

\[ \hat\pi_k = \frac{n_k}{n}, \tag{4} \]

\[ \hat\Theta = \arg\max_{\Theta^{(k)} \succ 0} \; \sum_{k=1}^{K} n_k \big( \log\det \Theta^{(k)} - \mathrm{tr}(S^{(k)} \Theta^{(k)}) \big) - \lambda \sum_{i \neq j} \|\theta_{ij}\|_2, \tag{5} \]

where $S^{(k)} = \frac{1}{n_k} \sum_{g_i = k} (x_i - \hat\mu_k)(x_i - \hat\mu_k)^T$ is the sample covariance matrix for class $k$.

This model has several important features:

1. $\lambda$ is a nonnegative tuning parameter controlling the simultaneous shrinkage of all precision matrices toward diagonal matrices. The value $\lambda = 0$ gives rise to QDA, whereas $\lambda = \infty$ yields the naive Bayes classifier. Values between these limits represent degrees of regularization less severe than naive Bayes. Since even small amounts of regularization can often largely eliminate quite drastic instability, finite values of $\lambda$ have the potential for superior performance when some features have strong partial correlations.

2. The group lasso penalty forces $\hat\Theta^{(1)}, \dots, \hat\Theta^{(K)}$ to share the locations of the nonzero elements, leading to a sparse estimate of interactions. Therefore, our method produces more interpretable results.

3. The optimization problem (5) can be separated into independent subproblems of the same form by simple screening rules on $S^{(1)}, \dots, S^{(K)}$. This leads to a potentially massive reduction in computational complexity.
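As an illustration of how the plug-in quantities fit together, the following minimal sketch computes the estimates (3) and (4), the per-class sample covariances $S^{(k)}$, and the discriminant scores (1) written in terms of precision matrices. It does not solve the group lasso problem (5); the precision matrices are taken as given, for instance from a separate solver. The function names and the convention that labels are coded 0, ..., K-1 are assumptions made for this example.

```python
import numpy as np


def class_estimates(X, g, K):
    """Plug-in estimates (3)-(4) and per-class sample covariances S^(k).

    Assumes labels in g are coded 0..K-1 (an assumption of this sketch).
    """
    n = X.shape[0]
    mus, pis, covs = [], [], []
    for k in range(K):
        Xk = X[g == k]
        mu = Xk.mean(axis=0)
        R = Xk - mu
        mus.append(mu)
        pis.append(Xk.shape[0] / n)
        covs.append(R.T @ R / Xk.shape[0])  # divisor n_k, as in the text
    return np.array(mus), np.array(pis), np.array(covs)


def discriminant_scores(X, mus, pis, thetas):
    """Evaluate delta_k(x) from (1) for every row of X and every class k.

    thetas[k] is a precision matrix for class k (true or estimated).
    """
    scores = np.empty((X.shape[0], len(pis)))
    for k, (mu, pi, theta) in enumerate(zip(mus, pis, thetas)):
        _, logdet = np.linalg.slogdet(theta)  # log det Theta = log det Sigma^{-1}
        R = X - mu
        quad = np.einsum("ij,jk,ik->i", R, theta, R)  # (x - mu)^T Theta (x - mu)
        scores[:, k] = 0.5 * logdet - 0.5 * quad + np.log(pi)
    return scores
```

A predicted label is then `discriminant_scores(...).argmax(axis=1)`. Passing the inverses of the sample covariances corresponds to QDA ($\lambda = 0$), and passing the inverses of their diagonals corresponds to the naive Bayes limit.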
The convex optimization problem (5) can be solved quickly with the R package JGL [5]. Although not for the purpose of classification, Danaher et al. [5] propose to jointly estimate multiple related Gaussian graphical models by maximizing the group graphical lasso objective

\[ \max_{\Theta^{(k)} \succ 0} \; \sum_{k=1}^{K} n_k \big( \log\det \Theta^{(k)} - \mathrm{tr}(S^{(k)} \Theta^{(k)}) \big) - \lambda_1 \sum_{i \neq j} \|\theta_{ij}\|_1 - \lambda_2 \sum_{i \neq j} \|\theta_{ij}\|_2. \tag{6} \]

They use the additional lasso penalty to further encourage sparsity within $(\theta_{ij}^{(1)}, \dots, \theta_{ij}^{(K)})$. However, for the purpose of classification, our goal is to identify interaction terms; that is, we are interested in whether $(\theta_{ij}^{(1)}, \dots, \theta_{ij}^{(K)})$ is a zero vector rather than whether each element $\theta_{ij}^{(k)}$ is zero. Therefore, we use only the group lasso penalty to estimate the precision matrices.

We illustrate our method with a handwritten digit recognition example. Here we focus on the sub-task of distinguishing handwritten 3s and 8s. We use the same data as Le Cun et al. [12], who normalized binary images for size and orientation, resulting in 8-bit, 16 × 16 gray scale images. There are 658 threes and 542 eights in our training set, and 166 test samples for each. Figure 1 shows a random selection of 3s and 8s.

Figure 1: Examples of digitized handwritten 3s and 8s. Each image is an 8-bit, 16 × 16 gray scale version of the original binary image.

Because of their spatial arrangement the features are highly correlated, and some kind of smoothing or filtering always helps. We filtered the data by replacing each non-overlapping 2 × 2 pixel block with its average. This reduces the dimension of the feature space from 256 to 64.

We compare our method with QDA, naive Bayes and a variant of RDA which we refer to as diagonal regularized discriminant analysis (DRDA). The estimated covariance matrix of DRDA for class $k$ is

\[ \hat\Sigma^{(k)}(\lambda) = (1 - \lambda) S^{(k)} + \lambda \, \mathrm{diag}(S^{(k)}), \quad \text{with } \lambda \in [0, 1]. \tag{7} \]

The tuning parameters are selected by 5-fold cross validation.
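The DRDA estimate (7) is a convex combination of a sample covariance matrix and its diagonal, which the following minimal sketch makes concrete. The function name `drda_covariance` is hypothetical and chosen only for this illustration.

```python
import numpy as np


def drda_covariance(S, lam):
    """Shrink a sample covariance S toward its diagonal, as in (7).

    lam = 0 returns S itself (QDA); lam = 1 returns diag(S) (naive Bayes).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    S = np.asarray(S, dtype=float)
    return (1.0 - lam) * S + lam * np.diag(np.diag(S))
```

The diagonal is unchanged for every value of `lam`, while each off-diagonal entry is multiplied by `1 - lam`, so the estimate is dense for any `lam < 1`.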
Table 1 shows the misclassification errors of each method and the corresponding standardized tuning parameters $s = P(\hat\Theta(\lambda)) / P(\hat\Theta(0))$, where $P(\Theta) = \sum_{i \neq j} \|(\theta_{ij}^{(1)}, \dots, \theta_{ij}^{(K)})\|_2$. The results show that SQDA outperforms DRDA, QDA and naive Bayes. In Figure 2, we see that the test error of SQDA decreases dramatically as the model deviates from naive Bayes and achieves its minimum at a small value of $s$, while the test error of DRDA keeps decreasing as the model ranges from naive Bayes to QDA. This is expected, since at small values of $s$ SQDA includes only the important interaction terms and shrinks other noisy terms to 0, whereas DRDA has all interactions in the model once it deviates from naive Bayes.

To better display the estimated precision matrices, we standardize the estimates to have unit diagonal (the standardized precision matrix is equal to the partial correlation matrix up to the sign of the off-diagonal entries). Figure 3 displays the standardized sample precision matrices, and the standardized SQDA and DRDA estimates. From Figure 3, we can see that SQDA only includes interactions within a diagonal band.

Table 1: Digit classification results of 3s and 8s. Tuning parameters are selected by 5-fold cross validation. Columns: SQDA, DRDA, QDA, Naive Bayes. Rows: test error, training error, CV error, standardized tuning parameter.

Figure 2: The 5-fold cross-validation errors (blue) and the test errors (red) of SQDA and DRDA on 3s and 8s.

Figure 3: Heat maps of the sample precision matrices and the estimated precision matrices of SQDA and DRDA. Estimates are standardized to have unit diagonal. The first row corresponds to the precision matrix of 3s and the second row to that of 8s.

2.1 Exact covariance thresholding into connected components

Mazumder & Hastie [16] and Witten et al. [25] establish a connection between the graphical lasso and connected components. Specifically, the connected components in the estimated precision matrix correspond exactly to those obtained from thresholding the entries of the sample covariance matrix at $\lambda$. For the solution to (5), we have similar results. By simple thresholding rules on the sample covariance matrices $S^{(1)}, \dots, S^{(K)}$, the optimization problem (5) can be separated into several optimization sub-problems of the same form, which leads to huge speed improvements. A more general result for the solution to (6) can be found in [5].

Suppose $\hat\Theta = (\hat\Theta^{(1)}, \dots, \hat\Theta^{(K)})$ is the solution to (5) with regularization parameter $\lambda$. Define

\[ \hat E_{ij}^{(\lambda)} = \begin{cases} 1 & \text{if } (\hat\theta_{ij}^{(1)}, \dots, \hat\theta_{ij}^{(K)}) \neq 0,\ i \neq j; \\ 0 & \text{otherwise.} \end{cases} \tag{8} \]

This defines a symmetric graph $\hat G^{(\lambda)} = (V, \hat E^{(\lambda)})$, namely the estimated concentration graph defined on the nodes $V = \{1, \dots, p\}$ with edges $\hat E^{(\lambda)}$. Suppose it admits a decomposition into $\kappa(\lambda)$ connected components

\[ \hat G^{(\lambda)} = \bigcup_{\ell=1}^{\kappa(\lambda)} \hat G_\ell^{(\lambda)}, \tag{9} \]

where $\hat G_\ell^{(\lambda)} = (\hat V_\ell^{(\lambda)}, \hat E_\ell^{(\lambda)})$ are the components of the graph $\hat G^{(\lambda)}$.

Define $\tilde S_{ij} = \|(n_1 S_{ij}^{(1)}, \dots, n_K S_{ij}^{(K)})\|_2$. We can also perform a thresholding on the entries of $\tilde S$ and obtain a graph edge skeleton defined by

\[ E_{ij}^{(\lambda)} = \begin{cases} 1 & \text{if } \tilde S_{ij} > \lambda,\ i \neq j; \\ 0 & \text{otherwise.} \end{cases} \tag{10} \]

The symmetric matrix $E^{(\lambda)}$ defines a symmetric graph $G^{(\lambda)} = (V, E^{(\lambda)})$, which is referred to as the thresholded sample covariance graph. $G^{(\lambda)}$ also admits a decomposition into connected components

\[ G^{(\lambda)} = \bigcup_{\ell=1}^{k(\lambda)} G_\ell^{(\lambda)}, \tag{11} \]

where $G_\ell^{(\lambda)} = (V_\ell^{(\lambda)}, E_\ell^{(\lambda)})$ are the components of the graph $G^{(\lambda)}$.

Theorem 1. For any $\lambda > 0$, the components of the estimated concentration graph $\hat G^{(\lambda)}$ induce exactly the same vertex-partition as that of the thresholded sample covariance graph $G^{(\lambda)}$. Formally, $\kappa(\lambda) = k(\lambda)$ and there exists a permutation $\pi$ on $\{1, \dots, k(\lambda)\}$ such that

\[ \hat V_i^{(\lambda)} = V_{\pi(i)}^{(\lambda)}, \quad i = 1, \dots, k(\lambda). \tag{12} \]

Furthermore, given $\lambda > \lambda' > 0$, the vertex-partition induced by the components of $\hat G^{(\lambda)}$ is nested within that induced by the components of $\hat G^{(\lambda')}$. That is, $\kappa(\lambda) \geq \kappa(\lambda')$ and the vertex-partition $\{\hat V_\ell^{(\lambda)}\}_{1 \leq \ell \leq \kappa(\lambda)}$ forms a finer resolution of $\{\hat V_\ell^{(\lambda')}\}_{1 \leq \ell \leq \kappa(\lambda')}$.

Proof. The proof of this theorem appears in the Appendix.

Theorem 1 allows us to quickly check the connected components of the estimated concentration graph by simple screening rules on $\tilde S$. Notice that for each class $k$, the edge set $\hat E^{(k,\lambda)}$ defined by $\hat E_{ij}^{(k,\lambda)} = 1\{\hat\theta_{ij}^{(k)} \neq 0\}$ is nested in $\hat E^{(\lambda)}$. Therefore, the features can be reordered in such a way that each $\hat\Theta^{(k)}$ is block diagonal,

\[ \hat\Theta^{(k)} = \begin{pmatrix} \hat\Theta_1^{(k)} & & & \\ & \hat\Theta_2^{(k)} & & \\ & & \ddots & \\ & & & \hat\Theta_{k(\lambda)}^{(k)} \end{pmatrix}, \tag{13} \]

where the different components represent blocks of indices given by $V_\ell^{(\lambda)}$, $\ell = 1, \dots, k(\lambda)$. Then one can simply solve the optimization problem (5) on the features within each block separately, making problem (5) feasible for certain values of $\lambda$ even though it may be impossible to operate on the $p \times p$ variables $\Theta^{(1)}, \dots, \Theta^{(K)}$ on a single machine.
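The screening rule of Theorem 1 can be sketched as follows: form $\tilde S$, threshold it at $\lambda$ as in (10), and read off the connected components of the resulting graph. This is a minimal illustration rather than the authors' implementation; the function name `screening_components` and the array layout (a stack of $K$ covariance matrices) are assumptions made for this example.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def screening_components(covs, ns, lam):
    """Vertex-partition of the thresholded sample covariance graph (10).

    covs: array of shape (K, p, p) holding S^(1), ..., S^(K)
    ns:   array of shape (K,) holding the class sizes n_1, ..., n_K
    lam:  threshold, playing the role of lambda in (5)
    Returns (number of components, component label of each feature).
    """
    covs = np.asarray(covs, dtype=float)
    ns = np.asarray(ns, dtype=float)
    # S_tilde[i, j] = || (n_1 S^(1)_ij, ..., n_K S^(K)_ij) ||_2
    s_tilde = np.sqrt(np.einsum("k,kij->ij", ns ** 2, covs ** 2))
    adjacency = s_tilde > lam
    np.fill_diagonal(adjacency, False)  # edges are defined only for i != j
    graph = csr_matrix(adjacency.astype(np.int8))
    return connected_components(graph, directed=False)
```

Each returned component can then be passed to a solver for (5) on its own, which is where the computational savings described above come from.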
3 Community Bayes

The estimated precision matrices of SQDA are often block diagonal under a suitable ordering of the features, which implies that the features can be partitioned into several communities that are mutually independent within each class. In this section, we generalize this idea to non-Gaussian data and other classification models, and refer to it as the community Bayes model.

A related work in the regression setting is [10]. The authors propose to split the lasso problem into smaller ones by estimating the connected components of the sample covariance matrix. Their approach involves only one covariance matrix and works specifically for the lasso.

3.1 Main idea

We let $X \in \mathbb{R}^p$ denote the feature vector and $G \in \{1, \dots, K\}$ denote the class variable. Suppose the feature set $V = \{1, \dots, p\}$ admits a partition $V = \bigcup_{\ell=1}^{L} V_\ell$ such that $X_{V_1}, \dots, X_{V_L}$ are mutually independent conditional on $G = k$ for $k = 1, \dots, K$, where $X_{V_\ell}$ is the subset of $X$ containing the features in community $V_\ell$. Then the log posterior probability has the form

\[ \log p(G = k \mid X) = \log p(X \mid G = k) + \log p(G = k) - \log p(X) \tag{14} \]

\[ = \sum_{\ell=1}^{L} \log p(X_{V_\ell} \mid G = k) + \log p(G = k) - \log p(X) \tag{15} \]

\[ = \sum_{\ell=1}^{L} \log p(G = k \mid X_{V_\ell}) + (1 - L) \log p(G = k) + C(X), \tag{16} \]

where $C(X) = \sum_{\ell=1}^{L} \log p(X_{V_\ell}) - \log p(X)$ depends only on $X$ and serves as a normalization term. Equation (16) implies an interesting result: we can fit the classification model on each community separately, and combine the resulting posteriors into the posterior for the original problem by simply adjusting the intercept and normalizing. This result has three important consequences:

1. The global problem completely separates into $L$ smaller tractable sub-problems of the same form, making it possible to solve an otherwise infeasible large-scale problem. Moreover, the modular structure lends itself naturally to parallel computation. That is, one can solve these sub-problems independently on separate machines.

2. This idea is quite general and can be applied to any likelihood-based classifier, including discriminant analysis, multinomial logistic regression, generalized additive models, classification trees, etc.

3. When using a classification model with interaction terms, (16) does not involve interactions across different communities. Therefore, it has fewer degrees of freedom and thus smaller variance than the global problem.
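The combination step in (16) can be sketched as follows: sum the community-wise log posteriors, add $(1 - L)$ times the log prior, and normalize over classes, which absorbs the term $C(X)$. This is a minimal sketch; the function name `combine_community_posteriors` and the array shapes are assumptions made for this example, and the community-wise posteriors may come from any probabilistic classifier.

```python
import numpy as np
from scipy.special import logsumexp


def combine_community_posteriors(log_posts, log_prior):
    """Combine community-wise log posteriors into a global one via (16).

    log_posts: list of L arrays of shape (n, K); entry [i, k] of the l-th
               array is log p(G = k | X_{V_l}) for observation i
    log_prior: array of shape (K,) with log p(G = k)
    Returns an (n, K) array of normalized log posteriors log p(G = k | X).
    """
    L = len(log_posts)
    unnormalized = sum(log_posts) + (1 - L) * np.asarray(log_prior)
    # Normalizing over k plays the role of the additive term C(X).
    return unnormalized - logsumexp(unnormalized, axis=1, keepdims=True)
```

With $L = 1$ the prior adjustment vanishes and the function simply renormalizes its single input, which matches the remark that the procedure reduces to the original classifier when there is one community.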
To find conditionally independent communities, we use Spearman's rho, a robust nonparametric rank-based statistic, to directly estimate the unknown correlation matrices, and then apply average, single or complete linkage clustering to the estimated correlation matrices to obtain the estimated communities. In the next section, we show that this procedure consistently identifies conditionally independent communities when the data are from a nonparanormal family. The community Bayes algorithm is summarized below.

Algorithm 1 The community Bayes algorithm

1: Compute $\hat R^{(k)}$, the estimated correlation matrix based on Spearman's rho, for each class $k$.
2: Perform average, single or complete linkage clustering with similarity matrix $\tilde R$, where $\tilde R_{ij} = \|(n_1 \hat R_{ij}^{(1)}, \dots, n_K \hat R_{ij}^{(K)})\|_2$, and cut the dendrogram at level $\tau$ to produce the vertex-partition $V = \bigcup_{\ell=1}^{L} \hat V_\ell$.
3: For each community $\ell = 1, \dots, L$, estimate $\log p(G = k \mid X_{\hat V_\ell})$ using a classification method of choice.
4: Pick $\tau$ and any other tuning parameters by cross-validation.

3.2 Community estimation

In this section we address the key part of the community Bayes model: how to find conditionally independent communities. We propose to derive the communities by applying average, single or complete linkage clustering to the correlation matrices, which are estimated using nonparametric rank-based statistics. In the case where the data are from a nonparanormal family, we prove that, given knowledge of $L$, this procedure consistently identifies conditionally independent communities.

We assume that $X = (X_1, \dots, X_p)^T \mid G = k$ follows a nonparanormal distribution $NPN(\mu^{(k)}, \Sigma^{(k)}, f^{(k)})$ [15] and that $\Sigma^{(k)}$ is nonsingular. That is, there exists a set of univariate strictly increasing transformations $f^{(k)} = \{f_j^{(k)}\}_{j=1}^{p}$ such that

\[ Z^{(k)} := f^{(k)}(X) \mid G = k \sim N(\mu^{(k)}, \Sigma^{(k)}), \tag{17} \]

where $f^{(k)}(X) = (f_1^{(k)}(X_1), \dots, f_p^{(k)}(X_p))^T$. Notice that the transformation functions $f^{(k)}$ can be different for different classes.
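Steps 1 and 2 of Algorithm 1 can be sketched as follows. The correlation estimate uses the transform $2\sin(\pi\hat\rho/6)$ of Spearman's rho given in (20), with rho computed as the Pearson correlation of column ranks. Because SciPy's hierarchical clustering works on dissimilarities, the sketch converts the similarity $\tilde R$ to the dissimilarity $c - \tilde R$ with $c = \max \tilde R$, and cuts the tree at $c - \tau$; this conversion is a choice made for this sketch (it leaves single, average and complete linkage merges unchanged), and the function names and 0, ..., K-1 label coding are likewise assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import rankdata


def skeptic_correlation(Xk):
    """Rank-based correlation estimate 2 * sin(pi / 6 * rho) for one class."""
    ranks = rankdata(Xk, axis=0)
    rho = np.corrcoef(ranks, rowvar=False)  # Spearman's rho between columns
    R = 2.0 * np.sin(np.pi / 6.0 * rho)
    np.fill_diagonal(R, 1.0)
    return R


def estimate_communities(X, g, K, tau, method="average"):
    """Cluster features using the similarity R_tilde and cut at level tau.

    Assumes labels in g are coded 0..K-1. Returns one community label
    (starting at 1) per feature.
    """
    p = X.shape[1]
    squared = np.zeros((p, p))
    for k in range(K):
        Xk = X[g == k]
        squared += (Xk.shape[0] * skeptic_correlation(Xk)) ** 2
    similarity = np.sqrt(squared)  # R_tilde_ij = ||(n_k R_hat^(k)_ij)_k||_2
    c = similarity.max()
    distance = c - similarity
    np.fill_diagonal(distance, 0.0)
    tree = linkage(squareform(distance, checks=False), method=method)
    return fcluster(tree, t=c - tau, criterion="distance")
```

Passing `method="single"` or `method="complete"` gives the other two linkage rules named in Algorithm 1; in practice `tau` would be chosen by cross-validation as in step 4.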
To make the model identifiable, $f^{(k)}$ preserves the population means and standard deviations:

\[ E(X_j \mid G = k) = E(f_j^{(k)}(X_j) \mid G = k) = \mu_j^{(k)}, \qquad \mathrm{Var}(X_j \mid G = k) = \mathrm{Var}(f_j^{(k)}(X_j) \mid G = k) = \big(\sigma_j^{(k)}\big)^2. \]

Liu et al. [15] prove that the nonparanormal distribution is a Gaussian copula when the transformation functions are monotone and differentiable. Let $R^{(k)}$ denote the correlation matrix of the Gaussian distribution $Z^{(k)}$, and define

\[ E_{ij} = \begin{cases} 1 & \text{if } (R_{ij}^{(1)}, \dots, R_{ij}^{(K)}) \neq 0,\ i \neq j; \\ 0 & \text{otherwise.} \end{cases} \tag{18} \]
Suppose the graph $G = (V, E)$ admits a decomposition into $L$ connected components $G = \bigcup_{\ell=1}^{L} G_\ell$. Then the vertex-partition $V = \bigcup_{\ell=1}^{L} V_\ell$ induced by this decomposition gives exactly the conditionally independent communities we need. This is because

\[ G_\ell \text{ and } G_{\ell'} \text{ are disconnected} \iff Z_{V_\ell}^{(k)} \perp Z_{V_{\ell'}}^{(k)},\ \forall k \iff X_{V_\ell} \perp X_{V_{\ell'}} \mid G = k,\ \forall k. \tag{19} \]

Let $(x_i, g_i) \in \mathbb{R}^p \times \{1, \dots, K\}$, $i = 1, \dots, n$, be the training data, where $x_i = (x_{i1}, \dots, x_{ip})^T$. We estimate the correlation matrices using the Nonparanormal SKEPTIC [14], which exploits Spearman's rho and Kendall's tau to directly estimate the unknown correlation matrices. Since the estimated correlation matrices based on Spearman's rho and Kendall's tau have similar theoretical performance, we adopt only the ones based on Spearman's rho here. Specifically, let $\hat\rho_{ij}^{(k)}$ be Spearman's rho between features $i$ and $j$ based on the $n_k$ samples in class $k$, i.e., $\{x_i \mid g_i = k,\ i = 1, \dots, n\}$. Then the estimated correlation matrix for class $k$ is

\[ \hat R_{ij}^{(k)} = \begin{cases} 2 \sin\big(\tfrac{\pi}{6} \hat\rho_{ij}^{(k)}\big) & i \neq j; \\ 1 & i = j. \end{cases} \tag{20} \]

Notice that the graph defined by $R^{(k)}$ has the same vertex-partition as that defined by $(R^{(k)})^{-1}$. Inspired by the exact thresholding result of SQDA, we define $\tilde R_{ij} = \|(n_1 \hat R_{ij}^{(1)}, \dots, n_K \hat R_{ij}^{(K)})\|_2$ and perform exact thresholding on the entries of $\tilde R$ at a certain level $\tau$, where $\tau$ is estimated by cross-validation. The resulting vertex-partition yields an estimate of the conditionally independent communities.

Furthermore, there is an interesting connection to hierarchical clustering. Specifically, the vertex-partition induced by thresholding the matrix $\tilde R$ corresponds to the subtrees obtained when we apply single linkage agglomerative clustering to $\tilde R$ and then cut the dendrogram at level $\tau$ [22]. Single linkage clustering tends to produce trailing clusters in which individual features are merged one at a time. However, Theorem 2 shows that, given knowledge of the true number of communities $L$, application of single, average or complete linkage agglomerative clustering on $\tilde R$ consistently estimates the vertex-partition of $G$.

Theorem 2. Assume that $G$ has $L$ connected components and $\min_k n_k \geq \frac{21}{\log p} + 2$. Define $R_{ij}^0 = \|(n_1 R_{ij}^{(1)}, \dots, n_K R_{ij}^{(K)})\|_2$ and let

\[ \min_{i, j \in V_\ell;\ \ell = 1, \dots, L} R_{ij}^0 \geq 16 \pi K \sqrt{n_k \log p}, \quad \text{for } k = 1, \dots, K. \tag{21} \]

Then the estimated vertex-partition $V = \bigcup_{\ell=1}^{L} \hat V_\ell$ resulting from performing single (SLC), average (ALC) or complete (CLC) linkage clustering with similarity matrix $\tilde R$ satisfies

\[ P(\exists \ell : \hat V_\ell \neq V_\ell) \leq \frac{K}{p^2}. \]

Proof. The proof of this theorem appears in the Appendix.

A sufficient condition for (21) is

\[ \min_{i, j \in V_\ell;\ \ell = 1, \dots, L} \|(R_{ij}^{(1)}, \dots, R_{ij}^{(K)})\|_2 \geq 16 \pi K \sqrt{\frac{\log p}{n_{\max}}} \cdot \frac{n_{\max}}{n_{\min}}, \tag{22} \]

where $n_{\min} = \min_k n_k$ and $n_{\max} = \max_k n_k$. Therefore, Theorem 2 establishes the consistency of identification of conditionally independent communities by hierarchical clustering using SLC, ALC or CLC, provided that $n_{\max} = \Omega(\log p)$ and $n_{\max}/n_{\min} = O(1)$ as $n_k, p \to \infty$, and provided that no within-community element of $R^{(k)}$ is too small in absolute value for all $k$.

4 Examples

4.1 Sparse quadratic discriminant analysis

In this section we study the performance of SQDA in several simulated examples and a real data example. The results show that SQDA achieves a lower misclassification error and better interpretability compared to naive Bayes, QDA and a variant of RDA. Since RDA as proposed by Friedman [7] has two tuning parameters, which would make the comparison unfair, we use a version of RDA that shrinks $\hat\Sigma^{(k)}$ towards its diagonal:

\[ \hat\Sigma^{(k)}(\lambda) = (1 - \lambda) S^{(k)} + \lambda \, \mathrm{diag}(S^{(k)}), \quad \text{with } \lambda \in [0, 1]. \tag{23} \]

Note that $\lambda = 0$ corresponds to QDA and $\lambda = 1$ corresponds to the naive Bayes classifier. For any $\lambda < 1$, the estimated covariance matrices are dense. We refer to this version of RDA as diagonal regularized discriminant analysis (DRDA) in the rest of this paper.

We report the test misclassification errors and the corresponding standardized tuning parameters of these four classifiers. Let $P(\Theta) = \sum_{i \neq j} \|(\theta_{ij}^{(1)}, \dots, \theta_{ij}^{(K)})\|_2$. The standardized tuning parameter is defined as $s = P(\hat\Theta(\lambda)) / P(\hat\Theta(0))$, with $s = 0$ corresponding to the naive Bayes classifier and $s = 1$ corresponding to QDA.

4.1.1 Simulated examples

The data generated in each experiment consist of a training set, a validation set to tune the parameters, and a test set to evaluate the performance of our chosen model. Following the notation of [28], we denote by ·/·/· the number of observations in the training, validation and test sets respectively. For every data set, the tuning parameter minimizing the validation misclassification error is chosen to compute the test misclassification error. Each experiment was replicated 50 times.
In all cases the population class conditional distributions were normal, the number of classes was $K = 2$, and the prior probability of each class was taken to be equal. The mean for class $k$ was taken to have Euclidean norm $\mathrm{tr}(\Sigma^{(k)})/p$ and the two means were orthogonal to each other. We consider three different models for the precision matrices, and in all cases the $K$ precision matrices are from the same model:

Model 1. Full model: $\theta_{ij}^{(k)} = 1$ if $i = j$ and $\theta_{ij}^{(k)} = \rho_k$ otherwise.

Model 2. Decreasing model: $\theta_{ij}^{(k)} = \rho_k^{|i - j|}$.

Model 3. Block diagonal model with block size $q$: $\theta_{ij}^{(k)} = 1$ if $i = j$, $\theta_{ij}^{(k)} = \rho_k$ if $i \neq j$ and $i, j \leq q$, and $\theta_{ij}^{(k)} = 0$ otherwise, where $0 < q < p$.

Tables 2 and 3, summarizing the results for each situation, present the average test misclassification errors over the 50 replications. The quantities in parentheses are the standard deviations of the respective quantities over the 50 replications.

Example 1: Dense interaction terms

We study the performance of SQDA on Model 1 and Model 2 with $p = 8$, $\rho_1 = 0$ and $\rho_2 = 0.8$. We generated 50/50/10000 observations for each class. Table 2 summarizes the results. In this example, both situations have full interaction terms and should favor DRDA. The results show that SQDA and DRDA have similar performance, and both give lower misclassification errors and smaller standard deviations than QDA or naive Bayes. The model-selection procedure behaves quite reasonably, choosing large values of the standardized tuning parameter $s$ for both SQDA and DRDA.

Table 2: Misclassification errors and selected tuning parameters for simulated example 1. The values are averages over 50 replications, with the standard errors in parentheses.

             Misclassification error                                       Average standardized tuning parameter
Model        SQDA          DRDA          QDA           Naive Bayes    SQDA          DRDA
Full         0.129(0.021)  0.129(0.020)  0.238(0.028)  0.179(0.026)   0.828(0.258)  0.835(0.302)
Decreasing   0.103(0.011)  0.104(0.013)  0.170(0.031)  0.153(0.032)   0.817(0.237)  0.781(0.314)

Example 2: Sparse interaction terms

We study the performance of SQDA on Model 3 with $\rho_1 = 0$, $\rho_2 = 0.8$ and $q = 4$. We performed experiments for $(p, n) = (8, 50), (20, 200), (40, 800), (100, 1500)$, and for each $p$ we generated $n/n/10000$ observations for each class. Table 3 summarizes the results. In this example, all situations have sparse interaction terms and should favor SQDA. Moreover, the sparsity level increases as $p$ increases. As conjectured, SQDA strongly dominates, with lower misclassification errors at all dimensionalities. The standardized tuning parameter values for DRDA are uniformly larger than those for SQDA. This is expected because, in order to capture the same amount of interaction, DRDA needs to include more noise than SQDA.

Table 3: Misclassification errors and selected tuning parameters for simulated example 2. The values are averages over 50 replications, with the standard errors in parentheses.

                    Misclassification error                                       Average standardized tuning parameter
Setting             SQDA          DRDA          QDA           Naive Bayes    SQDA          DRDA
p = 8, n = 50       (0.010)       0.088(0.013)  0.092(0.011)  0.107(0.014)   0.529(0.309)  0.688(0.314)
p = 20, n = 200     (0.007)       0.109(0.009)  0.114(0.006)  0.121(0.004)   0.190(0.046)  0.293(0.345)
p = 40, n = 800     (0.006)       0.108(0.007)  0.127(0.002)  0.114(0.004)   0.109(0.015)  0.194(0.230)
p = 100, n = 1500   (0.007)       0.188(0.009)  0.230(0.025)  0.215(0.035)   0.024(0.004)  0.073(0.059)

4.1.2 Real data example: vowel recognition data

This data set consists of training and test data with 10 predictors and 11 classes. We obtained the data from the benchmark collection maintained by Scott Fahlman at Carnegie Mellon University. The data was contributed by Anthony Robinson [20]. The classes correspond to 11 vowel sounds, each contained in 11 different words. Here are the words, preceded by the symbols that represent them:

Vowel  Word     Vowel  Word     Vowel  Word
i      heed     a:     hard     U      hood
I      hid      Y      hud      u:     who'd
E      head     O      hod      3:     heard
A      had      C:     hoard

Each word was uttered once by each of the 15 speakers. Four male and four female speakers were used to train the models, and the other four male and three female speakers were used for testing the performance.

This paragraph is technical and describes how the analog speech signals were transformed into a 10-dimensional feature vector. The speech signals were low-pass filtered at 4.7 kHz and then digitized to 12 bits with a 10 kHz sampling rate. Twelfth-order linear predictive analysis was carried out on six 512-sample Hamming windowed segments from the steady part of the vowel. The reflection coefficients were used to calculate 10 log-area parameters, giving a 10-dimensional input space.
Each speaker thus yielded six frames of speech from 11 vowels. This gave 528 frames from the eight speakers used to train the models and 462 frames from the seven speakers used to test the models.
We apply our method to a subset of the data, consisting of the four vowels "Y", "O", "U" and "u:". These four vowels are selected because their sample precision matrices are approximately sparse, and SQDA usually performs well on such data. Figure 4 displays the standardized sample precision matrices, and the standardized SQDA and DRDA estimates. In addition to DRDA, QDA and naive Bayes, we also compare our method with three other classifiers, namely the Support Vector Machine (SVM), k-nearest neighbors (kNN) and Random Forest (RF), in terms of misclassification rate. The SVM, kNN and Random Forest were implemented with the R packages "e1071", "class" and "randomForest" respectively, with default settings. The results of these classification procedures are shown in Table 4.

Figure 4: Heat maps of the sample precision matrices and the estimated precision matrices of SQDA and DRDA on the vowel data. Estimates are standardized to have unit diagonal.
Table 4: Vowel speech classification results. Tuning parameters are selected by 5-fold cross validation. Columns: SQDA, DRDA, QDA, Naive Bayes, SVM, kNN, Random Forest. Rows: test error, CV error.

SQDA performs significantly better than the other six classifiers. Compared with the runner-up, DRDA, SQDA has a relative gain of (19.7% − 17.2%)/19.7% = 12.7%.

4.2 Community Bayes

In this section we study the performance of the community Bayes model using logistic regression as an example. We refer to the classifier that combines the community Bayes algorithm with logistic regression as community logistic regression. We compare the performance of logistic regression (LR) and community logistic regression (CLR) in several simulated examples and a real data example, and the results show that CLR has better accuracy and smaller variance in predictions.

4.2.1 Simulated examples

The data generated in each experiment consist of a training set, a validation set to tune the parameters, and a test set to evaluate the performance of our chosen model. Each experiment was replicated 50 times. In all cases the distributions of $Z^{(k)}$ were normal, the number of classes was $K = 3$, and the prior probability of each class was taken to be equal. The mean for class $k$ was taken to have Euclidean norm $\mathrm{tr}(\Sigma^{(k)})/(p/2)$ and the three means were orthogonal to each other. We consider three models for the covariance matrices, and in all cases the $K$ covariance matrices are the same:

Model 1. Full model: $\Sigma_{ij} = 1$ if $i = j$ and $\Sigma_{ij} = \rho$ otherwise.

Model 2. Decreasing model: $\Sigma_{ij} = \rho^{|i - j|}$.

Model 3. Block diagonal model with $q$ blocks: $\Sigma = \mathrm{diag}(\Sigma_1, \dots, \Sigma_q)$. Each $\Sigma_i$ is of size $\lfloor p/q \rfloor \times \lfloor p/q \rfloor$ or $(\lfloor p/q \rfloor + 1) \times (\lfloor p/q \rfloor + 1)$ and is from Model 1.

For simplicity, the transformation functions for all dimensions were the same: $f_1^{(k)} = \dots = f_p^{(k)} = f^{(k)}$. Define $g^{(k)} = (f^{(k)})^{-1}$. In addition to the identity transformation, two different transformations $g^{(k)}$ were employed as in [15]: the symmetric power transformation and the Gaussian CDF transformation. See [15] for the definitions.
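The three covariance models above can be generated with a few lines of NumPy, as in the sketch below. The function names are hypothetical, and the block sizes in Model 3 are produced with `np.array_split`, which yields blocks whose sizes differ by at most one; this is one reading of the size rule stated for Model 3 and is an assumption of this example.

```python
import numpy as np
from scipy.linalg import block_diag


def full_cov(p, rho):
    """Model 1: ones on the diagonal, rho everywhere else."""
    return (1.0 - rho) * np.eye(p) + rho * np.ones((p, p))


def decreasing_cov(p, rho):
    """Model 2: entry (i, j) equals rho ** |i - j|."""
    idx = np.arange(p)
    return float(rho) ** np.abs(idx[:, None] - idx[None, :])


def block_cov(p, q, rho):
    """Model 3: q diagonal blocks, each drawn from Model 1."""
    sizes = [len(part) for part in np.array_split(np.arange(p), q)]
    return block_diag(*[full_cov(m, rho) for m in sizes])
```

For the settings used in this section ($p = 16$ and $q = 2, 4, 8$) all blocks have equal size, so the choice of rounding rule does not matter there.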
We compare the performance of LR and CLR on the above modes with p = 16, ρ = 0.5 and q = 2, 4, 8. We generated 20/20/10000 observations for 15
16 each cass and use average inkage custering to estimate the communities. Two exampes were studied: Exampe 1: The transformations g (k) for a three casses are the identity transformation. The popuation cass conditiona distributions are norma with different means and common covariance matrix. The ogits are inear in this exampe. Exampe 2: The transformations g (k) for cass 1,2 and 3 are the identity transformation, the symmetric power transformation with α = 3, and the Gaussian CDF transformation with µ g0 = 0 and σ g0 = 1 respectivey. α, µ g0 and σ g0 are defined in [15]. The power and CDF transformations map a univariate norma distribution into a highy skewed and a bi-moda distribution respectivey. The ogits are noninear in this exampe. Tabe 5 summarizes the test miscassification errors of these two methods and the estimated numbers of communities. CLR performs we in a cases with ower test errors and smaer standard errors than LR, incuding the ones where the covariance matrices don t have a bock structure. The community Bayes agorithm introduces a more significant improvement when the cass conditiona distributions are not norma. Tabe 5: Miscassification errors and estimated numbers of communities for the two simuated exampes. The vaues are averages over 50 repications, with the standard errors in parentheses. 
Exampe 1 CLR test error LR test error Number of communities Exampe 2 CLR test error LR test error Number of communites Fu Decreasing Bock q = 2 q = 4 q = (0.041) 0.274(0.040) 5.220(4.292) 0.221(0.049) 0.255(0.061) 7.480(4.253) 0.221(0.032) 0.286(0.036) 6.040(4.262) 0.227(0.040) 0.282(0.062) 7.320(3.977) 0.227(0.033) 0.271(0.039) 4.700(3.882) 0.208(0.041) 0.263(0.062) 6.940(4.533) 0.203(0.035) 0.278(0.041) 5.320(3.395) 0.203(0.036) 0.266(0.063) 6.220(3.627) 0.262(0.030) 0.325(0.036) 7.780(3.935) 0.239(0.031) 0.309(0.057) 9.160(3.782) Rea data exampe: emai spam The data for this exampe consists of information from 4601 emai messages, in a study to screen emai for spam. The true outcome emai or spam is avaiabe, aong with the reative frequencies of 57 of the most commony occurring words and punctuation marks in the emai message. Since most of the spam predictors 16
have a very long-tailed distribution, we log-transformed each variable (actually log(x + 0.1)) before fitting the models. We compare CLR with LR and four other commonly used classifiers, namely SVM, kNN, Random Forest and Boosting. As in Section 4.1.2, SVM, kNN, Random Forest and Boosting were implemented with the R packages "e1071", "class", "randomForest" and "ada", respectively, with default settings. The number of communities for CLR was restricted to the range 1 to 20, and the optimal number of communities was selected by 5-fold cross-validation. We randomly chose 1000 samples from the data and split them into equally sized training and test sets. We repeated the random sampling and splitting 20 times. The mean misclassification error of each method is listed in Table 6, with standard errors in parentheses.

Table 6: Email spam classification results. The values are test misclassification errors averaged over 20 replications, with standard errors in parentheses.

CLR            LR             SVM            kNN            Random Forest  Boosting
0.068(0.019)   0.087(0.026)   0.070(0.011)   0.106(0.017)   0.071(0.010)   0.075(0.012)

The mean test error rate for LR is 8.7%. By comparison, CLR has a mean test error rate of 6.8%, a relative improvement of 21.8%. Figure 5 shows the test error and the cross-validation error of CLR over the range of the number of communities for one replication. CLR corresponds to LR when the number of communities is 1. The estimated number of communities in this replication is 6, and these communities are listed below. Most features in community 1 are negatively correlated with spam, while most features in community 2 are positively correlated.

Community 1: hp, hpl, george, lab, labs, telnet, technology, direct, original, pm, cs, re, edu, conference, 650, 857, 415, 85, 1999, ch;, ch(, ch[, CAPAVE, CAPMAX, CAPTOT.
Community 2: over, remove, internet, order, free, money, credit, business, email, mail, receive, will, people, report, address, addresses, make, all, our, you, your, 000, ch!, ch$.
Community 3: font, ch#.
Community 4: data, project.
Community 5: parts, meeting, table.
Community 6: 3d.
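To make the role of the communities concrete, the following is a minimal Python sketch of how per-community class posteriors could be combined. It assumes the communities are conditionally independent given the class, in which case Bayes' rule gives $P(G = k \mid x) \propto \pi_k^{1-L} \prod_{\ell=1}^{L} P(G = k \mid x_\ell)$ for $L$ communities and class priors $\pi_k$. The function name `combine_community_posteriors` and its arguments are illustrative choices, not taken from the authors' implementation; the per-community posteriors could come from any fitted likelihood-based classifier, such as logistic regression.

```python
import numpy as np


def combine_community_posteriors(posteriors, priors):
    """Combine per-community class posteriors into one posterior.

    posteriors: list of L arrays, each of shape (n_samples, K), holding
                P(G = k | x_l) from the classifier fitted on community l.
    priors:     array of shape (K,) with the class priors pi_k.

    Assumes the communities are conditionally independent given the class
    (hypothetical helper, written for illustration only).
    """
    priors = np.asarray(priors, dtype=float)
    n_communities = len(posteriors)

    # Work on the log scale to avoid underflow when L is large.
    log_post = (1 - n_communities) * np.log(priors)
    log_post = log_post + sum(
        np.log(np.clip(np.asarray(p, dtype=float), 1e-12, None))
        for p in posteriors
    )

    # Normalize each row so the class probabilities sum to one.
    log_post = log_post - log_post.max(axis=1, keepdims=True)
    combined = np.exp(log_post)
    return combined / combined.sum(axis=1, keepdims=True)


# Two communities, two classes, uniform priors, one test point.
p1 = np.array([[0.8, 0.2]])
p2 = np.array([[0.6, 0.4]])
combined = combine_community_posteriors([p1, p2], priors=[0.5, 0.5])
predicted_class = combined.argmax(axis=1)
```

With a single community the prior term vanishes and the function returns the classifier's own posterior, which mirrors the observation that CLR reduces to LR when the number of communities is 1.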
Figure 5: The 5-fold cross-validation errors (blue) and the test errors (red) of CLR on the spam data.

5 Conclusions

In this paper we have proposed sparse quadratic discriminant analysis, a classifier ranging between QDA and naive Bayes through a path of sparse graphical models. By allowing interaction terms into the model, this classifier relaxes the strict and often unreasonable assumption of naive Bayes. Moreover, the resulting estimates of interactions are sparse and easier to interpret than those of other existing classifiers that regularize QDA.

Motivated by the connection between the estimated precision matrices and single linkage clustering with the sample covariance matrices, we present the community Bayes model, a simple procedure applicable to non-Gaussian data and any likelihood-based classifier. By exploiting the block-diagonal structure of the estimated correlation matrices, we reduce the problem to several subproblems that can be solved independently. Simulated and real data examples show that the community Bayes model can improve accuracy and reduce variance in predictions.

A Proofs

A.1 Proof of Theorem 1

Proof. The standard KKT conditions [2] of the optimization problem (5) give:

$$n_k\left(-\big(\hat\Theta^{(k)}\big)^{-1} + S^{(k)}\right) + \lambda\,\Gamma^{(k)} = 0, \quad \text{for } k = 1, \ldots, K, \tag{24}$$
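where $\Gamma^{(k)}$ is a $p \times p$ matrix with zero diagonal elements:

$$\Gamma^{(k)} = \begin{pmatrix} 0 & \gamma_{12}^{(k)} & \cdots & \gamma_{1p}^{(k)} \\ \gamma_{21}^{(k)} & 0 & \cdots & \gamma_{2p}^{(k)} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{p1}^{(k)} & \gamma_{p2}^{(k)} & \cdots & 0 \end{pmatrix}.$$

Denote $\gamma_{ij} = (\gamma_{ij}^{(1)}, \ldots, \gamma_{ij}^{(K)})$ and $\hat\theta_{ij} = (\hat\theta_{ij}^{(1)}, \ldots, \hat\theta_{ij}^{(K)})$. The subgradient $\gamma_{ij} \in \partial\|\hat\theta_{ij}\|_2$ for $i \neq j$, and

$$\partial\|\theta_{ij}\|_2 = \begin{cases} B_2(1) & \text{if } \theta_{ij} = 0; \\ \theta_{ij} / \|\theta_{ij}\|_2 & \text{otherwise,} \end{cases}$$

where $B_2(1)$ denotes the unit $\ell_2$ ball in $\mathbb{R}^K$. Let $\hat W^{(k)} = (\hat\Theta^{(k)})^{-1}$. Then (24) is equivalent to

$$\left(n_1\big(\hat W_{ij}^{(1)} - S_{ij}^{(1)}\big), \ldots, n_K\big(\hat W_{ij}^{(K)} - S_{ij}^{(K)}\big)\right) = \lambda\,\frac{\hat\theta_{ij}}{\|\hat\theta_{ij}\|_2}, \quad \text{if } i \neq j \text{ and } \hat\theta_{ij} \neq 0; \tag{25}$$

$$\left\|\left(n_1\big(\hat W_{ij}^{(1)} - S_{ij}^{(1)}\big), \ldots, n_K\big(\hat W_{ij}^{(K)} - S_{ij}^{(K)}\big)\right)\right\|_2 \le \lambda, \quad \text{if } i \neq j \text{ and } \hat\theta_{ij} = 0; \tag{26}$$

$$\left(\hat W_{ii}^{(1)}, \ldots, \hat W_{ii}^{(K)}\right) = \left(S_{ii}^{(1)}, \ldots, S_{ii}^{(K)}\right). \tag{27}$$

There exists an ordering of the vertices $\{1, \ldots, p\}$ such that the edge-matrix of the thresholded covariance graph is block-diagonal. For notational convenience, we will assume that the matrix is already in this order:

$$E^{(\lambda)} = \begin{pmatrix} E_1^{(\lambda)} & 0 & \cdots & 0 \\ 0 & E_2^{(\lambda)} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & E_{k(\lambda)}^{(\lambda)} \end{pmatrix}, \tag{28}$$

where $E_\ell^{(\lambda)}$ represents the block of indices given by $V_\ell^{(\lambda)}$, $\ell = 1, \ldots, k(\lambda)$.

We will construct $\hat W^{(1)}, \ldots, \hat W^{(K)}$ having the same structure as $E^{(\lambda)}$ that form a solution to the optimization problem (5). Note that if $\hat W^{(k)}$ is block diagonal then so is its inverse. Let $\hat W^{(k)}$ and its inverse $\hat\Theta^{(k)}$ be given by:

$$\hat W^{(k)} = \begin{pmatrix} \hat W_1^{(k)} & 0 & \cdots & 0 \\ 0 & \hat W_2^{(k)} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \hat W_{k(\lambda)}^{(k)} \end{pmatrix}, \quad \hat\Theta^{(k)} = \begin{pmatrix} \hat\Theta_1^{(k)} & 0 & \cdots & 0 \\ 0 & \hat\Theta_2^{(k)} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \hat\Theta_{k(\lambda)}^{(k)} \end{pmatrix}, \quad \text{for all } k. \tag{29}$$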
Define $\hat W_\ell^{(k)}$, or equivalently $\hat\Theta_\ell^{(k)} = (\hat W_\ell^{(k)})^{-1}$, via the following sub-problems

$$\hat\Theta_\ell = \operatorname*{argmin}_{\Theta_\ell^{(k)} \succ 0} \; -\sum_{k=1}^{K} n_k\left(\log\det\Theta_\ell^{(k)} - \operatorname{tr}\big(S_\ell^{(k)}\Theta_\ell^{(k)}\big)\right) + \lambda \sum_{\substack{i,j \in V_\ell^{(\lambda)} \\ i \neq j}} \|\theta_{ij}\|_2 \tag{30}$$

for $\ell = 1, \ldots, k(\lambda)$, where $\hat\Theta_\ell = (\hat\Theta_\ell^{(1)}, \ldots, \hat\Theta_\ell^{(K)})$, and $S_\ell^{(k)}$ is a sub-block of $S^{(k)}$ with row/column indices from $V_\ell^{(\lambda)} \times V_\ell^{(\lambda)}$. Thus for every $\ell$, $(\hat\Theta_\ell^{(1)}, \ldots, \hat\Theta_\ell^{(K)})$ satisfies the KKT conditions corresponding to the $\ell$th block of the $p \times p$ dimensional problem. By construction of the thresholded sample covariance graph, if $i \in V_\ell^{(\lambda)}$ and $j \in V_{\ell'}^{(\lambda)}$ with $\ell \neq \ell'$, then $\|(n_1 S_{ij}^{(1)}, \ldots, n_K S_{ij}^{(K)})\|_2 \le \lambda$. Hence, the choice $\hat\Theta_{ij}^{(k)} = \hat W_{ij}^{(k)} = 0$, $k = 1, \ldots, K$, satisfies the KKT conditions (26) for all the off-diagonal entries in the block-matrix (29). Hence $(\hat\Theta^{(1)}, \ldots, \hat\Theta^{(K)})$ solves the optimization problem (5).

This shows that the vertex-partition $\{\hat V_\ell^{(\lambda)}\}_{1 \le \ell \le \kappa(\lambda)}$ obtained from the estimated precision graph is a finer resolution of that obtained from the thresholded covariance graph, i.e., for every $\ell \in \{1, \ldots, k(\lambda)\}$, there is an $\ell' \in \{1, \ldots, \kappa(\lambda)\}$ such that $\hat V_{\ell'}^{(\lambda)} \subset V_\ell^{(\lambda)}$. In particular $k(\lambda) \le \kappa(\lambda)$.

Conversely, if $(\hat\Theta^{(1)}, \ldots, \hat\Theta^{(K)})$ admits the decomposition as in the statement of the theorem, then it follows from (26) that for $i \in \hat V_\ell^{(\lambda)}$ and $j \in \hat V_{\ell'}^{(\lambda)}$ with $\ell \neq \ell'$, we have $\|(n_1 S_{ij}^{(1)}, \ldots, n_K S_{ij}^{(K)})\|_2 \le \lambda$, since $\hat W_{ij}^{(k)} = 0$ for $k = 1, \ldots, K$. Thus the vertex-partition induced by the connected components of the thresholded covariance graph is a finer resolution of that induced by the connected components of the estimated precision graph. In particular this implies that $k(\lambda) \ge \kappa(\lambda)$.

Combining the above two results, we conclude $k(\lambda) = \kappa(\lambda)$ and also the equality (12). Since the labeling of the connected components is not unique, there is a permutation $\pi$ in the theorem.

Note that for $\lambda' < \lambda$ the connected components of the thresholded sample covariance graph $G^{(\lambda)}$ are nested within the connected components of $G^{(\lambda')}$. In particular, the vertex-partition of the thresholded sample covariance graph at $\lambda$ is also nested within that of the thresholded sample covariance graph at $\lambda'$.

We have already proved that the vertex-partitions induced by the connected components of the estimated precision graph and of the thresholded sample covariance graph are equal. Using this result, we conclude that the vertex-partition induced by the components of the estimated precision graph at $\lambda$, given by $\{\hat V_\ell^{(\lambda)}\}_{1 \le \ell \le \kappa(\lambda)}$, is contained inside the vertex-partition induced by the components of the estimated precision graph at $\lambda'$, given by $\{\hat V_\ell^{(\lambda')}\}_{1 \le \ell \le \kappa(\lambda')}$.
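The equivalence proved above suggests that the communities can be read off the sample covariance matrices before any optimization is run. The following is a minimal Python sketch of that screening step, under the assumption (taken from condition (26) with $\hat W_{ij}^{(k)} = 0$) that vertices $i$ and $j$ are joined whenever $\|(n_1 S_{ij}^{(1)}, \ldots, n_K S_{ij}^{(K)})\|_2 > \lambda$. The function name `thresholded_covariance_components` is hypothetical and the depth-first search is just one simple way to label connected components.

```python
import numpy as np


def thresholded_covariance_components(S_list, n_list, lam):
    """Label the connected components of the thresholded covariance graph.

    S_list: list of K sample covariance matrices, each of shape (p, p).
    n_list: list of K class sample sizes n_k.
    lam:    threshold (the penalty parameter lambda).

    Returns an integer array of length p; features sharing a label belong
    to the same component. Illustrative helper, not the authors' code.
    """
    # Stack n_k * S^(k) and take the Euclidean norm across classes.
    scaled = np.stack(
        [n * np.asarray(S, dtype=float) for S, n in zip(S_list, n_list)]
    )
    norms = np.sqrt((scaled ** 2).sum(axis=0))

    # Edge between i and j when the across-class norm exceeds lambda.
    adjacency = norms > lam
    np.fill_diagonal(adjacency, False)

    p = adjacency.shape[0]
    labels = -np.ones(p, dtype=int)
    current = 0
    for start in range(p):
        if labels[start] >= 0:
            continue
        # Depth-first search from an unlabeled vertex.
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(adjacency[i]):
                if labels[j] < 0:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy example: features 0 and 1 are strongly related, feature 2 is not.
S1 = np.array([[1.0, 0.5, 0.01], [0.5, 1.0, 0.0], [0.01, 0.0, 1.0]])
S2 = np.array([[1.0, 0.4, 0.0], [0.4, 1.0, 0.02], [0.0, 0.02, 1.0]])
labels = thresholded_covariance_components([S1, S2], [10, 10], lam=1.0)
```

Raising `lam` can only split components and lowering it can only merge them, which is the nesting property used in the last part of the proof.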
A.2 Proof of Theorem 2

Proof. Let $a = \min_{i,j \in V_\ell;\ \ell = 1, \ldots, L} |R_{ij}^0|$. First note that $\{\max_{ij} |\hat R_{ij} - R_{ij}^0| < a/2\} \subseteq \{\hat V_\ell = V_\ell,\ \forall \ell\}$ [22]. Then

$$\begin{aligned} P(\exists \ell,\ \hat V_\ell \neq V_\ell) &\le P\left(\max_{ij} |\hat R_{ij} - R_{ij}^0| \ge \frac{a}{2}\right) \\ &\le P\left(\sum_{k=1}^{K} \max_{ij}\, n_k |\hat R_{ij}^{(k)} - R_{ij}^{(k)}| \ge \frac{a}{2}\right) \\ &\le \sum_{k=1}^{K} P\left(\max_{ij}\, n_k |\hat R_{ij}^{(k)} - R_{ij}^{(k)}| \ge \frac{a}{2K}\right) \\ &= \sum_{k=1}^{K} P\left(\max_{ij} |\hat R_{ij}^{(k)} - R_{ij}^{(k)}| \ge \frac{a}{2 n_k K}\right). \end{aligned}$$

The second inequality holds because $\left|\sqrt{\sum_m a_m^2} - \sqrt{\sum_m b_m^2}\right| \le \|a - b\|_1 = \sum_m |a_m - b_m|$. Since $a \ge 16\pi K \sqrt{n_k \log p}$, we have $P\left(\max_{ij} |\hat R_{ij}^{(k)} - R_{ij}^{(k)}| \ge \frac{a}{2 n_k K}\right) \le \frac{1}{p^2}$ by Theorem 4.1 of [14]. Therefore $P(\exists \ell,\ \hat V_\ell \neq V_\ell) \le \frac{K}{p^2}$.

References

[1] Peter J Bickel and Elizaveta Levina. Some theory for Fisher's linear discriminant function, naive Bayes, and some alternatives when there are many more variables than observations. Bernoulli.
[2] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge University Press.
[3] Peter Bühlmann, Philipp Rütimann, Sara van de Geer, and Cun-Hui Zhang. Correlated variables in regression: clustering and sparse estimation. Journal of Statistical Planning and Inference, 143(11).
[4] Line Clemmensen, Trevor Hastie, Daniela Witten, and Bjarne Ersbøll. Sparse discriminant analysis. Technometrics, 53(4).
[5] Patrick Danaher, Pei Wang, and Daniela M Witten. The joint graphical lasso for inverse covariance estimation across multiple classes. Journal of the Royal Statistical Society: Series B (Statistical Methodology).
[6] Sandrine Dudoit, Jane Fridlyand, and Terence P Speed. Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, 97(457):77-87.
[7] Jerome H Friedman. Regularized discriminant analysis. Journal of the American Statistical Association, 84(405).
[8] Yaqian Guo, Trevor Hastie, and Robert Tibshirani. Regularized linear discriminant analysis and its application in microarrays. Biostatistics, 8(1):86-100.
[9] Trevor Hastie, Andreas Buja, and Robert Tibshirani. Penalized discriminant analysis. The Annals of Statistics.
[10] Nadine Hussami and Robert Tibshirani. A component lasso. arXiv preprint.
[11] WJ Krzanowski, Philip Jonathan, WV McCarthy, and MR Thomas. Discriminant analysis with singular covariance matrices: methods and applications to spectroscopic data. Applied Statistics.
[12] B Boser Le Cun, John S Denker, D Henderson, Richard E Howard, W Hubbard, and Lawrence D Jackel. Handwritten digit recognition with a back-propagation network. In Advances in Neural Information Processing Systems. Citeseer.
[13] Quefeng Li and Jun Shao. Sparse quadratic discriminant analysis for high dimensional data. Statist. Sinica.
[14] Han Liu, Fang Han, Ming Yuan, John Lafferty, and Larry Wasserman. The nonparanormal skeptic. arXiv preprint.
[15] Han Liu, John Lafferty, and Larry Wasserman. The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. The Journal of Machine Learning Research, 10.
[16] Rahul Mazumder and Trevor Hastie. Exact covariance thresholding into connected components for large-scale graphical lasso. The Journal of Machine Learning Research, 13.
[17] Finbarr O'Sullivan. A statistical perspective on ill-posed inverse problems. Statistical Science.
[18] Mee Young Park, Trevor Hastie, and Robert Tibshirani. Averaged gene expressions for regression. Biostatistics, 8(2).
[19] Bradley S Price, Charles J Geyer, and Adam J Rothman. Ridge fusion in statistical learning. arXiv preprint.
[20] Anthony John Robinson. Dynamic error propagation networks. PhD thesis, University of Cambridge.
[21] Jiehuan Sun and Hongyu Zhao. The application of sparse estimation of covariance matrix to quadratic discriminant analysis. BMC Bioinformatics, 16(1):48.
[22] Kean Ming Tan, Daniela Witten, and Ali Shojaie. The cluster graphical lasso for improved estimation of Gaussian graphical models. arXiv preprint.
[23] Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, and Gilbert Chu. Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proceedings of the National Academy of Sciences, 99(10).
[24] DM Titterington. Common structure of smoothing techniques in statistics. International Statistical Review/Revue Internationale de Statistique.
[25] Daniela M Witten, Jerome H Friedman, and Noah Simon. New insights and faster computations for the graphical lasso. Journal of Computational and Graphical Statistics, 20(4).
[26] Ping Xu, Guy N Brock, and Rudolph S Parrish. Modified linear discriminant analysis approaches for classification of high-dimensional microarray data. Computational Statistics & Data Analysis, 53(5).
[27] Ming Yuan and Yi Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1):49-67.
[28] Hui Zou and Trevor Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2).
More informationAutomobile Prices in Market Equilibrium. Berry, Pakes and Levinsohn
Automobie Prices in Market Equiibrium Berry, Pakes and Levinsohn Empirica Anaysis of demand and suppy in a differentiated products market: equiibrium in the U.S. automobie market. Oigopoistic Differentiated
More informationAnother Look at Linear Programming for Feature Selection via Methods of Regularization 1
Another Look at Linear Programming for Feature Seection via Methods of Reguarization Yonggang Yao, The Ohio State University Yoonkyung Lee, The Ohio State University Technica Report No. 800 November, 2007
More informationA Separability Index for Distance-based Clustering and Classification Algorithms
A Separabiity Index for Distance-based Custering and Cassification Agorithms Arka P. Ghosh, Ranjan Maitra and Anna D. Peterson 1 Department of Statistics, Iowa State University, Ames, IA 50011-1210, USA.
More informationAsynchronous Control for Coupled Markov Decision Systems
INFORMATION THEORY WORKSHOP (ITW) 22 Asynchronous Contro for Couped Marov Decision Systems Michae J. Neey University of Southern Caifornia Abstract This paper considers optima contro for a coection of
More informationarxiv: v2 [cs.lg] 4 Sep 2014
Cassification with Sparse Overapping Groups Nikhi S. Rao Robert D. Nowak Department of Eectrica and Computer Engineering University of Wisconsin-Madison nrao2@wisc.edu nowak@ece.wisc.edu ariv:1402.4512v2
More informationWavelet Methods for Time Series Analysis. Wavelet-Based Signal Estimation: I. Part VIII: Wavelet-Based Signal Extraction and Denoising
Waveet Methods for Time Series Anaysis Part VIII: Waveet-Based Signa Extraction and Denoising overview of key ideas behind waveet-based approach description of four basic modes for signa estimation discussion
More informationHigh Spectral Resolution Infrared Radiance Modeling Using Optimal Spectral Sampling (OSS) Method
High Spectra Resoution Infrared Radiance Modeing Using Optima Spectra Samping (OSS) Method J.-L. Moncet and G. Uymin Background Optima Spectra Samping (OSS) method is a fast and accurate monochromatic
More informationSequential Decoding of Polar Codes with Arbitrary Binary Kernel
Sequentia Decoding of Poar Codes with Arbitrary Binary Kerne Vera Miosavskaya, Peter Trifonov Saint-Petersburg State Poytechnic University Emai: veram,petert}@dcn.icc.spbstu.ru Abstract The probem of efficient
More informationOnline Appendices for The Economics of Nationalism (Xiaohuan Lan and Ben Li)
Onine Appendices for The Economics of Nationaism Xiaohuan Lan and Ben Li) A. Derivation of inequaities 9) and 10) Consider Home without oss of generaity. Denote gobaized and ungobaized by g and ng, respectivey.
More informationAppendix of the Paper The Role of No-Arbitrage on Forecasting: Lessons from a Parametric Term Structure Model
Appendix of the Paper The Roe of No-Arbitrage on Forecasting: Lessons from a Parametric Term Structure Mode Caio Ameida cameida@fgv.br José Vicente jose.vaentim@bcb.gov.br June 008 1 Introduction In this
More informationMultilayer Kerceptron
Mutiayer Kerceptron Zotán Szabó, András Lőrincz Department of Information Systems, Facuty of Informatics Eötvös Loránd University Pázmány Péter sétány 1/C H-1117, Budapest, Hungary e-mai: szzoi@csetehu,
More informationConvergence Property of the Iri-Imai Algorithm for Some Smooth Convex Programming Problems
Convergence Property of the Iri-Imai Agorithm for Some Smooth Convex Programming Probems S. Zhang Communicated by Z.Q. Luo Assistant Professor, Department of Econometrics, University of Groningen, Groningen,
More informationSubstructuring Preconditioners for the Bidomain Extracellular Potential Problem
Substructuring Preconditioners for the Bidomain Extraceuar Potentia Probem Mico Pennacchio 1 and Vaeria Simoncini 2,1 1 IMATI - CNR, via Ferrata, 1, 27100 Pavia, Itay mico@imaticnrit 2 Dipartimento di
More informationTesting for the Existence of Clusters
Testing for the Existence of Custers Caudio Fuentes and George Casea University of Forida November 13, 2008 Abstract The detection and determination of custers has been of specia interest, among researchers
More informationAn explicit Jordan Decomposition of Companion matrices
An expicit Jordan Decomposition of Companion matrices Fermín S V Bazán Departamento de Matemática CFM UFSC 88040-900 Forianópois SC E-mai: fermin@mtmufscbr S Gratton CERFACS 42 Av Gaspard Coriois 31057
More informationTwo-sample inference for normal mean vectors based on monotone missing data
Journa of Mutivariate Anaysis 97 (006 6 76 wwweseviercom/ocate/jmva Two-sampe inference for norma mean vectors based on monotone missing data Jianqi Yu a, K Krishnamoorthy a,, Maruthy K Pannaa b a Department
More informationTwo view learning: SVM-2K, Theory and Practice
Two view earning: SVM-2K, Theory and Practice Jason D.R. Farquhar jdrf99r@ecs.soton.ac.uk Hongying Meng hongying@cs.york.ac.uk David R. Hardoon drh@ecs.soton.ac.uk John Shawe-Tayor jst@ecs.soton.ac.uk
More informationGeneral Certificate of Education Advanced Level Examination June 2010
Genera Certificate of Education Advanced Leve Examination June 2010 Human Bioogy HBI6T/P10/task Unit 6T A2 Investigative Skis Assignment Task Sheet The effect of temperature on the rate of photosynthesis
More informationAppendix A: MATLAB commands for neural networks
Appendix A: MATLAB commands for neura networks 132 Appendix A: MATLAB commands for neura networks p=importdata('pn.xs'); t=importdata('tn.xs'); [pn,meanp,stdp,tn,meant,stdt]=prestd(p,t); for m=1:10 net=newff(minmax(pn),[m,1],{'tansig','purein'},'trainm');
More informationIterative Decoding Performance Bounds for LDPC Codes on Noisy Channels
Iterative Decoding Performance Bounds for LDPC Codes on Noisy Channes arxiv:cs/060700v1 [cs.it] 6 Ju 006 Chun-Hao Hsu and Achieas Anastasopouos Eectrica Engineering and Computer Science Department University
More informationNonlinear Gaussian Filtering via Radial Basis Function Approximation
51st IEEE Conference on Decision and Contro December 10-13 01 Maui Hawaii USA Noninear Gaussian Fitering via Radia Basis Function Approximation Huazhen Fang Jia Wang and Raymond A de Caafon Abstract This
More informationA Robust Voice Activity Detection based on Noise Eigenspace Projection
A Robust Voice Activity Detection based on Noise Eigenspace Projection Dongwen Ying 1, Yu Shi 2, Frank Soong 2, Jianwu Dang 1, and Xugang Lu 1 1 Japan Advanced Institute of Science and Technoogy, Nomi
More information