Factorized Asymptotic Bayesian Inference for Mixture Modeling
Ryohei Fujimaki, NEC Laboratories America, Department of Media Analytics
Satoshi Morinaga, NEC Corporation, Information and Media Processing Research Laboratories

Appearing in Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS) 2012, La Palma, Canary Islands. Volume XX of JMLR: W&CP XX. Copyright 2012 by the authors.

Abstract

This paper proposes a novel Bayesian approximation inference method for mixture modeling. Our key idea is to factorize the marginal log-likelihood using a variational distribution over latent variables. An asymptotic approximation, a factorized information criterion (FIC), is obtained by applying the Laplace method to each of the factorized components. In order to evaluate FIC, we propose factorized asymptotic Bayesian inference (FAB), which maximizes an asymptotically-consistent lower bound of FIC. FIC and FAB have several desirable properties: 1) asymptotic consistency with the marginal log-likelihood, 2) automatic component selection on the basis of an intrinsic shrinkage mechanism, and 3) parameter identifiability in mixture modeling. Experimental results show that FAB outperforms state-of-the-art VB methods.

1 Introduction

Model selection is one of the most difficult and important challenges in machine learning problems, in which we optimize a model representation as well as model parameters. Bayesian learning provides a natural and sophisticated way to address the issue [1, 3]. A central task in Bayesian model selection is the evaluation of marginal log-likelihood. Since exact evaluation is often computationally and analytically intractable, a number of approximation algorithms have been studied. One well-known precursor is the Bayesian information criterion (BIC) [18], an asymptotic second-order approximation using the Laplace method.
Since BIC assumes regularity conditions that ensure the asymptotic normality of the maximum likelihood (ML) estimator, it loses its theoretical justification for non-regular models, including mixture models, hidden Markov models, multi-layer neural networks, etc. Further, it is known that ML estimation does not have a unique solution for non-identifiable models, in which the mapping between parameters and functions is not one-to-one [22, 23] (such equivalent models are said to be in an equivalence class). For such non-regular and non-identifiable models, the generalization error of the ML estimator significantly degrades [24].

Among recent advanced Bayesian methods, we focus on variational Bayesian (VB) inference. A VB method maximizes a computationally-tractable lower bound of the marginal log-likelihood by applying variational distributions on latent variables and parameters. Previous studies have investigated VB methods for a variety of models [2, 6, 5, 12, 15, 16, 20] and have demonstrated their practicality. Further, theoretical studies [17, 21] have shown that VB methods can resolve the non-regularity and non-identifiability issues. One disadvantage, however, is that latent variables and model parameters are partitioned into sub-groups that are independent of each other in the variational distributions. While this independence makes computation tractable, the lower bound can be loose, since their dependency is essential in the true distribution.

This paper proposes a novel Bayesian approximation inference method for a family of mixture models, which is one of the most significant examples of non-regular and non-identifiable model families. Our ideas and contributions are summarized as follows:

Factorized Information Criterion. We derive a new approximation of the marginal log-likelihood and refer to it as a factorized information criterion (FIC). A key observation is that, using a variational distribution over latent variables, the marginal log-likelihood can
be rewritten as a factorized representation in which the Laplace approximation is applicable to each of the factorized components. FIC is unique in the sense that it takes into account dependencies among latent variables and parameters, and FIC is asymptotically consistent with the marginal log-likelihood.

Factorized Asymptotic Bayesian Inference. We propose a new approximate inference algorithm and refer to it as factorized asymptotic Bayesian inference (FAB). Since FIC is defined on the basis of both observed and latent variables, the latter of which are not available, FAB maximizes a lower bound of FIC through an iterative optimization similar to the EM algorithm [8] and VB methods [2]. This FAB optimization is proved to monotonically increase the lower bound of FIC. It is worth noting that FAB provides a natural way to control model complexity not only in terms of the number of components but also in terms of the types of individual components (e.g., degrees of components in a polynomial curve mixture (PCM)).

FIC and FAB offer the following desirable properties:

1. The FIC lower bound is asymptotically consistent with FIC, and therefore with the marginal log-likelihood.

2. FAB automatically selects components on the basis of an intrinsic shrinkage mechanism different from prior-based regularization. FAB therefore mitigates overfitting even though it ignores the prior effect in the asymptotic sense, or even if we apply a non-informative prior [13].

3. FAB addresses the non-identifiability issue. Within an equivalence class, FAB automatically selects the model which maximizes the entropy of distributions over latent variables.

2 Preliminaries

This paper considers mixture models p(X|θ) = Σ_{c=1}^C α_c p_c(X|φ_c) for a D-dimensional random variable X. C and α = (α_1,...,α_C) are the number of components and the mixture ratio. θ is θ = (α, φ_1,...,φ_C). We allow different components p_c(X|φ_c) to differ in their model representations from one another¹. We assume that p_c(X|φ_c) satisfies the regularity conditions² while p(X|θ) is non-regular (A1). Assumption A1 is less stringent than the regularity conditions on p(X|θ), and many models (e.g., Gaussian mixture models (GMM), PCM, autoregressive mixture models) satisfy A1. A model M of p(X|θ) is specified by C and models S_c of p_c(X|φ_c), i.e., M = (C, S_1,...,S_C). Although the representations of θ and φ_c depend on M and S_c, respectively, we omit them for notational simplicity.

Let us denote a latent variable of X as Z = (Z_1,...,Z_C). Z is a component assignment vector: Z_c = 1 if X is generated from the c-th component, and Z_c = 0 otherwise. The marginal distribution on Z and the conditional distribution on X given Z can be described as p(Z|α) = Π_{c=1}^C α_c^{Z_c} and p(X|Z, φ) = Π_{c=1}^C (p_c(X|φ_c))^{Z_c}. The observed data and their latent variables are denoted as x^N = x_1,...,x_N and z^N = z_1,...,z_N, respectively, where z_n = (z_{n1},...,z_{nC}). We make another assumption that |log p(X, Z|θ)| < ∞ holds (A2). Condition A2 is discussed in Section 4.5.

¹ So-called heterogeneous mixture models [11]. In the example of PCM, the degree of p_1(X|φ_1) may be two while that of p_2(X|φ_2) may be one.
² The Fisher information matrix of p_c(X|φ_c) is non-singular around the maximum likelihood estimator.

3 Factorized Information Criterion for Mixture Models

A Bayesian selects the model which maximizes the model posterior p(M|x^N) ∝ p(M)p(x^N|M). With a uniform model prior p(M), we are particularly interested in p(x^N|M), which is referred to as the marginal likelihood or Bayesian evidence. VB methods for latent variable models [2, 5, 6, 21] consider a lower bound of the marginal log-likelihood:

log p(x^N|M) ≥ Σ_{z^N} ∫ q(z^N) q_θ(θ) log [ p(x^N, z^N|θ) p(θ|M) / (q(z^N) q_θ(θ)) ] dθ,   (1)

where q and q_θ are variational distributions on z^N and θ, respectively.
On q and q_θ, z^N and θ are assumed to be independent of each other in order to make the lower bound computationally and analytically tractable. This ignores, however, the significant dependency between z^N and θ, and in general the equality does not hold. In contrast to this, we consider a lower bound over q(z^N) alone:

log p(x^N|M) ≥ Σ_{z^N} q(z^N) log ( p(x^N, z^N|M) / q(z^N) )   (2)
             ≡ VLB(q, x^N, M).   (3)

Lemma 1 [2] guarantees that max_q VLB(q, x^N, M) is exactly consistent with log p(x^N|M).

Lemma 1. The inequality (3) holds for an arbitrary distribution q on z^N, and the equality is satisfied by q(z^N) = p(z^N|M, x^N).
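The latent-variable representation above (p(Z|α) and p(X|Z, φ) from the Preliminaries) can be checked numerically. The sketch below, with hypothetical toy parameters, verifies that summing the joint p(Z|α)p(X|Z, φ) over the C one-hot assignments recovers the mixture density p(X|θ):

```python
import math

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Toy parameters (hypothetical): theta = (alpha, phi_1, phi_2)
alpha = [0.3, 0.7]
phi = [(-1.0, 0.5), (2.0, 1.0)]  # (mean, variance) per component

def p_joint(x, z):
    """p(x, z | theta) = p(z|alpha) * p(x|z, phi) for a one-hot z."""
    prob = 1.0
    for c, z_c in enumerate(z):
        prob *= (alpha[c] * normal_pdf(x, *phi[c])) ** z_c
    return prob

def p_marginal(x):
    """Mixture density p(x|theta) = sum_c alpha_c * p_c(x|phi_c)."""
    return sum(a * normal_pdf(x, *ph) for a, ph in zip(alpha, phi))

x = 0.4
# Summing the joint over the C one-hot assignments recovers the marginal.
total = p_joint(x, [1, 0]) + p_joint(x, [0, 1])
assert abs(total - p_marginal(x)) < 1e-12
```

This is the identity that lets the marginal likelihood be written as a sum over z^N in (2).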
Since this paper handles mixture models, we further assume mutual independence of the z_n, i.e., q(z^N) = Π_{n=1}^N q(z_n) with q(z_n) = Π_{c=1}^C q(z_{nc})^{z_{nc}}. The lower bound (2) is the same as the one considered by the collapsed variational Bayesian (CVB) methods [15, 20]. A key difference is that FAB employs an asymptotic second-order approximation of (2), which produces the desirable properties of FAB described in Section 1, while CVB methods employ second-order approximations of variational parameters in each iterative update step.

Note that the numerator of (3) has the form of the parameter integration of p(x^N, z^N|M):

p(x^N, z^N|M) = ∫ p(z^N|α) Π_{c=1}^C p_c(x^N|z^N, φ_c) p(θ|M) dθ.   (4)

A key idea in FIC is to apply the Laplace method to the individual factorized distributions p(z^N|α) and p_c(x^N|z^N, φ_c) as follows³:

log p(x^N, z^N|θ) ≈ log p(x^N, z^N|θ̄) − (N/2)[F_Z, (α − ᾱ)] − Σ_{c=1}^C ((Σ_n z_nc)/2)[F_c, (φ_c − φ̄_c)],   (5)

where [A, a] represents the quadratic form aᵀAa for a matrix A and a vector a. Here we denote the ML estimator of log p(x^N, z^N|θ) as θ̄ = (ᾱ, φ̄_1,...,φ̄_C). We discuss application of the Laplace method around the maximum a posteriori (MAP) estimator in Section 4.5.

F_Z and F_c are factorized sample approximations of Fisher information matrices, defined as follows:

F_Z = −(1/N) ∂² log p(z^N|α)/∂α∂αᵀ |_{α=ᾱ},   (6)
F_c = −(1/Σ_n z_nc) ∂² log p_c(x^N|z^N, φ_c)/∂φ_c∂φ_cᵀ |_{φ_c=φ̄_c}.

The following lemma is satisfied for F_Z and F_c.

Lemma 2. With N → ∞, F_Z and F_c respectively converge to the Fisher information matrices:

F*_Z = −∫ ∂² log p(Z|α)/∂α∂αᵀ |_{α=ᾱ} p(Z|α) dZ,   (7)
F*_c = −∫ ∂² log p_c(X|φ_c)/∂φ_c∂φ_cᵀ |_{φ_c=φ̄_c} p_c(X|φ_c) dX.   (8)

The proof follows the law of large numbers, taking into account that the effective number of samples for the c-th component is Σ_n z_nc.

³ The Laplace approximation is not applicable to p(x^N|θ) because of its non-regularity.

Then, by applying a prior satisfying log p(θ|M) = O(1), p(x^N, z^N|M) can be asymptotically approximated as:

p(x^N, z^N|M) ≈ p(x^N, z^N|θ̄) · (2π)^{D_α/2} / (N^{D_α/2} |F_Z|^{1/2}) · Π_{c=1}^C (2π)^{D_c/2} / ((Σ_n z_nc)^{D_c/2} |F_c|^{1/2}).   (9)

Here, D_α ≡ D(α) = C − 1 and D_c ≡ D(φ_c), in which D(·) is the dimensionality of its argument. On the basis of A1 and Lemma 2, both log|F_c|^{1/2} and log|F_Z|^{1/2} are O(1). By substituting (9) into (3) and ignoring asymptotically small terms, we obtain an asymptotic approximation of log p(x^N|M) = max_q VLB(q, x^N, M) (i.e., FIC):

FIC(x^N, M) = max_q { J(q, θ̄, x^N) },   (10)
J(q, θ̄, x^N) = Σ_{z^N} q(z^N) { log p(x^N, z^N|θ̄) − (D_α/2) log N − Σ_{c=1}^C (D_c/2) log(Σ_n z_nc) − log q(z^N) }.

The following theorem justifies FIC as an approximation of the marginal log-likelihood.

Theorem 3. FIC(x^N, M) is asymptotically consistent with log p(x^N|M) under A1 and log p(θ|M) = O(1).

Sketch of Proof 3. Since both p_c(x^N|z^N, φ_c) and p(z^N|α) satisfy the regularity conditions (A1 and A2), their Laplace approximations appearing on the left side of (9) have asymptotic consistency (for the same reason as with BIC [7, 18, 19]). Therefore, their product (9) asymptotically agrees with p(x^N, z^N|M). By applying Lemma 1, this theorem is proved.

Despite our ignoring the prior effect, FIC still has the regularization term Σ_c (D_c/2) Σ_{z^N} q(z^N) log(Σ_n z_nc). This term has several interesting properties. First, a dependency between z^N and θ, which most VB methods ignore, explicitly appears. Like BIC, it is not the parameter or function representation of p_c(x|φ_c) but only its dimensionality D_c that plays an important role in the bridge between the latent variables and the parameters. Second, the complexity of the c-th component is adjusted by the value of Σ_{z^N} q(z^N) log(Σ_n z_nc). Therefore, components of different sizes are automatically regularized at their individually appropriate levels. Third, roughly speaking, with respect to these terms, it is preferable for the entropy of q(z^N) to be large. This makes the parameter estimation in FAB identifiable. More details are discussed in Sections 4.4 and 4.5.
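The difference between the FIC regularization term and the BIC penalty can be illustrated numerically. The sketch below (all numbers hypothetical) charges each component at the log of its own effective sample size Σ_n q(z_nc), as in (10), whereas BIC charges every parameter at log N:

```python
import math

N = 1000
D_c = [5, 5, 5]                  # parameter dimensionality per component
n_eff = [700.0, 250.0, 50.0]     # effective sample sizes sum_n q(z_nc)
D_alpha = len(D_c) - 1           # dimensionality of the mixture ratio

# FIC-style penalty: each component pays at the log of its effective size.
fic_penalty = D_alpha / 2 * math.log(N) + sum(
    d / 2 * math.log(n) for d, n in zip(D_c, n_eff))

# BIC penalty: every parameter pays log N regardless of component size.
bic_penalty = (D_alpha + sum(D_c)) / 2 * math.log(N)

# Small components are charged less under FIC than under BIC.
assert fic_penalty < bic_penalty
```

This is the sense in which components of different sizes are regularized at individually appropriate levels.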
4 Factorized Asymptotic Bayesian Inference Algorithm

4.1 Lower bound of FIC

Since θ̄ is not available in practice, we cannot evaluate FIC itself. Instead, FAB maximizes an asymptotically-consistent lower bound of FIC. We first derive the lower bound as follows:

Lemma 4. Let us define L(a, b) ≡ log b + (a − b)/b. FIC(x^N, M) is lower bounded as follows:

FIC(x^N, M) ≥ G(q, q̃, θ, x^N)   (11)
= Σ_{z^N} q(z^N) { log p(x^N, z^N|θ) − (D_α/2) log N − Σ_{c=1}^C (D_c/2) L(Σ_n z_nc, Σ_n q̃(z_nc)) − log q(z^N) },

with arbitrary choices of q, θ, and a distribution q̃ on z^N (Σ_n q̃(z_nc) > 0).

Sketch of Proof 4. Since θ̄ is the ML estimator of log p(x^N, z^N|θ), log p(x^N, z^N|θ̄) ≥ log p(x^N, z^N|θ) is satisfied. Further, on the basis of the concavity of the logarithm function, log(Σ_n z_nc) ≤ L(Σ_n z_nc, Σ_n q̃(z_nc)) is satisfied. By substituting these two inequalities into (10), we obtain (11).

In addition to the replacement of θ̄ with an arbitrary θ, the computationally intractable log(Σ_n z_nc) is linearized around Σ_n q̃(z_nc) with a new parameter (distribution) q̃. Now our problem is to solve the following maximization problem (note that q, θ, and q̃ are functions of M):

M*, q*, θ*, q̃* = arg max_{M,q,θ,q̃} G(q, q̃, θ, x^N).   (12)

The following lemma gives us a guide for optimizing the newly introduced distribution q̃. We omit the proof because of space limitations.

Lemma 5. If we fix q and θ, then q̃ = q maximizes G(q, q̃, θ, x^N).

With a finite number of model candidates M, we can use a two-stage algorithm in which we first solve the maximization of (12) for all candidates in M and then select the best model. When we optimize only the number of components C (e.g., as in a GMM), we need to solve the inner maximization C_max times (the maximum search number of components). However, if we intend to optimize S_1,...,S_C (e.g., as in PCM or an autoregressive mixture), we must avoid a combinatorial scalability issue.
FAB provides a natural way to do this because the objective function separates into independent parts in terms of S_1,...,S_C.

4.2 Iterative Optimization Algorithm

Let us first fix C, and consider the following optimization problem:

S*, q*, θ*, q̃* = arg max_{S,q,θ,q̃} G(q, q̃, θ, x^N),   (13)

where S = (S_1,...,S_C). As the inner maximization, FAB solves (13) on the basis of iterations of two sub-steps (V-step and M-step). Let the superscript (t) represent the t-th iteration.

V-step. The V-step optimizes the variational distribution q as follows:

q^(t) = arg max_q G(q, q̃ = q^(t−1), θ^(t−1), x^N).   (14)

More specifically, q^(t) is obtained as follows:

q^(t)(z_nc) ∝ α_c^(t−1) p_c(x_n|φ_c^(t−1)) exp(−D_c / (2 α_c^(t−1) N)),   (15)

where we use Σ_n q^(t−1)(z_nc) = α_c^(t−1) N. The regularization terms discussed in Section 3 appear here as exponentiated update factors, i.e., exp(−D_c/(2α_c^(t−1)N)). Roughly speaking, (15) indicates that smaller and more complex components are likely to become smaller through the iterations. The V-step differs from the E-step of the EM algorithm in the term exp(−D_c/(2α_c^(t−1)N)). This makes an essential difference between FAB inference and ML estimation via the EM algorithm (Sections 4.4 and 4.5).

M-step. The M-step optimizes the models of individual components S and the parameter θ as follows:

S^(t), θ^(t) = arg max_{S,θ} G(q^(t), q̃ = q^(t), θ, x^N).   (16)

G(q, q̃, θ, x^N) has a significant advantage in its avoidance of the combinatorial scalability issue in S selection. Since G(q, q̃, θ, x^N) has no cross terms between components (if we fix q̃), we can separately optimize S_c and θ as follows:

α_c^(t) = Σ_n q^(t)(z_nc) / N,   (17)
S_c^(t), φ_c^(t) = arg max_{S_c,φ_c} H_c(q^(t), q̃^(t), φ_c, x^N),   (18)

H_c(q^(t), q̃^(t), φ_c, x^N) = Σ_n q^(t)(z_nc) log p_c(x_n|φ_c) − (D_c/2) L(Σ_n q^(t)(z_nc), Σ_n q̃^(t)(z_nc)).
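The V-step update (15) can be sketched for a 1-D GMM as follows (toy parameters are hypothetical): it is the EM E-step multiplied by the factor exp(−D_c/(2α_cN)) before normalization.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def v_step(x, alpha, phi, D, N):
    """FAB V-step (15): EM responsibilities times exp(-D_c / (2 alpha_c N))."""
    raw = [a * normal_pdf(x, mu, var) * math.exp(-d / (2 * a * N))
           for a, (mu, var), d in zip(alpha, phi, D)]
    s = sum(raw)
    return [r / s for r in raw]

def e_step(x, alpha, phi):
    """Ordinary EM E-step, for comparison."""
    raw = [a * normal_pdf(x, mu, var) for a, (mu, var) in zip(alpha, phi)]
    s = sum(raw)
    return [r / s for r in raw]

alpha = [0.9, 0.1]                # component 2 is small ...
phi = [(0.0, 1.0), (0.1, 1.0)]    # ... and nearly identical to component 1
D = [2, 2]                        # D_c: (mean, variance) per component
q_fab = v_step(0.0, alpha, phi, D, N=50)
q_em = e_step(0.0, alpha, phi)
# The shrinkage factor pushes the small component's responsibility below
# its EM value; as N grows the factor tends to one and FAB approaches EM.
assert q_fab[1] < q_em[1]
```

With a single component type (e.g., GMM), the subsequent M-step is the ordinary weighted ML update, exactly as in EM.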
H_c(q^(t), q̃^(t), φ_c, x^N) is the part of G(q^(t), q̃^(t), θ, x^N) related to the c-th component. With a finite set of component candidates, in (18) we first optimize φ_c for each element of a fixed S_c and then select the optimal one by comparing them. Note that, if we use a single component type (e.g., GMM), our M-step eventually becomes the same as that in the EM algorithm.

4.3 Convergence and Consistency

After the t-th iteration, FIC is lower bounded as:

FIC(x^N, M) ≥ FIC_LB^(t)(x^N, M) ≡ G(q^(t), q̃^(t), θ^(t), x^N).   (19)

We have the following theorem, which guarantees that FAB monotonically increases FIC_LB^(t)(x^N, M) over the iterative optimization and converges to a local optimum under certain regularity conditions.

Theorem 6. For the iteration of the V-step and the M-step, the following inequality is satisfied:

FIC_LB^(t)(x^N, M) ≥ FIC_LB^(t−1)(x^N, M).   (20)

Sketch of Proof 6. The theorem is proved as follows: G(q^(t), q̃^(t), θ^(t), x^N) ≥ G(q^(t), q̃^(t), θ^(t−1), x^N) ≥ G(q^(t), q̃^(t−1), θ^(t−1), x^N) ≥ G(q^(t−1), q̃^(t−1), θ^(t−1), x^N). The first and the third inequalities arise from (16) and (14), respectively. The second inequality arises from Lemma 5.

We then use the following stopping criterion for the FAB iteration steps:

FIC_LB^(t)(x^N, M) − FIC_LB^(t−1)(x^N, M) ≤ ε,   (21)

where ε is an optimization tolerance parameter, set to 1e-6 in Section 5.

In addition to the monotonic property of FAB, we present the theorem below, which supports FAB asymptotically. Let us denote M^(t) = (C, S^(t)), and let the superscripts T and * represent the number of steps at convergence and the true model/parameters, respectively.

Theorem 7. FIC_LB^(T)(x^N, M^(T)) is asymptotically consistent with FIC(x^N, M^(T)).

Sketch of Proof 7. 1) In (15), exp(−D_c/(2α_c^(T)N)) converges to one, and S^(T−1) = S^(T) holds without loss of generality. Therefore, the FAB algorithm is asymptotically reduced to the EM algorithm.
This means θ^(T) converges to the ML estimator θ̂ of log p(x^N|θ). Then, Σ_{z^N} q^(T)(z^N) (log p(x^N, z^N|θ̄) − log p(x^N, z^N|θ^(T))) / N → 0 holds. 2) On the basis of the law of large numbers, Σ_n z_nc / N converges to α_c. Then, Σ_{z^N} q^(T)(z^N) Σ_{c=1}^C (log(Σ_n z_nc) − log(Σ_n q^(T)(z_nc))) / N → 0 holds. The substitution of 1) and 2) into (11) proves the theorem.

The following theorem arises from Theorems 3 and 7 and guarantees that FAB is asymptotically capable of evaluating the marginal log-likelihood itself:

Theorem 8. FIC_LB^(T)(x^N, M^(T)) is asymptotically consistent with log p(x^N|M^(T)).

4.4 Shrinkage Mechanism

As noted in Sections 3 and 4.2, the term exp(−D_c/(2α_c^(t)N)) in (15), which arises from Σ_c (D_c/2) Σ_{z^N} q(z^N) log(Σ_n z_nc) in (10), plays a significant role in the control of model complexity. Fig. 1 illustrates the shrinkage effects of FAB in the case of a GMM. For D_c = 10 and N = 50, for example, a component with α_c < 0.2 is severely regularized, and such components are automatically shrunk (see the left graph). With increasing N, the shrinkage effect decreases, and small components manage to survive in the FAB optimization process. The right graph shows the shrinkage effect in terms of the dimensionality. In a low-dimensional space (e.g., D_c = 1, 3), small components are not strictly regularized. For D_c = 20 and N = 100, however, a component with α_c < 0.4 may be shrunk, and only one component might survive because of the constraint Σ_{c=1}^C α_c = 1.

Let us consider the gradient of exp(−D_c/(2α_cN)), which is derived as ∂ exp(−D_c/(2α_cN)) / ∂α_c = (D_c/(2α_c²N)) exp(−D_c/(2α_cN)) ≥ 0. This indicates that the iteration of FAB accelerates the shrinkage at a rate of order O(α_c⁻²). More intuitively speaking, the smaller the component is, the more likely it is to be shrunk. The above observations explain the overfitting mitigation mechanism in FAB. The shrinkage effect of FAB is different from the prior-based regularization of MAP estimation.
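The magnitude of the shrinkage factor can be tabulated directly; the (α_c, D_c, N) grid below is ours, chosen to mirror the regimes discussed above:

```python
import math

def shrink_factor(alpha, D, N):
    """The exponentiated regularizer exp(-D_c / (2 alpha_c N)) from (15)."""
    return math.exp(-D / (2 * alpha * N))

# For fixed D_c = 10 and N = 50, a small component is crushed ...
assert shrink_factor(0.1, 10, 50) < 0.4
# ... while for much larger N the same component is barely touched,
assert shrink_factor(0.1, 10, 5000) > 0.99
# and low-dimensional components (D_c = 1) are only mildly regularized.
assert shrink_factor(0.1, 1, 50) > 0.9
```

Because the factor enters the update multiplicatively at every iteration, small high-dimensional components lose mass geometrically fast, which is the intrinsic shrinkage mechanism.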
This shrinkage effect in fact appears despite our asymptotically ignoring the prior effect, and even if a non-informative prior or a flat prior is applied.

In terms of practical implementation, α_c does not reach exactly zero because (15) takes an exponentiated update. Therefore, in our FAB procedure, we apply a thresholding-based shrinkage of small components after the V-step, as follows:

q^(t)(z_nc) = 0 if Σ_n q^(t)(z_nc) < δ, and q^(t)(z_nc)/Q_n^(t) otherwise.   (22)
δ and Q_n^(t) are a threshold value and a normalization constant for Σ_{c=1}^C q^(t)(z_nc) = 1, respectively. We used the threshold value δ = 0.01N in Section 5. With the shrinkage operation, FAB does not require the two-stage optimization in (12). By starting from a sufficient number of components C_max, FAB iteratively and simultaneously optimizes all of M = (C, S), θ, and q. Since FAB cannot revive shrunk components, it is a greedy algorithm, like the least angle regression method [9], which does not revive dropped features. The two-stage algorithm and the shrinkage algorithm are shown here as Algorithms 1 and 2. S̄ is a set of component candidates (e.g., S̄ = {Gaussian} for GMM; S̄ = {0,...,K_max} for PCM, where K_max is the maximum degree of curves).

Figure 1: Shrinkage effects of FAB in GMM (D_c = D + D(D+1)/2, where D and D_c are, respectively, the data dimensionality and the parameter dimensionality). The horizontal and vertical axes represent α_c and the shrinkage factor exp(−D_c/(2α_cN)), respectively. The effect is evident when the number of data is not sufficient for the parameter dimensionality.

Algorithm 1 FAB_two: Two-stage FAB
input: x^N, C_max, S̄, ε
output: C*, S*, θ*, q*(z^N), FIC*_LB
1: FIC*_LB = −∞
2: for C = 1,...,C_max do
3:   Calculate FIC_LB^(T), C, S^(T), θ^(T), and q^(T)(z^N) by FAB_shrink(x^N, C_max = C, S̄, ε, δ = 0).
4: end for
5: Choose C*, S*, θ*, and q*(z^N) by (12).

4.5 Identifiability

A well-known difficulty in ML estimation for mixture models is non-identifiability, as noted in Section 1. Let us consider a simple mixture model, p(x|a, b, c) = a p(x|b) + (1 − a) p(x|c). All models p(x|a, b, c = b) with arbitrary a are equivalent (i.e., in an equivalence class). Therefore, the ML estimator is not unique. Theoretical studies have shown that Bayesian estimators and VB estimators avoid the above issue and significantly outperform the ML estimator in terms of generalization error [17, 21, 24].
Algorithm 2 FAB_shrink: Shrinkage FAB
input: x^N, C_max, S̄, ε, δ
output: FIC_LB^(T), C, S^(T), θ^(T), q^(T)(z^N)
1: t = 0, FIC_LB^(0) = −∞
2: randomly initialize q^(0)(z^N)
3: while not converged do
4:   Calculate S^(t) and θ^(t) by (17) and (18).
5:   Check convergence by (21).
6:   Calculate q^(t+1) by (15).
7:   Shrink components by (22).
8:   t = t + 1
9: end while
10: T = t

In FAB, at the stationary (convergence) point of (15), the following equality is satisfied: Σ_n q*(z_nc) = α_c* N = Σ_n α_c* p_c(x_n|φ_c*) exp(−D_c/(2α_c* N)) / Z_n, where Z_n = Σ_{c=1}^C α_c* p_c(x_n|φ_c*) exp(−D_c/(2α_c* N)) is the normalization constant. For all pairs of α_c*, α_{c'}* > 0, we have the following condition: Σ_n p_c(x_n|φ_c*) exp(−D_c/(2α_c* N)) / Z_n = Σ_n p_{c'}(x_n|φ_{c'}*) exp(−D_{c'}/(2α_{c'}* N)) / Z_n. Then, the necessary condition for φ_c* = φ_{c'}* is α_c* = α_{c'}*. Therefore, FAB chooses the model, within an equivalence class, which maximizes the entropy of q, and thus addresses the non-identifiability issue.

Unfortunately, FIC and FAB do not resolve another non-identifiability issue, i.e., divergence of the criterion. Without assumption A2, (10) and (11) can diverge to infinity (e.g., a GMM with a zero-variance component⁴). We can address this issue by applying the Laplace approximation around the MAP estimator θ̄_MAP of p(x^N, z^N|θ)p(θ) rather than around the ML estimator θ̄. In such a case, a flat conjugate prior might be used in order to avoid the divergence. While such priors do not provide regularization effects, FAB has an overfitting mitigation mechanism, as discussed above. Because of space limitations, we omit a detailed discussion of FIC and FAB with prior distributions.

5 Experiments

5.1 Polynomial Curve Mixture

We first evaluated FAB_shrink for PCM to demonstrate how its model selection works. We used the settings N = 300 and C_max = K_max = 10 in this evaluation⁵. Let Y be a random dependent variable of X.

⁴ Practically speaking, with the shrinkage (22), FAB does not have the divergence issue because small components are automatically removed.
⁵ Larger values of C_max and K_max did not make a large difference in results, but they were hard to visualize.
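The shrinkage variant (Algorithm 2) can be sketched end-to-end for a 1-D GMM. The following is a simplified, self-contained illustration, not the authors' implementation: the component type is fixed to Gaussian, the data and initialization are hypothetical, and the stopping check uses a surrogate objective (weighted log-likelihood plus the log-effective-size penalty) rather than the full FIC lower bound.

```python
import math, random

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fab_shrink_gmm(xs, C_max=6, eps=1e-6, delta_frac=0.01, max_iter=200):
    """Simplified FAB_shrink for a 1-D GMM: V-step (15), shrinkage (22),
    M-step (17)-(18), stopping rule in the spirit of (21)."""
    N, D_c = len(xs), 2  # D_c: (mean, variance) per component
    rng = random.Random(0)
    # Randomly initialize q(z_n) over C_max components.
    q = [[rng.random() for _ in range(C_max)] for _ in xs]
    q = [[v / sum(row) for v in row] for row in q]
    prev = -float("inf")
    for _ in range(max_iter):
        # M-step: closed-form weighted Gaussian updates (17)-(18).
        n_eff = [sum(row[c] for row in q) for c in range(len(q[0]))]
        alpha = [n / N for n in n_eff]
        mu = [sum(row[c] * x for row, x in zip(q, xs)) / n_eff[c]
              for c in range(len(n_eff))]
        var = [max(sum(row[c] * (x - mu[c]) ** 2 for row, x in zip(q, xs))
                   / n_eff[c], 1e-3) for c in range(len(n_eff))]
        # Surrogate objective: weighted log-likelihood minus size penalty.
        obj = sum(row[c] * math.log(alpha[c] * normal_pdf(x, mu[c], var[c]) + 1e-300)
                  for row, x in zip(q, xs) for c in range(len(alpha)))
        obj -= sum(D_c / 2 * math.log(n) for n in n_eff)
        if abs(obj - prev) <= eps:
            break
        prev = obj
        # V-step (15): responsibilities with the shrinkage factor.
        q = []
        for x in xs:
            raw = [a * normal_pdf(x, m, v) * math.exp(-D_c / (2 * a * N))
                   for a, m, v in zip(alpha, mu, var)]
            q.append([r / sum(raw) for r in raw])
        # Shrinkage (22): drop components with small effective size, renormalize.
        n_eff = [sum(row[c] for row in q) for c in range(len(q[0]))]
        keep = [c for c, n in enumerate(n_eff) if n >= delta_frac * N]
        q = [[row[c] for c in keep] for row in q]
        q = [[v / sum(row) for v in row] for row in q]
    return alpha, mu, var

# Two well-separated clusters; surplus components can only be shrunk, never revived.
rng = random.Random(1)
xs = [rng.gauss(-4, 1) for _ in range(150)] + [rng.gauss(4, 1) for _ in range(150)]
alpha, mu, var = fab_shrink_gmm(xs)
assert 1 <= len(alpha) <= 6
```

As in the paper's greedy characterization, a component removed by the thresholding step never returns, so C only decreases from C_max.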
Figure 2: Model selection procedure of FAB_shrink applied to PCM. The true curves are: 1) Y = 5 + ε, 2) Y = X² + ε, 3) Y = X + ε, and 4) Y = 0.5X³ + 2X + ε, where ε ~ N(0, 1). The top shows the change in FIC_LB^(t) over iteration t. The bottom three graphs are the estimated curves at t = 1, 15, and 44.

PCM has the mixed component p_c(Y|X, φ_c) = N(Y | X(S_c)β_c, σ_c²), where X(S_c) = (1, X,...,X^{S_c}). The true model has four curves, as shown in Fig. 2. We used the setting α* = (0.3, 0.2, 0.3, 0.2) for the curves 1)-4). x^N was uniformly sampled from [−5, 5].

Fig. 2 illustrates FIC_LB^(t) (top) and intermediate estimation results (bottom). From the top figure, we are able to confirm that FAB monotonically increases FIC_LB^(t), except at the points (dashed circles) of the shrinkage operation. In the initial state t = 1 (bottom left), q^(0)(z^N) is randomly initialized, and therefore the degrees of all ten curves are zero (S_c = 0). Over the iterations, FAB_shrink simultaneously searches C, S, and θ. For example, at t = 15 (bottom center), we have one curve of S_c = 9, two curves of S_c = 2, and six curves of S_c = 0. Here two curves have already been shrunk. Finally, FAB_shrink selects the model consistent with the true model at t = 44 (bottom right). These results demonstrate the powerful simultaneous model selection procedure of FAB_shrink.

5.2 Gaussian Mixture Model

We applied FAB_two and FAB_shrink to GMM on artificial datasets and UCI datasets [10], and compared them with the variational Bayesian EM algorithm for GMM (VBEM) described in Section 10.2 of [3] and the variational Bayesian Dirichlet process GMM (VBDP) [5]. The former and the latter use a Dirichlet prior and a Dirichlet process prior for α, respectively. We used the implementations by M. E. Khan for VBEM and by K. Kurihara for VBDP, with the default hyperparameter values in the software.
In the following experiments, C_max was set to 20 for all methods.

5.2.1 Artificial Data

We generated the true model with D = 15, C* = 5, and φ_c* = (μ_c*, Σ_c*) (mean and covariance), where the superscript * denotes the true model. The parameter values were randomly sampled as α_c* ∈ [0.4, 0.6] (before normalization) and μ_c* ∈ [−5, 5]. Σ_c* were generated as a_cΣ̄_c, where a_c ∈ [0.5, 1.5] is a scale parameter and Σ̄_c is generated using the gallery function in Matlab. The results are averages over ten runs.

The sufficient number of data is an important measure for evaluating asymptotic methods, and it is usually measured in terms of the empirical convergence of their criteria. Table 1 shows how the values of FIC_LB^(T)/N converge over N. Roughly speaking, N on the order of 10³ was sufficient for convergence (depending on the data dimensionality D). Our results indicate that FAB can be applied in actual practice to recent data analysis scenarios with large N values.

Table 1: Convergence of FIC_LB over data size N for FAB_two and FAB_shrink. The standard deviations are in parentheses.

We next compared the four methods on the basis of three evaluation metrics: 1) the Kullback-Leibler divergence (KL) between the true distribution and the estimated distribution (estimation error), 2) |C − C*| (model selection error), and 3) CPU time. For N ≤ 500, VBDP performed better than or comparably with FAB_two and FAB_shrink w.r.t. KL and |C − C*| (left and middle graphs of Fig. 3). With increasing N, those of the FABs significantly decrease while that of VBDP does not. This might be because the lower bound in (1) generally does not reach the marginal log-likelihood, as has been noted. VBEM was significantly inferior on both metrics. Another observation is that VBDP and VBEM were likely to respectively under-fit and over-fit their models, while we did not find such biases in the FABs. While the KLs of the FABs are similar, FAB_shrink performed better in terms of |C − C*| because small components survived in FAB_two. Further, both FABs had a significant advantage over VBDP and VBEM in CPU time (right graph). In particular, while the CPU time of VBDP grew explosively with N, those of the FABs remained in a practical range. Since FAB_shrink does not use a loop to optimize C, it was significantly faster than FAB_two.

Figure 3: Comparisons of the four methods in terms of KL (left), |C − C*| (middle), and CPU time (right).

5.2.2 UCI Data

For the UCI datasets summarized in Table 2, we do not know the true distributions and therefore employed predictive log-likelihood as an evaluation metric. For large-scale datasets, we randomly selected two thousand data points for training and used the rest for prediction, because of the scalability issue in VBDP.

Table 2: Predictive likelihood per data point on the UCI datasets (iris, yeast, cloud, wine quality, color moments, cooc texture, spoken arabic digits). The standard deviations are in parentheses. The maximum number of training samples was limited to 2000, and the rest were used for testing. The best results and those not significantly worse than them are highlighted in boldface (one-sided t-test with 95% confidence).
As shown in Table 2, the FABs gave better performance than VBDP and VBEM with a sufficient number of data (the scale O(10³) agrees with the results in the previous section). The trend of VBDP (under-fitting) and VBEM (over-fitting) was the same as in the previous section. Also, while the predictive log-likelihoods of FAB_two and FAB_shrink are competitive, FAB_shrink obtained more compact models (smaller C values) than FAB_two. Unfortunately, we have no space to show these results here.

6 Summary and Future Work

We have proposed an approximation of the marginal log-likelihood (FIC) and an inference method (FAB) for Bayesian model selection in mixture models, as an alternative to VB inference. We have given their justifications (asymptotic consistency, convergence, etc.) and analyzed the FAB mechanisms in terms of overfitting mitigation (shrinkage) and identifiability. Experimental results have shown that FAB outperforms state-of-the-art VB methods for a practical number of data in terms of both model selection performance and computational efficiency.

Our next step is the extension of FAB to more general non-regular and non-identifiable models which contain continuous latent variables (e.g., factor analyzer models [12] and matrix factorization models [17]), time-dependent latent variables (e.g., hidden Markov models [16]), hierarchical latent variables [14, 4], etc. We believe that our idea, a factorized representation of the marginal log-likelihood in which the asymptotic approximation works, is widely applicable to them.
References

[1] T. Ando. Bayesian Model Selection and Statistical Modeling. Chapman and Hall/CRC, 2010.
[2] M. Beal. Variational Algorithms for Approximate Bayesian Inference. PhD thesis, Gatsby Computational Neuroscience Unit, University College London, 2003.
[3] C. M. Bishop. Pattern Recognition and Machine Learning. Springer-Verlag, 2006.
[4] C. M. Bishop and M. E. Tipping. A hierarchical latent variable model for data visualization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3), 1998.
[5] D. M. Blei and M. I. Jordan. Variational inference for Dirichlet process mixtures. Bayesian Analysis, 1(1), 2006.
[6] A. Corduneanu and C. M. Bishop. Variational Bayesian model selection for mixture distributions. In Proceedings of Artificial Intelligence and Statistics, pages 27-34, 2001.
[7] I. Csiszár and P. C. Shields. The consistency of the BIC Markov order estimator. Annals of Statistics, 28(6), 2000.
[8] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, B39(1):1-38, 1977.
[9] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. The Annals of Statistics, 32(2), 2004.
[10] A. Frank and A. Asuncion. UCI machine learning repository, 2010.
[11] R. Fujimaki, Y. Sogawa, and S. Morinaga. Online heterogeneous mixture modeling with marginal and copula selection. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2011.
[12] Z. Ghahramani and M. Beal. Variational inference for Bayesian mixtures of factor analysers. In Advances in Neural Information Processing Systems 12. MIT Press, 2000.
[13] H. Jeffreys. An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society of London, Series A, 186, 1946.
[14] M. I. Jordan and R. A. Jacobs. Hierarchical mixtures of experts and the EM algorithm. Neural Computation, 6, 1994.
[15] K. Kurihara, M. Welling, and Y. W. Teh. Collapsed variational Dirichlet process mixture models. In Proceedings of the Twentieth International Joint Conference on Artificial Intelligence, 2007.
[16] D. J. MacKay. Ensemble learning for hidden Markov models. Technical report, University of Cambridge, 1997.
[17] S. Nakajima and M. Sugiyama. Theoretical analysis of Bayesian matrix factorization. Journal of Machine Learning Research, 12, 2011.
[18] G. Schwarz. Estimating the dimension of a model. The Annals of Statistics, 6(2), 1978.
[19] M. Stone. Comments on model selection criteria of Akaike and Schwarz. Journal of the Royal Statistical Society, Series B, 41(2), 1979.
[20] Y. W. Teh, D. Newman, and M. Welling. A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation. In Advances in Neural Information Processing Systems, 2007.
[21] K. Watanabe and S. Watanabe. Stochastic complexities of Gaussian mixtures in variational Bayesian approximation. Journal of Machine Learning Research, 7, 2006.
[22] S. Watanabe. Algebraic analysis for nonidentifiable learning machines. Neural Computation, 13, 2001.
[23] S. Watanabe. Algebraic Geometry and Statistical Learning Theory. Cambridge University Press, 2009.
[24] K. Yamazaki and S. Watanabe. Mixture models and upper bounds of stochastic complexity. Neural Networks, 16, 2003.