Factorized Asymptotic Bayesian Inference for Mixture Modeling


Factorized Asymptotic Bayesian Inference for Mixture Modeling

Ryohei Fujimaki, NEC Laboratories America, Department of Media Analytics
Satoshi Morinaga, NEC Corporation, Information and Media Processing Research Laboratories

Abstract

This paper proposes a novel Bayesian approximation inference method for mixture modeling. Our key idea is to factorize the marginal log-likelihood using a variational distribution over latent variables. An asymptotic approximation, a factorized information criterion (FIC), is obtained by applying the Laplace method to each of the factorized components. In order to evaluate FIC, we propose factorized asymptotic Bayesian inference (FAB), which maximizes an asymptotically-consistent lower bound of FIC. FIC and FAB have several desirable properties: 1) asymptotic consistency with the marginal log-likelihood, 2) automatic component selection on the basis of an intrinsic shrinkage mechanism, and 3) parameter identifiability in mixture modeling. Experimental results show that FAB outperforms state-of-the-art VB methods.

1 Introduction

Model selection is one of the most difficult and important challenges in machine learning problems, in which we optimize a model representation as well as model parameters. Bayesian learning provides a natural and sophisticated way to address the issue [1, 3]. A central task in Bayesian model selection is the evaluation of the marginal log-likelihood. Since exact evaluation is often computationally and analytically intractable, a number of approximation algorithms have been studied. One well-known precursor is the Bayes information criterion (BIC) [18], an asymptotic second-order approximation using the Laplace method.

Appearing in Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS) 2012, La Palma, Canary Islands. Volume XX of JMLR: W&CP XX. Copyright 2012 by the authors.
Since BIC assumes regularity conditions that ensure the asymptotic normality of the maximum likelihood (ML) estimator, it loses its theoretical justification for non-regular models, including mixture models, hidden Markov models, multi-layer neural networks, etc. Further, it is known that the ML estimation does not have a unique solution for non-identifiable models, in which the mapping between parameters and functions is not one-to-one [22, 23] (such equivalent models are said to be in an equivalent class). For such non-regular and non-identifiable models, the generalization error of the ML estimator significantly degrades [24].

Among recent advanced Bayesian methods, we focus on variational Bayesian (VB) inference. A VB method maximizes a computationally-tractable lower bound of the marginal log-likelihood by applying variational distributions on latent variables and parameters. Previous studies have investigated VB methods for a variety of models [2, 6, 5, 12, 15, 16, 20] and have demonstrated their practicality. Further, theoretical studies [17, 21] have shown that VB methods can resolve the non-regularity and non-identifiability issues. One disadvantage, however, is that latent variables and model parameters are partitioned into sub-groups that are independent of each other in the variational distributions. While this independence makes computation tractable, the lower bound can be loose, since their dependency is essential in the true distribution.

This paper proposes a novel Bayesian approximation inference method for a family of mixture models, which is one of the most significant examples of non-regular and non-identifiable model families. Our ideas and contributions are summarized as follows:

Factorized Information Criterion. We derive a new approximation of the marginal log-likelihood and refer to it as a factorized information criterion (FIC). A key observation is that, using a variational distribution over latent variables, the marginal log-likelihood can

be rewritten in a factorized representation in which the Laplace approximation is applicable to each of the factorized components. FIC is unique in the sense that it takes dependencies among latent variables and parameters into account, and FIC is asymptotically consistent with the marginal log-likelihood.

Factorized Asymptotic Bayesian Inference. We propose a new approximate inference algorithm and refer to it as factorized asymptotic Bayesian inference (FAB). Since FIC is defined on the basis of both observed and latent variables, the latter of which are not available, FAB maximizes a lower bound of FIC through an iterative optimization similar to the EM algorithm [8] and VB methods [2]. This FAB optimization is proved to monotonically increase the lower bound of FIC. It is worth noting that FAB provides a natural way to control model complexity not only in terms of the number of components but also in terms of the types of the individual components (e.g., the degrees of the components in a polynomial curve mixture (PCM)). FIC and FAB offer the following desirable properties:

1. The FIC lower bound is asymptotically consistent with FIC, and therefore with the marginal log-likelihood.

2. FAB automatically selects components on the basis of an intrinsic shrinkage mechanism different from prior-based regularization. FAB therefore mitigates overfitting even though it ignores the prior effect in the asymptotic sense, or even if we apply a non-informative prior [13].

3. FAB addresses the non-identifiability issue. In an equivalent class, FAB automatically selects the model which maximizes the entropy of the distributions over latent variables.

2 Preliminaries

This paper considers mixture models p(X|θ) = Σ_{c=1}^C α_c p_c(X|φ_c) for a D-dimensional random variable X. C and α = (α_1, ..., α_C) are the number of components and the mixture ratio. θ is θ = (α, φ_1, ..., φ_C). We allow different components p_c(X|φ_c) to differ in their model representations from one another¹. We assume that p_c(X|φ_c) satisfies the regularity conditions² while p(X|θ) is non-regular (A1). Assumption A1 is less stringent than the regularity conditions on p(X|θ), and many models (e.g., Gaussian mixture models (GMM), PCM, autoregressive mixture models) satisfy A1.

A model M of p(X|θ) is specified by C and the models S_c of p_c(X|φ_c), i.e., M = (C, S_1, ..., S_C). Although the representations of θ and φ_c depend on M and S_c, respectively, we omit them for notational simplicity. Let us denote the latent variable of X as Z = (Z_1, ..., Z_C). Z is a component assignment vector, and Z_c = 1 if X is generated from the c-th component, and Z_c = 0 otherwise. The marginal distribution on Z and the conditional distribution on X given Z can be described as p(Z|α) = Π_{c=1}^C α_c^{Z_c} and p(X|Z, φ) = Π_{c=1}^C (p_c(X|φ_c))^{Z_c}. The observed data and their latent variables are denoted as x^N = x_1, ..., x_N and z^N = z_1, ..., z_N, respectively, where z_n = (z_{n1}, ..., z_{nC}). We make another assumption that |log p(X, Z|θ)| < ∞ holds (A2). Condition A2 is discussed in Section 4.5.

¹ So-called heterogeneous mixture models [11]. In the example of PCM, the degree of p_1(X|φ_1) may be two while that of p_2(X|φ_2) may be one.
² The Fisher information matrix of p_c(X|φ_c) is non-singular around the maximum likelihood estimator.

3 Factorized Information Criterion for Mixture Models

A Bayesian selects the model which maximizes the model posterior p(M|x^N) ∝ p(M)p(x^N|M). With a uniform model prior p(M), we are particularly interested in p(x^N|M), which is referred to as the marginal likelihood or Bayesian evidence. VB methods for latent variable models [2, 5, 6, 21] consider a lower bound of the marginal log-likelihood:

log p(x^N|M) ≥ Σ_{z^N} ∫ q(z^N) q_θ(θ) log [p(x^N, z^N|θ) p(θ|M) / (q(z^N) q_θ(θ))] dθ,   (1)

where q and q_θ are variational distributions on z^N and θ, respectively.
On q and q_θ, z^N and θ are assumed to be independent of each other in order to make the lower bound computationally and analytically tractable. This, however, ignores significant dependency between z^N and θ, and in general the equality does not hold. In contrast, we consider the lower bound with a variational distribution on z^N only:

log p(x^N|M) ≥ Σ_{z^N} q(z^N) log [p(x^N, z^N|M) / q(z^N)]   (2)
            ≡ VLB(q, x^N, M).   (3)

Lemma 1 [2] guarantees that max_q {VLB(q, x^N, M)} is exactly consistent with log p(x^N|M).

Lemma 1 The inequality (3) holds for an arbitrary distribution q on z^N, and the equality is satisfied by q(z^N) = p(z^N|M, x^N).

Since this paper handles mixture models, we further assume mutual independence over z^N, i.e., q(z^N) = Π_{n=1}^N q(z_n) and q(z_n) = Π_{c=1}^C q(z_{nc})^{z_{nc}}. The lower bound (2) is the same as the one considered by the collapsed variational Bayesian (CVB) method [15, 20]. A key difference is that FAB employs an asymptotic second-order approximation of (2), which produces the desirable properties of FAB described in Section 1, while CVB methods employ second-order approximations of the variational parameters in each iterative update step.

Note that the numerator of (3) has the form of a parameter integration of p(x^N, z^N|M):

p(x^N, z^N|M) = ∫ p(z^N|α) Π_{c=1}^C p_c(x^N|z^N, φ_c) p(θ|M) dθ.   (4)

A key idea in FIC is to apply the Laplace method to the individual factorized distributions p(z^N|α) and p_c(x^N|z^N, φ_c) as follows³:

log p(x^N, z^N|θ) ≈ log p(x^N, z^N|θ̄) − (N/2)[F̄_Z, (α − ᾱ)] − Σ_{c=1}^C ((Σ_n z_{nc})/2)[F̄_c, (φ_c − φ̄_c)],   (5)

where [A, a] represents the quadratic form aᵀAa for a matrix A and a vector a. Here we denote the ML estimator of log p(x^N, z^N|θ) as θ̄ = (ᾱ, φ̄_1, ..., φ̄_C). We discuss application of the Laplace method around the maximum a posteriori (MAP) estimator in Section 4.5.

F̄_Z and F̄_c are factorized sample approximations of Fisher information matrices, defined as follows:

F̄_Z = −(1/N) ∂² log p(z^N|α) / ∂α∂αᵀ |_{α=ᾱ},   (6)
F̄_c = −(1/Σ_n z_{nc}) ∂² log p_c(x^N|z^N, φ_c) / ∂φ_c∂φ_cᵀ |_{φ_c=φ̄_c}.

The following lemma is satisfied for F̄_Z and F̄_c.

Lemma 2 As N → ∞, F̄_Z and F̄_c respectively converge to the Fisher information matrices described as:

F_Z = −∫ [∂² log p(Z|α) / ∂α∂αᵀ]_{α=ᾱ} p(Z|α) dZ,   (7)
F_c = −∫ [∂² log p_c(X|φ_c) / ∂φ_c∂φ_cᵀ]_{φ_c=φ̄_c} p_c(X|φ_c) dX.   (8)

The proof follows from the law of large numbers, taking into account that the effective number of samples for the c-th component is Σ_n z_{nc}.

Then, by applying a prior with log p(θ|M) = O(1), p(x^N, z^N|M) can be asymptotically approximated as:

p(x^N, z^N|M) ≈ p(x^N, z^N|θ̄) · (2π)^{D_α/2} N^{−D_α/2} |F̄_Z|^{−1/2} · Π_{c=1}^C (2π)^{D_c/2} (Σ_n z_{nc})^{−D_c/2} |F̄_c|^{−1/2}.   (9)

Here, D_α ≡ D(α) = C − 1 and D_c ≡ D(φ_c), in which D(·) denotes dimensionality. On the basis of A1 and Lemma 2, both log |F̄_c|^{−1/2} and log |F̄_Z|^{−1/2} are O(1). By substituting (9) into (3) and ignoring asymptotically small terms, we obtain an asymptotic approximation of log p(x^N|M) = max_q {VLB(q, x^N, M)} (i.e., FIC) as follows:

FIC(x^N, M) ≡ max_q {J(q, θ̄, x^N)},   (10)
J(q, θ̄, x^N) = Σ_{z^N} q(z^N) { log p(x^N, z^N|θ̄) − (D_α/2) log N − Σ_{c=1}^C (D_c/2) log(Σ_n z_{nc}) − log q(z^N) }.

The following theorem justifies FIC as an approximation of the marginal log-likelihood.

Theorem 3 FIC(x^N, M) is asymptotically consistent with log p(x^N|M) under A1 and log p(θ|M) = O(1).

Sketch of Proof 3 Since both p_c(x^N|z^N, φ_c) and p(z^N|α) satisfy the regularity conditions (A1 and A2), their Laplace approximations in (9) are asymptotically consistent (for the same reason as with BIC [7, 18, 19]). Therefore, their product (9) asymptotically agrees with p(x^N, z^N|M). Applying Lemma 1 proves the theorem.

Despite our ignoring the prior effect, FIC still has the regularization term Σ_c (D_c/2) Σ_{z^N} q(z^N) log(Σ_n z_{nc}). This term has several interesting properties. First, a dependency between z^N and θ, which most VB methods ignore, explicitly appears. As in BIC, it is not the parameter or function representation of p_c(X|φ_c) but only its dimensionality D_c that plays an important role in the bridge between the latent variables and the parameters. Second, the complexity of the c-th component is adjusted by the value of Σ_{z^N} q(z^N) log(Σ_n z_{nc}). Therefore, components of different sizes are automatically regularized at their individually appropriate levels. Third, roughly speaking, with respect to these terms, it is preferable for the entropy of q(z^N) to be large. This makes the parameter estimation in FAB identifiable. More details are discussed in Sections 4.4 and 4.5.

³ The Laplace approximation is not applicable to p(x^N|θ) because of its non-regularity.
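To make these penalty terms concrete, here is a minimal numerical sketch (Python; not from the paper — the helper name is hypothetical, and using the responsibilities Σ_n q(z_{nc}) in place of the expectation over log(Σ_n z_{nc}) is our simplifying assumption) of the model-complexity part of (10):

```python
import numpy as np

def fic_penalty(q, d_c, n):
    """Model-complexity terms of FIC, Eq. (10), with E[z_nc] approximated
    by the responsibilities q[n, c].

    q   : (N, C) array of responsibilities q(z_nc)
    d_c : (C,) array of component parameter dimensionalities D_c
    n   : number of observations N
    Returns -(D_alpha / 2) log N - sum_c (D_c / 2) log(sum_n q_nc).
    """
    n_components = q.shape[1]
    d_alpha = n_components - 1          # dimensionality of the mixture ratio
    eff = q.sum(axis=0)                 # effective sample size per component
    return -0.5 * d_alpha * np.log(n) - 0.5 * np.sum(d_c * np.log(eff))
```

For comparison, BIC would charge every component the full (D_c/2) log N; FIC instead charges each component by the logarithm of its own effective sample size, which is what drives the shrinkage mechanism discussed in Section 4.4.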

4 Factorized Asymptotic Bayesian Inference Algorithm

4.1 Lower bound of FIC

Since θ̄ is not available in practice, we cannot evaluate FIC itself. Instead, FAB maximizes an asymptotically-consistent lower bound of FIC. We first derive the lower bound as follows:

Lemma 4 Let us define L(a, b) ≡ log b + (a − b)/b. FIC(x^N, M) is lower bounded as follows:

FIC(x^N, M) ≥ G(q, q̃, θ, x^N),   (11)
G(q, q̃, θ, x^N) ≡ Σ_{z^N} q(z^N) { log p(x^N, z^N|θ) − (D_α/2) log N − Σ_{c=1}^C (D_c/2) L(Σ_n z_{nc}, Σ_n q̃(z_{nc})) − log q(z^N) },

with arbitrary choices of q, θ, and a distribution q̃ on z^N (Σ_n q̃(z_{nc}) > 0).

Sketch of Proof 4 Since θ̄ is the ML estimator of log p(x^N, z^N|θ), log p(x^N, z^N|θ̄) ≥ log p(x^N, z^N|θ) is satisfied. Further, on the basis of the concavity of the logarithm function, log(Σ_n z_{nc}) ≤ L(Σ_n z_{nc}, Σ_n q̃(z_{nc})) is satisfied. Substituting these two inequalities into (10) yields (11).

In addition to the replacement of the unavailable estimator θ̄ with an arbitrary θ, the computationally intractable log(Σ_n z_{nc}) is linearized around Σ_n q̃(z_{nc}) with a new parameter (distribution) q̃. Now our problem is to solve the following maximization problem (note that q, θ, and q̃ are functions of M):

M*, q*, θ*, q̃* = arg max_{M, q, θ, q̃} G(q, q̃, θ, x^N).   (12)

The following lemma gives us a guide for optimizing the newly introduced distribution q̃. We omit the proof because of space limitations.

Lemma 5 If we fix q and θ, then q̃ = q maximizes G(q, q̃, θ, x^N).

With a finite number of model candidates M, we can use a two-stage algorithm in which we first solve the maximization of (12) for all candidates in M and then select the best model. When we optimize only the number of components C (e.g., as in a GMM), we need to solve the inner maximization C_max times (the maximum number of components searched). However, if we intend to optimize S_1, ..., S_C (e.g., as in PCM or an autoregressive mixture), we must avoid a combinatorial scalability issue.
FAB provides a natural way to do this because the objective function separates into independent parts in terms of S_1, ..., S_C.

4.2 Iterative Optimization Algorithm

Let us first fix C and consider the following optimization problem:

S*, q*, θ*, q̃* = arg max_{S, q, θ, q̃} G(q, q̃, θ, x^N),   (13)

where S = (S_1, ..., S_C). As the inner maximization, FAB solves (13) on the basis of iterations of two sub-steps (V-step and M-step). Let the superscript (t) represent the t-th iteration.

V-step The V-step optimizes the variational distribution q as follows:

q^(t) = arg max_q { G(q, q̃ = q^(t−1), θ^(t−1), x^N) }.   (14)

More specifically, q^(t) is obtained as follows:

q^(t)(z_{nc}) ∝ α_c^(t−1) p_c(x_n|φ_c^(t−1)) exp(−D_c / (2 α_c^(t−1) N)),   (15)

where we use Σ_n q^(t−1)(z_{nc}) = α_c^(t−1) N. The regularization terms which we discussed in Section 3 appear here as exponentiated update terms, i.e., exp(−D_c / (2 α_c^(t−1) N)). Roughly speaking, (15) indicates that smaller and more complex components are likely to become smaller through the iterations. The V-step differs from the E-step of the EM algorithm in the term exp(−D_c / (2 α_c^(t−1) N)). This makes an essential difference between FAB inference and ML estimation with the EM algorithm (Sections 4.4 and 4.5).

M-step The M-step optimizes the models of the individual components S_c and the parameter θ as follows:

S^(t), θ^(t) = arg max_{S, θ} G(q^(t), q̃ = q^(t), θ, x^N).   (16)

G(q, q̃, θ, x^N) has a significant advantage in avoiding the combinatorial scalability issue in S selection. Since G(q, q̃, θ, x^N) has no cross term between components (if we fix q), we can separately optimize S_c and θ as follows:

α_c^(t) = Σ_n q^(t)(z_{nc}) / N,   (17)
S_c^(t), φ_c^(t) = arg max_{S_c, φ_c} H_c(q^(t), q̃^(t), φ_c, x^N),   (18)
H_c(q^(t), q̃^(t), φ_c, x^N) = Σ_n q^(t)(z_{nc}) log p_c(x_n|φ_c) − (D_c/2) L(Σ_n q^(t)(z_{nc}), Σ_n q̃^(t)(z_{nc})).
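As an illustration, the V-step update (15) for a one-dimensional GMM can be sketched as follows (Python; function and variable names are hypothetical, not from the paper):

```python
import numpy as np

def fab_v_step(x, alpha, mu, var, d_c):
    """FAB V-step, Eq. (15), for a 1-D GMM: the EM E-step responsibilities
    multiplied by the shrinkage factor exp(-D_c / (2 alpha_c N)).

    x : (N,) data; alpha, mu, var, d_c : (C,) per-component parameters.
    Returns the (N, C) matrix of normalized responsibilities q(z_nc).
    """
    n = len(x)
    # Gaussian densities p_c(x_n | phi_c), one column per component
    dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    # unnormalized update with the exponentiated regularization term
    unnorm = alpha * dens * np.exp(-d_c / (2.0 * alpha * n))
    return unnorm / unnorm.sum(axis=1, keepdims=True)   # normalize over c
```

Setting d_c = 0 recovers the ordinary EM E-step; a positive d_c exponentially down-weights components whose mixture ratio α_c is small relative to D_c/N.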

H_c(q^(t), q̃^(t), φ_c, x^N) is the part of G(q^(t), q̃^(t), θ, x^N) related to the c-th component. With a finite set of component candidates, in (18), we first optimize φ_c for each element of a fixed S_c and then select the optimal one by comparing them. Note that, if we use a single component type (e.g., GMM), our M-step eventually becomes the same as that in the EM algorithm.

4.3 Convergence and Consistency

After the t-th iteration, FIC is lower bounded as:

FIC(x^N, M) ≥ FIC_LB^(t)(x^N, M) ≡ G(q^(t), q̃^(t), θ^(t), x^N).   (19)

We have the following theorem, which guarantees that FAB monotonically increases FIC_LB^(t)(x^N, M) over the iterative optimization and converges to a local maximum under certain regularity conditions.

Theorem 6 For the iteration of the V-step and the M-step, the following inequality is satisfied:

FIC_LB^(t)(x^N, M) ≥ FIC_LB^(t−1)(x^N, M).   (20)

Sketch of Proof 6 The theorem is proved as follows: G(q^(t), q̃^(t), θ^(t), x^N) ≥ G(q^(t), q̃^(t), θ^(t−1), x^N) ≥ G(q^(t), q̃^(t−1), θ^(t−1), x^N) ≥ G(q^(t−1), q̃^(t−1), θ^(t−1), x^N). The first and the third inequalities arise from (16) and (14), respectively. The second inequality arises from Lemma 5.

We then use the following stopping criterion for the FAB iteration steps:

FIC_LB^(t)(x^N, M) − FIC_LB^(t−1)(x^N, M) ≤ ε,   (21)

where ε is an optimization tolerance parameter and is set to 1e-6 in Section 5.

In addition to the monotonic property of FAB, we present the theorem below, which asymptotically supports FAB. Let us denote M^(t) = (C, S^(t)), and let the superscripts T and * represent the number of steps at convergence and the true model/parameters, respectively.

Theorem 7 FIC_LB^(T)(x^N, M^(T)) is asymptotically consistent with FIC(x^N, M^(T)).

Sketch of Proof 7 1) In (15), exp(−D_c / (2 α_c^(T) N)) converges to one, and S^(T−1) = S^(T) holds without loss of generality. Therefore, the FAB algorithm asymptotically reduces to the EM algorithm.
This means that θ^(T) converges to the ML estimator θ̂ of log p(x^N|θ). Then, Σ_{z^N} q^(T)(z^N) (log p(x^N, z^N|θ̄) − log p(x^N, z^N|θ^(T))) / N → 0 holds. 2) On the basis of the law of large numbers, Σ_n z_{nc}/N converges to α_c*. Then, Σ_{z^N} q^(T)(z^N) Σ_{c=1}^C (log(Σ_n z_{nc}) − log(Σ_n q^(T)(z_{nc}))) / N → 0 holds. Substituting 1) and 2) into (11) proves the theorem.

The following theorem arises from Theorems 3 and 7 and guarantees that FAB is asymptotically capable of evaluating the marginal log-likelihood itself:

Theorem 8 FIC_LB^(T)(x^N, M^(T)) is asymptotically consistent with log p(x^N|M^(T)).

4.4 Shrinkage Mechanism

As noted in Sections 3 and 4.2, the term exp(−D_c / (2 α_c^(t) N)) in (15), which arises from Σ_c (D_c/2) Σ_{z^N} q(z^N) log(Σ_n z_{nc}) in (10), plays a significant role in the control of model complexity. Fig. 1 illustrates the shrinkage effects of FAB in the case of a GMM. For D_c = 10 and N = 50, for example, a component with α_c < 0.2 is severely regularized, and such components are automatically shrunk (see the left graph). With increasing N, the shrinkage effect decreases, and small components manage to survive in the FAB optimization process. The right graph shows the shrinkage effect in terms of the dimensionality. In a low dimensional space (e.g., D_c = 1, 3), small components are not strictly regularized. For D_c = 20 and N = 100, however, a component with α_c < 0.4 may be shrunk, and only one component might survive because of the constraint Σ_{c=1}^C α_c = 1.

Let us consider the gradient of exp(−D_c / (2 α_c N)), which is derived as ∂ exp(−D_c / (2 α_c N)) / ∂α_c = (D_c / (2 α_c² N)) exp(−D_c / (2 α_c N)) ≥ 0. This indicates that the iteration of FAB accelerates the shrinkage in the order of O(α_c²). More intuitively speaking, the smaller a component is, the more likely it is to be shrunk. The above observations explain the overfitting mitigation mechanism in FAB. The shrinkage effect of FAB is different from the prior-based regularization of the MAP estimation.
In fact, it appears despite our asymptotically ignoring the prior effect, and even if a non-informative prior or a flat prior is applied. In terms of practical implementation, α_c does not reach exactly zero because (15) takes an exponentiated update. Therefore, in our FAB procedure, we apply a thresholding-based shrinkage of small components after the V-step, as follows:

q^(t)(z_{nc}) ← 0 if Σ_n q^(t)(z_{nc}) < δ, and q^(t)(z_{nc}) ← q^(t)(z_{nc}) / Q_n^(t) otherwise,   (22)
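A minimal sketch of the thresholding operation (22) (Python; names hypothetical — as a design choice, shrunk components are dropped from the responsibility matrix and the remaining rows renormalized, which is equivalent to setting their q(z_nc) to zero):

```python
import numpy as np

def shrink_components(q, delta):
    """Thresholding-based shrinkage, Eq. (22): remove components whose
    effective size sum_n q_nc falls below delta, then renormalize each row.

    q : (N, C) responsibilities; delta : threshold (e.g., 0.01 * N).
    Returns the pruned (N, C') responsibilities and the boolean keep mask.
    """
    keep = q.sum(axis=0) >= delta            # effective sizes vs. threshold
    q = q[:, keep]                           # drop shrunk components
    return q / q.sum(axis=1, keepdims=True), keep
```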

Figure 1: Shrinkage effects of FAB in a GMM (D_c = D + D(D+1)/2, where D and D_c are, respectively, the data dimensionality and the parameter dimensionality). The horizontal and vertical axes represent α_c and exp(−D_c / (2 α_c N)), respectively. The effect is evident when the number of data is not sufficient for the parameter dimensionality.

Algorithm 1 FAB_two: Two-stage FAB
input: x^N, C_max, S, ε
output: C*, S*, θ*, q*(z^N), FIC*_LB
1: FIC*_LB = −∞
2: for C = 1, ..., C_max do
3:   Calculate FIC_LB^(T), C, S^(T), θ^(T), and q^(T)(z^N) by FAB_shrink(x^N, C_max = C, S, ε, δ = 0).
4: end for
5: Choose C*, S*, θ*, and q*(z^N) by (12).

δ and Q_n^(t) are a threshold value and a normalization constant ensuring Σ_{c=1}^C q^(t)(z_{nc}) = 1, respectively. We used the threshold value δ = 0.01N in Section 5. With the shrinkage operation, FAB does not require the two-stage optimization in (12). Starting from a sufficiently large number of components C_max, FAB iteratively and simultaneously optimizes all of M = (C, S), θ, and q. Since FAB cannot revive shrunk components, it is a greedy algorithm, like the least angle regression method [9], which does not revive shrunk features. The two-stage algorithm and the shrinkage algorithm are shown here as Algorithms 1 and 2. S is the set of component candidates (e.g., S = {Gaussian} for a GMM; S = {0, ..., K_max} for PCM, where K_max is the maximum degree of the curves).

4.5 Identifiability

A well-known difficulty in ML estimation for mixture models is non-identifiability, as we have noted in Section 1. Let us consider a simple mixture model, p(X|a, b, c) = a p(X|b) + (1 − a) p(X|c). All models p(X|a, b, c = b) with arbitrary a are equivalent (i.e., in an equivalent class). Therefore, the ML estimator is not unique. Theoretical studies have shown that Bayesian estimators and VB estimators avoid the above issue and significantly outperform the ML estimator in terms of generalization error [17, 21, 24].
Algorithm 2 FAB_shrink: Shrinkage FAB
input: x^N, C_max, S, ε, δ
output: FIC_LB^(T), C, S^(T), θ^(T), q^(T)(z^N)
1: t = 0, FIC_LB^(0) = −∞
2: randomly initialize q^(0)(z^N).
3: while not converged do
4:   Calculate S^(t) and θ^(t) by (17) and (18).
5:   Check convergence by (21).
6:   Calculate q^(t+1) by (15).
7:   Shrink components by (22).
8:   t = t + 1
9: end while
10: T = t

In FAB, at the stationary (convergence) point of (15), the following equality is satisfied: Σ_n q*(z_{nc}) = α_c* N = Σ_n α_c* p_c(x_n|φ_c*) exp(−D_c / (2 α_c* N)) / Z_n, where Z_n = Σ_{c=1}^C α_c* p_c(x_n|φ_c*) exp(−D_c / (2 α_c* N)) is the normalization constant. For all pairs α_c*, α_{c'}* > 0, we have the following condition: Σ_n p_c(x_n|φ_c*) exp(−D_c / (2 α_c* N)) / Z_n = Σ_n p_{c'}(x_n|φ_{c'}*) exp(−D_{c'} / (2 α_{c'}* N)) / Z_n. Then, the necessary condition for φ_c = φ_{c'} is α_c = α_{c'}. Therefore, within an equivalent class, FAB chooses the model which maximizes the entropy of q and addresses the non-identifiability issue.

Unfortunately, FIC and FAB do not resolve another non-identifiability issue, i.e., divergence of the criterion. Without assumption A2, (10) and (11) can diverge to infinity (e.g., a GMM with a zero-variance component⁴). We can address this issue by applying the Laplace approximation around the MAP estimator θ_MAP of p(x^N, z^N|θ)p(θ) rather than around the ML estimator θ̄. In such a case, a flat conjugate prior might be used in order to avoid the divergence. While such priors do not provide regularization effects, FAB has an overfitting mitigation mechanism, as discussed above. Because of space limitations, we omit a detailed discussion of FIC and FAB with prior distributions.

5 Experiments

5.1 Polynomial Curve Mixture

We first evaluated FAB_shrink on PCM to demonstrate how its model selection works. We used the settings N = 300 and C_max = K_max = 10 in this evaluation⁵. Let Y be a random dependent variable of X.

⁴ Practically speaking, with the shrinkage (22), FAB does not have the divergence issue because small components are automatically removed.
⁵ Larger values of C_max and K_max did not make a large difference in the results, but they were hard to visualize.
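Putting the pieces together, a compact sketch of FAB_shrink (Algorithm 2) for a one-dimensional GMM might look as follows (Python; all names hypothetical; as simplifications, we monitor the observed log-likelihood rather than FIC_LB^(t) in the stopping rule (21), and each Gaussian component has D_c = 2):

```python
import numpy as np

def fab_gmm_1d(x, c_max=10, eps=1e-6, delta_frac=0.01, max_iter=500, seed=0):
    """Sketch of FAB_shrink (Algorithm 2) for a 1-D GMM with D_c = 2."""
    rng = np.random.default_rng(seed)
    n = len(x)
    q = rng.dirichlet(np.ones(c_max), size=n)   # random init of q(z_n)
    d_c = 2.0
    prev = -np.inf
    for _ in range(max_iter):
        # M-step (17)-(18): weighted ML fit per component (a single component
        # type, so this coincides with the EM M-step, as noted in Section 4.2)
        w = q.sum(axis=0)
        alpha = w / n
        mu = (q * x[:, None]).sum(axis=0) / w
        var = (q * (x[:, None] - mu) ** 2).sum(axis=0) / w + 1e-8
        # V-step (15): E-step responsibilities times the shrinkage factor
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        unnorm = alpha * dens * np.exp(-d_c / (2 * alpha * n))
        q = unnorm / unnorm.sum(axis=1, keepdims=True)
        # shrinkage (22): drop components below the threshold delta = 0.01 N
        keep = q.sum(axis=0) >= delta_frac * n
        q = q[:, keep]
        q = q / q.sum(axis=1, keepdims=True)
        # stopping rule (21), using the log-likelihood as a simple surrogate
        ll = np.log((alpha[keep] * dens[:, keep]).sum(axis=1)).sum()
        if abs(ll - prev) <= eps:
            break
        prev = ll
    return q.shape[1], alpha[keep], mu[keep], var[keep]
```

Starting from C_max components, C, θ, and q are optimized simultaneously; components that lose effective mass are greedily removed and never revived, mirroring the algorithm's greedy character.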

Figure 2: Model selection procedure of FAB_shrink applied to PCM. The true curves are: 1) Y = 5 + ε, 2) Y = X² + ε, 3) Y = X + ε, and 4) Y = 0.5X³ + 2X + ε, where ε ∼ N(0, 1). The top shows the change in FIC_LB^(t) over iterations t. The bottom three graphs are the estimated curves at t = 1, 6, and 44.

PCM has mixed components p_c(Y|X, φ_c) = N(Y | X(S_c)β_c, σ_c²), where X(S_c) = (1, X, ..., X^{S_c}). The true model has the four curves shown in Fig. 2. We used the setting α* = (0.3, 0.2, 0.3, 0.2) for curves 1) - 4). x^N was uniformly sampled from [−5, 5]. Fig. 2 illustrates FIC_LB^(t) (top) and intermediate estimation results (bottom). From the top figure, we can confirm that FAB monotonically increases FIC_LB^(t), except at the points (dashed circles) of the shrinkage operation. In the initial state t = 1 (bottom left), q^(0)(z^N) is randomly initialized, and therefore the degrees of all ten curves are zero (S_c = 0). Over the iterations, FAB_shrink simultaneously searches over C, S, and θ. For example, at t = 15 (bottom center), we have one curve with S_c = 9, two curves with S_c = 2, and six curves with S_c = 0. Here two curves have already been shrunk. Finally, FAB_shrink selects the model consistent with the true model at t = 44 (bottom right). These results demonstrate the powerful simultaneous model selection procedure of FAB_shrink.

5.2 Gaussian Mixture Model

We applied FAB_two and FAB_shrink to GMM on artificial datasets and UCI datasets [10], and compared them with the variational Bayesian EM algorithm for GMM (VBEM) described in Section 10.2 of [3] and the variational Bayesian Dirichlet process GMM (VBDP) [5]. The former and the latter use a Dirichlet prior and a Dirichlet process prior for α, respectively. We used the implementations by M. E. Khan⁶ for VBEM and by K. Kurihara⁷ for VBDP. We used the default hyperparameter values in the software.
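The per-component model selection in the M-step (18), which the PCM experiment exercises over polynomial degrees, can be sketched as follows (Python; names hypothetical; we evaluate the penalty at q̃ = q, where by Lemma 5 the L-term reduces to log Σ_n q(z_nc)):

```python
import numpy as np

def select_degree(x, y, w, k_max):
    """M-step model selection, Eq. (18), for one PCM component: fit a
    weighted polynomial for each candidate degree k in S = {0, ..., K_max}
    and keep the degree maximizing the weighted log-likelihood minus the
    (D_c / 2) log(sum_n w) penalty (the L-term at q~ = q)."""
    n_eff = w.sum()
    best, best_score = 0, -np.inf
    for k in range(k_max + 1):
        phi = np.vander(x, k + 1, increasing=True)     # (1, x, ..., x^k)
        # weighted least squares for the coefficient vector beta
        wphi = phi * w[:, None]
        beta = np.linalg.solve(phi.T @ wphi + 1e-9 * np.eye(k + 1), wphi.T @ y)
        resid = y - phi @ beta
        var = (w * resid ** 2).sum() / n_eff           # weighted noise variance
        ll = -0.5 * n_eff * (np.log(2 * np.pi * var) + 1.0)
        d_c = k + 2                                    # k+1 coefficients + variance
        score = ll - 0.5 * d_c * np.log(n_eff)
        if score > best_score:
            best, best_score = k, score
    return best
```

Because the objective has no cross terms between components, this search runs independently per component, which is how FAB avoids the combinatorial scalability issue noted in Section 4.1.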
In the following experiments, C_max was set to C_max = 20 for all methods.

5.2.1 Artificial Data

We generated the true model with D = 15, C* = 5, and φ_c* = (µ_c*, Σ_c*) (mean and covariance), where the superscript * denotes the true model. The parameter values were randomly sampled as α_c* ∈ [0.4, 0.6] (before normalization) and µ_c* ∈ [−5, 5]. Σ_c* was generated as a_c Σ_c, where a_c ∈ [0.5, 1.5] is a scale parameter and Σ_c is generated using the gallery function in Matlab. The results are averages over ten runs.

The sufficient number of data is an important measure for evaluating asymptotic methods, and it is usually measured in terms of the empirical convergence of their criteria. Table 1 shows how the values of FIC_LB^(T)/N converge over N. Roughly speaking, N = … was sufficient for convergence (depending on the data dimensionality D). Our results indicate that FAB can be applied in actual practice to recent data analysis scenarios with large N values.

Table 1: Convergence of FIC_LB over data size N. The standard deviations are in parentheses.
N          | …      | …      | …      | …      | …      | …      | …
FAB_two    | (3.51) | (2.51) | (1.19) | (1.05) | (1.10) | (0.50) | (0.39)
FAB_shrink | (2.61) | (1.56) | (1.24) | (1.02) | (0.85) | (0.74) | (0.39)

We next compared the four methods on the basis of three evaluation metrics: 1) the Kullback-Leibler divergence (KL) between the true distribution and the estimated distribution (estimation error), 2) |C − C*| (model selection error), and 3) CPU time. For N ≤ 500, VBDP performed better than or comparably with FAB_two and FAB_shrink w.r.t. KL and |C − C*| (left and middle graphs of Fig. 3). With increasing N, the errors of the FABs significantly decrease while those of VBDP do not. This might be because the lower bound in (1) generally does not reach the marginal log-likelihood, as has been noted. VBEM was significantly inferior on both metrics. Another observation is that VBDP and VBEM were likely to respectively under-fit and over-fit their models, while we did not find such biases in the FABs.

Figure 3: Comparisons of the four methods in terms of KL (left), |C − C*| (middle), and CPU time (right).

While the KLs of the FABs are similar, FAB_shrink performed better in terms of |C − C*| because small components survived in FAB_two. Further, both FABs had a significant advantage over VBDP and VBEM in CPU time (right graph). In particular, while the CPU time of VBDP grew explosively with N, those of the FABs remained in a practical range. Since FAB_shrink does not use a loop to optimize C, it was significantly faster than FAB_two.

5.2.2 UCI Data

For the UCI datasets summarized in Table 2, we do not know the true distributions and therefore employed predictive log-likelihood as the evaluation metric. For the large-scale datasets, we randomly selected two thousand data points for training and used the rest of the data for prediction because of the scalability issue in VBDP.

Table 2: Predictive log-likelihood per datum. The standard deviations are in parentheses. The maximum number of training samples was limited to 2000 (parentheses in the N column), and the rest were used for testing. The best results and those not significantly worse than them are highlighted in boldface (one-sided t-test with 95% confidence).
data                 | N           | D | FAB_two  | FAB_shrink | VBEM        | VBDP
iris                 | …           | … | … (0.76) | … (0.68)   | … (0.64)    | … (0.57)
yeast                | …           | … | … (1.81) | … (3.38)   | 3.52 (3.80) | … (0.72)
cloud                | …           | … | … (2.81) | 9.09 (2.62)| 5.47 (2.62) | 2.67 (2.28)
wine quality         | 6497 (2000) | … | … (0.22) | … (0.17)   | … (0.17)    | … (0.06)
color moments        | … (2000)    | … | … (0.20) | … (0.21)   | … (0.21)    | … (0.11)
cooc texture         | … (2000)    | … | … (0.22) | … (0.46)   | … (0.13)    | 6.78 (0.28)
spoken arabic digits | … (2000)    | … | … (0.11) | … (0.09)   | … (0.09)    | … (0.19)
As shown in Table 2, the FABs gave better performance than VBDP and VBEM with a sufficient number of data (the scale O(10³) agrees with the results in the previous section). The trend of VBDP (under-fitting) and VBEM (over-fitting) was the same as in the previous section. Also, while the predictive log-likelihoods of FAB_two and FAB_shrink are competitive, FAB_shrink obtained more compact models (smaller C values) than FAB_two. Unfortunately, we have no space to show those results here.

6 Summary and Future Work

We have proposed an approximation of the marginal log-likelihood (FIC) and an inference method (FAB) for Bayesian model selection for mixture models, as an alternative to VB inference. We have given their justifications (asymptotic consistency, convergence, etc.) and analyzed the FAB mechanisms in terms of overfitting mitigation (shrinkage) and identifiability. Experimental results have shown that FAB outperforms state-of-the-art VB methods for a practical number of data in terms of both model selection performance and computational efficiency.

Our next step is to extend FAB to more general non-regular and non-identifiable models which contain continuous latent variables (e.g., factor analyzer models [12] and matrix factorization models [17]), time-dependent latent variables (e.g., hidden Markov models [16]), hierarchical latent variables [14, 4], etc. We believe that our idea, a factorized representation of the marginal log-likelihood in which the asymptotic approximation works, is widely applicable to them.

References

[1] T. Ando. Bayesian Model Selection and Statistical Modeling. Chapman and Hall/CRC, 2010.
[2] M. Beal. Variational Algorithms for Approximate Bayesian Inference. PhD thesis, Gatsby Computational Neuroscience Unit, University College London, 2003.
[3] C. M. Bishop. Pattern Recognition and Machine Learning. Springer-Verlag, 2006.
[4] C. M. Bishop and M. E. Tipping. A hierarchical latent variable model for data visualization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3), 1998.
[5] D. M. Blei and M. I. Jordan. Variational inference for Dirichlet process mixtures. Bayesian Analysis, 1(1), 2006.
[6] A. Corduneanu and C. M. Bishop. Variational Bayesian model selection for mixture distributions. In Proceedings of Artificial Intelligence and Statistics, pages 27-34, 2001.
[7] I. Csiszar and P. C. Shields. The consistency of the BIC Markov order estimator. Annals of Statistics, 28(6), 2000.
[8] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, B39(1):1-38, 1977.
[9] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. The Annals of Statistics, 32(2), 2004.
[10] A. Frank and A. Asuncion. UCI machine learning repository, 2010.
[11] R. Fujimaki, Y. Sogawa, and S. Morinaga. Online heterogeneous mixture modeling with marginal and copula selection. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2011.
[12] Z. Ghahramani and M. Beal. Variational inference for Bayesian mixtures of factor analysers. In Advances in Neural Information Processing Systems 12. MIT Press, 2000.
[13] H. Jeffreys. An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society of London, Series A, Mathematical and Physical Sciences, 186, 1946.
[14] M. I. Jordan and R. A. Jacobs. Hierarchical mixtures of experts and the EM algorithm. Neural Computation, 6:181-214, 1994.
[15] K. Kurihara, M. Welling, and Y. W. Teh. Collapsed variational Dirichlet process mixture models. In Proceedings of the Twentieth International Joint Conference on Artificial Intelligence, 2007.
[16] D. J. MacKay. Ensemble learning for hidden Markov models. Technical report, University of Cambridge, 1997.
[17] S. Nakajima and M. Sugiyama. Theoretical analysis of Bayesian matrix factorization. Journal of Machine Learning Research, 12, 2011.
[18] G. Schwarz. Estimating the dimension of a model. The Annals of Statistics, 6(2):461-464, 1978.
[19] M. Stone. Comments on model selection criteria of Akaike and Schwarz. Journal of the Royal Statistical Society: Series B, 41(2), 1979.
[20] Y. W. Teh, D. Newman, and M. Welling. A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation. In Advances in Neural Information Processing Systems, 2007.
[21] K. Watanabe and S. Watanabe. Stochastic complexities of Gaussian mixtures in variational Bayesian approximation. Journal of Machine Learning Research, 7:625-644, 2006.
[22] S. Watanabe. Algebraic analysis for nonidentifiable learning machines. Neural Computation, 13:899-933, 2001.
[23] S. Watanabe. Algebraic Geometry and Statistical Learning. Cambridge University Press, 2009.
[24] K. Yamazaki and S. Watanabe. Mixture models and upper bounds of stochastic complexity. Neural Networks, 16, 2003.


More information

Taste for variety and optimum product diversity in an open economy

Taste for variety and optimum product diversity in an open economy Taste for variety and optimum produt diversity in an open eonomy Javier Coto-Martínez City University Paul Levine University of Surrey Otober 0, 005 María D.C. Garía-Alonso University of Kent Abstrat We

More information

SINCE Zadeh s compositional rule of fuzzy inference

SINCE Zadeh s compositional rule of fuzzy inference IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 14, NO. 6, DECEMBER 2006 709 Error Estimation of Perturbations Under CRI Guosheng Cheng Yuxi Fu Abstrat The analysis of stability robustness of fuzzy reasoning

More information

The gravitational phenomena without the curved spacetime

The gravitational phenomena without the curved spacetime The gravitational phenomena without the urved spaetime Mirosław J. Kubiak Abstrat: In this paper was presented a desription of the gravitational phenomena in the new medium, different than the urved spaetime,

More information

Simplification of Network Dynamics in Large Systems

Simplification of Network Dynamics in Large Systems Simplifiation of Network Dynamis in Large Systems Xiaojun Lin and Ness B. Shroff Shool of Eletrial and Computer Engineering Purdue University, West Lafayette, IN 47906, U.S.A. Email: {linx, shroff}@en.purdue.edu

More information

Market Segmentation for Privacy Differentiated Free Services

Market Segmentation for Privacy Differentiated Free Services 1 Market Segmentation for Privay Differentiated Free Servies Chong Huang, Lalitha Sankar arxiv:1611.538v [s.gt] 18 Nov 16 Abstrat The emerging marketplae for online free servies in whih servie providers

More information

Exploring the feasibility of on-site earthquake early warning using close-in records of the 2007 Noto Hanto earthquake

Exploring the feasibility of on-site earthquake early warning using close-in records of the 2007 Noto Hanto earthquake Exploring the feasibility of on-site earthquake early warning using lose-in reords of the 2007 Noto Hanto earthquake Yih-Min Wu 1 and Hiroo Kanamori 2 1. Department of Geosienes, National Taiwan University,

More information

Solving Constrained Lasso and Elastic Net Using

Solving Constrained Lasso and Elastic Net Using Solving Constrained Lasso and Elasti Net Using ν s Carlos M. Alaı z, Alberto Torres and Jose R. Dorronsoro Universidad Auto noma de Madrid - Departamento de Ingenierı a Informa tia Toma s y Valiente 11,

More information

A Functional Representation of Fuzzy Preferences

A Functional Representation of Fuzzy Preferences Theoretial Eonomis Letters, 017, 7, 13- http://wwwsirporg/journal/tel ISSN Online: 16-086 ISSN Print: 16-078 A Funtional Representation of Fuzzy Preferenes Susheng Wang Department of Eonomis, Hong Kong

More information