Model-based Clustering by Probabilistic Self-organizing Maps


IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. XX, NO. X, 1

Model-based Clustering by Probabilistic Self-organizing Maps

Shih-Sian Cheng, Hsin-Chia Fu, Member, IEEE, and Hsin-Min Wang, Member, IEEE

Abstract—In this paper, we consider the learning process of a probabilistic self-organizing map (PbSOM) as a model-based data clustering procedure that preserves the topological relationships between data clusters in a neural network. Based on this concept, we develop a coupling-likelihood mixture model for the PbSOM that extends the reference vectors in Kohonen's SOM to multivariate Gaussian distributions. We also derive three EM-type algorithms, called the SOCEM, SOEM, and SODAEM algorithms, for learning the model (PbSOM) based on the maximum likelihood criterion. SOCEM is derived by using the classification EM (CEM) algorithm to maximize the classification likelihood; SOEM is derived by using the EM algorithm to maximize the mixture likelihood; and SODAEM is a deterministic annealing (DA) variant of SOCEM and SOEM. Moreover, by shrinking the neighborhood size, SOCEM and SOEM can be interpreted, respectively, as DA variants of the CEM and EM algorithms for Gaussian model-based clustering. The experiment results show that the proposed PbSOM learning algorithms achieve comparable data clustering performance to that of the deterministic annealing EM (DAEM) approach, while maintaining the topology-preserving property.

Index Terms—model-based clustering, self-organizing map (SOM), probabilistic self-organizing map (PbSOM), EM algorithm, DAEM algorithm, CEM algorithm.

I. INTRODUCTION

In model-based clustering, data samples are grouped by learning a mixture model (usually a Gaussian mixture model) in which each mixture component represents a group or cluster. There are two major learning methods for model-based clustering: the mixture likelihood approach, where the likelihood of each data sample is a mixture of all the component likelihoods of the data sample; and the classification likelihood approach, where the likelihood of each data sample is generated by its winning component only [1], [2], [3], [4], [5], [6], [7], [8], [9].
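To make the contrast between the two criteria concrete, the following sketch (our own illustrative code, not from the paper; all function names are ours) computes both log-likelihoods for a toy 1-D Gaussian mixture. The mixture criterion sums over all components per sample, while the classification criterion credits each sample only to its winning component:

```python
import numpy as np

def gauss_pdf(x, mu, var):
    # 1-D Gaussian density N(x; mu, var)
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def mixture_loglik(X, w, mu, var):
    # Mixture likelihood: each sample's likelihood is a weighted sum
    # over ALL components.
    comp = w[None, :] * gauss_pdf(X[:, None], mu[None, :], var[None, :])
    return np.sum(np.log(comp.sum(axis=1)))

def classification_loglik(X, w, mu, var):
    # Classification likelihood: each sample is credited only to its
    # winning (maximum-posterior) component.
    comp = w[None, :] * gauss_pdf(X[:, None], mu[None, :], var[None, :])
    return np.sum(np.log(comp.max(axis=1)))

# Arbitrary toy setup: two well-separated unit-variance components.
X = np.array([-5.2, -4.9, 4.8, 5.1])
w = np.array([0.5, 0.5])
mu = np.array([-5.0, 5.0])
var = np.array([1.0, 1.0])
# A sum over components is at least its maximum term, so the mixture
# log-likelihood always upper-bounds the classification one.
print(mixture_loglik(X, w, mu, var) >= classification_loglik(X, w, mu, var))
```

For well-separated clusters the two criteria nearly coincide, since the winning component dominates each sample's mixture sum.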
In both approaches, when the globally optimal estimation of the model parameters cannot be obtained analytically, iterative learning algorithms that only guarantee obtaining locally optimal solutions are usually employed. The expectation-maximization (EM) algorithm for mixture likelihood learning [10], [11] and the classification EM (CEM) algorithm for classification likelihood learning [8] are two such algorithms. However, a critical aspect of the EM and CEM algorithms is that their learning performance is very sensitive to the initial conditions of the model's parameters. To address this issue, Ueda and Nakano [12] proposed a deterministic annealing EM (DAEM) algorithm that tackles the initialization issue via a deterministic annealing process, which performs robust optimization based on an analogy to the cooling of a system in statistical physics. Some heuristic-like learning algorithms have also been proposed. For example, in [13], the authors propose an algorithm that finds the appropriate initial conditions for EM learning by using split and merge operations. Another method, proposed in [14], overcomes the initialization issue of EM by iteratively splitting the mixture components using the Bayesian Information Criterion as the splitting validity measure.

Manuscript received November 2007; revised November 2008. This work was supported in part by the National Science Council of Taiwan, R.O.C., under Grants NSC E--24-Y3 and NSC96-33-H. S.-S. Cheng is with the Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan, R.O.C., and also with the Institute of Information Science, Academia Sinica, Taipei, Taiwan, R.O.C. (e-mail: sscheng@iis.sinica.edu.tw). H.-C. Fu is with the Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan, R.O.C. (e-mail: hcfu@csie.nctu.edu.tw). H.-M. Wang is with the Institute of Information Science, Academia Sinica, Taipei, Taiwan, R.O.C. (e-mail: whm@iis.sinica.edu.tw).
In addition to the initialization issue of the learning algorithms, conventional model-based clustering suffers from another limitation in that it cannot preserve the topological relationships among clusters after the clustering procedure. To overcome this shortcoming, the clustering task can be performed by using Kohonen's self-organizing map (SOM) [15], [16], a well-known neural network model for data clustering and visualization. After the clustering procedure, the topological relationships among data clusters can be preserved (or visualized) on the network, which is usually a two-dimensional lattice. Kohonen's sequential and batch SOM learning algorithms have proved successful in many practical applications [15], [16]. However, they also suffer from some shortcomings, such as the lack of an objective (cost) function, a general proof of convergence, and a probabilistic framework [17]. The following are some related works that have addressed these issues. In [18], [19], the behavior of Kohonen's sequential learning algorithm was studied in terms of energy functions, based on which Cheng [20] proposed an energy function for SOM whose parameters can be learned by a K-means type algorithm. Luttrell [21], [22] proposed a noisy vector quantization model called the topographic vector quantizer (TVQ), whose training process coincides with the learning process of SOM. The cost function of TVQ represents the topographic distortion between the input data and the output code vectors in terms of Euclidean distance. Graepel et al. [23], [24] derived a soft topographic vector quantization (STVQ) algorithm by applying a deterministic annealing process to the optimization of TVQ's cost function. Based on the topographic distortion concept, Heskes [25] applied a different DA implementation from that of STVQ, and obtained an algorithm identical to STVQ when the quantization error is

expressed in terms of Euclidean distance. In [26], the authors proposed an on-line algorithm for STVQ; later, motivated by STVQ, they proposed a data visualization method that integrates SOM and multi-dimensional scaling [27]. Based on the Bayesian analysis of SOMs in [28], Anouar et al. [29] proposed a probabilistic formalism for SOM, where the parameters are learned by a K-means type algorithm. To help users select the correct model complexity for SOM by probabilistic assessment, Lampinen and Kostiainen [30] developed a generative model in which the SOM is trained by Kohonen's algorithm. Meanwhile, Van Hulle [31] developed a kernel-based topographic formation in which the parameters are adjusted to maximize the joint entropy of the kernel outputs. He subsequently developed a new algorithm with heteroscedastic Gaussian mixtures that allows for a unified account of vector quantization, log-likelihood, and Kullback-Leibler divergence [32]. Another probabilistic formulation is proposed in [33], whereby a normalized neighborhood function of SOM is used as the posterior distribution in the E-step of the EM algorithm for a mixture model to enforce the self-organizing of the mixture components. Sum et al. [34] interpreted Kohonen's sequential learning algorithm in terms of maximizing the local correlations (coupling energies) between neurons and their neighborhoods for the given input data. They then proposed an energy function for SOM that reveals the correlations, and a gradient ascent learning algorithm for the energy function.

In Kohonen's SOM architecture, neurons in the network associate with reference vectors in the data space. This contrasts with a SOM whose neurons associate with reference models that represent probability distributions, such as the isotropic Gaussians used in [33] and the heteroscedastic Gaussians used in [29], [32]. In this paper, we call the latter a probabilistic SOM (PbSOM). Motivated by the coupling energy concept in Sum et al.
's work [34], we develop a coupling-likelihood mixture model for the PbSOM that uses multivariate Gaussian distributions as the reference models. In the proposed model, local coupling energies between neurons and their neighborhoods are expressed in terms of probabilistic likelihoods, and each mixture component expresses the local coupling-likelihood between one neuron and its neighborhood. Based on this model, we develop CEM, EM, and DAEM algorithms for learning PbSOMs, namely the SOCEM, SOEM, and SODAEM algorithms, respectively. Because they inherit the properties of the EM and CEM algorithms, the proposed algorithms are characterized by reliable convergence, low cost per iteration, economy of storage, and ease of programming. From our experiments on the self-organizing property, we observe that SOEM is less sensitive to the initialization of the parameters than SOCEM when using a small, fixed neighborhood, while SODAEM overcomes the initialization problem of SOCEM and SOEM through an annealing process. Furthermore, we show that SOCEM and SOEM can be interpreted, respectively, as deterministic annealing variants of the CEM and EM algorithms for Gaussian model-based clustering, where the neighborhood shrinking is interpreted as an annealing process. We conducted experiments on data sets from the UCI Machine Learning Database Repository [35]. The experiment results show that the proposed PbSOM learning algorithms achieve comparable data clustering performance to the DAEM algorithm, while maintaining the topology-preserving property.

The remainder of the paper is organized as follows. In Sec. II, we review the EM, CEM, and DAEM algorithms for model-based clustering. Then, the proposed coupling-likelihood mixture model and the SOCEM, SOEM, and SODAEM algorithms are described in Sec. III. The experiment results are detailed in Sec. IV. The differences and relations between the proposed algorithms and other related algorithms are discussed in Sec. V. We then present our conclusions in Sec. VI.

II. THE EM, CEM, AND DAEM ALGORITHMS FOR MODEL-BASED CLUSTERING

A.
The mixture likelihood approach and the EM algorithm

In the mixture likelihood approach for model-based clustering, it is assumed that the given data set X = {x_1, x_2, …, x_N} ⊂ R^d is generated by a set of independently and identically distributed (i.i.d.) random vectors from a mixture model:

  p(x_i; Θ) = Σ_{k=1}^M w(k) p(x_i; θ_k),   (1)

where w(k) is the mixing weight of the mixture component p(x_i; θ_k), subject to w(k) ≥ 0 for k = 1, 2, …, M and Σ_{k=1}^M w(k) = 1; and θ_k denotes the parameter set of p(x_i; θ_k). The maximum likelihood estimate of the parameter set of the mixture model, Θ̂ = {ŵ(1), ŵ(2), …, ŵ(M), θ̂_1, θ̂_2, …, θ̂_M}, can be obtained by maximizing the following log-likelihood function:

  L(Θ; X) = Σ_{i=1}^N log p(x_i; Θ) = Σ_{i=1}^N log( Σ_{k=1}^M w(k) p(x_i; θ_k) ).   (2)

This is usually achieved by using the expectation-maximization (EM) algorithm [10], [11]. After learning the mixture model, we derive a partition of X, Ĉ = {Ĉ_1, Ĉ_2, …, Ĉ_M}, by assigning each x_i ∈ X to the mixture component that has the largest posterior probability for x_i, i.e., x_i ∈ Ĉ_j if j = arg max_k p(θ̂_k | x_i; Θ̂).

1) The EM algorithm for mixture models: If the maximum likelihood estimation of the parameters cannot be accomplished analytically, the EM algorithm is normally used as an alternative approach when the given data is incomplete or contains hidden information. In the case of the mixture model, suppose that Θ^(t) denotes the current estimate of the parameter set, and k is the hidden variable that indicates the mixture component from which the observation is generated. The E-step of the EM algorithm then computes the following so-called auxiliary function:

  Q(Θ; Θ^(t)) = Σ_{i=1}^N Σ_{k=1}^M p(k | x_i; Θ^(t)) log p(x_i, k; Θ),   (3)

where

  p(x_i, k; Θ) = w(k) p(x_i; θ_k),   (4)

and

  p(k | x_i; Θ^(t)) = w(k)^(t) p(x_i; θ_k^(t)) / Σ_{j=1}^M w(j)^(t) p(x_i; θ_j^(t))   (5)

denotes the posterior probability of the kth mixture component for x_i with the given Θ^(t). Then, in the following M-step, the Θ^(t+1) that satisfies

  Q(Θ^(t+1); Θ^(t)) = max_Θ Q(Θ; Θ^(t))   (6)

is chosen as the new estimate of the parameter set. By iteratively creating the auxiliary function in Eq. (3) and performing the subsequent maximization step, the EM algorithm is guaranteed to converge to a local maximum of the log-likelihood function in Eq. (2). When Q(Θ; Θ^(t)) cannot be maximized analytically, the M-step is modified to find some Θ^(t+1) such that Q(Θ^(t+1); Θ^(t)) > Q(Θ^(t); Θ^(t)). This type of algorithm, called Generalized EM (GEM), is also guaranteed to converge to a local maximum [10], [11].

B. The classification likelihood approach and the CEM algorithm

In the classification likelihood approach for model-based clustering [6], [7], [8], instead of maximizing the log-likelihood function of the mixture model in Eq. (2), the objective is to find the partition Ĉ = {Ĉ_1, Ĉ_2, …, Ĉ_M} of X and the model parameters that maximize

  L_1(C, {θ_1, θ_2, …, θ_M}; X) = Σ_{k=1}^M Σ_{x_i ∈ C_k} log p(x_i; θ_k),   (7)

or

  L_2(C, Θ; X) = Σ_{k=1}^M Σ_{x_i ∈ C_k} log( w(k) p(x_i; θ_k) ).   (8)

The relation between L_1 and L_2 is

  L_2(C, Θ; X) = L_1(C, {θ_1, θ_2, …, θ_M}; X) + Σ_{k=1}^M |C_k| log w(k),   (9)

where |C_k| denotes the number of samples in C_k. If all the mixture components are equally weighted, Σ_{k=1}^M |C_k| log w(k) becomes a constant, such that L_1 and L_2 are equivalent.

1) The CEM algorithm for mixture models: Celeux and Govaert [8] proposed the Classification EM (CEM) algorithm for estimating the parameter set Θ and partition C. Like the EM algorithm, the CEM algorithm is also an iterative learning approach. In each iteration, CEM inserts a classification step (C-step) between the E-step and M-step of the EM algorithm. In the E-step, the posterior probability of each mixture component is calculated for each data sample.
In the C-step, to obtain the partition Ĉ of the data samples, each sample is assigned to the mixture component that yields the largest posterior probability for that sample. In the M-step, the maximization process is applied to Ĉ_k individually for k = 1, 2, …, M. For example, if a multivariate Gaussian is used as the mixture component, the re-estimated mean vector and covariance matrix are the mean vector and the covariance matrix of the data samples in Ĉ_k, respectively; while the re-estimated mixture weight is |Ĉ_k|/N. From a practical point of view, CEM is a K-means type algorithm that represents the prototypes with probability distributions [8].

C. The DAEM algorithm

In the DAEM algorithm for learning a mixture model [12], the objective is to minimize the following system energy function during the annealing process:

  F_β(Θ; X) = −(1/β) Σ_{i=1}^N log( Σ_{k=1}^M (w(k) p(x_i; θ_k))^β ),   (10)

where 1/β corresponds to the temperature that controls the annealing process. The auxiliary function in this case is

  U_β(Θ; Θ^(t)) = −Σ_{i=1}^N Σ_{k=1}^M f(k | x_i; Θ^(t)) log p(x_i, k; Θ),   (11)

where

  f(k | x_i; Θ^(t)) = (w(k)^(t) p(x_i; θ_k^(t)))^β / Σ_{j=1}^M (w(j)^(t) p(x_i; θ_j^(t)))^β   (12)

is the posterior probability derived by using the maximum entropy principle. Ueda and Nakano [12] showed that F_β(Θ; X) can be iteratively minimized by iteratively minimizing U_β(Θ; Θ^(t)). When using DAEM to learn a mixture model, β is initialized with a small value (less than 1) such that the energy function itself is simple enough to be optimized. Then, the value of β is gradually increased to 1. During the learning process, the parameters learned in the current learning phase are used as the initial parameters of the next phase. In the case of β = 1, F_β(Θ; X) and U_β(Θ; Θ^(t)) are the negatives of the log-likelihood function in Eq. (2) and the Q-function in Eq. (3), respectively; thus, minimizing F_β(Θ; X) is equivalent to maximizing the log-likelihood function. According to [12], Eq.
(10) can be rewritten as

  F_β(Θ; X) = U_β(Θ) − (1/β) S_β(Θ),   (13)

where

  U_β(Θ) = −Σ_{i=1}^N Σ_{k=1}^M f(k | x_i; Θ) log p(x_i, k; Θ),   (14)

and

  S_β(Θ) = −Σ_{i=1}^N Σ_{k=1}^M f(k | x_i; Θ) log f(k | x_i; Θ)   (15)

is the entropy of the posterior distribution. When β → ∞, the rational function f(k | x_i; Θ) approximates to a zero-one function; thus, the entropy term S_β(Θ) → 0. In this case, F_β(Θ; X) is equivalent to the negative of the objective function for CEM in Eq. (8). Therefore, DAEM can be viewed as a DA variant of CEM. Each β value corresponds to a learning phase. The algorithm proceeds to the next phase after it converges in the current phase.
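The effect of the temperature 1/β on the tempered posterior of Eq. (12) can be sketched numerically (this is our own illustrative code with arbitrary component parameters, not the authors' implementation):

```python
import numpy as np

def daem_posterior(x, w, mu, var, beta):
    # f(k|x) ∝ (w_k p(x; θ_k))^β, as in Eq. (12): the posterior derived
    # from the maximum entropy principle, tempered by β.
    p = w * np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    g = p ** beta
    return g / g.sum()

# Arbitrary toy mixture: two unit-variance components at -5 and 5.
w = np.array([0.5, 0.5])
mu = np.array([-5.0, 5.0])
var = np.array([1.0, 1.0])
x = 1.0  # a sample closer to the second component

# Small beta (high temperature): the posterior is nearly uniform,
# so the energy surface is smooth and easy to optimize.
hot = daem_posterior(x, w, mu, var, beta=0.01)
# beta = 1: the ordinary EM posterior of Eq. (5).
em = daem_posterior(x, w, mu, var, beta=1.0)
# Large beta (temperature -> 0): approaches the zero-one,
# winner-take-all assignment used by CEM.
cold = daem_posterior(x, w, mu, var, beta=50.0)
print(hot, em, cold)
```

Gradually increasing β thus interpolates from a near-uniform soft assignment toward the hard assignment of the classification likelihood approach.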

III. THE EM-TYPE ALGORITHMS FOR LEARNING PbSOM

A. Formulation of the coupling-likelihood mixture model

In this paper, we define a PbSOM as a SOM that consists of M neurons, R = {r_1, r_2, …, r_M}, in a network with a neighborhood function h_{kl} that defines the strength of lateral interaction between two neurons, r_k and r_l, for k, l ∈ {1, 2, …, M}; and each neuron r_k associates with a reference model θ_k that represents some probability distribution in the data space. Sum et al. [34] interpreted Kohonen's sequential learning algorithm in terms of maximizing the local correlations (coupling energies) between the neurons and their neighborhoods with the given input data. Given a data sample x_i ∈ X = {x_1, x_2, …, x_N}, the local coupling energy between r_k and its neighborhood is defined as

  E_{x_i,k} = Σ_{l=1}^M h_{kl} r_k(x_i; θ_k) r_l(x_i; θ_l) = r_k(x_i; θ_k) Σ_{l=1}^M h_{kl} r_l(x_i; θ_l),   (16)

where r_k(x_i; θ_k) denotes the response of neuron r_k to x_i, which is modeled by an isotropic Gaussian density. Then, the coupling energy over the network for x_i is defined as

  E_{x_i} = Σ_{k=1}^M E_{x_i,k},   (17)

and the energy function to be maximized is

  E = Σ_{i=1}^N log E_{x_i}.   (18)

In Eq. (16), the term Σ_l h_{kl} r_l(x_i; θ_l) can be considered as the neighborhood response of r_k, where the conjunction between the neuron responses is implemented using the summing operation. In this study, we express the neuron response r_l(x_i; θ_l) as a multivariate Gaussian distribution as follows:

  r_l(x_i; θ_l) = (2π)^{−d/2} |Σ_l|^{−1/2} exp( −(1/2)(x_i − μ_l)^T Σ_l^{−1} (x_i − μ_l) )   (19)

for l = 1, 2, …, M; and formulate the neighborhood response of r_k as

  Π_{l=1}^M r_l(x_i; θ_l)^{h_{kl}},   (20)

where the conjunction between the neuron responses in the neighborhood of r_k is implemented using the multiplicative operation. Then, for a given x_i, we define the local coupling energy between r_k and its neighborhood as the following coupling-likelihood:

  p_s(x_i | k; Θ, h) = r_k(x_i; θ_k)^{h_{kk}} Π_{l≠k} r_l(x_i; θ_l)^{h_{kl}} = Π_{l=1}^M r_l(x_i; θ_l)^{h_{kl}} = exp( Σ_{l=1}^M h_{kl} log r_l(x_i; θ_l) ),   (21)

where Θ is the set of reference models, and h denotes the given neighborhood function.²
Then, we define the coupling-likelihood of x_i over the network as the following (unnormalized) mixture likelihood:

  p_s(x_i; Θ, h) = Σ_{k=1}^M w_s(k) p_s(x_i | k; Θ, h),   (22)

where w_s(k), for k = 1, 2, …, M, is fixed at 1/M. Note that, theoretically, the mixture weights can be learned automatically. When maximizing the local coupling-likelihood p_s(x_i | k; Θ, h) for each neuron r_k, k = 1, 2, …, M, the topological order between neuron r_k and its neighborhood for the given data sample x_i is learned in the learning process; therefore, we use equal mixture weights in the mixture model to take account of the topological order learning induced by the neurons faithfully (with equal prior importance). In fact, this is important for learning an ordered map. From our experimental analysis, if the mixture weights are updated in the learning process, the learning of topological order is frequently dominated by some particular mixture components, which makes it difficult to obtain an ordered map. For details, one can refer to the Appendix after reading the main body of the paper.

Comparing the network structure of the proposed coupling-likelihood mixture model in Eq. (22) with that of the Gaussian mixture model (GMM), as shown in Fig. 1, the proposed model inserts a coupling-likelihood layer between the Gaussian likelihood layer and the mixture likelihood layer to take account of the coupling between the neurons and their neighborhoods. When the neighborhood size is reduced to zero (i.e., h_{kl} = δ_{kl}), the coupling-likelihood mixture model becomes a GMM with equal mixture weights. Note that other probability distributions are possible for r_l(x_i; θ_l) in the formulation of the coupling-likelihood mixture model, although we use the multivariate Gaussian distribution in this paper.

B. The SOCEM algorithm

The self-organizing process of PbSOM can be described as a model-based data clustering procedure that preserves the spatial relationships between the clusters in a network. Based on the classification likelihood criterion for data clustering

²From Eq.
(21), it is obvious that, in our formulation, the coupling between r_k and its neighboring neurons is considered jointly, whereas Sum et al.'s formulation considers it in a pairwise manner, as shown in Eq. (16). Note that we use the term coupling-likelihood instead of coupling energy for two reasons: 1) Eq. (21) is a coupling of Gaussian likelihoods; and 2) using coupling-likelihood can help describe the link between our proposed approaches and model-based clustering.
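A small numerical sketch may clarify the construction of the coupling-likelihood in Eq. (21). This is our own illustrative code (a 1-D three-neuron chain with arbitrary parameters), showing that the neighbors' responses are conjoined multiplicatively, i.e., additively in the log domain, and that the model collapses to plain per-neuron Gaussian likelihoods when the neighborhood shrinks to zero:

```python
import numpy as np

def log_gauss(x, mu, var):
    # log of the 1-D Gaussian response r_l(x; θ_l)
    return -0.5 * np.log(2 * np.pi * var) - (x - mu) ** 2 / (2 * var)

def coupling_loglik(x, mu, var, H):
    # log p_s(x | k) = Σ_l h_{kl} log r_l(x; θ_l), as in Eq. (21)
    return H @ log_gauss(x, mu, var)

# Three neurons on a 1-D lattice at positions 0, 1, 2, with a
# Gaussian neighborhood kernel (sigma = 1).
pos = np.array([0.0, 1.0, 2.0])
H = np.exp(-(pos[:, None] - pos[None, :]) ** 2 / 2.0)

# Arbitrary reference models: unit-variance Gaussians at -2, 0, 2.
mu = np.array([-2.0, 0.0, 2.0])
var = np.ones(3)
x = 2.1

ll = coupling_loglik(x, mu, var, H)   # one coupling-log-likelihood per neuron k
# Zero-size neighborhood (H = identity, h_kl = delta_kl): the
# coupling-likelihood collapses to each neuron's own Gaussian likelihood.
ll0 = coupling_loglik(x, mu, var, np.eye(3))
print(ll, ll0)
```

Here the sample near 2 is best explained by the third neuron together with its neighborhood, so that component attains the largest coupling-log-likelihood.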

[8], the computation of the coupling-likelihood of a data sample is restricted to its winning neuron. Thus, the goal is to estimate the partition of X, Ĉ = {Ĉ_1, Ĉ_2, …, Ĉ_M}, and the set of reference models, Θ̂, so as to maximize the accumulated classification log-likelihood over all the data samples as follows:

  L_s(C, Θ; X, h) = Σ_{k=1}^M Σ_{x_i ∈ C_k} log( w_s(k) p_s(x_i | k; Θ, h) ) = Σ_{k=1}^M Σ_{x_i ∈ C_k} log( w_s(k) exp( Σ_{l=1}^M h_{kl} log r_l(x_i; θ_l) ) ).   (23)

Fig. 1. (a) The network structure of a Gaussian mixture model, and (b) the proposed coupling-likelihood mixture model. Here, r_l(x_i; θ_l) denotes the multivariate Gaussian distribution described in Eq. (19).

As w_s(k), for k = 1, 2, …, M, is fixed at 1/M, the objective function can be rewritten as

  L_s(C, Θ; X, h) = Σ_{k=1}^M Σ_{x_i ∈ C_k} Σ_{l=1}^M h_{kl} log r_l(x_i; θ_l) + Const.   (24)

Similar to the derivation of the classification EM (CEM) algorithm for model-based clustering in [8], the CEM algorithm for the proposed PbSOM, i.e., the SOCEM algorithm, is derived as follows.

E-step: Given the current reference model set, Θ^(t), compute the posterior probability of each mixture component of p_s(x_i; Θ^(t), h) for each x_i as follows:

  γ_{k|i}^(t) = p_s(k | x_i; Θ^(t), h) = p_s(x_i, k; Θ^(t), h) / p_s(x_i; Θ^(t), h) = exp( Σ_{l=1}^M h_{kl} log r_l(x_i; θ_l^(t)) ) / Σ_{j=1}^M exp( Σ_{l=1}^M h_{jl} log r_l(x_i; θ_l^(t)) ),   (25)

for k = 1, 2, …, M, and i = 1, 2, …, N.

C-step: Assign each x_i to the cluster whose corresponding mixture component has the largest posterior probability for x_i, i.e., x_i ∈ Ĉ_j^(t) if j = arg max_k γ_{k|i}^(t).

M-step: After the C-step, the partition of X (i.e., Ĉ^(t)) is formed, and the objective function L_s defined in Eq. (24) becomes

  L_s(Θ; Ĉ^(t), X, h) = Σ_{k=1}^M Σ_{x_i ∈ Ĉ_k^(t)} Σ_{l=1}^M h_{kl} log r_l(x_i; θ_l) + Const.   (26)

Similar to the derivation of the M-step of the EM algorithm for learning a Gaussian mixture model [11], we can obtain the re-estimation formulae for the mean vectors and covariance matrices by substituting Eq. (19) into Eq.
(26), taking the derivative of L_s with respect to individual parameters, and then setting it to zero. The re-estimation formulae are as follows:

  μ_l^(t+1) = ( Σ_{k=1}^M Σ_{x_i ∈ Ĉ_k^(t)} h_{kl} x_i ) / ( Σ_{k=1}^M |Ĉ_k^(t)| h_{kl} ),   (27)

  Σ_l^(t+1) = ( Σ_{k=1}^M Σ_{x_i ∈ Ĉ_k^(t)} h_{kl} (x_i − μ_l^(t+1))(x_i − μ_l^(t+1))^T ) / ( Σ_{k=1}^M |Ĉ_k^(t)| h_{kl} )   (28)

for l = 1, 2, …, M. When the neighborhood size is reduced to zero (i.e., h_{kl} = δ_{kl}), SOCEM reduces to the CEM algorithm for learning GMMs with equal mixture weights.

1) SOCEM - a DA variant of CEM for GMMs: Similar to Kohonen's sequential or batch SOM algorithm, the SOCEM algorithm is applied in two stages. First, it is applied with a large neighborhood to form an ordered map near the center of the data samples. Then, the reference models are adapted to fit the distribution of the data samples by gradually shrinking the neighborhood.
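One SOCEM iteration (the E-step of Eq. (25), the C-step, and the mean update of Eq. (27)) can be sketched in a few lines. The following is our own simplified code for 1-D data with fixed unit variances, not the authors' implementation:

```python
import numpy as np

def socem_step(X, mu, H):
    # E-step, Eq. (25): coupling-log-likelihood score of each component.
    # With unit-variance 1-D Gaussians, the constant terms cancel in the
    # winner comparison, so only the quadratic term is needed.
    log_r = -0.5 * (X[:, None] - mu[None, :]) ** 2   # shape (N, M)
    score = log_r @ H.T                              # Σ_l h_{kl} log r_l
    # C-step: winner-take-all assignment to the best component.
    win = np.argmax(score, axis=1)                   # shape (N,)
    # M-step, Eq. (27): neighborhood-weighted means over the partition.
    W = H[win]                                       # weight h_{win_i, l}
    return (W.T @ X) / W.sum(axis=0), win

# Two neurons on a 1-D lattice; a large sigma (strong coupling) pulls
# all means toward the center of the data, as in the annealing phase.
pos = np.array([0.0, 1.0])
sigma = 2.0
H = np.exp(-(pos[:, None] - pos[None, :]) ** 2 / (2 * sigma ** 2))

X = np.array([-5.0, -4.5, 4.5, 5.0])   # arbitrary toy data
mu = np.array([-1.0, 1.0])             # arbitrary initial means
mu_new, win = socem_step(X, mu, H)
print(mu_new, win)
```

With this wide neighborhood the updated means stay close to the data center rather than jumping to the cluster centroids, illustrating the high-temperature behavior described above.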

Without loss of generality, we suppose the neighborhood function is the widely adopted (unnormalized) Gaussian kernel

  h_{kl} = exp( −‖r_k − r_l‖² / (2σ²) ),   (29)

where ‖r_k − r_l‖ is the Euclidean distance between two neurons r_k and r_l in the network. Initially, SOCEM is applied with a large σ value, which is reduced after the algorithm converges. Then, we use the new σ value and the learned parameters as the initial condition of the next learning phase. This process is repeated until the value of σ is reduced to the pre-defined minimum value σ_min. The above shrinking of the neighborhood (reduction of the σ value) can be interpreted as an annealing process, where a large σ value corresponds to a high temperature. Table I lists the learning rules of the DAEM algorithm for learning GMMs with equal mixture weights [12] and the SOCEM algorithm. To facilitate the interpretation, we rewrite the objective function and re-estimation formulae of SOCEM in Eq. (24) and Eqs. (27)-(28), respectively, with the new variable win_i, which denotes the index of the winning neuron of x_i. For simplicity, we only list the re-estimation formulae of the mean vectors of the Gaussian components. By analyzing these two algorithms carefully, one may view h_{win_i^(t), l} as a kind of posterior probability of θ_l^(t) for x_i in the network domain. More precisely, x_i is initially projected onto r_{win_i^(t)} in the network domain; then, r_{win_i^(t)} is applied to Eq. (29) as an observation of the Gaussian kernel centered at r_l to obtain the value of h_{win_i^(t), l}. In both the DAEM and SOCEM algorithms, when the temperature (1/β or σ) is high, the posterior distribution becomes almost uniform; hence, all the reference models will be moved to locations near the center of the data samples in this learning phase. By gradually reducing the temperature, the influence of each x_i becomes more localized, and the reference models gradually spread out to fit the distribution of the data samples.
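The annealing analogy, with σ in the Gaussian kernel of Eq. (29) acting as a temperature, can be seen numerically. The following is our own sketch on a three-neuron chain (the lattice positions are arbitrary):

```python
import numpy as np

def neighborhood(pos, sigma):
    # Gaussian neighborhood kernel of Eq. (29) on the lattice
    d2 = (pos[:, None] - pos[None, :]) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

pos = np.arange(3.0)  # neurons at lattice positions 0, 1, 2

# High temperature (large sigma): every neuron couples strongly with
# every other one, so each kernel row is nearly flat (uniform).
H_hot = neighborhood(pos, sigma=10.0)
# Low temperature (small sigma): the coupling localizes and the kernel
# approaches the identity, h_kl -> delta_kl.
H_cold = neighborhood(pos, sigma=0.1)
print(H_hot.round(3))
print(H_cold.round(3))
```

In the near-uniform regime every data sample influences every reference model almost equally, which is why all models gather near the data center; as σ shrinks, the kernel tends to δ_{kl} and the algorithm approaches plain CEM.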
When the temperature approaches zero, the probabilistic assignment strategy for the data samples becomes the winner-take-all strategy, and the objective functions and learning rules of DAEM and SOCEM are equivalent to those of CEM. The major difference between DAEM and SOCEM seems to be that the posterior distribution in SOCEM is constrained by the network topology, but DAEM does not have this property. To visualize the transition of the objective function, we show a simulation on a simple one-dimensional, two-component Gaussian mixture problem in Fig. 2.³ The training data contains observations drawn from

  p(x; {m_1, v_1²}, {m_2, v_2²}) = (1/2)(1/(√(2π) v_1)) exp( −(x − m_1)²/(2v_1²) ) + (1/2)(1/(√(2π) v_2)) exp( −(x − m_2)²/(2v_2²) ),   (30)

where the Gaussian means are (m_1, m_2) = (−5, 5), and the Gaussian variances are (v_1², v_2²) = (1, 1).⁴ The PbSOM network structure is a 1 × 2 lattice in [0, 1]. The two reference models are θ_1 = {μ_1, Σ_1} and θ_2 = {μ_2, Σ_2}, where Σ_1 = Σ_2 = 1. The objective function in Eq. (23) is calculated with different setups for (μ_1, μ_2) to form the log-likelihood surface. From Fig. 2, we observe that a larger σ for h_{kl} yields a simpler objective function for optimization. The log-likelihood surface is symmetric along μ_1 = μ_2 because of the symmetric lattice structure and the equal weighting of the reference models. For the case of the largest σ, the log-likelihood value is close to the global maximum of the surface when both μ_1 and μ_2 are close to the center of the data (2.39 in this case). With the reduction in the value of σ, the location of (μ_1, μ_2) for the global maximum moves toward (m_1, m_2) and (m_2, m_1).

2) Relation to Kohonen's batch algorithm: There are two differences between the SOCEM algorithm and Kohonen's batch algorithm. First, SOCEM considers the neighborhood information when selecting the winning neuron, but Kohonen's algorithm does not.

³Visualization of how deterministic annealing EM/CEM works for function optimization is illustrated in detail in [12].
⁴The data is generated using the function gmmsamp.m in the Netlab software.
Second, SOCEM extends the reference vectors in Kohonen's algorithm to multivariate Gaussians. In other words, if we set γ_{k|i}^(t) in SOCEM to r_k(x_i; θ_k^(t)) / Σ_{j=1}^M r_j(x_i; θ_j^(t)), instead of the setting in Eq. (25), we obtain a probabilistic variant of Kohonen's batch algorithm (denoted as Kohonen-Gaussian), where Kohonen's winner selection strategy is applied and the reference vectors are replaced with multivariate Gaussians. Thus, we may view Kohonen-Gaussian as an approximate implementation of SOCEM that optimizes SOCEM's objective function. Moreover, if we set the covariance matrices in Kohonen-Gaussian to be diagonal with small, identical variances, Kohonen-Gaussian is equivalent to Kohonen's batch algorithm. Therefore, we can interpret the neighborhood shrinking of Kohonen's algorithms as a deterministic annealing process, and thereby explain why they need to start with a large neighborhood size. Recently, Zhong and Ghosh [3] interpreted the neighborhood size of the SOM algorithms that apply Kohonen's winner selection strategy as a temperature parameter in a deterministic annealing process. However, their interpretations were not based on the optimization of an objective function, which is the essential part of DA-based optimization. In contrast, in SOCEM, the neighborhood shrinking leads to the transition of the objective function from a simpler one to a more complex one, as illustrated in Fig. 2.

3) Computational cost: It is clear from Table I that the computational cost of DAEM is O(MNT), where M, N, and T are the numbers of reference models, data samples, and learning iterations, respectively. Compared to DAEM, SOCEM needs additional O(M²N) multiplication and addition operations for winner selection in each iteration, while Kohonen-Gaussian needs additional O(MN) multiplications and additions.

C. The SOEM algorithm

As is obvious from Eq. (23), in the formulation of the objective function of the SOCEM algorithm, only the local coupling-likelihoods associated with the winning neurons are considered.
Alternatively, we can compute the coupling-likelihood of x_i using the mixture likelihood defined in Eq.

TABLE I
THE DAEM ALGORITHM FOR LEARNING GMMS WITH EQUAL MIXTURE WEIGHTS, AND THE SOCEM ALGORITHM.

  Algorithm             DAEM                                                  SOCEM
  Objective function    F_β(Θ; X) in Eq. (13), where                          Σ_{i=1}^N Σ_{l=1}^M h_{win_i,l} log r_l(x_i; θ_l) + Const.
                        p(x_i, l; Θ) = (1/M) r_l(x_i; θ_l)
  Posterior             f(l | x_i; Θ^(t)) =                                   h_{win_i^(t), l} = exp( −‖r_{win_i^(t)} − r_l‖²/(2σ²) ),
  distribution          r_l(x_i; θ_l^(t))^β / Σ_j r_j(x_i; θ_j^(t))^β,        l = 1, 2, …, M
                        l = 1, 2, …, M
  Temperature           1/β                                                   σ
  Re-estimation         μ_l^(t+1) = Σ_i f(l | x_i; Θ^(t)) x_i /               μ_l^(t+1) = Σ_i h_{win_i^(t), l} x_i /
  formulae              Σ_i f(l | x_i; Θ^(t)), l = 1, 2, …, M                 Σ_i h_{win_i^(t), l}, l = 1, 2, …, M

[Fig. 2 shows four log-likelihood surfaces over (μ_1, μ_2), panels (a)-(d), for decreasing values of σ; in panel (d) the neighborhood size is zero (i.e., h_{kl} = δ_{kl}).]
Fig. 2. SOCEM's objective function becomes more complex with the reduction of neighborhood size (σ in h_{kl}).

(22) and apply the EM algorithm to maximize the objective log-likelihood function

  L_s(Θ; X, h) = Σ_{i=1}^N log( Σ_{k=1}^M w_s(k) p_s(x_i | k; Θ, h) ).   (31)

The steps of the EM algorithm for the proposed PbSOM, i.e., the SOEM algorithm, are as follows.

E-step: With the mixture model in Eq. (22), we form the auxiliary function as

  Q_s(Θ; Θ^(t)) = Σ_{i=1}^N Σ_{k=1}^M γ_{k|i}^(t) log p_s(x_i, k; Θ, h),   (32)

where γ_{k|i}^(t) is the same as Eq. (25). Since p_s(x_i, k; Θ, h) = w_s(k) p_s(x_i | k; Θ, h), Eq. (32) can be rewritten as

  Q_s(Θ; Θ^(t)) = Σ_{i=1}^N Σ_{k=1}^M γ_{k|i}^(t) log( w_s(k) p_s(x_i | k; Θ, h) ).   (33)

As w_s(k), for k = 1, 2, …, M, is fixed at 1/M, by substituting

Eq. (21) into Eq. (33), the auxiliary function can be rewritten as

  Q_s(Θ; Θ^(t)) = Σ_{i=1}^N Σ_{k=1}^M γ_{k|i}^(t) Σ_{l=1}^M h_{kl} log r_l(x_i; θ_l) + Const. = Σ_{l=1}^M Σ_{i=1}^N ( Σ_{k=1}^M γ_{k|i}^(t) h_{kl} ) log r_l(x_i; θ_l) + Const.   (34)

M-step: By replacing the response r_l(x_i; θ_l) in Eq. (34) with the multivariate Gaussian density in Eq. (19) and setting the derivative of Q_s with respect to individual mean vectors and covariance matrices to zero, we obtain the following re-estimation formulae:

  μ_l^(t+1) = Σ_{i=1}^N ( Σ_{k=1}^M γ_{k|i}^(t) h_{kl} ) x_i / Σ_{i=1}^N ( Σ_{k=1}^M γ_{k|i}^(t) h_{kl} ),   (35)

  Σ_l^(t+1) = Σ_{i=1}^N ( Σ_{k=1}^M γ_{k|i}^(t) h_{kl} ) (x_i − μ_l^(t+1))(x_i − μ_l^(t+1))^T / Σ_{i=1}^N ( Σ_{k=1}^M γ_{k|i}^(t) h_{kl} )   (36)

for l = 1, 2, …, M. When the neighborhood size is reduced to zero (i.e., h_{kl} = δ_{kl}), SOEM reduces to the EM algorithm for learning GMMs with equal mixture weights.

There are two major differences between the SOCEM and SOEM algorithms. First, they learn maps based on the classification likelihood criterion and the mixture likelihood criterion, respectively. Second, SOEM adapts the reference models in a more global way than SOCEM. To explain this perspective, we can consider the learning of SOCEM and SOEM in the sense of sequential learning. As illustrated in Fig. 3, in the SOCEM algorithm (cf. Eqs. (27)-(28)), each data sample x_i only contributes to the adaptation of the winning reference model and its neighborhood (i.e., x_i only contributes to the learning of the topological order between the winning reference model and its neighborhood). However, in the SOEM algorithm (cf. Eqs. (35)-(36)), each data sample x_i contributes proportionally to the adaptation of each reference model and its neighborhood according to the posterior probabilities γ_{k|i}^(t) for k = 1, 2, …, M.

1) SOEM - a DA variant of EM for GMMs: As with the SOCEM algorithm, we can apply SOEM with a large neighborhood and obtain different map configurations by gradually reducing the neighborhood size. The term Σ_{k=1}^M γ_{k|i}^(t) h_{kl} in Eqs.
(35)-(36) can be considered as a kind of posterior probability, π(l | x_i; Θ^(t), h), of the reference model θ_l^(t) for x_i, which is also constrained by the neighborhood function. With a large σ value in h_{kl} (Eq. (29)), π(l | x_i; Θ^(t), h), for l = 1, 2, …, M, will be nearly a uniform distribution, due to the small variation in the values of γ_{k|i}^(t) for k = 1, 2, …, M and the small variation in the values of h_{kl} for k = 1, 2, …, M, for each case of l. Hence, all the reference models will be moved to locations near the center of the data samples. When the neighborhood size is reduced to zero (i.e., h_{kl} = δ_{kl}), the SOEM algorithm becomes the EM algorithm for learning GMMs with equal mixture weights. As with the annealing interpretation of SOCEM, SOEM can be viewed as a topology-constrained deterministic annealing variant of the EM algorithm for learning GMMs with equal mixture weights.⁵

[Fig. 3 panels: (a) SOCEM, winner selection; (b) SOEM, weighted winners.]
Fig. 3. For each data sample x_i, the adaptation of the reference models in SOCEM is restricted to the winning reference model and its neighborhood. However, in SOEM, the winner is relaxed to the weighted winners by the posterior probabilities γ_{k|i}^(t), for k = 1, 2, …, M. Each data sample x_i contributes proportionally to the adaptation of each reference model and its neighborhood according to the posterior probabilities.

2) Computational cost: Comparing Eqs. (34)-(36) to Eqs. (26)-(28), we can see that, in each learning iteration, SOEM and SOCEM have a similar computational cost in the E-step, but the former needs additional O(MN) multiplication and addition operations for updating the model parameters in the M-step.

D. The SODAEM algorithm

Similar to the derivation of the deterministic annealing EM (DAEM) algorithm for learning GMMs [12], we developed a DAEM algorithm for the proposed PbSOM, called the SODAEM algorithm. With the mixture likelihood defined in Eq. (22), SODAEM first derives the posterior density in the E-step using the principle of maximum entropy.
Following the derivation of the posterior probability in [12] with the current

^5 SOEM yielded a similar result on the one-dimensional, two-component Gaussian mixture problem in Fig. 2; however, we do not present it here to avoid redundancy.

model's parameter set $\Theta^{(t)}$, we obtain the posterior probability of the $k$th mixture component for $\mathbf{x}_i$ as follows:

$$\tau_{k|i}^{(t)} = \frac{p_s(\mathbf{x}_i\,|\,k; \Theta^{(t)}, h)^{\beta}}{\sum_{j} p_s(\mathbf{x}_i\,|\,j; \Theta^{(t)}, h)^{\beta}} = \frac{\exp\big(\beta \sum_{l} h_{kl} \log r_l(\mathbf{x}_i; \theta_l^{(t)})\big)}{\sum_{j} \exp\big(\beta \sum_{l} h_{jl} \log r_l(\mathbf{x}_i; \theta_l^{(t)})\big)}. \quad (37)$$

Then, the auxiliary function to be maximized is

$$Q_{s\beta}(\Theta; \Theta^{(t)}) = \sum_{i=1}^{N} \sum_{k=1}^{K} \tau_{k|i}^{(t)} \log p_s(\mathbf{x}_i, k; \Theta, h), \quad (38)$$

and the re-estimation formulae for the mean vectors and covariance matrices are

$$\mu_l^{(t+1)} = \frac{\sum_{i=1}^{N} \big(\sum_{k} \tau_{k|i}^{(t)} h_{kl}\big)\, \mathbf{x}_i}{\sum_{i=1}^{N} \big(\sum_{k} \tau_{k|i}^{(t)} h_{kl}\big)}, \quad (39)$$

$$\Sigma_l^{(t+1)} = \frac{\sum_{i=1}^{N} \big(\sum_{k} \tau_{k|i}^{(t)} h_{kl}\big) (\mathbf{x}_i - \mu_l^{(t+1)})(\mathbf{x}_i - \mu_l^{(t+1)})^{T}}{\sum_{i=1}^{N} \big(\sum_{k} \tau_{k|i}^{(t)} h_{kl}\big)} \quad (40)$$

for $l = 1, 2, \ldots, K$. Note that the re-estimation formulae for SODAEM are the same as those for SOEM, except that $\gamma_{k|i}^{(t)}$ is replaced by $\tau_{k|i}^{(t)}$. Here, $1/\beta$ corresponds to the temperature that controls the annealing process: a high temperature is applied initially, and the system is then cooled down by gradually reducing the temperature. When $1/\beta = 1$, the SODAEM algorithm becomes the SOEM algorithm; when $1/\beta \to 0$, it is equivalent to the SOCEM algorithm. In other words, SODAEM can be viewed as a deterministic annealing variant of SOCEM and SOEM.

By considering certain cases and approximations of SODAEM, SOCEM, and SOEM, we summarize the family of EM-based approaches for Gaussian model-based clustering discussed in this section in Fig. 4. Both EM under the mixture-likelihood criterion and CEM under the classification-likelihood criterion are widely used model-based data clustering methods. SOEM (SOCEM) can be applied instead of EM (CEM) in model-based clustering if we want to preserve the spatial relationships between the resulting data clusters on a network. Since SODAEM is a DA variant of SOCEM and SOEM, it can be applied in model-based data clustering under both the mixture-likelihood and classification-likelihood criteria.

1) Computational cost: Comparing Eqs. (39)-(40) to Eqs. (35)-(36), we can see that SODAEM and SOEM have similar computational costs in each learning iteration.

V. EXPERIMENT RESULTS

A.
Experiments on the self-organizing property

Data set description: We conducted experiments on two types of data: a synthetic data set and a real-world data set. The synthetic data set consisted of 500 points uniformly distributed in a unit square. For the real-world data set, we used the training set of one class in the Pen-Based Recognition of Handwritten Digits database (denoted as PenRecDigits) from the UCI Machine Learning Database Repository [35]; the data are 16-dimensional vectors. To demonstrate the map-learning process, we used the first two dimensions of the feature vectors as data for the simulations. As a pre-processing step, we scaled down each element of the vectors in PenRecDigits to avoid numerical traps.

Fig. 4. The family of Gaussian model-based clustering algorithms derived from the SODAEM, SOCEM, and SOEM algorithms: with h_kl = δ_kl they reduce to DAEM, CEM, and EM, respectively, and SODAEM reduces to SOEM (1/β = 1) and SOCEM (1/β → 0) via topology-constrained annealing. δ_kl = 1 if k = l; otherwise, δ_kl = 0.

Experiment setup: In the experiments, an 8 × 8 equally spaced square lattice in a unit square was used as the structure of the network. For the neighborhood function, we used the Gaussian kernel h_kl in Eq. (29). We evaluated SOCEM, SOEM, SODAEM, and Kohonen-Gaussian (Kohonen's batch algorithm using Gaussian reference models) in 20 independent random-initialization trials and two setups for σ in h_kl. For each trial, data samples were randomly selected from the data set as the initial mean vectors, μ_1, μ_2, ..., μ_K, of the reference models, which were multivariate Gaussians with full covariance matrices. The initial covariance matrix Σ_l was set as ρI, where ρ = min_{k≠l} ||μ_l − μ_k||, for l = 1, 2, ..., K. To avoid the singularity problem, we applied the variance-limiting step to the covariance matrices during the learning process: if the value of any element of the covariance matrix fell below a small floor value, it was reset to that floor.
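A Gaussian neighborhood kernel of the kind denoted h_kl in Eq. (29) can be sketched as follows, assuming the neurons have fixed coordinates on the lattice; the exact form and normalization used in the paper may differ.

```python
import numpy as np

def neighborhood(grid, sigma):
    """Gaussian neighborhood kernel between neurons k and l on the lattice:
    h_kl = exp(-||z_k - z_l||^2 / (2 * sigma^2)).
    grid: (K, 2) array of neuron coordinates; sigma: neighborhood size.
    Returned matrix is symmetric with ones on the diagonal (unnormalized)."""
    d2 = ((grid[:, None, :] - grid[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))
```

Shrinking sigma toward zero drives the off-diagonal entries toward zero, i.e., h_kl approaches the Kronecker delta δ_kl used throughout the text.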
1) Results using the synthetic data: We first demonstrate the map-learning processes of SOCEM, SOEM, and SODAEM using one of the 20 random initializations by showing the configurations of the Gaussian means on the maps, and then summarize the overall results of all the initializations.

Simulations using SOCEM: Fig. 5 shows two simulations using the SOCEM algorithm. In the first simulation, SOCEM is run with the random initialization in Fig. 5(a) and a fixed σ of 1 in h_kl. As shown in Fig. 5(b), the algorithm's learning converges to an unordered map. In the second simulation, SOCEM starts with the same random initialization as that in Fig. 5(a), but with a larger σ of 10. When it converges at the

current σ value, σ is reduced by 1. Then, the algorithm is applied again with the new σ value and the reference models obtained in the previous phase. This process continues until SOCEM converges at σ = 0. Figs. 5(c), (d), (e), and (f) depict the maps obtained when σ = 10, 5, 1, and 0, respectively. We can explain the second simulation in terms of annealing (cf. Sec. III-B): when using SOCEM, we start with a larger σ value (a higher temperature) so that the objective function is simple enough to be optimized. Then, we obtain the target map configuration by gradually reducing the value of σ (the temperature). Though the reduction in σ produces a more complex objective function for optimization, SOCEM can still learn well because the reference models obtained at the larger σ value provide a sound initialization for the next learning phase at the smaller σ value.

Simulations using SOEM: We conducted two similar simulations using the SOEM algorithm. In the first simulation, SOEM was run with the random initialization in Fig. 6(a) (the same as that in Fig. 5(a)) and a fixed σ of 1. As shown in Fig. 6(b), the learning of SOEM converged to an unordered map. In the second simulation, SOEM started with the random initialization in Fig. 6(a) and a larger σ of 10; the value of σ was then gradually reduced to 0 in decrements of 1. Figs. 6(c), (d), (e), and (f) depict the maps obtained when SOEM converges at σ = 10, 5, 1, and 0, respectively. As with SOCEM, we can interpret the reduction of σ in SOEM as an annealing process (cf. Sec. III-C) that overcomes the initialization issue. Comparing Figs. 6(c)-(d) to Figs. 5(c)-(d), we observe that the map obtained by SOEM is more concentrated than that obtained by SOCEM for the same σ value. This may be because SOEM learns the map in a more global manner than SOCEM, as noted in Sec. III-C; in other words, each data sample contributes to all the neurons in a more global manner in SOEM than in SOCEM.

Simulations using SODAEM: Fig. 7 depicts the simulations using the SODAEM algorithm with the same random initialization as that in Fig. 5(a) and Fig. 6(a).
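The neighborhood-weighted M-step of Eqs. (35)-(36), through which every sample adapts every reference model in SOEM, can be sketched as follows. This is a minimal illustration with our own array names, not the authors' code.

```python
import numpy as np

def soem_m_step(X, gamma, H):
    """One SOEM-style M-step (cf. Eqs. (35)-(36)): means and covariances are
    re-estimated with responsibilities smoothed by the neighborhood function.
    X: (N, d) samples; gamma: (N, K) posteriors; H: (K, K) neighborhood weights."""
    W = gamma @ H                        # W[i, l] = sum_k gamma[i, k] * h[k, l]
    totals = W.sum(axis=0)               # denominators of Eqs. (35)-(36)
    mu = (W.T @ X) / totals[:, None]     # Eq. (35): neighborhood-weighted means
    K, d = H.shape[0], X.shape[1]
    sigma = np.empty((K, d, d))
    for l in range(K):                   # Eq. (36): weighted scatter per model
        diff = X - mu[l]
        sigma[l] = (W[:, l, None] * diff).T @ diff / totals[l]
    return mu, sigma
```

With H equal to the identity (h_kl = δ_kl), the update reduces to the standard EM M-step for a GMM with equal mixture weights, exactly as stated in the text.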
The value of σ is fixed at 1, and the initial value of β is set to 0.1. When SODAEM converges at a β value, it is applied again with β_new = β × 1.6 and the reference models obtained in the previous phase. We stop the learning process at β = 17.592; in our experience, it is appropriate to set the maximum value of β within the range 10 to 20 for practical applications. When β = 0.1, the temperature is high enough to ensure a smooth objective function. Therefore, according to the parameter update rules of SODAEM, the reference models form a compact ordered map via lateral interactions near the center of the data samples, even though the neighborhood size is small (σ = 1 in this case). When β = 1.049 and β = 17.592, SODAEM is almost equivalent to SOEM and SOCEM, respectively; in these two cases, SODAEM converges to the ordered maps in Fig. 7(f) and Fig. 7(i), respectively. However, as shown in Figs. 5(a)-(b) and Figs. 6(a)-(b), SOCEM and SOEM do not converge to an ordered map when σ = 1, which demonstrates that the annealing process of SODAEM overcomes the initialization problem of SOCEM and SOEM when σ = 1. Note that SODAEM may not be able to obtain any ordered map during the annealing process if the value of σ is too small to form an ordered map at a small β value.

Discussion: The experiment results obtained by the three proposed algorithms and Kohonen-Gaussian for the 20 random initializations are summarized in Table I. Several conclusions can be drawn from the results. First, SOEM often converges to an ordered map even at a small, fixed σ value (σ = 1 in the experiments), but Kohonen-Gaussian and SOCEM seldom do so. This may be because SOEM learns the map in a more global way, as noted in Sec. III-C; hence, it is less sensitive to the initialization of the parameters when σ is small. The results for Kohonen-Gaussian and SOCEM are similar, perhaps because they differ only in the winner selection strategy. Second, the initialization issue of Kohonen-Gaussian, SOCEM, and SOEM can be overcome by using a larger σ value (10 in the experiments) initially, and then gradually reducing it to the target σ value (0 in the experiments).
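The geometric β schedule described above can be sketched as follows; the constants are the values reconstructed from the text and should be treated as illustrative.

```python
def beta_schedule(beta0=0.1, factor=1.6, n_phases=12):
    """Geometric inverse-temperature schedule for SODAEM-style annealing:
    start at beta0, multiply by `factor` after each converged phase.
    With the defaults this runs 0.1, 0.16, ..., up to roughly 17.592."""
    return [beta0 * factor ** i for i in range(n_phases)]
```

Each phase is run to convergence at the current β before the temperature is lowered (β raised), so the reference models of one phase seed the next.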
The reduction of σ can be interpreted as an annealing process (cf. Sec. III-B, Sec. III-C, and Sec. III-D). Third, the experiment results show that SODAEM overcomes the initialization issue of SOCEM and SOEM at a small σ value (1 in the experiments) using the annealing process, which is controlled by the temperature parameter β.

TABLE I
RESULTS OF MAP LEARNING BY KOHONEN-GAUSSIAN, SOCEM, SOEM, AND SODAEM IN 20 INDEPENDENT RANDOM INITIALIZATION TRIALS ON THE SYNTHETIC DATA. THE ALGORITHMS WERE RUN WITH TWO SETUPS FOR σ IN h_kl; EACH ENTRY GIVES THE NUMBER OF TRIALS CONVERGING TO AN ORDERED MAP VERSUS AN UNORDERED MAP. WHEN σ = 1, KOHONEN-GAUSSIAN SUCCEEDED IN CONVERGING TO AN ORDERED MAP IN ONE RANDOM INITIALIZATION CASE, BUT FAILED IN THE REMAINING CASES (1:19).

Setup for σ        | σ = 1 | σ = 10 initially, reduced to 0 in decrements of 1
Kohonen-Gaussian   | 1:19  | 20:0
SOCEM              | 1:19  | 20:0
SOEM               | 15:5  | 20:0
SODAEM             | 20:0  | -

2) Results using PenRecDigits: We also conducted experiments on the real-world data using the setups for the neighborhood function described in Sec. V-A. Table II summarizes the results obtained by the four PbSOM learning algorithms; from these results, we can draw the same conclusions as those made for the synthetic data. Figs. 8, 9, and 10 demonstrate, respectively, the map-learning processes of SOCEM, SOEM, and SODAEM using one of the 20 random initializations. Comparing Figs. 8, 9, and 10, we observe that the three algorithms obtain rather different results. SOCEM and SOEM usually obtain different maps because they learn the maps based on different clustering criteria (classification likelihood vs. mixture likelihood). SODAEM and SOCEM (or SOEM) usually obtain different results because SODAEM's annealing is achieved by increasing the β value, while SOCEM's (or SOEM's) annealing is achieved by decreasing the σ value. Comparing Figs. 9(f) and 10(f), although SODAEM becomes equivalent to SOEM when the value of β is increased to 1.049, their search paths on the objective-function surface are different because they have rather different seed models (Fig. 10(e) vs. Fig.

9(e)). Therefore, they converge to different local maxima of the objective function and obtain different maps. Likewise, although SODAEM becomes equivalent to SOCEM when the value of β is increased to 17.592, they converge to different local maxima of the objective function and obtain different maps (Fig. 10(i) vs. Fig. 8(f)).

Fig. 5. The map-learning process obtained by running the SOCEM algorithm on the synthetic data. Simulation 1 ((a)-(b)): when SOCEM is run with the random initialization in (a) and σ = 1, it converges to the unordered map in (b). Simulation 2 ((a) and (c)-(f)): SOCEM starts with σ = 10 and the random initialization in (a); the value of σ is then reduced to 0 in decrements of 1.

Fig. 6. The map-learning process obtained by running the SOEM algorithm on the synthetic data. Simulation 1 ((a)-(b)): when SOEM is run with the random initialization in (a) and σ = 1, it converges to the unordered map in (b). Simulation 2 ((a) and (c)-(f)): SOEM starts with σ = 10 and the random initialization in (a); the value of σ is then reduced to 0 in decrements of 1.
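The σ-shrinking procedure used in the second simulations above (run the learner to convergence at each σ, then warm-start the next phase from the resulting reference models) can be sketched generically as:

```python
def anneal(fit_phase, params, sigmas):
    """Neighborhood-shrinking annealing loop described for SOCEM/SOEM:
    `fit_phase(params, sigma)` stands in for one run of the learner to
    convergence at a fixed sigma; the returned reference models seed the
    next, smaller-sigma phase."""
    for sigma in sigmas:
        params = fit_phase(params, sigma)   # warm start from previous phase
    return params
```

The same driver applies to Kohonen-Gaussian and, with β in place of σ, to SODAEM; only the per-phase learner changes.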

Fig. 7. The map-learning process obtained by running the SODAEM algorithm on the synthetic data. The value of σ is fixed at 1, while the value of β is initialized at 0.1 and increased in multiples of 1.6 up to 17.592.

TABLE II
RESULTS OF MAP LEARNING BY KOHONEN-GAUSSIAN, SOCEM, SOEM, AND SODAEM IN 20 INDEPENDENT RANDOM INITIALIZATION TRIALS ON PENRECDIGITS. THE ALGORITHMS WERE RUN WITH TWO SETUPS FOR σ IN h_kl; EACH ENTRY GIVES THE NUMBER OF TRIALS CONVERGING TO AN ORDERED MAP VERSUS AN UNORDERED MAP. WHEN σ = 1, KOHONEN-GAUSSIAN SUCCEEDED IN CONVERGING TO AN ORDERED MAP IN ONE RANDOM INITIALIZATION CASE, BUT FAILED IN THE REMAINING CASES (1:19).

Setup for σ        | σ = 1 | σ = 10 initially, reduced to 0 in decrements of 1
Kohonen-Gaussian   | 1:19  | 20:0
SOCEM              | 2:18  | 20:0
SOEM               | 4:16  | 20:0
SODAEM             | 20:0  | -

B. Experiments to evaluate the performance of data clustering and visualization

Data set description: In this section, we evaluate the data clustering and visualization performance of the proposed algorithms on two data sets from the UCI Machine Learning Database Repository [35]: the test set of the image segmentation database (denoted as ImgSeg), which consists of 2,100 19-dimensional feature vectors, and the Ecoli data set (denoted as Ecoli). Here, we used the full vectors, rather than only two dimensions, in the experiments. As a pre-processing step, we scaled down each element of the data vectors in ImgSeg to avoid numerical traps.

Experiment setup: To avoid the singularity problem that often occurs when using EM or CEM to learn full-covariance GMMs, we used diagonal-covariance Gaussians in the experiments. We also applied the variance-limiting step, in which a minimum value was imposed on each variance. For the PbSOM learning algorithms, we used five configurations for the network structure: 3×3, 4×4, 5×5, 6×6, and 7×7 lattices equally spaced in a unit square. We used the Gaussian kernel h_kl in Eq. (29) as the neighborhood function. To avoid ambiguity, when the SODAEM and DAEM algorithms are applied in data clustering based on the

classification-likelihood criterion, they are denoted as SODAEM-CL and DAEM-CL; when applied in data clustering based on the mixture-likelihood criterion, they are denoted as SODAEM-ML and DAEM-ML. All the algorithms discussed here were run with random initializations generated in the same way as described in Sec. V-A.

Fig. 8. The map-learning process obtained by running the SOCEM algorithm on PenRecDigits. Simulation 1 ((a)-(b)): when SOCEM is run with the random initialization in (a) and σ = 1, it converges to the unordered map in (b). Simulation 2 ((a) and (c)-(f)): SOCEM starts with σ = 10 and the random initialization in (a); the value of σ is then reduced to 0 in decrements of 1.

Fig. 9. The map-learning process obtained by running the SOEM algorithm on PenRecDigits. Simulation 1 ((a)-(b)): when SOEM is run with the random initialization in (a) and σ = 1, it converges to the unordered map in (b). Simulation 2 ((a) and (c)-(f)): SOEM starts with σ = 10 and the random initialization in (a); the value of σ is then reduced to 0 in decrements of 1.

Fig. 10. The map-learning process obtained by running the SODAEM algorithm on PenRecDigits. The value of σ is fixed at 1, while the value of β is initialized at 0.1 and increased in multiples of 1.6 up to 17.592.

1) Experiments on ImgSeg by using SOCEM and SODAEM-CL: First, we evaluated the data clustering performance of Kohonen-Gaussian, SOCEM, and SODAEM-CL in terms of the classification log-likelihood defined in Eq. (7). The performance was compared with that of CEM and DAEM-CL. The setting for each algorithm was as follows:

DAEM-CL: The value of β was set at 1 initially, and increased to 10 by the formula β_new = β × 1.2.
SOCEM: The value of σ in h_kl was set at 1 initially, and reduced to 0 (i.e., h_kl = δ_kl) in 0.2 decrements.
SODAEM-CL: The values of β and σ in h_kl were both set at 1 initially. To perform data clustering using the classification-likelihood criterion, the value of β was first increased to 10 by the formula β_new = β × 1.2; then, the value of σ was reduced to 0 in 0.2 decrements.
Kohonen-Gaussian: The value of σ in h_kl was set at 1 initially, and reduced to 0 in 0.2 decrements every 30 learning iterations^6.

We ran all the algorithms except CEM in 20 independent trials using 9, 16, 25, 36, and 49 Gaussian components. To conduct a fair comparison of CEM and the proposed approaches, we ran CEM for as many trials as could be accumulated in an execution time close to that of one SOCEM trial. The means and standard deviations (error bars) of the classification log-likelihood values over the trials for each algorithm, together with the best results of CEM (denoted as CEM-best), are shown in Fig. 11. Note that, in the figure, we slightly separate the results associated with a specific number of Gaussian components in order to distinguish between them. From the figure, we observe that the clustering performance of SOCEM, SODAEM-CL, and Kohonen-Gaussian is close to that of DAEM-CL. Moreover, they obtain larger and more stable classification log-likelihoods than CEM.
These

^6 In our implementations of SOCEM, SOEM, and SODAEM, the phase transition occurs when the likelihood increase falls below a threshold or the number of learning iterations in the current phase exceeds 30. Kohonen-Gaussian, however, does not have the convergence property; we therefore ran 30 iterations for each phase of that algorithm.

results are rational, since SOCEM is a topology-constrained DA variant of the CEM algorithm and, with the settings for β and σ used here, SODAEM-CL is an annealing variant of SOCEM.

Fig. 11. The data clustering performance of CEM, DAEM-CL, SOCEM, SODAEM-CL, and Kohonen-Gaussian on ImgSeg in terms of the classification log-likelihood.

Next, we evaluated the data visualization ability of Kohonen-Gaussian, SOCEM, and SODAEM-CL. To visualize the data clusters on the network, each data sample was assigned to its winning reference model and then randomly plotted within the neuron associated with that reference model [36]. Here, the winner selection strategy for SODAEM-CL was the same as that of SOCEM (i.e., the C-step of SOCEM). Fig. 12 shows the projections of the data samples onto 7×7 lattices obtained by the different algorithms. The ImgSeg data set comprises seven classes, namely brickface, sky, foliage, cement, window, path, and grass; each class consists of 300 data samples. Fig. 12(a) depicts the initial mapping of the data obtained with a random initialization of the reference models. As we can see from the figure, the data clusters are randomly projected onto the neurons (lattice nodes), and the network does not preserve the topological (spatial) relationships among the clusters. Figs. 12(b)-(f) show the results of the three PbSOM learning algorithms obtained with the random initialization in Fig. 12(a). We see that they can preserve the topological relationships among the data clusters on the network. Moreover, the data samples of several classes are more distinguishable and better grouped on the network than those of the other classes. In particular, from Figs. 12(b), (c), and (d), we see that only one class is separated from the other classes by empty nodes; thus, we may infer that the separability between this class and the other classes is higher than that between the remaining classes. For SOCEM, as shown in Figs. 12(c) and (d), the network contains fewer empty nodes at σ = 0 than at σ = 0.6.
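The winner-assignment step used for plotting (assign each sample to the neuron maximizing its neighborhood-smoothed log-likelihood, i.e., SOCEM's C-step) can be sketched as follows; the names are illustrative.

```python
import numpy as np

def assign_winners(log_r, H):
    """Winner selection for visualization: for each sample, pick the neuron k
    that maximizes sum_l h[k, l] * log r_l(x_i).
    log_r: (N, K) per-component log-likelihoods; H: (K, K) neighborhood."""
    return np.argmax(log_r @ H.T, axis=1)
```

With h_kl = δ_kl this reduces to plain maximum-likelihood assignment; with a wider neighborhood, a neuron can win on behalf of its lattice neighbors.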
This may be because, in the former case, the lateral interactions have vanished; thus, the reference models are adapted to fit the data distribution more closely than in the latter case. Comparing Fig. 12(b) to Fig. 12(d), we see that the data projection results of Kohonen-Gaussian and SOCEM are rather different, although they obtain similar classification log-likelihoods in Fig. 11. Nevertheless, we can draw similar observations from the two figures; for example, in both maps the data samples of a given class lie closer to the same neighboring classes. Figs. 12(e) and (f) show the results obtained by SODAEM-CL. We see that the result in Fig. 12(f) is rather different from that in Fig. 12(d), although SODAEM-CL has become equivalent to SOCEM when σ = 0. This may be because the two approaches search the objective-function surface along different paths and converge to different local maxima, as in the explanation of the difference between Figs. 9(f) and 10(f) in Sec. V-A2.

2) Experiments on ImgSeg by using SOEM and SODAEM-ML: First, we evaluated the performance of SOEM and SODAEM-ML in learning a Gaussian mixture model with equal mixture weights. The objective function was the log mixture-likelihood function in Eq. (2) with equal mixture weights. We compared the performance with that of EM and DAEM-ML. The setting for each algorithm was as follows:

DAEM-ML: The value of β was set at 0.1 initially, and increased to 1 by the formula β_new = β × 1.2.
SOEM: The value of σ in h_kl was set at 1 initially, and reduced to 0 (i.e., h_kl = δ_kl) in 0.2 decrements.
SODAEM-ML: The values of β and σ in h_kl were set at 0.1 and 1, respectively, initially. To perform data clustering using the mixture-likelihood criterion, the value of β was first increased to 1 by the formula β_new = β × 1.2; then, the value of σ was reduced to 0 in 0.2 decrements.

We ran DAEM-ML, SOEM, and SODAEM-ML with 20 independent random-initialization trials. As in the experiments with CEM, we ran EM for as many trials as could be accumulated in an execution time close to that of one SOEM trial.
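The phase-transition test described in footnote 6 (end a phase when the likelihood gain falls below a threshold or when an iteration cap is reached) can be sketched as follows; the cap of 30 follows the footnote as reconstructed, and the threshold is a tuning choice.

```python
def phase_converged(loglik_history, tol, max_iters=30):
    """Decide whether the current annealing phase should end: either the
    most recent likelihood improvement is below `tol`, or the number of
    iterations recorded in this phase has reached `max_iters`."""
    if len(loglik_history) >= max_iters:
        return True
    if len(loglik_history) < 2:
        return False
    return (loglik_history[-1] - loglik_history[-2]) < tol
```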
The means and standard deviations (error bars) of the log mixture-likelihood values over the trials for each algorithm, together with the best results of EM (denoted as EM-best), are shown in Fig. 13. From the figure, it is clear that DAEM-ML, SOEM, and SODAEM-ML achieve similar performance. Moreover, they obtain larger and more stable log mixture-likelihoods than EM. These results are rational, since SOEM is a topology-constrained DA variant of the EM algorithm and, with the settings for β and σ used here, SODAEM-ML is an annealing variant of SOEM.

Next, we evaluated the data visualization ability of SOEM and SODAEM-ML. We ran these two algorithms with a 7×7 lattice and the initial reference models used in Sec. V-B1 for evaluating SOCEM; therefore, the initial projection of the data was the same as that shown in Fig. 12(a). When clustering the data samples, each sample was assigned to its winning reference model using SOCEM's winner selection strategy. From Fig. 14, we observe that these two algorithms can preserve the topological relationships among the data clusters (samples). Similar to the results revealed by Fig. 12, the data samples of some classes are more distinguishable than those of the other classes.

Fig. 12. Data visualization for ImgSeg by running Kohonen-Gaussian ((b)), SOCEM ((c), (d)), and SODAEM-CL ((e), (f)) with the random initialization in (a). The network structure is a 7×7 equally spaced square lattice in a unit square.

Comparing Fig. 14(b) to Figs. 12(b) and (d), it is clear that SOEM produces fewer empty nodes than Kohonen-Gaussian and SOCEM when the value of σ is reduced to zero. This may be explained as follows. For Kohonen-Gaussian and SOCEM, in the case of σ = 0, they become the CEM (K-means-type) algorithm, in which each data sample adapts only its winner. However, when σ = 0, SOEM becomes the EM algorithm, in which each data sample adapts all the reference models

according to their posterior probabilities; thus, its models fit the data more closely than the models of the other two algorithms.

the clusters are spherical and of equal volume. In this case, the SOCEM algorithm is equivalent to the STVQ algorithm in [22], which was developed for noisy vector quantization. It is also equivalent to the batch learning algorithm described in [20], which employs an energy function in the learning phase of a SOM. However, SOCEM was developed from a different perspective: we consider the learning of a PbSOM as a model-based clustering process. From this perspective, a coupling-likelihood mixture model is developed first, and an objective function is then formulated based on the classification-likelihood criterion. Moreover, the connection between the coupling-likelihood mixture model and the Gaussian mixture model helps interpret SOCEM as a topology-constrained DA variant of the CEM algorithm for GMMs.

Fig. 13. Learning a Gaussian mixture model by applying EM, DAEM-ML, SOEM, and SODAEM-ML to ImgSeg.

3) Experiments on Ecoli: We conducted experiments on Ecoli using the algorithms applied to ImgSeg in Secs. V-B1 and V-B2. Figs. 15(a) and (b) show the data clustering performance of each algorithm in terms of the classification log-likelihood and the log mixture-likelihood, respectively. Similar to the results on ImgSeg, the PbSOM learning algorithms also achieve decent data clustering performance on Ecoli. In Fig. 16, for each algorithm, we show the result at the σ value at which the class separability can be best visualized on the network. The Ecoli data set comprises eight classes, namely cp, im, pp, imU, om, omL, imL, and imS; the numbers of data samples are 143, 77, 52, 35, 20, 5, 2, and 2, respectively. From the figure, we can see that the topological relationships among the data clusters are preserved well and the data classes can be roughly separated on the network.

VI. RELATION TO OTHER ALGORITHMS

In this section, we explore the differences and relations between the proposed algorithms and other related algorithms.

A.
For SOCEM

In [37], Ambroise and Govaert proposed a topology-preserving EM (TPEM) algorithm that introduces topological constraints into the EM algorithm. If Kohonen's winner selection strategy is applied, SOCEM is equivalent to TPEM with the mixture weights fixed to be equal. In SOCEM, the covariance matrix of a Gaussian component, Σ_l, can have different parameterizations for different geometric interpretations [1]. When Σ_l = λI for l = 1, 2, ..., K (where λ is a small positive constant and I denotes the identity matrix),

B. For SOEM and SODAEM

In SODAEM, when Σ_l = λI for l = 1, 2, ..., K, SODAEM is equivalent to the STVQ algorithm [23], which learns the parameters by maximizing their density function predicted by the maximum entropy principle. In STVQ, the inverse temperature, β, is the Lagrange multiplier introduced for the constrained optimization induced by the maximum entropy principle. Heskes [25] extends STVQ's cost function to an expected quantization error; an objective function is then obtained by weighting the quantization error with the inverse temperature β and adding an entropy term that introduces the annealing process. With the resulting objective function, Heskes obtained an algorithm identical to STVQ. Implementations of deterministic annealing in STVQ and Heskes' algorithm can also be found in [38], [39], where DA is applied to vector quantization. SODAEM differs from Graepel et al.'s STVQ and Heskes' algorithm in the following ways. First, the deterministic annealing processes are implemented differently: SODAEM is a DAEM algorithm developed to learn the mixture models with a deterministic annealing process, implemented by predicting the posterior distribution in the E-step using the maximum entropy principle. Second, the case of β = 1 was not well addressed in Graepel et al.'s and Heskes' papers, perhaps because their original goal was to develop DA learning for STVQ. When β is fixed at 1, however, SODAEM becomes the SOEM algorithm.
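The Σ_l = λI special case can be made concrete: with spherical, equal-volume components, the Gaussian log-density reduces to a scaled negative squared distance plus a constant, which is the quantization-error form underlying the STVQ equivalence. A sketch under that assumption:

```python
import numpy as np

def isotropic_log_r(X, mu, lam):
    """Gaussian log-density under Sigma_l = lam * I: for each sample x_i and
    mean mu_l, log r_l(x_i) = -||x_i - mu_l||^2 / (2*lam) - (d/2) log(2*pi*lam),
    so ranking components by likelihood equals ranking by squared distance.
    X: (N, d) samples; mu: (K, d) means; lam: common spherical variance."""
    d = X.shape[1]
    sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)  # ||x_i - mu_l||^2
    return -0.5 * sq / lam - 0.5 * d * np.log(2 * np.pi * lam)
```

Because the constant term is shared by all components, the winner (and the tempered posterior) depends only on the squared distances, which is why the classification criterion degenerates to a vector-quantization criterion here.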
Moreover, the connection between the proposed coupling-likelihood mixture model and the Gaussian mixture model helps interpret SOEM as a topology-constrained DA variant of the EM algorithm for GMMs.

VII. CONCLUSION

Considering the learning of a probabilistic self-organizing map (PbSOM) as a model-based clustering process, we developed a coupling-likelihood mixture model for the PbSOM and derived three EM-type learning algorithms, namely the SOCEM, SOEM, and SODAEM algorithms, for learning the model (PbSOM). The proposed algorithms improve on Kohonen's learning algorithms by including a cost function, an EM-based convergence property, and a probabilistic framework. In addition, the proposed algorithms provide some insights into the choice of neighborhood size that would ensure topographic ordering. From the experiment results, we observe that

Fig. 14. Data visualization for ImgSeg by running SOEM ((a), (b)) and SODAEM-ML ((c), (d)) with the random initialization in Fig. 12(a). The network structure is a 7×7 equally spaced square lattice in a unit square.

Fig. 15. The data clustering performance on Ecoli in terms of (a) the classification log-likelihood and (b) the log mixture-likelihood.

Fig. 16. Data visualization for Ecoli by running (b) Kohonen-Gaussian (σ = 0.6), (c) SOCEM (σ = 0.6), (d) SOEM (σ = 0.8), (e) SODAEM-CL, and (f) SODAEM-ML with the random initialization in (a). The network structure is a 7×7 equally spaced square lattice in a unit square.

the learning performance of SOCEM is very sensitive to the initial setting of the reference models when the neighborhood is small. Conversely, it is not sensitive to the initial condition when the neighborhood is sufficiently large. To deal with the initialization problem, we first run SOCEM with a large neighborhood, and then gradually reduce the neighborhood size until the learning converges to the desired map. When using a small neighborhood, SOEM is less sensitive to the initialization than SOCEM; however, to learn an ordered map, SOEM still needs to start with a large neighborhood. In both SOCEM and SOEM, the neighborhood shrinking can be interpreted as an annealing process that overcomes the initialization issue. Alternatively, we can apply SODAEM, which is a deterministic annealing variant of SOCEM and SOEM, to learn a map. In our experiments, SODAEM overcomes the initialization issue of SOCEM and SOEM via the annealing process controlled by the temperature parameter. Moreover, through the comparison of SOCEM and Kohonen's batch algorithm, we can also apply the DA interpretation of neighborhood shrinking to Kohonen's algorithms to explain why they need to start with a large neighborhood size. We have also shown that the SOEM and SOCEM algorithms can be interpreted, respectively, as topology-constrained deterministic annealing variants of the EM and CEM algorithms for Gaussian model-based clustering. The experiment results show that our proposed PbSOM learning algorithms achieve effective data clustering performance while maintaining the topology-preserving property.

APPENDIX

Theoretically, the mixture weights of the coupling-likelihood mixture model in Eq. (22) can be learned automatically. Following the derivations of the SOCEM, SOEM, and SODAEM algorithms in Secs. III-B, III-C, and III-D, the learning rules for the mixture weights are derived as follows.

Posterior distribution: For SOCEM and SOEM,

$$\gamma_{k|i}^{(t)} = \frac{w_s(k)^{(t)} \exp\big(\sum_{l} h_{kl} \log r_l(\mathbf{x}_i; \theta_l^{(t)})\big)}{\sum_{j} w_s(j)^{(t)} \exp\big(\sum_{l} h_{jl} \log r_l(\mathbf{x}_i; \theta_l^{(t)})\big)}. \quad (41)$$

For SODAEM,
$$\tau_{k|i}^{(t)} = \frac{\big(w_s(k)^{(t)} \exp\big(\sum_{l} h_{kl} \log r_l(\mathbf{x}_i; \theta_l^{(t)})\big)\big)^{\beta}}{\sum_{j} \big(w_s(j)^{(t)} \exp\big(\sum_{l} h_{jl} \log r_l(\mathbf{x}_i; \theta_l^{(t)})\big)\big)^{\beta}}. \quad (42)$$

Re-estimation formulae: For SOCEM,

$$w_s(k)^{(t+1)} = \frac{\hat{N}_k^{(t)}}{N}. \quad (43)$$

For SOEM,

$$w_s(k)^{(t+1)} = \frac{1}{N} \sum_{i=1}^{N} \gamma_{k|i}^{(t)}. \quad (44)$$

For SODAEM,

$$w_s(k)^{(t+1)} = \frac{1}{N} \sum_{i=1}^{N} \tau_{k|i}^{(t)}. \quad (45)$$

The mean vectors and covariance matrices in the SOCEM, SOEM, and SODAEM algorithms are updated using Eqs. (27)-(28), Eqs. (35)-(36), and Eqs. (39)-(40), respectively, where $\gamma_{k|i}^{(t)}$ and $\tau_{k|i}^{(t)}$ are computed by Eqs. (41) and (42), respectively. However, in our experience, if the mixture weights are learned in the three algorithms, the learning of topological order is frequently dominated by some particular mixture components, which makes it difficult to obtain an ordered map. As an example, we applied SOCEM to the synthetic data set, which consisted of 500 points uniformly distributed in a unit square. The network structure was a 4×4 equally spaced square lattice in a unit square. All the mixture weights were set at 1/16 initially, and the value of σ in the neighborhood function (i.e., Eq. (29)) was set at 1. The results are shown in Figs. 17(a)-(e). From the figures, we observe that the map shrinks to nearly a line after the algorithm converges (after 80 iterations). This phenomenon can be verified by inspecting the values of the mixture weights during the learning process. As shown in Table IV, after the algorithm converges, most of the mixture weights become zero, and the learning only maximizes the local coupling-likelihoods of neurons 4 and 13, whose mixture weights are 0.04 and 0.96, respectively. In contrast, as shown in Fig. 17(f), if the mixture weights are fixed to be equal at 1/16 throughout the learning process, SOCEM converges to an ordered map. For SOEM and SODAEM, we obtained similar results.

REFERENCES

[1] C. Fraley and A. E. Raftery, "How many clusters? Which clustering method? Answers via model-based cluster analysis," Computer Journal, vol. 41, pp. 578-588, 1998.
[2] C. Fraley and A. E. Raftery, "Model-based clustering, discriminant analysis, and density estimation," Journal of the American Statistical Association, vol. 97, no. 458, pp. 611-631, 2002.
[3] S.
Zhong and J. Ghosh, "A unified framework for model-based clustering," Journal of Machine Learning Research, vol. 4, pp. 1001-1037, 2003.
[4] C. Fraley and A. E. Raftery, "Bayesian regularization for normal mixture estimation and model-based clustering," Journal of Classification, vol. 24, no. 2, pp. 155-181, 2007.
[5] M.-S. Oh and A. E. Raftery, "Model-based clustering with dissimilarities: A Bayesian approach," Journal of Computational and Graphical Statistics, vol. 16, no. 3, 2007.
[6] M. J. Symons, "Clustering criteria and multivariate normal mixtures," Biometrics, vol. 37, pp. 35-43, 1981.
[7] S. Ganesalingam, "Classification and mixture approaches to clustering via maximum likelihood," Applied Statistics, vol. 38, no. 3, 1989.
[8] G. Celeux and G. Govaert, "A classification EM algorithm for clustering and two stochastic versions," Computational Statistics & Data Analysis, vol. 14, no. 3, pp. 315-332, 1992.
[9] J. D. Banfield and A. E. Raftery, "Model-based Gaussian and non-Gaussian clustering," Biometrics, vol. 49, no. 3, pp. 803-821, 1993.
[10] J. A. Bilmes, "A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models," International Computer Science Institute, Tech. Rep. TR-97-021, April 1998.
[11] G. J. McLachlan and T. Krishnan, The EM Algorithm and Extensions. New York: John Wiley, 1997.
[12] N. Ueda and R. Nakano, "Deterministic annealing EM algorithm," Neural Networks, vol. 11, no. 2, pp. 271-282, 1998.

Fig. 7. The map-learning process obtained by running the SOCEM algorithm on the synthetic data, with an ordered initialization in (a). Panels (b)-(e) show the map after successive iterations with the mixture weights updated; panel (f) shows the map learned with fixed equal weights. Simulation 1 ((a)-(e)): the mixture weights are initialized at 1/16 and updated during the learning process; the algorithm starts with the initialization in (a) and converges to the unordered map in (e). Simulation 2 ((a) and (f)): SOCEM is performed with equal mixture weights throughout the learning process; the algorithm starts with the initialization in (a) and converges to the ordered map in (f). The network structure is a 4 x 4 square lattice; the value of \sigma is held fixed.

TABLE V. THE MIXTURE WEIGHTS LEARNED BY SOCEM WITH THE INITIALIZATION IN FIG. 7(a). THE MIXTURE WEIGHTS ARE INITIALIZED AT 1/16. (Columns: weight index; initial value; values at successive iterations.)

[13] N. Ueda, R. Nakano, Z. Ghahramani, and G. E. Hinton, "SMEM algorithm for mixture models," Neural Computation, vol. 12, no. 9, pp. 2109-2128, 2000.
[14] S.-S. Cheng, H.-M. Wang, and H.-C. Fu, "A model-selection-based self-splitting Gaussian mixture learning with application to speaker identification," EURASIP Journal on Applied Signal Processing, vol. 2004, no. 17, pp. 2626-2639, 2004.
[15] T. Kohonen, Self-Organizing Maps, Springer, 2001.
[16] T. Kohonen, "The self-organizing map," Neurocomputing, vol. 21, pp. 1-6, 1998.
[17] C. M. Bishop, M. Svensén, and C. K. I. Williams, "The generative topographic mapping," Neural Computation, vol. 10, no. 1, pp. 215-234, 1998.
[18] V. V. Tolat, "An analysis of Kohonen's self-organizing maps using a system of energy functions," Biological Cybernetics, vol. 64, no. 2, pp. 155-164, 1990.
[19] E. Erwin, K. Obermayer, and K. Schulten, "Self-organizing maps: ordering, convergence properties and energy functions," Biological Cybernetics, vol. 67, no. 1, pp. 47-55, 1992.
[20] Y. Cheng, "Convergence and ordering of Kohonen's batch map," Neural Computation, vol. 9, no. 8, pp. 1667-1676, 1997.
[21] S. P. Luttrell, "Self-organization: A derivation from first principles of a class of learning algorithms," in Proc. IEEE Int. Joint Conf. Neural Networks, 1989.
[22] S. P. Luttrell, "Code vector density in topographic mappings: Scalar case," IEEE Trans. Neural Networks, vol. 2, no. 4, pp. 427-436, 1991.
[23] T. Graepel, M. Burger, and K. Obermayer, "Phase transitions in stochastic self-organizing maps," Physical Review E, vol. 56, no. 4, 1997.
[24] T. Graepel, M. Burger, and K. Obermayer, "Self-organizing maps: Generalizations and new optimization techniques," Neurocomputing, vol. 21, pp. 173-190, 1998.
[25] T. Heskes, "Self-organizing maps, vector quantization, and mixture modeling," IEEE Trans. Neural Networks, vol. 12, no. 6, pp. 1299-1305, 2001.
[26] T. W. S. Chow and S. Wu, "An online cellular probabilistic self-organizing map for static and dynamic data sets," IEEE Trans. Circuits and Systems I, vol. 51, no. 4, 2004.
[27] S. Wu and T. W. S. Chow, "PRSOM: A new visualization method by hybridizing multidimensional scaling and self-organizing map," IEEE Trans. Neural Networks, vol. 16, no. 6, 2005.
[28] S. P. Luttrell, "A Bayesian analysis of self-organizing maps," Neural Computation, vol. 6, no. 5, 1994.
[29] F. Anouar, F. Badran, and S. Thiria, "Probabilistic self-organizing map and radial basis function networks," Neurocomputing, vol. 20, pp. 83-96, 1998.
[30] J. Lampinen and T. Kostiainen, "Generative probability density model in the self-organizing map," in U. Seiffert and L. C. Jain, Eds., Self-Organizing Neural Networks: Recent Advances and Applications, Physica-Verlag, 2002.
[31] M. M. Van Hulle, "Joint entropy maximization in kernel-based topographic maps," Neural Computation, vol. 14, no. 8, pp. 1887-1906, 2002.
[32] M. M. Van Hulle, "Maximum likelihood topographic map formation," Neural Computation, vol. 17, no. 3, pp. 503-513, 2005.

[33] J. J. Verbeek, N. Vlassis, and B. J. A. Kröse, "Self-organizing mixture models," Neurocomputing, vol. 63, pp. 99-123, 2005.
[34] J. Sum, C.-S. Leung, L.-W. Chan, and L. Xu, "Yet another algorithm which can generate topography map," IEEE Trans. Neural Networks, vol. 8, no. 5, 1997.
[35] UCI Machine Learning Repository, http://www.ics.uci.edu/~mlearn/MLRepository.html.
[36] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, Springer, 2001.
[37] C. Ambroise and G. Govaert, "Constrained clustering and Kohonen self-organizing maps," Journal of Classification, vol. 13, no. 2, pp. 299-313, 1996.
[38] K. Rose, E. Gurewitz, and G. C. Fox, "Vector quantization by deterministic annealing," IEEE Trans. Inform. Theory, vol. 38, no. 4, pp. 1249-1257, 1992.
[39] K. Rose, "Deterministic annealing for clustering, compression, classification, regression, and related optimization problems," Proceedings of the IEEE, vol. 86, no. 11, pp. 2210-2239, 1998.

Hsin-Min Wang received the B.S. and Ph.D. degrees in electrical engineering from National Taiwan University, Taipei, Taiwan, in 1989 and 1995, respectively. In October 1995, he joined the Institute of Information Science, Academia Sinica, Taipei, Taiwan, as a Postdoctoral Fellow. He was promoted to Assistant Research Fellow and then Associate Research Fellow in 1996 and 2002, respectively. He was an adjunct associate professor with National Taipei University of Technology and National Chengchi University. He was a board member and chair of the academic council of ACLCLP. He currently serves as secretary-general of ACLCLP and as an editorial board member of the International Journal of Computational Linguistics and Chinese Language Processing. His major research interests include speech processing, natural language processing, spoken dialogue processing, multimedia information retrieval, and pattern recognition. Dr. Wang was a recipient of the Chinese Institute of Engineers (CIE) Technical Paper Award in 1995. He is a life member of ACLCLP and a member of ISCA.

Shih-Sian Cheng received the B.S. degree in mathematics from National Kaohsiung Normal University, Kaohsiung, Taiwan, R.O.C., in 1999 and the M.S. degree in computer science from National Chiao Tung University, Hsinchu, Taiwan, in 2002. He is currently pursuing the Ph.D. degree in the Department of Computer Science, National Chiao Tung University, Taiwan. In 2002, he joined the Spoken Language Group, Chinese Information Processing Laboratory, Institute of Information Science, Academia Sinica, Taipei, Taiwan, as a Research Assistant. His research interests include machine learning, pattern recognition, speech processing, and neural networks.

Hsin-Chia Fu received the B.S. degree from National Chiao-Tung University in electrical and communication engineering in 1972, and the M.S. and Ph.D. degrees from New Mexico State University, both in electrical and computer engineering, in 1975 and 1981, respectively. From 1981 to 1983, he was a Member of the Technical Staff at Bell Laboratories. Since 1983, he has been on the faculty of the Department of Computer Science and Information Engineering at National Chiao-Tung University, Taiwan, ROC, and he has been Taiwan's representative to an international consortium since 2003. From 1987 to 1988, he served as the director of the department of information management at the Research Development and Evaluation Commission of the Executive Yuan, ROC. He was also a visiting scholar at Princeton University. From 1989 to 1991, he served as the chairman of the Department of Computer Science and Information Engineering. From September to December of 1994, he was a visiting scientist at the Fraunhofer-Institut for Production Systems and Design Technology (IPK), Berlin, Germany. His research interests include digital signal/image processing, multimedia information processing, and neural networks. Dr. Fu was the co-recipient of the 1992 and 1993 Long-Term Best Thesis Award, with Koun-Tem Sun and Cheng-Chin Chiang, and the recipient of the 1996 Xerox OA Paper Award. He has served as a founding member, Program co-chair (1993), and General co-chair (1995) of the International Symposium on Artificial Neural Networks. He was a member of the Technical Committee on Neural Networks for Signal Processing of the IEEE Signal Processing Society from 1997 to 2000. He has authored more than 100 technical papers and two textbooks, PC/XT BIOS Analysis and Introduction to Neural Networks, published by Sun-Kung Book Co. and Third Wave Publishing Co., respectively. Dr. Fu is a member of the IEEE Signal Processing and Computer Societies, Phi Tau Phi, and the Eta Kappa Nu Electrical Engineering Honor Society.
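As a minimal illustration of the EM-style mixture-weight re-estimate in Eq. (44), where each weight becomes the average responsibility of its component, the following Python sketch applies one such update to a 4 x 4 lattice of unit-variance Gaussian components over points drawn uniformly from the unit square. It is a simplified sketch under stated assumptions, not the paper's implementation: the neighborhood coupling h_{kr} of the PbSOM and the mean/covariance updates are deliberately omitted.

```python
import math
import random

def responsibilities(x, means, weights):
    """Posterior gamma_{k|i} for one sample x, with unit-variance
    isotropic Gaussian components (an assumption made for brevity)."""
    logs = [math.log(w) - 0.5 * sum((a - b) ** 2 for a, b in zip(x, m))
            for w, m in zip(weights, means)]
    mx = max(logs)                            # log-sum-exp for stability
    unnorm = [math.exp(v - mx) for v in logs]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def update_weights(data, means, weights):
    """EM re-estimate (Eq. (44)): average responsibility per component."""
    n, k = len(data), len(weights)
    totals = [0.0] * k
    for x in data:
        for j, g in enumerate(responsibilities(x, means, weights)):
            totals[j] += g
    return [t / n for t in totals]

random.seed(0)
data = [(random.random(), random.random()) for _ in range(500)]  # unit square
grid = [0.1 + 0.8 * i / 3 for i in range(4)]
means = [(gx, gy) for gx in grid for gy in grid]  # 4 x 4 lattice of neurons
w = [1.0 / 16.0] * 16                             # equal initial weights

w_new = update_weights(data, means, w)
print(round(sum(w_new), 6))  # re-estimated weights still sum to 1, prints 1.0
```

In the coupled PbSOM setting, the experiments above show that repeatedly applying updates of this kind lets a few weights dominate and the rest shrink toward zero, which is why the experiments instead fix all weights at 1/16.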


More information

Utilization of Chemical Structure Information for Analysis of Spectra Composites

Utilization of Chemical Structure Information for Analysis of Spectra Composites ESANN 214 proceedings, European Symposium on Artificia Neura Networks, Computationa Inteigence and Machine Learning Bruges (Begium), 23-25 Apri 214, i6doccom pub, ISBN 978-28741995-7 Avaiabe from http://wwwi6doccom/fr/ivre/?gcoi=281143244

More information

Do Schools Matter for High Math Achievement? Evidence from the American Mathematics Competitions Glenn Ellison and Ashley Swanson Online Appendix

Do Schools Matter for High Math Achievement? Evidence from the American Mathematics Competitions Glenn Ellison and Ashley Swanson Online Appendix VOL. NO. DO SCHOOLS MATTER FOR HIGH MATH ACHIEVEMENT? 43 Do Schoos Matter for High Math Achievement? Evidence from the American Mathematics Competitions Genn Eison and Ashey Swanson Onine Appendix Appendix

More information

A simple reliability block diagram method for safety integrity verification

A simple reliability block diagram method for safety integrity verification Reiabiity Engineering and System Safety 92 (2007) 1267 1273 www.esevier.com/ocate/ress A simpe reiabiity bock diagram method for safety integrity verification Haitao Guo, Xianhui Yang epartment of Automation,

More information

Sequential Decoding of Polar Codes with Arbitrary Binary Kernel

Sequential Decoding of Polar Codes with Arbitrary Binary Kernel Sequentia Decoding of Poar Codes with Arbitrary Binary Kerne Vera Miosavskaya, Peter Trifonov Saint-Petersburg State Poytechnic University Emai: veram,petert}@dcn.icc.spbstu.ru Abstract The probem of efficient

More information

Power Control and Transmission Scheduling for Network Utility Maximization in Wireless Networks

Power Control and Transmission Scheduling for Network Utility Maximization in Wireless Networks ower Contro and Transmission Scheduing for Network Utiity Maximization in Wireess Networks Min Cao, Vivek Raghunathan, Stephen Hany, Vinod Sharma and. R. Kumar Abstract We consider a joint power contro

More information

An approximate method for solving the inverse scattering problem with fixed-energy data

An approximate method for solving the inverse scattering problem with fixed-energy data J. Inv. I-Posed Probems, Vo. 7, No. 6, pp. 561 571 (1999) c VSP 1999 An approximate method for soving the inverse scattering probem with fixed-energy data A. G. Ramm and W. Scheid Received May 12, 1999

More information

Interactive Fuzzy Programming for Two-level Nonlinear Integer Programming Problems through Genetic Algorithms

Interactive Fuzzy Programming for Two-level Nonlinear Integer Programming Problems through Genetic Algorithms Md. Abu Kaam Azad et a./asia Paciic Management Review (5) (), 7-77 Interactive Fuzzy Programming or Two-eve Noninear Integer Programming Probems through Genetic Agorithms Abstract Md. Abu Kaam Azad a,*,

More information

Akaike Information Criterion for ANOVA Model with a Simple Order Restriction

Akaike Information Criterion for ANOVA Model with a Simple Order Restriction Akaike Information Criterion for ANOVA Mode with a Simpe Order Restriction Yu Inatsu * Department of Mathematics, Graduate Schoo of Science, Hiroshima University ABSTRACT In this paper, we consider Akaike

More information

Random maps and attractors in random Boolean networks

Random maps and attractors in random Boolean networks LU TP 04-43 Rom maps attractors in rom Booean networks Björn Samuesson Car Troein Compex Systems Division, Department of Theoretica Physics Lund University, Sövegatan 4A, S-3 6 Lund, Sweden Dated: 005-05-07)

More information

Control Chart For Monitoring Nonparametric Profiles With Arbitrary Design

Control Chart For Monitoring Nonparametric Profiles With Arbitrary Design Contro Chart For Monitoring Nonparametric Profies With Arbitrary Design Peihua Qiu 1 and Changiang Zou 2 1 Schoo of Statistics, University of Minnesota, USA 2 LPMC and Department of Statistics, Nankai

More information

An Approximate Fisher Scoring Algorithm for Finite Mixtures of Multinomials

An Approximate Fisher Scoring Algorithm for Finite Mixtures of Multinomials An Approximate Fisher Scoring Agorithm for Finite Mixtures of Mutinomias Andrew M. Raim, Mingei Liu, Nagaraj K. Neercha and Jorge G. More Abstract Finite mixture distributions arise naturay in many appications

More information

arxiv:hep-ph/ v1 15 Jan 2001

arxiv:hep-ph/ v1 15 Jan 2001 BOSE-EINSTEIN CORRELATIONS IN CASCADE PROCESSES AND NON-EXTENSIVE STATISTICS O.V.UTYUZH AND G.WILK The Andrzej So tan Institute for Nucear Studies; Hoża 69; 00-689 Warsaw, Poand E-mai: utyuzh@fuw.edu.p

More information

Introduction. Figure 1 W8LC Line Array, box and horn element. Highlighted section modelled.

Introduction. Figure 1 W8LC Line Array, box and horn element. Highlighted section modelled. imuation of the acoustic fied produced by cavities using the Boundary Eement Rayeigh Integra Method () and its appication to a horn oudspeaer. tephen Kirup East Lancashire Institute, Due treet, Bacburn,

More information

Stochastic Automata Networks (SAN) - Modelling. and Evaluation. Paulo Fernandes 1. Brigitte Plateau 2. May 29, 1997

Stochastic Automata Networks (SAN) - Modelling. and Evaluation. Paulo Fernandes 1. Brigitte Plateau 2. May 29, 1997 Stochastic utomata etworks (S) - Modeing and Evauation Pauo Fernandes rigitte Pateau 2 May 29, 997 Institut ationa Poytechnique de Grenobe { IPG Ecoe ationae Superieure d'informatique et de Mathematiques

More information

Tracking Control of Multiple Mobile Robots

Tracking Control of Multiple Mobile Robots Proceedings of the 2001 IEEE Internationa Conference on Robotics & Automation Seou, Korea May 21-26, 2001 Tracking Contro of Mutipe Mobie Robots A Case Study of Inter-Robot Coision-Free Probem Jurachart

More information

8 Digifl'.11 Cth:uits and devices

8 Digifl'.11 Cth:uits and devices 8 Digif'. Cth:uits and devices 8. Introduction In anaog eectronics, votage is a continuous variabe. This is usefu because most physica quantities we encounter are continuous: sound eves, ight intensity,

More information

A unified framework for Regularization Networks and Support Vector Machines. Theodoros Evgeniou, Massimiliano Pontil, Tomaso Poggio

A unified framework for Regularization Networks and Support Vector Machines. Theodoros Evgeniou, Massimiliano Pontil, Tomaso Poggio MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES A.I. Memo No. 1654 March23, 1999

More information

Uniprocessor Feasibility of Sporadic Tasks with Constrained Deadlines is Strongly conp-complete

Uniprocessor Feasibility of Sporadic Tasks with Constrained Deadlines is Strongly conp-complete Uniprocessor Feasibiity of Sporadic Tasks with Constrained Deadines is Strongy conp-compete Pontus Ekberg and Wang Yi Uppsaa University, Sweden Emai: {pontus.ekberg yi}@it.uu.se Abstract Deciding the feasibiity

More information

SVM-based Supervised and Unsupervised Classification Schemes

SVM-based Supervised and Unsupervised Classification Schemes SVM-based Supervised and Unsupervised Cassification Schemes LUMINITA STATE University of Pitesti Facuty of Mathematics and Computer Science 1 Targu din Vae St., Pitesti 110040 ROMANIA state@cicknet.ro

More information

Radar/ESM Tracking of Constant Velocity Target : Comparison of Batch (MLE) and EKF Performance

Radar/ESM Tracking of Constant Velocity Target : Comparison of Batch (MLE) and EKF Performance adar/ racing of Constant Veocity arget : Comparison of Batch (LE) and EKF Performance I. Leibowicz homson-csf Deteis/IISA La cef de Saint-Pierre 1 Bd Jean ouin 7885 Eancourt Cede France Isabee.Leibowicz

More information

8 APPENDIX. E[m M] = (n S )(1 exp( exp(s min + c M))) (19) E[m M] n exp(s min + c M) (20) 8.1 EMPIRICAL EVALUATION OF SAMPLING

8 APPENDIX. E[m M] = (n S )(1 exp( exp(s min + c M))) (19) E[m M] n exp(s min + c M) (20) 8.1 EMPIRICAL EVALUATION OF SAMPLING 8 APPENDIX 8.1 EMPIRICAL EVALUATION OF SAMPLING We wish to evauate the empirica accuracy of our samping technique on concrete exampes. We do this in two ways. First, we can sort the eements by probabiity

More information

VI.G Exact free energy of the Square Lattice Ising model

VI.G Exact free energy of the Square Lattice Ising model VI.G Exact free energy of the Square Lattice Ising mode As indicated in eq.(vi.35), the Ising partition function is reated to a sum S, over coections of paths on the attice. The aowed graphs for a square

More information

Two view learning: SVM-2K, Theory and Practice

Two view learning: SVM-2K, Theory and Practice Two view earning: SVM-2K, Theory and Practice Jason D.R. Farquhar jdrf99r@ecs.soton.ac.uk Hongying Meng hongying@cs.york.ac.uk David R. Hardoon drh@ecs.soton.ac.uk John Shawe-Tayor jst@ecs.soton.ac.uk

More information

A Sparse Covariance Function for Exact Gaussian Process Inference in Large Datasets

A Sparse Covariance Function for Exact Gaussian Process Inference in Large Datasets A Covariance Function for Exact Gaussian Process Inference in Large Datasets Arman ekumyan Austraian Centre for Fied Robotics The University of Sydney NSW 26, Austraia a.mekumyan@acfr.usyd.edu.au Fabio

More information

Mode in Output Participation Factors for Linear Systems

Mode in Output Participation Factors for Linear Systems 2010 American ontro onference Marriott Waterfront, Batimore, MD, USA June 30-Juy 02, 2010 WeB05.5 Mode in Output Participation Factors for Linear Systems Li Sheng, yad H. Abed, Munther A. Hassouneh, Huizhong

More information

A Bayesian Framework for Learning Rule Sets for Interpretable Classification

A Bayesian Framework for Learning Rule Sets for Interpretable Classification Journa of Machine Learning Research 18 (2017) 1-37 Submitted 1/16; Revised 2/17; Pubished 8/17 A Bayesian Framework for Learning Rue Sets for Interpretabe Cassification Tong Wang Cynthia Rudin Finae Doshi-Veez

More information